2-Level Page Tables (2-LPT): A Building Block for Efficient Address Translation in Virtualized Environments

Baviskar, Akshay Suresh

View/Open

Thesis full text (1.791Mb)

Author

Baviskar, Akshay Suresh

Metadata

Show full item record

Abstract

Efficient address translation mechanisms are gaining more and more attention as the virtual address range of the processors keeps expanding and the demand for machine virtualization increases with cloud and data center-based services. Traditional radix-tree based address translations can incur significant overheads in big data applications, particularly under virtualization, due to multi-level tree walks and nested translation. The overheads stem primarily from the unnecessary generality — ability to support several hundreds of thousands of virtual memory regions in the virtual address space — supported by current processors. We observe that in the common case, however, a process’s virtual address space contains only a few contiguously allocated sections, which can be efficiently translated using a shallow tree with two levels. We propose such a compact structure, called 2-Level Page Table(2-LPT), which serves as a key building block for address translation in virtualized environment. A key advantage of 2-LPT is that it maintains two levels of page tables irrespective of the size of the virtual address space. Translating a virtual address using 2-LPT is fast. A walk on a 2-LPT requires up to two memory accesses. In practice, however, the root level table is well cached in the Page Walk Caches, thus, single memory access is sufficient. Under native execution, 2-LPT reduces the page walk latency by up to 20.9% (9.38% on average) and improves performance by up to 10.1% (1.66% on average) over the conventional four-level radix tree page tables, on a set of memory-intensive applications. 2-LPT is more beneficial under virtualization. A naive extension of 2-LPT reduces the cost of nested page walk from 24 to 8 memory accesses. To achieve further reduction, we propose two optimizations: (i) Enhanced Partial Shadow Paging (ePSP) which employs a limited form of shadow paging for the root-level of 2-LPT, and (ii) Host PTE Mirroring (HPM) which allows accessing the host page table entry without performing host page table walk. These optimizations reduce the number of memory accesses to just one, on average, while avoiding slow VM exits. 2-LPT speeds up applications by 5.6%-50.9% (24.6%, on average) over the baseline nested page walks. Compared to the best performing state-of-the-art proposal, 2-LPT achieves an average reduction of 41.5% and 15.7% in page walk latency and execution cycles

URI

https://etd.iisc.ac.in/handle/2005/5684

Collections

Computer Science and Automation (CSA) [396]