dc.description.abstract | Efficient address translation mechanisms are gaining more and more attention as the virtual
address range of the processors keeps expanding and the demand for machine virtualization
increases with cloud and data center-based services. Traditional radix-tree based address translations
can incur significant overheads in big data applications, particularly under virtualization,
due to multi-level tree walks and nested translation. The overheads stem primarily from the
unnecessary generality — ability to support several hundreds of thousands of virtual memory
regions in the virtual address space — supported by current processors.
We observe that in the common case, however, a process’s virtual address space contains
only a few contiguously allocated sections, which can be efficiently translated using a shallow
tree with two levels. We propose such a compact structure, called 2-Level Page Table(2-LPT),
which serves as a key building block for address translation in virtualized environment. A key
advantage of 2-LPT is that it maintains two levels of page tables irrespective of the size of the
virtual address space. Translating a virtual address using 2-LPT is fast. A walk on a 2-LPT
requires up to two memory accesses. In practice, however, the root level table is well cached in
the Page Walk Caches, thus, single memory access is sufficient. Under native execution, 2-LPT
reduces the page walk latency by up to 20.9% (9.38% on average) and improves performance by
up to 10.1% (1.66% on average) over the conventional four-level radix tree page tables, on a set
of memory-intensive applications.
2-LPT is more beneficial under virtualization. A naive extension of 2-LPT reduces the cost
of nested page walk from 24 to 8 memory accesses. To achieve further reduction, we propose
two optimizations: (i) Enhanced Partial Shadow Paging (ePSP) which employs a limited form
of shadow paging for the root-level of 2-LPT, and (ii) Host PTE Mirroring (HPM) which
allows accessing the host page table entry without performing host page table walk. These
optimizations reduce the number of memory accesses to just one, on average, while avoiding slow
VM exits. 2-LPT speeds up applications by 5.6%-50.9% (24.6%, on average) over the baseline
nested page walks. Compared to the best performing state-of-the-art proposal, 2-LPT achieves
an average reduction of 41.5% and 15.7% in page walk latency and execution cycles | en_US |