Integrating Read-Copy-Update Synchronization and Memory Allocation

Prasad, Aravinda

View/Open

Thesis full text (4.831Mb)

Author

Prasad, Aravinda

Metadata

Show full item record

Abstract

The evolution of multicore systems with thousands of cores has led to the exploration of non-traditional procrastination-based synchronization techniques such as Read-Copy- Update (RCU). Deferred destruction is the fundamental technique used in such tech- niques where writers in order to synchronize with the readers defer the freeing of the objects until the completion of all pre-existing readers. This writer-wait time period is referred to as a grace period (GP). The readers, as a consequence, need not explicitly synchronize with the writers resulting in low overhead, wait free read-side synchroniza- tion primitives. We observe that the deferred destruction of objects leads to newer and complex forms of interactions between the synchronization technique and the memory allocator. We study and analyze the impact of such interactions in the operating system kernels for enterprise workloads, high-performance computing environments, idle systems and virtu- alized environments. We explore different solutions to efficiently handle deferred destruc- tions where our general solution integrates synchronization technique with the memory allocator. Our general solution further exploits interaction between the synchronization technique and memory allocator to optimize both of them. In the first part we analyze the implication of deferred destruction in enterprise envi- ronments. We observe that RCU determines when the deferred object is safe to reclaim and when it is actually reclaimed. As a result, the memory reclamation of the deferred objects are completely oblivious of the memory allocator state leading to poor memory allocator performance. Furthermore, we observe that the deferred objects provide hints about the future that inform memory regions that are about to be freed. Although useful, hints are not exploited as the deferred objects are not \visible" to memory allo- cators. We design Prudence, a new dynamic memory allocator, that is tightly integrated with RCU to ensure visibility of deferred objects to the memory allocator. Prudence exploits optimizations based on the hints about the future during important state tran- sitions. Our evaluation in the Linux kernel shows that Prudence performs 3.9 to 28 better in micro-benchmarks compared to SLUB allocator. It also improves the overall performance perceptibly (4%-18%) for a mix of widely used synthetic and application benchmarks. In the second part we analyze the implication of deferred destruction in idle and High- performance computing (HPC) environments where the amount of memory waiting for reclamation in a grace period is negligible due to limited OS kernel activity. The default grace period computation is not only futile but also detrimental as the CPU cycles consumed to compute a grace period leads to jitter in HPC and frequent CPU wake-ups in idle environments. We design a frugal approach to reduce RCU grace period overhead that reduces the number of grace periods by 68% to 99% and the CPU time consumed by grace periods by 39% to 99% for NAS parallel benchmarks and idle systems. Finally, we analyze the implication of deferred destruction in a virtualized environ- ment. Preemption of RCU-readers can cause multi-second latency spikes and can in- crease peak memory footprint inside VMs which in turn can negate the server con- solidation bene fits of virtualization. Although preemption of lock holders in VMs has been well-studied, the corresponding solutions do not apply to RCU due to its exceed- ingly lightweight read-side primitives. We present the first evaluation of RCU-reader preemption in a virtualized environment. Our evaluation shows 50% increase in the peak memory footprint and 155% increase in fragmentation for a microbenchmark. We propose Conflux, a confluence of three mutually independent solutions that enhances the RCU synchronization technique, memory allocator and the hypervisor to efficiently handle the RCU-reader preemption problem.

URI

https://etd.iisc.ac.in/handle/2005/5359

Collections

Computer Science and Automation (CSA) [395]