Integrating Read-Copy-Update Synchronization and Memory Allocation
Abstract
The evolution of multicore systems with thousands of cores has led to the exploration
of non-traditional, procrastination-based synchronization techniques such as
Read-Copy-Update (RCU). The fundamental mechanism in such techniques is deferred
destruction: to synchronize with readers, writers defer the freeing of objects until
all pre-existing readers have completed. The period for which a writer waits is
referred to as a grace period (GP). As a consequence, readers need not explicitly
synchronize with writers, which results in low-overhead, wait-free read-side
synchronization primitives.
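The grace-period rule described above can be illustrated with a small, single-threaded
toy model. This is only a sketch under stated assumptions: the class ToyRCU and its
methods (read_lock, read_unlock, defer_free, pending) are hypothetical names chosen for
illustration and are not the Linux kernel API. Unlike real RCU, whose read-side
primitives are exceedingly lightweight, this model has readers register themselves
explicitly so that the bookkeeping is easy to follow.

```python
class ToyRCU:
    """Single-threaded toy model of deferred destruction (not real RCU)."""

    def __init__(self):
        self._next_reader_id = 0      # reader ids are never reused
        self._active_readers = set()  # readers currently in a read-side section
        self._deferred = []           # (readers still to wait for, object)
        self.freed = []               # objects reclaimed so far, in order

    def read_lock(self):
        """Enter a read-side critical section; returns a reader id."""
        rid = self._next_reader_id
        self._next_reader_id += 1
        self._active_readers.add(rid)
        return rid

    def read_unlock(self, rid):
        """Leave a read-side critical section and reclaim what became safe."""
        self._active_readers.remove(rid)
        self._reclaim()

    def defer_free(self, obj):
        """Writer side: defer freeing obj until all pre-existing readers finish."""
        # Only readers active right now can still hold a reference to obj.
        self._deferred.append((set(self._active_readers), obj))
        self._reclaim()

    def pending(self):
        """Number of objects still waiting for their grace period to end."""
        return len(self._deferred)

    def _reclaim(self):
        still_waiting = []
        for waiting_on, obj in self._deferred:
            remaining = waiting_on & self._active_readers
            if remaining:
                still_waiting.append((remaining, obj))
            else:
                self.freed.append(obj)  # grace period over: safe to free
        self._deferred = still_waiting


if __name__ == "__main__":
    rcu = ToyRCU()
    r1 = rcu.read_lock()          # pre-existing reader
    rcu.defer_free("old_node")    # must wait for r1
    r2 = rcu.read_lock()          # started later; the grace period ignores it
    print(rcu.freed, rcu.pending())   # [] 1
    rcu.read_unlock(r1)           # grace period ends although r2 is still active
    print(rcu.freed, rcu.pending())   # ['old_node'] 0
    rcu.read_unlock(r2)
```

In this model the deferred object stays allocated for as long as any pre-existing
reader is active, which is the memory "waiting for reclamation" that the rest of this
abstract is concerned with.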
We observe that the deferred destruction of objects leads to new and complex forms
of interaction between the synchronization technique and the memory allocator. We
study and analyze the impact of such interactions in operating system kernels for
enterprise workloads, high-performance computing environments, idle systems, and
virtualized environments. We explore different solutions to handle deferred
destruction efficiently; our general solution integrates the synchronization
technique with the memory allocator and further exploits the interaction between
the two to optimize both.
In the first part, we analyze the implications of deferred destruction in enterprise
environments. We observe that RCU determines both when a deferred object is safe to
reclaim and when it is actually reclaimed. As a result, the reclamation of deferred
objects is completely oblivious to the state of the memory allocator, leading to poor
memory allocator performance. Furthermore, we observe that deferred objects provide
hints about the future: they indicate which memory regions are about to be freed.
Although useful, these hints are not exploited because the deferred objects are not
"visible" to memory allocators. We design Prudence, a new dynamic memory allocator
that is tightly integrated with RCU to ensure that deferred objects are visible to
the memory allocator. Prudence exploits optimizations based on these hints about the
future during important state transitions. Our evaluation in the Linux kernel shows
that Prudence performs 3.9× to 28× better than the SLUB allocator in
microbenchmarks. It also improves overall performance perceptibly (4%-18%) for a mix
of widely used synthetic and application benchmarks.
In the second part, we analyze the implications of deferred destruction in idle and
high-performance computing (HPC) environments, where the amount of memory waiting for
reclamation in a grace period is negligible due to limited OS kernel activity. In such
environments, the default grace period computation is not only futile but also
detrimental, as the CPU cycles consumed to compute grace periods lead to jitter in HPC
environments and frequent CPU wake-ups in idle environments. We design a frugal
approach to reducing RCU grace period overhead that reduces the number of grace
periods by 68% to 99% and the CPU time consumed by grace periods by 39% to 99% for the
NAS Parallel Benchmarks and idle systems.
Finally, we analyze the implications of deferred destruction in a virtualized
environment. Preemption of RCU readers can cause multi-second latency spikes and can
increase the peak memory footprint inside VMs, which in turn can negate the server
consolidation benefits of virtualization. Although preemption of lock holders in VMs
has been well studied, the corresponding solutions do not apply to RCU because of its
exceedingly lightweight read-side primitives. We present the first evaluation of
RCU-reader preemption in a virtualized environment. Our evaluation shows a 50%
increase in the peak memory footprint and a 155% increase in fragmentation for a
microbenchmark. We propose Conflux, a confluence of three mutually independent
solutions that enhance the RCU synchronization technique, the memory allocator, and
the hypervisor to efficiently handle the RCU-reader preemption problem.