Migrating VM Workloads to Containers: Issues and Challenges
Abstract
Modern-day enterprises are adopting virtualization to improve server utilization through workload consolidation. Consolidation benefits enterprise applications because most of them do not exercise all system resources to their full capacity all the time. However, co-hosting multiple applications raises several challenges, including regulating resource sharing, enforcing isolation, and minimizing interference. Two classes of solutions have emerged to address these challenges: hypervisor-based system virtual machines (VMs) and process-level virtualization mechanisms such as containers. Both techniques have advantages and disadvantages. Hypervisors abstract the ISA (Instruction Set Architecture) layer and allow multiple guest operating systems to run simultaneously in isolated environments called virtual machines. Process virtual machines such as containers, on the other hand, abstract the operating system layer and use kernel features such as namespaces, cgroups, and AppArmor to control resource sharing and provide isolation. Containers offer better workload consolidation than VMs because they have a lower memory footprint and faster provisioning times, enabling data centers to handle more workloads on existing hardware for applications that can be consolidated using containers.
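For concreteness, a minimal sketch of the kernel primitives that containers build on, assuming a cgroup v2 hierarchy mounted at /sys/fs/cgroup, root privileges, and the util-linux unshare tool; the cgroup name and resource limits are illustrative:

\begin{verbatim}
import os
import subprocess

CGROUP = "/sys/fs/cgroup/demo"   # hypothetical cgroup for the workload
os.makedirs(CGROUP, exist_ok=True)

# cgroups regulate resource sharing: cap the group at half a CPU
# (50 ms out of every 100 ms period) and 256 MiB of memory.
with open(os.path.join(CGROUP, "cpu.max"), "w") as f:
    f.write("50000 100000")
with open(os.path.join(CGROUP, "memory.max"), "w") as f:
    f.write(str(256 * 1024 * 1024))

# Enrol this process in the cgroup so children inherit the limits.
with open(os.path.join(CGROUP, "cgroup.procs"), "w") as f:
    f.write(str(os.getpid()))

# Namespaces provide isolation: start a shell with private PID,
# mount, and network namespaces -- the same building blocks a
# container runtime assembles.
subprocess.run(["unshare", "--pid", "--mount", "--net", "--fork",
                "/bin/sh", "-c", "echo isolated pid: $$"])
\end{verbatim}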
This work explores containers as constructs for workload consolidation and examines the issues and challenges in this space, in particular the concerns that arise when moving workloads from VMs to containers. Containers share the host OS kernel, so only workloads with the same OS dependency can be co-hosted. Further, the sharing of OS resources such as kernel data structures, process IDs, file descriptors, and the network stack by co-hosted containers often results in interference and performance degradation for applications. In the first part, we use OS-level micro-benchmarks to identify the causes and symptoms of the bottlenecks and interference visible to applications co-hosted inside containers. We also identify key metrics that can be used to measure these effects, with a view to monitoring changes in workload requirements and dynamically placing containers to achieve the desired isolation and performance; a sketch of such metrics appears below. This study is carried out using real-life retail e-commerce workloads of M/s Flipkart hosted in their private data center. A key advantage of such private-cloud workloads is that the majority of the applications are developed on the same OS platform, which strongly motivates consolidating them with containers.
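As an illustration, a minimal sketch of the kind of per-container signals such monitoring can use, assuming a cgroup v2 hierarchy; the cgroup path is hypothetical:

\begin{verbatim}
CG = "/sys/fs/cgroup/app"        # hypothetical container cgroup

def read_kv(path):
    """Parse 'key value' lines such as those in cpu.stat."""
    with open(path) as f:
        return dict(line.split() for line in f)

# CPU throttling counters reveal when a container hits its quota.
cpu = read_kv(CG + "/cpu.stat")
print("throttled periods:", cpu.get("nr_throttled"))
print("throttled time (us):", cpu.get("throttled_usec"))

# Pressure-stall information (PSI): sustained non-zero 'some'
# pressure on cpu/memory/io signals contention from co-hosted
# containers rather than a merely busy application.
for resource in ("cpu", "memory", "io"):
    with open(CG + "/" + resource + ".pressure") as f:
        print(resource, f.readline().strip())
\end{verbatim}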
In the second part of the work, we look at constructs for managing the elastic scaling of containerised workloads. We observe that the majority (more than 70\%) of Flipkart's workloads are stateless, which allows seamless cloning of containers across data centers. We leverage this capability through in-kernel load balancing and horizontal scaling to adjust to dynamic workload variation, as sketched below. E-commerce workloads in the Flipkart data center exhibit seasonality, showing similar variations every day, and these variations can be predicted with minimal error using a Seasonal ARIMA model. Containers are lightweight in resource footprint compared to VMs and can be vertically scaled frequently with negligible overhead. Vertical scaling improves performance without loss of service or migration overhead and thus provides better elasticity; however, it is feasible only when idle resources are available on the host where the container runs. Adaptive container placement complements vertical scaling here: it identifies containers that can be dynamically migrated to utilize idle resources elsewhere, or to vacate the resources needed to vertically scale the desired containers. By exploiting workload seasonality, resources can be provisioned proactively, and we observe that such predictive scaling reduces SLA violations compared to reactive scaling because the required resources are allocated in advance. For arbitrarily varying workloads, where future requirements cannot be predicted, proactive scaling is not applicable. We show that automatically provisioning and de-provisioning the resources allocated to containers in response to workload variation reduces the average resource requirement compared to fixed allocation, enabling more applications to be consolidated on existing capacity.
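A minimal sketch of the horizontal-scaling path, assuming IPVS as the in-kernel load balancer driven through the ipvsadm tool; the virtual service address and container IPs are hypothetical:

\begin{verbatim}
import subprocess

VIP = "10.0.0.1:80"              # hypothetical virtual service address

def ipvs(*args):
    subprocess.run(["ipvsadm", *args], check=True)

# Create the virtual service with round-robin scheduling; packets are
# forwarded inside the kernel, so no user-space proxy sits on the
# data path.
ipvs("-A", "-t", VIP, "-s", "rr")

def scale_out(container_ip):
    """Register a freshly cloned container as a real server (NAT)."""
    ipvs("-a", "-t", VIP, "-r", container_ip, "-m")

def scale_in(container_ip):
    """Remove a replica once load subsides."""
    ipvs("-d", "-t", VIP, "-r", container_ip)

# Stateless containers make horizontal scaling a matter of adding or
# removing replicas behind the same virtual IP.
scale_out("172.17.0.2:80")
scale_out("172.17.0.3:80")
\end{verbatim}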
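And a minimal sketch of predictive vertical scaling, assuming a daily seasonal pattern sampled at 5-minute intervals (period 288), the statsmodels SARIMAX implementation of Seasonal ARIMA, and a cgroup v2 container; the demand trace, cgroup path, model orders, and headroom factor are illustrative:

\begin{verbatim}
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

PERIOD = 288                            # 5-minute samples per day
CPU_MAX = "/sys/fs/cgroup/app/cpu.max"  # hypothetical container cgroup

# Fit a Seasonal ARIMA model to the observed CPU demand (in cores).
history = np.loadtxt("cpu_demand.csv")  # hypothetical demand trace
fit = SARIMAX(history, order=(1, 0, 1),
              seasonal_order=(0, 1, 1, PERIOD)).fit(disp=False)

# Provision ahead of the predicted demand, with headroom to absorb
# forecast error.
predicted = float(fit.forecast(steps=1)[0])
cores = max(0.1, 1.2 * predicted)

# Vertical scaling: rewrite the CPU quota in place. The container
# keeps serving -- no restart, no migration.
with open(CPU_MAX, "w") as f:
    f.write("{} {}".format(int(cores * 100000), 100000))
\end{verbatim}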