Effective optimization techniques for a parallel file system
Abstract
by Raghvendran M
Significant work has been done in evolving parallel I/O architectures, I/O interfaces, and other programming techniques. However, only a few mechanisms currently exist that bridge the gap between the I/O architectures and the programming abstractions. A Parallel File System is the prime mechanism to deliver high-performance parallel I/O on multiprocessor machines for a wide class of scientific and engineering applications.
With the evolution of commodity clusters (called High Performance Computing or HPC clusters) as a cost-effective platform for parallel computing, an optimized and portable parallel file system is needed to satisfy applications' I/O needs. The existing parallel I/O mechanisms on such clusters, based on NFS, provide dismal I/O performance, both because the architecture does not allow de-clustering of file data and because the protocol is heavyweight. Several other current I/O architectures, based on shared or cluster file systems, also perform poorly in the cluster-based parallel computing environment because of a semantic mismatch between the application I/O characteristics and the I/O architectural features. The parallel file system represents an appropriate split in the semantics in the parallel application I/O path, where parallel I/O mechanisms and other optimization techniques can be implemented at the I/O platform level and exported through feature-rich, platform-independent interfaces.
In spite of significant research in parallel I/O techniques, portable parallel file systems do not incorporate these findings and are not commonly used. Many of the optimization techniques for parallel I/O in the literature, such as prefetching, have neither had general-purpose implementations nor been validated for a wide class of application workloads or access patterns. Many issues, such as timeliness, need investigation for prefetching to be effective. Parallel I/O optimization techniques have not been satisfactorily incorporated into the commodity cluster setup.
We establish the parallel file system as the right abstraction for parallel I/O on a commodity cluster from the performance and management perspectives. We also evaluate various optimization techniques for parallel file systems on a commodity cluster with the objective of providing a fast scratch space on a real cluster-based supercomputer such as the C-DAC PARAM Padma (ranked 171st in the July 2003 edition of the TOP500 list).
We extend a data prefetching technique for the parallel file system architecture and demonstrate its effectiveness with a policy-based feedback loop. We also investigate other optimization techniques to enhance the performance of the parallel file system. This thesis contributes to the analysis and design of these optimization techniques, including an online predictive prefetching mechanism with adaptive policy control, an adaptive flow control mechanism for supporting collective calls from the architectural perspective, and techniques for managing large data structures and efficient file processing in the file system design.
A parallel file system incorporating the above-stated optimizations has been implemented on C-DAC's PARAM Padma, a one-teraflop 54-node cluster-based parallel processing system. These optimizations show significant improvement for the targeted application I/O workloads on this cluster.

