| dc.description.abstract | Current high-end systems used in web and database servers utilize large storage capacities. Despite the annual decline in disk drive prices, the cost of the storage system now accounts for about 30% to 40% of the total system cost. Furthermore, system administrators must strike a fine balance between the need to provision storage months before actual demand and the need to minimize idle storage. Catering to near-term future storage needs is further complicated by the mismatch between the cumulative growth rate in disk areal densities and the growth in actual storage needs. This mismatch leads to increased numbers of disk drives/LUNs and switch ports. These additional resources cause disk management problems such as choosing the number and size of disk arrays and determining data placement. These disk management issues affect the Quality of Service (QoS) offered to customers for parameters such as performance, availability, reliability, and security—all of which depend on the cost and capacity of the storage system. Therefore, there is a need to reduce the cost/performance and cost/capacity ratios.
There are various situations where storage is used to optimize criteria other than cost, such as latency reduction, reliability enhancement, disk array load balancing, and hot spot removal, to cater to the large performance-sensitive market. A wide variety of performance-enhancing techniques is employed, such as striping, mirroring, and replicating data within a track to reduce rotational delays. These techniques trade additional disks for lower access latency, with varying performance impact. Because multiple disks are commonplace in many installations, reliability considerations are no longer optional. Modern disk array systems therefore trade capacity for both performance and reliability, and the capacity traded is often not proportional to the performance gained. In large-capacity storage systems, this extra storage constitutes a significant cost, not just in disk costs but also in engineering costs.
It is evident that, whether due to yearly growth in storage requirements or the need to trade capacity for better performance and reliability, large storage capacities must be managed. Consequently, system administrators must address the complex problem of storage management to achieve better QoS. Since the difficulty of storage management grows with the number of disk drives/LUNs and switch ports, this thesis attempts to use storage organizations judiciously so that the final storage capacity required in a system is reduced with acceptable performance implications.
Fortunately, in any given time window, not all data are equally likely to be accessed. Data can be grouped as cold, warm, or hot based on access frequency, which motivates a three-tiered secondary storage hierarchy to house the data. Hot and warm data can be located in the latency-sensitive top two layers, while it makes economic sense to store cold data in compressed form in the third layer.
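The tiering idea above can be sketched as a simple frequency-based classifier. This is a minimal illustration only; the function name and the thresholds are assumptions for exposition, not values from the thesis.

```python
# Hypothetical sketch: map a block's access count in the current time
# window to a storage tier. Thresholds are illustrative assumptions.
def classify(access_count, hot_threshold=100, warm_threshold=10):
    """Classify data as hot, warm, or cold by access frequency."""
    if access_count >= hot_threshold:
        return "hot"    # top layer: latency-sensitive, uncompressed
    if access_count >= warm_threshold:
        return "warm"   # second layer: uncompressed
    return "cold"       # third layer: compressed RAID 5
```

In practice the thresholds would be tuned to the application's working set, since (as the study finds) the penalty of the compressed layer depends on its size relative to that working set.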
The introduction of a compressed data layer is bound to incur a performance penalty. Fine-grained migration of data between RAID levels, based on access patterns, is used to minimize this penalty. Simple static rule-based migration algorithms and compute-intensive predictive models, such as Prediction by Partial Match with unbounded context length (PPM*), are used to migrate the near-term data of the access stream into the top layers. The rule-based migrations employ "promote on partial writes" and access-count-based promotion algorithms. All promotion algorithms use a modified five-minute rule to demote cold stripes.
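Two of the policies named above, access-count-based promotion and five-minute-rule demotion, can be sketched as follows. The `Stripe` structure, the promotion threshold, and the demotion window are assumptions for illustration; the thesis's actual parameters may differ.

```python
# Illustrative sketch of access-count-based promotion and a modified
# five-minute-rule demotion. All names and thresholds are assumed.

class Stripe:
    def __init__(self, stripe_id):
        self.stripe_id = stripe_id
        self.access_count = 0
        self.last_access = 0.0
        self.tier = "compressed"   # starts in the compressed RAID 5 layer

def on_access(stripe, now, promote_threshold=4):
    """Access-count-based promotion: promote once a stripe is hot enough."""
    stripe.access_count += 1
    stripe.last_access = now
    if stripe.tier == "compressed" and stripe.access_count >= promote_threshold:
        stripe.tier = "uncompressed"   # migrate into a top layer

def demote_cold(stripes, now, window=300.0):
    """Modified five-minute rule: demote stripes idle for ~5 minutes."""
    for s in stripes:
        if s.tier == "uncompressed" and now - s.last_access > window:
            s.tier = "compressed"
            s.access_count = 0
```

A "promote on partial writes" rule would fit the same skeleton, triggering promotion from the write path instead of an access counter.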
The study confirms the feasibility of such a hierarchy: the performance penalty incurred due to the presence of a compressed RAID 5 level is a function of its size relative to the application's working set and of the migration algorithm employed. Rule-based algorithms are adequate for limiting overhead when the compressed RAID 5 level size is 20% or less; at higher percentages they remain effective, but significant overhead is observed. Using a predictive migration algorithm such as PPM* reduces the overhead to acceptable levels even with larger compressed RAID 5 sizes.
Experimental results from two trace environments show that the overhead incurred is less than 8% with a 20% compressed RAID 5 level and below 18% with a 40% compressed RAID 5 level. The storage saved is 15% with a 20% compressed RAID 5 size and 30% with a 40% compressed RAID 5 size, irrespective of the promotion algorithm used. |  |