Codes With Locality For Distributed Data Storage

Moorthy, Prakash Narayana

dc.contributor.advisor	Vijay Kumar, P
dc.contributor.author	Moorthy, Prakash Narayana
dc.date.accessioned	2017-07-26T12:04:41Z
dc.date.accessioned	2018-07-31T04:48:54Z
dc.date.available	2017-07-26T12:04:41Z
dc.date.available	2018-07-31T04:48:54Z
dc.date.issued	2017-07-26
dc.date.submitted	2015
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/2648
dc.identifier.abstract	https://etd.iisc.ac.in/static/etd/abstracts/3454/G26735-Abs.pdf	en_US
dc.description.abstract	This thesis deals with the problem of code design in the setting of distributed storage systems consisting of multiple storage nodes, storing many different data files. A primary goal in such systems is the efficient repair of a failed node. Regenerating codes and codes with locality are two classes of coding schemes that have recently been proposed in the literature to address this goal. While regenerating codes aim to minimize the amount of data download needed to carry out node repair, codes with locality seek to minimize the number of nodes accessed during node repair. Our focus here is on linear codes with locality, a concept originally introduced by Gopalan et al. in the context of recovering from a single node failure. A code-symbol of a linear code C is said to have locality r if it can be recovered via a linear combination of r other code-symbols of C. The code C is said to have (i) information-symbol locality r, if all of its message symbols have locality r, and (ii) all-symbol locality r, if all the code-symbols have locality r. We make the following three contributions to the area of codes with locality. Firstly, we extend the notion of locality, in two directions, so as to permit local recovery even in the presence of multiple node failures. In the first direction, we consider codes with "local error correction" in which a code-symbol is protected by a local-error-correcting code having local-minimum-distance 3, thus allowing local recovery of the code-symbol even in the presence of 2 other code-symbol erasures. In the second direction, we study codes with all-symbol locality that can recover from two erasures via a sequence of two local, parity-check computations. When restricted to the case of all-symbol locality and two erasures, the second approach allows, in general, for the design of codes having larger minimum distance than what is possible via the first approach.
Under both approaches, by studying the generalized Hamming weights of the dual codes, we derive tight upper bounds on their respective minimum distances. Optimal code constructions are identified under both approaches, for a class of code parameters. A few interesting corollaries result from this part of our work. Firstly, we obtain a new upper bound on the minimum distance of concatenated codes and secondly, we show how it is always possible to construct the best-possible code (having largest minimum distance) of a given dimension when the code's parity check matrix is partially specified. In a third corollary, we obtain a new upper bound for the minimum distance of codes with all-symbol locality in the single erasure case. Secondly, we introduce the notion of codes with local regeneration that seek to combine the advantages of codes with locality and of regenerating codes. These are vector-alphabet analogues of codes with local error correction in which the local codes themselves are regenerating codes. An upper bound on the minimum distance is derived when the constituent local codes have a certain uniform rank accumulation (URA) property. This property is possessed by both the minimum storage regenerating (MSR) and the minimum bandwidth regenerating (MBR) codes. We provide several optimal constructions of codes with local regeneration, where the local codes are either the MSR or the MBR codes. The discussion here is also extended to the case of general vector-linear codes with locality, in which the local codes do not necessarily have the URA property. Finally, we evaluate the efficacy of two specific coding solutions, both possessing an inherent double replication of data, in a practical distributed storage setting known as Hadoop. Hadoop is an open-source platform dealing with distributed storage of data in which the primary aim is to perform distributed computation on the stored data via a paradigm known as MapReduce.
Our evaluation shows that while these codes have efficient repair properties, their vector-alphabet nature can negatively affect MapReduce performance if they are implemented under the current Hadoop architecture. Specifically, we see that under the current architecture, the number of processor cores per node and the Map-task scheduling algorithm play a major role in determining their performance. The performance evaluation is carried out via a combination of simulations and actual experiments in Hadoop clusters. As a remedy to the problem, we also propose a modified architecture in which one allows erasure coding across blocks belonging to different files. Under the modified architecture, the new coding solutions will not suffer from any MapReduce performance loss as seen in the original architecture, while retaining all of their desired repair properties.	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartofseries	G26735	en_US
dc.subject	Distributed Storage Coding	en_US
dc.subject	Regeneration Codes	en_US
dc.subject	Local Repair Codes	en_US
dc.subject	Linear Codes	en_US
dc.subject	Information Theory	en_US
dc.subject	Error Correcting Codes	en_US
dc.subject	Encoding	en_US
dc.subject	Vector Codes	en_US
dc.subject	Minimum Storage Regenerating (MSR) Codes	en_US
dc.subject	Minimum Bandwidth Regenerating (MBR) Codes	en_US
dc.subject	Coding Theory	en_US
dc.subject	Hadoop	en_US
dc.subject	Distributed Data Storage	en_US
dc.subject.classification	Computer Science	en_US
dc.title	Codes With Locality For Distributed Data Storage	en_US
dc.type	Thesis	en_US
dc.degree.name	PhD	en_US
dc.degree.level	Doctoral	en_US
dc.degree.discipline	Faculty of Engineering	en_US

Files in this item

Name: G26735.pdf
Size: 2.383Mb
Format: PDF

This item appears in the following Collection(s)

Electrical Communication Engineering (ECE) [524]