Show simple item record

dc.contributor.advisorVijay Kumar, P
dc.contributor.authorMoorthy, Prakash Narayana
dc.date.accessioned2017-07-26T12:04:41Z
dc.date.accessioned2018-07-31T04:48:54Z
dc.date.available2017-07-26T12:04:41Z
dc.date.available2018-07-31T04:48:54Z
dc.date.issued2017-07-26
dc.date.submitted2015
dc.identifier.urihttps://etd.iisc.ac.in/handle/2005/2648
dc.identifier.abstracthttp://etd.iisc.ac.in/static/etd/abstracts/3454/G26735-Abs.pdfen_US
dc.description.abstractThis thesis deals with the problem of code design in the setting of distributed storage systems consisting of multiple storage nodes, storing many different data les. A primary goal in such systems is the efficient repair of a failed node. Regenerating codes and codes with locality are two classes of coding schemes that have recently been proposed in literature to address this goal. While regenerating codes aim to minimize the amount of data-download needed to carry out node repair, codes with locality seek to minimize the number of nodes accessed during node repair. Our focus here is on linear codes with locality, which is a concept originally introduced by Gopalan et al. in the context of recovering from a single node failure. A code-symbol of a linear code C is said to have locality r, if it can be recovered via a linear combination of r other code-symbols of C. The code C is said to have (i) information-symbol locality r, if all of its message symbols have locality r, and (ii) all-symbol locality r, if all the code-symbols have locality r. We make the following three contributions to the area of codes with locality. Firstly, we extend the notion of locality, in two directions, so as to permit local recovery even in the presence of multiple node failures. In the first direction, we consider codes with \local error correction" in which a code-symbol is protected by a local-error-correcting code having local-minimum-distance 3, and thus allowing local recovery of the code-symbol even in the presence of 2 other code-symbol erasures. In the second direction, we study codes with all-symbol locality that can recover from two erasures via a sequence of two local, parity-check computations. When restricted to the case of all-symbol locality and two erasures, the second approach allows, in general, for design of codes having larger minimum distance than what is possible via the rst approach. Under both approaches, by studying the generalized Hamming weights of the dual codes, we derive tight upper bounds on their respective minimum distances. Optimal code constructions are identified under both approaches, for a class of code parameters. A few interesting corollaries result from this part of our work. Firstly, we obtain a new upper bound on the minimum distance of concatenated codes and secondly, we show how it is always possible to construct the best-possible code (having largest minimum distance) of a given dimension when the code's parity check matrix is partially specified. In a third corollary, we obtain a new upper bound for the minimum distance of codes with all-symbol locality in the single erasure case. Secondly, we introduce the notion of codes with local regeneration that seek to combine the advantages of both codes with locality as well as regenerating codes. These are vector-alphabet analogues of codes with local error correction in which the local codes themselves are regenerating codes. An upper bound on the minimum distance is derived when the constituent local codes have a certain uniform rank accumulation (URA) property. This property is possessed by both the minimum storage regenerating (MSR) and the minimum bandwidth regenerating (MBR) codes. We provide several optimal constructions of codes with local regeneration, where the local codes are either the MSR or the MBR codes. The discussion here is also extended to the case of general vector-linear codes with locality, in which the local codes do not necessarily have the URA property. Finally, we evaluate the efficacy of two specific coding solutions, both possessing an inherent double replication of data, in a practical distributed storage setting known as Hadoop. Hadoop is an open-source platform dealing with distributed storage of data in which the primary aim is to perform distributed computation on the stored data via a paradigm known as Map Reduce. Our evaluation shows that while these codes have efficient repair properties, their vector-alphabet-nature can negatively a affect Map Reduce performance, if they are implemented under the current Hadoop architecture. Specifically, we see that under the current architecture, the choice of number processor cores per node and Map-task scheduling algorithm play a major role in determining their performance. The performance evaluation is carried out via a combination of simulations and actual experiments in Hadoop clusters. As a remedy to the problem, we also pro-pose a modified architecture in which one allows erasure coding across blocks belonging to different les. Under the modified architecture, the new coding solutions will not suffer from any Map Reduce performance-loss as seen in the original architecture, while retaining all of their desired repair propertiesen_US
dc.language.isoen_USen_US
dc.relation.ispartofseriesG26735en_US
dc.subjectDistributed Storage Codingen_US
dc.subjectRegeneration Codesen_US
dc.subjectLocal Repair Codesen_US
dc.subjectLinear Codesen_US
dc.subjectInformation Theoryen_US
dc.subjectError Correcting Codesen_US
dc.subjectEncodingen_US
dc.subjectVector Codesen_US
dc.subjectMinimum Storage Regenerating (MSR) Codesen_US
dc.subjectMinimum Bandwith Regenerating (MBR) Codesen_US
dc.subjectCoding Theoryen_US
dc.subjectHadoopen_US
dc.subjectDistributed Data Storageen_US
dc.subject.classificationComputer Scienceen_US
dc.titleCodes With Locality For Distributed Data Storageen_US
dc.typeThesisen_US
dc.degree.namePhDen_US
dc.degree.levelDoctoralen_US
dc.degree.disciplineFaculty of Engineeringen_US


Files in this item

This item appears in the following Collection(s)

Show simple item record