Enhancing Privacy and Efficiency of Distributed Encryption

Chakrabarti, Anirban

View/Open

Thesis full text (2.653Mb)

Author

Chakrabarti, Anirban

Metadata

Show full item record

Abstract

The primary objective of a distributed system is to eliminate reliance on a central authority. In a distributed encryption scheme, users should be able to generate their key materials independently while securely sharing encrypted messages without compromising privacy. In advanced systems, users communicate via broadcast channels or through an untrusted server (curator) that holds no secrets, thereby removing the need for private point-to-point channels. We introduce both formal definitions and practical constructions for a deduplication scheme tailored to this setting. Encryption schemes with stronger security guarantees have been studied, and the main aim of such schemes is to leak as little information about the underlying plain text as possible. On the other hand, to store data efficiently, one needs to store all duplicate data only once, without making copies of the same. This requires information about the underlying data to be available. These two goals are at odds with each other. Thus secure deduplication might appear to be an oxymoron. A situation in which one wants to achieve both goals seems impossible. On one hand, security demands that the adversary has no idea that the underlying message is the same, in other words, necessitates that the data be encrypted. On the other hand, deduplication requires us to be able to find data duplication on such encrypted data. It is a trade-off between the encryption’s security guarantee and the deduplication’s efficiency. The problem of “secure deduplication” still remains as one of the most challenging problems in the modern data driven world. At Eurocrypt 2013, Bellare, Keelveedhi, and Ristenpart initiated a rigorous study of the important, practically-motivated problem of “Secure Deduplication”. Even though deduplication and privacy might appear to be mutually incompatible, Bellare et al. offer a meaningful definition as well as theoretical and practical constructions for the same. Our work considers the setting of Secure Fuzzy Deduplication where deduplication of nearby–and not equal–files is desired. Using our scheme, users can share ciphertext corresponding only to redundant portions common across similar files they possess, while preserving the privacy of their unique content. We provide a rigorous definition of fuzzy message locked encryption as well as fuzzy deduplication. We build constructions that satisfy our definitions in the random oracle model. We propose a general paradigm of data-adaptive clustering, where the clusters, which are intrinsic to fuzzy deduplication, are created dynamically during encryption based on observed data patterns. We believe this paradigm yields better deduplication efficiency (for typical use-cases). Finally, we evaluate our scheme’s efficiency across Hamming, set difference, and edit distance metrics using real-world datasets, including images and genomic sequences

URI

https://etd.iisc.ac.in/handle/2005/7470

Collections

Computer Science and Automation (CSA) [516]