Enhancing Privacy and Efficiency of Distributed Encryption
Abstract
The primary objective of a distributed system is to eliminate reliance on a central authority. In
a distributed encryption scheme, users should be able to generate their key materials independently
while securely sharing encrypted messages without compromising privacy. In advanced
systems, users communicate via broadcast channels or through an untrusted server (curator)
that holds no secrets, thereby removing the need for private point-to-point channels. We introduce
both formal definitions and practical constructions for a deduplication scheme tailored
to this setting. Encryption schemes with stronger security guarantees have been studied, and
the main aim of such schemes is to leak as little information about the underlying plain text
as possible. On the other hand, to store data efficiently, one needs to store all duplicate data
only once, without making copies of the same. This requires information about the underlying
data to be available. These two goals are at odds with each other. Thus secure deduplication
might appear to be an oxymoron. A situation in which one wants to achieve both goals seems
impossible. On one hand, security demands that the adversary has no idea that the underlying
message is the same, in other words, necessitates that the data be encrypted. On the other
hand, deduplication requires us to be able to find data duplication on such encrypted data.
It is a trade-off between the encryption’s security guarantee and the deduplication’s efficiency.
The problem of “secure deduplication” still remains as one of the most challenging problems in
the modern data driven world.
At Eurocrypt 2013, Bellare, Keelveedhi, and Ristenpart initiated a rigorous study of the
important, practically-motivated problem of “Secure Deduplication”. Even though deduplication
and privacy might appear to be mutually incompatible, Bellare et al. offer a meaningful
definition as well as theoretical and practical constructions for the same. Our work considers
the setting of Secure Fuzzy Deduplication where deduplication of nearby–and not equal–files is
desired. Using our scheme, users can share ciphertext corresponding only to redundant portions
common across similar files they possess, while preserving the privacy of their unique content.
We provide a rigorous definition of fuzzy message locked encryption as well as fuzzy deduplication.
We build constructions that satisfy our definitions in the random oracle model. We
propose a general paradigm of data-adaptive clustering, where the clusters, which are intrinsic
to fuzzy deduplication, are created dynamically during encryption based on observed data patterns.
We believe this paradigm yields better deduplication efficiency (for typical use-cases).
Finally, we evaluate our scheme’s efficiency across Hamming, set difference, and edit distance
metrics using real-world datasets, including images and genomic sequences

