Privacy in Information Retrieval, Delivery and Distributed Computing
Abstract
The aim of our work is to address the challenges of privacy in information retrieval, information
delivery, and in distributed computing systems. We focus on reducing the download and upload
costs and decreasing the subpacketization level while maintaining user privacy.
Firstly we consider private information retrieval (PIR) schemes for cache-aided content
delivery networks. In private information retrieval (PIR) problem, a user wishes to retrieve
a file from a set of files replicated across multiple non-colluding servers while revealing no
information about the identity of the desired file to any of the servers. In “The Capacity of
Private Information Retrieval” (IEEE Transactions on Information Theory, 2017), Sun et al.
propose a PIR scheme that achieves optimal download cost in the single-user PIR setup. We
consider an extension to the single-user PIR, known as the cache-aided multi-user PIR (muPIR).
In cache-aided multi-user PIR (muPIR), N unit-sized files are replicated across B ≥ 2 non-
colluding servers and K users. Each user wants to retrieve a file from the servers without
revealing any information about the requested file to the servers. In a dedicated-cache-aided
muPIR problem, every user is equipped with a cache that can store M units. Another variation,
known as multi-access cache-aided muPIR, includes K users and C ≤ K helper caches, each
with a capacity to store M units. Every user can access multiple cache nodes, and each cache
node can be accessed by multiple users. Before the users make their requests, the cache nodes are
filled with the content of the files. Then, each user selects a file index, and users cooperatively
generate and send queries to the servers to retrieve their desired files privately. To ensure user
privacy, none of the servers should get any information about users’ demands from the queries
they receive. Upon receiving their respective queries, the servers generate answers that are
a function of the queries they receive and the files they are storing. These answers are then
broadcasted to the users. Every user should retrieve its requested file using these answers and
the cache content it has access to. The main objective is to minimize the size of broadcasted
data by the servers, known as download cost, reduce the size of user queries, known as upload
cost, and decrease the subpacketization level, which refers to the number of subfiles each file
has to be divided into.
We present muPIR schemes that reduce the download and upload costs and the sub-
packetization level. Our focus is on both dedicated cache and multi-access cache-aided net-
works. In multi-access networks, we explore “generalized combinatorial topology” and “cyclic
wraparound” multi-access networks. For the generalized combinatorial topology, we propose
an order optimal scheme in terms of download cost with uncoded cache placement. For ded-
icated cache-aided muPIR, we utilize placement delivery arrays (PDAs) to create achievable
schemes. With PDAs, we can significantly reduce upload cost and subpacketization level while
only incurring a slight increase in download cost. We show that the proposed schemes are order
optimal in terms of download cost for a specific class of PDA known as MAN-PDA for uncoded
and coded cache placement. For certain special cases, the upload cost and subpacketization
level are also demonstrated to be optimal.
In addition to private information retrieval, we also focus on achieving privacy in informa-
tion delivery networks. The problem of private information delivery (PID) was introduced in
“Private Information Delivery” (IEEE Transactions on Information Theory, 2020) by H. Sun.
In PID, N independent and unit size files W1, · · · , WN are stored across B servers. There is
one user, and the servers want to convey one of the files, say file WD, to the user. But servers
don’t want the user to know the identity of the file, i.e. the user should not get any information
about the index D.
We proposed an achievable PID scheme for the PID problem where each server can encode
and store M ≤ N units in its storage, and each server is associated with exactly L files. And
for the special case where each file is associated with the same number of servers, we show that
our achieved download cost is optimal.
Lastly, we focus on privacy in distributed computing systems, where a master node wants
to perform a computation by distributing the computation among several worker nodes. K.
Ramchandran et al., in “Speeding Up Distributed Machine Learning using Codes,” (IEEE
Transactions on Information Theory, 2018) show that coding theory can be used to speed up
data shuffling and matrix multiplication in distributed machine learning algorithms. Specifi-
cally, we consider the scenario where a master node aims to compute a matrix multiplication,
ABD, by distributing the computation among several worker nodes. Here, the matrix A is
owned by the master node, while the other matrix BD is in the library of matrices B shared
by the workers, B = {B1, · · · , BD, · · · BN}. Our objective is to design a distributed computing
scheme that guarantees data and demand privacy of the master against any set of T colluding
workers. None of the worker nodes should get any information about the index D, referred
to as demand privacy, while also preventing any information leakage about the master’s ma-
trix A, referred to as data privacy. Furthermore, our scheme addresses the issues of straggler
mitigation (i.e. unresponsive or slow workers) and security against malicious workers (i.e. the
workers that return incorrect computation).
In summary, our work proposes efficient and practical schemes for achieving privacy in
various distributed systems. Our proposed schemes reduce the cost of privacy and maintain
communication and computation performance. Our study contributes to the development of
privacy-preserving distributed systems and helps in the better adoption of online privacy in
society.