Privacy in Information Retrieval, Delivery and Distributed Computing

Vaidya, Kanishak

View/Open

Thesis full text (1.663Mb)

Author

Vaidya, Kanishak

Metadata

Show full item record

Abstract

The aim of our work is to address the challenges of privacy in information retrieval, information delivery, and in distributed computing systems. We focus on reducing the download and upload costs and decreasing the subpacketization level while maintaining user privacy. Firstly we consider private information retrieval (PIR) schemes for cache-aided content delivery networks. In private information retrieval (PIR) problem, a user wishes to retrieve a file from a set of files replicated across multiple non-colluding servers while revealing no information about the identity of the desired file to any of the servers. In “The Capacity of Private Information Retrieval” (IEEE Transactions on Information Theory, 2017), Sun et al. propose a PIR scheme that achieves optimal download cost in the single-user PIR setup. We consider an extension to the single-user PIR, known as the cache-aided multi-user PIR (muPIR). In cache-aided multi-user PIR (muPIR), N unit-sized files are replicated across B ≥ 2 non- colluding servers and K users. Each user wants to retrieve a file from the servers without revealing any information about the requested file to the servers. In a dedicated-cache-aided muPIR problem, every user is equipped with a cache that can store M units. Another variation, known as multi-access cache-aided muPIR, includes K users and C ≤ K helper caches, each with a capacity to store M units. Every user can access multiple cache nodes, and each cache node can be accessed by multiple users. Before the users make their requests, the cache nodes are filled with the content of the files. Then, each user selects a file index, and users cooperatively generate and send queries to the servers to retrieve their desired files privately. To ensure user privacy, none of the servers should get any information about users’ demands from the queries they receive. Upon receiving their respective queries, the servers generate answers that are a function of the queries they receive and the files they are storing. These answers are then broadcasted to the users. Every user should retrieve its requested file using these answers and the cache content it has access to. The main objective is to minimize the size of broadcasted data by the servers, known as download cost, reduce the size of user queries, known as upload cost, and decrease the subpacketization level, which refers to the number of subfiles each file has to be divided into. We present muPIR schemes that reduce the download and upload costs and the sub- packetization level. Our focus is on both dedicated cache and multi-access cache-aided net- works. In multi-access networks, we explore “generalized combinatorial topology” and “cyclic wraparound” multi-access networks. For the generalized combinatorial topology, we propose an order optimal scheme in terms of download cost with uncoded cache placement. For ded- icated cache-aided muPIR, we utilize placement delivery arrays (PDAs) to create achievable schemes. With PDAs, we can significantly reduce upload cost and subpacketization level while only incurring a slight increase in download cost. We show that the proposed schemes are order optimal in terms of download cost for a specific class of PDA known as MAN-PDA for uncoded and coded cache placement. For certain special cases, the upload cost and subpacketization level are also demonstrated to be optimal. In addition to private information retrieval, we also focus on achieving privacy in informa- tion delivery networks. The problem of private information delivery (PID) was introduced in “Private Information Delivery” (IEEE Transactions on Information Theory, 2020) by H. Sun. In PID, N independent and unit size files W1, · · · , WN are stored across B servers. There is one user, and the servers want to convey one of the files, say file WD, to the user. But servers don’t want the user to know the identity of the file, i.e. the user should not get any information about the index D. We proposed an achievable PID scheme for the PID problem where each server can encode and store M ≤ N units in its storage, and each server is associated with exactly L files. And for the special case where each file is associated with the same number of servers, we show that our achieved download cost is optimal. Lastly, we focus on privacy in distributed computing systems, where a master node wants to perform a computation by distributing the computation among several worker nodes. K. Ramchandran et al., in “Speeding Up Distributed Machine Learning using Codes,” (IEEE Transactions on Information Theory, 2018) show that coding theory can be used to speed up data shuffling and matrix multiplication in distributed machine learning algorithms. Specifi- cally, we consider the scenario where a master node aims to compute a matrix multiplication, ABD, by distributing the computation among several worker nodes. Here, the matrix A is owned by the master node, while the other matrix BD is in the library of matrices B shared by the workers, B = {B1, · · · , BD, · · · BN}. Our objective is to design a distributed computing scheme that guarantees data and demand privacy of the master against any set of T colluding workers. None of the worker nodes should get any information about the index D, referred to as demand privacy, while also preventing any information leakage about the master’s ma- trix A, referred to as data privacy. Furthermore, our scheme addresses the issues of straggler mitigation (i.e. unresponsive or slow workers) and security against malicious workers (i.e. the workers that return incorrect computation). In summary, our work proposes efficient and practical schemes for achieving privacy in various distributed systems. Our proposed schemes reduce the cost of privacy and maintain communication and computation performance. Our study contributes to the development of privacy-preserving distributed systems and helps in the better adoption of online privacy in society.

URI

https://etd.iisc.ac.in/handle/2005/6367

Collections

Electrical Communication Engineering (ECE) [405]