MPCLeague: Robust MPC Platform for Privacy-Preserving Machine Learning
MetadataShow full item record
In the modern era of computing, machine learning tools have demonstrated their potential in vital sectors, such as healthcare and finance, to derive proper inferences. The sensitive and confidential nature of the data in such sectors raises genuine concerns for data privacy. This motivated the area of Privacy-preserving Machine Learning (PPML), where privacy of data is guaranteed. Typically, machine learning techniques require significant computing power, which leads clients with limited infrastructure to rely on the method of Secure Outsourced Computation (SOC). In the SOC setting, the computation is outsourced to a set of specialized and powerful cloud servers and the service is availed on a pay-per-use basis. In this thesis, we design an efficient platform, MPCLeague, for PPML in the SOC setting using Secure Multi-party Computation (MPC) techniques. MPC, the holy-grail problem of secure distributed computing, enables a set of n mutually distrusting parties to perform joint computation on their private inputs in a way that no coalition of t parties can learn more information than the output (privacy) or affect the true output of the computation (correctness). While MPC, in general, has been a subject of extensive research, the area of MPC with a small number of parties has drawn popularity of late mainly due to its application to real-time scenarios, efficiency and simplicity. This thesis focuses on designing efficient MPC frameworks for 2, 3 and 4 parties, with at most one corruption and supports ring structures. Our platform aims at achieving the most substantial security notion of robustness, where the honest parties are guaranteed to obtain the output irrespective of the behaviour of the corrupt parties. A robust protocol prevents the corrupt parties from repeatedly causing the computations to rerun, thereby upholding the trust in the system. While on the roadmap to attain robustness, our frameworks also demonstrate constructions with improved performance that achieve relaxed notions of security: security with abort and fairness. A fair protocol enforces the restriction that either all parties or none of them receive the output. On the other hand, honest parties may not receive the output while corrupt parties do for the case of security with abort. The general structure of the computation involves the execution of the protocol steps once the participating parties have supplied their inputs. Finally, the output is distributed to all the parties. However, to enhance practical efficiency, many recent works resort to the preprocessing paradigm, which splits the computation into two phases; a preprocessing phase where input-independent (but function-dependent), computationally heavy tasks can be computed, followed by a fast online phase. Since the same functions in ML are evaluated several times, this paradigm naturally fits the case of PPML, where the ML algorithm is known beforehand. At the heart of this thesis are four frameworks - ASTRA, SWIFT, Tetrad, ABY2.0 - catered to different settings. -- ASTRA: We begin with the setting of 3 parties (3PC), which forms the base case for honest majority. If a majority of the participating parties are honest, then the setting is deemed an honest majority setting. In the set of 3 parties, at most one party can be corrupt, and this framework tackles semi-honest corruption, where the corrupt party follows the protocol steps but tries to glean more information from the computation. ASTRA acts as a stepping stone towards achieving a stronger security guarantee against active corruption. Our protocol requires communication of 2 ring elements per multiplication gate during the online phase, attaining a per-party cost of less than one element. This is achieved for the first time in the regime of 3PC. -- SWIFT: Designed for 3 parties, this framework tackles one active corruption where the corrupt party can arbitrarily deviate from the computation. Building on ASTRA, SWIFT provides a multiplication that improves the communication by at least 3x over state of the art, besides improving security from abort to robustness. In the regime of malicious 3PC, SWIFT is the first robust and efficient PPML framework. It achieves a dot product protocol with communication independent of the vector size for the first time. -- Tetrad: Designed for 4 parties in the honest majority, the fair multiplication protocol in Tetrad requires communication of only 5 ring elements instead of 6 in the state-of-the-art. The fair framework is then extended to provide robustness without inflating the costs. A notable contribution is the design of the multiplication protocol that supports on-demand applications where the function to be computed is not known in advance. -- ABY2.0: Moving on to the stronger corruption model where a majority of the parties can be corrupt, we explore the base case of 2 parties (2PC). Since we aim to achieve robustness which is proven to be impossible in active corruption, we restrict ourselves to semi-honest corruption. The prime contribution of this framework is the scalar product for which the online communication is two ring elements irrespective of the vector dimension. This is a feature achieved for the first time in the 2PC literature. Our frameworks provide the following contributions in addition to the ones mentioned above. First, we support multi-input multiplication for arithmetic and boolean worlds, improving the online phase in rounds and communication. Second, all our frameworks except SWIFT, incorporate truncation without incurring any overhead. Finally, we introduce efficient instantiation of garbled-world, tailor-made for the mixed-protocol framework for the first time. The mixed-protocol approach, combining arithmetic, boolean and garbled style computations, has demonstrated its potential in several practical use-cases like PPML. To facilitate the computation, we also provide the conversion mechanisms to switch between the computation styles. The practicality of our framework is argued through improvements in the benchmarking of widely used ML algorithms -- Linear Regression, Logistic Regression, Neural Networks, and Support Vector Machines. We propose two variants for each of our frameworks, with one variant aiming to minimise the execution time while the other focuses on the monetary cost. The concrete efficiency gains of our frameworks coupled with the stronger security guarantee of robustness make our platform an ideal choice for a real-time deployment of privacy-preserving machine learning techniques.