Towards Effcient Privacy-Preserving Two-Party k-Means Clustering Protocol

Chim, Sonali

View/Open

Thesis full text (779.6Kb)

Author

Chim, Sonali

Metadata

Show full item record

Abstract

Two-party data mining is a win-win game if played with a guarantee of data privacy from each other. This guarantee is provided by the use of cryptographic techniques in designing the two-party protocol. The need to obtain collaborative data mining results is growing and so is the need for privacy-preserving data mining protocols. Clustering is one of the data mining techniques and one of the popular clustering algorithms is k-means clustering. We studied the recent work for the secure two-party k-means clustering by Bunn and Ostrovosky and found that the protocol is inefficient for practical purposes. The protocol requires communication rounds which are linear in security parameter for the center initialization step and are quadratic in security parameter for an iterative Lloyd's step of the k-means clustering algorithm. The challenge in the secure two-party k-means clustering is the exorbitant communication cost occurring due to the high number of interactions between the parties for performing computations on the data. Our work attempts to resolve this problem of inefficiency in k-means clustering protocol in a two-party setting by proposing some modifications. We have come up with two comparison protocols that are required in the k-means clustering protocol. One of the protocols is to find a minimum of two shared numbers which runs in constant communication rounds. Using this protocol as a building block, another protocol is designed to find a minimum of n shared numbers, which runs in O(n) communication rounds. We have also improved a protocol that selects a random value from a domain oblivious to both parties. Apart from this, the idea to avoid the two-party integer division altogether is incorporated in the k-means clustering protocol. With these improvements, we propose a two-party k-means clustering protocol for which the initialization step requires communication rounds linear in security parameter and Lloyd's step requires communications rounds that are independent of the security parameter. The protocol provides a security guarantee in the semi-honest model except for some minor information leakage. We argue that this leakage in the protocol can be tolerated considering the substantial gain in the communication cost We have verified the gain in performance of the modified protocol by implementing both the k-means clustering protocols and running their instances in the same set-up.

URI

https://etd.iisc.ac.in/handle/2005/5180

Collections

Computer Science and Automation (CSA) [551]