Towards Efficient Privacy-Preserving Two-Party k-Means Clustering Protocol

Chim, Sonali

dc.contributor.advisor	Chatterjee, Sanjit
dc.contributor.author	Chim, Sonali
dc.date.accessioned	2021-07-02T04:46:07Z
dc.date.available	2021-07-02T04:46:07Z
dc.date.submitted	2021
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/5180
dc.description.abstract	Two-party data mining is a win-win game, provided each party is guaranteed that its data remains private from the other. This guarantee comes from the cryptographic techniques used in designing the two-party protocol. The need for collaborative data mining results is growing, and with it the need for privacy-preserving data mining protocols. Clustering is one such data mining technique, and k-means is one of the popular clustering algorithms. We studied the recent work on secure two-party k-means clustering by Bunn and Ostrovsky and found that the protocol is inefficient for practical purposes. The protocol requires a number of communication rounds that is linear in the security parameter for the center initialization step and quadratic in the security parameter for the iterative Lloyd's step of the k-means clustering algorithm. The challenge in secure two-party k-means clustering is the exorbitant communication cost caused by the high number of interactions between the parties when computing on the data. Our work attempts to resolve this inefficiency of the k-means clustering protocol in the two-party setting by proposing some modifications. We have devised two comparison protocols that the k-means clustering protocol requires. The first finds the minimum of two shared numbers and runs in a constant number of communication rounds. Using it as a building block, the second finds the minimum of n shared numbers and runs in O(n) communication rounds. We have also improved a protocol that selects a random value from a domain in a way that is oblivious to both parties. In addition, the k-means clustering protocol incorporates an idea that avoids two-party integer division altogether.
With these improvements, we propose a two-party k-means clustering protocol in which the initialization step requires a number of communication rounds linear in the security parameter and Lloyd's step requires a number of communication rounds independent of the security parameter. The protocol provides a security guarantee in the semi-honest model, except for some minor information leakage. We argue that this leakage can be tolerated given the substantial gain in communication cost. We have verified the performance gain of the modified protocol by implementing both k-means clustering protocols and running instances of them in the same set-up.	en_US
dc.language.iso	en_US	en_US
dc.rights	I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.	en_US
dc.subject	k-means clustering	en_US
dc.subject	privacy preserving clustering	en_US
dc.subject	two party clustering	en_US
dc.subject.classification	Research Subject Categories::TECHNOLOGY::Information technology::Computer science::Computer science::Cryptography	en_US
dc.title	Towards Efficient Privacy-Preserving Two-Party k-Means Clustering Protocol	en_US
dc.type	Thesis	en_US
dc.degree.name	MTech (Res)	en_US
dc.degree.level	Masters	en_US
dc.degree.grantor	Indian Institute of Science	en_US
dc.degree.discipline	Engineering	en_US
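
The abstract describes a minimum-of-n protocol built from a two-input minimum on shared numbers, using O(n) communication rounds. The Python sketch below illustrates only that composition and is not the thesis's protocol. It assumes additive secret sharing modulo a hypothetical MODULUS, and it replaces the secure two-input comparison with an idealized stand-in (ideal_min2, a made-up name) that reconstructs values in the clear, which a real protocol must never do.

```python
import secrets

MODULUS = 2**32  # assumed share domain; the record does not state one


def share(value):
    """Split value into two additive shares modulo MODULUS."""
    a = secrets.randbelow(MODULUS)
    return a, (value - a) % MODULUS


def reconstruct(pair):
    """Recombine two additive shares into the underlying value."""
    return (pair[0] + pair[1]) % MODULUS


def ideal_min2(x, y):
    """Idealized stand-in for a secure two-input minimum (hypothetical).

    It opens both inputs, so it offers no privacy; it only models the
    interface: shares of two numbers in, fresh shares of the smaller out.
    """
    return share(min(reconstruct(x), reconstruct(y)))


def min_of_n(shared_values):
    """Chain ideal_min2 over the list; return (shared minimum, call count)."""
    current = shared_values[0]
    calls = 0
    for nxt in shared_values[1:]:
        current = ideal_min2(current, nxt)
        calls += 1
    return current, calls


distances = [17, 4, 23, 9]
result, calls = min_of_n([share(d) for d in distances])
print(reconstruct(result), calls)  # 4 3
```

With n inputs the chain makes n - 1 calls to the two-input minimum, so if each call costs a constant number of rounds, the total is linear in n, matching the O(n) figure stated in the abstract.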

Files in this item

Name: thesis.pdf
Size: 779.6Kb
Format: PDF
Description: Thesis full text

This item appears in the following Collection(s)

Computer Science and Automation (CSA) [561]