Algorithms for Fair Clustering

Allabadi, Swati

dc.contributor.advisor	Louis, Anand
dc.contributor.advisor	Khan, Arindam
dc.contributor.author	Allabadi, Swati
dc.date.accessioned	2022-05-03T05:45:10Z
dc.date.available	2022-05-03T05:45:10Z
dc.date.submitted	2021
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/5709
dc.description.abstract	Many decisions today are made by machine learning algorithms, so it is crucial to build fairness into these algorithms to remove or reduce bias in their decisions. We incorporate fairness into the problem of clustering. Clustering is a classical machine learning problem in which the task is to partition the data points into groups such that points in the same group are more similar to each other than to points in other groups. In our model, each data point belongs to one or more categories. We define fairness in terms of two constraints, restricted dominance and minority protection. While ensuring fairness in the clustering, we count each data point in only one of the categories it belongs to. Our model ensures that no category is either a minority or dominant in any cluster. The representation of a category in a cluster is measured not in absolute terms but in proportion to its presence in the whole dataset. We give a bi-criteria approximation algorithm for fair clustering whose objective is to minimise the $L_p$-norm, defined as $L_p(V, \Phi) = \left(\sum_{v \in V} d(v, \Phi(v))^p\right)^{1/p}$, where $V$ is the dataset, $C$ is the set of centers chosen for clustering, $\Phi : V \to C$ is the assignment that minimises the cost of clustering while satisfying the fairness constraints, and $p$ is any positive integer. Our solution violates the fairness constraints by an additive amount of at most 2. We implement this algorithm and run experiments comparing it with the state of the art. For any $\epsilon > 0$, we give a $(1 + \epsilon)$-approximation algorithm for fair clustering of points in Euclidean space whose objective is to minimise the $L_1$-norm (or the $L_2$-norm). This algorithm also violates the fairness constraints by an additive amount of at most 2.
For points in $\mathbb{R}^d$, the running time of this algorithm for the $L_2$-norm is $O\left(nd \cdot 2^{\tilde{O}(k/\epsilon)} + \mathrm{poly}(n) \cdot 2^{\tilde{O}(k/\epsilon)}\right)$, where $n$ is the size of the dataset and $k$ is the number of clusters. For the $L_1$-norm, the running time is $O\left(nd \cdot 2^{\tilde{O}(k/\epsilon^{O(1)})} + \mathrm{poly}(n) \cdot 2^{\tilde{O}(k/\epsilon^{O(1)})}\right)$. Given a $\gamma$-perturbation-resilient instance of clustering in the metric space $(V, d)$, we also give a bi-criteria approximation for fair clustering of the same instance when its metric is changed to $d'$, where $d'$ is any metric that is a $\gamma$-perturbation of $(V, d)$. This solution also violates the fairness constraints by an additive amount of at most 2.	en_US
dc.language.iso	en_US	en_US
dc.rights	I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.	en_US
dc.subject	Clustering	en_US
dc.subject	machine learning algorithms	en_US
dc.subject	fairness constraints	en_US
dc.subject.classification	Research Subject Categories::TECHNOLOGY::Information technology::Computer science	en_US
dc.title	Algorithms for Fair Clustering	en_US
dc.type	Thesis	en_US
dc.degree.name	MTech (Res)	en_US
dc.degree.level	Masters	en_US
dc.degree.grantor	Indian Institute of Science	en_US
dc.degree.discipline	Engineering	en_US
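The abstract defines the clustering objective as the $L_p$-norm of point-to-center distances and measures fairness per category, in proportion to that category's share of the whole dataset, allowing an additive violation of at most 2. The Python sketch below shows how one might evaluate a given solution against both; it is not the thesis's algorithm. It assumes Euclidean distance, one chosen category label per point, and hypothetical bounds $(1-\delta)\, r_i |C|$ (minority protection) and $r_i |C| / (1-\delta)$ (restricted dominance), where $r_i$ is category $i$'s fraction of the dataset and $\delta \in [0, 1)$ is an illustrative parameter; the thesis may parametrise these bounds differently. The names `lp_cost`, `max_fairness_violation`, and `satisfies_fairness` are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple

Point = Tuple[float, ...]


def lp_cost(points: Sequence[Point], centers: Sequence[Point],
            assignment: Sequence[int], p: int) -> float:
    """(sum over v of d(v, Phi(v))**p) ** (1/p), with Euclidean d (an assumption)."""
    total = sum(math.dist(v, centers[c]) ** p for v, c in zip(points, assignment))
    return total ** (1.0 / p)


def max_fairness_violation(categories: Sequence[Hashable],
                           assignment: Sequence[int], delta: float) -> float:
    """Largest additive amount by which any (cluster, category) count leaves its bounds.

    Hypothetical bounds: a cluster of size s must contain between
    (1 - delta) * r * s and r * s / (1 - delta) points of a category whose
    fraction of the whole dataset is r. Returns 0.0 when all bounds hold.
    """
    if not 0 <= delta < 1:
        raise ValueError("delta must lie in [0, 1)")
    n = len(categories)
    share = {g: cnt / n for g, cnt in Counter(categories).items()}
    cluster_size = Counter(assignment)
    count = Counter(zip(assignment, categories))  # missing pairs count as 0
    worst = 0.0
    for cluster, size in cluster_size.items():
        for g, r in share.items():
            c = count[(cluster, g)]
            lower = (1 - delta) * r * size   # minority protection
            upper = r * size / (1 - delta)   # restricted dominance
            worst = max(worst, lower - c, c - upper)
    return worst


def satisfies_fairness(categories, assignment, delta, slack=2.0) -> bool:
    """True if every bound holds up to the additive slack (2 in the abstract)."""
    return max_fairness_violation(categories, assignment, delta) <= slack


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    ctrs = [(0.5, 0.0), (10.5, 0.0)]
    cats = ["a", "b", "a", "b"]
    print(lp_cost(pts, ctrs, [0, 0, 1, 1], p=1))            # 2.0
    print(max_fairness_violation(cats, [0, 0, 1, 1], 0.2))  # 0.0
    print(max_fairness_violation(cats, [0, 1, 0, 1], 0.2))  # ~0.8
```

In the example, each cluster holds one point of each category, so no bound is violated; putting both `a` points in one cluster gives a violation of about 0.8 when $\delta = 0.2$, which still lies within the additive slack of 2.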
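The perturbation-resilience result assumes that $d'$ is a $\gamma$-perturbation of $(V, d)$. The abstract does not spell out this definition; a common convention in the perturbation-resilience literature, assumed here and possibly differing from the thesis by a scaling, is that $d'$ is a metric with $d(u, v) \le d'(u, v) \le \gamma\, d(u, v)$ for every pair of points. The minimal Python sketch below checks that condition on a finite point set; `is_metric` and `is_gamma_perturbation` are hypothetical helper names, not part of the thesis.

```python
from itertools import combinations, permutations
from typing import Callable, Hashable, Sequence

Metric = Callable[[Hashable, Hashable], float]


def is_metric(points: Sequence[Hashable], dist: Metric, tol: float = 1e-12) -> bool:
    """Check non-negativity, symmetry and the triangle inequality on distinct points."""
    for u, v in combinations(points, 2):
        if dist(u, v) < 0 or abs(dist(u, v) - dist(v, u)) > tol:
            return False
    for u, v, w in permutations(points, 3):
        if dist(u, w) > dist(u, v) + dist(v, w) + tol:
            return False
    return True


def is_gamma_perturbation(points: Sequence[Hashable], d: Metric, d_prime: Metric,
                          gamma: float, tol: float = 1e-12) -> bool:
    """Assumed convention: d' is a metric and d(u,v) <= d'(u,v) <= gamma * d(u,v)."""
    if gamma < 1:
        raise ValueError("gamma must be at least 1")
    if not is_metric(points, d_prime, tol):
        return False
    return all(d(u, v) - tol <= d_prime(u, v) <= gamma * d(u, v) + tol
               for u, v in combinations(points, 2))


if __name__ == "__main__":
    D = {frozenset("ab"): 1.0, frozenset("bc"): 1.0, frozenset("ac"): 2.0}

    def d(u, v):
        return D[frozenset((u, v))]

    pts = ["a", "b", "c"]
    print(is_gamma_perturbation(pts, d, lambda u, v: 1.5 * d(u, v), gamma=2.0))  # True
    print(is_gamma_perturbation(pts, d, lambda u, v: 1.5 * d(u, v), gamma=1.2))  # False
```

Checking the metric property as well as the pairwise band matters: a distance function can stay inside the band for every pair yet break the triangle inequality (for example, raising $d'(a, c)$ to 3 above), in which case it is not a valid perturbation under this convention.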

Files in this item

Name:: Swati-allabadi-mtech-res-2021- ...
Size:: 1.359Mb
Format:: PDF
Description:: Thesis full text


This item appears in the following Collection(s)

Computer Science and Automation (CSA) [394]
