Algorithms for Fair Clustering

Allabadi, Swati

View/Open

Thesis full text (1.359Mb)

Author

Allabadi, Swati

Metadata

Show full item record

Abstract

Many decisions today are taken by various machine learning algorithms, hence it is crucial to accommodate fairness in such algorithms to remove/reduce any kind of bias in the decision. We incorporate fairness in the problem of clustering. Clustering is a classical machine learning problem in which the task is to partition the data points into various groups such that the data points belonging to one group are more similar to each other than the data points belonging to some other group in the partition. In our model, each data point belongs to one or more number of categories. We define fairness in terms of two constraints, restricted dominance and minority protection. While ensuring fairness in the clustering, we consider each data point in only one of the categories from the set of categories it belongs to. Our model ensures that no category is either in minority or in dominance in any of the clusters. Representation of a category in a cluster is considered not in absolute terms but in proportion to its presence in the whole dataset. We give bi-criteria approximation for fair clustering whose objective is to minimise Lp-norm. Here, the Lp-norm is defined as Lp(V, ϕ) = X v∈V d(v, ϕ(v))p !1/p , where V is the dataset, C is the set of centers chosen for clustering, Φ : V → C is the assignment which minimises the cost of clustering while satisfying the fairness constraints and p can take any positive integral value. Our solution violates the fairness constraints by an additive violation of at most 2. We implement this algorithm and do experiments to compare it with the stateof- the-art. For any ϵ > 0, we give a (1 + ϵ)-approximate algorithm for fair clustering for points lying in Euclidean space whose objective is to minimise L1-norm (or L2-norm). This algorithm also violates the fairness constraints by an additive violation of at most 2. For points lying in Rd, the run time of this algorithm for L2-norm is O nd · 2˜O(k/ϵ) + poly(n) · 2˜O (k/ϵ), where n represents the size of the dataset. For L1-norm, the run time of this algorithm is O nd · 2˜O(k/ϵO(1)) +poly(n) · 2˜O(k/ϵO(1)). Given a γ-perturbation resilient instance of clustering in the metric space (V, d), we also give a bi-criteria approximation for the fair clustering of the same instance while changing its metric to d′. Here, d′ is any metric which is a γ-perturbation of (V, d). This solution also violates the fairness constraints by an additive violation of at most 2.

URI

https://etd.iisc.ac.in/handle/2005/5709

Collections

Computer Science and Automation (CSA) [546]