dc.description.abstract | Many decisions today are taken by various machine learning algorithms, hence it is crucial to
accommodate fairness in such algorithms to remove/reduce any kind of bias in the decision.
We incorporate fairness in the problem of clustering. Clustering is a classical machine learning
problem in which the task is to partition the data points into various groups such that the data
points belonging to one group are more similar to each other than the data points belonging to
some other group in the partition.
In our model, each data point belongs to one or more number of categories. We define fairness
in terms of two constraints, restricted dominance and minority protection. While ensuring
fairness in the clustering, we consider each data point in only one of the categories from the
set of categories it belongs to. Our model ensures that no category is either in minority or in
dominance in any of the clusters. Representation of a category in a cluster is considered not
in absolute terms but in proportion to its presence in the whole dataset. We give bi-criteria
approximation for fair clustering whose objective is to minimise Lp-norm. Here, the Lp-norm
is defined as
Lp(V, ϕ) =
X
v∈V
d(v, ϕ(v))p
!1/p
,
where V is the dataset, C is the set of centers chosen for clustering, Φ : V → C is the assignment
which minimises the cost of clustering while satisfying the fairness constraints and p can take any
positive integral value. Our solution violates the fairness constraints by an additive violation
of at most 2. We implement this algorithm and do experiments to compare it with the stateof-
the-art. For any ϵ > 0, we give a (1 + ϵ)-approximate algorithm for fair clustering for
points lying in Euclidean space whose objective is to minimise L1-norm (or L2-norm). This
algorithm also violates the fairness constraints by an additive violation of at most 2. For points
lying in Rd, the run time of this algorithm for L2-norm is O
nd · 2˜O(k/ϵ)
+ poly(n) · 2˜O (k/ϵ),
where n represents the size of the dataset. For L1-norm, the run time of this algorithm is
O
nd · 2˜O(k/ϵO(1))
+poly(n) · 2˜O(k/ϵO(1)). Given a γ-perturbation resilient instance of clustering
in the metric space (V, d), we also give a bi-criteria approximation for the fair clustering of the
same instance while changing its metric to d′. Here, d′ is any metric which is a γ-perturbation
of (V, d). This solution also violates the fairness constraints by an additive violation of at most
2. | en_US |