• Login
    View Item 
    •   etd@IISc
    • Division of Electrical, Electronics, and Computer Science (EECS)
    • Computer Science and Automation (CSA)
    • View Item
    •   etd@IISc
    • Division of Electrical, Electronics, and Computer Science (EECS)
    • Computer Science and Automation (CSA)
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    Algorithms for Fair Clustering

    View/Open
    Thesis full text (1.359Mb)
    Author
    Allabadi, Swati
    Metadata
    Show full item record
    Abstract
    Many decisions today are taken by various machine learning algorithms, hence it is crucial to accommodate fairness in such algorithms to remove/reduce any kind of bias in the decision. We incorporate fairness in the problem of clustering. Clustering is a classical machine learning problem in which the task is to partition the data points into various groups such that the data points belonging to one group are more similar to each other than the data points belonging to some other group in the partition. In our model, each data point belongs to one or more number of categories. We define fairness in terms of two constraints, restricted dominance and minority protection. While ensuring fairness in the clustering, we consider each data point in only one of the categories from the set of categories it belongs to. Our model ensures that no category is either in minority or in dominance in any of the clusters. Representation of a category in a cluster is considered not in absolute terms but in proportion to its presence in the whole dataset. We give bi-criteria approximation for fair clustering whose objective is to minimise Lp-norm. Here, the Lp-norm is defined as Lp(V, ϕ) = X v∈V d(v, ϕ(v))p !1/p , where V is the dataset, C is the set of centers chosen for clustering, Φ : V → C is the assignment which minimises the cost of clustering while satisfying the fairness constraints and p can take any positive integral value. Our solution violates the fairness constraints by an additive violation of at most 2. We implement this algorithm and do experiments to compare it with the stateof- the-art. For any ϵ > 0, we give a (1 + ϵ)-approximate algorithm for fair clustering for points lying in Euclidean space whose objective is to minimise L1-norm (or L2-norm). This algorithm also violates the fairness constraints by an additive violation of at most 2. For points lying in Rd, the run time of this algorithm for L2-norm is O nd · 2˜O(k/ϵ) + poly(n) · 2˜O (k/ϵ), where n represents the size of the dataset. For L1-norm, the run time of this algorithm is O nd · 2˜O(k/ϵO(1)) +poly(n) · 2˜O(k/ϵO(1)). Given a γ-perturbation resilient instance of clustering in the metric space (V, d), we also give a bi-criteria approximation for the fair clustering of the same instance while changing its metric to d′. Here, d′ is any metric which is a γ-perturbation of (V, d). This solution also violates the fairness constraints by an additive violation of at most 2.
    URI
    https://etd.iisc.ac.in/handle/2005/5709
    Collections
    • Computer Science and Automation (CSA) [393]

    etd@IISc is a joint service of SERC & J R D Tata Memorial (JRDTML) Library || Powered by DSpace software || DuraSpace
    Contact Us | Send Feedback | Thesis Templates
    Theme by 
    Atmire NV
     

     

    Browse

    All of etd@IIScCommunities & CollectionsTitlesAuthorsAdvisorsSubjectsBy Thesis Submission DateThis CollectionTitlesAuthorsAdvisorsSubjectsBy Thesis Submission Date

    My Account

    LoginRegister

    etd@IISc is a joint service of SERC & J R D Tata Memorial (JRDTML) Library || Powered by DSpace software || DuraSpace
    Contact Us | Send Feedback | Thesis Templates
    Theme by 
    Atmire NV