人工智能培训

搜索

机器学习论文:用于数据汇总的公平K-中心聚类(Fair k-Center Clustering for Data Summarization)

[复制链接]
amwufhvk 发表于 2019-1-28 12:02:43 | 显示全部楼层 |阅读模式
amwufhvk 2019-1-28 12:02:43 331 0 显示全部楼层
机器学习论文:用于数据汇总的公平K-中心聚类(Fair k-Center Clustering for Data Summarization)在数据汇总中,我们想要选择k个原型来总结adata集。我们研究了一个设置,其中数据集包括几个人口统计组,我们仅限于选择属于组i的k_i原型。在没有公平约束的情况下,Acommon解决问题的方法是优化基于acentroid的聚类目标,例如k-center。一个自然的扩展是将公平约束结合到聚类目标中。这样做的现有算法在数据集的大小中以超二次方式运行。这与可以在线性时间内近似优化的标准k中心物镜形成对比。在本文中,我们通过在公平性约束下提供k-中心问题的简单近似算法来解决这一差距,其中运行时间线性在数据集的大小和k。如果人口统计组的数量很小,则算法的近似保证。只会产生恒定因子开销。我们证明了我们的算法在合成和真实数据集上的适用性。
In data summarization we want to choose k prototypes in order to summarize adata set.We study a setting where the data set comprises several demographicgroups and we are restricted to choose k_i prototypes belonging to group i.Acommon approach to the problem without the fairness constraint is to optimize acentroid-based clustering objective such as k-center.A natural extension thenis to incorporate the fairness constraint into the clustering objective.Existing algorithms for doing so run in time super-quadratic in the size of thedata set.This is in contrast to the standard k-center objective that can beapproximately optimized in linear time.In this paper, we resolve this gap byproviding a simple approximation algorithm for the k-center problem under thefairness constraint with running time linear in the size of the data set and k.If the number of demographic groups is small, the approximation guarantee ofour algorithmonly incurs a constant-factor overhead.We demonstrate theapplicability of our algorithm on both synthetic and real data sets.机器学习论文:用于数据汇总的公平K-中心聚类(Fair k-Center Clustering for Data Summarization) GPK5KSsfvvfyyo22.jpg
URL地址:https://arxiv.org/abs/1901.08628     ----pdf下载地址:https://arxiv.org/pdf/1901.08628    ----机器学习论文:用于数据汇总的公平K-中心聚类(Fair k-Center Clustering for Data Summarization)
回复

使用道具 举报

您需要登录后才可以回帖 登录 | 立即注册

本版积分规则 返回列表 发新帖

amwufhvk当前离线
新手上路

查看:331 | 回复:0

快速回复 返回顶部 返回列表