AI paper: Scalable Realistic Recommendation Datasets through Fractal Expansions

376156679 posted on 2019-1-28 12:15:42
Recommender System research suffers currently from a disconnect between the size of academic data sets and the scale of industrial production systems. In order to bridge that gap we propose to generate more massive user/item interaction data sets by expanding pre-existing public data sets. User/item incidence matrices record interactions between users and items on a given platform as a large sparse matrix whose rows correspond to users and whose columns correspond to items. Our technique expands such matrices to larger numbers of rows (users), columns (items) and non-zero values (interactions) while preserving key higher-order statistical properties. We adapt the Kronecker Graph Theory to user/item incidence matrices and show that the corresponding fractal expansions preserve the fat-tailed distributions of user engagements, item popularity and singular value spectra of user/item interaction matrices. Preserving such properties is key to building large realistic synthetic data sets which in turn can be employed reliably to benchmark Recommender Systems and the systems employed to train them. We provide algorithms to produce such expansions and apply them to the MovieLens 20 million data set comprising 20 million ratings of 27K movies by 138K users. The resulting expanded data set has 10 billion ratings, 2 million items and 864K users in its smaller version and can be scaled up or down. A larger version features 655 billion ratings, 7 million items and 17 million users.
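To make the expansion idea in the abstract concrete, here is a minimal sketch of a plain Kronecker product applied to a tiny binary user/item incidence matrix. It is an illustration under stated assumptions, not the paper's algorithm: the names `kronecker_expand` and `row_degrees`, the set-of-coordinates storage, and the 3 x 4 toy matrix are all hypothetical, and the matrix is simply multiplied with itself rather than with whatever expansion factor the paper's method constructs.

```python
from collections import Counter


def kronecker_expand(seed, base, base_shape):
    """Kronecker product of two sparse 0/1 matrices.

    Both matrices are stored as sets of (row, col) coordinates of their
    non-zero entries. `base_shape` is the (rows, cols) shape of `base`.
    Entry (i, j) of `seed` and entry (k, l) of `base` produce the
    non-zero entry (i * rows + k, j * cols + l) in the result.
    """
    n_rows, n_cols = base_shape
    return {
        (sr * n_rows + br, sc * n_cols + bc)
        for (sr, sc) in seed
        for (br, bc) in base
    }


def row_degrees(entries):
    """Number of interactions per user (row) in a sparse 0/1 matrix."""
    return Counter(row for row, _ in entries)


# Hypothetical toy data: 3 users x 4 items, 7 interactions.
# User 0 is very active, user 2 barely interacts (a skewed distribution).
base = {(0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1), (2, 0)}
base_shape = (3, 4)

# Expanding the matrix by itself gives 9 users x 16 items.
expanded = kronecker_expand(base, base, base_shape)

base_profile = sorted(row_degrees(base).values(), reverse=True)
expanded_profile = sorted(row_degrees(expanded).values(), reverse=True)

print("base interactions:", len(base), "user degrees:", base_profile)
print("expanded interactions:", len(expanded), "user degrees:", expanded_profile)
```

In this toy setting each expanded user's interaction count is the product of two original users' counts, so a skewed engagement profile stays skewed after expansion while users, items and interactions all grow multiplicatively. The sketch does not demonstrate the singular value spectrum property mentioned in the abstract.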
URL: https://arxiv.org/abs/1901.08910 | PDF download: https://arxiv.org/pdf/1901.08910