Open-source paper code: Hate Speech Dataset from a White Supremacy Forum

Posted by admin on 2018-9-15 10:23:42
We present a unified framework for Batch Online Learning (OL) for click prediction in search advertisement. Machine learning models, once deployed, show non-trivial accuracy and calibration degradation over time due to model staleness. It is therefore necessary to update models regularly, and to do so automatically. This paper presents two paradigms of Batch Online Learning: one incrementally updates the model parameters via an early stopping mechanism, and the other does so through proximal regularization. We argue that both schemes naturally trade off between old and new data. We then show, theoretically and empirically, that these two seemingly different schemes are closely related. Through extensive experiments, we demonstrate the utility of our OL framework, how the two OL schemes relate to each other, and how they trade off between new and historical data. We then compare batch OL to full model retrains and show that online learning is more robust to data issues. We also demonstrate the long-term impact of online learning, the role of the initial models in OL, and the impact of delays in the update, and conclude with some implementation details and challenges in deploying a real-world online learning system in production. While this paper mostly focuses on click prediction for search advertisement, we hope that the lessons learned here can be carried over to other problem domains.
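The two update schemes named in the abstract can be illustrated with a minimal sketch. It assumes a plain logistic-regression click model trained by NumPy gradient descent. All function names, step sizes, and the toy data are hypothetical choices for illustration; none of them come from the paper or the jensen-ol repository.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(w, X, y):
    """Mean logistic loss of weights w on one batch."""
    p = np.clip(sigmoid(X @ w), 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def log_loss_grad(w, X, y):
    """Gradient of the mean logistic loss on one batch."""
    return X.T @ (sigmoid(X @ w) - y) / len(y)

def update_early_stopping(w_old, X_new, y_new, lr=0.5, n_steps=5):
    """Warm-start at the old weights and take only a few gradient steps
    on the new batch. Fewer steps keep the model closer to w_old."""
    w = w_old.copy()
    for _ in range(n_steps):
        w -= lr * log_loss_grad(w, X_new, y_new)
    return w

def update_proximal(w_old, X_new, y_new, lam=1.0, n_steps=500):
    """Minimise new-batch loss + (lam / 2) * ||w - w_old||^2.
    A larger lam keeps the model closer to w_old."""
    lr = 1.0 / (lam + 1.0)  # assumed step size, small enough to be stable here
    w = w_old.copy()
    for _ in range(n_steps):
        w -= lr * (log_loss_grad(w, X_new, y_new) + lam * (w - w_old))
    return w

# Toy data (assumption): the old model is w_old, the new batch follows w_shifted.
rng = np.random.default_rng(0)
w_old = np.array([1.0, -1.0, 0.5])
w_shifted = np.array([0.2, -1.5, 1.0])
X_new = rng.normal(size=(2000, 3))
y_new = (rng.random(2000) < sigmoid(X_new @ w_shifted)).astype(float)

w_es_few = update_early_stopping(w_old, X_new, y_new, n_steps=2)
w_es_many = update_early_stopping(w_old, X_new, y_new, n_steps=200)
w_px_strong = update_proximal(w_old, X_new, y_new, lam=10.0)
w_px_weak = update_proximal(w_old, X_new, y_new, lam=0.01)

def drift(w):
    """Distance moved away from the old model."""
    return float(np.linalg.norm(w - w_old))

for name, w in [("early stop, 2 steps", w_es_few),
                ("early stop, 200 steps", w_es_many),
                ("proximal, lam=10", w_px_strong),
                ("proximal, lam=0.01", w_px_weak)]:
    print(f"{name:22s} drift={drift(w):.3f} new-batch loss={log_loss(w, X_new, y_new):.4f}")
```

In this sketch the number of gradient steps and the proximal weight `lam` play the same role: each controls how far the updated model may move from the old one, which is the trade-off between historical and new data that the abstract describes.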
Paper URL: https://arxiv.org/abs/1809.04673v1
PDF download: https://arxiv.org/pdf/1809.04673v1
GitHub download: https://github.com/rishabhk108/jensen-ol

Note: the code for this paper is open source on GitHub and is mostly written in Python. The framework may be TensorFlow, PyTorch, or Keras; which one it actually uses has not been fully tested.