Deep learning paper: Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously

Posted by fanjz on 2019-1-28 11:23:03
We develop the first general semi-bandit algorithm that simultaneously achieves $\mathcal{O}(\log T)$ regret for stochastic environments and $\mathcal{O}(\sqrt{T})$ regret for adversarial environments without knowledge of the regime or the number of rounds $T$. The leading problem-dependent constants of our bounds are not only optimal in some worst-case sense studied previously, but also optimal for two concrete instances of semi-bandit problems. Our algorithm and analysis extend the recent work of (Zimmert & Seldin, 2019) for the special case of multi-armed bandit, but importantly require a novel hybrid regularizer designed specifically for semi-bandit. Experimental results on synthetic data show that our algorithm indeed performs well uniformly over different environments. We finally provide a preliminary extension of our results to the full bandit feedback.
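For readers unfamiliar with the multi-armed bandit work the abstract builds on (Zimmert & Seldin, 2019), here is a minimal sketch of follow-the-regularized-leader (FTRL) with a 1/2-Tsallis-entropy regularizer for the plain multi-armed bandit case. It is not the paper's semi-bandit algorithm, which needs the hybrid regularizer described in the abstract. The regularizer scaling, the anytime learning rate $\eta_t = 1/\sqrt{t}$, the plain importance-weighted loss estimator, and the helper names `tsallis_inf_weights` and `run_tsallis_inf` are all illustrative assumptions.

```python
import math
import random


def tsallis_inf_weights(cum_loss_est, eta):
    """One FTRL step with the (assumed) 1/2-Tsallis regularizer
    psi(w) = -(2 / eta) * sum_i sqrt(w_i) over the probability simplex.

    Setting the gradient of <w, L> + psi(w) + lam * (sum(w) - 1) to zero gives
    w_i = 1 / (eta * (L_i + lam))**2 with L_i + lam > 0; lam is the unique
    value for which these weights sum to 1.
    """
    k = len(cum_loss_est)
    l_min = min(cum_loss_est)

    def total(lam):
        return sum(1.0 / (eta * (l + lam)) ** 2 for l in cum_loss_est)

    # At lo the best arm alone already has weight 1, so total(lo) >= 1.
    lo = -l_min + 1.0 / eta
    # At hi every weight is at most 1/k, so total(hi) <= 1.
    hi = -l_min + math.sqrt(k) / eta
    # total() is decreasing in lam on (-l_min, inf), so bisection finds lam.
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    w = [1.0 / (eta * (l + lam)) ** 2 for l in cum_loss_est]
    s = sum(w)
    return [x / s for x in w]  # renormalize away the remaining bisection error


def run_tsallis_inf(loss_fn, k, horizon, seed=0):
    """Play `horizon` rounds against loss_fn(t, arm, rng) -> loss in [0, 1].

    The learning rate depends only on t, never on the horizon.
    """
    rng = random.Random(seed)
    cum_loss_est = [0.0] * k
    total_loss = 0.0
    for t in range(1, horizon + 1):
        eta = 1.0 / math.sqrt(t)  # assumed anytime schedule eta_t = 1/sqrt(t)
        w = tsallis_inf_weights(cum_loss_est, eta)
        arm = rng.choices(range(k), weights=w)[0]
        loss = loss_fn(t, arm, rng)
        total_loss += loss
        # Bandit feedback: only the played arm's loss is observed, so update
        # that arm with the unbiased importance-weighted estimate loss / w[arm].
        cum_loss_est[arm] += loss / w[arm]
    return total_loss


if __name__ == "__main__":
    means = [0.5, 0.4, 0.6]  # hypothetical Bernoulli loss means

    def bernoulli_loss(t, arm, rng):
        return 1.0 if rng.random() < means[arm] else 0.0

    print(run_tsallis_inf(bernoulli_loss, k=3, horizon=2000))
```

In this sketch the same update runs unchanged whether `loss_fn` is stochastic or adversarial; only the learning-rate schedule controls how fast the weights concentrate. The semi-bandit setting of the paper instead selects several arms per round and observes each selected arm's loss, which is where the hybrid regularizer comes in.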
URL: https://arxiv.org/abs/1901.08779 ---- PDF download: https://arxiv.org/pdf/1901.08779