AI Training

Deep Learning Paper: Solving Bernoulli Rank-One Bandits with Unimodal Thompson Sampling

Posted by xibao on 2019-12-9 14:37:00
Stochastic Rank-One Bandits (Katariya et al., 2017a, b) are a simple framework for regret minimization problems over rank-one matrices of arms. The initially proposed algorithms are proved to have logarithmic regret, but do not match the existing lower bound for this problem. We close this gap by first proving that rank-one bandits are a particular instance of unimodal bandits, and then providing a new analysis of Unimodal Thompson Sampling (UTS), initially proposed by Paladino et al. (2017). We prove an asymptotically optimal regret bound on the frequentist regret of UTS, and we support our claims with simulations showing the significant improvement of our method compared to the state of the art.
URL: https://arxiv.org/abs/1912.03074 | PDF: https://arxiv.org/pdf/1912.03074
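To make the abstract's two claims concrete (rank-one bandits are unimodal, and UTS exploits that structure), here is a minimal Python sketch. It is not the authors' code and rests on stated assumptions: arm (i, j) pays a Bernoulli reward with mean u[i] * v[j]; two arms are neighbours when they share a row or a column (our reading of the rank-one to unimodal reduction); and UTS is approximated as "find the empirical leader, pull it periodically, otherwise run Beta-Bernoulli Thompson sampling over the leader and its neighbours". The helper names (`rank_one_means`, `neighbours`, `is_unimodal`, `unimodal_thompson_sampling`) and the forced-pull period `gamma` are hypothetical; the exact leader rule in Paladino et al. (2017) may differ in detail.

```python
import random


def rank_one_means(u, v):
    """Bernoulli means of a rank-one bandit: arm (i, j) pays 1 w.p. u[i] * v[j]."""
    return [[ui * vj for vj in v] for ui in u]


def neighbours(arm, n_rows, n_cols):
    """Assumed graph: two arms are neighbours if they share a row or a column."""
    i, j = arm
    same_col = [(k, j) for k in range(n_rows) if k != i]
    same_row = [(i, k) for k in range(n_cols) if k != j]
    return same_col + same_row


def is_unimodal(means):
    """True if every sub-optimal arm has a strictly better neighbour."""
    n_rows, n_cols = len(means), len(means[0])
    best = max(max(row) for row in means)
    for i in range(n_rows):
        for j in range(n_cols):
            better = any(means[a][b] > means[i][j]
                         for a, b in neighbours((i, j), n_rows, n_cols))
            if means[i][j] < best and not better:
                return False
    return True


def unimodal_thompson_sampling(means, horizon, gamma=None, seed=0):
    """Sketch of UTS: Thompson sampling restricted to the leader's neighbourhood."""
    rng = random.Random(seed)
    n_rows, n_cols = len(means), len(means[0])
    arms = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    if gamma is None:
        # Assumed period: size of a closed neighbourhood (leader + neighbours).
        gamma = n_rows + n_cols - 1
    successes = {a: 0 for a in arms}
    pulls = {a: 0 for a in arms}
    times_leader = {a: 0 for a in arms}
    best = max(max(row) for row in means)
    regret = 0.0
    for _ in range(horizon):
        # Leader = arm with the highest empirical mean (ties broken at random).
        emp = {a: successes[a] / pulls[a] if pulls[a] else 0.0 for a in arms}
        top = max(emp.values())
        leader = rng.choice([a for a in arms if emp[a] == top])
        times_leader[leader] += 1
        if (times_leader[leader] - 1) % gamma == 0:
            arm = leader  # periodic forced pull of the leader (assumed rule)
        else:
            # Beta(1, 1) prior; sample only the leader and its neighbours.
            candidates = [leader] + neighbours(leader, n_rows, n_cols)
            arm = max(candidates, key=lambda a: rng.betavariate(
                successes[a] + 1, pulls[a] - successes[a] + 1))
        i, j = arm
        reward = 1 if rng.random() < means[i][j] else 0
        successes[arm] += reward
        pulls[arm] += 1
        regret += best - means[i][j]  # pseudo-regret of this round
    return regret, pulls


means = rank_one_means([0.2, 0.5, 0.9], [0.3, 0.6, 0.95])
print(is_unimodal(means))  # True
regret, pulls = unimodal_thompson_sampling(means, horizon=5000, seed=1)
print(round(regret, 1), max(pulls, key=pulls.get))
```

With strictly positive u and v, any sub-optimal arm (i, j) has a better arm in its column (if u[i] is not maximal) or in its row (otherwise), so `is_unimodal` returns True. The sketch relies on this by never sampling outside the current leader's neighbourhood, which keeps exploration local to the row and column of the leader.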