AI paper: Reinforcement Learning with Dynamic Boltzmann Softmax Updates

wynrefer posted on 2019-3-15 12:40:07
Value function estimation, i.e., prediction, is an important task in reinforcement learning. The operator commonly used for prediction in Q-learning is the hard max operator, which always commits to the maximum action-value according to the current estimate. Such a "hard" updating scheme results in pure exploitation and may lead to misbehavior due to noise in stochastic environments. Thus, it is critical to balance exploration and exploitation in value function estimation. The Boltzmann softmax operator has a greater capability for exploring potential action-values. However, it does not satisfy the non-expansion property, and its direct use may fail to converge even in value iteration. In this paper, we propose to update the value function with the dynamic Boltzmann softmax (DBS) operator, which has good convergence properties in both the planning and the learning settings. Moreover, we prove that dynamic Boltzmann softmax updates can eliminate the overestimation phenomenon introduced by the hard max operator. Experimental results on GridWorld show that the DBS operator enables convergence and a better trade-off between exploration and exploitation in value function estimation. Finally, we propose the DBS-DQN algorithm by generalizing the dynamic Boltzmann softmax update to the deep Q-network; it outperforms DQN substantially in 40 out of 49 Atari games.
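To make the idea in the abstract concrete, here is a minimal sketch in Python. The abstract gives neither the operator's formula nor the schedule for its temperature parameter, so two things below are assumptions: the Boltzmann softmax operator is taken in its usual form (a softmax-weighted average of the action values with inverse temperature beta), and the "dynamic" part is modeled by letting beta grow with the iteration count t, using a hypothetical schedule beta_t = t^2 chosen only for illustration. The function names and the MDP encoding are also invented for this example and are not taken from the paper.

```python
import math


def boltzmann_softmax(q_values, beta):
    """Softmax-weighted average of action values (assumed standard form).

    beta = 0 gives the plain mean; as beta grows the result approaches max(q_values).
    """
    m = max(q_values)
    # Subtract the max before exponentiating so the exponent is never positive.
    weights = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(weights)
    return sum(w * q for w, q in zip(weights, q_values)) / total


def dbs_value_iteration(transitions, rewards, gamma, num_iters,
                        beta_schedule=lambda t: t ** 2):
    """Value iteration with the hard max replaced by a Boltzmann softmax
    whose beta changes per iteration.

    transitions[s][a] is a list of (probability, next_state) pairs and
    rewards[s][a] is the immediate reward (hypothetical encoding).
    beta_schedule is an assumption; t ** 2 is illustrative only.
    """
    values = [0.0] * len(transitions)
    for t in range(1, num_iters + 1):
        beta = beta_schedule(t)
        new_values = []
        for s, actions in enumerate(transitions):
            # One-step lookahead action values under the current estimate.
            q = [
                rewards[s][a] + gamma * sum(p * values[s2] for p, s2 in outcomes)
                for a, outcomes in enumerate(actions)
            ]
            new_values.append(boltzmann_softmax(q, beta))
        values = new_values
    return values
```

With a fixed small beta the update keeps averaging over suboptimal actions, while with a growing beta it behaves more and more like the hard max, which is one way to read the exploration/exploitation trade-off the abstract describes.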
URL: https://arxiv.org/abs/1903.05926
PDF download: https://arxiv.org/pdf/1903.05926