AI Paper: Combinational Q-Learning for Dou Di Zhu

Posted by wjx003006 on 2019-1-28 11:30:35
Deep reinforcement learning (DRL) has gained a lot of attention in recent years, and has been proven to be able to play Atari games and Go at or above human levels. However, those games are assumed to have a small, fixed number of actions and can be trained with a simple CNN network. In this paper, we study a special class of popular Asian card games called Dou Di Zhu, in which two adversarial groups of agents must consider numerous card combinations at each time step, leading to a huge number of actions. We propose a novel method to handle combinatorial actions, which we call combinational Q-learning (CQL). We employ a two-stage network to reduce the action space and also leverage order-invariant max-pooling operations to extract relationships between primitive actions. Results show that our method prevails over state-of-the-art methods like naive Q-learning and A3C. We develop an easy-to-use card game environment and train all agents adversarially from scratch, with only knowledge of the game rules, and verify that our agents are comparable to humans. Our code to reproduce all reported results will be available online.
URL: https://arxiv.org/abs/1901.08925    PDF download: https://arxiv.org/pdf/1901.08925
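
For readers curious how the order-invariant max-pooling over primitive actions might look in practice, below is a minimal sketch in PyTorch. This is not the authors' code: the class name, tensor shapes, padding scheme, and layer sizes are my assumptions. Each candidate card combination is treated as a set of primitive card indices, embedded individually, and max-pooled into one vector that is scored together with the state.

```python
import torch
import torch.nn as nn

class SetActionQNet(nn.Module):
    """Scores one (state, card-combination) pair with a scalar Q-value.

    Hypothetical sketch: the combination is a padded tensor of primitive
    card indices; max-pooling over their embeddings makes the score
    invariant to card order, as the abstract describes.
    """

    def __init__(self, n_card_types=15, embed_dim=64, state_dim=128):
        super().__init__()
        self.embed = nn.Embedding(n_card_types, embed_dim)
        self.q_head = nn.Sequential(
            nn.Linear(state_dim + embed_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, state, cards):
        # state: (batch, state_dim); cards: (batch, max_cards), padded with -1
        pad = (cards < 0).unsqueeze(-1)            # mark padding slots
        emb = self.embed(cards.clamp(min=0))       # (batch, max_cards, embed_dim)
        emb = emb.masked_fill(pad, float("-inf"))  # padding never wins the max
        pooled = emb.max(dim=1).values             # order-invariant max-pool
        return self.q_head(torch.cat([state, pooled], dim=-1))

# Score every legal combination for the current state and pick the best:
net = SetActionQNet()
state = torch.randn(1, 128).expand(3, -1)      # same state repeated per candidate
cards = torch.tensor([[3, 3, 3, -1],           # e.g. a triple, padded to length 4
                      [4, 4, -1, -1],
                      [7, -1, -1, -1]])
best = net(state, cards).argmax()
```

At decision time, such a network is evaluated once per legal card combination and the highest-scoring combination is played, which sidesteps enumerating a fixed output unit per combinatorial action; the actual two-stage CQL design is detailed in the paper linked above.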