Deep Learning Paper: Observational Overfitting in Reinforcement Learning

zhq8008 posted on 2019-12-9 13:23:55
A major component of overfitting in model-free reinforcement learning (RL) involves the case where the agent may mistakenly correlate reward with certain spurious features from the observations generated by the Markov Decision Process (MDP). We provide a general framework for analyzing this scenario, which we use to design multiple synthetic benchmarks from only modifying the observation space of an MDP. When an agent overfits to different observation spaces even if the underlying MDP dynamics is fixed, we term this observational overfitting. Our experiments expose intriguing properties especially with regards to implicit regularization, and also corroborate results from previous works in RL generalization and supervised learning (SL).
Abstract: https://arxiv.org/abs/1912.02975 | PDF: https://arxiv.org/pdf/1912.02975
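The benchmark recipe the abstract describes, fixing the underlying MDP dynamics and varying only the observation function across "levels", can be sketched as below. This is a hypothetical toy illustration, not the paper's actual code: the class name `SpuriousObsMDP` and the specific chain MDP are invented here, and the level-specific random projection stands in for the paper's notion of spurious, level-dependent observation features.

```python
import numpy as np

class SpuriousObsMDP:
    """Toy observational-overfitting benchmark (illustrative sketch).

    The underlying dynamics (a 1-D chain where moving right eventually
    yields reward) are identical in every level. Only the observation
    function differs: each observation concatenates an invariant,
    reward-relevant feature f(s) with a level-specific spurious
    projection g_level(s) that an agent could wrongly latch onto.
    """

    def __init__(self, level_seed, n_states=10, spurious_dim=8):
        self.n_states = n_states
        self.state = 0
        # Level-specific random projection: within a single level it is
        # perfectly predictive of the state, so it is "useful" spurious
        # signal that does not transfer to held-out levels.
        rng = np.random.default_rng(level_seed)
        self.spurious_proj = rng.normal(size=(1, spurious_dim))

    def _observe(self):
        invariant = np.array([self.state / self.n_states])   # f(s): same across levels
        spurious = (invariant @ self.spurious_proj).ravel()  # g_level(s): differs per level
        return np.concatenate([invariant, spurious])

    def reset(self):
        self.state = 0
        return self._observe()

    def step(self, action):
        # action 1 moves right, any other action moves left.
        delta = 1 if action == 1 else -1
        self.state = int(np.clip(self.state + delta, 0, self.n_states - 1))
        reward = 1.0 if self.state == self.n_states - 1 else 0.0
        done = reward > 0.0
        return self._observe(), reward, done
```

A generalization experiment in this setup would train an agent on a handful of `level_seed` values and evaluate it on unseen seeds: since the dynamics never change, any train/test gap is attributable purely to overfitting the observation space.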