AI paper: Dynamical Isometry and a Mean Field Theory of LSTMs and GRUs

Posted by liuye on 2019-1-28 11:48:11
Training recurrent neural networks (RNNs) on long sequence tasks is plagued with difficulties arising from the exponential explosion or vanishing of signals as they propagate forward or backward through the network. Many techniques have been proposed to ameliorate these issues, including various algorithmic and architectural modifications. Two of the most successful RNN architectures, the LSTM and the GRU, do exhibit modest improvements over vanilla RNN cells, but they still suffer from instabilities when trained on very long sequences. In this work, we develop a mean field theory of signal propagation in LSTMs and GRUs that enables us to calculate the time scales for signal propagation as well as the spectral properties of the state-to-state Jacobians. By optimizing these quantities in terms of the initialization hyperparameters, we derive a novel initialization scheme that eliminates or reduces training instabilities. We demonstrate the efficacy of our initialization scheme on multiple sequence tasks, on which it enables successful training while a standard initialization either fails completely or is orders of magnitude slower. We also observe a beneficial effect on generalization performance using this new initialization.
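For readers who want to see the "exponential explosion or vanishing" concretely, here is a toy sketch of my own, not the paper's method. It assumes a purely linear recurrence h_{t+1} = W h_t with W = gain * Q, where Q is a 2x2 rotation (orthogonal) matrix. In that setting the state-to-state Jacobian is W itself, all of its singular values equal gain, and the signal norm after T steps is gain^T. Only gain = 1 keeps the norm fixed, which is the isometry condition in its simplest form. The function names and the parameters gain, steps and theta are made up for illustration; the paper's mean field analysis of LSTMs and GRUs is far more involved.

```python
import math


def rotation(theta):
    """Return a 2x2 rotation matrix (orthogonal, so every singular value is 1)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matvec(m, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def signal_norm_after(gain, steps, theta=0.3):
    """Norm of a unit vector after `steps` applications of W = gain * rotation(theta).

    Hypothetical toy model: a linear recurrence h_{t+1} = W h_t, whose
    state-to-state Jacobian is W at every step.
    """
    q = rotation(theta)
    v = [1.0, 0.0]
    for _ in range(steps):
        v = [gain * x for x in matvec(q, v)]
    return math.hypot(v[0], v[1])


if __name__ == "__main__":
    # gain < 1 shrinks the signal, gain > 1 blows it up, gain = 1 preserves it.
    for gain in (0.9, 1.0, 1.1):
        print(gain, signal_norm_after(gain, steps=200))
```

The backward pass multiplies by the transpose of W at each step, so gradients in this toy model scale the same way. Real LSTM and GRU cells are nonlinear and gated, which is why the paper needs a statistical treatment of the Jacobian spectrum instead of this closed-form picture.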
Abstract page: https://arxiv.org/abs/1901.08987
PDF download: https://arxiv.org/pdf/1901.08987