AI paper: Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning

xc133280 posted on 2019-3-15 12:30:06
Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning

Our goal in this work is to train an image captioning model that generates more dense and informative captions. We introduce "relational captioning," a novel image captioning task which aims to generate multiple captions with respect to relational information between objects in an image. Relational captioning is a framework that is advantageous in both diversity and amount of information, leading to image understanding based on relationships. Part-of-speech (POS, i.e. subject-object-predicate categories) tags can be assigned to every English word. We leverage the POS as a prior to guide the correct sequence of words in a caption. To this end, we propose a multi-task triple-stream network (MTTSNet) which consists of three recurrent units for the respective POS and jointly performs POS prediction and captioning. We demonstrate more diverse and richer representations generated by the proposed model against several baselines and competing methods.
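To make the "three recurrent units plus joint POS prediction" idea more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say how the three streams are fed or fused, so the per-stream region features, the concatenation of hidden states, the plain tanh cells, the untrained random weights, and every name below (`RecurrentUnit`, `TripleStreamSketch`, `POS_TAGS`) are assumptions made for illustration only.

```python
import math
import random

random.seed(0)  # fixed seed so the untrained toy weights are reproducible

POS_TAGS = ["subject", "predicate", "object"]  # one stream per POS category


def rand_matrix(rows, cols, scale=0.1):
    """Small random weight matrix stored as a list of rows."""
    return [[random.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product on plain lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class RecurrentUnit:
    """Plain tanh recurrent cell (assumption: stands in for whatever cell the paper uses)."""

    def __init__(self, input_size, hidden_size):
        self.w_x = rand_matrix(hidden_size, input_size)
        self.w_h = rand_matrix(hidden_size, hidden_size)

    def step(self, x, h):
        a = matvec(self.w_x, x)
        b = matvec(self.w_h, h)
        return [math.tanh(p + q) for p, q in zip(a, b)]


class TripleStreamSketch:
    """Three recurrent streams whose states feed a word head and a POS head."""

    def __init__(self, feat_size, embed_size, hidden_size, vocab_size):
        self.hidden_size = hidden_size
        self.embed = rand_matrix(vocab_size, embed_size)
        self.units = {tag: RecurrentUnit(feat_size + embed_size, hidden_size)
                      for tag in POS_TAGS}
        fused_size = hidden_size * len(POS_TAGS)
        self.word_head = rand_matrix(vocab_size, fused_size)     # captioning task
        self.pos_head = rand_matrix(len(POS_TAGS), fused_size)   # POS prediction task

    def step(self, feats, prev_word, hidden):
        """One decoding step: update all three streams, then predict word and POS."""
        word_vec = self.embed[prev_word]
        new_hidden = {tag: self.units[tag].step(feats[tag] + word_vec, hidden[tag])
                      for tag in POS_TAGS}
        # Assumed fusion: concatenate the three hidden states.
        fused = [v for tag in POS_TAGS for v in new_hidden[tag]]
        word_probs = softmax(matvec(self.word_head, fused))
        pos_probs = softmax(matvec(self.pos_head, fused))
        return word_probs, pos_probs, new_hidden

    def decode(self, feats, start_word=0, max_len=5):
        """Greedy decoding that returns word ids and the predicted POS tag per step."""
        hidden = {tag: [0.0] * self.hidden_size for tag in POS_TAGS}
        word, words, tags = start_word, [], []
        for _ in range(max_len):
            word_probs, pos_probs, hidden = self.step(feats, word, hidden)
            word = max(range(len(word_probs)), key=word_probs.__getitem__)
            tag_index = max(range(len(pos_probs)), key=pos_probs.__getitem__)
            words.append(word)
            tags.append(POS_TAGS[tag_index])
        return words, tags
```

With hypothetical feature vectors, `TripleStreamSketch(4, 3, 5, 7).decode({tag: [0.1, 0.2, 0.3, 0.4] for tag in POS_TAGS})` returns five word ids and five POS tags. The weights are random, so the output is meaningless; a trained version would presumably minimize a combined word loss and POS loss, but the abstract does not state how the two are weighted.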
URL: https://arxiv.org/abs/1903.05942     ----PDF download: https://arxiv.org/pdf/1903.05942