AI Training

AI Tutorial: Video Description: A Survey of Methods, Datasets and Evaluation Metrics

Posted by liwei906666 on 2018-6-4 08:17:39
Automatic video description is useful for assisting the visually impaired, human-computer interaction, robotics and video indexing. The past few years have seen a surge of research interest in this area due to the unprecedented success of deep learning in computer vision and natural language processing. Numerous methods, datasets and evaluation measures have been proposed in the literature, calling for a comprehensive survey to better focus research efforts in this flourishing direction. This paper answers exactly this need by surveying state-of-the-art approaches, including deep learning models; comparing benchmark datasets in terms of their domain, number of classes, and repository size; and identifying the pros and cons of various evaluation metrics such as BLEU, ROUGE, METEOR, CIDEr, SPICE and WMD. Our survey shows that video description research has a long way to go before it can match human performance, and that the main reasons for this shortfall are twofold. Firstly, existing datasets do not adequately represent the diversity in open-domain videos and complex linguistic structures. Secondly, current measures of evaluation are not aligned with human judgement. For example, the same video can have very different, yet correct descriptions. We conclude that there is a need for improvement in evaluation measures as well as datasets in terms of size, diversity and annotation accuracy, because they directly influence the development of better video description models. From an algorithmic point of view, diagnosing description quality is challenging because it is difficult to assess the level of contribution from visual features compared to the bias that comes naturally from the language model adopted.
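To make the point that n-gram metrics can disagree with human judgement concrete, here is a minimal, unsmoothed sentence-level BLEU in Python, written from the standard definition (clipped n-gram precision, geometric mean over n = 1..4, brevity penalty). It is an illustrative sketch only: it is not code from the survey or from any evaluation toolkit, and the function names and example sentences are hypothetical. Real toolkits typically score whole corpora and may apply smoothing, so their numbers will differ.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform weights (hypothetical helper).

    candidate: list of tokens; references: list of token lists.
    Returns 0.0 as soon as any n-gram precision is zero, since no smoothing is applied.
    """
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate is shorter than n tokens
        # Clip each candidate n-gram count by its maximum count in any single reference.
        max_ref_counts = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty against the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum / max_n)


# Hypothetical example: one human reference for a video clip.
reference = "a man is slicing a tomato".split()
paraphrase = "a person is cutting a tomato".split()  # correct, different wording
wrong_object = "a man is slicing a potato".split()   # fluent but factually wrong

print(sentence_bleu(paraphrase, [reference]))    # 0.0 (no shared trigram)
print(sentence_bleu(wrong_object, [reference]))  # about 0.76
```

The correct paraphrase scores 0 because it shares no trigram with the single reference, while the description naming the wrong object scores about 0.76. Collecting several references per video reduces, but does not remove, this effect.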
URL: https://arxiv.org/abs/1806.00186 | PDF download: http://arxiv.org/pdf/1806.00186