AI paper: Confusion2Vec: Towards Enriching Vector Space Word Representations with Representational Ambiguities

hhtonyhh posted 5 days ago
Word vector representations are a crucial part of Natural Language Processing (NLP) and Human Computer Interaction. In this paper, we propose a novel word vector representation, Confusion2Vec, motivated from the human speech production and perception that encodes representational ambiguity. Humans employ both acoustic similarity cues and contextual cues to decode information and we focus on a model that incorporates both sources of information. The representational ambiguity of acoustics, which manifests itself in word confusions, is often resolved by both humans and machines through contextual cues. A range of representational ambiguities can emerge in various domains further to acoustic perception, such as morphological transformations, paraphrasing for NLP tasks like machine translation, etc. In this work, we present a case study in application to Automatic Speech Recognition (ASR), where the word confusions are related to acoustic similarity. We present several techniques to train an acoustic perceptual similarity representation ambiguity. We term this Confusion2Vec and learn on unsupervised-generated data from ASR confusion networks or lattice-like structures. Appropriate evaluations for the Confusion2Vec are formulated for gauging acoustic similarity in addition to semantic-syntactic and word similarity evaluations. The Confusion2Vec is able to model word confusions efficiently, without compromising on the semantic-syntactic word relations, thus effectively enriching the word vector space with extra task relevant ambiguity information. We provide an intuitive exploration of the 2-dimensional Confusion2Vec space using Principal Component Analysis of the embedding and relate to semantic, syntactic and acoustic relationships. The potential of Confusion2Vec in the utilization of uncertainty present in lattices is demonstrated through small examples relating to ASR error correction.
URL: https://arxiv.org/abs/1811.03199
PDF download: https://arxiv.org/pdf/1811.03199
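
To make the idea of learning from ASR confusion networks a bit more concrete, here is a minimal sketch of how training pairs might be drawn from such a structure. This is not the paper's training code: the abstract only says that the embedding is learned from confusion networks or lattice-like structures. The toy network, the `training_pairs` helper, the `window` parameter and the "context"/"confusion" labels are all assumptions made up for illustration. The sketch pairs each word with words in neighbouring slots (contextual cues) and with competing words in the same slot (acoustic confusions).

```python
from itertools import product

# Toy confusion network (hypothetical data): each slot lists competing
# word hypotheses that an ASR system could not separate acoustically.
confusion_network = [
    ["i"],
    ["want", "won't"],
    ["to", "two", "too"],
    ["sit", "seat"],
]


def training_pairs(network, window=1):
    """Return (target, other, kind) triples from a confusion network.

    kind is "confusion" for competing words inside the same slot and
    "context" for words in slots within `window` positions.
    """
    pairs = []
    for i, slot in enumerate(network):
        # Pairs inside one slot stand in for acoustic confusability.
        for target, other in product(slot, slot):
            if target != other:
                pairs.append((target, other, "confusion"))
        # Pairs across neighbouring slots stand in for ordinary context.
        lo = max(0, i - window)
        hi = min(len(network), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            for target, other in product(slot, network[j]):
                pairs.append((target, other, "context"))
    return pairs


pairs = training_pairs(confusion_network)
```

Triples like these could then be fed to a skip-gram style trainer, so that a word ends up close both to its usual neighbours and to the words it tends to be confused with. How the two kinds of pairs should be weighted or combined is left open here.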