AI Training

Machine Learning Paper: BioBERT: a pre-trained biomedical language representation model for biomedical text mining

Posted by aiwen on 2019-1-28 12:08:09
Biomedical text mining has become more important than ever as the number of biomedical documents grows rapidly. With the progress of machine learning, extracting valuable information from biomedical literature has gained popularity among researchers, and deep learning is boosting the development of effective biomedical text mining models. However, because deep learning models require large amounts of training data, biomedical text mining with deep learning often fails due to the small sizes of training datasets in biomedical fields. Recent research on learning contextualized language representation models from text corpora sheds light on the possibility of leveraging the large volume of unannotated biomedical text. We introduce BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), a domain-specific language representation model pre-trained on large-scale biomedical corpora. Based on the BERT architecture, BioBERT effectively transfers the knowledge of large amounts of biomedical text into biomedical text mining models. While BERT also shows performance competitive with previous state-of-the-art models, BioBERT significantly outperforms them on three representative biomedical text mining tasks: biomedical named entity recognition (1.86% absolute improvement), biomedical relation extraction (3.33% absolute improvement), and biomedical question answering (9.61% absolute improvement), with minimal task-specific architecture modifications. We make the pre-trained weights of BioBERT freely available at this https URL, and the source code of the fine-tuned models at this https URL.
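The core idea of the abstract, transferring knowledge from unannotated biomedical corpora into downstream models, is easy to try out. Below is a minimal sketch, assuming the released weights are mirrored on the Hugging Face hub under the id dmis-lab/biobert-v1.1 (the post itself only gives placeholder URLs): it loads BioBERT and produces one contextual embedding per wordpiece token, which a task-specific head can then consume.

```python
# Hedged sketch: the hub id below is an assumption, not stated in the post.
from transformers import AutoModel, AutoTokenizer
import torch

MODEL_ID = "dmis-lab/biobert-v1.1"  # assumed location of the released weights

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

sentence = "Aspirin inhibits platelet aggregation."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional contextual vector per wordpiece token; a downstream
# biomedical text mining model attaches its task head on top of these.
print(outputs.last_hidden_state.shape)  # torch.Size([1, num_tokens, 768])
```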
Abstract: https://arxiv.org/abs/1901.08746 | PDF: https://arxiv.org/pdf/1901.08746
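The "minimal task-specific architecture modifications" the abstract mentions amount to little more than a small output layer per task. A hedged sketch for biomedical named entity recognition, again assuming the dmis-lab/biobert-v1.1 hub id; the BIO label set here is hypothetical, not taken from the paper:

```python
# Hedged sketch: hub id and label names are illustrative assumptions.
from transformers import AutoModelForTokenClassification, AutoTokenizer
import torch

MODEL_ID = "dmis-lab/biobert-v1.1"        # assumed hub id
labels = ["O", "B-Disease", "I-Disease"]  # hypothetical BIO tag set

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# The encoder keeps BioBERT's pre-trained weights; only the randomly
# initialized token-classification layer on top is task-specific.
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_ID, num_labels=len(labels)
)

inputs = tokenizer("BRCA1 mutations increase breast cancer risk.",
                   return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, num_tokens, num_labels)

# Before fine-tuning, these predictions are essentially random; training on
# an annotated NER corpus is what turns this into a usable tagger.
print(logits.argmax(dim=-1))
```

The same pattern (pre-trained encoder plus one thin head) covers the relation extraction and question answering tasks the abstract lists, which is what keeps the per-task modifications minimal.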