Extracting Outcomes from Articles Reporting Randomized Controlled Trials Using Pre-Trained Deep Language Representations
EasyChair Preprint 2940
13 pages • Date: March 11, 2020

Abstract
Objective:
Outcomes are the variables monitored during clinical trials to assess the impact of the studied intervention on the subjects' health. Automatic extraction of trial outcomes is essential for automating the systematic review process and for checking the completeness and coherence of reporting in order to avoid bias and spin. In this work, we provide an overview of the state of the art in outcome extraction, introduce a new freely available corpus with annotations for two types of outcomes – declared (primary) and reported – and present a deep learning approach to outcome extraction.
Dataset:
We manually annotated a corpus of 2,000 sentences with declared (primary) outcomes and 1,940 sentences with reported outcomes.
Methods:
We used deep neural word embeddings derived from the publicly available BERT (Bidirectional Encoder Representations from Transformers) pre-trained language representations to extract trial outcomes from the section defining the primary outcome and from the section reporting the results for an outcome. We compared a simple fine-tuning approach with an approach combining a Bi-LSTM and a CRF layer. We assessed the performance of several pre-trained language models: general-domain (BERT), biomedical (BioBERT), and scientific (SciBERT).
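To make the "simple fine-tuning" setting concrete, the sketch below (ours, not the authors' code) treats outcome extraction as token classification: a BERT-family encoder with a token-classification head tags each wordpiece with a BIO label. The Hugging Face transformers library, the checkpoint name, the BIO label set, and the example sentence are assumptions for illustration; BioBERT or SciBERT checkpoints could be substituted for the general-domain one.

# Minimal sketch of fine-tuning a BERT-family encoder for outcome tagging.
# Checkpoint name, label set, and sentence are illustrative only.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-OUT", "I-OUT"]            # assumed BIO scheme for outcome spans
checkpoint = "bert-base-uncased"            # a BioBERT or SciBERT checkpoint could be used instead

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(checkpoint, num_labels=len(labels))

sentence = "The primary outcome was change in systolic blood pressure at 12 weeks."
enc = tokenizer(sentence, return_tensors="pt")

# One illustrative fine-tuning step (gold labels here are placeholders, not real annotations).
gold = torch.zeros_like(enc["input_ids"])   # shape (1, seq_len), values in [0, num_labels)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
loss = model(**enc, labels=gold).loss
loss.backward()
optimizer.step()

# Inference: one predicted label per wordpiece token.
model.eval()
with torch.no_grad():
    pred_ids = model(**enc).logits.argmax(-1)[0].tolist()
for token, pid in zip(tokenizer.convert_ids_to_tokens(enc["input_ids"][0]), pred_ids):
    print(token, labels[pid])

The Bi-LSTM + CRF variant mentioned above would replace the per-token softmax head with a Bi-LSTM over the BERT representations followed by a CRF decoding layer.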
Results:
Our algorithm achieved a token-level F-measure of 88.52% for primary outcomes and 79.42% for reported outcomes.
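For reference, the token-level F-measure is the harmonic mean of token-level precision and recall computed against the annotated outcome spans; the definitions below are the standard ones, not details taken from the abstract, with TP, FP, and FN counting correctly tagged, spuriously tagged, and missed outcome tokens:

P = TP / (TP + FP),  R = TP / (TP + FN),  F = 2 * P * R / (P + R)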
Conclusion:
Fine-tuning of language models pre-trained on large domain-specific corpora achieves operational performance for automatic outcome extraction.
Keyphrases: Deep Neural Networks, Natural Language Processing, Outcome Extraction, Pre-trained Language Representations, Randomized Controlled Trials