
Implementation of BERT based machine learning model to extract cancer-miRNA relationship from research literature

8 pages. Published: October 4, 2021

Abstract

In information technology, text mining is a widely used methodology for extracting desired information from large collections of text. Thousands of research papers have been published in medical science on how microRNAs (miRNAs) can promote or impede the development of various types of cancer. miRCancer, a repository provided by East Carolina University, gives access to cancer-miRNA association details collected from more than 7,000 research papers using a rule-based text mining technique. It would be valuable to build a machine learning model that extracts cancer-miRNA association details from the titles and abstracts of these medical research papers. In this paper, we propose a machine learning model, designed and implemented with Google's open source NLP framework BERT, that identifies the cancer-miRNA relationship in the abstract of a given research paper. We also prepared the dataset required to train and validate the proposed model. Our model achieved an overall accuracy of 90.3% in retrieving the required information from the papers in the test dataset, and it can be useful for retrieving cancer-miRNA association information from future research literature.
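The abstract does not spell out how the training examples were framed for BERT. As a purely illustrative sketch, assuming each annotation carries a miRNA name, a cancer name and a regulation label, the code below shows one way to turn a paper's title and abstract into a labelled classification example and to score predictions with the overall-accuracy metric the abstract reports. The `Record` fields, the `LABELS` set and the `@MIRNA$`/`@CANCER$` placeholder tokens are hypothetical, not taken from the paper; tokenization and fine-tuning of the BERT classifier itself are not shown.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical label set; the paper's actual labelling scheme is not given here.
LABELS = ("up", "down", "none")


@dataclass
class Record:
    """One hypothetical cancer-miRNA annotation (field names are assumptions)."""
    mirna: str    # e.g. "miR-21"
    cancer: str   # e.g. "breast cancer"
    profile: str  # one of LABELS


def _mention(name: str) -> re.Pattern:
    # Match the name case-insensitively, but only as a whole token, so that
    # "miR-21" does not also match inside "miR-210".
    return re.compile(r"(?<!\w)" + re.escape(name) + r"(?!\w)", re.IGNORECASE)


def mask_entities(text: str, mirna: str, cancer: str) -> str:
    """Replace the target miRNA and cancer mentions with placeholder tokens."""
    text = _mention(mirna).sub("@MIRNA$", text)
    return _mention(cancer).sub("@CANCER$", text)


def build_example(title: str, abstract: str, rec: Record) -> tuple[str, int]:
    """Join title and abstract, mask the entities, and map the label to an id."""
    text = mask_entities(f"{title} {abstract}", rec.mirna, rec.cancer)
    return text, LABELS.index(rec.profile)


def accuracy(predicted: list[int], gold: list[int]) -> float:
    """Fraction of examples whose predicted label id equals the gold id."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    rec = Record("miR-21", "breast cancer", "up")
    text, label = build_example(
        "miR-21 in breast cancer",
        "Expression of miR-21 was elevated in Breast Cancer tissues, unlike miR-210.",
        rec,
    )
    print(text)
    print(label)
    print(accuracy([0, 1, 2, 0], [0, 1, 1, 0]))
```

Run as a script, this prints `@MIRNA$ in @CANCER$ Expression of @MIRNA$ was elevated in @CANCER$ tissues, unlike miR-210.`, then `0`, then `0.75`. Masking the entity names is one common design choice for relation classification, since it pushes the model to learn the relationship wording rather than memorize particular miRNA or cancer names; whether the authors did this is not stated in the abstract.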

Keyphrases: BERT, cancer, machine learning, miRNA, text mining

In: Frederick C. Harris Jr, Rui Wu and Alexander Redei (editors). Proceedings of ISCA 30th International Conference on Software Engineering and Data Engineering, vol 77, pages 33–40

BibTeX entry
@inproceedings{SEDE2021:Implementation_of_BERT_based,
  author    = {Arunprasad Sundharam and Qin Ding},
  title     = {Implementation of BERT based machine learning model to extract cancer-miRNA relationship from research literature},
  booktitle = {Proceedings of ISCA 30th International Conference on Software Engineering and Data Engineering},
  editor    = {Frederick Harris and Rui Wu and Alex Redei},
  series    = {EPiC Series in Computing},
  volume    = {77},
  pages     = {33--40},
  year      = {2021},
  publisher = {EasyChair},
  bibsource = {EasyChair, https://easychair.org},
  issn      = {2398-7340},
  url       = {https://easychair.org/publications/paper/52lN},
  doi       = {10.29007/6p8g}}