
Application of Machine Learning in Analysis of Transcriptomic Data Derived from Next Generation Sequencing

EasyChair Preprint 1985

5 pages. Date: November 18, 2019

Abstract

Tobacco Mosaic Virus, the most studied plant virus, can infect over 100 species of plants and over 550 species of flowering plants, causing enormous economic losses at home and abroad. Microarrays, an important analytical tool in genomics and genetics, enable researchers to analyze the expression of large numbers of genes simultaneously. To identify genes related to replication of Tobacco Mosaic Virus, this research uses gene expression data from Arabidopsis cells infected with Tobacco Mosaic Virus, recorded at 5 time points (30 min, 4 hr, 6 hr, 18 hr and 24 hr) and generated by Next Generation Sequencing. The research analyzes the time-series raw data and adopts the Fast Correlation-Based Filter (FCBF) and the Wrapper algorithms for gene selection. The selected genes are validated with the C4.5 algorithm and a Multi-Layer Perceptron. Results show that genes selected by the Wrapper algorithm, with an average accuracy of 75%, an average true positive rate (classification accuracy on the control group) of 77.5%, a true negative rate (classification accuracy on the experimental group) of 72.5%, an average F-measure of 74.85% and an average AUC of 0.7965, perform better overall than genes selected by the other algorithms.
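As a reading aid for the metrics quoted in the abstract, the sketch below shows how accuracy, true positive rate, true negative rate and F-measure follow from the four cells of a two-class confusion matrix. It is not the authors' evaluation code. The function name `binary_metrics` and the counts are hypothetical: they assume 40 control and 40 experiment samples, with the control group treated as the positive class, and were chosen only so that the two rates match the reported 77.5% and 72.5%. With balanced classes the accuracy is the mean of the two rates, which is consistent with the reported 75%.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return common metrics from the cells of a 2x2 confusion matrix."""
    tpr = tp / (tp + fn)                      # true positive rate (recall)
    tnr = tn / (tn + fp)                      # true negative rate (specificity)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)
    # F-measure of the positive class: harmonic mean of precision and recall
    f_measure = 2 * precision * tpr / (precision + tpr)
    return {
        "accuracy": accuracy,
        "tpr": tpr,
        "tnr": tnr,
        "precision": precision,
        "f_measure": f_measure,
    }


# Hypothetical counts (assumption, not data from the preprint):
# 40 control samples (positive class) and 40 experiment samples.
m = binary_metrics(tp=31, fn=9, tn=29, fp=11)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

The F-measure from this single hypothetical matrix is not expected to equal the reported 74.85%, since the abstract gives values averaged over the evaluation.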

Keyphrases: Tobacco mosaic virus, Arabidopsis, machine learning, next generation sequencing

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:1985,
  author    = {Meng-Hsiun Tsai and Shien-Chung Huang and Hsin-Hung Yeh and An-Yuan Chu},
  title     = {Application of Machine Learning in Analysis of Transcriptomic Data Derived from Next Generation Sequencing},
  howpublished = {EasyChair Preprint 1985},
  year      = {EasyChair, 2019}}