
Classifying Protein Families with Learned Compressed Representations

11 pages · Published: May 1, 2023

Abstract

Classifying proteins into families is an important task when studying newly discovered proteins: if we can identify the family a protein belongs to, we can predict its features without knowing its exact structure.
However, this grouping process is challenging. We propose a two-stage algorithm that classifies proteins into families by combining dimensionality reduction with a variational autoencoder (VAE) and learned fingerprint representations classified by a Convolutional Neural Network (CNN). Our models use fewer parameters than existing methods yet perform better: the VAE reconstructs the most common amino acid in a sequence alignment with 94% accuracy, and the CNN classifies protein families with 98–100% accuracy. We developed a software framework that provides access to our algorithms. All code and data are publicly available at https://github.com/ramindehghanpoor/CLI.
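To make the reconstruction-accuracy figure more concrete, here is a minimal pure-Python sketch under stated assumptions: aligned sequences are one-hot encoded over a 21-letter alphabet (20 amino acids plus a gap symbol), a decoder emits one score per residue per alignment position, and "accuracy" is read as the fraction of alignment columns whose most common residue survives reconstruction. The alphabet, the function names (`one_hot`, `decode_argmax`, `consensus`, `consensus_accuracy`), and this reading of the metric are illustrative assumptions, not the authors' code; the repository linked above is the authoritative implementation.

```python
from collections import Counter

# Assumed alphabet: 20 standard amino acids plus '-' for alignment gaps.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}
K = len(ALPHABET)


def one_hot(seq):
    """Flatten an aligned sequence into a 0/1 vector of length len(seq) * K."""
    vec = [0.0] * (len(seq) * K)
    for pos, aa in enumerate(seq):
        vec[pos * K + INDEX[aa]] = 1.0
    return vec


def decode_argmax(scores, length):
    """Pick the highest-scoring residue at each position of a flat output vector."""
    residues = []
    for pos in range(length):
        block = scores[pos * K:(pos + 1) * K]
        best = max(range(K), key=lambda i: block[i])
        residues.append(ALPHABET[best])
    return "".join(residues)


def consensus(msa):
    """Most common residue in each alignment column (ties go to the first seen)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*msa))


def consensus_accuracy(original_msa, reconstructed_msa):
    """Fraction of columns whose consensus residue is preserved after reconstruction."""
    a, b = consensus(original_msa), consensus(reconstructed_msa)
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    msa = ["ACD-", "ACE-", "GCD-"]
    # Hypothetical decoder outputs: one-hot vectors stand in for VAE scores here,
    # with one residue in the last sequence deliberately changed (D -> E).
    decoded = [decode_argmax(one_hot(s), 4) for s in ["ACD-", "ACE-", "GCE-"]]
    print(consensus(msa))                     # ACD-
    print(consensus(decoded))                 # ACE-
    print(consensus_accuracy(msa, decoded))   # 0.75
```

In this toy alignment, a single changed residue flips the consensus of the third column, so three of four columns match and the sketch reports 0.75. In a real pipeline, the decoder scores would come from the trained VAE rather than from `one_hot`.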

Keyphrases: CNN, machine learning, neural network, Protein family classification, VAE

In: Hisham Al-Mubaid, Tamer Aldwairi and Oliver Eulenstein (editors). Proceedings of International Conference on Bioinformatics and Computational Biology (BICOB-2023), vol 92, pages 47–57

BibTeX entry
@inproceedings{BICOB-2023:Classifying_Protein_Families_with,
  author    = {Ramin Dehghanpoor and Fatemeh Afrasiabi and Charles Fogel and Tung Dao and Suman Gautam and Aanab Nehela and Ahmad Nehela and Daniel Haehn and Nurit Haspel},
  title     = {Classifying Protein Families with Learned Compressed Representations},
  booktitle = {Proceedings of International Conference on Bioinformatics and Computational Biology (BICOB-2023)},
  editor    = {Hisham Al-Mubaid and Tamer Aldwairi and Oliver Eulenstein},
  series    = {EPiC Series in Computing},
  volume    = {92},
  pages     = {47--57},
  year      = {2023},
  publisher = {EasyChair},
  bibsource = {EasyChair, https://easychair.org},
  issn      = {2398-7340},
  url       = {https://easychair.org/publications/paper/VR1H},
  doi       = {10.29007/qzzf}}