Title: End-to-end Transformer-based architecture for disease prediction from metagenomic data
Conference: JOBIM2025
Tags: Disease classification, DNA sequence embedding, Gut microbiome, Metagenomics and Transformer models
Abstract: Microbial ecosystems constitute complex yet information-rich environments whose characterization is crucial for understanding host health and disease. Among them, the human gut microbiome has emerged as a key "super-integrator", owing to its dense interactions with host physiology and its established associations with a wide spectrum of pathologies. Driven by advances in high-throughput sequencing technologies and the continuous decline in associated costs, metagenomic studies have expanded exponentially, generating massive amounts of sequencing data and opening new avenues for data-driven disease modeling. Conventional approaches to microbiome analysis predominantly rely on aligning DNA sequencing reads against reference databases to infer microbial composition at the species level. While effective, these methods are inherently constrained by reference bias and limited taxonomic resolution. Recent advances in artificial intelligence, particularly in Natural Language Processing (NLP), offer new methodological perspectives for representing metagenomic data. In this study, we present MetagenBERT, a Transformer-based framework that relies on the foundation models DNABERT-2 and DNABERT-S to embed DNA sequencing reads. Our approach encodes gut microbiome metagenomes in a species-agnostic manner, enabling direct downstream application to disease classification tasks. We show that MetagenBERT achieves performance similar to that of state-of-the-art abundance-based models for cirrhosis prediction and surpasses them in the more challenging context of type 2 diabetes detection.
Furthermore, we introduce an alternative representation of metagenomic profiles based on read-level embeddings aggregated into abundance vectors, demonstrating their complementarity with conventional species-level abundance metrics.
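To make the idea of "read-level embeddings aggregated into abundance vectors" concrete, the sketch below shows one possible aggregation: each read embedding is assigned to its nearest centroid, and the sample is summarized by the fraction of reads falling into each centroid. This is a minimal illustration under assumptions that are not stated in the abstract. The centroids (imagined as learned beforehand, e.g. by k-means over reads pooled from many samples), the Euclidean distance, and the function names `nearest_centroid` and `embedding_abundance` are all hypothetical, and the actual MetagenBERT aggregation may differ. Real read embeddings from DNABERT-2 or DNABERT-S would be high-dimensional; toy 2-D vectors are used here for readability.

```python
from collections import Counter
from math import dist
from typing import List, Sequence

Vector = Sequence[float]


def nearest_centroid(embedding: Vector, centroids: Sequence[Vector]) -> int:
    """Return the index of the centroid closest (Euclidean) to one read embedding."""
    return min(range(len(centroids)), key=lambda k: dist(embedding, centroids[k]))


def embedding_abundance(read_embeddings: Sequence[Vector],
                        centroids: Sequence[Vector]) -> List[float]:
    """Aggregate read-level embeddings into a relative-abundance vector.

    Hypothetical scheme: each read is assigned to its nearest centroid, and
    entry k of the result is the fraction of the sample's reads assigned to
    centroid k, so the entries sum to 1.
    """
    if not read_embeddings:
        raise ValueError("sample contains no reads")
    counts = Counter(nearest_centroid(e, centroids) for e in read_embeddings)
    n_reads = len(read_embeddings)
    return [counts.get(k, 0) / n_reads for k in range(len(centroids))]


if __name__ == "__main__":
    # Toy centroids and four toy "read embeddings" for a single sample.
    centroids = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    reads = [[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [4.8, 5.1]]
    print(embedding_abundance(reads, centroids))  # [0.25, 0.5, 0.25]
```

Dividing by the number of reads makes vectors comparable across samples sequenced at different depths, in the same spirit as species-level relative abundances; under this assumed scheme, the resulting vectors could be fed to a classifier alongside, or instead of, conventional abundance profiles.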
Copyright © 2002 – 2025 EasyChair