Program for Tuesday, July 8th

12:30-13:30 Session 1: Registration, ENSEIRB

Location: Main hall

13:30-14:00 Session 2: Welcome & opening remarks

Location: Main amphi

14:00-15:00 Session 3: Keynote 1: Anamaria Necsulea

Deciphering the genomic basis of convergent phenotypic evolution

Chair:

Guy Perrière

Location: Main amphi

14:00

Anamaria Necsulea

Deciphering the genomic basis of convergent phenotypic evolution

ABSTRACT. Biology has recently undergone a fundamental transformation, powered by the development of sensitive molecular techniques combined with high-throughput sequencing. Major technical breakthroughs include the ability to sequence complete genomes, to finely quantify gene activity and to study its control mechanisms. Applying these techniques across species gives us a unique opportunity to study the evolution of genomic functions, and thus to better understand the mechanisms that underlie phenotypic evolution. Functional evolutionary genomics studies can bring insights into the selective pressures and molecular mechanisms that drive the emergence (or loss) of biological functions and of phenotypes. Here, we illustrate how comparative genomics approaches can shed light on the genomic basis of convergent phenotypic evolution in birds. The avian clade displays a spectacular diversity of phenotypes. Numerous instances of convergent phenotypic evolution are known in birds, such as the convergent loss of flight [1] or the parallel gain of vocal learning [2]. In this presentation, we will focus on one peculiar case of convergent morphological evolution: the loss of the intromittent male phallus [3]. Although an intromittent phallus was likely present in the ancestor of all amniotes, this organ was reduced or entirely lost in multiple avian lineages, including the major Neoaves clade and the Phasianidae family [3]. The evolutionary processes that led to phallus reduction or loss are still unclear, as are the genomic consequences of this major phenotypic change. Taking advantage of the availability of hundreds of avian genome sequences, we performed large-scale evolutionary analyses of protein-coding gene sequences and of non-coding regulatory elements, searching for genomic changes that occur in parallel with phenotypic changes.
We found that hundreds of protein-coding genes and non-coding regulatory elements underwent an acceleration of their rate of evolution following this major phenotypic change. We also identified numerous gene expression differences between bird species that have retained the intromittent phallus and species that have lost this organ. While we cannot claim that these changes in expression patterns and regulatory programs caused the loss of the phallus, our findings illustrate the genome-wide consequences of this major phenotypic change.

References

1. Pan S, Lin Y, Liu Q, Duan J, Lin Z, Wang Y, et al. Convergent genomic signatures of flight loss in birds suggest a switch of main fuel. Nat Commun. 2019 Jun 21;10(1):2756.

2. Pfenning AR, Hara E, Whitney O, Rivas MV, Wang R, Roulhac PL, et al. Convergent transcriptional specializations in the brains of humans and song-learning birds. Science. 2014 Dec 12;346(6215):1256846.

3. Herrera AM, Shuster SG, Perriton CL, Cohn MJ. Developmental basis of phallus reduction during bird evolution. Curr Biol. 2013 Jun 17;23(12):1065-74.

15:00-16:00 Session 4A: Statistics, Machine Learning, and AI for Biology and Health

Chair:

Victoria Bourgeais

Location: Main amphi

15:00	Houria Braikia, Sana Ben Hamida and Marta Rukoz Explainable AI for Marine Ecological Quality Prediction: Integrating Microbiome Data, Metadata, and Diversity PRESENTER: Houria Braikia ABSTRACT. Assessing ecological quality (EQ) is crucial for marine biodiversity monitoring. With the advent of High-Throughput Sequencing technologies, metabarcoding has enabled large-scale microbial community analysis through Operational Taxonomic Unit (OTU) tables, providing an alternative for EQ assessment. Machine learning (ML) models have been successfully applied to this task, but they often treat microbial abundance as the sole predictor, overlooking environmental metadata (e.g., pH, salinity, temperature) and diversity indices (alpha and beta diversity). This study integrates metadata and diversity indices into an explainable ML framework for EQ prediction. Using SHapley Additive exPlanations (SHAP), we assess the contribution of these features to model predictions across five genetic markers (V1V2, V3V4, V4, 37F, and V9). Our results highlight marker-dependent feature importance, demonstrating that while OTU-based models remain dominant, incorporating metadata improves accuracy for certain markers. This work enhances interpretability in AI-driven biomonitoring, fostering more reliable marine ecosystem assessments.
15:20	Minh Ngoc Vu, Hoang Ha Nguyen, Antoine Toffano and Pierre Larmande Evaluating deep learning models for plant protein function prediction PRESENTER: Minh Ngoc Vu ABSTRACT. Predicting the functions of proteins remains a critical yet challenging task in computational biology. Advances in high-throughput sequencing, the expansion of protein databases, and the continuous development of artificial intelligence have led to the emergence of many computational methods dedicated to protein function prediction. In this study, we evaluated the performance of four state-of-the-art models (DeepGOPlus, DeepGraphGO, DeepGOZero, and DeepGOSE) using experimentally annotated proteins from the UniProtKB/Swiss-Prot database. We also trained and tested these models on species-specific datasets from Arabidopsis thaliana and Oryza sativa to investigate their potential and applicability in plant protein studies. Our results showed that DeepGOPlus consistently achieved the best evaluation scores across all datasets. DeepGOSE and DeepGOZero performed comparably and only marginally outperformed DeepGraphGO in certain training runs. Further analysis revealed that the stratification of the dataset into training, validation, and test sets introduced variations in Gene Ontology annotation specificity, which may have influenced model performance.
15:40	Julie Cartier, Chloé Agathe Azencott, Adeline Fermanian, Johanna Lagoas and Florian Massip Variable selection in transcriptomics data using knockoffs in a classification framework PRESENTER: Julie Cartier ABSTRACT. The emergence of new sequencing technologies has facilitated the acquisition of large amounts of biological data, which have proven to be a useful tool for better understanding biological systems. One way to take advantage of the potential of sequencing data is to use them to identify the relationship between biological units (e.g., genes) and phenotypic characteristics (e.g., disease outcomes). This question, formulated as a variable selection problem, remains difficult because of the size of the data and their correlation structure. To address these challenges, we studied the applicability of the knockoff (KO) procedure to transcriptomic data in a classification setting. Introduced by Candès et al. in 2015, the KO variable selection procedure has shown promising results on real biological data. This method seeks to identify the truly important predictors by overcoming the correlation structure between variables while controlling the false discovery rate, even in high-dimensional settings. We conducted an extensive simulation study using real data to evaluate the relevance of recent methods in the context of high-dimensional classification. We also analyzed the benefits of a KO aggregation scheme to mitigate the effect of stochasticity, which is intrinsic to the KO procedure. In addition, we studied the stability of the KO framework as a measure of the reliability of variable selection. Finally, we applied the KO framework to real transcriptomic data.
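For readers less familiar with knockoffs, the final selection step of the knockoff filter discussed in the Cartier et al. abstract can be sketched as follows. This minimal Python sketch assumes that one statistic W_j per variable has already been computed, with large positive values favoring the original variable over its knockoff copy; it implements the knockoff+ threshold of Barber and Candès and does not construct knockoffs or reproduce the authors' aggregation scheme. The function names and the statistics are illustrative only.

```python
import math


def knockoff_plus_threshold(w, q):
    """Knockoff+ threshold for statistics w at target FDR level q.

    Returns the smallest t among the nonzero |W_j| such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q,
    or math.inf if no such t exists (nothing is selected).
    """
    for t in sorted({abs(x) for x in w if x != 0}):
        # Strongly negative statistics estimate the number of false selections.
        estimated_false = 1 + sum(1 for x in w if x <= -t)
        selected = max(1, sum(1 for x in w if x >= t))
        if estimated_false / selected <= q:
            return t
    return math.inf


def knockoff_select(w, q):
    """Indices of the variables selected by the knockoff+ filter."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Hypothetical statistics for 12 variables (not data from the talk).
w = [5.0, 4.0, 3.5, 3.0, 2.5, 2.0, 1.5, 1.2, 1.1, 1.0, -0.5, 0.0]
print(knockoff_select(w, q=0.2))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The "+1" in the numerator is what distinguishes knockoff+ from the plain knockoff threshold: it makes the rule slightly more conservative in exchange for exact FDR control.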

15:00-16:00 Session 4B: Algorithms and data structures for sequences

Chair:

Raphael Mourad

Location: Amphi D

15:00	Simon Herman, Guillaume Bouvier, Christos Papadopoulos, Paul Roginski, Olivier Lespinet and Anne Lopes Structural Space of Microproteins with Protein Language Models PRESENTER: Simon Herman ABSTRACT. Microproteins, defined as proteins comprising fewer than 100 amino acids, have long been overlooked because their small size and low expression make them difficult to detect. However, emerging evidence reveals that they are key regulators of translation, development, metabolism, and cellular stress responses, and are implicated in diseases such as cancer and cardiovascular disorders. Recent advances in ribosome profiling and mass spectrometry have uncovered a vast transient microproteome originating from non-canonical open reading frames (ORFs) overlapping annotated genes or located in UTRs or intergenic regions. Although many of these sequences lack signs of selection and are evolutionarily transient, some impact fitness and may represent early stages of de novo gene formation. In this study, we introduce a novel computational framework that employs deep learning–based protein language models (pLMs) to infer the structural properties of microproteins without relying on evolutionary information. Using embeddings from ProtT5-XL, we analyzed thousands of microproteins from annotated sources (e.g., UniProt) and potential iORF-encoded microproteins across diverse eukaryotic genomes with varying GC content. Using simple dimensionality reduction of these embeddings, we constructed a comprehensive map of the microprotein structural landscape, confirming that amino acid composition and residue ordering are the primary determinants of structure. We then fine-tuned a classifier that demonstrated robust performance in predicting structural categories, capturing additional signals beyond those revealed by embedding dimensionality reduction.
Our results indicate that annotated microproteins occupy narrow, well-defined regions of the structural space, whereas iORF-encoded microproteins exhibit a broader, GC-dependent distribution. Specifically, low-GC iORFs are biased toward encoding transmembrane peptides, while high-GC iORFs predominantly yield disordered proteins. Moreover, certain iORF sequences fall into a “void” region not populated by canonical proteins, suggesting a region that might have been counter-selected by evolution. Taken together, these findings challenge prior knowledge of protein-coding potential and offer fresh insights into the potential evolutionary emergence and structural diversity of microproteins. Our results characterize the structural landscapes of two distinct yet interconnected microproteomes (the annotated coding microproteome and its unannotated, noncoding counterpart) and lay the groundwork for new hypotheses about the molecular evolution of coding sequences from noncoding origins.
15:20	Gaspar Roy, Eugeni Belda, Edi Prifti, Yann Chevaleyre and Jean-Daniel Zucker MetagenBERT: a Transformer Architecture using Foundational Read Embedding Models to enhance Disease Classification PRESENTER: Gaspar Roy ABSTRACT. Microbial ecosystems constitute complex yet information-rich environments whose characterization is crucial for understanding host health and disease. Among them, the human gut microbiome has emerged as a key “super-integrator”, owing to its dense interactions with host physiology and its established associations with a wide spectrum of pathologies. Driven by advances in high-throughput sequencing technologies and the continuous decline in associated costs, metagenomic studies have expanded exponentially, generating massive amounts of sequencing data and opening new avenues for data-driven disease modeling. Conventional approaches to microbiome analysis predominantly rely on the alignment of DNA sequencing reads against reference databases to infer microbial composition at the species level. While effective, these methods are inherently constrained by reference bias and limited taxonomic resolution. Recent advances in artificial intelligence, particularly in Natural Language Processing (NLP), offer new methodological perspectives for metagenomic data representation. In this study, we present MetagenBERT, a Transformer-based framework that relies on the foundational models DNABERT-2 and DNABERT-S for the embedding of DNA sequencing reads. Our approach encodes gut microbiome metagenomes in a species-agnostic manner, enabling direct downstream application to disease classification tasks. We show that MetagenBERT attains performance similar to state-of-the-art abundance-based models for cirrhosis prediction and surpasses them in the more challenging context of type 2 diabetes detection.
Furthermore, we introduce an alternative representation of metagenomic profiles based on read-level embeddings aggregated into abundance vectors, demonstrating their complementarity with conventional species-level abundance metrics.
15:40	Emile Benoist, Géraldine Jean, Hélène Rogniaux, Guillaume Fertin and Dominique Tessier SpecPeptidOMS Directly and Rapidly Aligns Mass Spectra on Whole Proteomes and Identifies Peptides That Are Not Necessarily Tryptic: Implications for Peptidomics PRESENTER: Géraldine Jean ABSTRACT. SpecPeptidOMS directly aligns peptide fragmentation spectra to whole and undigested protein sequences. The algorithm was specifically and initially designed for peptidomics, where the aim is to identify peptides that do not result from the hydrolysis of a known protein and therefore, whose termini cannot be predicted. Thus, SpecPeptidOMS can perform alignments starting and ending anywhere in the protein sequence. The underlying computational method of SpecPeptidOMS, which is based on a dynamic programming approach, was drastically optimized. As a result, SpecPeptidOMS can process around 12,000 spectra per hour on an ordinary laptop, with alignment performed against the entire human proteome. The performance of SpecPeptidOMS was first evaluated on a publicly available data set of (nontryptic) synthetic mass spectra. Accuracy was estimated by considering the results obtained by MaxQuant on the same data set as the “ground truth”. A second series of tests on a larger, well-known proteomics data set (HEK293) highlighted SpecPeptidOMS’ additional ability to search for open modifications, a feature of interest in peptidomics but also more broadly in conventional proteomics. SpecPeptidOMS is open-source, cross-platform (written in Java), and freely available.
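The MetagenBERT abstract mentions read-level embeddings aggregated into abundance vectors. One simple way such an aggregation could work, offered here only as a hypothetical sketch and not as the authors' method, is to assign each read embedding to its nearest centroid (for example, centroids obtained by clustering embeddings from a reference set) and report the fraction of reads per centroid. The function name and the toy 2-D vectors below are assumptions; real DNABERT-2 or DNABERT-S embeddings have many more dimensions.

```python
import numpy as np


def embedding_abundance(read_embeddings, centroids):
    """Hypothetical aggregation of read embeddings into a relative-abundance vector.

    read_embeddings: array of shape (n_reads, dim)
    centroids: array of shape (n_clusters, dim)
    Returns an array of shape (n_clusters,) that sums to 1.
    """
    # Squared Euclidean distance between every read and every centroid.
    diffs = read_embeddings[:, None, :] - centroids[None, :, :]
    sq_dist = (diffs ** 2).sum(axis=2)
    # Each read votes for its closest centroid.
    nearest = sq_dist.argmin(axis=1)
    counts = np.bincount(nearest, minlength=len(centroids)).astype(float)
    return counts / counts.sum()


# Toy example: three reads near the first centroid, one near the second.
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
reads = np.array([[0.1, 0.0], [0.0, 0.2], [9.9, 10.0], [0.3, 0.1]])
print(embedding_abundance(reads, centroids))  # [0.75 0.25]
```

Per-sample vectors built this way could be used by a standard classifier alongside, or instead of, species-level abundances.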

15:00-16:00 Session 4C: Evolution, phylogeny and comparative genomics

Chair:

Sophie Abby

Location: Amphi E

15:00	Julien Pichon, Lauriane Cacheux, Manel Ait El Hadj, Axel Jensen, Katerina Guschanski, Loïc Ponger and Christophe Escudé Evolutionary dynamics of centromeric DNA in guenon might end an old anthropocentric dogma PRESENTER: Julien Pichon ABSTRACT. The characterization of centromeric DNA has recently seen a major breakthrough thanks to advances in sequencing technologies, reaching complete resolution for several human cell lines and other primates through Telomere-to-Telomere (T2T) assemblies. New bioinformatic tools have been developed to describe the main DNA type composing the centromere in most primates, alpha satellite (AS) DNA. However, both the tools and the resulting annotations often rely on the human genome to interpret the evolutionary dynamics of other species. One example is the use of human AS families to annotate AS sequences in other species, despite the rapid evolution of these sequences. Moreover, the current model of alpha satellite evolution, which is primarily based on observations in humans, needs to be tested against data from other primates. As T2T sequencing remains costly and technically demanding, we propose an alternative approach that directly leverages long-read data. In this study, we identified AS-containing reads from long-read sequencing of a Cercopithecini species, Cercopithecus cephus. Through a de novo annotation of these sequences, we identified two families that we had previously detected in two other Cercopithecini species, as well as a new family, C3, which is the least abundant but also the most ancient. These three families also appear to be spatially segmented across the genome, corresponding to distinct homogeneous evolutionary layers. To investigate the organization of alpha satellite monomers within these layers, we developed HORfinder, a tool designed to detect higher-order repeat (HOR) structures without relying on predefined family classifications.
Unlike humans, C. cephus exhibits a predominantly monomeric organization of its AS, with only 0.4% of sequences forming HORs. Interestingly, these HORs are mainly found in the oldest evolutionary layers, suggesting a potential transition from HOR to monomeric organization. These findings support the idea that HOR organization is not a unique or highly specialized structure and could arise independently in multiple clades.
15:20	Maëlle Daunesse, Elise Parey, Diego Villar and Camille Berthelot Natural selection acting on gene expression and regulation in mole-rats PRESENTER: Maëlle Daunesse ABSTRACT. Understanding the genetic basis of phenotypic adaptations poses a significant challenge in evolutionary genomics. Despite the morphological and physiological diversity in mammalian traits, their coding genomes exhibit a high degree of conservation, implying that changes in gene expression and regulation are pivotal in driving phenotypic evolution. This study aims to identify shifts in gene expression and cis-regulatory activity and their potential role in phenotypic adaptation. Using as a model African mole-rats, which are renowned for unique adaptive traits such as cancer resistance and hypoxia tolerance, we aimed to elucidate the genome-wide gene expression patterns underlying these traits, which have so far been characterised mainly at the level of candidate genes and in individual species. Profiling gene expression in heart and liver tissues across two mole-rat species and two rodent outgroups, we used a phylogenetic comparative approach to identify genes with expression shifts within the mole-rat clade and in specific genera. These shifted genes are associated with functions pertinent to known adaptations in naked mole-rats, such as cellular respiration and glycolysis in the heart. Furthermore, our analysis revealed concordant changes in the regulatory landscape of these genes. By employing a phylogenetic comparative approach, we offer new insights into the interplay between gene expression, regulation, and phenotypic evolution in mammals. Our findings shed light on the molecular mechanisms driving the evolution of unique traits in mole-rats and potentially other mammalian species.
15:40	Victor Banon Garcia, Nelle Varoquaux and Ivan Junier A Comprehensive Study of Inverted Repeats in Prokaryotic Genomes: Enrichment, Depletion, and Taxonomic Variations PRESENTER: Victor Banon Garcia ABSTRACT. Inverted repeats (IRs) are genetic elements consisting of a DNA motif (left arm) followed by a gap, or spacer, and its reverse complement (right arm), e.g., ATACGGnnnCCGTAT. They play a key role in many biological functions, including gene regulation, DNA replication, and genome plasticity. In this study, we aim to systematically investigate the distribution of short IRs (with gap lengths up to 20 bp) in all completely sequenced prokaryotic species by comparing observed statistics with expected ones computed by permuting DNA sequences under specific constraints. Through this systematic approach, we reveal complex patterns of IR biases with five main observations: (i) a systematic enrichment of IRs with arm lengths longer than 6 bp, (ii) a systematic depletion of palindromes (IRs with a zero-length gap) shorter than 6 bp, (iii) qualitatively different biases between coding and non-coding regions, (iv) in non-coding regions, the most frequent enrichment across bacterial species occurs for a gap length of 4 bp, similar to the most common loop size of RNA hairpins associated with known transcription terminators in model organisms, and (v) biases in coding regions that strongly depend on the species considered. Altogether, these findings, which both corroborate and extend previous analyses, highlight universal evolutionary constraints as well as species-specific selective pressures that act on genome sequences, particularly on IRs.
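The inverted-repeat definition in the Banon Garcia et al. abstract translates directly into code. The sketch below counts IRs with a given arm length and an exact gap length, then compares the observed count with the mean count over shuffled copies of the sequence. The plain shuffle, which preserves only nucleotide composition, is an assumption standing in for the constrained permutations used in the study, and the function names are illustrative.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_irs(seq, arm, gap):
    """Count IRs whose right arm is the reverse complement of the left arm,
    separated by exactly `gap` spacer bases (gap=0 gives palindromes)."""
    span = 2 * arm + gap
    n = 0
    for i in range(len(seq) - span + 1):
        left = seq[i:i + arm]
        right = seq[i + arm + gap:i + span]
        if right == revcomp(left):
            n += 1
    return n


def ir_enrichment(seq, arm, gap, n_perm=100, seed=0):
    """Observed IR count divided by its mean over composition-preserving shuffles."""
    rng = random.Random(seed)
    observed = count_irs(seq, arm, gap)
    letters = list(seq)
    total = 0
    for _ in range(n_perm):
        rng.shuffle(letters)
        total += count_irs("".join(letters), arm, gap)
    expected = total / n_perm
    return observed / expected if expected > 0 else float("inf")


print(revcomp("ATACGG"))                        # CCGTAT
print(count_irs("ATACGGTTTCCGTAT", arm=6, gap=3))  # 1
```

Note that a long IR also contains shorter nested IRs (for example, arm 5 with gap 5 in the sequence above), so counts for different arm lengths are not independent.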

16:00-16:30 Coffee Break

16:30-17:30 Session 5A: Statistics, Machine Learning, and AI for Biology and Health

Chair:

Romain Bourqui

Location: Main amphi

16:30	Clémence Réda, Jill-Jênn Vie and Olaf Wolkenhauer Joint Embedding-Classifier Learning for Interpretable Collaborative Filtering PRESENTER: Clémence Réda ABSTRACT. Background: Interpretability is a topical question in recommender systems, especially in healthcare applications. An interpretable classifier quantifies the importance of each input feature for the predicted item-user association in a non-ambiguous fashion. Results: We introduce the novel Joint Embedding Learning-classifier for improved Interpretability (JELI). By combining the training of a structured collaborative-filtering classifier and an embedding learning task, JELI predicts new user-item associations based on jointly learned item and user embeddings while providing feature-wise importance scores. Therefore, JELI flexibly allows the introduction of priors on the connections between users, items, and features. In particular, JELI simultaneously (a) learns feature, item, and user embeddings; (b) predicts new item-user associations; and (c) provides importance scores for each feature. Moreover, JELI instantiates a generic approach to training recommender systems by encoding generic graph-regularization constraints. Conclusions: First, we show that the joint training approach yields a gain in the predictive power of the downstream classifier. Second, JELI can recover feature-association dependencies. Finally, JELI requires fewer parameters than the baselines on synthetic and drug-repurposing data sets.
16:50	Louison Silly, Guy Perrière and Philippe Ortet Models for protein domain embedding PRESENTER: Louison Silly ABSTRACT. Protein embedding consists of producing a mathematical representation of a protein from data such as its sequence or structure. Protein embeddings are widely used thanks to recent advances in artificial intelligence that allow them to summarize protein information (such as function and biochemical properties). One drawback of protein embeddings is their dimensionality, which is often very large and makes them difficult to manipulate efficiently (the so-called curse of dimensionality). We present here our work on protein embeddings based on protein domain architecture. Since a protein has fewer domains than amino acids, we hope to produce lower-dimensional embeddings that are easier to use. We trained two models based on the BERT architecture on different training datasets using the masked language modeling objective. Our training datasets were obtained by annotating UniProt (TrEMBL + Swiss-Prot) and BFD proteins with Pfam domains and low-complexity regions. Our models perform well on some training sets and seem able to learn a good protein representation from their domain architecture.
17:10	Galadriel Brière, Thomas Stosskopf, Benjamin Loire and Anaïs Baudot Benchmarking Data Leakage on Link Prediction in Biomedical Knowledge Graph Embeddings PRESENTER: Galadriel Brière ABSTRACT. In recent years, Knowledge Graphs (KGs) have gained significant attention for their ability to organize complex biomedical knowledge into entities and relationships. Knowledge Graph Embedding (KGE) models facilitate efficient exploration of KGs by learning compact data representations. These models are increasingly applied to biomedical KGs for link prediction, for instance to uncover new therapeutic uses for existing drugs. While numerous KGE models have been developed and benchmarked for link prediction, existing evaluations often overlook the critical issue of data leakage. Data leakage leads the model to learn patterns it would not encounter when deployed in real-world settings, artificially inflating performance metrics and compromising the overall validity of benchmark results. In machine learning, data leakage can arise when (1) there is inadequate separation between training and test sets, (2) the model leverages illegitimate features, or (3) the test set does not accurately reflect real-world inference scenarios. In this study, we implement a systematic procedure to control train-test separation for KGE-based link prediction and demonstrate its impact on models' performance. In addition, through permutation experiments, we investigate the potential use of node degree as an illegitimate predictive feature, finding no evidence of such leveraging. Finally, by evaluating KGE models on a curated dataset of rare disease drug indications, we demonstrate that performance metrics achieved on real-world drug repurposing tasks are substantially worse than those obtained on drug-disease indications sampled from the KG.
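One of the leakage sources listed in the Brière et al. abstract is inadequate separation between training and test sets. In biomedical KGs, this can happen when a test triple's entity pair is already linked in training through another relation or an inverse edge (for instance "treats" and "treated_by"). The minimal sketch below drops such test triples; it is an illustrative criterion of our own, not the procedure implemented in the study, and the entity and relation names are hypothetical.

```python
def remove_pair_leakage(train, test):
    """Split test triples (head, relation, tail) into kept and dropped lists.

    A test triple is dropped when its unordered entity pair {head, tail}
    is already connected by any training triple, whatever the relation
    or direction, because the model could then recover it trivially.
    """
    seen_pairs = {frozenset((h, t)) for h, _, t in train}
    kept, dropped = [], []
    for h, r, t in test:
        if frozenset((h, t)) in seen_pairs:
            dropped.append((h, r, t))
        else:
            kept.append((h, r, t))
    return kept, dropped


train = [("drugA", "treats", "disease1"), ("gene1", "associated_with", "disease2")]
test = [("disease1", "treated_by", "drugA"), ("drugB", "treats", "disease2")]
kept, dropped = remove_pair_leakage(train, test)
print(kept)     # [('drugB', 'treats', 'disease2')]
print(dropped)  # [('disease1', 'treated_by', 'drugA')]
```

Evaluating on `kept` only gives a stricter estimate of how well a model predicts genuinely new links.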

16:30-17:30 Session 5B: Metagenomics, Metatranscriptomics, and Microbial Ecosystems Statistics

Chair:

Rayan Chikhi

Location: Amphi D

16:30	Antonin Colajanni, Raluca Uricaru, Rodolphe Thiebaut and Patricia Thebault Metatranscriptomic classification in the study of microbial translocation PRESENTER: Antonin Colajanni ABSTRACT. Microbial translocation occurs when bacteria migrate from the gut to the blood due to gut barrier alterations, potentially triggering persistent immune activation and affecting immune responses. Translocation can be studied using whole blood RNA sequencing techniques to analyze the “meta-transcriptome”: all the RNA from bacteria, viruses and fungi in the blood. However, a key challenge is reusing cohort data, which primarily consists of human sequences, to characterize the non-human meta-transcriptome. Cohort studies typically focus on human data while minimizing non-human contamination. From a translocation perspective, these non-human sequences become the focus, requiring human sequences to be filtered out, leaving a small fraction (~2%) to be analyzed. In previous work, Douek et al. used a sequence assembly-based pipeline for translocation analysis. We compared this approach with an assembly-free method and found that integrating both strategies into a “hybrid” pipeline improved classification performance in simulations. While real-data validation remains challenging, our results suggest this hybrid strategy enhances microbial translocation analysis.
16:50	Isaure Quetel, Sourakhata Tirera, Damien Cazenave, Nina Allouch, Chloé Baum, Yann Reynaud, Degrâce Batantou Mabandza, Virginie Nerriere, Serge Vedy, Matthieu Pot, Sebastien Breurec, Anne Lavergne, Séverine Ferdinand, Vincent Guerlais and David Couvin Fast answers to simple bioinformatics needs and capacity building in an island context, a focus on microbial omics data analysis PRESENTER: David Couvin ABSTRACT. Bioinformatics is increasingly used across scientific fields. Large amounts of heterogeneous data are being generated by scientific teams or laboratories for research purposes. Sequencing and other biological data are difficult to interpret and analyze effectively without dedicated and adapted tools. Several software tools have been developed to facilitate the handling and analysis of these types of data. The Galaxy project web platform is one such tool: it is freely accessible and gives users access to thousands of bioinformatics tools. Other tools, such as Bioconda and Jupyter Notebook, respectively make it easier to install tools and their dependencies and offer a user-friendly web interface for running bioinformatics scripts. In addition, RStudio is an integrated development environment (IDE) that facilitates the use of R scripts. The aim of this study is to provide guides (or helpers) to the scientific community for performing bioinformatics or biostatistics analyses in a simpler manner. With this work, we also aim to democratize well-documented software tools and make them suitable for both bioinformaticians and non-bioinformaticians. We believe that user-friendly guides and real-life examples will provide end users with suitable, easy-to-use methods for their bioinformatics analysis needs. Furthermore, tutorials and examples of use will be available on our dedicated GitHub repository (https://github.com/karubiotools/AnssBin).
These tutorials/examples (in English and/or French) could be used as pedagogical tools promoting bioinformatics analyses and potential answers to some bioinformatics needs. Platforms and/or services play an important role in helping scientists with their bioinformatics data analysis work. These facilities are the cornerstone of bioinformatics capacity building in the overseas islands and support the growth of nascent networks such as KaruBioNet.
17:10	Camille Champion, Raphaëlle Momal, Emmanuelle Le Chatelier, Mathilde Sola, Mahendra Mariadassou and Magali Berland OneNet—One network to rule them all: Consensus network inference from microbiome data PRESENTER: Magali Berland ABSTRACT. Modeling microbial interactions as sparse and reproducible networks is a major challenge in microbial ecology. Direct interactions between the microbial species of a biome can help to understand the mechanisms through which microbial communities influence the system. Most state-of-the-art methods reconstruct networks from abundance data using Gaussian Graphical Models, for which several statistically grounded and computationally efficient inference approaches are available. However, the multiplicity of existing methods, when applied to the same dataset, generates very different networks. In this article, we present OneNet, a consensus network inference method that combines seven methods based on stability selection. This resampling procedure is used to tune a regularization parameter by computing how often edges are selected in the networks. We modified the stability selection framework to use edge selection frequencies directly and combine them in the inferred network to ensure that only reproducible edges are included in the consensus. We demonstrated on synthetic data that our method generally led to slightly sparser networks while achieving much higher precision than any single method. We further applied the method to gut microbiome data from liver-cirrhotic patients and demonstrated that the resulting network exhibited a microbial guild that was meaningful in terms of human health.
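The OneNet abstract describes combining edge selection frequencies from several inference methods so that only reproducible edges enter the consensus. The sketch below illustrates that idea under simplifying assumptions that are ours, not the paper's: each method contributes a list of edge sets inferred on resampled data, frequencies are averaged across methods with equal weights, and a fixed threshold (0.9 by default, chosen arbitrarily) decides which edges are kept. It performs no network inference and no regularization tuning; the function names are illustrative.

```python
from collections import Counter


def edge_frequencies(edge_sets):
    """Fraction of resampled networks in which each undirected edge was selected."""
    counts = Counter()
    for edges in edge_sets:
        # frozenset makes (A, B) and (B, A) the same undirected edge.
        counts.update({frozenset(e) for e in edges})
    return {e: c / len(edge_sets) for e, c in counts.items()}


def consensus_network(per_method_edge_sets, threshold=0.9):
    """Average per-method selection frequencies and keep edges at or above threshold."""
    freqs = [edge_frequencies(sets) for sets in per_method_edge_sets]
    all_edges = set().union(*freqs)
    mean_freq = {e: sum(f.get(e, 0.0) for f in freqs) / len(freqs) for e in all_edges}
    return {e for e, f in mean_freq.items() if f >= threshold}


# Two hypothetical methods, each run on two resamples of the same data.
method_1 = [[("A", "B"), ("B", "C")], [("A", "B")]]
method_2 = [[("B", "A")], [("A", "B"), ("C", "D")]]
consensus = consensus_network([method_1, method_2])
print(sorted(tuple(sorted(e)) for e in consensus))  # [('A', 'B')]
```

Here the A-B edge is selected in every resample by both methods (mean frequency 1.0), whereas B-C and C-D each reach only 0.25 and are discarded.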

16:30-17:30 Session 5C: Functional and Integrative Genomics

Chair:

François Sabot

Location: Amphi E

16:30	Simon Malesys, Rachel Torchet, Bertrand Saunier and Nicolas Maillet AntiBody Sequence Database PRESENTER: Nicolas Maillet ABSTRACT. Antibodies play a crucial role in the humoral immune response against health threats, such as viral infections. Training AI (Artificial Intelligence) models, for example to assist in developing sero-diagnostics or antibody-based therapies, requires building datasets according to strict criteria, to include as many standardized antibody sequences as possible. However, the available sequences are scattered across partially redundant databases and compiling them into a single non-redundant standardized dataset has hitherto remained a challenge. Here, we present ABSD (AntiBody Sequence Database, https://absd.pasteur.cloud) which contains data from major publicly-available resources (abYbank, CATNAP-HIV, CoV-AbDab, GenBank, IMGT, KABAT, OAS, PDB, PLAbDab, PairedNGS, SACS, SAbDab, UniProt...), creating the largest standardized, automatically updated and non-redundant (i.e., each antibody sequence stored in the database is unique) source of public antibody sequences for different species. While ABSD contains over 1,350,000 antibody sequences today, trillions of them may circulate in the human population. This limitation is unlikely to be resolved anytime soon, but diversity might matter more than sheer number. In the article, we demonstrate that, at least regarding IGHV regions, our methodology does not seem to have introduced a strong bias in the selection of antibody sequences towards specific gene clusters, compared to a classic human repertoire. When training deep learning models, the uniqueness and representativeness of the input data are likely essential for most applications. In this regard, ABSD will help mirror the human repertoire by providing, as broadly as possible and without bias, unique antibody sequences with realistic proportions.
Finally, ABSD is a dynamic and adaptive database, designed for automatic updates and easy upgrades. This user-friendly and open website enables users to generate lists of antibodies based on selected criteria and download the unique sequence pairs of their variable regions.
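The non-redundancy criterion described above (each stored antibody sequence is unique) can be illustrated with a minimal Python sketch. It assumes that each input is a hypothetical `AntibodyRecord` holding a source name and a heavy/light variable-region pair, and that two records are duplicates when their normalized sequences are identical. ABSD's actual pipeline, identifiers and matching rules are not described in the abstract and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AntibodyRecord:
    # Hypothetical record layout, for illustration only.
    source: str  # name of the originating database
    heavy: str   # heavy-chain variable region (amino acids)
    light: str   # light-chain variable region (amino acids)


def normalize(seq: str) -> str:
    """Remove whitespace and uppercase, so formatting differences do not hide duplicates."""
    return "".join(seq.split()).upper()


def deduplicate(records):
    """Keep one entry per unique (heavy, light) pair and list every source reporting it.

    Entries are returned in order of first appearance."""
    unique = {}
    for rec in records:
        key = (normalize(rec.heavy), normalize(rec.light))
        if key not in unique:
            unique[key] = {"heavy": key[0], "light": key[1], "sources": []}
        if rec.source not in unique[key]["sources"]:
            unique[key]["sources"].append(rec.source)
    return list(unique.values())


if __name__ == "__main__":
    # Toy sequence fragments, not real antibodies.
    demo = [
        AntibodyRecord("SAbDab", "evqlv", "diqmt"),
        AntibodyRecord("PDB", "EVQLV", "DIQMT"),
        AntibodyRecord("OAS", "QVQLQ", "DIQMT"),
    ]
    for entry in deduplicate(demo):
        print(entry)
```

Keying on the exact normalized pair is the simplest possible notion of "unique"; a real resource might also need rules for partial sequences or unpaired chains, which this sketch deliberately ignores.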
16:50	Typhaine Paysan-Lafosse, Antonina Andreeva, Matthias Blum, Sara Rocio Chuguransky, Tiago Grego, Beatriz Lazaro Pinto, Gustavo Salazar, Maxwell L Bileschi, Felipe Llinares-López, Laetitia Meng-Papaxanthos, Lucy J Colwell, Nick V Grishin, R Dustin Schaeffer, Damiano Clementel, Silvio C E Tosatto, Erik Sonnhammer, Valerie Wood and Alex Bateman The Pfam protein families database: embracing AI/ML PRESENTER: Typhaine Paysan-Lafosse ABSTRACT. The Pfam protein families database is a comprehensive collection of protein domains and families used for genome annotation and protein structure and function analysis (https://www.ebi.ac.uk/interpro/). This update describes major developments in Pfam since 2020, including decommissioning the Pfam website and integration with InterPro, harmonization with the ECOD structural classification, and expanded curation of metagenomic, microprotein and repeat-containing families. We highlight how AlphaFold structure predictions are being leveraged to refine domain boundaries and identify new domains. New families discovered through large-scale sequence similarity analysis of AlphaFold models are described. We also detail the development of Pfam-N, which uses deep learning to expand family coverage, achieving an 8.8% increase in UniProtKB coverage compared to standard Pfam. We discuss plans for more frequent Pfam releases integrated with InterPro and the potential for artificial intelligence to further assist curation. Despite recent advances, many protein families remain to be classified, and Pfam continues working toward comprehensive coverage of the protein universe.
17:10	Matthew Dyer, Quy Xiao Xuan Lin, Denis Thieffry and Touati Benoukraf MethMotif 2024 Suite Reveals the Epigenetic Blueprint of Context-Specific Transcription Factor Binding Sites PRESENTER: Touati Benoukraf ABSTRACT. MethMotif (https://methmotif.org) is a publicly available database that provides a comprehensive repository of transcription factor (TF)-binding profiles enriched with DNA methylation patterns. Since its inception in 2019, the platform has evolved to incorporate expanded datasets and advanced functionalities, deepening our understanding of context-specific TF functions. In its 2024 release, MethMotif expands its initial collection from 509 to over 700 position weight matrices (PWMs), all annotated with DNA methylation profiles. A key advancement of this update is the segregation of TF-binding motifs based on cofactors and DNA methylation status, allowing researchers to explore how gene ontology (GO) annotations and TF target genes can differ under varying cofactor contexts. MethMotif now supports two additional species: Mus musculus and Arabidopsis thaliana, broadening its applicability for comparative and translational research. By incorporating cofactor-based binding motifs, methylation profiles, and precomputed GO enrichments, MethMotif stands out as the first and only TF-binding motif database to integrate context-specific PWMs with epigenetic information, thus enabling deeper insights into the regulatory mechanisms governing gene expression.
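MethMotif stores transcription factor binding profiles as position weight matrices (PWMs). As a generic illustration of how a PWM is built and used, the sketch below converts per-position base counts into log2-odds scores and scans a DNA sequence for the best-scoring window. The counts, pseudocount and uniform background are illustrative assumptions. The sketch scans the forward strand only and does not model the DNA methylation or cofactor annotations that distinguish MethMotif.

```python
import math


def log_odds_pwm(counts, background=0.25, pseudocount=0.5):
    """Convert per-position base counts (one dict per position) into a log2-odds PWM."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm):
    """Return (best_score, best_start) over all forward-strand windows of the PWM's width.

    Raises KeyError if a window contains a symbol other than A, C, G or T (e.g. N)."""
    width = len(pwm)
    best_score, best_start = float("-inf"), -1
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width].upper()
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


if __name__ == "__main__":
    # Toy motif: ten aligned sites, all reading CACGTG.
    toy_counts = [{base: 10} for base in "CACGTG"]
    print(scan("TTTCACGTGAAA", log_odds_pwm(toy_counts)))
```

In this toy case the best window starts at position 3, where every base matches the consensus. A pseudocount keeps unseen bases from receiving a score of minus infinity.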

17:30-18:30 Session 6A: Demos

Chair:

Nicolas Tourasse

Location: Amphi D

17:30

Julie Lao, Raphaël Tackx, Amanda Dieuaide, Thomas Mignon, Cléa Siguret, Romain Dallet, Pierre Marin, Kenzo-Hugo Hillion, Bérénice Batut, Nadia Goué, Etienne Ruppé, Gildas Le Corguillé, Philippe Glaser, Fabien Mareuil and Claudine Medigue

ABRomics-analysis: a web service for the analysis of Antibiotic Resistance in bacterial genomes

PRESENTER: Amanda Dieuaide

ABSTRACT. Antibiotic resistance (ABR) is a major global public health issue, recognized as an urgent priority by international institutions. This is especially true of the emergence and global dissemination of multidrug-resistant bacteria (MDRB) and antibiotic resistance genes (ARGs), which can spread widely across human, animal and environmental domains, transcending borders. We present ABRomics-analysis, an online platform designed for tracking MDRB and ARGs in bacterial genomes, with metagenome analyses planned for the future. The platform aims to provide a tool for antimicrobial resistance surveillance and research in a One Health context. ABRomics-analysis offers a user-friendly interface for uploading and managing sequencing data, launching standardized bioinformatics workflows, and exploring the ABR content of analyzed samples. Once samples have been uploaded, a default workflow is launched automatically. For whole-genome sequencing data, it includes quality control of the raw data, species identification, contamination checking, sequence typing, genome assembly, genome annotation, plasmid typing, antibiotic resistance gene detection, and virulence gene detection. In a future version of the platform, users will also be able to run custom analyses to answer more specific ABR-related questions, such as core-genome multilocus sequence typing (cgMLST). Once the analysis is complete, an online report summarizes the sample metadata and the results of each analysis step. Users can download individual output files (e.g., assembled genome FASTA files, ABR gene detection tables) or retrieve multiple results for cross-sample analyses. 
For a comprehensive view of ABR gene transmission and to foster broad collaboration, the ABRomics-analysis database enables users to explore all available data through a variety of tools: a database overview with a list of filters, including sample type, spatial and temporal metadata, antibiotic classes associated with detected ABR genes, species, and sequence type detected; an interactive world map that can be filtered to show the localization of collected samples; and an anonymous contact system that facilitates communication and collaboration among users. ABRomics-analysis is accessible at https://analysis.abromics.fr/

18:00

Sylvain Marthey, Natacha Baffo, María-Natalia Lisa and Gwenaëlle André

WHOOPER : Web application for Hands-On identification of protein co-Occurrence among Phyla, focused on user ERgonomics.

PRESENTER: Sylvain Marthey

ABSTRACT. WHOOPER is an intuitive web application designed to facilitate the analysis of protein co-occurrence across taxa. By integrating HHblits and HMMER, it enables efficient detection of remote homologs across extensive proteomic datasets. Its results interface combines an original data representation with advanced dynamic filtering functionalities to enable real-time exploration and exploitation of alignment data. Addressing key limitations of existing tools, WHOOPER streamlines workflows for studying protein prevalence, co-occurrence, and phylogenetic patterns, supporting seamless export for downstream evolutionary and functional genomics research.
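Protein co-occurrence across taxa can be pictured as a presence/absence comparison. The sketch below assumes that remote-homology searches (for example with HHblits or HMMER) have already been run and summarized as a hypothetical mapping from each taxon to the set of query proteins detected in it. For each pair of queries, it counts the taxa in which both occur and computes their Jaccard index. This is an illustration of the general idea, not WHOOPER's code, data model or scoring scheme.

```python
from itertools import combinations


def cooccurrence(hits):
    """Summarize pairwise co-occurrence of query proteins across taxa.

    hits maps taxon -> set of query proteins with a detected homolog in that taxon.
    Returns {(query_a, query_b): (n_taxa_with_both, jaccard_index)}."""
    queries = sorted(set().union(*hits.values()))
    result = {}
    for a, b in combinations(queries, 2):
        with_a = {taxon for taxon, found in hits.items() if a in found}
        with_b = {taxon for taxon, found in hits.items() if b in found}
        both = len(with_a & with_b)
        either = len(with_a | with_b)
        result[(a, b)] = (both, both / either if either else 0.0)
    return result


if __name__ == "__main__":
    # Hypothetical presence/absence table.
    demo_hits = {
        "taxon1": {"ProtA", "ProtB"},
        "taxon2": {"ProtA"},
        "taxon3": {"ProtA", "ProtB", "ProtC"},
        "taxon4": set(),
    }
    for pair, (both, jaccard) in cooccurrence(demo_hits).items():
        print(pair, both, round(jaccard, 3))
```

Here ProtA and ProtB share two of the three taxa in which either occurs, so their Jaccard index is 2/3. A phylogeny-aware analysis would additionally account for shared ancestry between taxa, which this sketch ignores.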

17:30-18:30 Session 6B: Demos

Chair:

Elodie Darbo

Location: Amphi E

17:30

Maximilien Colange, Guillaume Appé, Léa Meunier, Solène Weill, Akpéli Nordor and Abdelkader Behdenna

Bridging the Gap Between R and Python in Bulk Transcriptomic Data Analysis with InMoose

PRESENTER: Maximilien Colange

ABSTRACT. We introduce InMoose, an open-source Python environment for omics data analysis, and illustrate its capabilities for bulk transcriptomic data analysis. Owing to its wide adoption, Python has become a de facto standard in fields that are increasingly important for bioinformatics pipelines, such as data science, machine learning and artificial intelligence (AI). As a general-purpose language, Python is also recognized for its versatility and scalability. InMoose aims to bring state-of-the-art tools historically written in R to the Python ecosystem: ComBat, ComBat-Seq, limma, edgeR and DESeq2. InMoose focuses on providing drop-in replacements for these R tools, to ensure consistency and reproducibility between R-based and Python-based pipelines. The first development phase has focused on bulk transcriptomic data, with current capabilities encompassing data simulation, batch effect correction, differential expression analysis and meta-analysis.

18:00

Lucie Gaspard-Boulinc, Luca Gortana, Thomas Walter, Emmanuel Barillot and Florence Cavalli

Interactive toolbox for cell-type deconvolution of spatial transcriptomics data

PRESENTER: Lucie Gaspard-Boulinc

ABSTRACT. Spatial transcriptomics is a powerful method to study the spatial organization of cells, a critical feature in the development, function and evolution of multicellular life. However, sequencing-based spatial transcriptomics has not yet achieved cellular-level resolution, so advanced deconvolution methods are needed to infer the contribution of each cell type at each location in the data. Recent progress has produced diverse cell-type deconvolution tools that are helping to describe tissue architectures in health and disease. In this resource, we describe the various types of cell-type deconvolution methods for spatial transcriptomics, contrast their capabilities, and summarize them in an interactive Shiny table to enable more efficient method selection. Our tool gathers 67 cell-type deconvolution methods specifically developed for spatial transcriptomics. We categorize them using two classification systems: one based on whether they use single-cell resolved RNA-seq data, and one based on the mathematical framework used to formulate and solve the deconvolution problem. In addition, for every method we compile 30 items detailing the expected input and output, data pre-processing, strategies for integrating scRNA-seq data, images and spatial coordinates, and computational and biological validation, with links to documentation. Our application is the first extensive database dedicated to cell-type deconvolution and provides a practical means to describe the cell-type deconvolution landscape for spatial transcriptomics. Cell-type deconvolution is advancing rapidly, providing new insights into the organization of cellular communities from spatial transcriptomics data. The sheer number of methods has made it difficult to choose a method and to identify avenues for improvement. Thanks to the application’s interactive filtering and search bar, method developers and users can search for families of methods with given characteristics. 
On the methodological development side, our app gives a comprehensive overview of the mathematical models already in use and of how such methods are benchmarked in terms of simulations, datasets and commonly used metrics, helping to better target areas for improvement. On the user side, our app helps identify potential methods of interest tailored to one’s project, depending on tissue type, input quality, required output and programming fluency. Furthermore, as cell-type deconvolution method development is an active field of research, we provide a form to keep our tool up to date with recent developments in the field. In conclusion, our app offers a comprehensive and interactive resource for selecting and understanding cell-type deconvolution methods for spatial transcriptomics, enabling efficient method selection tailored to specific research needs. As the field advances, our app supports both developers and users in improving and applying these methods, with ongoing updates to reflect the latest developments.
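One common way to formulate cell-type deconvolution treats each spot's expression profile as a non-negative mixture of reference cell-type profiles. The sketch below is chosen for illustration rather than taken from any of the 67 catalogued methods. It solves the non-negative least-squares problem min ||y - Rw||² subject to w ≥ 0 using multiplicative updates, w_k ← w_k (Rᵀy)_k / (RᵀRw)_k, which are valid only for non-negative expression values, and then rescales the weights into proportions. The function name and toy profiles are hypothetical.

```python
def deconvolve(reference, spot, iterations=500, eps=1e-12):
    """Estimate non-negative cell-type proportions for one spot.

    reference: one expression profile (list of gene values) per cell type.
    spot: observed expression of the same genes at one location.
    All values must be non-negative for the multiplicative updates to stay valid."""
    k = len(reference)
    n_genes = len(spot)
    # R has one column per cell type; precompute R^T y and R^T R.
    rty = [sum(reference[a][g] * spot[g] for g in range(n_genes)) for a in range(k)]
    rtr = [[sum(reference[a][g] * reference[b][g] for g in range(n_genes))
            for b in range(k)] for a in range(k)]
    w = [1.0] * k
    for _ in range(iterations):
        rtrw = [sum(rtr[a][b] * w[b] for b in range(k)) for a in range(k)]
        # Multiplicative update: keeps w non-negative and decreases the squared error.
        w = [w[a] * rty[a] / (rtrw[a] + eps) for a in range(k)]
    total = sum(w)
    # Rescale mixture weights so they sum to 1 (cell-type proportions).
    return [x / total for x in w] if total > 0 else w


if __name__ == "__main__":
    # Two toy cell types over three genes; the spot is a 30/70 mixture of them.
    profiles = [[10.0, 0.0, 2.0], [0.0, 8.0, 2.0]]
    observed = [3.0, 5.6, 2.0]
    print(deconvolve(profiles, observed))
```

On this toy input, the estimate converges to approximately [0.3, 0.7]. Real methods differ mainly in the noise model (for example Poisson or negative-binomial likelihoods instead of squared error) and in how they exploit spatial coordinates and images, which is what the classification above captures.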

17:30-18:30 Session 6C: Poster session

#3 Etienne Bardet, Mariène Wan, Johann Confais and Hadi Quesneville "REPET v4.0: A robust and accessible tool for TE annotation"

#4 Franck Samson and Pierrick Samson "Digital Concord: Cultivating links between Discord and the secrets of the genome."

#5 Enora Geslain, Filip Volckaert and Hugo Gante "Refining eDNA taxonomic assignments with a phylogenetic approach"

#6 Thomas Vitre, Denis Fargette, Paul Bastide, Elisabeth Fichet-Calvet, Dàniel Cadar, Stéphane Guindon and François Chevenet "EvoLaps 3 : Next-Level Phylogeographic Visualization"

#9 Arnab Mutsuddy, Jonah R. Huggins, Aurore K. Amrit, Cemal Erdem, Jon C. Calhoun and Marc R. Birtwistle "Mechanistic Modeling of Cell Viability Assays with in silico Lineage Tracing"

#11 Mouna El Garb, Emmanuel Coquery, Fabien Duchateau and Nicolas Lumineau "A meta-model for representing bioinformatics workflow to improve reproducibility"

#13 Anna Tran, Valentin Wucher, Charles Petitpre, Céline Riou, Anne-Laurie Pinto, Pauline Wajda, Gabriel Chemin, Elise Peter, Bertrand Dubois, Virginie Desestret, Jérôme Honnorat and Bastien Joubert "Single-cell profiling reveals distinct immune profile in anti-Yo paraneoplastic neurological syndrome"

#14 Nicolas Fontrodona, Matéo Bazire, Julien Ladet, Cyril Bourgeois and Franck Mortreux "COSMIQ-4C: quantifying 3D genome interactions of retroviral proviruses at the single-clone level"

#15 Hajar Bouamout, Sandra Dérozier, Mathilde Rumeau, Louise Deléger, Marine Courtin, Claire Nedellec, Robert Bossy, Sylvie Combes, Valentin Loux and Mouhamadou Ba "HoloOligoDB: Exploring Mammalian Milk Oligosaccharides"

#16 Fabien Mareuil, Rachel Torchet, Luis Checa Ruano, Vincent Mallet, Michael Nilges, Guillaume Bouvier and Olivier Sperandio "InDeepNet: a web platform for predicting and validating protein-protein interaction binding sites"

#17 Leonard Brindel and Clémence Frioux "Metage2Metabo-PostAViz: exploring and visualising the wealth of metabolic modelling predictions to compare microbial communities"

#19 Lauryn Trouillot, Rossana Sussarellu and Gregory Carrier "Exploring the challenges of Dinoflagellate transcriptome assembly with long-read data"

#20 Emma Mathieu, Rajesh Durairaj and Aurore Lamy "Computational investigation of chemosensory receptors in salmonid species"

#21 Salomé Brunon, Laurent Jourdren and Sophie Lemoine "Generation of gene annotations including UTRs for Bulk and Single-Cell RNASeq analyses"

#22 Corinne Blugeon, Ali Hamraoui, Laurent Jourdren, Sophie Lemoine, Tiphaine Marvillet, Catherine Senamaud-Beaufort, Stephane Le Crom and Morgane Thomas-Chollier "GenomiqueENS, the IBENS Genomics core facility"

#23 Alice Jegou, Jérémy Lucas, Marianne Dreuillet, François Parcy and Romain Blanc-Mathieu "Building an atlas of transcription factor DNA-binding properties in Arabidopsis thaliana"

#25 Marinna Gaudin, Damien Eveillard and Samuel Chaffron "Beyond species : towards the modeling and prediction of marine plankton ecological associations biogeography"

#26 Chabname Ghassemi Nedjad, Clémence Frioux and Loïc Paulevé "Seed2LP: seed inference in metabolic networks for reverse ecology applications"

#27 Matthias Zytnicki and The Genotoul-Bioinfo Platform "The Genotoul-Bioinfo platform"

#28 Florent Dumont and Pauline Tran "MOAL: Improving the Reproducibility of OMICS Bioanalysis"

#31 Pauline Le Corre, Yann Le Cunff and Anthony Bretaudeau "MLOps best practices for bioinformatics"

#33 Lindsay Goulet, Florian Plaza Oñate, Pauline Barbet, Alexandre Famechon, Benoît Quinquis, Eugeni Belda, Edi Prifti, Emmanuelle Le Chatelier and Guillaume Gautreau "CroCoDeEL: accurate control-free detection of cross-sample contamination in metagenomics data"

#35 Brieuc Quemeneur, Audrey Bihouée, Samuel Chaffron, Claudine Médigue, Hervé Ménager and Alban Gaignard "A multi-modal and temporal antibiotic resistance knowledge graph"

#36 Hugo Lefeuvre, Audrey Bihouée, Bérénice Batut, Samuel Chaffron, Claudine Médigue and Philippe Glaser "ABRomics-analysis: Developing Metagenomic Workflows for National Antibiotic Resistance Surveillance Platform"

#37 Claire Rioualen, Maria Doyle, Alban Gaignard, Vincent Carey and Hervé Ménager "Annotating Bioconductor packages using the EDAM ontology"

#38 Julie Aubert and Christelle Hennequet-Antier "StatOmique: sharing experience on statistical analysis of omics data"

#39 Saswat Mohanty, Francesca Chiaromonte and Kateryna Makova "Evolutionary Dynamics of G-Quadruplexes in Human and Other Great Ape Telomere-to-Telomere Genomes"

#40 Benjamin Loire, Galadriel Brière and Anaïs Baudot "KGATE, an Autoencoder Training Environment for exploring and benchmarking Knowledge Graph Embedding models"

#47 Bérénice Batut, Gildas Le Corguillé and Anthony Bretaudeau "Exploring the Richness of the French Galaxy Ecosystem"

#48 Helena Rasche, Saskia Hiltemann, Bjoern Gruening, Galaxy Training Network and Bérénice Batut "Galaxy Training: A powerful framework for teaching!"

#51 Romane Libouban and Anthony Bretaudeau "Genome Annotation tooling in Galaxy: contributions of the EuroScienceGateway project"

#54 Thomas Biscop, Delphine Sicard and Hugo Devillers "Investigating the habitat range of yeast species"

#55 Mathilde Sola, French Gut Consortium, Patrick Veiga, Clémence Frioux and Magali Berland "Unraveling gut microbiome latent structures: enterosignatures and microbial guilds"

#57 Gadea Cabrejas, Marine Sroussi, Hugo Croizer, Nicolas Salaün, Lara Jerman, Adèle Trottier, Thierno Balde, Sophie Doublier, Antoine Cazelles, Théo Hirsch, Roseline Vibert, Delphine Le Corre, Sophie Mouillet-Richard, Pierre Laurent-Puig, Clarice Groeneveld and Aurélien de Reyniès "fastCNV: Fast and Accurate CNV prediction from scRNA-Seq and Spatial Transcriptomics Data"

#61 Amel Benarbia, Sarah Djebali and Gaëlle Legube "Characterizing the landscape of genomic rearrangements associated to double-strand breaks associated to transcribed loci (TC-DSB)"

#63 Elena Nicollin and Arthur Leblois "Generation Mechanisms and Species-Specific Properties of Beta Oscillations in the Basal Ganglia"

#66 Julien Roziere, Franck Samson, Cécile Guichard, Margot Correa, Sylvie Coursol, Marie-Laure Martin and Véronique Brunaud "Plant-PLMview: a web-tool and in silico method for identifying cis-regulatory elements in gene-proximal regions of plants"

#68 Alyssa Imbert, Mylène Delosière, Justine Bertrand-Michel, Pauline Le-Faouder, Sylvain Emery, Laurence Bernard and Muriel Bonnet "Normalization Strategies in Lipidomic Profiling: Implications for Ewe’s Milk Lipolysis Biomarker Discovery"

#69 Sébastien Theil, Mahendra Mariadassou, Philippe Ruiz, Guillaume Kon Kam King, Matthieu Bouchon, Isabelle Verdier-Metz, Cécile Bord, Annick Bernalier-Donadille, Juliette Bloor, Nicolas Chemidlin Prévost-Bouré, Evelyne Forano, Elisa Michel, Bruno Martin, Céline Delbès and Anne-Laure Abraham "Microbial transfers in dairy systems under changing climate and farming practices"

#71 Olivier Rué, Gabryelle Agoutin, Lucas Auer, Lynn Bekaï, Maria Bernard and Géraldine Pascal "A web platform for taxonomic exploration of metabarcoding databanks"

#72 Victor Grentzinger, Leonor Palmeira, Keith Durkin, Maria Artesi and Vincent Bours "Lifting the veil on Challenging Medically Relevant Filaggrin Gene"

#74 Louis Ollivier, Fanny Pouyet, Sarah Cohen Boulakia and Gilles Fischer "A Robust Computational Framework to Characterize the Genetic Diversity Across 3,570 Strains in S. cerevisiae"

#77 Martin Racoupeau, Alexis Mergez, Christophe Klopp, Fabrice Legeai, Frederic Choulet, Philippe Bardou and Christine Gaspin "Pan1c: A Snakemake Workflow for Chromosome-Level Pangenome Construction and analysis"

#78 Rachel Legendre, Hugo Varet, Deniz Uresin, Claudia Chica and Anastassia Komarova "Viral RNA meets RIG-I: insights from RNA-Seq"

#79 Maëlle Pomiès, Gabryelle Agoutin, Lucas Auer, Géraldine Pascal and Sylvie Combes "Long-read sequencing of the 16S-ITS-23S operon enables maternal microbiota transmission study at the strain-level resolution"

#80 Ahamed Tchatakoura, Marie Buysse, Marie-Laure Setier Rio, Claire Loiseau and Amandine Aviles "Microplastic effects on Mosquito Gene Expression and Microbiota"

#177 Anthony Haidamous, Océane Reichert, David Meyre and Sébastien Hergalant "Comprehensive Identification of Pleiotropic Associations for Chronic Lymphocytic Leukemia"

#178 Yanis Asloudj, Seydina Mouhamed Diouf, Fleur Mougin and Patricia Thebault "The Good, the Bad and the Ugly: methodological and conceptual pitfalls in single-cell data science."

#191 Franciane Nuissier and Jacques Lagnel "OLGA: Local Accession Management Tool for Biological Resources Centres (BRC)"

#202 Matthieu Najm, Marko Baric, Taru Muranen, Ekaterina Gaydukova, Altti Ilari Maarala, Jaana Oikkonen, Federico Bolelli, Vittorio Pipoli, Veli-Matti Isoviita, Johanna Hynninen, Benno Schwikowski and Johann Dreo "Using OntoWeaver to Integrate Heterogeneous Information in the OncodashKB Semantic Knowledge Graph for Finding Personalized Actionable Drugs in Ovarian Cancer"

Location: Main hall

18:30-20:00 Welcome reception

Location: Main hall