JOBIM 2024: JOURNÉES OUVERTES DE BIOLOGIE, INFORMATIQUE ET MATHÉMATIQUES 2024
PROGRAM FOR FRIDAY, JUNE 28TH

09:30-10:30 Session 18A: Systems biology and metabolomics
09:30
Comparative constraint-based modelling of fruit development across species highlights nitrogen metabolism in the growth-defence trade-off
PRESENTER: Chloé Beaumont

ABSTRACT. Genetic diversity within plant species is substantial. Plant development, in turn, is closely linked to primary metabolism and its regulation at several levels, whether through gene expression, enzymatic activities or metabolite abundance. Recent work compared the growth of eight fruit species and showed that, among the major biomass compounds, some are essential for predicting the relative growth rate (RGR) of fruits. Constraint-based metabolic modelling provides a robust framework to predict metabolic fluxes and to study how the plant metabolic network generates specific profiles during fruit development. This approach was applied to the panel of eight fruit species using a metabolic model based on the knowledge of heterotrophic cells, which describes a generic network of primary metabolism. Metabolic fluxes were estimated by constraining the model with a comprehensive set of metabolites and major biomass compounds quantified throughout fruit development. Multivariate analyses revealed a clear common pattern of flux distribution during fruit development, with differences between fast- and slow-growing fruits. Notably, only the latter mobilize the tricarboxylic acid cycle in addition to glycolysis, leading to a higher rate of respiration. Building virtual fruits by combining 12 biomass compounds revealed that the growth-defence trade-off is primarily supported by cell wall synthesis in fast-growing fruits and by total polyphenol accumulation in slow-growing fruits. Recently, transcriptomic data were generated on the same panel of fruits to compare transcriptome dynamics during fruit development. The relationships between total transcript concentration and four characteristics of fruit development (developmental progress, fruit growth, RGR and protein content) were investigated. Finally, these traits were predicted from the transcriptome with good accuracy by a machine learning model, highlighting the key role of protein synthesis.
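
For readers unfamiliar with constraint-based modelling, the flux estimation described above can be pictured with the generic formulation below. This is an illustrative sketch of the standard approach only; the exact objective and constraints used by the authors are not specified in the abstract.

```latex
% Generic constraint-based flux estimation (illustrative only).
% S: stoichiometric matrix, v: flux vector, v_ext: external fluxes fixed to the
% measured accumulation rates of quantified metabolites and biomass compounds.
\begin{aligned}
\min_{v}\ & \lVert v \rVert_2^2
  && \text{(a common parsimony objective)} \\
\text{s.t.}\ & S\,v = 0
  && \text{(quasi-steady-state mass balance)} \\
 & v_{\min} \le v \le v_{\max}
  && \text{(directionality and capacity bounds)} \\
 & v_{\mathrm{ext}} = \hat{v}_{\mathrm{ext}}
  && \text{(measured accumulation/uptake rates)}
\end{aligned}
```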

09:50
Improving Snoussi constraints in the Thomas framework for Gene Networks

ABSTRACT. The modelling of gene networks plays a crucial role in the comprehension and control of gene regulatory networks, and the extraordinarily wide range of applications reinforces the enthusiasm for systems biology. This is a transdisciplinary field in which the cross-fertilization of disciplines aims at providing tools to support the modelling activity. Whatever the modelling framework, the bottleneck of the modelling process remains the identification of parameters. Even in discrete abstract modelling frameworks such as that of R. Thomas, the combinatorics of parameter settings makes the brute-force approach of enumerating all parameterizations unrealistic. In the late 1980s, Snoussi introduced a constraint that makes the discrete model consistent with a continuous one. In this article, we show that this constraint is not sufficient and propose an extension of the Snoussi constraints.
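
As a reminder of the framework, the Snoussi constraint is commonly stated as a monotonicity condition on the logical parameters of the Thomas formalism. The sketch below gives this usual formulation only; the extension proposed in the article is not reproduced here.

```latex
% Usual statement of the Snoussi condition in the Thomas framework.
% K_{v,\omega}: logical parameter of gene v when \omega \subseteq \mathrm{Reg}(v)
% is its current set of resources (effective regulators).
\forall v,\ \forall\, \omega \subseteq \omega' \subseteq \mathrm{Reg}(v):\qquad
K_{v,\omega} \;\le\; K_{v,\omega'}
```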

10:10
Large-scale computational modelling of the M1 and M2 synovial macrophages in rheumatoid arthritis
PRESENTER: Anna Niarakis

ABSTRACT. Macrophages play an essential role in rheumatoid arthritis. Depending on their phenotype (M1 or M2), they can contribute to the initiation or the resolution of inflammation. The M1/M2 ratio in rheumatoid arthritis is higher than in healthy controls. Despite this, no treatment specifically targeting macrophages is currently used in the clinic. Thus, devising strategies to selectively deplete proinflammatory macrophages and promote anti-inflammatory macrophages could be a promising therapeutic approach. State-of-the-art molecular interaction maps of M1 and M2 macrophages in rheumatoid arthritis are available and represent a dense source of knowledge; however, these maps remain limited by their static nature. Discrete dynamic modelling can be employed to study the emergent behaviours of these systems. Nevertheless, handling such large-scale models is challenging: due to their massive size, it is computationally demanding to identify biologically relevant states in a cell- and disease-specific context. In this work, we developed an efficient computational framework that converts molecular interaction maps into Boolean models using the CaSQ tool. Next, we used a newly developed version of the BMA tool, deployed on a high-performance computing cluster, to identify the models' steady states. The identified attractors are then validated using gene expression data sets and prior knowledge. We successfully applied our framework to generate and calibrate the M1 and M2 macrophage Boolean models for rheumatoid arthritis. Using KO simulations, we identified NFkB, JAK1/JAK2, and ERK1/Notch1 as potential targets that could selectively suppress proinflammatory macrophages, and GSK3B as a promising target that could promote anti-inflammatory macrophages in rheumatoid arthritis.
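
To make the two operations mentioned above concrete (finding the fixed points of a Boolean model, then repeating the search after a knock-out), here is a minimal, self-contained Python sketch. It uses a tiny hypothetical three-node network whose node names are borrowed from the abstract only for flavour; it is not the CaSQ-generated macrophage models and does not use the BMA tool.

```python
from itertools import product

# Toy Boolean network (hypothetical three-node example, not the M1/M2 models):
# each node's next value is a function of the current state.
rules = {
    "NFKB":  lambda s: s["TLR"] and not s["GSK3B"],
    "TLR":   lambda s: s["TLR"],          # treated as a constant input
    "GSK3B": lambda s: not s["NFKB"],
}

def step(state):
    """Synchronous update of all nodes."""
    return {node: bool(f(state)) for node, f in rules.items()}

def fixed_points(knockouts=()):
    """Enumerate all states and keep those unchanged by the update (steady states).
    Knocked-out nodes are forced to stay off."""
    nodes = list(rules)
    points = []
    for values in product([False, True], repeat=len(nodes)):
        state = dict(zip(nodes, values))
        if any(state[k] for k in knockouts):
            continue                      # skip states where a KO node is active
        nxt = step(state)
        for k in knockouts:
            nxt[k] = False                # a KO node can never switch on
        if nxt == state:
            points.append(state)
    return points

print("wild type :", fixed_points())
print("GSK3B KO  :", fixed_points(knockouts=("GSK3B",)))
```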

09:30-10:30 Session 18B: Platform and services: best practices
09:30
Misconceptions about Galaxy debunked by the (French) Galaxy Community
PRESENTER: Romane Libouban

ABSTRACT. Background: Open science computing constitutes a vast, interconnected domain, characterized by its complexity and constant evolution. With computational tools and methods continually advancing, individuals often find it challenging to grasp the entirety of this field. Workflow managers such as Galaxy[1], Nextflow[2], Snakemake[3], and numerous others offer a solution by enabling scientists to swiftly construct intricate and reproducible pipelines, alleviating the necessity to comprehend their inner workings entirely. This approach enables researchers to concentrate on high-value data analysis, freeing them from the complexities of building workflows from scratch. Established in 2005, Galaxy offers a workflow manager that is continually maintained and expanded to support analysis across diverse domains. Bolstered by vibrant global communities, Galaxy undergoes continuous refinement, ensuring its adaptability and relevance in various scientific contexts. As a free, open-source platform, Galaxy facilitates data analysis, workflow authoring, training and education, tool publication, infrastructure management, and more. Galaxy is well adopted in the French ecosystem, with 10+ servers and UseGalaxy.fr, the French UseGalaxy server. Despite this wide adoption and ample training resources, Galaxy faces persistent misconceptions, for example that it is exclusively for genomics or challenging to use. To address these misconceptions, the Galaxy Community conducted a user survey.
Results: Through this survey, the Galaxy Community began dispelling common misconceptions. The French Galaxy Community, inspired by these debunked myths, takes this endeavor further by gathering and showcasing Galaxy initiatives within the French ecosystem.
"Galaxy is only useful for genome scientists" - UseGalaxy.fr offers 3k+ tools and hosts several community-specific subdomains beyond genomics (e.g. ecology, workflow4metabolomics).
"Galaxy does not scale to large and complex problems" - UseGalaxy.fr is involved in European efforts such as EuroScienceGateway as well as many national projects. It is, for example, the central piece of ABRomics, the national platform for antimicrobial resistance surveillance: all data on this platform is processed using standardized Galaxy workflows run on UseGalaxy.fr.
"Galaxy offers nothing to technically capable informaticians who can write their own analysis code" - Hosted on the IFB Core Cluster, UseGalaxy.fr offers 8,300 CPU cores, 52 TB of RAM, and GPU cards. Several interactive tools (GxIT), such as Jupyter Notebook, AlphaFold, and Helixer, are available. More importantly, Galaxy offers a platform for versioned, reproducible data analysis that can easily be shared with others, a powerful workflow manager with metadata annotation to encourage FAIR workflows, access to 100+ standardized workflows from two workflow registries (WorkflowHub[4] and Dockstore[5]), and a powerful API with OpenAPI support and a Python package (BioBlend).
"Galaxy is hard to learn and hard to use" - Galaxy is a powerful and complex system that can be overwhelming when arriving at an unfamiliar web interface, with unfamiliar terminology, for the first time. But its interface has been improved, e.g. the workflow editor with workflow annotation. Galaxy also offers 400+ tutorials available via the Galaxy Training Network[6,7]. UseGalaxy.fr additionally offers Training Infrastructure as a Service (TIaaS)[8]: since February 2022, this service has provided 550+ computing days to 70+ events, reaching 1,700+ learners.
"Galaxy is only useful for teaching, not real scientific analyses" - Galaxy is widely used, with 30+ publications per week using it. As mentioned, it is the central piece of projects such as ABRomics and the Vertebrate Genomes Project (VGP)[9].
"Galaxy is neither popular with, nor widely used by, real scientists" - Since 2021, 6k+ users have run 3.6M+ jobs on UseGalaxy.fr.
"Galaxy is free, so there cannot be professional staff offering quick and helpful support" - All support on UseGalaxy.fr is provided as a community effort, using the public Discourse hosted by IFB (https://community.france-bioinformatique.fr/), with the support of a team of 6 IFB members from multiple regional platforms.
"Galaxy is free, so there cannot be useful documentation since nobody is paid to write, maintain, or deliver it" - Galaxy offers documentation for all of its components and provides plenty of tutorials.
"Galaxy is free, so it cannot be good because professional engineers are not being paid to fix bugs and develop enhancements" - UseGalaxy.fr is co-administered by a team of 6 IFB members from multiple regional platforms.
"Galaxy cannot be sustainable in the long run, because no commercial organisation is managing it" - UseGalaxy.fr is more sustainable than a commercially managed service: it is supported by IFB and several supervising institutions with permanent positions.
"Galaxy tools or omics support in Galaxy are dated and not up-to-date" - Tools on UseGalaxy.fr are installed from a standard GitLab repository and auto-updated by a bot (IFBOT) once a week.
Conclusions: Galaxy faces persistent misconceptions, such as its purported inability to scale to complex problems, its perceived lack of value for technically skilled informaticians, and its supposed difficulty of use. The Galaxy community addresses these misconceptions, as well as misconceptions about its utility being limited to teaching, its lack of popularity among scientists, and doubts about its sustainability without commercial backing. The survey's findings and this debunking work shed light on Galaxy's diverse applications and advantages, emphasizing its central role in open science computing.
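
As an illustration of the Galaxy API and BioBlend package mentioned above, here is a minimal Python sketch against UseGalaxy.fr. The API key, the choice of workflow, and the input dataset ID are placeholders to adapt, not values taken from the abstract.

```python
# Minimal BioBlend sketch (illustrative; API key and workflow/input IDs are placeholders).
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://usegalaxy.fr", key="YOUR_API_KEY")

# Create a history to hold results and list the workflows visible to the account.
history = gi.histories.create_history(name="jobim-demo")
workflows = gi.workflows.get_workflows()
print("Available workflows:", [w["name"] for w in workflows])

# Invoke the first workflow on an already-uploaded dataset (hypothetical IDs).
if workflows:
    invocation = gi.workflows.invoke_workflow(
        workflows[0]["id"],
        inputs={"0": {"src": "hda", "id": "YOUR_DATASET_ID"}},
        history_id=history["id"],
    )
    print("Invocation state:", invocation["state"])
```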

09:50
nf-core: a community dedicated to best-practice Nextflow workflows
PRESENTER: Maxime Garcia

ABSTRACT. Long-term use and uptake of workflows by the life-science community requires that the workflows, and their tools, are findable, accessible, interoperable, and reusable (FAIR). To address these challenges, workflow management systems such as Nextflow have become an indispensable part of a computational biologist's toolbox. Nextflow provides the functionality needed to make analyses scalable, reusable, portable and reproducible. Because Nextflow pipelines are easily shared, they opened the possibility of generating and improving analysis workflows across a multitude of collaborators, in order to agree on standards and facilitate the reuse of pipelines beyond their creators. This was the central idea of the nf-core project. The nf-core community has developed a suite of tools that automate pipeline creation, testing, deployment and synchronisation. As the usage of workflow management tools spreads, an increasing number of tertiary tools are tying into the ecosystem. The nf-core analysis pipelines are at the forefront of this, collaborating with initiatives such as bio.tools, the GA4GH-compliant Dockstore and WorkflowHub to facilitate discovery of the pipelines, with plans to work with the BioContainers project to further simplify software packaging and with RO-Crate (Research Object Crate) to handle metadata. The primary portal to nf-core is its website, https://nf-co.re, which lists the available analysis pipelines, all of the shared components, user- and developer-centric documentation and tutorials, as well as usage and contributor statistics. Workflows hosted on nf-core must adhere to a set of strict best-practice guidelines that ensure reproducibility, portability, and scalability. Currently, the nf-core community counts more than 58 released pipelines, 35 pipelines under development, and more than 400 code contributors.

10:10
Codabench, a web platform to organize scientific competitions
PRESENTER: Magali Richard

ABSTRACT. Codabench is an open-source, web-based data challenge platform primarily used by the machine learning community to orchestrate public competitions in the field of data science. Codabench offers the possibility to organize flexible competitions and benchmarks, thus contributing to the development of advanced data analysis methods and promoting the reproducibility of results. In addition, it facilitates hands-on learning and fosters collaboration within the scientific community. Here we provide an overview of the Codabench platform and its functionalities, and highlight its significance in the realms of bioinformatics and computational biology, where fair comparison of algorithms is crucial. Specifically, we present two case studies illustrating how Codabench can effectively contribute to both teaching and scientific research within the bioinformatics community.

09:30-10:30 Session 18C: Knowledge representation, omics and cancer
09:30
Graph representation learning and semantic distribution: application to omic expression data
PRESENTER: Idriss André

ABSTRACT. Recent advances in biological sciences and technologies have produced a considerable amount of highly heterogeneous and non-linearly linked data known as omics data. These data offer a highly granular characterization of biological samples but require, in return, efficient methods to extract the joint information contained in the different data modalities. Among these methods, some exploit the network structure between omics fields, and existing knowledge represented as graphs, to produce models able to learn joint representations of the knowledge contained in graphs. In this study, we explore the possibility of applying Natural Language Processing (NLP)-enhanced graph representation learning to sample representation, in order to discover new biomedical data representations integrating existing knowledge. A vectorial gene representation, integrating the context of the knowledge graph PrimeKG (Precision Medicine Oriented Knowledge Graph), is generated using a random walk that textualizes all encountered nodes and edges. The resulting textual sequence is then embedded with a graph-aware Siamese transformer neural network. These embeddings, combined with gene expression data, were tested on a cancer subtype classification task and showed promising improvements over raw gene expression data, although they do not yet reach state-of-the-art performance on the tested datasets.
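
To illustrate what a textualizing random walk can look like, here is a small, self-contained Python sketch over a toy labelled graph. The node labels and relation names are invented for illustration; the sketch does not reproduce PrimeKG or the authors' Siamese transformer encoder.

```python
import random

# Toy labelled knowledge graph (invented labels, not PrimeKG):
# each entry maps a node to its outgoing (relation, neighbour) pairs.
graph = {
    "TP53": [("associated_with", "breast carcinoma"), ("interacts_with", "MDM2")],
    "MDM2": [("interacts_with", "TP53")],
    "breast carcinoma": [("indication_for", "tamoxifen")],
    "tamoxifen": [("targets", "ESR1")],
    "ESR1": [("associated_with", "breast carcinoma")],
}

def textualized_walk(start, length, seed=0):
    """Random walk of `length` steps that verbalizes every visited node and edge,
    producing the kind of pseudo-sentence later fed to a text encoder."""
    rng = random.Random(seed)
    node, tokens = start, [start]
    for _ in range(length):
        edges = graph.get(node, [])
        if not edges:
            break
        relation, node = rng.choice(edges)
        tokens += [relation.replace("_", " "), node]
    return " ".join(tokens)

print(textualized_walk("TP53", length=4))
# One possible output:
# "TP53 interacts with MDM2 interacts with TP53 associated with breast carcinoma indication for tamoxifen"
```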

09:50
Reproducible integration of high-level information into semantic knowledge graphs with OntoWeaver and BioCypher, with applications in oncology and ecology
PRESENTER: Johann Dreo

ABSTRACT. Large-scale fusion of high-level information and data integration are a pressing need in several scientific domains. Recently, the biomedical community created BioCypher, a tool for building large semantic knowledge graphs (SKGs) in a simple and reproducible way. In this article, we present OntoWeaver, a tool complementary to BioCypher that makes it easy to extract tabular data into SKGs using a simple declarative mapping. The value of OntoWeaver and BioCypher is demonstrated through two different use cases: the integration of a cancer database and the monitoring of invasive species. We believe that OntoWeaver and BioCypher, both free and open-source software, can help several scientific communities working on SKGs and on semantic information fusion problems.

10:10
Characterizing intergenic transcription at RNA polymerase II binding sites in normal and cancer tissues
PRESENTER: Benoit Ballester

ABSTRACT. Intergenic transcription in normal and cancerous tissues is pervasive but incompletely understood. To investigate this, we constructed an atlas of over 180,000 consensus RNA polymerase II (RNAPII)-bound intergenic regions from 900 RNAPII chromatin immunoprecipitation sequencing (ChIP-seq) experiments in normal and cancer samples. Through unsupervised analysis, we identified 51 RNAPII consensus clusters, many of which mapped to specific biotypes and revealed tissue-specific regulatory signatures. We developed a meta-clustering methodology to integrate our RNAPII atlas with active transcription across 28,797 RNA sequencing (RNA-seq) samples from The Cancer Genome Atlas (TCGA), Genotype-Tissue Expression (GTEx), and Encyclopedia of DNA Elements (ENCODE). This analysis revealed strong tissue- and disease-specific interconnections between RNAPII occupancy and transcriptional activity. We demonstrate that intergenic transcription at RNAPII-bound regions is a novel per-cancer and pan-cancer biomarker. This biomarker displays genomic and clinically relevant characteristics, distinguishing cancer subtypes and linking to overall survival. Our results demonstrate the effectiveness of coherent data integration to uncover intergenic transcriptional activity in normal and cancer tissues.
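
As a purely schematic illustration of the kind of unsupervised grouping mentioned above, the Python sketch below clusters a random binary occupancy matrix. The matrix, the number of clusters, and the choice of k-means are placeholders for illustration; they are not the RNAPII atlas or the (unspecified) method used in the study.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical occupancy matrix: rows = intergenic consensus regions,
# columns = ChIP-seq samples, entries = 1 if RNAPII is bound in that sample.
rng = np.random.default_rng(0)
occupancy = rng.integers(0, 2, size=(1000, 40))

# Group regions that share a binding pattern across samples; the study's
# 51 RNAPII consensus clusters were derived from real data with a different
# unsupervised approach, so this is only a schematic stand-in.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(occupancy)
labels = kmeans.labels_

for cluster_id in range(5):
    members = np.where(labels == cluster_id)[0]
    print(f"cluster {cluster_id}: {len(members)} regions")
```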

10:30-11:00 Coffee Break
11:00-12:00 Session 19: Keynote 5: Clémence Frioux

11:00
Exploration of microbial ecosystems: from compositional patterns to metabolic models

ABSTRACT. This talk will present approaches for characterizing microbial communities through the lens of systems biology. These microbial ecosystems bring additional challenges in terms of scale, data integration and interpretation compared to the study of individual populations. Yet, they are of outstanding interest in health, agroecology, or even food systems. I will present several questions we aimed to answer with methodological development ranging from statistical learning to knowledge representation and reasoning or numerical models. Starting from a large dataset depicting the composition of human gut microbiomes at all ages, we identified generic signatures whose assembly accurately depicts the dynamic equilibrium of the ecosystem. Going further than the description of a community's members, we also build mechanistic models in order to decipher their metabolism. For small consortia, we predict the dynamics of the system using constraint-based models and try to improve scalability with statistical learning. For larger communities, we propose a Boolean approximation as a basis for modelling metabolic activity, complementarity and interactions among populations. Overall, our methods try to address several use-cases in the computational study of microbial ecosystems in order to connect the (meta)genome and other data to predictive models.

12:30-14:00 Lunch/boxed lunch