ICSB 2022: THE 21ST INTERNATIONAL CONFERENCE ON SYSTEMS BIOLOGY (ICSB 2022)
PROGRAM FOR MONDAY, OCTOBER 10TH

09:15-10:00 Session K6: KEYNOTE VI: Michael Yaffe

Abstract: Major advances in phosphoproteomic technologies have led to the discovery of over 100,000 sites of protein phosphorylation in the human proteome. Connecting these phosphorylation sites to the protein kinases and signaling networks responsible for their creation and regulation has been the major bottleneck in deciphering how phosphorylation at these sites contributes to specific cellular phenotypes in health and disease. I will present ongoing work from the Cantley, Turk, Linding, and Yaffe laboratories aimed at conquering this ‘dark phosphoproteome’ by determining the motifs for the complete human serine/threonine kinome using solution phase oriented peptide library screening. Computational analysis of large public phosphoproteomic datasets using these motifs identifies distinct signaling pathways and networks that are up- or down-regulated in response to particular stimuli, and provides insights into complex biological responses that are mediated through cross-network signaling. Secondly, I will present ongoing work examining the control of cell senescence in cancer cells following DNA damage, revealing unexpected roles for MAP kinases in controlling the decision between senescence and proliferation in response to genotoxic stress.

Location: Alexander
09:15
Systems Approaches to Deciphering Kinase Signaling and the Phosphoproteome: A Focus on Cancer, Inflammation, and DNA Damage

10:00-10:30 Coffee Break
10:30-12:30 Session 7: PREVENTING FUTURE PANDEMICS

Summary: This session covers pressing questions about the biology and epidemiology of COVID-19, such as understanding disease systems and mechanisms, mapping infection, and distinguishing susceptibility and resilience. Recent systems biology and epidemiological data will be highlighted to discuss actionable paths towards SARS-CoV-2 infection prevention and resilience.

Location: Grenander I+II
10:30
The metabolic and proteomic landscape of genome-scale genetic perturbation

ABSTRACT. Metabolic reactions are vital for keeping cells and organisms growing and alive, and problems with cellular metabolism are implicated in ageing and diseases such as cancer, diabetes and brain disorders. Metabolism in the cell is organised in a genome-spanning network, known as the metabolic network, that connects several hundred enzymes with more than a thousand metabolites. In order to understand metabolism at its scale, novel technologies are required. These need to measure metabolites and proteins with precision, at high throughput, and at costs that facilitate systematic perturbation experiments at large scale. In this lecture, I’ll summarise our efforts in using mass spectrometry, yeast as a simple system, as well as human plasma analytics, for conducting hundreds to thousands of analytical proteome measurements, allowing us to study how these complex metabolic processes are controlled. Technical aspects of the lecture will include a summary of novel mass spectrometric acquisition techniques that centre around high-flow-rate liquid chromatography to measure up to 1,800 samples per week per mass spectrometer; new acquisition schemes (Scanning SWATH, a new DIA-PASEF implementation and Zeno-SWATH); and new software that centres around the DIA-NN suite, developed in my laboratory. I'll further show unpublished results from i) the acquisition of large numbers of plasma proteomes to study human metabolic disease, ii) a study that involved the acquisition of more than 1,000 proteomes of yeast natural isolates to understand dosage compensation in genomic aneuploidies, and iii) the generation of a proteome for each non-essential yeast gene knock-out, to study the function of so far uncharacterized proteins.

10:50
Multi-dimensional immune correlates of early prediction of COVID-19 severe versus asymptomatic disease course

ABSTRACT. Background and objectives: COVID-19 patients present a wide range of severity, from asymptomatic to debilitating symptoms and critical conditions requiring hospitalization. It is still not understood why some patients have a severe or highly symptomatic COVID-19 disease course while others remain asymptomatic. To understand the viral and immune correlates of severity we investigated multiple dimensions of the immune response, including single cell analysis, in asymptomatic versus symptomatic versus hospitalized patients early after SARS-CoV-2 infection. Materials and Methods: Patients (N=104) were recruited early (within seven days post symptom onset) after SARS-CoV-2 infection (wild-type and alpha variants) and four samples were taken within the first month. Serum cytokine levels by ultra-sensitive ELISA (Simoa, Quanterix) and oral/nasal SARS-CoV-2 viral load (RNA and Nucleocapsid-antigen) were measured longitudinally. PBMCs were analyzed by single-cell intracellular staining (ICS) cytometry, single-cell CyTOF and ELISpot, before and after stimulation with Spike and Nucleocapsid peptide pools. Bioinformatics analysis (PCA, UMAP and ML) was performed to cluster the patient groups and identify the cellular immune correlates. Results: A cytokine combination, based on the ratio of inflammatory cytokines to type I interferons, was identified as a highly accurate (>95%) early predictor of both the likelihood of hospitalization and symptom severity. Hospitalized patients have significantly higher levels and longer duration of inflammatory cytokines. Moreover, asymptomatic patients present with significantly lower viral loads, in correlation with higher frequencies and counts of SARS-CoV-2 specific activated CD4 and CD8 T-cells. In particular, asymptomatic patients show a significantly higher frequency of Th1-related cytokine expression and, of note, a higher level of poly-functional CD4 T-cells expressing multiple cytokines.
Interestingly, asymptomatic status is more significantly associated with a potent CD4 response against Nucleocapsid-antigen than against Spike-antigen. Furthermore, inflammatory cytokine (e.g., Interleukin-6) levels do not correlate with counts of activated, or cytokine-expressing, CD4 and/or CD8 T-cells specific for SARS-CoV-2. Rather, higher levels of inflammatory cytokines are correlated with a higher count of classical monocytes. Conclusions: COVID-19 hospitalization and symptom severity can be accurately predicted within one week of symptom onset. This early predictor, using cytokine levels that are feasibly measurable in a point-of-care setting, can guide personalized medicine with anti-viral or cytokine-inhibitor therapy. Furthermore, extensive single cell analysis allowed us to identify the immune correlates behind this predictor. Asymptomatic disease course is characterized by a potent anti-Nucleocapsid CD4 Th1 response, in association with lower viral loads and lower inflammatory cytokine levels. Conversely, more severe patients have a less potent SARS-CoV-2 specific T-cell response and higher viral loads, as well as higher levels of inflammatory cytokines, apparently produced by monocytes rather than by specific T-cells. These results have important clinical relevance for the development of both personalized therapy and vaccines aimed at reducing severity, by indicating the source of the inflammatory cytokine storm.

11:03
New workflow predicts drug targets against SARS-CoV-2 via metabolic changes in infected cells
PRESENTER: Nantia Leonidou

ABSTRACT. COVID-19 is one of the deadliest respiratory diseases, and its emergence caught the pharmaceutical industry off guard. This study presents a novel workflow to predict robust druggable targets against emerging RNA viruses using metabolic networks and information on the viral structure and its genome sequence. For this purpose, we implemented pymCADRE and PREDICATE to create tissue-specific metabolic models, construct viral biomass functions and predict host-based antiviral targets from one or more genome sequences. We observed that pymCADRE reduces the computational time of flux variability analysis for internal optimizations. We applied these tools to bronchial epithelial cells infected with SARS-CoV-2 and identified enzymatic reactions with inhibitory effects. The most promising reported target was the Nucleoside Diphosphate Kinase (NDPK1), for which the literature reports inhibitors. Finally, we computationally tested the robustness of our targets in all known variants of concern, verifying NDPK1’s inhibitory effect. Since our workflow focuses on metabolic fluxes within infected cells, it is applicable for rapid hypothesis-driven identification of potentially exploitable antivirals concerning various viruses and host cell types.

Availability: https://github.com/draeger-lab/pymCADRE/ and DOI: 10.20944/preprints202203.0290.v2

11:16
Temporal Proteomics and Transcriptomics Unravels the Host-Pathogen Interaction Network of Macrophages and Corynebacterium diphtheriae.
PRESENTER: Luca Musella

ABSTRACT. Corynebacterium diphtheriae was the etiological agent of severe diphtheria epidemics in early industrial times, prior to mass immunization against the eponymous secreted toxin. Nevertheless, scientists currently believe that C. diphtheriae is a re-emerging pathogen, as outbreaks, antibiotic resistance and systemic infections caused by non-toxigenic strains have been recently reported. In this study, we investigated the host-pathogen interactions between THP-1 derived macrophages and C. diphtheriae ISS3319, a non-toxigenic strain with superior intracellular survival within the macrophage’s phagolysosome, in comparison to other non-pathogenic bacteria. An ad hoc infection assay was set up, and total RNA and protein contents were collected 4 and 24 hours after bacterial inoculation, along with control groups, and processed via RNAseq and HPLC-MS/MS, respectively. Differential gene expression, integrated with enrichment analyses, genomics data, homologous mapping and network reconstruction, suggests a mechanistic interpretation of the infection process across time, intracellular compartments and metabolic processes, in a systems biology fashion.

11:28
Mathematical model of the immunopathological progression of tuberculosis

ABSTRACT. Tuberculosis (TB) is a worldwide persistent infectious disease caused by bacteria from the Mycobacterium tuberculosis complex. It is one of the top 10 causes of death worldwide and approximately a quarter of the world's population is latently infected. Efficient treatments are difficult to establish as there is insufficient understanding of the molecular mechanisms behind the immunopathological progression of the disease. Using an integrative systems biology approach, we study the immunopathological progression of TB, analysing the key interactions between the cells involved in the infectious process. We integrated multiple in vivo and in vitro datasets from immunohistochemical, serological, molecular biology and cell count assays into a mechanistic mathematical model. Our ODE model captures the regulatory interplay between the phenotypic variation of the key cellular players involved in the disease progression and the inflammatory microenvironment. The model reproduces in vivo time course data of an experimental model of progressive pulmonary TB in mouse, accurately reflecting the functional adaptations of the host-pathogen interactions as the disease progresses through three phases: 1: innate immune response, 2: adaptive immune response and 3: anti-inflammatory response. We used the model to assess the effect of genotypic variations, encoded as changes in parameters, on disease outcomes and consistently found an all-or-nothing response, where the virtual mouse either completely clears the infection or suffers uncontrolled Tb growth. Results show that it is 84% probable that a mouse subjected to a progressive pulmonary TB assay will end up with an uncontrolled infection. The simulations also show how the genotypic variations shape the transitions across phases. All genotypes evaluated eventually progressed to phase 2 of the disease, suggesting that adaptive immune response activation is unavoidable. When stationed in phase 2, the infection was cleared.
The anti-inflammatory conditions that characterize phase 3 have the highest probability of leading to uncontrolled bacterial growth; in contrast, the pro-inflammatory genotype associated with phase 2 has the highest probability of bacterial clearance. 42% of the genotypes evaluated showed a bistable response, with one stable steady state corresponding to infection clearance and the other one to uncontrolled bacterial growth. A more exhaustive analysis of the key model mechanisms was done through a bifurcation analysis, which showed in a quantitative fashion how already clinically-relevant mechanisms like the capacity of bacteria to increment macrophage phagocytic capacity can lead the system to an uncontrolled infection. Together, our analysis suggests that initial conditions in bacterial and macrophage loads coupled with the inflammatory microenvironment play a key role in determining the outcome of the disease. It is a step forward in understanding the mechanisms that shape TB progression.
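
The bistable, all-or-nothing behaviour described above can be illustrated with a deliberately simplified sketch. This is not the authors' model: it is a one-variable toy ODE with a threshold (a strong Allee-type growth law), and every name and parameter value (`simulate_load`, `threshold`, `capacity`, the rate `r`) is a hypothetical choice made only for illustration. The upper steady state stands in for uncontrolled bacterial growth.

```python
def simulate_load(b0, r=1.0, threshold=10.0, capacity=100.0, dt=0.01, steps=5000):
    """Integrate a toy bistable ODE for bacterial load with forward Euler.

    dB/dt = r * B * (B / threshold - 1) * (1 - B / capacity)

    Stable steady states: B = 0 (clearance) and B = capacity (stand-in for
    uncontrolled growth). B = threshold is an unstable separatrix.
    All parameter values are hypothetical.
    """
    b = b0
    for _ in range(steps):
        growth = r * b * (b / threshold - 1.0) * (1.0 - b / capacity)
        b += growth * dt
    return b


if __name__ == "__main__":
    # Same parameters ("genotype"), different initial bacterial loads.
    for initial in (5.0, 20.0):
        print(initial, "->", round(simulate_load(initial), 3))
```

With identical parameters, an initial load below the threshold decays towards clearance while a load above it converges to the upper state, which is the sense in which initial conditions alone can decide the outcome in a bistable system.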

11:40
Proteome-scale mapping of binding sites in the intrinsically disordered regions of human and viral proteomes

ABSTRACT. Protein-protein interactions are central to cell function. Numerous studies have provided large-scale information on human protein-protein interactions. However, many interactions remain to be discovered, and low affinity, conditional and cell type-specific interactions are disproportionately under-represented. I will present our efforts towards finding short linear motif (SLiM)-based interactions. I will describe our optimized proteomic peptide-phage display (ProP-PD) library that tiles all disordered regions of the human proteome and allows the screening of ~1,000,000 overlapping peptides in a single binding assay, and the tools and guidelines we have defined for processing the data (1). Using the approach we identified >2,000 interaction pairs for 35 known SLiM-binding domains and confirmed the quality of the produced data by complementary biophysical or cell-based assays. The amino acid resolution binding site information can be used to pin-point functionally important disease mutations and phosphorylation events in intrinsically disordered regions of the proteome. I will further describe how we developed ProP-PD for large-scale pan-viral discovery of virus-host factor protein interactions (2,3). We screened more than 130 protein domains for binding to viral peptides using a phage peptidome tiling the intrinsically disordered protein regions of 229 RNA viruses and report the pan-viral discovery of 1,712 SLiM-based virus-host interactions. We identified cellular pathways frequently deregulated by viral motif mimicry and translated the high-resolution information on virus-host interactions to a peptide-based antiviral inhibitor of an interaction between the G3BP1/2 proteins and an ΦxFG peptide motif in the SARS-CoV-2 nucleocapsid (N) protein. ProP-PD may thus be used both to illuminate the motif-based part of the interactomes and to uncover leads for innovative inhibitor design.

References: 1. Benz et al. (2022) Proteome-scale mapping of binding sites in the unstructured regions of the human proteome. Mol Syst Biol. 18(1):e10584. 2. Kruse et al. (2021) Large scale discovery of coronavirus-host factor protein interaction motifs reveals SARS-CoV-2 specific mechanisms and vulnerabilities. Nat Commun. 12(1):6761. 3. Mihalic et al. (2022) Large-scale phage-based screening reveals extensive pan-viral mimicry of host short linear motifs. bioRxiv. https://doi.org/10.1101/2022.06.19.496705
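
The idea of tiling a disordered region with overlapping peptides can be sketched as follows. This is a minimal illustration, not the ProP-PD library design pipeline: the function name `tile_peptides`, the peptide length and the step size are assumptions, and the example sequence is an arbitrary string of amino acid letters.

```python
def tile_peptides(region, length=16, step=4):
    """Split a sequence into overlapping peptides of fixed length.

    `length` and `step` are hypothetical defaults. A final peptide is added
    when needed so that the C-terminal residues are always covered.
    """
    if len(region) <= length:
        return [region]
    last_start = len(region) - length
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)
    return [region[s:s + length] for s in starts]


if __name__ == "__main__":
    # Arbitrary 20-residue example sequence, not a real disordered region.
    for peptide in tile_peptides("ACDEFGHIKLMNPQRSTVWY", length=8, step=5):
        print(peptide)
```

Because consecutive peptides overlap, a short motif that straddles one peptide boundary is still fully contained in a neighbouring peptide, provided the overlap is at least as long as the motif.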

11:52
FAIR sharing of reproducible COVID-19 models of epidemic and pandemic forecast

ABSTRACT. A major challenge for the dissemination, replication, and reuse of epidemiological forecasting studies during the COVID-19 pandemic is the lack of clear guidelines and platforms to exchange models in a Findable, Accessible, Interoperable, and Reusable (FAIR) manner, which hinders the reproducibility of research outcomes. At the beginning of the pandemic, models were developed in diverse tools that were not interoperable, were opaque, lacking traceability and semantics, and were scattered across various platforms, making them hard to locate, interpret and reuse. In this work, we demonstrate that implementing the standards developed by the systems biology community to encode and share COVID-19 epidemiological models can serve as a roadmap to implement models as a tool in medical informatics in general. As a proof-of-concept, funded by the EOCS rapid COVID-19 program in 2020, we provided 24 epidemiological models via the BioModels repository for FAIR sharing. These models were encoded using standard formats for model exchange in systems biology, annotated with cross-references to common biomedical data resources, and packaged with relevant associated data in so-called COMBINE archives for easy sharing. Dissemination of the resulting simulation studies through the BioModels repository significantly enhanced the models’ reproducibility and repurposing potential. We recommend the use of open systems biology standards to encode models of epidemic and pandemic forecasts, and we encourage the FAIR sharing of these models through open model repositories. These actions shorten the time for the analysis and forecast of viral transmission and enable us to stay prepared for future pandemics.

12:02
Identification of a SARS-CoV-2 inhibitor by combining binary protein-protein interaction mapping, structure prediction and virtual screening

ABSTRACT. Protein-protein interactions (PPIs) are important targets to expand the druggable proteome beyond traditional protein families such as kinases or GPCRs. Different approaches have been developed to map and characterize PPIs and find potential inhibitors and stabilizers. However, a coherent framework that optimally integrates different methods to prioritize PPI targets and small molecule inhibitors is missing.

In this study, we established a machine-learning algorithm that is able to classify true-positive PPIs from quantitative binary interaction data and show its universal applicability for commonly used assays. We validate the algorithm on an established PPI positive and random reference set (PRS & RRS), and a newly assembled multi-protein complex reference set. The reference set-trained algorithm was then applied to classify 350 SARS-CoV-2 protein interactions, resulting in the identification of 26 high-confidence PPIs between SARS-CoV-2 proteins.

Using AlphaFold, we further predict the structures of 20 of the identified protein complexes and determine their interaction interfaces. Finally, we use the structural information to guide an ultra-large small molecule docking to the interaction interface of the SARS-CoV-2 methyltransferase complex NSP10/NSP16. We thereby identify an NSP10-targeting small molecule, which inhibits its interaction with NSP16, the enzymatic activity of the methyltransferase complex, and the replication of SARS-CoV-2 in a phenotypic assay. By systematically integrating experimental and computational approaches, this pipeline can be used to identify and validate PPI targets and perform early-stage PPI drug discovery.

12:14
A multi-strain model of immunization against influenza A viruses after infection or vaccination
PRESENTER: Lara Bruezière

ABSTRACT. Introduction: Influenza A viruses (IAV) are responsible for worldwide seasonal epidemics of flu disease. They are divided into subtypes based on a combination of viral surface proteins HA and NA. Those proteins act as potent antigens in natural infection and are the main antigen components in vaccines against influenza. The challenge of effective long-term vaccination requires an understanding of the immune response triggered by infection or vaccination, and how this response is impacted by the antigenic evolution of viral surface proteins over the years. We delved into such questions through a mechanistic, knowledge-based modeling approach.

Materials and Methods: We developed a Multi-Strain Influenza Disease Model (MSIDM) describing viral and in-host dynamics including immunization due to infection and vaccination. The model accounts for cell migration, interactions between cells, antigens, antibodies and cytokines, as well as cross-reactivity of immune cells formed during previous immunizations with antigens of a current strain. Cross-reactivity is modeled with three specific populations of immune memory cells coexisting for each individual: those developed during past infections, those developed against the yearly vaccine strain and those developed against the seasonal circulating strain. Strain-specific immunity is then implemented as the result of different interaction strengths of these three immune subpopulations with the antigens, depending on antigenic distance. To assess the model scope we performed exploratory analysis on a virtual population, built and calibrated to account for inter-patient and virus-specific variability.

Results: We succeeded in simulating realistic dynamics in the context of IAV infection using the MSIDM. This offers insights into what impacts clinical outcomes such as duration and severity of symptoms, viral load and epithelial damage in response to an exposure following a vaccination or a primary infection. The MSIDM also describes the long-lasting immunization process starting upon new strain encounter, either by vaccination or infection, in particular predicting hemagglutination inhibition (HI) assay titers. The model reproduces clinical observations made on both H1N1 and H3N2 viral subtypes, such as partial immune escape and lower vaccine efficacy against H3N2. It allows comparing the relative efficacy of different vaccines such as HA recombinant and split vaccines.

Conclusion: First analyses show how the model can help explore how antigenic distance can explain variability in vaccine efficacy from season to season, and how patient immune phenotype can explain variability in vaccine efficacy within a season. With further calibration and validation using in vivo data, the MSIDM could be used to test vaccination strategies taking into account antigenic drift and shift. It could also help in understanding how vaccine efficacy is impacted by viral- and host-related factors such as virus avidity for host cells and immunosenescence.

12:17
Seroepidemiology And Modeling Of SARS-CoV-2 In Ethiopia: Longitudinal Cohort Study Among Front-line Health-care Workers And Community
PRESENTER: Simon Merkt

ABSTRACT. Background: African countries were spared from an overwhelming burden of COVID-19 during the so-called first wave of the pandemic in 2020, but increasingly experienced impacts on health systems during the second and third waves in 2021. Due to limited surveillance information, the true COVID-19 burden in a country such as Ethiopia remains unknown. We aimed to investigate the seroepidemiology of SARS-CoV-2 among frontline healthcare workers (HCW) and communities in Ethiopia. Methods: We conducted a population-based, longitudinal cohort study involving HCW, urban residents, and rural communities in Jimma and Addis Ababa. Serology was performed in three consecutive rounds to obtain seroprevalence and incidence estimates within the cohorts. Moreover, we constructed SEIR models for the progression of the SARS-CoV-2 epidemic in Ethiopia and used Bayesian approaches for their calibration. Results: SARS-CoV-2 seroprevalence among HCW increased dramatically during the study period. This corresponded with national COVID-19 disease data. The models predicted a saturation level of 50%-70% for wild type virus, which was confirmed by third round data. However, assuming the introduction of variant strains and re-infections, saturation levels of 80%-90% were estimated. Conclusion: SARS-CoV-2 spread in Ethiopia has been highly dynamic among HCW and urban communities. It can be speculated that the greatest wave of SARS-CoV-2 infections is currently evolving in rural Ethiopia and thus requires attention with respect to healthcare burden and disease prevention. These findings should also greatly impact COVID-19 vaccine strategies in African countries, as for most individuals this will represent booster immunization after prior SARS-CoV-2 exposure. Likely efficient one-shot administrations in combination with seroprevalence assessment might be cost-effective, and especially applicable in the context of limited vaccine availability.
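
As a rough illustration of how an SEIR model yields a saturation level, the sketch below integrates a textbook deterministic SEIR system with forward Euler and reports the final recovered fraction. It is not the calibrated model from the study: the function name `simulate_seir` and all rate values are hypothetical, and no Bayesian calibration is performed.

```python
def simulate_seir(beta, sigma, gamma, i0=1e-4, dt=0.1, steps=10000):
    """Forward-Euler integration of a textbook SEIR model in population fractions.

    beta: transmission rate, sigma: rate of leaving the exposed class,
    gamma: recovery rate (all per day, all hypothetical values).
    Returns the final (S, E, I, R) fractions.
    """
    s, e, i, r = 1.0 - i0, 0.0, i0, 0.0
    for _ in range(steps):
        new_exposed = beta * s * i * dt
        new_infectious = sigma * e * dt
        new_recovered = gamma * i * dt
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
    return s, e, i, r


if __name__ == "__main__":
    # Basic reproduction number beta / gamma = 2 in this hypothetical setting.
    s, e, i, r = simulate_seir(beta=0.5, sigma=0.2, gamma=0.25)
    print(f"final recovered fraction: {r:.3f}")
```

In this simple setting the final recovered fraction depends on the ratio beta / gamma, so a more transmissible variant raises the level at which the epidemic saturates.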

12:20
COVIDpro: Database for mining protein dysregulation in patients with COVID-19
PRESENTER: Augustin Luna

ABSTRACT. Background: The ongoing pandemic of the coronavirus disease 2019 (COVID-19) caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) still has insufficient treatments. This is partially due to our incomplete understanding of the molecular dysregulations of the infected patients. We aimed to generate a repository and data analysis tools to examine the modulated proteins underlying COVID-19 patients for the discovery of potential therapeutic targets and diagnostic biomarkers.

Methods: We built a web server containing proteomic expression data from COVID-19 patients and equipped it with a user-friendly data analysis and visualization toolset. Specifically, we manually curated the proteomics data of COVID-19 patients published before May 2022. Relevant proteomic data were collected by manual curation of all proteomic data deposited on ProteomeXchange and of data found via PubMed search to produce a comprehensive dataset. Protein expression by disease subgroups across projects was compared. We also visualized differentially changed pathways and proteins. Moreover, circulating proteins that differentiated severe cases were identified as putative predictive biomarkers.

Findings: We report a web server, COVIDpro (https://www.guomics.com/covidPro/), containing the proteomics data generated by 41 original studies from 32 hospitals worldwide, involving 3077 patients and covering 19 types of clinical specimens, the majority from plasma and sera. 53 protein expression matrices were collected, reporting a total of 5434 samples and 14,403 unique proteins. Our analyses showed that the lipopolysaccharide-binding protein, which was identified by the majority of the studies, was highly expressed in the blood samples of severe patients. A panel of significantly dysregulated proteins was identified to separate patients with severe disease from non-severe disease.

12:23
Control of COVID-19 Outbreaks under Stochastic Community Dynamics, Bimodality, or Limited Vaccination

ABSTRACT. The COVID-19 pandemic shows that controlling COVID-19 outbreaks remains challenging even in countries with high vaccination levels. Mathematical models are required to identify limits of control for, and effective measures against, future outbreaks, not least because interactions between humans, as well as virus transmission itself, are stochastic rather than deterministic processes, which affects conclusions drawn only from empirical data on COVID-19 spreading. By building an open-source geospatially referenced, demographic, agent-based model (GERDA), we could repeatedly simulate COVID-19 outbreaks in detailed communities. Based on these SIR+ simulations, we showed that COVID-19 outbreak dynamics are community-specific and depend on heterogeneity and stochasticity of human-human interactions. When comparing different vaccination strategies, we found that the herd immunity threshold depends strongly on the applied vaccination strategy. Further, if vaccine supply is limited, different vaccination strategies are optimal for reducing fatalities or for confining an outbreak. Prioritizing highly interactive people diminishes the risk of an infection wave, while prioritizing the elderly minimizes fatalities.

The inherent stochasticity of virus spreading can, on the one hand, also lead to bimodal outcomes, which renders the effect of limited non-pharmaceutical interventions in these scenarios uncertain. On the other hand, the stochasticity reduces the suitability of the reproduction number R0 as a predictor for the behavior of the system or the infectiousness of the virus in low-incidence scenarios.
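
The bimodality mentioned above can be reproduced in a far simpler setting than GERDA. The sketch below is an assumption-laden stand-in, not the agent-based model from the abstract: it simulates the event sequence of a well-mixed stochastic SIR model started from a single infected person and records the final outbreak size. The names `final_size` and `outbreak_sizes`, the community size and the value of R0 are all hypothetical.

```python
import random


def final_size(n, r0, rng):
    """Simulate one well-mixed stochastic SIR outbreak; return total ever infected.

    Only the order of events matters for the final size, so the next event is
    drawn directly: infection with probability proportional to r0 * S * I / n,
    recovery with probability proportional to I (recovery rate set to 1).
    """
    s, i = n - 1, 1
    while i > 0:
        infection_rate = r0 * s * i / n
        recovery_rate = float(i)
        if rng.random() < infection_rate / (infection_rate + recovery_rate):
            s -= 1
            i += 1
        else:
            i -= 1
    return n - s


def outbreak_sizes(runs=500, n=200, r0=2.0, seed=1):
    """Repeat the simulation with a seeded generator for reproducibility."""
    rng = random.Random(seed)
    return [final_size(n, r0, rng) for _ in range(runs)]


if __name__ == "__main__":
    sizes = outbreak_sizes()
    small = sum(size < 20 for size in sizes) / len(sizes)
    large = sum(size >= 80 for size in sizes) / len(sizes)
    print(f"minor outbreaks: {small:.2f}, major outbreaks: {large:.2f}")
```

Even with R0 fixed well above 1, a substantial share of runs die out after a handful of cases while the rest infect most of the community, with few runs in between. This is why a single value of R0 is a weak predictor of what happens in any one low-incidence scenario.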

12:26
Non-parametric model-based estimation of the effective reproduction number for SARS-CoV-2
PRESENTER: Jacques Hermes

ABSTRACT. Viral outbreaks, such as the current COVID-19 pandemic, are commonly described by compartmental models by means of ordinary differential equation (ODE) systems. The parameter values of these ODE models are typically unknown and need to be estimated based on accessible data. In order to describe realistic pandemic scenarios with strongly varying conditions, these model parameters need to be assumed as time-dependent. While parameter estimation for the typical case of time-constant parameters does not pose significant issues, the determination of time-dependent parameters, e.g. the transition rates of compartmental models, remains notoriously difficult, in particular since the function class of these time-dependent parameters is unknown. In this work, we present a novel method which utilizes the Augmented Kalman Smoother in combination with an Expectation-Maximization algorithm to simultaneously estimate all time-dependent parameters in an SIRD compartmental model. This approach only requires incidence data, but no prior knowledge of model parameters or any further assumptions on the function class of the time-dependencies. In contrast to other approaches for the estimation of the time-dependent reproduction number, no assumptions on the parameterization of the serial interval distribution are required. With this method, we are able to adequately describe COVID-19 data in Germany and to give non-parametric model-based time course estimates for the effective reproduction number. This approach can also be applied in a cell biological context where time-dependent parameters or unknown stimuli need to be estimated.
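
For orientation, a textbook SIRD system with time-dependent rates and the corresponding effective reproduction number can be written as below. The abstract does not give the authors' exact parameterization, so the symbols (infection rate β(t), recovery rate γ(t), death rate μ(t), population size N) are assumptions chosen for illustration.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta(t)\,\frac{S\,I}{N}, \\
\frac{dI}{dt} &= \beta(t)\,\frac{S\,I}{N} - \gamma(t)\,I - \mu(t)\,I, \\
\frac{dR}{dt} &= \gamma(t)\,I, \qquad \frac{dD}{dt} = \mu(t)\,I, \\
R_{\mathrm{eff}}(t) &= \frac{\beta(t)}{\gamma(t) + \mu(t)} \cdot \frac{S(t)}{N}.
\end{aligned}
```

In this form the infected compartment grows exactly when R_eff(t) > 1, so time courses of β(t), γ(t) and μ(t) estimated from data translate directly into a time course for the effective reproduction number.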

12:27
Federated mathematical modelling and machine learning for a multi-national cohort of COVID-19 patients
PRESENTER: Manuel Huth

ABSTRACT. The COVID-19 pandemic has highlighted the need for disease modelling based on mathematical and machine learning techniques in order to understand the impact of the SARS-CoV-2 crisis. ORCHESTRA is an international research project which aims to deliver scientific evidence for analyses by using datasets supplied by cohorts from various European and non-European countries. The combined information provided by these partners allows for more robust evidence-based disease modelling to take place, given the larger sample size and variation of COVID-19 measures in different regions. However, the sharing of data is a prevalent issue within healthcare in general, as different partners are subject to different data privacy restrictions due to patient concerns and governing legal entities.

To address this problem, we use federated learning, a machine learning technique that trains an algorithm across each cohort’s data storage center by sending non-disclosive aggregated information around, thereby achieving the same results as pooled analyses while respecting the privacy restrictions of each cohort member. To provide data-driven analyses that measure the impact of various factors related to the COVID-19 pandemic, we report on federated algorithms that are central to statistical impact evaluation and well-known from the Econometrics literature. The implemented methods (supported through the RDataShield package) include the non-parametric Difference-in-Differences with multiple time periods, instrumental variable regression allowing for consistent estimates under endogeneity problems, and propensity score matching methods.

The establishment of this federated analysis framework not only improves the prevention and treatment of COVID-19, but also provides better preparation for future pandemics.
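
The idea that aggregated, non-disclosive summaries can reproduce a pooled analysis can be illustrated with a small sketch. The example below is not the ORCHESTRA implementation and does not use any DataSHIELD API. The function names and the two toy cohorts are hypothetical, and a simple one-covariate least-squares fit stands in for the econometric estimators named in the abstract. Each site shares only five sums, and the coordinator combines them to obtain the same intercept and slope as a fit on the pooled records.

```python
def local_summary(xs, ys):
    """Computed inside each cohort: only these aggregates leave the site."""
    return {
        "n": float(len(xs)),
        "sx": sum(xs),
        "sy": sum(ys),
        "sxx": sum(x * x for x in xs),
        "sxy": sum(x * y for x, y in zip(xs, ys)),
    }


def combine(summaries):
    """Coordinator side: add up the per-site aggregates."""
    total = {key: 0.0 for key in ("n", "sx", "sy", "sxx", "sxy")}
    for summary in summaries:
        for key in total:
            total[key] += summary[key]
    return total


def ols_from_summary(total):
    """Closed-form least-squares fit y = intercept + slope * x from the sums."""
    n = total["n"]
    slope = (n * total["sxy"] - total["sx"] * total["sy"]) / (
        n * total["sxx"] - total["sx"] ** 2
    )
    intercept = (total["sy"] - slope * total["sx"]) / n
    return intercept, slope


if __name__ == "__main__":
    # Two hypothetical cohorts whose records never leave their site.
    site_a = local_summary([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
    site_b = local_summary([3.0, 4.0], [7.0, 9.0])
    print(ols_from_summary(combine([site_a, site_b])))
```

A production framework would additionally enforce disclosure controls, for example refusing to return summaries for very small cohorts; that step is omitted here.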

12:28
INSIDe: Integrative modeling of the spread of serious infectious diseases

ABSTRACT. The modeling of the spread of SARS-CoV-2, and in particular of local outbreaks, has been crucial for analyzing the pandemic and guiding policy. Building surveillance models and pipelines with reliable forecasts is crucial for the prevention of future infectious disease outbreaks. Yet, the insights and forecasts provided by existing models are necessarily limited by their resolution and the data used for inference. Models which are based solely on a single data source, like reported numbers of new infections, might have biased forecasts. Case reports might not be representative, whereas serological studies are costly and might have poor time-resolution. Wastewater monitoring, however, has proven to serve as an early indicator for the rise of reported infections and hospitalizations. The current challenge is the integration of different data sources and the interconnection of their respective modeling and simulation frameworks. We will address this with a modular, open-source platform allowing for: (i) the assembly and simulation of complex models consisting of multiple submodels, (ii) the data-driven inference of unknown model parameters (e.g. the effect of non-pharmaceutical interventions, NPIs) and the design of observation (testing) strategies. The INSIDe platform will combine three state-of-the-art software frameworks: ++SYSTEMS for the fine-grained simulation of flow patterns in wastewater systems, MEmilio for the simulation of the spatio-temporal spread of infectious diseases and pyABC for data-driven modeling of multi-scale processes. Combining these frameworks, we can facilitate the integration of different information. Integrative modeling will improve the assessment of the current state of epi-/pandemics, achieve more robust and reliable predictions, reduce uncertainty and thus allow decision makers to employ more precise NPIs to prevent outbreaks of a disease.

10:30-12:30 Session 8: YOUNG SCIENTIST FORUM

Summary: Artificial intelligence (AI), with its ability to propose decisions for world-wide threats, generate realistic videos and imitate human thought processes, is rapidly making its way into many domains of science and everyday life. At the same time, certain spectacular failings of this technology require an intervention by well contextualised human knowledge. In systems biology, AI is now complementing classical modeling approaches (ODE and Boolean models, for example) and is part of new hybrid modeling approaches. Three panel members, representing three modeling approaches, will discuss their benefits and drawbacks in the context of systems biology, a field with its own specific issues in data acquisition and knowledge representation. We open the discussion to topics of current limitations and possible future goals in the field, as well as the role of systems biology in the broader social context. We invite the audience to challenge the panel members with questions they would like to see addressed by the community.

10:30
Introduction
10:35
Pitching for Classical modeling
10:45
Pitching for AI / ML modeling
10:55
Pitching for hybrid/data-driven modeling
11:05
Panel Discussion
12:05
Q&A
12:25
Final remarks
10:30-12:30 Session 9: SIGNAL TRANSDUCTION

Summary: Responsiveness to signals influencing cell phenotypic behaviors is a key characteristic of living organisms. Processing of encoded information through cellular signal transduction networks triggers the expression of genes and modulation of protein activities in metabolism and cytoskeleton, culminating in either "all-or-nothing" or "graded" cellular decisions. These decisions are affected by the heterogeneity in protein expression and cell size and, in multicellular organisms, link cellular processes to tissue effects. This session covers advances in technologies such as proteomics and live cell imaging that allow monitoring of dynamic behavior at high temporal resolution and quantitative accuracy. Likewise, advances in mathematical modelling approaches potentially linking the cell population scale to the tissue or single cell level will be covered. Synergies of AI-based modeling and mechanistic modelling provide novel avenues to unravel the principal mechanisms that determine dynamic behavior at multiple scales.

Location: Alexander
10:30
Combining multi-omics and biological knowledge to understand signaling deregulation in disease

ABSTRACT. Multi-omics technologies, and in particular those with single-cell and spatial resolution, provide unique opportunities to study the deregulation of intra- and inter-cellular signaling processes in disease. I will present recent methods and applications from our group toward this aim, focusing on computational approaches that combine data with biological knowledge within statistical and machine learning methods. This combination allows us to increase both the statistical power of our analyses and the mechanistic interpretability of the results. I will also discuss the value of performing perturbation studies, combined with mathematical modeling, to increase our understanding of signaling deregulation and find therapeutic opportunities.

10:50
Cross-inhibitory feedbacks of nucleocytoplasmic transport give rise to divergent multicellular phenotype in ERBB2-activated breast epithelia
PRESENTER: Bishal Paudel

ABSTRACT. A major challenge in biology is to understand how similar cells exhibit alternative phenotypes in uniform environments. Mixed transcriptional states can be engineered in bacterial cells by synthetic, mutually repressive genetic circuits (Gardner et al. Nature [2000]), but similar examples in eukaryotic cell biology are rare. In 3D cultures of breast epithelia, synthetically activated ERBB2 dimers induce hyperproliferative, disorganized spheroids in one third of the culture (Muthuswamy et al. Nature Cell Biology [2001]). A mechanism that drives incomplete penetrance in this divergent multicellular fate has remained elusive. To systematically identify mechanisms, we randomly profiled 10-cell transcriptomes of single 3D spheroids after ERBB activation and frequency-matched induced transcripts to the steady-state proportions of the phenotype. The analysis uncovered a network of nucleocytoplasmic transport regulators, which when perturbed altered the proportion of outgrowths by synergizing with (CSE1L, NPIPB11) or antagonizing (NUP37, KPNB1) the ERBB-induced phenotype. To reconcile these results, we constructed an integrated systems model of nucleocytoplasmic transport, and linked transcript abundances to the nuclear accumulation of different cargo types. The model predicted that an elevated exportin (CSE1L) inhibits classical nuclear-localizing cargoes from accumulating in the nucleus. By proximity-labeling mass spectrometry and follow-up methods, we showed that the ERBB receptors interact with CSE1L and importins, re-localizing to the nucleus by the classical transport pathway. Blocking receptor internalization from the plasma membrane significantly elevated ERBB-induced outgrowth frequency, suggesting that receptors translocated to the nucleus dampen the emergence of disorganized spheroids. Using ChIP-Seq, we found that ERBB1 binds to the promoter of MIR205, a microRNA that negatively regulates the expression of importins, especially KPNA1.
These results indicate that ERBB receptors translocate to the nucleus and attenuate outgrowth frequency by MIR205-mediated inhibition of importins, while exportin (CSE1L) induction relieves this inhibition by reducing ERBB nuclear accumulation. Incorporating these feedbacks into the mathematical model causes the steady-state accumulation of ERBB-like cargo to become ultrasensitive to CSE1L abundance. Interestingly, human ERBB2 has a weaker nuclear localization signal than rat Erbb2; dampening the nuclear localization signal attenuates CSE1L sensitivity and significantly increases phenotype penetrance. Taken together, these results show that nucleocytoplasmic shuttling of receptors in ERBB-active breast epithelia generates tunable cross-inhibitory feedback through nucleocytoplasmic transport that results in long-term, heterogeneous multicellular fates.

11:05
Cell-to-cell variability in JAK2/STAT5 pathway components and cytoplasmic volumes defines survival threshold in erythroid progenitor cells
PRESENTER: Marcel Schilling

ABSTRACT. Erythropoietin (Epo) induces survival in colony-forming unit-erythroid (CFU-E) cells by activation of the Janus kinase 2 (JAK2)/signal transducer and activator of transcription 5 (STAT5) signal transduction pathway. While survival or apoptosis is a binary decision in individual cells, it was observed that with increasing doses of Epo, the number of surviving cells increases in a graded manner at the cell population level. To identify components of the JAK2/STAT5 signal transduction pathway that contribute to the graded population response, we combined dynamic pathway modeling at the population level with mixed-effect modeling at the single cell level. We calibrated a cell-population-level model with heterogeneous data derived from quantitative immunoblotting, mass spectrometry and qRT-PCR. We extended this mathematical model with flow cytometry data to study the behavior in single cells. By performing model selection, we revealed that the high cell-to-cell variability in nuclear phosphorylated STAT5 is caused by variability in the amount of Epo receptor (EpoR) in complex with JAK2 and of the phosphatase SHP1, as well as the extent of nuclear import. The variability of the nuclear import of STAT5 was predicted by the model to be caused by a large variance in the cytoplasmic volumes of CFU-E cells, which we validated experimentally. Further, the single-cell model predicted that 24-118 pSTAT5 molecules in the nucleus for 120 min are sufficient to ensure cell survival. Thus, variability in membrane-associated processes is sufficient to convert a switch-like behavior at the single-cell level to a graded population-level response.

11:20
Mechanisms of Pro- and Anti-Oncogenic Signaling Downstream of Dual Tyrosine Protein Kinases

ABSTRACT. Bio/Abstract: Pau Creixell studied Human Biology at the Universitat Pompeu Fabra, Barcelona, before moving to Oxford, London and the Technical University of Denmark where he obtained his PhD in Computational Biology. He then received a postdoctoral fellowship from the Helen Hay Whitney Foundation to pursue his postdoctoral studies at the Koch Institute for Integrative Cancer Research at MIT. After securing a K99/R00 Pathway to Independence award from NIH/NCI, he established his lab at the University of Cambridge – CRUK Cambridge Institute in 2020. His laboratory integrates machine learning and high-throughput biochemistry to study how proteins selectively recognize their substrates, how this process is perturbed in cancer and how it can be hijacked to find highly selective and mutant-specific drugs to overcome drug resistance. www.creixell-lab.com

In this talk, I will discuss how, while other protein domains have evolved to recognize phosphotyrosines, it is becoming clear that tyrosine kinases can themselves recognize phosphotyrosine within their active sites. How phosphotyrosine recognition by tyrosine kinases is molecularly regulated, and how its deregulation in cancer cells may affect signal transduction, is unknown. In the work I will focus on, we resolve two distinct molecular determinants driving the selective recognition of phosphotyrosine within the active sites of tyrosine kinases. By perturbing a phosphotyrosine-driven, non-canonical downstream substrate, p27Cip1/Kip1, we show that phosphotyrosine recognition is required for cell cycle progression. Moreover, by in vivo screening of cancer somatic mutations perturbing these molecular determinants of phosphotyrosine recognition, we show an anti-oncogenic signaling arm downstream of the pro-oncogenic Bcr-Abl kinase. Our results not only suggest that tyrosine kinases have evolved to simultaneously recognize and catalyze the phosphorylation of tyrosine residues, but also that they can have distinct sets of canonical and non-canonical substrates harbouring opposite roles in tumor progression.

11:34
A dynamic HIF1α-PPARγ circuit controls a paradoxical adipocyte regulatory landscape
PRESENTER: Michael Zhao

ABSTRACT. Hypoxia-induced upregulation of HIF1α triggers adipose tissue dysfunction and insulin resistance in obese patients. HIF1α closely interacts with PPARγ, the master regulator of adipocyte differentiation and lipid accumulation, but there are conflicting results on how this co-regulation controls the excessive lipid accumulation that drives adipocyte dysfunction. Using single-cell imaging and modeling, we find that, surprisingly, HIF1α both promotes and represses lipid accumulation during adipogenesis. We show that the opposing roles of HIF1α are isolated from each other and depend on when HIF1α increases relative to the positive-feedback mediated upregulation of PPARγ that drives adipocyte differentiation. A theoretical model incorporating our findings resolves conflicting prior results and suggests that three network nodes before and after the isolation step have to be synergistically targeted in therapeutic strategies to revert hypoxia-mediated adipose tissue dysfunction in obesity.

11:47
A Mechanistic model of EGFR and ERK signaling reveals how allostery and rewiring contribute to drug resistance
PRESENTER: Fabian Fröhlich

ABSTRACT. BRAFV600E is prototypical of oncogenic mutations that can be targeted therapeutically, and treatment of BRAF-mutant melanomas with RAF and MEK inhibitors results in rapid tumor regression. However, drug-induced rewiring causes BRAFV600E melanoma cells to rapidly acquire a drug-adapted state. In patients this is thought to promote acquisition or selection for resistance mutations and disease recurrence. In this paper we use an energy-based implementation of ordinary differential equations in combination with proteomic, transcriptomic and imaging data from melanoma cells, to model the mechanisms responsible for adaptive rewiring. We describe two parallel MAPK (RAF-MEK-ERK kinase) reaction channels in BRAFV600E melanoma cells that are differentially sensitive to RAF and MEK inhibitors. This arises from differences in protein oligomerization and allosteric regulation induced by drug binding. As a result, the RAS-regulated MAPK channel can be active under conditions in which the BRAFV600E-driven channel is fully inhibited. Causal tracing demonstrates that this provides a sufficient quantitative explanation for initial and acquired responses to multiple different RAF and MEK inhibitors individually and in combination.

11:59
Targeted Proteomics-Driven Computational Modeling of the Mouse Macrophage Toll-like Receptor Signaling Pathway
PRESENTER: Nathan Manes

ABSTRACT. Introduction The Toll-like receptor (TLR) signaling pathway is crucial for the initiation of effective immune responses. Subtle variations in the concentration, timing, and molecular structure of the stimuli (e.g., lipopolysaccharide (LPS)) are known to affect TLR signaling and the resulting pathway dynamics. Tight regulation is essential to avoid acute tissue damage and chronic inflammation. Computational modeling can test mechanistic hypotheses about how regulation is achieved and why it sometimes fails, causing pathologies (e.g., sepsis). In this investigation, mass spectrometry (LC-MS) is being used to enable TLR4 pathway modeling.

Methods Mouse (C57BL/6J) bone marrow-derived macrophages were either unstimulated or stimulated with 10 nM LPS for 30 min. A dilution series of stable-isotope labeled internal (phospho)peptide standards was spiked into cell lysates (136 unmodified peptides and 29 phosphopeptides for 54 (phospho)proteins). LC-MS was performed using a Q Exactive HF, and the resulting data were analyzed using Skyline. AlphaFold-Multimer was used to predict protein complex structures, Rosetta was used to idealize and relax the structures, and Simulation of Diffusional Association (SDA) and TransComp were used to estimate protein-protein association rates. Simmune was used to perform rule-based pathway modeling, simulation, and training.

Preliminary Results Most of the TLR4 pathway was quantified successfully. The protein abundances ranged from 1,332 to 227,000,000 copies per cell. They moderately correlated with transcript abundance values (r = 0.699, p = 1.37e-17), and these data were used to make proteome-wide abundance estimates. Abundance increases and decreases in response to LPS were observed for proteins known to be affected by TLR pathway activation. For example, LPS stimulation caused the abundance of phosphorylated ERK1 to increase from 30,000 to 250,000 copies per cell. Hundreds of protein-protein association rates were estimated. The (phospho)protein absolute abundance values and protein-protein association rates are being used as parameters for TLR4 pathway models. The parameter space is being explored to identify parameter sets that accurately reproduce experimental data.

Conclusions Experimental and computational techniques are being integrated to generate a strongly data driven model of the TLR pathway. This work was supported by the Intramural Research Program of NIAID, NIH.

12:11
Mechanistic insights into sensitization and desensitization of the Interferon α signal transduction pathway

ABSTRACT. As a key component of the innate immune system, Interferon alpha (IFNα) orchestrates the antiviral response in hepatocytes. The IFNα signal transduction pathway is known to desensitize upon activation, which constitutes a major problem for the use of IFNα as a treatment against chronic viral infections or as an anti-tumor drug. However, the mechanisms that lead to this desensitization remain poorly understood.

Here, an ODE model is presented that describes the biochemical reaction network of IFNα signaling in different hepatoma cell lines as well as primary human hepatocytes (PHH). The calibrated model shows that besides a dose-dependent desensitization mediated by the negative feedback components SOCS1 and USP18 that act at the receptor level, the signaling pathway can also show (hyper-)sensitization in consequence of an upregulation of the intra-cellular components IRF9 and STAT2.

The model predicted the dose-dependent dynamics of transcriptionally active complexes in the signaling pathway and their effect on mRNA production, as shown by independent validation experiments. Furthermore, the model-based analysis of measurement data from PHH revealed that each cell system establishes a particular dose-dependent sensitization behavior whose shape is strongly determined by the abundance of the feedback components USP18 and STAT2.

Our findings will help to understand the dynamics of production of Interferon Stimulated Genes (ISGs) which exert numerous antiviral effector functions, and serve as a basis for a patient-individual optimization of the antiviral response upon IFNα stimulation.

12:16
Information Processing by Homo-Oligomeric Proteins: From First Principles to Cardiac Disease

ABSTRACT. Reversible protein homo-oligomerisation, i.e. the formation of larger protein complexes out of identical subunits, is observed for 30-50% of all vertebrate proteins. Despite being a ubiquitous phenomenon, the specific function of protein homo-oligomerisation remains poorly understood. I previously demonstrated theoretically that homo-oligomerisation could be a versatile mechanism for a range of signal processing capabilities such as dynamic signal encoding, homeostasis and bistability via pseudo-multisite modification. In this talk I will present the first dynamical systems model of phospholamban (PLN), a crucial mediator protein of the physiological "fight-or-flight" response triggered by β-adrenergic signaling and a key regulator of calcium cycling in heart muscle cells. Importantly, PLN forms homo-pentamers whose function remained elusive for decades. Simulations and model analyses demonstrate that pentamers enable bistable phosphorylation and further constitute substrate competition based low-pass filters for phosphorylation of monomeric PLN. Both predictions of the model were confirmed experimentally by demonstrating substrate competition in vitro and by demonstrating hysteresis of pentamer phosphorylation in cardiomyocytes. These non-linear phenomena may ensure consistent monomer phosphorylation and calcium cycling despite noisy signaling activity in the upstream network and may be impaired by perturbations (e.g. via genetic mutations or in the context of underlying heart disease) which cause cardiac arrhythmias. These studies show that homo-oligomerisation can play unanticipated and potentially disease relevant roles in biochemical signaling networks.

12:19
Periodic Forcing of the ERK pathway
PRESENTER: Nguyen Tran

ABSTRACT. Introduction: Signal transduction networks (STNs) compute extracellular biochemical information to regulate intracellular biochemistry. Different biochemistries dictate different cell outcomes. This extracellular information is stored in signal dynamics at the cell surface. Therefore, different signaling dynamics can induce different cell outcomes.

We can describe dynamics in terms of frequency. Frequency-dependent cell behaviour has been documented in experimental literature [1,2,3,4].

One physiologically important STN is the ERK pathway. The ERK pathway transmits EGF signals from the extracellular space to a family of proteins in the cytoplasm called ERK. ERK controls a wide range of cell functions in metazoans including survival, growth, metabolism, migration, and differentiation [5]. These functions are vital to understanding and treating cell diseases like cancer. It is therefore fruitful to ask if they are frequency dependent. Knowing which frequencies promote or inhibit functions like cell growth can help develop therapeutics and experimental techniques.

We investigate this frequency response by deriving the transfer function of the ERK pathway. This function tells us how sinusoidal EGF inputs are transformed into ERK outputs. This then sets the basis for obtaining ERK responses for arbitrary periodic forcing patterns, such as triangular and rectangular pulse trains, via Fourier analysis.

Results: We study the EGF-activated ERK pathway model presented in [6] and derive its transfer function. Using this function, we predict how different periodic forcing dynamics of EGF yield different ERK activation dynamics and relate these to different cell outcomes in experimental literature.

Conclusion: Our work aims to provide a predictive tool for experimentalists to relate desired inputs and outputs in conducting frequency domain experiments. It also provides insight into the regions of dynamics that may be useful for treating diseases associated with dysfunctional ERK activation.
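
The transfer-function approach outlined in this abstract can be illustrated with a generic sketch. The code below does not use the ERK pathway model of [6]: a first-order low-pass system G(iω) = gain / (1 + iωτ) is assumed as a stand-in for a linearized pathway, and the values of `gain`, `tau`, the forcing period and the duty cycle are arbitrary. It expands a rectangular pulse train into its Fourier series and multiplies each harmonic by the transfer function to obtain the steady-state periodic output.

```python
import cmath
import math


def lowpass_transfer(omega, gain=1.0, tau=10.0):
    """Frequency response of a hypothetical first-order system gain / (1 + i*omega*tau)."""
    return gain / (1.0 + 1j * omega * tau)


def pulse_train_coefficient(k, duty):
    """Complex Fourier coefficient c_k of a unit-height rectangular pulse train
    that is on for the fraction `duty` of each period, starting at t = 0."""
    if k == 0:
        return complex(duty)
    return (1.0 - cmath.exp(-2j * math.pi * k * duty)) / (2j * math.pi * k)


def steady_state_output(t, period, duty, n_harmonics=50, transfer=lowpass_transfer):
    """Steady-state response of a linear system to the pulse train:
    y(t) = sum_k G(i*k*omega0) * c_k * exp(i*k*omega0*t), truncated at n_harmonics."""
    omega0 = 2.0 * math.pi / period
    total = 0.0 + 0.0j
    for k in range(-n_harmonics, n_harmonics + 1):
        c_k = pulse_train_coefficient(k, duty)
        total += transfer(k * omega0) * c_k * cmath.exp(1j * k * omega0 * t)
    return total.real


if __name__ == "__main__":
    period, duty = 20.0, 0.25
    for step in range(0, 21, 4):
        print(f"t = {step:4.1f}  output = {steady_state_output(step, period, duty):.4f}")
```

Because the example system is a low-pass filter, fast pulse trains are smoothed towards their mean value (gain times duty cycle), while slow ones are followed more closely; a real pathway model would generally have a more complex frequency response and would only be approximated by a linear transfer function.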

References: [1] A. Mitchell, P. Wei, and W. A. Lim. Science, 350(6266):1379–83, 2015.

[2] J. E. Toettcher, O. D. Weiner, and W. A. Lim. Cell, 155(6):1422–1434, 2013.

[3] P. Hersen, M. N. McClean, L. Mahadevan, and S. Ramanathan. Proc Natl Acad Sci U S A, 105(20):7165–70, 2008.

[4] Z. Ningsih and A. H. A. Clayton. Phys. Biol., 17:044001, 2020.

[5] Lavoie, H., Gagnon, J., & Therrien, M. (2020). Nature Reviews Molecular Cell Biology, 21(10), 607-632.

[6] Ryu, H., Chung, M., Dobrzyński, M., Fey, D., Blum, Y., Lee, S. S., ... & Pertz, O. (2015). Molecular systems biology, 11(11), 838.

12:22
Coordination of p53 and MAPK dynamics controls heterogeneous responses to genotoxic agents in single cells

ABSTRACT. Heterogeneous cellular responses to chemotherapy are a significant obstacle in cancer treatment. One source of such heterogeneity is variability in the temporal expression and activity of key signal transduction pathways that detect cell stresses and coordinate appropriate responses in individual cells. We have shown that variable p53 expression dynamics can generate distinct cellular responses to genotoxic agents. However, in some cases distinct stresses can generate the same p53 dynamics but different cell fate outcomes, suggesting integration of dynamic information from other pathways is important for cell fate regulation. We focused on pancreatic cells and quantified the dynamics of p53 and the MAPKs, signaling systems frequently mutated in pancreatic cancer. To determine how MAPK activities affect p53-mediated responses to DNA double strand breaks and oxidative stress, we used time-lapse microscopy to simultaneously track p53 and ERK, JNK, or p38 MAPK activities in single cells. While p53 dynamics were comparable between the stresses, cell fate outcomes were distinct. Combining MAPK dynamics with p53 dynamics was important for distinguishing between the stresses and for generating temporal ordering of downstream cell fate pathways. Cross-talk between MAPKs and p53 controlled the balance between proliferation and cell death. These findings provide insight into how individual cells integrate signaling information from separate pathways with distinct temporal patterns of activity to encode stress specificity and drive heterogeneous cell fate decisions. Furthermore, our results identify timing windows during which combination drug treatments can effectively alter cell fate responses to genotoxic agents.

12:25
Dynamic single cell analysis of a MAPK signaling cascade and its impact on transcriptional output

ABSTRACT. Mitogen-activated protein kinases (MAPKs) form a three-tiered signal transduction cascade involved in key cellular processes. They have been shown to intervene in cellular proliferation, differentiation, and multiple stress response pathways (hyperosmotic, oxidative, inflammatory). It has also been demonstrated that malfunctions in these cascades lead to severe oncogenic phenotypes due to the critical nature of the processes they control. Although the main players implicated in these cascades are known, we still lack a quantitative and dynamic description of their function. As one major role of MAPKs is to activate transcriptional programs, investigating the kinetics of transcriptional induction can provide key insights into the genetic regulation of these critical cellular components. We aim to understand how the signal transduction from the cascade impacts the transcriptional activation of stress-responsive genes. To this end, the yeast Saccharomyces cerevisiae is used to study the dynamics of the MAPK Hog1 during a hyper-osmotic stress response. It is possible to quantify the activity of Hog1 by monitoring its relocation upon induction by osmotic stress as it accumulates in the nucleus. Downstream osmo-stress inducible promoters such as pSTL1, pHSP12 and pGPD1 are functionalized with the PP7 system to label nascent mRNAs. This system comprises 24 stem loop repeats which are bound by a labelled phage coat protein. As soon as the mRNA is transcribed, these loops form and a fluorescent signal which accumulates at the active locus can be observed and quantified. From the signals extracted from these images, key parameters of the promoter output are characterised, such as peak transcriptional activity, duration and total output. These parameters are correlated to the MAPK dynamics in order to study the impact of MAPK activity on the transcriptional response. We report that the MAPK activity pattern we measure is not a predictor of the promoter expression.
We investigate whether the global transcriptional capability of the cell conditions the level of transcriptional response and heterogeneity in single cell behaviour. In parallel we are building a mathematical model of MAPK driven transcription using time dependent rates based on experimental measurements and estimates.

12:28
Deciphering dysregulation in Erythroleukemia to design intervention strategies
PRESENTER: Yomn Abdullah

ABSTRACT. Erythropoietin receptor (EpoR) signaling is crucial for the activation and differentiation of erythroid progenitor cells; however, its role in erythroleukemia has not been studied. To characterize Erythropoietin (Epo)-induced signal transduction and elucidate its perturbations in erythroleukemia, a systems medicine approach was used. To develop an integrative dynamic pathway model of EpoR signaling and link it to cell proliferation, we adapted the previously established model for EpoR signal transduction in the murine CFU-E erythroid progenitor cells to the context of erythroleukemia. As a cellular model system the cell line AS-E2 was examined since it depends on Epo for survival and growth. Quantitative mass spectrometry was used to compare the proteome of AS-E2 and human CFU-E cells and it was found that AS-E2 cells harbor significantly higher levels of EpoR and a decreased abundance of the negative regulator SHP1. To adapt the parameters of the dynamic pathway model, AS-E2 cells were stimulated with Epo in a dose- and time-resolved manner. Quantitative immunoblotting revealed prolonged phosphorylation of EpoR which was reflected by a sustained activation of the pro-proliferative pathways MAPK, PI3K/AKT and JAK/STAT. Pathway perturbations employing JAK/STAT, MEK and AKT small molecule inhibitors were used to improve identifiability of model parameters. The calibrated model was able to capture the differential effects of the inhibitors on signal transduction and proliferation and enables us to pinpoint major alterations in the cancer cells compared to the healthy situation. These developments will provide the basis to propose an effective targeted therapy for individual erythroleukemia patients.

12:29
Building the knowledge base to understand cellular signal transduction in different inflammatory phenotypes
PRESENTER: Marcus Krantz

ABSTRACT. Within the X-HiDE project, we aim to understand the establishment and resolution of inflammation, and how different states of the underlying signal transduction network result in different inflammatory phenotypes. However, the biochemistry of this signal transduction network is notoriously complex: Each component may be regulated by multiple modifications and interaction partners, which can be combined in a large number of different configurations. Furthermore, single inputs trigger multiple downstream signalling processes, which each may be triggered or antagonised by multiple inputs. Finally, the function of the signal transduction system differs between individuals and cell types, depending on genetic variation and gene expression differences. Consequently, a useful knowledge base must be comprehensive, to account for all those interacting processes, as well as mechanistically detailed, to account for allele and expression differences as well as the impact of drug treatments. Here, we present a literature based mechanistic model of the network recognising infection, from the recognition of pathogen-associated molecular patterns by the toll-like receptors to activation of NF-kappa-B and IRF3/IRF7 mediated transcription. By using rxncon, the reaction-contingency language, we avoid the combinatorial complexity associated with microstate-based formalisms, and hence we can – in contrast to previous efforts – integrate all processes into a single network that defines a unique logical model that can be executed without further parametrisation. While limited to qualitative predictions, it provides a powerful tool for network validation and genotype-to-phenotype analysis. Taken together, we present an approach that reconciles mechanistic detail and scalability in signal transduction modelling, opening the door to comprehensive – in scope and detail – models of the regulatory network in health and disease.

12:30
Boolean dynamic modeling of TNFR1 signaling predicts a nested feedback loop regulating the apoptotic response at single-cell level (EXCEPTION - ZOOM)

ABSTRACT. Tumor Necrosis Factor Receptor 1 (TNFR1) signaling in cells, triggered by TNFα, exhibits cell-to-cell variability in pro-survival and apoptotic phenotypic responses. This variability is attributed to heterogeneity in signal flow through intracellular signaling entities, which controls the balance between the two phenotypes. However, how to modulate this signal flow so that cells favor apoptosis, an approach considered in cancer therapies, is still under investigation. We use Boolean dynamic modelling to account for signal flow path variability and identify a 6-node nested feedback loop that facilitates crucial cross-talk regulation between these two phenotypic responses. We achieve this by systematically developing a novel approach, “Boolean Modeling based Prediction of Steady-state probability of Phenotype Reachability (BM-ProSPR)”, which constructs a reliable partial state transition graph (pSTG) in a computationally efficient manner and analyses the pSTG to accurately predict the extent of the network’s long-term response. We show that knocking out the Comp1-IKK* complex redirects the signal flow path, leading to a ~62% increase in the probability of an apoptotic response and thereby favoring phenotype switching from pro-survival to apoptosis. Priming cancerous cells with inhibitors targeting the interaction between Comp1 and IKK* prior to TNFα exposure could be a potential therapeutic strategy.
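
Illustrative sketch (not from the abstract): the idea of estimating the probability of reaching a phenotype in a state transition graph can be shown on a toy Boolean network. The four nodes, their update rules, and the function names below are hypothetical and are not the TNFR1 model or the BM-ProSPR algorithm; the sketch assumes asynchronous updating in which one unstable node is chosen uniformly at random per step.

```python
# Toy Boolean network; node names and rules are illustrative assumptions,
# not the published TNFR1 model.
NODES = ("tnf", "ikk_complex", "survival", "apoptosis")


def next_value(node, s, knockout):
    """Boolean update rule for one node, given the current state dict s."""
    if node in knockout:
        return False                          # knocked-out node is forced off
    if node == "tnf":
        return s["tnf"]                       # constant external input
    if node == "ikk_complex":
        return s["tnf"]
    if node == "survival":
        return s["ikk_complex"] and not s["apoptosis"]
    return s["tnf"] and not s["survival"]     # rule for "apoptosis"


def unstable(state, knockout):
    """Indices of nodes whose rule disagrees with their current value."""
    s = dict(zip(NODES, state))
    return [i for i, n in enumerate(NODES)
            if next_value(n, s, knockout) != state[i]]


def apoptosis_probability(initial, knockout=frozenset(), steps=20):
    """Propagate probability mass through the asynchronous transition graph."""
    dist = {initial: 1.0}
    for _ in range(steps):
        new = {}
        for state, p in dist.items():
            flips = unstable(state, knockout)
            if not flips:                     # fixed point: mass stays put
                new[state] = new.get(state, 0.0) + p
                continue
            for i in flips:                   # each unstable node equally likely
                nxt = state[:i] + (not state[i],) + state[i + 1:]
                new[nxt] = new.get(nxt, 0.0) + p / len(flips)
        dist = new
    k = NODES.index("apoptosis")
    return sum(p for state, p in dist.items() if state[k])


START = (True, False, False, False)           # TNF on, everything else off
wild_type = apoptosis_probability(START)
knocked_out = apoptosis_probability(START, knockout=frozenset({"ikk_complex"}))
```

In this toy network the knockout removes the only route to the survival state, so all probability mass ends in the apoptotic fixed point. The numbers have no relation to the ~62% figure reported in the abstract.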

12:30-13:30 Session L3: LUNCHEON III: BioModels: Model of the year competition

Summary: BioModels’s Model of the year is a new competition for early career researchers, including but not limited to PhD students and postdocs, to applaud emerging leaders in the area of systems biology modelling. This competition aims to recognize exciting modelling research as well as promote reproducibility and good modelling practice among early career researchers. The competition will be officially launched at ICSB2022. Details on the application process, eligibility criteria, guidance to submit reproducible models (required as part of application) to enter this competition will be discussed during this luncheon workshop. More details are available at https://www.ebi.ac.uk/biomodels/competition/model-of-the-year-2022

Location: Grenander I+II
12:30-13:30 Session L4: LUNCHEON IV: JIPipe Workshop

Summary: The continuous development of new microscopy techniques is tracked by the emergence of new or improved image analysis software that can provide the necessary means of quantification. A stalwart amongst these tools is ImageJ, which has served the increasingly demanding needs of the image analysis community for decades. Despite the large community support behind ImageJ that has been keeping the platform up to date, ImageJ is still hindered by the lack of a versatile visual programming language (VPL) that would allow simple creation of fully reproducible and readily expandable workflows of all levels of complexity according to the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles. Here we designed a new VPL, termed Java Image Processing Pipeline (JIPipe, www.JIPipe.org), which provides over 1000 functionalities of ImageJ in a fully visual format. Main features of JIPipe include full reproducibility of the algorithmic procedure, a standardized project and results format, automatic scalability towards batch analysis, parallel processing, GPU support via CLIJ2, an integration of Cellpose, as well as support for Python and R scripting. JIPipe is fully equipped to be the new tool of choice for both expert and beginner image analysts and can function as a convenient tool where analysis and quantification ideas can be easily cast into visual workflows. These are easily adaptable and adjustable even by experimentalists, thus bridging the gap between image analysts and experimentalists. In this workshop, we will introduce the main features of JIPipe in interactive sessions, where the details of the main graphical user interface will be demonstrated, and a simple but representative image analysis workflow will be built together with the participants.
The structure of the individual image operations, as well as of JIPipe’s backbone data structure will be explored with respect to how batch processing and implementation of the FAIR principles are achieved. More complex examples will follow, where the highly sophisticated data annotation system of JIPipe will be introduced via easy-to-follow examples. Projects will include the tracking of nematodes, the quantification of pathogen uptake by immune cells, deep learning-based segmentation tasks, as well as light sheet microscopy data analysis of the kidney.

13:30-15:30 Session 10: COMPUTATIONAL PATHOLOGY

Summary: The digitization of the diagnostics of tissue sections offers interesting application possibilities for patients, doctors and researchers. Not only the digitization process with the viewing software is one of the advantages, but also the possibility to apply decision support systems in the form of artificial intelligence (machine learning). Structured pathological diagnoses, digital histological multiplex images, molecular pathological data as well as known interactions between gene alterations and drugs are the basis for personalized medicine, where individual predictions can be made for each patient.

Location: Grenander I+II
13:30
System Pathology of Cancer

ABSTRACT. Pathology represents a central integrative subject for patient care. Structured pathological findings, annotated digital histological images, molecular pathological data, and known interactions between gene alterations and drugs will in the future be the basis for personalized medicine, in which individual predictions can be made for each patient. Computational Pathology is thus an important and central building block for the implementation of the concept of precision medicine. The lecture will provide an overview of developments in cancer genomics, proteomics and AI against the background of tumor heterogeneity and response to cancer therapy.

13:50
Advantages of Ensemble CNNs in Computational Pathology

ABSTRACT. In computational pathology, we often use histopathological data to predict patient survival, response to therapy, or molecular changes, and we face several problems. One major problem is the small number of available data points, especially when molecular data are needed for the training process. Another problem is non-standardized data collection and the resulting lack of transferability of the trained models between different clinics. Both problems can be addressed by using ensembles. For the prediction of molecular subtypes in gastric cancer, we established a bagging ensemble that makes efficient use of small and unbalanced datasets in particular. To improve the transferability of the trained ensembles, we developed noisy ensembles, in which we intentionally introduced errors during the training process, for the use case of cancer detection in ovarian cancer.
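
Illustrative sketch (not from the abstract): bagging trains each ensemble member on a bootstrap resample and combines members by majority vote. The example below uses one-dimensional threshold classifiers instead of CNNs, and it models "intentionally introduced errors" as random label flips during training. Both choices, and all function names, are assumptions made for illustration only.

```python
import random


def fit_stump(xs, ys):
    """Threshold halfway between class means; assumes class 1 lies at larger x."""
    x0 = [x for x, y in zip(xs, ys) if y == 0]
    x1 = [x for x, y in zip(xs, ys) if y == 1]
    if not x0 or not x1:                  # bootstrap sample missed one class
        return sum(xs) / len(xs)
    return (sum(x0) / len(x0) + sum(x1) / len(x1)) / 2


def fit_bagged(xs, ys, n_models=25, label_noise=0.0, seed=0):
    """Train n_models stumps, each on a bootstrap sample with optional label flips."""
    rng = random.Random(seed)
    thresholds = []
    for _ in range(n_models):
        idx = [rng.randrange(len(xs)) for _ in xs]          # sample with replacement
        bx = [xs[i] for i in idx]
        by = [1 - ys[i] if rng.random() < label_noise else ys[i] for i in idx]
        thresholds.append(fit_stump(bx, by))
    return thresholds


def predict(thresholds, x):
    """Majority vote over all ensemble members."""
    votes = sum(x > t for t in thresholds)
    return int(votes * 2 > len(thresholds))


# Synthetic, well-separated toy data (not pathology data).
xs = [0.25 * i for i in range(20)] + [6 + 0.25 * i for i in range(20)]
ys = [0] * 20 + [1] * 20
ensemble = fit_bagged(xs, ys, label_noise=0.1)
```

Because every member sees a different resample, a single unrepresentative sample cannot dominate the vote, which is the usual argument for bagging on small or unbalanced datasets.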

14:00
Artificial Intelligence in Digital Pathology Research - Where do we come from and where do we go?

ABSTRACT. Although AI-based Digital Pathology is a relatively young branch compared to many other medical domains, the field has evolved impressively over the last five years. As it advances, new problems emerge that need to be tackled. What are those problems? How might we be able to solve them? What will be the next hot topics in AI-based Digital Pathology? I will give you an overview of the state of the art in AI-based Digital Pathology Research and an outlook on what we will work on in the near future.

14:10
EMPAIA - Harmonizing Access to Clinical-Grade AI Algorithms in Pathology

ABSTRACT. Image-based diagnostics have made great progress based on the use of artificial intelligence (AI) methods. Accordingly, numerous digital and computational pathology projects have been created in the last few years. They are mainly focused on specific diagnostic questions and aim to provide dedicated applications. EMPAIA (EcosysteM for Pathology Diagnostics with AI Assistance) addresses the demand for validated and certified AI solutions by pathologists from a holistic perspective. As a joint effort by all stakeholders in the market, an open ecosystem has emerged. It includes pathologists, national and international EMPAIA reference centers, computer scientists, and industry partners from all domains of digital pathology, supported by scientific societies and professional associations as well as legal and regulatory domain experts. The centerpiece of EMPAIA is a platform on which AI algorithms from multiple vendors are integrated into clinical environments using common APIs. AI services for clinical diagnostics and research are available. Their certification is facilitated by the orchestration of developers, reference institutes and certifiers under clear legal framework conditions. The removal of regulatory, legal, technical, and organizational hurdles also promotes the use of AI. Moreover, we established an educational program called “EMPAIA Academy” to disseminate knowledge with open-access courses and workshops.

14:20
Computational Pathology of Kidney Disease

ABSTRACT. Machine learning (ML), and particularly deep learning (DL), holds great promise to advance and transform pathology diagnostics and data mining. One reason is that the major approach of pathology is the analysis of morphological changes, i.e. image analysis, in which DL is particularly powerful. Currently, the majority of research on ML/DL focuses on cancer pathology. Here I will show the challenges and potential of AI/DL in non-tumor pathology of kidney diseases, which mainly includes many types of rare diseases. Apart from end-to-end DL models, I will discuss an approach of transforming histopathological images into meaningful and explainable numerical biomarkers, i.e. the prospect of next generation morphometry (NGM) and pathomics.

14:40
Latent Dirichlet Allocation for Double Clustering (LDA-DC): Discovering patient phenotypes and cell populations within a single Bayesian framework

ABSTRACT. Introduction: Human disorders have a highly multifactorial nature and depend on genetic, behavioral, socioeconomic, and environmental factors. The number of metabolic diseases, cancer, and autoimmune pathologies has increased significantly in recent years, making research in this field a public health priority. In parallel, bioclinical routine datasets have expanded in conjunction with all kinds of “omics” data, from both the host and microbiota, as well as metabolomic, proteomic, and cytometry data [1]. All these types of data have some underlying structure of their own, taking values on different scales, with different variability, and are differently distributed. In addition, human patients are an equally important source of variability even among carefully selected cohorts: phenotypic variability (age, gender, previous conditions), dietary habits, bad vs good responders to the treatment, etc. In particular, new types of data have emerged which yield a description at the cell level, i.e. cytometry or scRNA-seq. These data add a new layer of structure that needs to be taken into account.

Motivations and Results: From the analytical viewpoint, single cell data are huge-dimensional matrices produced for each subject. The data dimension, i.e., the number of cells, varies from one individual to another, and cell types, as well as the correspondence between the cell populations of the subjects, have to be identified before applying any statistical machine learning method. We refer to the challenge we introduce and consider here as a double clustering problem, where the aim is to simultaneously determine cell types and stratify patients, purely from observations and without any prior knowledge, in order to study mechanisms of pathologies explained by particular cell subpopulations. We propose a novel approach to stratify cell-based observations within a single probabilistic framework, i.e., to extract meaningful phenotypes from both patients and cells simultaneously. Our method is a practical extension of Latent Dirichlet Allocation and is used to solve the Double Clustering task. The first step of our framework is the identification of the cell types. Once the cell types are fixed, we can efficiently estimate both the probability of a phenotype given a patient and the probability of a cell type given a phenotype. We tested our method on different datasets, ranging from simulated patients to those with AML (acute myeloid leukemia) or Crohn’s disease, and were able to simultaneously identify clusters of patients and clusters of cells related to patients’ conditions. Furthermore, using a network approach, we were able to stratify patients and identify groups of patients with specific phenotypes.
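
Illustrative note (not from the abstract): in standard Latent Dirichlet Allocation notation, the two estimated quantities combine into a mixture over phenotypes. The symbols below are assumptions chosen for this sketch: d indexes patients, k phenotypes, and c cell types; theta and phi are the usual LDA parameters, with alpha and beta as Dirichlet hyperparameters. The abstract does not state the exact parametrisation used in LDA-DC.

```latex
% Standard LDA mixture identity, written for patients, phenotypes and cell types.
% \theta_{dk} = p(\text{phenotype } k \mid \text{patient } d), \quad
% \phi_{kc}   = p(\text{cell type } c \mid \text{phenotype } k)
\begin{align}
  \theta_d &\sim \mathrm{Dirichlet}(\alpha), &
  \phi_k   &\sim \mathrm{Dirichlet}(\beta), \\
  p(c \mid d) &= \sum_{k=1}^{K} \theta_{dk}\,\phi_{kc}.
\end{align}
```

Read this way, each patient is a mixture of phenotypes and each phenotype is a distribution over the cell types fixed in the first step.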

References [1] C. Manzoni, et al, Genome, transcriptome and proteome: the rise of omics data and their integration in biomedical sciences. Briefings in Bioinformatics, 19(2):286–302, November 2016.

14:45
Reconstructing a quantitative measure of disease dynamics for discovery and clinical benefit
PRESENTER: Amit Frishberg

ABSTRACT. Diseases change over time, both in their phenotypic manifestation and in the underlying molecular processes that drive them. Though understanding disease progression is critical for the development of diagnostics and treatments, capturing these dynamics is difficult due to logistical considerations and the high heterogeneity across individuals. We developed TimeAx, which brings time into the equation by building a comparative framework for capturing disease dynamics, both acute and chronic, from time-series data. We utilized TimeAx to study urothelial bladder cancer (UBC) tumorigenesis from gene expression data over time, discovering molecular mechanisms that drive disease progression and promote clinical symptoms, and that would have been difficult to detect otherwise. Specifically, our framework allowed the discovery of an inflection point, where UBC tumors proceed into an advanced pro-metastatic phase, accompanied by a sharp decrease in tumor purity due to rapid immune-cell and cancer-associated fibroblast infiltration, followed by major changes in the expression dynamics of the cell cycle and DNA repair mechanisms. Overall, we present a powerful framework for assembly and study of high-resolution disease progression dynamics, providing improved molecular interoperability and predictive clinical benefit.

bioRxiv: https://www.biorxiv.org/content/10.1101/2021.11.17.468952v1.full

14:50
Lineage plasticity in prostate cancer depends on FGFR and JAK/STAT inflammatory signaling
PRESENTER: Joseph Chan

ABSTRACT. The inherent plasticity of tumor cells provides a mechanism of resistance to molecularly targeted therapies, exemplified by adeno-to-neuroendocrine lineage transitions in prostate and lung cancer. Here, we investigate the root cause of lineage plasticity by performing single-cell transcriptomic analysis of time-course experiments in genetically engineered mouse models and murine organoid cultures of castrate-resistant prostate cancer following Trp53 and Rb1 deletion. We observe rapid collapse of cell-type fidelity with the emergence of a mixed luminal and basal phenotype with additional EMT-like features. To quantify dynamic changes in plasticity, we develop scBLender, a suite of methods that measure basal-luminal mixing as a proxy for plasticity. We leverage these plasticity metrics to identify Fgfr and Jak-Stat inflammatory signaling as putative drivers of plasticity that are activated early in the time-course prior to any corresponding morphological changes as well as under therapeutic pressure. Genetic and pharmacologic inhibition of Jak1/2 combined with Fgfr blockade in murine and patient-derived organoids not only reversed the plastic state to wild-type morphology, but also restored sensitivity to antiandrogen therapy in models with residual AR expression. Single-cell analysis of clinical biospecimens confirms the presence of mixed basal-luminal cells with elevated JAK/STAT and FGFR signaling in a subset of patients with metastatic disease, with implications for stratifying patients for clinical trials. Collectively, we show that lineage plasticity initiates quickly as a cell-autonomous process that is further increased in the in vivo setting, and through newly developed computational approaches, we identify a pharmacological strategy that restores lineage identity using clinical grade inhibitors.

14:55
Tribus reveals the single-cell tumor architecture and effect of chemotherapy in ovarian cancer
PRESENTER: Julia Casado

ABSTRACT. Multiplexed imaging at single-cell resolution is becoming increasingly useful to decipher the role of the cellular microenvironment in cancer and other complex diseases. To identify spatial patterns of single cells in a tissue, we must first assign accurate descriptions to each cell in a step known as cell-type phenotyping. This step is challenging due to (i) laborious annotation of ground truth, (ii) segmentation artifacts, (iii) fluorescence noise and batch effects, and (iv) difficulty reproducing human-biased thresholding. Here we present Tribus, an interactive, knowledge-based classifier that avoids hard-set thresholds and manual labeling, is robust to noise, and takes fewer iterations from the user than standard labeling of clustering results. Interactive analysis is done via integration with the Napari image viewer, and each analysis creates a detailed report to enable reproducibility. In this study we show that Tribus compares to human knowledge in public benchmarking datasets where manual cell type annotations are supported by the pathology community. We applied Tribus to a dataset consisting of cyclic immunofluorescence (CyCIF) images of six matched ovarian cancer samples collected before and after neoadjuvant chemotherapy. Accurate cell-type phenotyping enabled a high-resolution analysis of cellular phenotypes and their spatial patterns, as well as their temporal dynamics during platinum-taxane chemotherapy. Tribus, an easily integrable open-source package, thus enables accurate phenotyping of single cells to facilitate biological discovery from highly multiplexed images.

15:00
3D reconstruction and mathematical modelling of whole slide images to elucidate resistance to the targeted therapy in melanoma
PRESENTER: Janan Arslan

ABSTRACT. Cutaneous melanoma is a highly invasive tumour. Despite the development of modern therapies, most patients with advanced metastatic melanoma have poor clinical prognoses. The most frequent mutations in melanoma affect BRAF, a protein kinase of the MAPK signalling pathway. Therapies targeting both BRAF and MEK are effective in only 50% of patients and, almost systematically, generate drug resistance. In order to understand the mechanistic origin of the resistance, we build multiscale mathematical models describing intracellular dynamics of metabolic pathways and dynamics of melanoma cell populations in interaction with their microenvironment, taking into account both spatial and cellular heterogeneity. Our primary assumption is that under treatment melanoma cells undergo a series of non-genetic transitions, leading to drug tolerant and resistant cell states. This is consistent with single cell mRNAseq studies (Rambow et al., Cell 2018) and led us to a first mathematical model predicting the outcome of treatments (Hodgkinson et al., Front Oncol 2022). Furthermore, as in Kumar et al. (Cell Metab 2019), we expect that the spatial distribution of sensitive and resistant cells depends on the distance to these sources. In order to refine our models, we require 3D reconstructions of blood vessels and cell states in naïve and treated tumours. Starting with whole slide images of melanoma tumours from patient-derived xenograft (PDX) mouse models, we build 3D vascular models and use them to predict zonation of hypoxia and metabolic states within the tumour. For this study, PDX samples underwent serial sectioning over a depth of 2 mm. Every 12 µm of depth, slides were stained with hematoxylin and eosin, and the next two adjacent slides with cluster of differentiation 31 (CD31, a blood vessel marker) and CA9 (a hypoxia marker).
The 3D reconstruction pipeline involves three steps: 1) vessel segmentation in 2D sections, performed by deep learning with a U-Net architecture; 2) image registration, developed using the scale-invariant feature transform, a feature-based method; 3) 3D rendering of vessels, performed using a marching cubes algorithm. An original feature of our pipeline is its ability to handle sparse data by generating synthesized slides with a generative adversarial network (GAN). This addition is also useful within a clinical context, where synthesized slides can be artificially created from a handful of existing, real clinical slides. The resulting 3D vascularization model is used to predict the distribution of hypoxia in the tumour using a partial differential equation (PDE) model pre-trained on adjacent CD31 and CA9 stained sections. Another original aspect of this work is that the PDE model is trained with a few 2D sections and validated using 3D reconstructions. Future work will include modeling of the cell population dynamics in 3D reconstructed tumours and validation of these results on a selection of slides using Imaging Mass Cytometry.

15:05
Automated detection and regional classification of Cerebral Amyloid Angiopathy in digital whole slide images
PRESENTER: Lise Minaud

ABSTRACT. The deposition of amyloid beta (Aβ) in cortical and leptomeningeal brain vessel walls, termed Cerebral Amyloid Angiopathy (CAA), can increase susceptibility to brain hemorrhages. CAA is present in 30% of neurologically-normal elderly and frequently co-occurs with Alzheimer's Disease (AD). Characterizing the frequency and anatomic distribution of CAA, typically by human experts visually examining stained brain sections, is a time-consuming task. Furthermore, inter-rater variability in expert assessment limits the ability to harmonize data across studies. Hence, methods are needed for scalable, reproducible assessments of CAA. We introduce an automated tool to reliably overcome these challenges, by training a deep learning model to locate and classify CAAs throughout the brain. We collected 95 whole slide images (WSIs) of postmortem human brain tissue from three institutions, derived from temporal, occipital, and frontal cortices, immunostained with four different antibodies for Aβ. We processed the WSIs into 256x256 pixel tiles, from which we isolated ~20,000 candidate tiles of potentially CAA-affected vessels. Six experts independently annotated each tile, for a total of 120,000 annotations. For each tile, annotators labeled whether CAA was present and, if so, its anatomic location, parenchymal vs. leptomeningeal. By combining multiple and at times discordant expert opinions via a consensus strategy, we developed a convolutional neural network (CNN) that learned from six expert annotators. The model achieved held-out test set performance of AUPRC=0.90 for leptomeningeal CAA vessels and AUPRC=0.88 for parenchymal CAA vessels. We intend to release the model as a foundation for a generalizable, scalable means to detect CAA in human postmortem brain WSIs.
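
Illustrative sketch (not from the abstract): one simple way to combine discordant annotations is a strict majority vote per tile. The abstract does not specify which consensus strategy was used, so the rule, the label strings, and the function name below are assumptions for illustration.

```python
from collections import Counter

LABELS = ("none", "parenchymal", "leptomeningeal")   # hypothetical label names


def consensus_label(votes):
    """Return the label chosen by a strict majority of annotators, else None."""
    label, n = Counter(votes).most_common(1)[0]
    return label if 2 * n > len(votes) else None


# Six annotators per tile, as in the abstract; the votes themselves are made up.
clear_case = consensus_label(["leptomeningeal"] * 4 + ["none"] * 2)
tied_case = consensus_label(["parenchymal"] * 3 + ["none"] * 3)

# Sanity check on the reported scale: ~20,000 tiles x 6 annotators.
total_annotations = 20_000 * 6
```

Tiles without a strict majority return None here; whether such tiles are dropped, re-reviewed, or given soft labels is a design decision the abstract leaves open.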

15:10
Quantitative Spatial Profiling Reveals Tumor Microenvironment Heterogeneity and Prognostic Biomarkers Associated with Immune Population Architectures
PRESENTER: Haoyang Mi

ABSTRACT. Pancreatic ductal adenocarcinoma (PDAC) is a highly aggressive disease with poor 5-year survival rates, necessitating identification of novel therapeutic targets. Elucidating the biology of the tumor immune microenvironment (TiME) can provide vital insights into mechanisms of tumor progression. In this study, we developed a quantitative image processing platform to analyse sequential multiplexed immunohistochemistry data from archival PDAC tissue resection specimens. A 27-plex marker panel was employed to simultaneously phenotype cell populations and their functional states, followed by a computational workflow to interrogate the immune contextures of the TiME in search of potential biomarkers. The PDAC TiME reflected a low-immunogenic ecosystem with both high intratumoral and intertumoral heterogeneity. Spatial analysis revealed that the relative distance between IL-10+ myelomonocytes, PD-1+ CD4+ T cells, and Granzyme B+ CD8+ T cells correlated significantly with survival, from which a spatial proximity signature termed imRS was derived that correlated with PDAC patient survival. Furthermore, spatial enrichment of CD8+ T cells in lymphoid aggregates was also linked to improved survival. Altogether, these findings indicate that the PDAC TiME, generally considered immuno-dormant or immunosuppressive, is a spatially-nuanced ecosystem orchestrated by ordered immune hierarchies. This new understanding of spatial complexity may guide novel treatment strategies for PDAC.
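
Illustrative sketch (not from the abstract): a basic spatial proximity feature between two cell populations is the mean distance from each cell of one type to its nearest neighbour of another type. The definition of the imRS signature is not given in the abstract; the function below is a generic example with made-up coordinates, and its name is hypothetical.

```python
import math


def mean_nearest_distance(sources, targets):
    """Mean distance from each source cell to its closest target cell."""
    if not sources or not targets:
        raise ValueError("both populations must contain at least one cell")
    return sum(min(math.dist(s, t) for t in targets) for s in sources) / len(sources)


# Made-up centroids (in arbitrary units) for two cell populations.
cd8_t_cells = [(0.0, 0.0), (10.0, 0.0)]
myelomonocytes = [(3.0, 4.0), (10.0, 5.0)]
proximity = mean_nearest_distance(cd8_t_cells, myelomonocytes)
```

The brute-force search is quadratic in the number of cells; for whole-slide data a spatial index such as a k-d tree would be the usual replacement.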

15:15
Quantification of imaging biomarkers in the extracellular matrix of left and right sided colon cancer tissues
PRESENTER: Bharti Arora

ABSTRACT. The location of a tumour within the colon is gaining traction as a crucial factor in determining disease progression, prognosis and management. Studies focussing on clinicopathological features, protein/genetic biomarkers, composition of gut microbiota and response to therapy have reported distinctive features in tumours originating in the left side of the colon (LSCC) as opposed to right-sided colon cancer (RSCC). However, the characteristics of the tumour microenvironment, particularly the distribution, texture and density of the extracellular matrix (ECM), have not been studied. We used 2-photon laser scanning microscopy (2PLSM) to visualise the intrinsic signal emitted by collagen present in the ECM of human colon tumour tissues in a label-free setting and to identify the imaging biomarkers that can quantitatively distinguish the structure of collagen fibres in LSCC vs. RSCC. Formalin-fixed 50 µm vibratome tissue sections obtained from human RSCC (n=6) and LSCC (n=4) during surgical procedures were scanned by 2PLSM, by acquiring the second-harmonic generation (SHG) signal from collagen fibres and 2-photon excited fluorescence (TPEF). The Ti:Sa laser was tuned at 870 nm; emitted light was filtered by bandpass filters (434/20 nm for SHG; 525/50 nm for TPEF) and collected by photomultiplier detectors in back- and forward direction. 2D overviews were acquired within the tumour stroma. Adjacent paraffin sections were analysed for morphology by H&E and Masson’s Trichrome staining. The collagen content in tumour tissues was quantified by surface rendering of the fibres in IMARIS 9.8.0 (Bitplane). Since fibrillar collagen reorganization has been linked with tumour progression, texture analysis was performed to reveal details about the local orientation and coherence of collagen fibres. This was obtained through a structure tensor-based methodology, wherein the local principal fibre direction and coherence were extracted via a sliding window approach.
For statistical analysis, the t-test was performed using GraphPad Prism 9 with a p-value of 0.05 (*) as the threshold for statistical significance. We observed that the distribution of collagen in LSCC is denser than in RSCC, which is also evident from the distributions in local orientation and coherence of collagen fibres. The standard deviation of the orientation of collagen fibres, which correlates with the waviness of the fibres, is higher in RSCC compared to LSCC. The mean coherence can differentiate healthy tissues from tumour tissues; however, it cannot distinguish LSCC from RSCC. The observed dense stroma in LSCC might explain the findings that RSCC responds better to some chemotherapies in comparison to LSCC and correlates with the metastatic potential of the tumour. Our study highlights the relevance of using 2PLSM in extracting imaging biomarkers of collagen, so as to help clinicians understand the role of tumour ECM in the pathophysiology of colorectal cancer as well as to stratify CC patients.
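
Illustrative sketch (not from the abstract): a structure tensor summarises image gradients inside a window, and its eigen-structure gives a dominant orientation and a coherence value between 0 (isotropic) and 1 (perfectly aligned). The code below uses the textbook closed-form expressions with central-difference gradients on a synthetic striped image; the window size, gradient scheme, and function name are assumptions and may differ from the authors' implementation.

```python
import math


def window_orientation_coherence(img, x0, y0, size):
    """Fibre orientation (degrees, 0-180) and coherence for one square window.

    img is a list of rows; the window must leave a 1-pixel border for gradients.
    """
    jxx = jyy = jxy = 0.0
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0    # central differences
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            jxx += gx * gx
            jyy += gy * gy
            jxy += gx * gy
    trace = jxx + jyy
    if trace == 0.0:                                      # flat window: no direction
        return None, 0.0
    gradient_angle = 0.5 * math.atan2(2.0 * jxy, jxx - jyy)
    # Fibres run perpendicular to the dominant gradient direction.
    fibre_angle = (math.degrees(gradient_angle) + 90.0) % 180.0
    coherence = math.sqrt((jxx - jyy) ** 2 + 4.0 * jxy ** 2) / trace
    return fibre_angle, coherence


# Synthetic image of vertical stripes: intensity varies along x only.
stripes = [[math.sin(0.5 * x) for x in range(8)] for _ in range(8)]
angle, coherence = window_orientation_coherence(stripes, 1, 1, 6)
```

Sliding this window across an image yields one orientation and one coherence value per position; the standard deviation of the orientations is then the kind of waviness measure the abstract refers to.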

15:20
Rapid parallel model inference and on-the-fly tiling of digital pathology images
PRESENTER: Sina Ghandian

ABSTRACT. Deep learning model training on digital pathology whole-slide images creates a growing need to develop efficient methods to ingest high volumes of image data. Here, we propose a generalized method to transform gigabyte-scale high-resolution images and conduct inference with neural networks in a scalable and compute-efficient way. We implement this solution with an efficient combination of open tools – Zarr and PyTorch Lightning – to ensure lasting utility and support.

Recent tools make fuller use of CPUs and GPUs available via parallel training and inference. PyTorch Lightning allows computational pathologists to train, evaluate, and iterate their neural networks faster with a simple parameter flag change. However, these parallelization methods require the researcher to first restructure the whole slide image (WSI) into smaller, independent patches. Further, any end user of the model must pre-process the WSI in the same way. Static patch representation of the WSI requires a meta-structure to be imposed on the saved patches; reconstruction of multiple patches is both non-trivial and provides an opportunity for error.

Multi-dimensional array file formats such as HDF5 and OME-TIFF solve many of these issues by off-loading large arrays into manageable chunks on disk. Crucially, none of these formats support parallel read or write access, limiting speed. Zarr supports thread-safe operations and under-the-hood file-locking, to process dynamic patches of WSIs with no further disk usage via a consistent and simple abstraction for the user.

Consider the case in which a model requires multiple inputs of different sizes or needs temporary tiles that are highly overlapped. Prediction heat maps are a concrete example, wherein every pixel must be the center of a sliding tile exactly once, requiring hundreds of thousands of inferences to be made for a single WSI. In this setting, increasing the number of data loading CPU processes (workers) using PyTorch Lightning’s API from 0 to 32 gave us a 4x speed-up per GPU, allowing us to achieve a 16x inference total speed-up by fully utilizing compute resources. The rapid-computation pipeline for pathology whole-slide images described here is a general approach that is deployable from a multi-core laptop to a high-performance computing environment and relies entirely on open-source code.
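
Illustrative sketch (not from the abstract): on-the-fly tiling means generating tile coordinates lazily and reading each tile from the stored array only when needed, rather than saving patches to disk first. The example below uses a nested Python list as a stand-in for a chunked array such as a Zarr store; the function names are hypothetical and no Zarr or PyTorch Lightning calls are shown.

```python
def iter_tiles(width, height, tile, stride):
    """Lazily yield (x, y) top-left corners of tiles that fit inside the image."""
    for y in range(0, height - tile + 1, stride):
        for x in range(0, width - tile + 1, stride):
            yield x, y


def count_tiles(width, height, tile, stride):
    """Number of tiles iter_tiles will yield, without enumerating them."""
    if width < tile or height < tile:
        return 0
    return ((width - tile) // stride + 1) * ((height - tile) // stride + 1)


def read_tile(image, x, y, tile):
    """Slice one tile out of the in-memory stand-in for the stored slide."""
    return [row[x:x + tile] for row in image[y:y + tile]]


# Tiny stand-in image: 6 rows x 10 columns, pixel value encodes its position.
image = [[r * 10 + c for c in range(10)] for r in range(6)]
corners = list(iter_tiles(width=10, height=6, tile=4, stride=2))
first = read_tile(image, corners[0][0], corners[0][1], 4)
```

With a stride of 1, as in a per-pixel prediction heat map, even a modest 1000x1000 region with 256-pixel tiles gives count_tiles(1000, 1000, 256, 1) = 555,025 tiles, which illustrates why storing every tile statically is impractical.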

15:25
Single-cell spatial atlas of high-grade serous ovarian cancer

ABSTRACT. -Background & objective- Every year 450 women are diagnosed with ovarian cancer, and 320 women die from it, in Finland. High-grade serous ovarian cancer (HGSC) is the most common and most lethal subtype. Notably, preliminary evidence suggests that tumors deficient in homologous recombination (HR) DNA repair have a distinct tumor-immune microenvironment (TME), harboring an increased number of tumor-infiltrating lymphocytes as compared to HR-proficient tumors. Our objective is to characterize changes in the TME by the genotypes in HGSC.

-Methods- The dataset consisted of 1000 tissue microarray cores collected from both the tumor center and the tumor border from 250 HGSC patients. We performed cyclic immunofluorescence (tCycIF) utilizing 34 different protein markers. Image analysis was performed using ilastik, CellProfiler, MATLAB and Python scripts. We assessed the BRCA1/2 mutation status and the BRCA1/2 promoter hyper-methylation. For the non-BRCA1/2 mutants, we performed sWGS and estimated the CCNE1 amplification status and HR deficiency using copy number profiles and bioinformatics. The RNA expression of 340 relevant genes was assessed using Nanostring technology.

-Results- Using highly multiplexed imaging, we captured a total of 4.8 million single cells. The cells were further annotated and categorized into distinct functional subpopulations within the tumor, immune and stromal compartments. Interestingly, infiltration of CD8+, CD20+, CD11c+ and CD15+ immune cells was higher in BRCA1/2-mutated tumors than in CCNE1-amplified tumors, and was associated with longer overall survival. The functional clusters of cancer and stromal cells showed heterogeneity among the tumor genotypes.

-Conclusion- Integration of multi-omics data with the single-cell spatial features will reveal the TME landscapes of HGSC with the potential to discover new biomarkers for precision oncology.

13:30-15:30 Session 11: CAREER FORUM

Summary: The two most important aspects for a career in science? Publications and Grants. Taking an interdisciplinary career path with systems biology approaches, should one focus on data science methodologies, experimental technologies, or biological/medical questions to reach a professorship? The session will combine short presentations with an interactive debate about career development. We invite experienced scientists to share their views and advice, and we welcome younger scientists who we will try to support with advice about a career centred around systems biology approaches. We also will be giving practical advice regarding grant writing, publications and personal development.

 

13:30
Become an ICSB - an Interdisciplinary Champion of Systems & Biology

ABSTRACT. Welcome to the Careers Forum!

We have several interesting speakers lined up, but we also have plenty of time to discuss what is on your mind. I will kick off the careers forum with practical advice for writing grant proposals and publications, followed by a series of short talks from Jae Kyoung (KAIST), Alexander Pritzel (DeepMind-Google), Maria Polychronidou (Editor, Molecular Systems Biology), Jana Wolf (MDC), and Bernhard Steiert (Roche Innovation Center Basel). Thomas Lemberger (EMBL) will then lead a discussion. We hope you find motivation, inspiration and practical advice in this session... helping you to become an ICSB - an Interdisciplinary Champion of Systems & Biology ;)

Olaf Wolkenhauer leads the Department of Systems Biology & Bioinformatics at the University of Rostock. He received his first degrees in systems and control engineering and his PhD for research in possibility theory with applications to data analysis. He spent over ten years of his academic career in the UK at the University of Manchester Institute of Science and Technology. In 2000 he was the first to have a joint appointment between the biomolecular sciences and an engineering department at the University of Manchester Institute of Science & Technology. Since 2005 he has held an adjunct professorship at Case Western Reserve University, USA. In 2005, he became a fellow at the Stellenbosch Institute for Advanced Study (STIAS) and since 2017 he has held an adjunct professorship at Chhattisgarh Swami Vivekanand Technical University, India. In 2003, he was appointed as professor for systems biology and bioinformatics at the University of Rostock. It was the first professorship of its kind, dedicated to systems biology approaches. In 2015, he was elected a member of the Foundations in Medicine and Biology review panel of the German Research Foundation (DFG) and since 2020 he has also held a part-time professorship at the Leibniz Institute for Food Systems Biology, at the Technical University Munich. Olaf Wolkenhauer has coordinated several national and international research consortia and is a regular consultant to companies, ministries and funding bodies around the world.

13:40
How to find good problems with collaborators!

ABSTRACT. How do you find collaborative research problems? This becomes more difficult when you and your collaborators come from different backgrounds (e.g., experiments and theory). Although I am a mathematician, I have fortunately collaborated with ~30 wonderful experimental labs, including Pfizer Inc. and Samsung Medical Center. Thanks to these collaborations, my research field has extended from molecular biology to pharmacology and digital medicine. In this talk, I will share my tips for finding good collaborative problems and maintaining collaborations.

13:55
From bench to scientific publishing

ABSTRACT. What skills do you need to become a scientific editor? What does an editor’s working day look like? What are the best and worst aspects of the job? In this session, I will discuss all things related to working as an editor, including the most frequently asked question: “How does it feel to have left science?” (Spoiler: 10 years after leaving the bench, I have never felt that “I left science”!)

Maria Polychronidou (Senior Scientific Editor, Molecular Systems Biology) Maria received her PhD from the University of Heidelberg, where she studied the role of nuclear membrane proteins in development and aging. During her post-doctoral work, she focused on the analysis of tissue-specific regulatory functions of Hox transcription factors using a combination of computational and genome-wide methods. She joined Molecular Systems Biology in 2013.

14:10
From Academic Research to Pharmaceutical R&D and Back

ABSTRACT. Research in systems biology is highly relevant in academia as well as pharmaceutical R&D. The focus in the two types of environment, however, is different. While academia would benefit from scientists experienced in pharmaceutical research, career paths hardly lead from industry back to academia. I will discuss obstacles that complicate this transition and why one nevertheless should consider this option.

Jana Wolf is the head of the group Mathematical Modelling of Cellular Processes at the Max-Delbrueck-Center in Berlin (MDC) and Professor at the Department of Mathematics and Computer Science at the Free University Berlin. During her PhD at Humboldt University Berlin she was a guest scientist at the National Institute of Bioscience and Human-Technology, Tsukuba, Japan and at Free University Amsterdam, Netherlands. As a postdoc she worked at the Charité in Berlin before joining GlaxoSmithKline, Medicines Research Centre, Stevenage, UK. She returned to academia to become a group leader. Her group develops and analyses mathematical models of signaling pathways and gene-regulatory networks in normal and disease states. It focusses on processes in cancer biology and early development as well as design principles of biological networks.

14:25
Thinking of a career in big pharma? Here is what you need to know.

ABSTRACT. Big pharma offers numerous career opportunities for systems biology graduates but is often a black box from the outside, leading to a lot of uncertainty for potential applicants: Is it the right move for me to transition from academia to industry? How do I get in? What will I do? How do employees work? Will I be lost? Will I be able to make a significant impact? How will I develop? Will I find meaning and satisfaction in my professional life? Having joined the pharmaceutical industry 5 years ago after completing my PhD in systems biology, I offer to share my personal reflections on these questions and more during this ICSB career track session.

14:40
Q&A and Open Discussion

13:30-15:30 Session 12: CHEMICAL & GENETIC SYSTEMS BIOLOGY

Summary: The inherent complexity of biological systems has fostered the implementation of large-scale experimental screenings to synthesize a deeper understanding of cellular responses to genetic and chemical perturbations. In this session, we will explore novel methodological approaches to integrate and analyze this data to understand the general principles of the wiring of living cells and its context-dependent variations. We thus welcome submissions in the fields of genetic interactions, including synthetic lethal and genetic suppression, context-specific dependency mappings and drug-induced cell responses, as well as computational methods to interpret the corresponding large-scale data sets.

Location: Alexander I+II
13:30
Mapping Genetic and Chemical Genetic Interaction Networks in Yeast and Human Cells

ABSTRACT. We’ve generated a comprehensive genetic network in yeast cells, testing all possible ~18 million gene pairs for genetic interactions. The global network illustrates how coherent sets of genetic interactions connect protein complex and pathway modules to map a functional wiring diagram of the cell. We are also utilizing CRISPR-Cas9 technology to conduct genome-wide screens and map genetic interactions in human cells. Genetic networks provide a powerful model system for interpreting chemical-genetic networks and linking bioactive compounds to their cellular targets in both yeast and human cells.

13:51
Formatting Biological Big Data to Enable Systems Pharmacology

ABSTRACT. Big Data analytical techniques and AI have the potential to transform drug discovery, as they are reshaping other areas of science and technology, but we need to blend biology and chemistry in a format that is amenable to modern machine learning. In this talk, I will present the Chemical Checker (CC), a resource that provides processed, harmonized and integrated bioactivity data on small molecules. The CC divides data into five levels of increasing complexity, ranging from the chemical properties of compounds to their clinical outcomes. In between, it considers targets, off-targets, perturbed biological networks and several cell-based assays such as gene expression, growth inhibition and morphological profiles. We show how CC signatures can boost the performance of drug discovery tasks that typically capitalize on chemical descriptors, including compound library optimization, target identification and anticipation of failures in clinical trials. I will also present the Bioteque, a resource of unprecedented size and scope that contains pre-calculated biomedical embeddings around 11 biological entities (e.g. genes, cells, tissues and diseases), derived from a gigantic knowledge graph, so that each entity can be described considering different contexts (e.g. interactions and expression). With small molecule and biological bioactivity descriptors in hand, we now face a new scenario in which chemical and biological entities are both translated into a common numerical format. In this computational framework, complex connections between entities can be unveiled by means of simple arithmetic operations. Indeed, we demonstrate and experimentally validate that these descriptors can be used to reverse and mimic biological signatures of disease models and genetic perturbations in vitro and in vivo, options that are otherwise impossible using chemical information alone.

References

Duran-Frigola et al. Extending the small molecule similarity principle to all levels of biology with the Chemical Checker. 2020. Nat Biotechnol. 38: 1087-1096.

Bertoni et al. Bioactivity descriptors for uncharacterized chemical compounds. 2021. Nat Commun. 12:1-13.

Pauls et al. Identification and drug-induced reversion of molecular signatures of Alzheimer's disease onset and progression in AppNL-G-F, AppNL-F, and 3xTg-AD mouse models. 2021. Genome Med. 13:168.

Fernández-Torras et al. Integrating and formatting biomedical data in the Bioteque, a comprehensive repository of pre-calculated knowledge graph embeddings. 2022. bioRxiv.

Fernández-Torras et al. Connecting chemistry and biology through molecular descriptors. 2022. Curr Opin Chem Biol, 66: 102090.
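As an illustration of the "simple arithmetic operations" mentioned in the abstract above, the following minimal sketch ranks compounds by cosine similarity to a disease signature (to mimic it) or to its negation (to reverse it). All vectors and names are invented toy values, and the functions are hypothetical; they do not reproduce the Chemical Checker or Bioteque APIs or data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rank_compounds(signature, compounds, reverse_signature=False):
    """Rank compound names from best to worst match.

    With reverse_signature=True the target is the negated signature, so the
    top hit is the compound whose descriptor points the opposite way.
    """
    target = [-x for x in signature] if reverse_signature else list(signature)
    scored = [(cosine(target, vec), name) for name, vec in compounds.items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Toy 3-dimensional descriptors (assumed values, for illustration only).
disease = [1.0, -0.5, 0.8]
compounds = {
    "cmpd_A": [0.9, -0.4, 0.7],    # resembles the disease signature
    "cmpd_B": [-1.0, 0.6, -0.9],   # roughly the opposite direction
    "cmpd_C": [0.1, 0.9, -0.2],
}

print(rank_compounds(disease, compounds))                          # mimic
print(rank_compounds(disease, compounds, reverse_signature=True))  # reverse
```

Real descriptors have far more dimensions, but the operation is the same: once compounds and biological entities share a numerical format, negation and a similarity measure are enough to pose a "reversal" query.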

14:12
Dissecting the principles underlying transcriptional heterogeneity

ABSTRACT. Cellular heterogeneity is a key determinant of disease outcome and therapeutic response across virtually all organisms. Variability within a population underlies the different cellular responses that ultimately determine cell fate and phenotypic diversity. Little is known about the molecular basis of such cell-to-cell variability or how it impacts the phenotypic spectrum that emerges during adaptive responses to environmental fluctuations. To understand the contribution of each gene to the resulting adaptive transcriptome, we designed a gene-deletion strategy to combine genetic × environmental perturbation screens with single cell RNA-seq profiling (scRNA-seq). Here we profiled a total of 1.2M cells from more than 3000 different genotypes under normal and osmostress conditions to generate high-resolution genotype-transcriptome maps. We use transcriptional phenotypes to identify distinct transcriptional architectures, variable gene usage and gene function associations, and to uncover regulators of heterogeneity. Our results demonstrate that only a fraction of the core osmoresponsive programme is simultaneously co-expressed, including a preferential use of transcription factors. Harnessing intra-genotype heterogeneity led us to uncover and experimentally validate positive and negative regulators of cellular heterogeneity, both universal and condition-specific. Our findings expose the complexity of gene and genotype transcriptome layers.

14:32
CRISPRi meets metabolomics: a platform for rapid functional annotation of compound libraries

ABSTRACT. Discovering new antibacterial strategies is a daunting and urgent challenge. While new genetic and genomic approaches have enabled the systematic discovery of new antibacterial targets, the field of antimicrobial discovery is still largely dominated by in vitro susceptibility screening assays, wherein new promising antimicrobial compounds are selected solely on the basis of their ability to inhibit bacterial growth. The lack of mechanistic insight into the modes of action (MoAs) of lead compounds is a major limitation, often leading to the rediscovery of conventional antibacterial compounds and hampering compound optimization, minimization of side effects, drug repurposing and rational design of combination therapies. Hence, there is an urgent need for alternative and efficient strategies for the experimental identification of starting points to produce the next generation of antibacterials. While the information gained in antimicrobial inhibitor screens has so far been one-dimensional (e.g. growth inhibition), new techniques for systematic molecular profiling of small-molecule effects can enhance traditional growth inhibition assays and enable broader analysis of small-molecule bioactivity. Compared to more mature omics profiling technologies, such as transcriptomics and proteomics, non-targeted metabolomics is a cost-effective solution and still offers a throughput advantage in that it can scale with the typical size of chemical libraries. Moreover, by monitoring changes in thousands of cellular metabolites, current metabolomics platforms provide a rich multidimensional representation of drug effects that is largely independent of compounds’ growth inhibitory activity. By leveraging CRISPR technology and non-targeted metabolomics, we developed a novel combined computational/experimental strategy to perform high-throughput de novo functional annotations of small molecules.
Our unbiased framework, by linking genetic to drug-induced changes in nearly a thousand metabolites, allows for high-throughput functional annotation of compound libraries in Escherichia coli. First, we generated a reference map of metabolic changes from CRISPR interference with 352 genes in all major essential biological processes. Next, on the basis of the comparison of genetic changes with 1,342 drug-induced metabolic changes, we made de novo predictions of compound functionality and revealed antibacterials with unconventional modes of action (MoAs). By validating our approach also in Mycobacterium smegmatis, Mycobacterium tuberculosis and a lung cancer cell line, we show that our framework, combining dynamic gene silencing with metabolomics, can be adapted as a general strategy for comprehensive high-throughput analysis of compound functionality from bacteria to human cell lines. While there is not a single technology that provides a general solution to the problem of drug target identification, metabolome profiling offers a complementary and sensitive readout orthogonal to directly probing protein–drug binding, cell growth or morphological phenotypes. The scalability of this framework, together with recent advances in CRISPR technology enabling genetic manipulation in non-model organisms, makes this approach widely applicable to fundamental bottlenecks in drug development and discovery across many diverse therapeutic areas.
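The comparison of drug-induced with knockdown-induced metabolic changes described above can be sketched as a nearest-profile search. The example below uses Pearson correlation as the similarity measure, which is an assumption made for illustration; the abstract does not state which measure the authors use. Gene names, metabolite values and function names are hypothetical toy data.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def closest_knockdowns(drug_profile, reference, top=1):
    """Return the `top` reference knockdowns most similar to a drug profile."""
    scored = sorted(
        ((pearson(drug_profile, prof), gene) for gene, prof in reference.items()),
        reverse=True,
    )
    return [(gene, r) for r, gene in scored[:top]]


# Toy reference map: metabolite fold-changes per CRISPRi knockdown (invented).
reference = {
    "gene_A": [2.0, -1.0, 0.5, 0.0],
    "gene_B": [-0.5, 1.5, -1.0, 0.2],
    "gene_C": [0.1, 0.2, 2.0, -1.5],
}
drug = [1.8, -0.9, 0.4, 0.1]  # toy drug-induced profile over the same metabolites

best_gene, best_r = closest_knockdowns(drug, reference)[0]
print(best_gene)  # gene_A
```

A drug whose metabolic footprint resembles that of a particular knockdown is a candidate for acting on that gene's process, which is the kind of hypothesis the reference map is meant to generate.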

14:45
Dynamic metabolome profiling uncovers potential TOR signaling genes

ABSTRACT. Although the genome of the yeast Saccharomyces cerevisiae was sequenced 25 years ago, the characterization of the roles of its genes is far from complete. The lack of a complete mapping of functions to genes hampers systematic understanding of the biology of the cell. The advent of high-throughput metabolomics offers a unique approach to uncovering gene function with an attractive combination of cost, robustness, and breadth of applicability. Here we used flow-injection time-of-flight mass spectrometry (FIA-MS) to dynamically profile the metabolome of 164 loss-of-function mutants in TOR and receptor or receptor-like genes under a time-course of rapamycin treatment, generating a dataset with over 7,000 metabolomics measurements. We demonstrate that dynamic metabolite responses to rapamycin are more informative than steady state responses when recovering known regulators of TOR signaling, as well as identifying new ones. Deletion of a subset of the novel genes causes phenotypes and proteome responses to rapamycin that further implicate them in TOR signaling. We found that one of these genes, CFF1, was connected to the regulation of pyrimidine biosynthesis through URA10. These results demonstrate the efficacy of the approach for flagging novel potential TOR signaling-related genes and highlight the utility of dynamic perturbations when using functional metabolomics to deliver biological insight.

14:58
Mapping metabolic regulation in E. coli with a metabolism-wide CRISPRi library
PRESENTER: Hannes Link

ABSTRACT. Cells must control enzyme expression in their metabolic network, because high enzyme levels are costly and low enzyme levels can limit metabolic flux. Here, we created a CRISPR interference library to downregulate expression of all 1515 proteins in the metabolic network of E. coli. We have previously studied the library with a pooled approach (Donati et al., 2020), and have now arrayed all 1515 CRISPRi strains into 96-well plates. This allowed us to probe the metabolome and the proteome of 283 CRISPRi strains that had a growth phenotype during growth on glucose minimal medium. The proteome data showed a compensatory upregulation of enzymes in response to several CRISPRi knockdowns. These responses identified metabolic pathways with a feedback circuit that actively works against CRISPRi-knockdowns. Network Component Analysis (NCA) inferred transcription factors that are involved in these feedback circuits, and the metabolome data revealed the regulatory metabolite that activated the feedback circuit in the first place. Thus, by integrating metabolome and proteome data from hundreds of CRISPRi-knockdowns we can identify genetic-metabolic circuits that regulate the metabolic network of E. coli.

Reference: Donati S, Kuntz M, Pahl V, Farke N, Beuter D, Glatter T, Gomes-Filho JV, Randau L, Link H. Multi-omics analysis of CRISPRi-knockdowns identifies mechanisms that buffer decreases of enzymes in E. coli metabolism. Cell Systems 12, 1-12 (2020).

15:11
A new platform for deep multiplexed metabolomics with a DNA barcode readout
PRESENTER: Andy Fraser

ABSTRACT. Each cell contains thousands of different metabolites. Measuring these metabolites gives a beautiful view of the state of any cell that complements gene expression analysis. In an ideal ‘multiomics’ world we would routinely measure both gene expression and metabolite levels at the single cell level. However, compared with RNA or DNA, metabolites present major hurdles that make this extremely hard:
• They are biochemically highly diverse, e.g. sugars, amino acids, and lipids are very different.
• There is no ‘PCR for metabolites’, which creates problems for single cell metabolite analysis.
• The work-horse of metabolomics is the mass spectrometer: the output is complex, most peaks are unassigned to any metabolite, and the data are a mess for machine learning.
Given these hurdles, how can we make deep metabolomics as accessible as RNAseq?

Here we present a new technology for measuring metabolite levels using DNA barcodes as readouts. At the core of the technology is a library of aptamer sensors, each selected to recognise a specific metabolite. When a sensor recognises its metabolite, it releases a unique DNA barcode, and each sensor is paired to a different barcode, e.g. the glucose sensor releases one barcode, the lysine sensor releases a different barcode. This allows us to measure the levels of each metabolite simply by reading out the levels of each DNA barcode. Since each sensor releases a different barcode, they can be highly multiplexed, allowing us to read out many thousands of sensors in parallel. We show:
• our sensors can recognise highly diverse metabolites and drugs including sugars, amino acids, hormones, and a range of FDA-approved drugs.
• our pipeline for sensor selection measures quantitative binding of >100M sensors for each ligand; this is a rich data set for machine learning approaches to predict sensors (collaborations with AI welcome!)
• the response to each ligand is quantitative across several orders of magnitude
• we can ‘tune’ each sensor to respond across different concentration ranges
• crucially, the sensors can specifically recognise their metabolite ligands in complex mixtures including cell lysates
• we can highly amplify the sensor output by amplifying the released barcodes.

This new platform for metabolomics should finally make the deep multiplexed measurement of metabolites and drugs in single cells into a reality.

15:19
Dynamics of growth and ribosome level in batch cultures of S. cerevisiae
PRESENTER: Yu Huo

ABSTRACT. Cells double their proteins during each cell cycle, and ribosomes are responsible for this protein production. Previous research has shown a linear correlation between the mass fraction of ribosomes and the specific growth rate during exponential growth in both E. coli and S. cerevisiae. Nevertheless, how the levels of ribosomes change and how those changes couple to growth rate beyond the exponential stage are unclear. Here, we monitor the dynamics of both growth and ribosome levels in S. cerevisiae using microplate readers and estimate the effective translation rate over time. We show that the effective translation rate remains constant during the early phase of growth on various sugars, does not change under energy stress imposed by a weak acid, yet decreases when ribosome-targeting antibiotics are present. Our results suggest the existence of an empirical upper limit to the effective translation rate and provide details of the ribosomal dynamics of yeast in batch cultures, which should allow us to extend existing self-replicator models beyond steady-state conditions.

15:20
An optimal RNA growth law and its relationship with genome organization in bacteria
PRESENTER: Xiao-Pan Hu

ABSTRACT. The distribution of cellular resources across bacterial proteins has been quantified through phenomenological growth laws; for example, the content of ribosomal proteins increases linearly with growth rate. Here, we describe a complementary bacterial growth law for RNA composition, emerging from optimal cellular resource allocation across ribosomes and the complex of tRNA and elongation factor Tu. The predicted decline of the tRNA/rRNA ratio with growth rate agrees quantitatively with experimental data for diverse fast-growing microbes. We find that its regulation is implemented in part through chromosomal localization: rRNA genes are typically closer to the origin of replication than tRNA genes; due to replication-associated gene dosage effects, rRNA genes thus show increasingly higher relative gene dosage at faster growth. At the highest growth rates in E. coli, the tRNA/rRNA gene dosage ratio based on chromosomal positions is almost identical to the observed – and theoretically optimal – tRNA/rRNA expression ratio, indicating that the chromosomal arrangement has evolved to favor maximal transcription of both types of genes at this condition. These insights, which quantify the links between cellular resource allocation, growth, and genome organization, may aid in the rational genomic design of efficient synthetic biological systems.
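The replication-associated gene dosage effect invoked above can be made concrete with the classical Cooper-Helmstetter relation, in which the average copy number of a locus at relative position p (0 at the origin, 1 at the terminus) is 2^((C(1-p)+D)/τ) for doubling time τ. The sketch below is not the authors' model: the C and D periods of 40 and 20 minutes are typical textbook values for fast-growing E. coli, assumed constant here, and the chromosomal positions are invented.

```python
def gene_dosage(position, doubling_time, c_period=40.0, d_period=20.0):
    """Average copies per cell of a locus (Cooper-Helmstetter relation).

    position: 0.0 at the replication origin, 1.0 at the terminus.
    doubling_time, c_period, d_period: minutes (C and D values are assumed).
    """
    return 2.0 ** ((c_period * (1.0 - position) + d_period) / doubling_time)


def dosage_ratio(pos_a, pos_b, doubling_time, c_period=40.0):
    """Dosage of locus a relative to locus b; the D period cancels out."""
    return 2.0 ** (c_period * (pos_b - pos_a) / doubling_time)


# Hypothetical positions: rRNA genes nearer the origin than tRNA genes.
rrna_pos, trna_pos = 0.1, 0.5

for tau in (60.0, 30.0, 20.0):
    trna_over_rrna = dosage_ratio(trna_pos, rrna_pos, tau)
    print(f"doubling time {tau:>4.0f} min: tRNA/rRNA dosage = {trna_over_rrna:.3f}")
```

Under these assumptions the tRNA/rRNA dosage ratio falls as the doubling time shortens, which is the direction of the trend described in the abstract.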

15:21
A model of RNA repair to study antibiotic tolerance
PRESENTER: Hollie Hindley

ABSTRACT. Antibiotic tolerance, the mechanism of bacteria transiently surviving antibiotic treatment, is emerging as a precursor to the development of full antibiotic resistance. An RNA repair system, the Rtc system, has recently been shown to promote antibiotic tolerance upon exposure to ribosome-targeting antibiotics. The role of this system in the absence of antibiotics is largely unknown, and even less so the mechanisms by which tolerance is obtained. In this work, we develop and analyse the first mathematical model of the Rtc system for RNA repair, to investigate the mechanistic action of Rtc leading to antibiotic tolerance in bacteria.

The Rtc system is an RNA repair system found in all domains of life. Recent work has highlighted the role of Rtc in maintaining RNA components of the translational apparatus, allowing bacteria to counteract the translation-inhibiting effects of antibiotics, as well as roles in chemotaxis and motility processes. The system consists of an RNA cyclase, RtcA, and an RNA ligase, RtcB, which together perform an end-healing and -sealing function for RNA ends and are both regulated by RtcR.

The expression of RtcA and RtcB is tightly regulated by a σ54-factor that requires an activator protein, RtcR. Under normal conditions, RtcR exhibits negative autoregulation and requires cooperative activation by a ligand. Once active, RtcR interacts with the σ54-RNA polymerase (RNAP) holoenzyme and, using its ATPase activity, converts RNAP from the closed complex to the open complex, where transcription of RtcA and RtcB can begin.

Building a mathematical model of the Rtc system, we investigate the potential of ribosome maintenance in rescuing growth upon antibiotic exposure. We model expression of the three Rtc genes and their action on ribosomes. We further model ribosomes as three separate species: healthy and damaged ribosomes, and ‘healed’ ribosomes that have been tagged by RtcA for ‘sealing’ by RtcB. Tagged ribosomes act as ligands to RtcR, and so a positive feedback loop is created, with tagged ribosomes leading to expression of RtcA and RtcB.
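The three-species ribosome cycle with positive feedback described above can be illustrated with a deliberately simplified toy system. This is not the authors' model: RtcA, RtcB and RtcR are lumped into a single enzyme level, ATP dependence and growth are omitted, the Hill-type activation is an assumed functional form, and every rate constant is an invented value chosen only so that the sketch runs.

```python
def simulate_rtc(steps=5000, dt=0.01):
    """Forward-Euler integration of a toy Rtc ribosome-repair cycle."""
    # Hypothetical parameters (arbitrary units).
    k_dam, k_tag, k_seal = 0.5, 1.0, 1.0    # damage, tagging, sealing rates
    basal, vmax, K, n, deg = 0.01, 1.0, 0.2, 2, 0.5

    healthy, damaged, tagged = 1.0, 0.0, 0.0
    enzyme = basal / deg                    # lumped Rtc level at its basal steady state

    for _ in range(steps):
        damage = k_dam * healthy            # healthy -> damaged
        tagging = k_tag * enzyme * damaged  # damaged -> tagged ('healed')
        sealing = k_seal * enzyme * tagged  # tagged -> healthy ('sealed')
        # Positive feedback: tagged ribosomes activate Rtc expression.
        induction = vmax * tagged ** n / (K ** n + tagged ** n)

        healthy += dt * (sealing - damage)
        damaged += dt * (damage - tagging)
        tagged += dt * (tagging - sealing)
        enzyme += dt * (basal + induction - deg * enzyme)

    return healthy, damaged, tagged, enzyme


healthy, damaged, tagged, enzyme = simulate_rtc()
print(f"healthy={healthy:.3f} damaged={damaged:.3f} tagged={tagged:.3f} Rtc={enzyme:.3f}")
```

Because every flux leaves one ribosome pool and enters another, the total ribosome count is conserved, and the lumped enzyme level rises above its basal value once tagged ribosomes appear.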

Preliminary analysis indicates a high sensitivity of RtcAB expression to ATP availability. The system further displays potential for bistability, which may explain the heterogeneity observed in the expression of Rtc and in tolerance levels across isogenic cells.

Integrating heterogeneous data on Rtc expression, growth rate and ribosome efficiency, we work to embed the Rtc model within a mechanistic cell model of bacterial growth, which will allow us to study Rtc under various growth conditions and to analyse how antibiotics affect its expression and, in turn, how Rtc affects bacterial growth.

15:22
Gene-essentiality based drug signature helps repurposing non-cancer drugs
PRESENTER: Jing Tang

ABSTRACT. Cancer drugs often kill cancer cells independently of their putative targets. The lack of understanding of drug-target interactions prevents biomarker identification and ultimately leads to high attrition in clinical trials. In this study, we explored whether the integration of loss-of-function genetic and drug sensitivity screening data could help identify the mechanisms of action of drugs. We constructed a gene-essentiality drug signature by integrating the two data types: a machine learning model was developed in which the coefficients of all the genes were taken as the gene-essentiality signature of the drug. We compared the gene-essentiality signatures against structure-based fingerprints as well as gene expression signatures in both supervised and unsupervised target predictions. We showed that the gene-essentiality signature can better predict drug targets and their downstream signaling pathways. We then confirmed the validity of our framework in the PRISM dataset generated by a large-scale drug screening experiment. Finally, we predicted targets for the non-cancer drugs in the PRISM screens that better explain their anticancer efficacy, which may pave the way for drug repositioning.
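One way to picture a signature made of model coefficients is a linear regression of drug sensitivity on gene essentiality across cell lines. The sketch below fits a ridge regression by gradient descent on invented toy data; the choice of ridge regression, the learning rate, the penalty and all values are assumptions for illustration, since the abstract does not specify the model.

```python
def fit_signature(essentiality, sensitivity, ridge=0.01, lr=0.1, iters=2000):
    """Fit sensitivity ~ essentiality @ w by ridge-penalised gradient descent.

    essentiality: one row per cell line, one column per gene.
    Returns the coefficient vector w, read here as the drug's signature.
    """
    n, p = len(essentiality), len(essentiality[0])
    w = [0.0] * p
    for _ in range(iters):
        residuals = [
            sum(wj * xj for wj, xj in zip(w, row)) - y
            for row, y in zip(essentiality, sensitivity)
        ]
        for j in range(p):
            grad = 2.0 / n * sum(r * row[j] for r, row in zip(residuals, essentiality))
            w[j] -= lr * (grad + 2.0 * ridge * w[j])
    return w


# Toy data: 6 cell lines x 3 genes; sensitivity tracks gene 0 by construction.
essentiality = [
    [1.0, 0.2, -0.3],
    [0.8, -0.5, 0.1],
    [-0.6, 0.4, 0.9],
    [-1.0, -0.1, -0.7],
    [0.3, 0.9, 0.2],
    [-0.4, -0.8, 0.5],
]
sensitivity = [2.0 * row[0] for row in essentiality]

signature = fit_signature(essentiality, sensitivity)
top_gene = max(range(len(signature)), key=lambda j: abs(signature[j]))
print(top_gene)  # 0: the gene whose essentiality best explains drug sensitivity
```

In this toy setting the gene with the largest absolute coefficient is the one the drug response was constructed from, which mirrors the idea of reading candidate targets off the signature.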

15:23
Towards rational design of antibiotic combination therapies

ABSTRACT. Antimicrobial resistance is on the rise globally. Increased levels of resistance have been reported across bacterial strains and antibiotic compounds. Of special concern are multi-drug resistant bacteria that do not respond to last-resort antibiotics, leaving patients without available treatment options. Some predictions estimate that by 2050, 10 million deaths per year will be attributed to bacterial resistance, making it a major threat to human health [1].

Tackling this issue by developing novel compounds has proved to be a difficult process: economic, regulatory and scientific bottlenecks slow down the antibiotic pipeline, evidencing the need for innovative ways of slowing the emergence and spread of resistance [2]. Antibiotic combination therapies are a promising approach to potentiate treatment and slow resistance evolution. However, administering two compounds can also lead to a loss of effect or an increase in toxicity [3,4]. A key aspect to predicting antibiotic efficacy and developing new therapeutic approaches is a thorough understanding of the relationship between drug susceptibility and bacterial physiology.

Here we present a mechanistic modelling approach to quantitatively predict single antibiotic effect on bacterial growth dynamics under different environmental conditions. We focus on ribosome-targeting antibiotics, which constitute more than half of the drugs used to treat bacterial infections and are among the most successful antimicrobials. We model the uptake of antibiotics and their dynamic interplay with ribosomes within an established model of bacterial growth physiology [5]. Integrating data on growth responses to three ribosome-targeting antibiotics (chloramphenicol, tetracycline and streptomycin), we infer drug-associated parameters and obtain estimates consistent with reported literature values. Furthermore, the calibrated model recovers the effects observed in an independent dataset of growth curves with the same antibiotics.

Currently, we are working on expanding this framework to predict the effect of antibiotic combinations on growth behaviour. By integrating theoretical knowledge and data on growth responses, we expect to identify crucial interactions and gain further mechanistic understanding of combined drug action. This will bring us closer to a predictive theory of bacterial responses to antibiotics, and thus on a path to rational antibiotic therapy.

[1] O’Neill, J. Tackling drug-resistant infections globally: final report and recommendations. (2016).

[2] Gupta, S. & Nayak, R. Dry antibiotic pipeline: Regulatory bottlenecks and regulatory reforms. Journal of Pharmacology & Pharmacotherapeutics, 2014.

[3] Tyers, M. & Wright, G. Drug combinations: a strategy to extend the life of antibiotics in the 21st century. Nature Reviews Microbiology 2018, 2019.

[4] Coates, A., Hu, Y., Holt, J. & Yeh, P. Antibiotic combination therapy against resistant bacterial infections: synergy, rejuvenation and resistance reduction. Expert review of anti-infective therapy, 2020.

[5] Weiße, A., Oyarzún, D., Danos, V. & Swain, P. Mechanistic links between cellular trade-offs, gene expression, and growth. PNAS, 2015.

15:24
Patterns of differentially essential genetic interactions characterize functional modules across cancer types

ABSTRACT. The main goal of this study is to exploit the Cancer Dependency Map (DepMap) to establish a map between differentially essential genetic interactions and the cancer cell line contexts in which they gain or lose their essentiality. A novel strategy for identifying gene pairs corresponding to genetic interactions with shifting essentiality across contexts is proposed, and interaction sets with context-dependent overlap are revealed as context-specific functional modules. We aim to characterize, at least in part, the underlying genetic, proteomic and phenotypic features associated with differential essentiality, thus providing mechanistic hypotheses for cancer development and targeting. Preliminary analyses indicate that some of these interaction modules, when enriched for biological processes, point to mechanical aspects such as adhesion and cell motility, processes which could be linked to metastatic potential. While genetic rewiring (context-specific synthetic lethal interactions identified from heterogenic loss-of-function screens) has been probed in other labs recently, systematically studying context-dependent changes in essentiality of genetic interactions reveals new aspects in our understanding of the genetics of cancer.

15:25
Identification of causal genes at GWAS loci with pleiotropic gene regulatory effects using instrumental variable sets
PRESENTER: Mariyam Khan

ABSTRACT. Genome-wide association studies (GWAS) have shown that the genetic architecture of human health and disease traits is highly complex, with most traits being affected by a large number of small-effect genetic variants spread across the entire genome. At the molecular level, genetic variants affect surrounding epigenetic states, leading to altered transcription of nearby genes by cis-acting mechanisms, which then causes downstream trans effects on gene expression and clinical phenotypes via gene regulatory networks.

We are interested in inferring trans-acting causal relations between gene expression traits at disease risk loci identified by GWAS and clinical phenotypes using Mendelian Randomization (MR). In traditional MR, a variant with a local gene regulatory effect (a cis-expression quantitative trait locus, or cis-eQTL) acts as a randomized “instrument” for the expression of the gene, like the random assignment of individuals to treatment groups in randomized controlled trials, such that the statistical associations between the variant, the gene and the phenotype can be used to estimate the causal effect of the gene on the phenotype.
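
The single-instrument case can be sketched on simulated data. The example below is only an illustration: the allele frequency, effect sizes and sample size are assumed values, not taken from the study. It uses the standard Wald ratio, cov(G, Y) / cov(G, X), and compares it with a naive regression of phenotype on expression, which is biased by the hidden confounder.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (len(a) - 1)


def simulate(n, beta_gx, beta_xy, seed=0):
    """Simulate genotype G, expression X and phenotype Y with a hidden confounder.

    All numeric choices here are illustrative assumptions.
    """
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        # Allele count (0, 1 or 2) with an assumed allele frequency of 0.3.
        gi = int(rng.random() < 0.3) + int(rng.random() < 0.3)
        u = rng.gauss(0, 1)  # hidden confounder acting on both X and Y
        xi = beta_gx * gi + u + rng.gauss(0, 1)
        yi = beta_xy * xi + 2.0 * u + rng.gauss(0, 1)
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y


def wald_ratio(g, x, y):
    """Instrumental-variable estimate of the effect of X on Y."""
    return covariance(g, y) / covariance(g, x)


def ols_slope(x, y):
    """Naive regression slope of Y on X, which absorbs the confounding."""
    return covariance(x, y) / covariance(x, x)


g, x, y = simulate(20000, beta_gx=1.0, beta_xy=0.5, seed=1)
print("true effect: 0.5")
print("Wald ratio :", round(wald_ratio(g, x, y), 3))
print("naive OLS  :", round(ols_slope(x, y), 3))
```

Because the variant is independent of the confounder, the ratio recovers the assumed causal effect approximately, while the naive slope overestimates it.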

However, in human data it has been found that up to 57% of genetic variants with local gene regulatory effects are linked to expression of multiple nearby genes (regulatory pleiotropy). Here, traditional MR cannot be applied, and identification of causal genes and their relative causal effects at GWAS loci with pleiotropic regulatory effects is an open question in the field.

We have used Wright’s method of causal path coefficients to prove mathematically that if a regulatory site is shared by ‘d’ cis-eGenes, and if ‘d’ genetic variants can be found in the shared-site locus, each associated with at least one of the cis-eGenes, not in perfect linkage disequilibrium with each other, then these variants form a generalized instrumental variable set and allow identification of the relative contributions of each cis-eGene to the phenotype, irrespective of any hidden confounding among the cis-eGenes and the phenotype.
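
For d = 2, the identification idea can be illustrated with a small linear simulation. This is a sketch under assumed effect sizes, using a plain covariance-based estimator; it is not the authors' derivation via path coefficients. Two variants each affect both genes (regulatory pleiotropy), and the relative effects of the two genes on the phenotype are recovered by solving a 2x2 linear system. The function names and all numbers are hypothetical.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (len(a) - 1)


def simulate_two_genes(n, b1=0.5, b2=0.0, seed=0):
    """Two variants, two cis-eGenes sharing a confounder, one phenotype."""
    rng = random.Random(seed)
    g1, g2, x1, x2, y = [], [], [], [], []
    for _ in range(n):
        v1 = int(rng.random() < 0.3) + int(rng.random() < 0.3)
        v2 = int(rng.random() < 0.3) + int(rng.random() < 0.3)
        u = rng.gauss(0, 1)  # hidden confounder of both genes and the phenotype
        e1 = 1.0 * v1 + 0.2 * v2 + u + rng.gauss(0, 1)  # both variants act on gene 1
        e2 = 0.3 * v1 + 1.0 * v2 + u + rng.gauss(0, 1)  # and on gene 2
        ph = b1 * e1 + b2 * e2 + 2.0 * u + rng.gauss(0, 1)
        g1.append(v1)
        g2.append(v2)
        x1.append(e1)
        x2.append(e2)
        y.append(ph)
    return g1, g2, x1, x2, y


def estimate_effects(g1, g2, x1, x2, y):
    """Solve cov(G_i, Y) = sum_j cov(G_i, X_j) * b_j for (b1, b2) by Cramer's rule."""
    a11, a12 = cov(g1, x1), cov(g1, x2)
    a21, a22 = cov(g2, x1), cov(g2, x2)
    c1, c2 = cov(g1, y), cov(g2, y)
    det = a11 * a22 - a12 * a21
    return (c1 * a22 - a12 * c2) / det, (a11 * c2 - a21 * c1) / det


b1_hat, b2_hat = estimate_effects(*simulate_two_genes(20000, seed=2))
print("estimated effects of gene 1 and gene 2:", round(b1_hat, 3), round(b2_hat, 3))
```

In this toy setting only gene 1 is causal (assumed effect 0.5), and the estimate for gene 2 should be close to zero even though both genes are associated with both variants.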

As a proof of principle, we identified candidate causal genes at GWAS loci for coronary artery disease risk with pleiotropic gene regulatory effects.

15:26
A Boolean Modeling Framework for Drug Synergy Prediction in Breast Cancer
PRESENTER: Kittisak Taoma

ABSTRACT. Breast cancer is one of the leading causes of death in women, accounting for ~685,000 deaths worldwide in 2020. Recently, drug combinations have been shown to provide effective drug regimens and improved treatment efficacy over monotherapy. However, the discovery of effective drug combinations is costly and time-consuming due to the large combinatorial space of available drugs. In the present study, we develop a generic Boolean model of breast cancer signaling regulation, which is subsequently extended to represent triple-negative and luminal breast cancer cells by incorporating genomic and transcriptomic data. The subtype-specific Boolean models can capture gene expression profiles and resistance behaviors observed in drug-perturbed experiments. Finally, proxy functions based on Boolean activities of a set of proteins are derived by a genetic algorithm for predicting drug synergy. The framework can reasonably predict synergy among drugs while providing mechanistic explanations of the biological processes underlying effective drug combinations in cancer therapy.
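
To illustrate how a Boolean model can expose a synergistic drug pair, here is a toy sketch. The six-node network, its update rules and the readout node are hypothetical and far simpler than the model described in the abstract; drugs are represented by clamping their target node to OFF.

```python
# Hypothetical toy network: a growth factor (GF) activates two parallel
# branches that both feed a "Proliferation" readout.
RULES = {
    "RTK": lambda s: s["GF"],
    "PI3K": lambda s: s["RTK"],
    "AKT": lambda s: s["PI3K"],
    "RAS": lambda s: s["RTK"],
    "ERK": lambda s: s["RAS"],
    "Proliferation": lambda s: s["AKT"] or s["ERK"],
}


def simulate(inhibited=(), steps=10):
    """Synchronously update the network until it reaches a fixed point.

    Nodes listed in `inhibited` are clamped to False (drug target blocked).
    """
    state = {name: False for name in RULES}
    state["GF"] = True  # constant input signal
    for _ in range(steps):
        new = {"GF": True}
        for node, rule in RULES.items():
            new[node] = False if node in inhibited else bool(rule(state))
        if new == state:
            break
        state = new
    return state


for drugs in [set(), {"PI3K"}, {"RAS"}, {"PI3K", "RAS"}]:
    label = "+".join(sorted(drugs)) or "none"
    print(f"inhibited: {label:10s} proliferation: {simulate(drugs)['Proliferation']}")
```

In this toy network either single inhibitor leaves the readout ON because the other branch compensates, and only the combination switches it OFF, which is the qualitative pattern a synergy score is meant to capture.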

13:30-15:30 Session 17A: WILDCARDS *.* [Part I]

Summary: This session will feature talks on any hot topic or landmark work selected by the session chairs and organizers of ICSB 2022. They can be from any field of systems biology or associated fields. We will consider contributed wildcard talks and will also approach researchers who have conducted or are conducting exciting, groundbreaking work.

Location: Alexander III
13:30
Tissue-specific codon usage: from systems to synthetic biology

ABSTRACT. Although different tissues showcase differences in codon usage and anticodon tRNA repertoires, the codon-anticodon co-adaptation of multicellular eukaryotes is not completely understood. On the one hand, coding sequences are determined by manifold overlapping factors (codons, mRNA stability, splicing, etc.) and, on the other hand, tRNAs are intricately regulated at multiple levels (expression, modification, aminoacylation, fragmentation). Here, we uncover translational determinants of tissue-specificity applying a systems biology approach to human high-throughput datasets. First, analyzing the tRNA abundance in over 8,000 tumor and healthy samples unveiled that the variability of the tRNA pool is largely related to the proliferative state across tissues, and that cancer patient survival is associated with the translational efficiency of certain codons. To quantify the extent codons are efficiently translated, we then leveraged transcriptomics and proteomics datasets to compute the protein-to-mRNA ratios across 36 different healthy human tissues. We detected two clusters of tissues with an opposite pattern of A/T- vs G/C-ending codon preferences. Using these, we then developed and experimentally validated CUSTOM (custom.crg.eu), a codon optimizer algorithm for tissue-specific protein production. Altogether, our work not only provides evidence of tissue-specific tRNA expression and protein synthesis, but also makes this knowledge applicable to the development of tissue-targeted therapies and vaccines.

13:40
Interrogating the effect of enzyme kinetics on metabolism using differentiable constraint-based models
PRESENTER: St. Elmo Wilken

ABSTRACT. Metabolic models are typically characterized by a large number of parameters. Traditionally, metabolic control analysis is applied to differential equation based models to investigate the sensitivity of predictions to parameters. A corresponding theory for constraint based models is lacking, due to their formulation as optimization problems. Here, we show that optimal solutions of optimization problems can be efficiently differentiated using constrained optimization duality and implicit differentiation. We use this to calculate the sensitivities of predicted reaction fluxes and enzyme concentrations to turnover numbers in an enzyme-constrained metabolic model of Escherichia coli. The sensitivities quantitatively identify rate limiting enzymes and are mathematically precise, unlike current finite difference based approaches used for sensitivity analysis. Further, efficient differentiation of constraint-based models unlocks the ability to use gradient information for parameter estimation. We demonstrate this by improving, genome-wide, the state-of-the-art turnover number estimates for E. coli. Finally, we show that this technique can be generalized to arbitrarily complex models. By differentiating the optimal solution of a model incorporating both thermodynamic and kinetic rate equations, the effect of metabolite concentrations on biomass growth can be elucidated. We benchmark these metabolite sensitivities against a large experimental gene knockdown study, and find good alignment between the predicted sensitivities and in vivo metabolome changes. In sum, we demonstrate several applications of differentiating optimal solutions of constraint-based metabolic models, and show how it connects to classic metabolic control analysis.

13:50
Lipid network and moiety analysis for revealing enzymatic dysregulation and mechanistic alterations from lipidomics data
PRESENTER: Nikolai Köhler

ABSTRACT. Graph-based algorithms are now part of the standard repertoire of computational genomics, transcriptomics, and proteomics workflows. However, comparable tools are missing from the landscape of lipidomics data analysis. Due to the growing importance of lipidomics for clinical and biomedical research, improvements in the interpretation of lipid data can directly translate into higher clinical relevance of lipidomics data.

To leverage the potential of lipid metabolic networks, we developed a framework called LINEX2 (Lipid Network Explorer 2) [1]. It generates lipid networks using curated information on lipid-metabolic reactions from public databases together with information on fatty acid metabolism. The resulting networks are not only on a lipid species level, but also specific to the lipids measured in a given lipidomics dataset. Additionally, LINEX2 supports user contributions for new lipid classes and metabolic reactions to keep up with novel insights into lipid metabolism in a community effort. Since enzymes participating in lipid-metabolic reactions are multispecific, meaning one enzyme can catalyze the same reaction for different combinations of lipid species, an analysis of lipid-metabolic networks should take this peculiarity into account. Therefore, we developed an enrichment algorithm, specifically designed for this setting. By inferring changes in enzymatic activity considering the context of their multispecific nature and identifying a subnetwork of maximum change, it aids the mechanistic interpretation of quantitative lipidomics data.

As a proof of principle for the enrichment algorithm, we analyzed data from a study by Thangapandi et al. [2], which compares the lipid profile of the liver between wild-type mice and mice with a hepatospecific knock-out of MBOAT7 under non-alcoholic fatty liver disease conditions. As the reaction catalyzed by MBOAT7 is already known, such a dataset is well suited for testing the LINEX2 enrichment algorithm. The subnetwork resulting from the analysis pinpoints exactly the reaction catalyzed by MBOAT7, demonstrating the capability of our framework to infer changes in reaction activity on the basis of lipidomics data.

To showcase how our analysis facilitates hypothesis generation on real-world clinical data, we also applied it to data from the AdipoAtlas [3], a comprehensive reference set of the White Adipose Tissue lipidome of lean and obese humans. The enrichment results indicate that the highest change in reaction activity is occurring in reactions catalyzed by the Phospholipase A2 Group IVC (PLA2G4C) and the asparaginase. These results are also supported by literature, which reports PLA2G4C to be differentially expressed in obese individuals and products of PLA2 activity to be mediators of adipose tissue metabolism.

LINEX2 is available as a web service (https://exbio.wzw.tum.de/linex2/) and a Python package for high-throughput analysis (https://pypi.org/project/linex2/).

[1] Rose and Köhler et al., 2022, bioRxiv [2] Thangapandi et al., 2021, Gut [3] Lange et al., 2021, Cell Reports Medicine

14:00
Structural proteomics evidence for a predicted glycolytic metabolon

ABSTRACT. Glycolysis and gluconeogenesis are two of the most important metabolic pathways, producing ATP and pyruvate or replenishing the hexose pool. It has been proposed that some, if not all, of the glycolytic enzymes form a complex termed the glycolytic metabolon. This complex could be dynamic and condition-dependent, mediating substrate channeling and acting as a regulator of glycolytic flux. So far, this concept has not been convincingly demonstrated, possibly because the low-affinity complex does not survive cell lysis and because its composition is unknown, leaving it unclear what to look for. To study a potentially labile glycolytic metabolon in E. coli, we are using thermodynamic modeling and chemical crosslinking mass spectrometry. We used thermodynamic models of glycolysis and gluconeogenesis to investigate which groups of reactions were more likely to operate in enzyme complexes to maintain overall favorability of the pathway. We found that, under physiological conditions, TPI and FBA were the most unfavorable steps of gluconeogenesis and glycolysis, with a stronger effect in the former. However, these reactions were predicted to be favorable in case of substrate channeling between them and FBP or PFK. Thus, we considered fbp-fbaA-tpiA and pfkA-fbaA-tpiA as target complexes for experimental investigation in gluconeogenesis and glycolysis. To understand whether a physical interaction between the predicted groups of enzymes is possible, we performed in vitro chemical crosslinking of the purified enzymes under glycolytic and gluconeogenic conditions. Mass spectrometric and gel electrophoresis analyses indicated that these enzymes interact and suggested potential interaction sites. Using the chemical crosslinks as restraints, we performed docking of available enzyme crystal structures, and obtained structures of potential glycolytic and gluconeogenic complexes.
To investigate the physiological relevance of these potential complexes, we performed proteome-wide crosslinking, followed by the targeted mass spectrometry analysis of crosslinked peptides detected in vitro. We detected one inter-enzyme crosslink in situ, and we are working on increasing the sensitivity of the approach.

14:10
From System Biology to Pharmacology: Identifying what is important

ABSTRACT. Systems biology offers a wealth of large-scale kinetic models of pharmacologically relevant processes. These models are increasingly used in drug discovery to study the response of the system to a given intervention, e.g., for target identification. While the high complexity and level of detail of systems biology models are desirable in the early stage, they are problematic in later stages of drug development, where the estimation of inter-individual variability and thus of model parameters is of key importance. A natural solution is model reduction. While there exist numerous model reduction techniques, we take a different approach and first ask: What are the important molecular constituents of the model under consideration? And how should importance be defined?

To this end, we introduce the index analysis approach; it is based on different time- and state-dependent quantities (indices) to identify important dynamic and static characteristics of the molecular species of a kinetic model. All indices are defined for (i) a specific pair of input and response variables; (ii) a specific magnitude of the input; and (iii) a specific time interval.

In application to a large-scale kinetic model of the EGFR signalling cascade, we identified different phases of signal transduction to an EGF stimulus, the peculiar role of Phosphatase3 during signal activation and Ras recycling during signal onset. In addition, we discuss the challenges and pitfalls of interpreting the relevance of molecular species based on knock-out simulation studies, and provide an alternative view on conflicting results on the importance of parallel EGFR downstream pathways. We envision that index analysis will be beneficial in comparing different model scenarios (e.g., healthy and diseased conditions), in designing more informed model reduction approaches and in translating large-scale systems biology models from early to late phase in drug discovery and development.

14:20
Optimization of functional efficiency in key tissues constrains protein rate of evolution in multicellular species
PRESENTER: Dinara Usmanova

ABSTRACT. What determines the rate of protein evolution? Surprisingly, despite many decades of investigation, this question about one of the most fundamental parameters in molecular evolution has not yet been answered. It has been previously demonstrated that across organisms the rate of protein evolution anti-correlates strongly with gene expression levels, but the mechanisms underlying this universal relationship remain unresolved. In the present work we approach the question by studying multicellular species, where multiple tissues with diverse expression patterns and cellular environments provide an extra dimension of variation, enabling the assessment of potential mechanisms behind the rate of protein evolution. Using tissue- and cell-type-specific expression data from various animal and plant species, we show that the expression–evolutionary rate anticorrelation in multicellular species is driven by the transcriptome in a limited number of dominant cell types: specifically, adult neuronal cells in animals and fast-growing tissues in plants. Expression in this limited number of cell types affects the evolutionary clock not just of proteins specific to the corresponding tissues, but of all proteins widely expressed across tissues. Importantly, we find that metabolic enzymes highly expressed in tissues that primarily affect protein evolution show the highest level of catalytic efficiency. Moreover, enzyme catalytic efficiency explains a substantial fraction of the variance of evolutionary rate across proteins in the human and Arabidopsis proteomes. These results suggest that selection for protein functional efficiency plays an important role in mediating the slower evolution of highly expressed proteins.
Finally, the analysis of multiple single-cell transcriptomics datasets further suggests that specific cellular processes, such as energy demanding synaptic transmission in animals and cell growth and elongation in plants, likely increase the relative costs of protein production in corresponding cells, leading to levels of expression – evolutionary rate correlations similar in magnitude to those reported for unicellular organisms. Altogether, our results shed light on several inter-related research areas: protein evolution, optimization of protein molecular function, and variability of protein expression across tissues in multi-cellular organisms. The study reveals how adaptation of individual components of complex biological organisms is shaped by optimization and constraints combined across multiple levels of hierarchical organization.

14:30
Universally valid reduction of multiscale stochastic biochemical systems with simple non-elementary propensities
PRESENTER: Yun Min Song

ABSTRACT. As experimentally characterizing all underlying processes of reactions in biochemical systems is almost impossible, their combined effects have frequently been described by simplified non-elementary reaction functions (e.g., Michaelis-Menten and Morrison functions). Recently, the deterministically driven non-elementary reaction functions have been heuristically used for stochastic simulations with the Gillespie algorithm. While this approach has been one of the most popular methods for efficient stochastic simulations, its validity condition has remained poorly understood. In this talk, we derive a complete condition under which this approach can accurately capture the stochastic dynamics of reversible binding, a critical reaction to describe nearly all biochemical systems. Using the approach outside the identified range of validity can seriously distort the stochastic dynamics of various biochemical systems, such as the circadian clock. Importantly, we suggest alternative simplified reaction functions for stochastic reversible binding. This provides a universally valid framework for simplifying stochastic biochemical systems with rapid reversible bindings. To facilitate the framework, we provide a computational package, ASSISTER, that automatically performs the universally valid stochastic model reduction.
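
The heuristic discussed above, plugging a deterministic non-elementary rate law into the Gillespie algorithm as a propensity, can be sketched as follows. This is a minimal illustration with assumed parameter values; it does not implement the validity condition or the alternative propensity functions derived in the talk, and it is unrelated to the ASSISTER package's actual interface.

```python
import random


def gillespie_mm(s0, vmax, km, t_end, seed=0):
    """Stochastic simulation of S -> P with a Michaelis-Menten propensity.

    The propensity a(S) = vmax * S / (km + S) is the deterministic rate law
    reused as a stochastic propensity, which is the heuristic in question.
    Returns a list of (time, S, P) tuples.
    """
    rng = random.Random(seed)
    t, s, p = 0.0, s0, 0
    trajectory = [(t, s, p)]
    while s > 0:
        propensity = vmax * s / (km + s)
        t += rng.expovariate(propensity)  # exponential waiting time to next event
        if t > t_end:
            break
        s -= 1
        p += 1
        trajectory.append((t, s, p))
    return trajectory


# Assumed, illustrative parameters.
traj = gillespie_mm(s0=50, vmax=10.0, km=5.0, t_end=100.0, seed=3)
print("number of reaction events:", len(traj) - 1)
print("final state (t, S, P):", traj[-1])
```

With a single reaction channel the algorithm only needs to sample waiting times; with several channels, one would additionally choose which reaction fires in proportion to its propensity.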

14:40
A physiologically-structured population model for pharmacokinetics of orally inhaled drugs

ABSTRACT. The pharmacokinetics of orally inhaled drugs are determined by pulmonary processes such as particle deposition, drug dissolution, and mucociliary clearance. To understand the complex interplay of these processes, mathematical models can be employed. Existing modelling approaches either focus on a single process or considerably simplify the processes, such as by neglecting the nonlinear dependency of drug dissolution on particle size, saturation effects, or spatial effects due to anatomical differences across the airways. To overcome these limitations, we propose a physiologically-structured population model for pulmonary pharmacokinetics which accounts for particle size and spatial effects, coupled to an ODE-based model for systemic pharmacokinetics. We propose a tailored numerical resolution scheme accounting for mass balance at the interface. Parametrized from anatomical and systemic pharmacokinetic data, a priori pharmacokinetic predictions after drug inhalation from this model are compared with a diverse set of clinical data. The model successfully predicts lung retention profiles of insoluble particles, particle size effects, differences between slowly and quickly dissolving substances, and pharmacokinetic differences between healthy volunteers and asthmatic patients.

14:50
When cells decide to give up on repairing DNA damage

ABSTRACT. Why biological quality-control systems fail has often remained mysterious. Checkpoints in yeast and animals are overridden after prolonged arrests, allowing self-replication to proceed despite the continued presence of errors. Although critical for biological systems, checkpoint override is not understood quantitatively or at the system level by experiment or theory, even though the genes and circuits involved in many checkpoints have been researched extensively.

To uncover potential patterns obeyed by error correction systems, we derived the mathematically optimal checkpoint strategy, balancing the trade-off between risk and opportunities for growth. The theory predicts the optimal override time without free parameters based on two inputs, the statistics i) of error correction and ii) of survival.

We applied the theory experimentally to the DNA damage checkpoint in budding yeast, an intensively researched model for other eukaryotes, whose override is nevertheless not understood quantitatively, functionally, or at the system level. Using a novel fluorescent construct which allowed cells with DNA breaks to be isolated by flow cytometry, we quantified i) the probability distribution of repair for a double-strand DNA break (DSB), including for the critically important, rare events deep in the tail of the distribution, as well as ii) the survival probability after override. Based on these two measurements, the optimal checkpoint theory predicted remarkably accurately the DNA damage checkpoint override times as a function of DSB numbers, which we also measured for the first time precisely. Thus, a first-principles calculation uncovered hitherto hidden patterns underlying the highly noisy checkpoint override process. Our multi-DSB results revise well-known bulk culture measurements and show that override is a more general phenomenon than previously thought. Further, we show that override is an advantageous strategy in cells with wild-type DNA repair genes.

The universal nature of the balance between risk and self-replication opportunity is in principle relevant to many other systems, including other checkpoints, developmental decisions, or reprogramming of cancer cells, suggesting potential further applications.

Reference: Sadeghi et al., Nature Physics 2022

15:00
Quasi-Entropy Closure: A Fast and Reliable Approach to Close the Moment Equations of the Chemical Master Equation
PRESENTER: Nicole Radde

ABSTRACT. The Chemical Master Equation is a stochastic approach to describe the evolution of a (bio)chemical reaction system. Its solution is a time-dependent probability distribution on all possible configurations of the system. As this number is typically large, the Master Equation is often practically unsolvable. The Method of Moments reduces the system to the evolution of a few moments, which are described by ordinary differential equations. Those equations are not closed, since lower order moments generally depend on higher order moments. Various closure schemes have been suggested to solve this problem. Two major problems with these approaches are first that they are open loop systems, which can diverge from the true solution, and second, some of them are computationally expensive. Here we introduce Quasi-Entropy Closure, a moment closure scheme for the Method of Moments. It estimates higher order moments by reconstructing the distribution that minimizes the distance to a uniform distribution subject to lower order moment constraints. Quasi-Entropy Closure can be regarded as an advancement of Zero-Information Closure, which similarly maximizes the information entropy. Results show that both approaches outperform truncation schemes. Quasi-Entropy Closure is computationally much faster than Zero-Information Closure, although both methods consider solutions on the space of configurations and hence do not completely overcome the curse of dimensionality. In addition, our scheme includes a plausibility check for the existence of a distribution satisfying a given set of moments on the feasible set of configurations. All results are evaluated on different benchmark problems.
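
The general idea behind entropy-based closures, reconstructing a distribution from lower-order moments and reading off the higher-order ones, can be sketched for a single species on a finite state space. The example below maximizes Shannon entropy subject to a prescribed mean, which yields a distribution of the form p_n ∝ exp(λ n), and then estimates the second moment from it. It is an illustration of the principle only: the state space bound, the target mean and the bisection solver are assumptions, and the code is not the Quasi-Entropy or Zero-Information Closure implementation.

```python
import math


def maxent_given_mean(n_max, mean, tol=1e-12):
    """Maximum-entropy distribution on {0, ..., n_max} with a prescribed mean.

    The solution has the form p_n proportional to exp(lam * n); lam is found
    by bisection, using the fact that the mean increases monotonically in lam.
    """
    def distribution(lam):
        logits = [lam * n for n in range(n_max + 1)]
        shift = max(logits)  # subtract the maximum for numerical stability
        weights = [math.exp(v - shift) for v in logits]
        total = sum(weights)
        return [w / total for w in weights]

    def mean_of(lam):
        return sum(n * p for n, p in enumerate(distribution(lam)))

    lo, hi = -50.0, 50.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_of(mid) < mean:
            lo = mid
        else:
            hi = mid
    return distribution(0.5 * (lo + hi))


# Close the moment hierarchy at first order: estimate E[n^2] from E[n] alone.
p = maxent_given_mean(n_max=20, mean=4.0)
second_moment = sum(n * n * q for n, q in enumerate(p))
print("reconstructed mean  :", round(sum(n * q for n, q in enumerate(p)), 6))
print("estimated E[n^2]    :", round(second_moment, 4))
```

Because the reconstruction runs over all configurations, its cost grows with the size of the state space, which reflects the abstract's remark that such schemes do not fully escape the curse of dimensionality.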

15:10
GEARS: Predicting transcriptional outcomes of novel multi-gene perturbations
PRESENTER: Yusuf Roohani

ABSTRACT. Motivation: Transcriptional response to genetic perturbation can reveal fundamental insights into the functioning of a cell. It is central to numerous biomedical applications from identifying genetic interactions involved in cancer to methods for regenerative medicine. Recently, large-scale CRISPR-based perturbational screens (e.g. PerturbSeq) have emerged as an important tool for uncovering these insights. While single-cell transcriptional outcomes of perturbation can now be sampled experimentally, perturbing all possible combinations of genes remains slow, laborious, and expensive. The combinatorial explosion in the number of possible multi-gene perturbations makes computational methods indispensable for prioritizing which perturbations to test experimentally.

However, existing computational approaches face many limitations in fulfilling this potential. They are either limited by the complexity of genetic interactions that they can learn or in their ability to predict outcomes of perturbing combinations of genes not experimentally perturbed. Here, we present GEARS (Graph-Enhanced gene Activation and Repression Simulator), a geometric deep learning method that can predict transcriptional response to both single and multi-gene perturbations using single-cell RNA-sequencing data from perturbational screens.

Results: GEARS is uniquely able to predict outcomes of perturbing combinations consisting of novel genes that were never experimentally perturbed by leveraging geometric deep learning and a knowledge graph of gene-gene relationships. This significantly expands the space of possible combinatorial perturbation outcomes that can be computationally predicted using the same amount of experimental data. GEARS also predicts new biologically meaningful phenotypes that are different from experimentally-observed phenotypes used for model training.

GEARS shows a performance improvement greater than 45% in predicting genetic perturbation outcomes across 3 different datasets. GEARS’ predictions were also found to be significantly more directionally consistent, thus highlighting its ability to detect the correct nature of regulatory relationships. GEARS is able to predict outcomes for combinatorial perturbations consisting of arbitrarily many genes. GEARS has more than 50% higher precision than existing methods in predicting four key genetic interaction subtypes (e.g. synergy, epistasis, redundancy) and can identify the strongest genetic interactions twice as well.

Conclusion: GEARS uses deep learning to combine prior knowledge of biological processes with large single-cell perturbational datasets to create a reliable model of key cellular processes. As CRISPR-based perturbational screens become ubiquitous for discovering drug targets, GEARS is uniquely positioned to exponentially multiply the information gained from these screens. Moreover, since it can predict emergent transcriptional behavior, GEARS is also very useful for discovering tractable routes for engineering cell identity. Thus, GEARS is a systems biology model that can not only impact the discovery of novel small molecules for targeting disease but also push the frontier in the design of the next generation of cell and gene-based therapeutics.

15:13
Bias and reproducibility in a Computational Neurobiology PhD’s journey

ABSTRACT. The aim of this poster is to present some of the key questions I have used during my PhD to examine the ethical standards and reproducibility of computer models in Computational Neurobiology. Any research can be seen as a journey with different milestones. Here, I divide the milestones of research into “design, data collection, data analysis and reporting”, and highlight some of the key questions we can ask ourselves throughout our research journey in order to make it more ethical, accessible and reproducible. This poster serves as a visual representation of questions to be asked about the bias we carry into our research, as well as of the key starting resources that can be used to make our research more reproducible.

Rather than presenting the results of a study, this poster suggests examples of how to include reproducibility as a key characteristic of a PhD as well as how it is possible to think about biases of our own research as we go along. The presented questions and topics can be taken and applied by anyone in their research journey.

15:16
Learning synthetic cell classifier designs with genetic algorithms and logic programming
PRESENTER: Melania Nowicka

ABSTRACT. Background: Cell classifiers are synthetic bio-devices performing type-specific in vivo classification of the cell's molecular fingerprint. In particular, they can recognize cancerous cells and trigger their apoptosis, shaping novel therapies for cancer patients. Here, the classifiers describe the relationship between cells' molecular profiles and their annotation as cancerous or non-cancerous. Such a relationship can be represented as a partially defined logical function where the output indicates the cell condition. A single circuit's processing logic is usually described using a larger individual Boolean function, whereas multi-circuit classifiers are ensembles of simpler logic designs. Such a distributed classifier consists of a group of single-circuit classifiers deciding collectively whether a cell is cancerous according to a predefined threshold function. Both architectures have shown the potential to predict the cell condition with high accuracy. However, the lack of comprehensive workflows to design and evaluate the classifiers, in particular to assess their robustness to noise and novel information, limits their application.

Results: Here, we present a framework for designing miRNA-based distributed cell classifiers, employing genetic algorithms and Answer Set Programming. We develop optimization criteria comprising the accuracy and robustness of the circuits and train classifiers that achieve high performance (89.78% accuracy for the most-perturbed data set), as shown in multiple simulated data studies. The evaluation performed on cancer data demonstrates that distributed classifiers outperform single-circuit designs by up to 13.40%. Our workflow provides inherently interpretable classifiers comprising relevant miRNAs previously described in the literature, as well as more complex regulation patterns underlying the data. Ultimately, we show how our approach can be applied to other binary classification problems employing different biological modalities such as gene expression or mutation patterns providing interpretable classifiers.
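
The threshold-based collective decision described in the Background can be illustrated with a minimal sketch. This is not the authors' framework: the miRNA names, the three rules and the threshold value below are hypothetical, and the genetic-algorithm and Answer Set Programming training steps are not shown.

```python
from typing import Callable, Dict, List

# A cell profile maps a miRNA name to True if it is expressed above a cutoff.
Profile = Dict[str, bool]


def make_distributed_classifier(
    circuits: List[Callable[[Profile], bool]], threshold: int
) -> Callable[[Profile], bool]:
    """Build a classifier that calls a cell cancerous when at least
    `threshold` single-circuit classifiers vote 'cancerous'."""

    def classify(profile: Profile) -> bool:
        votes = sum(1 for circuit in circuits if circuit(profile))
        return votes >= threshold

    return classify


# Three hypothetical single-circuit rules over made-up miRNA names.
circuits = [
    lambda p: p["miR-A"] and not p["miR-B"],
    lambda p: p["miR-C"],
    lambda p: not p["miR-D"],
]
classifier = make_distributed_classifier(circuits, threshold=2)

# Only the first rule fires for this cell, so it falls below the threshold.
cell = {"miR-A": True, "miR-B": False, "miR-C": False, "miR-D": True}
print(classifier(cell))
```

Keeping each circuit as a small Boolean rule is what makes the ensemble easy to read: the decision can be traced back to which rules voted.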

15:17
Predicting metabolite accumulation in cancerous metabolic network
PRESENTER: Tin Yau Pang

ABSTRACT. Different theoretical approaches have been developed to predict the phenotype of metabolic networks. “Metabolic network expansion” considers how the presence and absence of enzymes and their reactions affect the functions of the network and the synthesis of metabolic products. “Flux balance analysis” (FBA) assumes that the network is optimal for certain metabolic objectives and predicts the metabolic fluxes across the reactions in a network based on optimization. However, the objective of a cell in a multicellular organism, or of a cancerous cell with aberrant mutations, is elusive. Kinetic modeling may predict the metabolic fluxes in the network, but it requires the kinetic parameters of each reaction in the network.

Methodologies based on machine learning can now predict a protein’s various properties with order-of-magnitude accuracy. The Michaelis constant (KM) or catalytic rate constant (kcat) of an enzyme can be predicted from its sequence and the substrate’s structure. The intensity of a protein in the proteome can also be predicted from its sequence and the transcriptome. Thus, we now can roughly estimate the kinetic parameters of most reactions in a metabolic network.

Here we predict the enzymes’ abundance or their kinetic parameters if they are not empirically measured, and calculate the fluxes in the metabolic network. We calculate the enzyme abundance using data from the Cancer Cell Line Encyclopedia (CCLE). In a cancerous condition, some metabolites accumulate, either because they are the metabolic products of the network, or because of a mismatch between production and consumption fluxes caused by mutations. We compare our predictions of metabolite concentrations and accumulations with the metabolomics data. Our modeling framework serves as the basis for the integration of other gene networks, such as small-molecule regulation of proteins, which could further improve model predictions.
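
As a rough illustration of how predicted kinetic parameters and enzyme abundances translate into fluxes, the sketch below assumes irreversible Michaelis-Menten kinetics for each reaction. The abstract does not state which rate law is used, and all numbers and units here are hypothetical.

```python
def michaelis_menten_flux(
    enzyme: float, kcat: float, km: float, substrate: float
) -> float:
    """Irreversible Michaelis-Menten rate: v = kcat * [E] * [S] / (KM + [S])."""
    return kcat * enzyme * substrate / (km + substrate)


def net_accumulation(production: float, consumption: float) -> float:
    """Rate of change of a metabolite pool: production flux minus consumption flux."""
    return production - consumption


# Hypothetical values in arbitrary units: the producing enzyme is abundant,
# while the consuming enzyme is scarce and slow.
v_production = michaelis_menten_flux(enzyme=2.0, kcat=10.0, km=0.5, substrate=0.5)
v_consumption = michaelis_menten_flux(enzyme=1.0, kcat=5.0, km=1.0, substrate=1.0)

# A positive value means the metabolite accumulates under these assumptions.
print(net_accumulation(v_production, v_consumption))
```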

15:18
Nonstationary Biomedical Signal Feature Extraction

ABSTRACT. With the advancements in sensor technologies, data analytics, and machine learning, meaningful feature extraction has become a key area of investigation, especially for biomedical signals. Most real-world signals, and especially signals from biosensors, possess long-term, non-stationary and non-linear characteristics. Signal representation, information processing and feature extraction from these signals are challenging tasks. This talk will focus on five generations of signal processing algorithms developed for analysis and interpretation of biomedical signals. The talk will touch upon event analysis, spectral analysis, time-frequency domain analysis and multi-modal biomedical signal processing. Specifically, it will cover feature extraction algorithms from the time domain, frequency domain, signal decomposition domain, time-frequency matrix and image domains. Recent advances in using sparse signal representation and compressive sensing of long-term signals for Internet of Medical Things (IoMT) applications will also be covered. The extraction and classification of features from cardiac signals (electrograms and ECG), neural signals (EEG), bio-acoustical signals (pathological voice), and sleep signals (polysomnography) will be discussed in detail. Machine learning (ML) results obtained with different nonstationary signal feature extraction techniques, applied directly to time-domain, frequency-domain or joint spatio-temporal/time-frequency signals, will also be presented to highlight the key advantages in feature analysis using well-known ML techniques such as linear discriminant analysis, decision trees, support vector machines, and Naive Bayes classifiers. Comparisons with the automatic feature analysis provided by deep learning models in certain areas of cardiac and neural applications will also be presented.

15:19
Using multi-omics data and machine learning to unravel alternative splicing regulation
PRESENTER: Ulf Schmitz

ABSTRACT. Background Intron retention (IR) is a form of alternative splicing that is widespread in cells of vertebrates, insects and plants and is involved in a multitude of cell-physiological processes. IR expands gene regulatory complexity by adding new mRNA isoforms, increasing sophistication in gene expression fine-tuning via nonsense-mediated decay, and by introducing non-linear network-level dynamics. The importance of IR in humans has come into focus following landmark discoveries of dynamic IR programs in immune cell differentiation and aberrant IR patterns in cancer.

Results To investigate how IR is regulated in primary immune cells we integrated transcriptomics (mRNA-Seq) data with epigenomics data including genome-wide DNA methylation (WGBS), histone modifications (ChIP-Seq), and nucleosome occupancy (NOMe-Seq) data. Using machine learning we trained two complementary models to determine the role of epigenetic factors in the regulation of IR in cells of the innate immune system. Our results suggest that intrinsic characteristics are key for introns to evade splicing and that epigenetic marks can modulate IR levels. However, cell type-specific IR profiles are largely mediated by changes in chromatin accessibility, whereby predisposed introns in nucleosome-free regions are more likely to be retained. We show that increased chromatin accessibility, as revealed by nucleosome-free regions, contributes substantially to the retention of introns in a cell-specific manner. Dynamically retained introns are involved in immune response mechanisms including immune cell adhesion and activation.

Conclusions Our results have profound implications for the analysis of other forms of alternative splicing regarding their conservation, regulation and role in normal physiology as well as in diseases such as leukemia. Our findings about epigenetic IR regulation coincide with an increasing number of studies describing pathogenic alterations in splicing regulation and therapeutic approaches targeting aberrant splicing. Therefore, our findings could inform novel epigenetic therapy development.

15:30-16:00 Coffee Break
16:00-16:45 Session K7: HFSP sponsored KEYNOTE: Jörg Overmann

Abstract: Current estimates of the global bacterial diversity exceed 1 billion species. By comparison, the number of bacterial species that are described and available for direct physiological and biochemical characterization in the laboratory amounts to less than 19,000. Microbes offer insights into novel biology, drive biogeochemical transformations and are key sources of biochemical innovation. Therefore their functions and interactions need to be better resolved. Insights from state-of-the-art, culture-independent studies of bacteria in their environment can guide the development of novel cultivation approaches that better mimic their natural niche, e.g., natural substrate conditions, growth in biofilms, or exploiting signal compounds and biotic interactions. A combination with high-throughput, automated methods that allow multifactorial testing of a large variety of such cultivation conditions (“culturomics”) is particularly promising. These approaches have begun to return novel types of bacteria of interest for research and applications. A currently almost untapped source for novel insights is the wealth of molecular, environmental, and phenotypic data on microorganisms that are available in principle but highly dispersed, heterogeneous, and non-standardized. Recent integrative database projects address these challenges, render data accessible for semantic searches and artificial intelligence analyses, and thereby offer new, digital gateways to biodiscovery.

Location: Alexander
16:00
Elucidating functions of microbial dark matter

ABSTRACT. Current estimates of the global bacterial diversity exceed 1 billion species. By comparison, the number of bacterial species that are described and available for direct physiological and biochemical characterization in the laboratory amounts to less than 19,000. Microbes offer insights into novel biology, drive biogeochemical transformations and are key sources of biochemical innovation. Therefore their functions and interactions need to be better resolved. Insights from state-of-the-art, culture-independent studies of bacteria in their environment can guide the development of novel cultivation approaches that better mimic their natural niche, e.g., natural substrate conditions, growth in biofilms, or exploiting signal compounds and biotic interactions. A combination with high-throughput, automated methods that allow multifactorial testing of a large variety of such cultivation conditions (“culturomics”) is particularly promising. These approaches have begun to return novel types of bacteria of interest for research and applications. A currently almost untapped source for novel insights is the wealth of molecular, environmental, and phenotypic data on microorganisms that are available in principle but highly dispersed, heterogeneous, and non-standardized. Recent integrative database projects address these challenges, render data accessible for semantic searches and artificial intelligence analyses, and thereby offer new, digital gateways to biodiscovery.

16:45-19:30 Session P2: POSTER SESSION II (Even Submission Numbers)

EVEN NUMBERED POSTERS (2, 4, 6, ...)

Location: All Grenanders
Manatee invariants entailing signal transduction pathways in Petri nets of signaling networks
PRESENTER: Marius Kirchner

ABSTRACT. Invariant analysis in Petri nets has proven to be a viable way to search for functional pathways in different kinds of biological networks. The idea of feasible transition invariants in signal transduction analysis was introduced by Sackmann et al. and shown on the mating pheromone response pathway of S. cerevisiae. Later the concept was improved upon by Scheidel et al., who described manatee invariants as a new type of invariant consisting of place-invariant-free combinations of transition invariants. We used the concept of manatee invariants on the network by Sackmann et al. and found additional pathways which we linked to biologically relevant pathways. Trares et al. used the concept of manatee invariants on models of the canonical and non-canonical NF-κB pathways and a combined network describing the crosstalk between both pathways in CD40L-stimulated B cells. The poster explains the main principles of Petri nets and manatee invariants and discusses the application to the TNFR1 pathway. We are currently working on modeling host-pathogen interactions in Petri nets and applying manatee invariant analysis to these networks. The differences in signaling pathways between cancerous cells and normal cells could be shown analytically with manatee invariant analysis of the respective signaling networks, but it would require extensive biological data or knowledge to model these networks.

Model Predictive Control of Cancer Cellular Dynamics: A New Strategy For Therapy Design
PRESENTER: Benjamin Smart

ABSTRACT. Recent advancements in Cybergenetics have led to the development of new computational and experimental platforms that enable us to robustly steer cellular dynamics by applying external feedback control. Such technologies have never been applied to regulate intracellular dynamics of cancer cells. Here, we show in silico that adaptive model predictive control (MPC) can effectively be used to steer the simulated signalling dynamics of Non-Small Cell Lung Cancer (NSCLC) cells to resemble those of wild-type cells. Our optimisation-based control algorithm enables tailoring the cost function to force the controller to alternate different drugs and/or reduce drug exposure, minimising both drug-induced toxicity and resistance to treatment. Our results pave the way for new cybergenetics experiments in cancer cells, and, longer term, can support the design of improved drug combination therapies in biomedical applications.
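
The receding-horizon idea behind model predictive control can be sketched on a toy problem. The example below is not the authors' NSCLC model and is not adaptive: it assumes a hypothetical one-dimensional signalling readout with dynamics x_next = A*x + C - B*u, a small set of allowed doses, and a cost that penalises both distance from a wild-type-like target and drug exposure.

```python
from itertools import product

# Hypothetical toy dynamics: x_next = A*x + C - B*u, where u is the drug dose.
A, C, B = 0.8, 1.0, 1.0
DOSES = [i / 5 for i in range(6)]   # allowed doses 0.0, 0.2, ..., 1.0
TARGET = 2.0                        # assumed wild-type-like signalling level
HORIZON = 3                         # number of steps the controller looks ahead
DOSE_WEIGHT = 0.01                  # penalty on drug exposure


def step(x: float, u: float) -> float:
    """Advance the toy model by one time step under dose u."""
    return A * x + C - B * u


def plan_cost(x: float, doses) -> float:
    """Cost of applying a dose sequence from state x: tracking error plus exposure."""
    cost = 0.0
    for u in doses:
        x = step(x, u)
        cost += (x - TARGET) ** 2 + DOSE_WEIGHT * u ** 2
    return cost


def mpc_dose(x: float) -> float:
    """Enumerate all dose plans over the horizon and return the first dose of the best one."""
    best_plan = min(product(DOSES, repeat=HORIZON), key=lambda plan: plan_cost(x, plan))
    return best_plan[0]


def run(x0: float, n_steps: int) -> list:
    """Closed loop: re-plan at every step and apply only the first dose."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(step(xs[-1], mpc_dose(xs[-1])))
    return xs


# Untreated, this toy system settles at C / (1 - A) = 5.0; start there.
trajectory = run(x0=5.0, n_steps=30)
print(trajectory[-1])
```

Changing the cost function, for example by raising DOSE_WEIGHT or adding terms for switching drugs, is the lever the abstract refers to when it mentions tailoring the cost to reduce drug exposure.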

Machine learning-based sepsis risk prediction modeling using electronic health records of cancer patients

ABSTRACT. Sepsis is a common disease in clinical practice, and when it progresses to septic shock, the mortality rate reaches 50%. In particular, cancer patients are classified as a high-risk group for sepsis because they are often immunosuppressed due to the cancer itself and chemotherapy. Therefore, predicting the onset of sepsis in cancer patients is a clinically important issue and essential for increasing patient survival. This study aimed to build a model that can predict the risk of sepsis in cancer patients at an early stage using clinical data. We analyzed the electronic health records (EHR) of cancer patients prior to emergency room visits (onset of sepsis). We assumed that the patterns of prescribed medications and lab test results would differ between the sepsis group and the control group, and this was demonstrated through graph network-based association analysis and statistical analysis. The relationships among the prescribed medications and the lab test information were used as additional inputs to model training, and we found that these features improve model predictive performance. When using the prescribed medications relationship as an additional input, the model accuracy increased by up to 10%, AUROC (area under the receiver operating characteristic) by up to 10%, Precision by up to 13%, Recall by up to 10%, AUPRC (area under the precision-recall curve) by up to 21%, and F1 score by up to 6% compared to when only the common EHR information was used. Furthermore, when additionally using the lab test information, the accuracy increased by up to 12%, AUROC by up to 12%, Precision by up to 18%, Recall by up to 22%, AUPRC by up to 25%, and F1 score by up to 14%. Our results can be applied to efficiently predict the sepsis risk in cancer patients, and the approach is expected to predict the occurrence of sepsis more accurately and faster than before in the medical field.
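
For readers less familiar with the threshold-based metrics reported above, the following sketch shows how accuracy, precision, recall and F1 score follow from a confusion matrix. The counts are hypothetical and are not taken from the study; AUROC and AUPRC are omitted because they require ranked prediction scores rather than counts.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard metrics from confusion-matrix counts.

    Assumes at least one predicted positive and one actual positive,
    so that precision and recall are defined.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts for a sepsis-versus-control classifier.
metrics = classification_metrics(tp=60, fp=20, fn=40, tn=80)
print(metrics)
```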

Numerical approaches for the rapid analysis of prophylactic efficacy against HIV
PRESENTER: Lanxin Zhang

ABSTRACT. HIV remains a major public health threat. Currently, neither a cure nor an efficient vaccine is available. However, antiretroviral drugs have been used successfully to prevent HIV infection. An important method for HIV self-protection is pre-exposure prophylaxis (PrEP). To improve PrEP, many next-generation regimens, including long-acting formulations, are currently under investigation. However, the identification of parameters that determine prophylactic efficacy from clinical, ex vivo or in vitro data is extremely difficult. Clues about these parameters could prove essential for the design of next-generation PrEP compounds. Mathematical models that integrate pharmacological, viral and host factors are frequently used to complement our knowledge about the prophylactic efficacy of antiviral compounds. Stochastic simulation methods are currently the gold standard for estimating prophylactic efficacy from these models. However, many stochastic simulations need to be conducted to determine the sample statistics accurately. To remedy the shortcomings of stochastic simulation, we developed a numerical method to directly compute the efficacy of arbitrary prophylactic regimens in a single run, without the need for sampling. Based on several examples with dolutegravir (DTG)-based short- and long-term PrEP, as well as post-exposure prophylaxis, the correctness of this new method and its outstanding computational performance are demonstrated. For example, a continuous 6-month prophylactic profile is computed within a few seconds on a laptop computer. Due to the method’s computational performance, we envision that the approach can greatly expand the scope of analysis with regard to estimating prophylactic efficacy, by making it possible to analyse the long-term effect of prophylaxis and to perform sensitivity analysis.

A Boolean Modeling Framework for Drug Synergy Prediction in Breast Cancer
PRESENTER: Kittisak Taoma

ABSTRACT. Breast cancer is one of the leading causes of death in women, contributing to ~685,000 deaths globally in 2020. Recently, drug combinations have been shown to provide effective drug regimens and improved treatment efficacy over monotherapy. However, the discovery process of effective drug combinations is costly and time-consuming due to the large combinatorial space of available drugs. In the present study, we develop a generic Boolean model of breast cancer signaling regulation, which is subsequently extended to represent triple-negative and luminal breast cancer cells by incorporating genomic and transcriptomic data. The sub-type specific Boolean models can capture gene expression profiles and resistance behaviors observed in drug-perturbed experiments. Finally, proxy functions based on Boolean activities of a set of proteins are derived by a genetic algorithm for predicting drug synergy. The framework can reasonably predict synergy among drugs while providing mechanistic explanations of the biological process underlying effective drug combinations in cancer therapy.
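
How a Boolean signaling model can expose a combination effect is easiest to see on a toy network. The sketch below is not the model from the abstract: the four nodes and their rules are hypothetical, drugs are modeled simply by forcing the inhibited node to OFF, and the genetic-algorithm-derived proxy functions are not reproduced.

```python
from typing import Callable, Dict, FrozenSet

State = Dict[str, bool]

# Hypothetical network: a receptor activates two parallel pathways,
# either of which is sufficient to drive proliferation.
RULES: Dict[str, Callable[[State], bool]] = {
    "RTK": lambda s: True,
    "PI3K": lambda s: s["RTK"],
    "MAPK": lambda s: s["RTK"],
    "Proliferation": lambda s: s["PI3K"] or s["MAPK"],
}


def simulate(inhibited: FrozenSet[str] = frozenset(), max_steps: int = 20) -> State:
    """Synchronously update all nodes until a fixed point is reached.

    Nodes listed in `inhibited` are clamped to False at every step.
    """
    state = {node: False for node in RULES}
    for _ in range(max_steps):
        new_state = {
            node: False if node in inhibited else rule(state)
            for node, rule in RULES.items()
        }
        if new_state == state:
            break
        state = new_state
    return state


# Either single inhibitor leaves proliferation ON; only the combination turns it OFF.
for drugs in [frozenset(), frozenset({"PI3K"}), frozenset({"MAPK"}), frozenset({"PI3K", "MAPK"})]:
    print(sorted(drugs), simulate(drugs)["Proliferation"])
```

The mechanistic explanation is readable directly from the rules: the two pathways are redundant, so blocking one is compensated by the other.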

Fast and scalable machine learning approach for dynamic metabolic engineering

ABSTRACT. Metabolic engineering aims to produce chemicals in genetically modified microorganisms. Whereas traditional methods use constitutive or inducible promoters to express pathway enzymes, dynamic control methods aim to build regulatory circuits that respond to changes in cellular conditions. The implementation of these systems is costly and requires many trial-and-error iterations between system design and prototyping. At their core, these systems employ pathway intermediates to sense and actuate enzyme expression over the course of the culture. The key design challenge is to determine circuit architectures, i.e. which metabolite to sense and which enzymes to control, as well as the dose-response curves of the metabolite biosensors that improve yield with a moderate burden on the production host. Computational simulation using differential equation models provides a low-cost option for circuit design; however, existing methods cannot simultaneously optimize both architectures and continuous biosensor parameters.

Here we present a machine learning approach to rapidly explore and identify optimal gene circuits for metabolic control. We employ a Bayesian optimization framework, which is commonly employed for tuning deep learning algorithms, to find circuit architectures and biosensors that optimize relevant design objectives. We test our method on four different models of dynamically engineered pathways, including models of allosteric and reversible reactions, systems with multiple regulatory loci, and examples of nested metabolic and genetic control. We illustrate the efficiency and scalability of this method by applying it to study robustness to growth conditions, parameter sloppiness, kinetic perturbations, and chemical toxicity factors. This method can serve as a fast screening method for dynamic control architectures prior to experimental testing.

Reinforcement learning and Monte-Carlo tree search in gene expression enable the entire gene regulation in a cell

ABSTRACT. Expression of more than 10,000 genes is precisely controlled in a human cell while adapting to various situations. In systems biology, it remains unclear how genetic and epigenetic mechanisms cooperate to achieve quantitatively proper expression patterns. While recent progress in artificial intelligence indicates the importance of the learning process in complex systems, conventional biology prefers to reveal causal relationships rather than the learning process. In this study, by assuming that cells change the expression pattern if it is inappropriate, I theoretically show that the biological processes known as epigenetic regulation work as reinforcement learning and that the learning model reproduces the changes of expression of all genes in human cells. Performing Monte-Carlo simulations, I reveal how two kinds of agents autonomously approach a target ratio through repetitions of stochastic processes of increase and decrease in an agent-based model. The results show that the increase process should be competitive amplification with a small additive noise and the decrease process should be decay depending on the difference between current and target ratios of agent numbers. In this case, each unit simply changes with an equal probability. The following stochastic differential equation represents this reinforcement learning process: dx_i/dt = A x_i/∑x_j – E(x_i/∑x_j, T) x_i + B(i, x), where x_i is the number of agents of the i-th kind, the decay probability E is approximated from the mean squared error between the current ratio x_i/∑x_j and the target ratio T, and the bias B is a small noise. The ratio x_i/∑x_j autonomously approaches the ratio where E becomes smaller. Epigenetic regulation of two genes that have similar promoter sequences would be such a learning pair process, considering that the openness of chromatin increases the accessibility of a histone acetylase. The expression ratio of many genes can be controlled in a hierarchical architecture of these learning pairs.
A gene is selected by Monte-Carlo tree search through the hierarchical pairs by activating the pathway branches in competitive amplification. This resembles signal transduction. The model reproduces well the changes of whole-gene expression during human early embryogenesis, by setting the initial and target values in each pair. Gene expression changes during hematopoiesis are also reproduced in the same model without modifying any parameters other than the initial and target values. In this model for whole gene expression, epigenetic regulation as reinforcement learning is clearly distinguished from a genetically conserved hierarchical-pair architecture. Based on these findings, I propose the law of biological inertia, which means that a living cell basically maintains the expression pattern while metabolizing its contents and achieves learning ability, as represented by the above equation. This principle would give insights for understanding various complex systems.

Timelapse of Single Cell Lipidomics
PRESENTER: Paul Jonas Jost

ABSTRACT. In recent years, high-throughput single-cell technologies have rapidly developed and given entirely new aspects to the life sciences. Today, single-cell sequencing, single-cell mass cytometry, single cell lipidomics and multi-parametric imaging allow for an in-depth analysis of the state of a biological system and have vastly improved our understanding of complex biological processes. Yet, as available single-cell omics technologies provide only a single snapshot of individual cells -- as the cells are destroyed in the measurement processes -- our understanding of the dynamics of individual cells (e.g. time and dose response) is still severely limited.

Here, we propose a framework that allows for the analysis of the dynamics of a single cell from a single measurement time point. To achieve this counter-intuitive goal, we will combine a novel single-cell lipidomics approach with model-based analysis. Our lipidomics approach makes use of multiple labels, which are introduced at different time points or with different perturbations. Additionally, the modeling approach allows for a detailed insight into the dynamics of metabolites. Under mild assumptions, a single measurement of multiple labels can be mapped to e.g. multiple time points of a single dynamic, and thus provide multiple measurements of said dynamic. Therefore, a single destructive measurement can still provide information about multiple time points. We use synthetic data as a proof of concept and to show that we can accurately retrieve single-cell parameters, allowing us to quantify cell heterogeneity, while showing that non-identifiable parameters still provide robust dynamics.

Efficient brute-force model selection by iterative elimination of less useful model subspaces
PRESENTER: Dilan Pathirana

ABSTRACT. Model selection is a common task in systems biology, and more broadly statistical inference, wherein different models are compared to find the most useful model. This can often involve a single superset model, from which the model space is generated by disabling model components. The model space grows exponentially with the number of model components; hence, a brute-force strategy is often computationally infeasible. In such model spaces, the likelihood of any model is an upper bound on the likelihood of all subset models.

We will present algorithms that use this bound to efficiently eliminate less useful model subspaces, to find the most useful model in a model space more efficiently. The algorithms are suitable for use with model selection criteria, with specific formulations provided for the Akaike information criterion (AIC), the corrected AIC (AICc), and the Bayesian information criterion (BIC). We will also present application examples in systems biology.
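
The bound-based elimination can be sketched for the AIC. The criteria below use their standard definitions; the pruning rule is one plausible reading of the abstract, not the authors' algorithm, and the function names and example numbers are hypothetical. The idea is that no submodel can have a higher likelihood than its superset model, so the superset's log-likelihood combined with the smallest parameter count in a subspace gives a lower bound on the AIC of every model in that subspace.

```python
import math


def aic(log_lik: float, k: int) -> float:
    """Akaike information criterion for k estimated parameters."""
    return 2 * k - 2 * log_lik


def aicc(log_lik: float, k: int, n: int) -> float:
    """Small-sample corrected AIC for n data points (requires n > k + 1)."""
    return aic(log_lik, k) + 2 * k * (k + 1) / (n - k - 1)


def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian information criterion for n data points."""
    return k * math.log(n) - 2 * log_lik


def can_prune_subspace(
    superset_log_lik: float, min_params: int, best_aic_so_far: float
) -> bool:
    """Return True if no submodel in the subspace can beat the best AIC found so far.

    The lower bound combines the best possible fit (the superset's log-likelihood)
    with the smallest possible penalty (the fewest parameters in the subspace).
    """
    lower_bound = aic(superset_log_lik, min_params)
    return lower_bound >= best_aic_so_far


# Hypothetical numbers: a poorly fitting superset rules out all of its submodels,
# while a well-fitting one must still be explored.
print(can_prune_subspace(superset_log_lik=-50.0, min_params=1, best_aic_so_far=30.0))
print(can_prune_subspace(superset_log_lik=-10.0, min_params=1, best_aic_so_far=30.0))
```

The same bound carries over to AICc and BIC by swapping the penalty term, provided the penalty is evaluated at the smallest parameter count in the subspace.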

Candida expansion in the human gut is associated with an ecological signature that supports growth under dysbiotic conditions

ABSTRACT. The overgrowth of Candida species in the human gut is considered a prerequisite for invasive candidiasis. However, the reason that many individuals with high levels of gastrointestinal Candida do not develop systemic candidiasis is unclear. We integrated mycobiome and shotgun metagenomics data from stool of 75 patients at risk but with no systemic candidiasis, to determine the role of gut bacteria in shaping mycobiome composition. In addition, we developed machine learning models that used only bacterial taxa or functional relative abundances to predict the levels of Candida in an external validation cohort with an area under the curve of 78.6-81.1%. Last, we proposed a mechanism for Candida species overgrowth involving changes in short-chain fatty acid-producing bacteria and oxygen levels.

Variance of filtered signals: characterization for linear reaction networks and application to neurotransmission dynamics
PRESENTER: Ariane Ernst

ABSTRACT. Neurotransmission at chemical synapses relies on the calcium-induced fusion of synaptic vesicles with the presynaptic membrane. The distance to the calcium channels determines the release probability and thereby the postsynaptic signal. Suitable models of the process need to capture both the mean and the variance observed in electrophysiological measurements of the postsynaptic current. In this work, we propose a method to directly compute the exact first- and second-order moments for signals generated by a linear reaction network under convolution with an impulse response function, rendering computationally expensive numerical simulations of the underlying stochastic counting process obsolete. We show that the autocorrelation of the process is central for the calculation of the filtered signal’s second-order moments, and derive a system of PDEs for the cross-correlation functions (including the autocorrelations) of linear reaction networks with time-dependent rates. Finally, we employ our method to efficiently compare different spatial coarse graining approaches for a specific model of synaptic vesicle fusion. Beyond the application to neurotransmission processes, the developed theory can be applied to any linear reaction system that produces a filtered stochastic signal.

Analyzing human E3 ligome for efficient design of PROTACs
PRESENTER: Arghya Dutta

ABSTRACT. E3 ubiquitin ligases play a critical role in maintaining protein homeostasis through protein degradation via the ubiquitin–proteasome system. Importantly, they provide the crucial substrate specificity to target and degrade specific proteins. As a result, E3 ligases have become promising candidates in the design of novel therapeutics. Here we compile and annotate a consolidated dataset of E3 ligases to build the human E3-ligome. We then integrate disparate datasets at various granularity layers, such as protein sequence, domain architecture, 3D structure, function, cellular localization, and tissue expression for metric learning. The optimized distance metric reproduces experts' classification of known E3 ligases and expands it to previously uncharacterized ligases. Clustering using this optimized metric provides a better understanding of conserved and auxiliary features of unbiased E3 ligase classes and sub-classes, leading to the design of new E3 handles for proteolysis-targeting chimeras (PROTACs).

Mechanistic modeling of patient-specific response to hormone therapy and CDK inhibitors in breast cancer

ABSTRACT. The majority of breast carcinomas are hormone receptor-positive and negative for HER2. Hormonal therapy, which either inhibits the activity of the estrogen receptor or the production of estrogen, is the backbone of treatment for patients with ER-positive breast cancer. In addition, CDK4/6 inhibitors, a novel class of drugs that target the cell-cycle machinery, have shown improved results in combination with hormone therapies compared to hormone therapy alone. Yet, the response to these treatments remains heterogeneous between patients.

To analyze the patient-specific response to CDK4/6 inhibitors and hormone treatment, we built upon a mechanistic protein signaling model, which captures the core pathways affected by these drugs. The model is based on ordinary differential equations and individualized to cell lines and patients using gene expression data. We first calibrate the model on publicly available molecular data and viability assays from breast cancer cell lines. We then apply the calibrated model to simulate the response of breast cancer patients from the CORALLEEN clinical trial (NCT03248427), who were treated with the hormonal agent Letrozole and the CDK4/6 inhibitor Ribociclib. The simulated response correlates with proliferation after treatment and identifies patients at a higher risk of relapse, highlighting the usefulness of mechanistic models in studying patient response to breast cancer treatments.

Harnessing cancer heterogeneity for the systematic discovery of treatable cancer-driver exons with spotter

ABSTRACT. Alternative splicing shapes the regulatory and functional diversity in the cell. Cancer cells tend to select alternative splicing programs involved in tumor progression. However, while therapies based on targeting splicing events have been developed to treat cancer and other diseases, the systematic prioritization of potential disease-driver targets still remains unaddressed. Here, by using publicly available gene-level cancer dependencies from RNAi viability screens across 713 cancer cell lines, we define 140,310 exon-level linear models using splicing profiles and mRNA levels. We then identified cancer-driver exons as the ensemble of models that best prioritized experimental cancer dependencies across individual samples, which we call spotter. The 1,073 selected models corresponded to exons that mostly disrupt their gene's ORF or create new isoforms. These exons belong to genes related to the splicing machinery and cell proliferation and show a low rate of aberrant mutations. Interestingly, our ensemble model inferred the effects of single and multiple splicing perturbations on cell proliferation. Integrating pharmacological screens with our predicted splicing-level dependencies, we uncovered cancer-driver exons that mechanistically mediate drug sensitivity and synergize with drug effects. In patients, our ensemble model can not only aid the systematic prioritization of splicing targets across 14 different types of cancer but also identify putative splicing events driving patient response upon drug treatment or pinpoint susceptible splicing events at single-patient resolution. Taken together, in silico RNA isoform screening with spotter sheds light on the weak spots of cancer samples at the splicing level and holds the potential to be implemented for personalizing treatments.

A training strategy for hybrid models to break the curse of dimensionality: An application in mortality estimation for cohorts of COVID-19 patients
PRESENTER: Moein E. Samadi

ABSTRACT. A hybrid mechanistic/data-driven model combines mechanistic or physics-based equations that describe available process knowledge with data-driven approaches such as machine learning. Compared with purely machine learning models, hybrid models promise a low demand for training data alongside the ability to extrapolate beyond the validation data domain.

In this work, we introduce a supervised learning strategy for tree-structured hybrid models to perform a binary classification task. Given a set of binary labeled data, the challenge is to use them to develop a model that accurately assesses labels of new unlabeled data. Our strategy employs graph-theoretic methods to analyze the data and deduce a function that maps input features to output labels.

Our focus here is on data sets represented by binary features in which the label assessment of unlabeled data points is always extrapolation. Our strategy shows the existence of small sets of data points within given binary data for which knowing the labels allows for extrapolation to the entire valid input space. An implementation of our strategy yields a notable reduction of training-data demand in a binary classification task compared with different supervised machine learning algorithms.

As an application, we have fitted a tree-structured hybrid model to the vital status of a cohort of COVID-19 patients requiring intensive-care unit treatment and mechanical ventilation. Our learning strategy yields the existence of patient cohorts for whom knowing the vital status enables extrapolation to the entire valid input space of the developed hybrid model.

Deep insight into SABIO-RK data via visualization
PRESENTER: Dorotea Dudas

ABSTRACT. SABIO-RK (http://sabiork.h-its.org) is a manually curated database for biochemical reactions and their kinetic properties, with data extracted from scientific literature published from the 1960s to the present. The growing amount of data in the database requires a better overview of the database content, especially of the kinetic parameter distribution. To facilitate interactive search and data refinement in SABIO-RK, a new visualization module has been developed. Its goal is to improve the understanding of the database content and to detect possible discrepancies between kinetic parameters from different publication sources. It is meant to help both modellers and experimentalists extract the highest possible amount of information from accumulated and orderly presented data. Clustering and grouping of the data (e.g. kinetic parameters, EC numbers, environmental conditions) are implemented according to the needs of SABIO-RK users and curators. Curators can easily identify clusters or outliers of kinetic parameter values by navigating through the different visualizations. Reactions, proteins/enzymes, organisms, tissues and experimental conditions (pH and temperature) are included within three different visualization concepts: a heat map overview, parallel coordinates and a scatter plot matrix with histograms. Since each database entry can contain several kinetic parameters (with their types, values, units and associated species), they are shown in two separate visualizations. This improves the possibilities of exploring the kinetic data and their connections to the rest of the data in SABIO-RK. Data can be visually adjusted through the user interface by determining what exactly is shown within the graphs, by reordering the data and by selecting different color schemes for the visualizations.

The new visualizations enable navigation through the database with minimal prior knowledge, i.e. without the need to know the available keywords or to compose search queries manually, thus making the data accessible to a wider circle of users.

Improving drug-induced transcriptional descriptors and their biological connectivity

ABSTRACT. Compound bioactivity signatures allow the description of small molecules according to the biological effects that they exert, providing complementary opportunities to current drug discovery strategies. The Chemical Checker (CC) resource provides a rich collection of bioactivity signatures, including drug-induced transcriptional changes, enabling the assessment of functional similarities between small molecules. Here, we present a novel characterization of small molecules based on the gene expression changes that they induce in different cell lines. By recomputing the gene expression signatures of the compounds and filtering out the non-robust ones, we show how our bioactivity signatures better characterise and preserve the biological coherence of the raw data, improving on those in the Chemical Checker repository. Finally, we train a neural network to integrate any novel differential gene expression experiment with the corpus of available drug-induced gene expression signatures, connecting biology and chemistry.

Dynamics of growth and ribosome level in batch cultures of S. cerevisiae
PRESENTER: Yu Huo

ABSTRACT. Cells double their proteins during each cell cycle, and ribosomes are responsible for this protein production. Previous research has shown a linear correlation between the mass fraction of ribosomes and the specific growth rate during exponential growth in both E. coli and S. cerevisiae. Nevertheless, how the levels of ribosomes change and how those changes couple to growth rate beyond the exponential stage are unclear. Here, we monitor the dynamics of both growth and ribosome levels in S. cerevisiae using microplate readers and estimate the effective translation rate over time. We show that the effective translation rate remains constant during the early phase of growth on various sugars, does not change under energy stress imposed by a weak acid, yet decreases when ribosome-targeting antibiotics are present. Our results suggest the existence of an empirical upper limit to the effective translation rate and provide details of the ribosomal dynamics of yeast in batch cultures, which should allow us to extend existing self-replicator models beyond steady-state conditions.
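As a rough illustration of the quantity discussed above, the sketch below treats the effective translation rate as the specific growth rate divided by the ribosome mass fraction. This definition, the function names, and the numbers are assumptions made for illustration only; the abstract does not state the estimator actually used.

```python
import math

def specific_growth_rates(times_h, od):
    """Finite-difference estimate of d(ln OD)/dt for each pair of consecutive reads."""
    rates = []
    for i in range(1, len(od)):
        dt = times_h[i] - times_h[i - 1]
        rates.append((math.log(od[i]) - math.log(od[i - 1])) / dt)
    return rates

def effective_translation_rates(times_h, od, ribosome_fraction):
    """Growth rate per unit ribosome mass fraction for each interval.

    Assumption: the ribosome fraction of an interval is the average of the
    values measured at its two end points.
    """
    mu = specific_growth_rates(times_h, od)
    rates = []
    for i, m in enumerate(mu):
        phi = 0.5 * (ribosome_fraction[i] + ribosome_fraction[i + 1])
        rates.append(m / phi)
    return rates

# Made-up example: exponential growth at 0.4 per hour, constant ribosome fraction 0.2.
times = [0.0, 1.0, 2.0, 3.0]
od = [0.1 * math.exp(0.4 * t) for t in times]
ribo = [0.2, 0.2, 0.2, 0.2]
rates = effective_translation_rates(times, od, ribo)
```

With these made-up inputs the estimate is constant across intervals, which mirrors the "constant during the early phase" behaviour described in the abstract.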

Biological-informed multi-modality integration of unmatched single-cell-omics datasets to provide a multi-dimensional view
PRESENTER: Lea Seep

ABSTRACT. Most of today’s research projects generate rich sets of experimental data to capture an in-depth state of the experimental system. Single-cell techniques in particular capture heterogeneous cell states, focusing mostly on single but vastly differing characteristics of such states (also termed modalities), e.g. single-cell transcriptomics or proteomics.

Various new approaches exist for the integration of such data, distinguishing between methods in need of matched data (matching referring to different modalities measured within the same cell) and methods trying to perform such matching initially. The integration of different modalities enables a multi-layered view of a single cell’s state within a heterogeneous population. These approaches are mostly based on mathematical and machine learning techniques, such as a latent space alignment within an autoencoder framework, which are in principle able to follow the non-linear path of the transcription, translation, and regulation processes within a cell. However, the vast majority of generated data is unmatched and the methods widely used within the community lack the explicit incorporation of biological constraints.

Here we propose a scalable deep-learning approach that enables the integration of different data modalities and additionally employs model-based priors to account for complex relationships between modalities within cells. The use of deep autoencoders allows integration of unmatched data. Biological priors are derived from the biological relation between the differing modalities, e.g. mRNA and protein measurements, and enforce a biologically informed alignment of the latent space. This allows a biologically meaningful outcome of the single-cell multi-modality integration.

The subsequent assessment of the trained model using interpretable artificial intelligence techniques provides novel mechanistic insights into the relationships between integrated modalities. Finally, the use of autoencoders allows the translation of modalities into each other, which helps to close gaps due to limited data availability.

Hybrid cellular automaton representation of the influence of mechanical stimulation on the development of bone metastases
PRESENTER: Claire Villette

ABSTRACT. Bone metastases (BMs) are among the most debilitating complications for cancer patients. They are associated with poor prognosis and are often incurable. BMs develop through cancer-induced perturbation of the inherent bone remodelling process, which is responsible for healthy bone integrity through balanced resorption of old/damaged bone and formation of new tissue. Osteolytic BMs interfere with this balance in a vicious cycle whereby cancer cells favour bone resorption. Growth factors are released from the degraded matrix and enhance tumour growth, which in turn intensifies bone resorption. Mechanical loading naturally induces an opposite shift to the remodelling balance by stimulating bone apposition. Early in-vitro and in-vivo experiments suggest a therapeutic potential for mechanical stimulation against metastases in bone [1].

The aim of this study was to develop a computational model of load-induced bone remodelling in the context of cancerous metastases, in order to screen for loading regimens with potential therapeutic benefits. This model aimed to recapitulate five main processes at play in this context: differentiation and/or proliferation of healthy cells (osteogenic cells and osteoclasts) and cancer cells, osteogenic/osteolytic activity of healthy cells, influence of cancer cells on healthy cell activity, influence of mechanical stimulation on cellular activity, and changes in mechanical environment due to osteogenesis/osteolysis.

A hybrid cellular automaton framework was implemented in FEniCSx to support this model. Cellular events (proliferation, differentiation, migration) were modelled using a cellular automaton on a regular grid of 10 micrometer resolution. In parallel, separate partial differential equation (PDE) problems were defined to represent the mechanical environment in response to loading and the diffusion of osteoprotegerin (OPG), receptor activator of NF-κB ligand (RANKL), and parathyroid hormone-related protein (PTHrP) signals. The PDEs were solved using Finite Element solvers on linear elements with a mesh resolution of around 5 micrometers. Osteogenic cell secretion of OPG increased in response to increased von Mises stress. In response to PTHrP signals, OPG secretion reduced, and osteolytic activity increased. Osteogenic and osteolytic activities resulted in changes in the system mechanical properties, which in turn influenced local von Mises stress.

This model was evaluated against qualitative observations from in-vitro experiments. It proved capable of capturing changes in extra-cellular matrix deposition by osteoblasts seeded in hydrogel in response to loading, as well as changes in OPG/RANKL expression ratio in response to loading and presence of cancer cells [2, 3].

Next development steps include quantitative model calibration and simulation of different mechanical loading scenarios for comparison of therapeutic benefits in terms of cancer cell proliferation and activity.

[1] Lynch et al., 2013. Journal of Bone and Mineral Research, 28(11), pp.2357-2367. [2] Mc Garrigle et al., 2016. Eur Cell Mater, 31, pp.323-340. [3] Curtis et al., 2020. Journal of the Royal Society Interface, 17(173), p.20200568.

A framework for predicting the effect of multiple factors on HIV protection from heterosexual transmission through truvada-based pre-exposure prophylaxis (PrEP) in women

ABSTRACT. Despite intensive interventions, the HIV epidemic continues to spread, with around 1.5 million new infections in 2020. The Sub-Saharan region accounts for around half of worldwide infections, of which 60% are in women and young girls. To tackle this crisis, existing antivirals have been repurposed to prevent new infections in healthy individuals, a strategy named pre-exposure prophylaxis (PrEP). The general consensus reports high efficacy of Truvada when administered once daily. However, PrEP has been widely associated with suboptimal adherence. Many studies have reported adequate levels of protection with “on-demand” adherence, consisting of circa 4 doses per week, in men. This seems not to be the case for women, where only higher levels of adherence are associated with satisfactory protective levels. There are no available data that highlight the underlying causes leading to such different efficacy outcomes for “on-demand” PrEP in women. The lack of global consensus on PrEP guidelines for specific risk groups highlights the need to further investigate such underlying mechanisms.

In this work, we present a framework for simulating estimates of prophylactic efficacy against heterosexual transmission through Truvada-based PrEP, integrating a set of factors that could play a role in the efficacy outcomes of PrEP in women. Realistic adherence patterns from women are used to simulate pharmacokinetic trajectories. The framework further allows the integration of different exposure routes (receptive anal intercourse (RAI) and/or receptive vaginal intercourse (RVI)). The generated pharmacokinetic trajectories can be adjusted using data extracted from the literature, in an attempt to represent local concentrations (e.g. cervicovaginal tissue concentrations). A model for the molecular mechanism of action of emtricitabine triphosphate (FTC-TP) and tenofovir diphosphate (TFV-DP) is used to estimate the direct drug effect of the combination therapy on the viral dynamics.

Altogether, these components are integrated using a recently developed numerical method, which ultimately allows the prophylactic efficacy (the relative reduction in infection probability per exposure) to be computed. These factors can be tested in combination, making it possible to examine their co-occurrence and its implications. For instance, adherence-prophylactic profiles can be generated and used to derive a statistical estimate of the prophylactic efficacy in a population of individuals who take PrEP according to particular adherence patterns. This analysis can be further extended by considering tissue-specific drug concentrations and/or exposure site (RAI and/or RVI), illustrating how this would translate into PrEP efficacy. Preliminary results highlight time windows of highly variable protection (ranging from 30% to 90% prophylactic efficacy). These time windows appear when ≥ 3 consecutive doses are missed prior to viral exposure.
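The definition of prophylactic efficacy given above (relative reduction in infection probability per exposure) can be written out as a minimal sketch. The function names and the probabilities are hypothetical and chosen for illustration; the numerical method that produces the infection probabilities in the framework is not reproduced here.

```python
def prophylactic_efficacy(p_inf_with_prep, p_inf_without_prep):
    """Relative reduction in per-exposure infection probability.

    Returns 0.0 when PrEP gives no protection and 1.0 when it fully protects.
    """
    if not 0.0 < p_inf_without_prep <= 1.0:
        raise ValueError("baseline infection probability must be in (0, 1]")
    if not 0.0 <= p_inf_with_prep <= p_inf_without_prep:
        raise ValueError("probability with PrEP must lie in [0, baseline]")
    return 1.0 - p_inf_with_prep / p_inf_without_prep

def summarize_population(p_with_prep_per_person, p_inf_without_prep):
    """Mean, minimum and maximum efficacy over simulated individuals."""
    values = [prophylactic_efficacy(p, p_inf_without_prep)
              for p in p_with_prep_per_person]
    return {"mean": sum(values) / len(values),
            "min": min(values),
            "max": max(values)}

# Hypothetical per-exposure infection probabilities for three simulated individuals.
baseline = 0.01
summary = summarize_population([0.001, 0.004, 0.007], baseline)
```

In this made-up example the individual efficacies are roughly 90%, 60% and 30%, which spans the range of variable protection quoted in the abstract.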

With this comprehensive framework, the interplay of multiple factors and their effect on PrEP efficacy in women will be simulated and presented to give an overview of how different hypotheses could play a role on their own and/or in combination.

Investigating antagonistic chromatin remodeling by the Swi-Snf and Tup1-Cyc8 (Ssn6) complexes in yeast

ABSTRACT. Swi-Snf is an ATP-dependent chromatin remodelling complex which generally acts as a co-activator of gene transcription via its removal of promoter nucleosomes. Conversely, Tup1-Cyc8 (Ssn6) is a co-repressor complex which acts to repress transcription by positioning nucleosomes at gene promoters. The antagonistic activity of these two complexes has been investigated at only a handful of genes, including the FLO1 and SUC2 genes. We have identified all of the genes in Saccharomyces cerevisiae which are subject to co-regulation by these two complexes and have mapped the Snf2 and Tup1 proteins across the genome to identify genes directly under the control of Swi-Snf and Tup1-Cyc8. The impact upon chromatin structure of target genes by Tup1-Cyc8 and Swi-Snf has also been shown. The co-regulated genes are enriched for stress-response genes, and 30% of these genes reside in subtelomeric regions. Furthermore, the co-regulated subtelomeric genes are the most robustly regulated genes under the control of these two complexes. The data has revealed two potential models for the chromatin remodelling activities by Swi-Snf and Tup1-Cyc8 at co-regulated genes. In one model, Swi-Snf is recruited to Tup1-Cyc8 repressed genes in the absence of the co-repressor to activate transcription. In the second model, Snf2p and Tup1p both occupy the repressed gene, whilst gene activation correlates with an enrichment of Snf2p at target genes in the absence of Tup1. Thus, this study has identified (i) which genes are under control of the Swi-Snf activator and the Tup1-Cyc8 co-repressor, (ii) where Snf2p and Tup1p are located across the genome, and (iii) how these complexes remodel the chromatin at target genes.

Theory of biochemical information processing with transients
PRESENTER: Manish Yadav

ABSTRACT. Cells in tissues and organisms operate in dynamic environments, continuously sensing and responding to time-varying chemical signals. In order to accurately interpret the complex information from their environment, biochemical networks in single cells actively process these extracellular signals in real-time. The current concept of biochemical computations places a strong focus on attractor-based information processing in cells. Recent studies, however, have shown that cells generate completely opposite phenotypic responses depending upon the frequency of the growth factor, independent of growth factor identity. This breaks down the steady-state description of biochemical information processing. Therefore, we propose to describe biochemical networks embedded in non-stationary environments as non-autonomous systems whose solutions are the dynamic input-dependent trajectories. We show that memory arising through metastable states at the level of the input layer of the biochemical network enables the system to integrate time-varying signals such that inputs resulting in different phenotypic responses are uniquely encoded in phase-space trajectories. The extracellular information of different phenotypes is spread throughout the large signaling networks and represented by characteristically different classes of phase-space trajectories. This encoded information is then decoded downstream by early response genes (ERG) in real-time, and we show that the feed-forward structure of ERG is sufficient for this task.

An optimal RNA growth law and its relationship with genome organization in bacteria
PRESENTER: Xiao-Pan Hu

ABSTRACT. The distribution of cellular resources across bacterial proteins has been quantified through phenomenological growth laws; for example, the content of ribosomal proteins increases linearly with growth rate. Here, we describe a complementary bacterial growth law for RNA composition, emerging from optimal cellular resource allocation across ribosomes and the complex of tRNA and elongation factor Tu. The predicted decline of the tRNA/rRNA ratio with growth rate agrees quantitatively with experimental data for diverse fast-growing microbes. We find that its regulation is implemented in part through chromosomal localization: rRNA genes are typically closer to the origin of replication than tRNA genes; due to replication-associated gene dosage effects, rRNA genes thus show increasingly higher relative gene dosage at faster growth. At the highest growth rates in E. coli, the tRNA/rRNA gene dosage ratio based on chromosomal positions is almost identical to the observed – and theoretically optimal – tRNA/rRNA expression ratio, indicating that the chromosomal arrangement has evolved to favor maximal transcription of both types of genes at this condition. These insights, which quantify the links between cellular resource allocation, growth, and genome organization, may aid in the rational genomic design of efficient synthetic biological systems.
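The replication-associated gene dosage effect mentioned above can be illustrated with the classic Cooper-Helmstetter expression, in which the average copy number of a gene at relative position m (0 at the origin, 1 at the terminus) is 2^((C(1-m)+D)/tau) for a doubling time tau. The sketch below is an illustration under that textbook model, not the authors' calculation; the C and D periods are held constant and the gene positions are made up.

```python
def gene_dosage(position, doubling_time_min, c_period_min=40.0, d_period_min=20.0):
    """Average gene copies per cell (Cooper-Helmstetter model).

    position: 0.0 at the replication origin, 1.0 at the terminus.
    Assumption: C and D periods do not change with growth rate.
    """
    exponent = (c_period_min * (1.0 - position) + d_period_min) / doubling_time_min
    return 2.0 ** exponent

def trna_rrna_dosage_ratio(trna_position, rrna_position, doubling_time_min):
    """Relative tRNA/rRNA gene dosage for one representative gene of each type."""
    return (gene_dosage(trna_position, doubling_time_min)
            / gene_dosage(rrna_position, doubling_time_min))

# Hypothetical positions: rRNA gene near the origin, tRNA gene further away.
slow = trna_rrna_dosage_ratio(0.5, 0.1, doubling_time_min=60.0)
fast = trna_rrna_dosage_ratio(0.5, 0.1, doubling_time_min=20.0)
```

Because the rRNA gene sits closer to the origin, the ratio is below one and falls further as the doubling time shortens, which is the qualitative trend described in the abstract.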

Decrypting drug actions and protein modifications by dose- and time-resolved proteomics
PRESENTER: Florian P Bayer

ABSTRACT. Most drugs act on proteins, are proteins themselves, lead to the production or degradation of proteins, or otherwise use the protein machinery of a cell to exert their therapeutic effects. Protein post-translational modifications (PTMs) can serve as molecular markers for target engagement, can identify drug-modulated pathways, and can highlight downstream responses of the cell. However, little systematic information is available about a drug's mode of action (MoA) at the level of PTMs, even though many drugs, such as kinase inhibitors, work by modulating the activity of enzymes that regulate PTMs. Here, we present a quantitative chemical proteomics approach, termed decryptM, that is able to assess target and pathway engagement by measuring dose-resolved modulation of PTMs on a proteome-wide scale. Briefly, cells were treated with increasing doses of a drug, and each perturbed proteome was encoded by stable isotopes (TMT). Tryptic peptides bearing PTMs were enriched and then measured by liquid chromatography tandem mass spectrometry (LC-MS/MS). The dose-response characteristics for each PTM peptide were derived from the intensities of the TMT reporter ions. Non-linear regression of a 4-parameter sigmoidal curve yielded significant effective concentration (EC50) values, effect sizes, and effect direction, which, taken together, are more informative than a conventional fold-change analysis with a single-dose vs. control setup. In total, we collected data for 31 drugs representing six drug classes in 14 human cell lines. The presented data set comprises more than 1.8 million quantitative assays of regulated and unregulated PTMs. The quantitative precision of the approach allows the interpretation of a single significant PTM drug response. Additionally, unregulated PTMs are equally informative for understanding drugs and pathways in cells systematically. DecryptM profiles could group drug-regulated PTMs into pathways. 
Furthermore, the correspondence between drug-target affinity and drug-PTM potency was preserved, showing that target engagement and pathway engagement are intimately linked. This close coherence raises the possibility of placing functionally uncharacterized PTMs into known pathways. Comparative analysis of 10 kinase inhibitors in the same cell line revealed clear dose-driven signatures of drug response on oncogenic pathways, identifying common and drug-specific effects. DecryptM signatures are also cell-type specific, highlighting the molecular heterogeneity of cancer. For epigenetic drugs, decryptM profiling highlighted regulated acetylation events that enabled delineating the different activities and substrate specificities of HDAC complexes. The analysis of protease inhibitors at the level of the ubi- and phospho-proteomes revealed mechanisms leading to the activation of the unfolded protein response system over time. This presentation will highlight both technical and biological aspects of our novel mass spectrometry-based approach, DecryptM, and outline why it constitutes an important new tool for drug discovery and chemical systems biology in general.
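The 4-parameter sigmoidal dose-response curve mentioned in this abstract can be sketched as follows. This is an illustration only: the parameterization (bottom, top, EC50, slope) is the common four-parameter logistic form, and the fit uses a coarse grid over EC50 and slope with a closed-form least-squares solution for the two plateaus, which is a simple stand-in for the non-linear regression named in the abstract rather than the decryptM implementation. All data are synthetic.

```python
def four_pl(dose, bottom, top, ec50, slope):
    """Four-parameter logistic curve: 'top' at zero dose, 'bottom' at saturating dose."""
    return bottom + (top - bottom) / (1.0 + (dose / ec50) ** slope)

def fit_four_pl(doses, responses, ec50_grid, slope_grid):
    """Grid search over (ec50, slope); plateaus solved by linear least squares."""
    n = len(doses)
    mean_y = sum(responses) / n
    best = None
    for ec50 in ec50_grid:
        for slope in slope_grid:
            # For fixed ec50 and slope the model is linear: y = a + b * g(dose).
            g = [1.0 / (1.0 + (d / ec50) ** slope) for d in doses]
            mean_g = sum(g) / n
            sgg = sum((gi - mean_g) ** 2 for gi in g)
            if sgg == 0.0:
                continue
            sgy = sum((gi - mean_g) * (yi - mean_y) for gi, yi in zip(g, responses))
            b = sgy / sgg            # equals top - bottom
            a = mean_y - b * mean_g  # equals bottom
            sse = sum((a + b * gi - yi) ** 2 for gi, yi in zip(g, responses))
            if best is None or sse < best[0]:
                best = (sse, a, a + b, ec50, slope)
    sse, bottom, top, ec50, slope = best
    return {"bottom": bottom, "top": top, "ec50": ec50, "slope": slope, "sse": sse}

# Synthetic, noise-free example: a PTM peptide down-regulated with EC50 = 100 nM.
doses_nm = [1.0, 3.0, 10.0, 30.0, 100.0, 300.0, 1000.0, 3000.0, 10000.0]
observed = [four_pl(d, 0.2, 1.0, 100.0, 1.0) for d in doses_nm]
ec50_grid = [10.0 ** (k / 4.0) for k in range(0, 17)]   # 1 nM to 10 uM
fit = fit_four_pl(doses_nm, observed, ec50_grid, [0.5, 1.0, 1.5, 2.0])
```

The effect size (top minus bottom) and its sign give the effect direction, so a single fit yields the three quantities the abstract lists: EC50, effect size, and direction.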

Machine learning based pathway deregulation analysis of metabolomics data for Parkinson's Disease

ABSTRACT. Parkinson’s Disease (PD) is a complex and heterogeneous disorder, influenced by both genetic and environmental factors. Accurate diagnosis of PD is still a challenge and even after the onset of clinical motor symptoms, misdiagnoses can still occur. Machine learning analysis of blood plasma metabolomics data may provide a means to identify molecular signatures associated with PD diagnostic status.

Here, the goal was to build machine learning models for motor-stage PD vs. control classification which are both robust and biologically interpretable. We investigated global cellular pathway alterations of metabolomics data through multiple aggregation statistics and dimension reduction methods, which summarize the abundance information from pathways’ metabolite members into global fingerprints of pathway activity. These pathway activity fingerprints were then cross-validated and tested on hold-out data as predictors for classification of PD patients and controls using machine learning methods, while accounting for common confounders. We compared the resulting models’ predictive performance and most informative features derived from both the pathway-based data representations and the original metabolite features.
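A minimal sketch of the aggregation step described above: metabolite abundances are summarized per pathway with a chosen statistic to give one "fingerprint" value per pathway and sample. The pathway definitions, metabolite names, and values are hypothetical, and mean and median stand in for the "multiple aggregation statistics" the abstract mentions without naming.

```python
from statistics import mean, median

def pathway_fingerprint(sample, pathways, statistic=mean):
    """Aggregate metabolite abundances into one activity score per pathway.

    sample:   dict mapping metabolite name -> (e.g. z-scored) abundance
    pathways: dict mapping pathway name -> list of member metabolites
    Members that were not measured in the sample are skipped.
    """
    fingerprint = {}
    for name, members in pathways.items():
        values = [sample[m] for m in members if m in sample]
        if values:
            fingerprint[name] = statistic(values)
    return fingerprint

# Hypothetical z-scored abundances for one plasma sample.
sample = {"glucose": 1.2, "lactate": -0.4, "pyruvate": 0.2, "citrate": 0.5}
pathways = {
    "glycolysis": ["glucose", "pyruvate", "lactate"],
    "tca_cycle": ["citrate", "pyruvate", "succinate"],  # succinate not measured
}
by_mean = pathway_fingerprint(sample, pathways, statistic=mean)
by_median = pathway_fingerprint(sample, pathways, statistic=median)
```

The resulting per-pathway values can then be used as classifier features in place of, or alongside, the original metabolite features.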

Overall, our results suggest that blood plasma metabolomics data contains significant predictive information for sample classification and that a pathway-based modeling approach can reveal robust and interpretable global deregulations in cellular processes. Additional targeted measurements of the observed metabolite alterations and further validation on independent studies will be needed to corroborate the results.

Modeling the tumor microenvironment in patient-derived organoid culture

ABSTRACT. Patient-derived organoids are a model of choice to elucidate inter- and intratumoral heterogeneity to combat therapy resistance. However, their utility is limited by heterologous and poorly-defined extracellular matrices and lack of proper tumor microenvironment, thus failing to model the tumor in its complexity.

Here, we present an approach to identify relevant paracrine interactions between stromal and tumor cells in colorectal cancer. Single cell-RNAseq data of 12 patients were analyzed for ligand-receptor pairs enabling stroma-to-tumor signaling. Physiological relevance was tested by adding stroma-derived ligands to the organoid culture, followed by mass cytometry and scRNAseq analysis. We also aimed to model extracellular matrix composition in colorectal cancer by supplementing the laminin/collagen IV rich environment with other known matrix proteins such as collagen I to identify the impact of a changing substrate on cell plasticity.

We identified paracrine factors and signals affecting proliferation, differentiation, and developmental trajectories of patient-derived organoids in vitro. We hypothesize that environmental factors may limit the phenotypic space in which organoid cells differentiate, disabling the study of more invasive behaviors in vitro. We show that extracellular matrix parameters have a strong impact on cell plasticity and highlight the importance of adjusting and expanding organoid in vitro culture models.

Our data provide guidelines to improve existing tumor organoid models and provide a feasible approach to address common limitations in organoid culture. Based on our findings, we currently identify factors that can interfere with drug efficacy and potentially favor clinically relevant therapy resistance mechanisms.

Accelerating whole-cell modelling with surrogate machine learning models
PRESENTER: Ioana Gherman

ABSTRACT. Whole-cell models are mathematical models designed to capture the function of all genes and core processes within a cell. Developing whole-cell models is seen as a grand challenge of the 21st century, and although explored for over a decade, only two partially complete models have been published to date, for the bacteria Mycoplasma genitalium and Escherichia coli. The interest in whole-cell models stems from their ability to provide an integrated picture of diverse processes within a cell, uncover novel cellular phenotypes, and understand the behavior of engineered cells (e.g., those containing new metabolic pathways or having genes knocked out) for biotechnology purposes (e.g., bioproduction). A major bottleneck when running whole-cell models is the computational demand of simulations. A typical scenario would be to run the tens of thousands of simulations required to understand the effect of changes to a cell and to engineer applications like genome design, where we attempt to augment or alter the core functionalities of the cell. Such an experiment would require several months to run on a supercomputer. Here, we aim to address such computational challenges by building a ‘surrogate’ of a whole-cell model that uses a machine learning approach to reduce the number of simulations needed to predict specific phenotypes from genotypes. We explore the feasibility of using supervised and unsupervised learning in the context of surrogates for whole-cell models. Preliminary results demonstrate that these algorithms can both speed up simulations in specific circumstances and be used to uncover interesting dynamics of cellular phenotypes that would be nearly impossible to assess with current experimental methods. Surrogate models may hold the key to making whole-cell modelling practical for studying cellular biology and bioengineering with constrained computational resources. 
Furthermore, they may help to improve the accessibility of this powerful modelling technique (i.e. whole-cell modelling).

A systems pharmacology approach reveals robust drug metabolism and altered glucuronide disposition in a mouse model of liver cirrhosis
PRESENTER: Rebekka Fendt

ABSTRACT. Liver cirrhosis impairs the liver’s function and alters drug absorption, distribution, metabolism, and excretion (ADME). Therefore, drug doses for patients with liver cirrhosis might need adjustment to ensure efficacious and safe pharmacotherapy. However, the effect of cirrhosis on pharmacokinetics (PK) is not fully understood. We investigated PK in a mouse model of liver cirrhosis with a systems pharmacology approach consisting of physiologically based pharmacokinetic (PBPK) model predictions and experimental validation.

Liver cirrhosis in mice was induced by repeated injections of carbon tetrachloride (CCl4, twice per week). After 12 months, the mice were administered a drug cocktail of caffeine, codeine, midazolam, pravastatin, talinolol, and torsemide. The drugs served as probes for the metabolic enzymes Cyp1a2, Cyp2d22, Cyp3a11, and Cyp2c29, and the drug transporters Oatp1b2 and Mdr1. PBPK models were established for all compounds and applied to simulate reduced drug metabolism, altered drug transport, and further cirrhosis-associated pathophysiologies.

The expression of CYP1A, a marker for liver function, was reduced in liver sections of CCl4-treated mice. PBPK model simulations with reduced metabolic enzyme activity predicted increased parent drug concentrations and reduced production of metabolites. Surprisingly, the PK of most drugs was not significantly altered in cirrhotic mice and in vitro assays of liver microsomes also suggested functional drug metabolism.

Furthermore, RNA expression of the drug transporters Oatp1b2 and Mdr1 in the livers of CCl4-treated mice was significantly altered. The pravastatin and talinolol PBPK models predicted only a minor influence of the altered transporter expression on PK, which was in line with the observed data.

However, concentrations of glucuronidated metabolites formed in phase 2 of the biotransformation were increased in cirrhotic mice. We hypothesized that either (1) glucuronosyltransferase activity was increased, (2) biliary excretion was impaired, or (3) basolateral export was increased. PBPK simulations showed that all three mechanisms could explain altered glucuronide disposition. Experiments revealed increased RNA expression of basolateral glucuronide transporters. Therefore, we concluded that increased basolateral export probably caused altered glucuronide disposition.

The CCl4 mouse model recapitulated many features of liver cirrhosis, but drug metabolism was surprisingly robust. In this respect, the mouse model for cirrhosis might differ from patients, who often show reduced metabolic clearance. On the other hand, PK in liver cirrhosis patients is also commonly less affected than predicted by PBPK simulations, which might indicate compensatory mechanisms [1].

Experiments and PBPK modeling mutually contributed to a deeper understanding of pharmacokinetics in a mouse model of liver cirrhosis. PBPK modeling linked altered expression to functional impact and helped to explore scenarios that could not readily be tested experimentally.

[1] Heimbach, T., et al., Physiologically-Based Pharmacokinetic Modeling in Renal and Hepatic Impairment Populations: A Pharmaceutical Industry Perspective. Clin Pharmacol Ther, 2021. 110(2): p. 297-310.

Predicting developmental states in zebrafish using transfer learning

ABSTRACT. Understanding how cells make decisions and change over time is an important question in developmental biology. Recent advances in single-cell technologies allow for the thorough and unbiased characterization of molecular states across developmental stages. Yet, these techniques can only provide static snapshots of the cellular dynamics, revealing a ‘cell state’ in gene expression space. Many computational methods for trajectory inference have been developed to construct a pseudo-temporal ordering of the cells according to their transcriptomic profiles. But these approaches are descriptive in nature and unable to produce in-sample or out-of-sample predictions. Recently, generative models have shown great success in out-of-sample predictions, but remain limited to perturbation response and batch removal. To address these limitations, we present Dcp (deep cell predictor), a transfer learning approach based on a variational autoencoder and normalizing flows over single-cell transcriptomic data. Dcp models cell transitions in distinct lineages during early zebrafish development. We show that the model accurately predicts gene expression changes across developmental stages. We apply Dcp to embryonic development and to an adult stem cell system. Further, we demonstrate that the predictability of cell states depends upon shared information between lineages in a biological system.

Expanding the disease network of Glioblastoma Multiforme via topological analysis
PRESENTER: Apurva Badkas

ABSTRACT. Even among cancers, Glioblastoma Multiforme (GBM) is a challenge. Classified as a grade 4 glioma, it is one of the most common forms of brain cancer, with poor prognosis and limited therapy options. Understanding the molecular players causing the underlying heterogeneity is a key step in expanding the therapeutic arsenal for GBM. Several computational methods have explored GBM; however, these are top-down approaches that are limited by the challenge of obtaining an adequate number of disease and control datasets and that require comprehensive data integration and batch correction efforts. This study presents a complementary, bottom-up network approach based on minimal inputs and two centrality measures, betweenness and eigenvector centrality. Using a publicly available protein-protein interaction (PPI) dataset, the method corrects for the degree bias commonly encountered in network analysis methodologies. It highlights several topologically important nodes in the periphery of the known GBM genes. Of the 36 top-ranked genes, 26 have been linked to glioma/GBM in the literature. Several of these candidates are also found to be differentially expressed between other gliomas and GBM. The method thus proposes an expanded list of GBM-associated genes. Additionally, some of the highlighted candidates are known drug targets. Thus, establishing the role of these candidates in GBM patients can help expand the available drug repertoire for GBM.
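
As an illustration of the two centrality measures named above, the following minimal sketch computes betweenness centrality (Brandes' algorithm) and eigenvector centrality (power iteration) on a small, made-up undirected graph. The toy graph, the function names, and the node labels are assumptions for illustration only; the sketch does not reproduce the study's PPI dataset or its degree-bias correction, which the abstract does not specify.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness for an unweighted, undirected graph (Brandes)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                   # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                   # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted from both endpoints.
    return {v: c / 2.0 for v, c in score.items()}

def eigenvector_centrality(adj, iterations=500, tol=1e-10):
    """Power iteration on (A + I); the shift avoids oscillation on bipartite graphs."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        new = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        norm = sum(val * val for val in new.values()) ** 0.5
        new = {v: val / norm for v, val in new.items()}
        if max(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x

# Hypothetical toy network: A - B - C, with C also linked to D and E.
toy = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D", "E"], "D": ["C"], "E": ["C"]}
bc = betweenness(toy)
ec = eigenvector_centrality(toy)
```

In this toy graph the hub C scores highest on both measures, while B ranks above the leaf nodes mainly because it bridges A to the rest of the network.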

Single cell pseudo-time inference based on copy number variants for identifying a tumor transition state
PRESENTER: Jonghyun Lee

ABSTRACT. Recent in-depth pan-cancer analysis revealed that there is no apparent universal cause of cancer; driver mutations and the rate of mutation accumulation are specific to the tissue and organ from which the cancer originates. Nonetheless, transformation events must have occurred to turn a healthy tissue into a tumor. We focused on a universal framework for this normal-to-cancer transition in which various causes of cancer are implicitly included, from which a transition state between healthy and tumor can be identified and potential targets for anti-cancer treatments can be predicted. There have been extensive studies on transition states for cellular development, differentiation and reprogramming. Typical pseudo-time analysis based on single-cell transcriptomic data infers the trajectory of cellular development from stem cells to differentiated states, and extended concepts such as RNA velocity indicate the direction in which the cells are likely to progress. However, while the differentiation process occurs along the same genetic background, tumorigenesis progresses along with the accumulation of genetic changes. Therefore, transcriptional profiles are insufficient for describing the transition from normal to tumor cells. One of the well-established factors regarding the genetic changes during tumorigenesis is the accumulation of aneuploidy. Copy number variation (CNV) inference algorithms such as InferCNV and CopyKat deduce the aneuploidy from the single-cell expression data. Based on these algorithms, one can not only differentiate tumor cells from normal cells but also uncover an intermediate clone that consists of a mixture of normal and tumor cells, that is, a tumor transition state. By utilizing CNV accumulation as a means to identify the transition state, we were able to construct and define transition states in three different types of cancer: breast, lung and colon.
We describe the CNV profiles of the transition states in different cancers, and the genes affected by these aneuploidies. Our algorithm identifies a group of cells that were previously uncharacterized in cancer research. Possible drug targets identified from cells in the transition state during tumorigenesis have strong implications for cancer prevention, at both onset and remission, and are a first step toward personalized cancer treatment.

Single cell analysis of a breast pre-cancerous state to explore critical tumor microenvironment interactions
PRESENTER: Juyeon Cho

ABSTRACT. Understanding how cells in a pre-cancerous state orchestrate the microenvironment for the development of cancer is a fundamental and challenging task. The recent development of single-cell technology enabled the exploration of an intermediate state during cell fate changes, such as cell differentiation, reprogramming, and tumorigenesis. Recently, studies have focused on the cell type distributions or characteristic features of normal breast tissues, a pre-cancerous state, from BRCA1 mutation carriers by exploiting single cell analysis. In this study, we utilized time-resolved single-cell RNA sequencing data from mouse models of breast cancer with BRCA1 mutations to investigate dynamical changes of the cell-cell network from normal to a pre-cancerous state and breast cancer, induced by BRCA1 loss-of-function. To extract significant cell-cell interactions in an unbiased manner, we utilized survival analysis to identify the so-called 'critical ligands', which are key ligands associated with the survival of cancer patients in TCGA datasets. With this list of ligands, we filtered the critical tumor microenvironment (TME) interactions of the cell-cell networks obtained from the NicheNet algorithm, a method that predicts ligand–target links between interacting cells by combining their expression data with prior knowledge on signaling and gene regulatory networks, along the tumor progression axis. The resulting cell-cell networks convey crucial information, especially in a pre-cancerous state, on which type of interactions between tumor cells and TME have critical roles in the development of cancer and also influence the malignancy and prognosis of cancer patients.

Uncovering cell-type specific mechanisms across an arbitrary number of cell-types by adapting the clustered LASSO to differential equation models

ABSTRACT. Mathematical models formulated as ordinary differential equations are frequently employed to analyse biological systems. Identification of mechanisms that are specific to certain cell-types can be crucial for constructing useful mathematical models and gaining biological insight. Regularisation techniques, such as LASSO (least absolute shrinkage and selection operator), have been proposed and successfully applied to identify such mechanisms specific to one of two types of cells, e.g., healthy and cancer cells.

For more than two cell-types, however, these approaches are not consistent. They require choosing a reference cell-type, on which the result potentially depends. We propose a combination of nonlinear ordinary differential equation modelling and the clustered LASSO for correctly identifying cell-type specific mechanisms across any number of cell-types in ordinary differential equation models. To this end, the pairwise differences of parameters encoding a certain mechanism in different cell-types are penalized, yielding a symmetric prior that eliminates the dependence on the reference cell-type.

We demonstrate how state-of-the-art numerical optimization of the likelihood function must be adapted accordingly, as well as the statistical testing for model selection. We evaluate the performance of the symmetric prior in the setting of ordinary differential equation models with realistic biological models and synthetic data, and point out the advantages over traditional approaches. We showcase the application of our method to a published biological model with experimental data. Our method is available within the open-source modelling environment Data2Dynamics.
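
The difference between a reference-based penalty and the pairwise (clustered) penalty described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Data2Dynamics implementation: the function names, the weight `lam`, and the example parameter values are hypothetical, and a real application would add this term to the negative log-likelihood of the ODE model.

```python
from itertools import combinations

def reference_l1_penalty(params, reference, lam=1.0):
    """L1 penalty on differences to one chosen reference cell-type."""
    ref = params[reference]
    return lam * sum(abs(v - ref) for k, v in params.items() if k != reference)

def clustered_l1_penalty(params, lam=1.0):
    """L1 penalty on all pairwise differences; no reference cell-type needed."""
    return lam * sum(abs(a - b) for a, b in combinations(params.values(), 2))

# Hypothetical values of one kinetic parameter in three cell-types.
rates = {"healthy": 1.0, "tumor_a": 1.0, "tumor_b": 3.0}

ref_healthy = reference_l1_penalty(rates, "healthy")   # depends on the reference
ref_tumor_b = reference_l1_penalty(rates, "tumor_b")   # different value
pairwise = clustered_l1_penalty(rates)                 # same for any labelling
```

With these example values, the reference-based penalty changes when a different cell-type is picked as reference, whereas the pairwise penalty is unchanged by any relabelling of the cell-types, which is the symmetry the abstract refers to.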

A physiology-based model of bile acid metabolism in mice
PRESENTER: Bastian Kister

ABSTRACT. Bile acid (BA) metabolism is a complex system that encompasses a diverse mixture of primary and secondary, as well as conjugated and unconjugated BAs that undergo continuous enterohepatic circulation (EHC). Alterations in both composition and dynamics of BAs have been associated with various diseases; however, a mechanistic understanding of the relationship between altered BA metabolism and related diseases is lacking. Various animal models have been employed in investigations of BA metabolism and its role in human disease. Mouse models are widely used for such studies; however, mice and humans differ in BA composition and recycling, gut physiology, and energy homeostasis. Considering the inherent complexity of BA metabolism and the need for suitable cross-species extrapolation approaches, computational modeling can be applied to facilitate a mechanistic understanding of the network of physiological processes in BA metabolism. In this study, we developed a physiology-based model of murine BA metabolism describing synthesis, conjugation, microbial transformations, systemic distribution, excretion, and EHC of BAs as well as an explicit representation of the host physiology at the whole-body level. For model parametrization, BA metabolism was characterized in vivo by measuring BA levels and composition in various organs, expression of transporters along the gut, and cecal microbiota composition. We found significantly different BA levels between male and female mice that could be explained by the model with adjusted expression of the hepatic enzymes. For qualification, the model was tested on equivalent data generated from germ-free mice. To our knowledge, the model presented here is the first published physiology-based model of BA metabolism in mice and can aid in addressing hypotheses and translating insights generated from mouse studies to a clinically relevant context.

Analysis of the impact of MYCN on the energy metabolism in neuroblastoma using metabolomics data and mathematical modeling
PRESENTER: Mareike Simon

ABSTRACT. The fact that cancer cells show altered metabolic properties compared to healthy cells is well known and recognized as one of the hallmarks of cancer development. First indications for this were found by Otto Warburg nearly 100 years ago. He found that cancer cells, compared to their healthy counterparts, tend to take up more glucose and convert it to lactate instead of using the more energy-efficient pathways of mitochondrial metabolism, even in the presence of sufficient oxygen. This is known today as the Warburg effect or aerobic glycolysis. In recent years, the role of oncogenes in metabolic changes has gained attention.

In this work we focus on a specific oncogene, MYCN, which is a transcription factor of the MYC family. MYCN amplification is an important risk factor in one of the most common childhood cancers, neuroblastoma. Neuroblastoma accounts for approximately 7% of all childhood cancers and 15% of all childhood cancer deaths. It is well known for its very diverse prognosis. The survival rates are typically high for patients under one year of age at diagnosis and spontaneous regression occurs at a high frequency in certain stages. For older patients and patients with certain risk factors, such as MYCN amplification, survival rates are still below 50%. Since MYCN amplification is so important for risk stratification, its influence on many cellular processes, such as proliferation, cell cycle progression and metabolism, has been intensively studied. It has been previously shown that MYCN upregulates the expression of several glycolytic enzymes, but a pathway-wide analysis of the effects is missing.

To increase the understanding of how MYCN influences metabolism, we made use of in vitro models of MYCN-inducible neuroblastoma cells in a combined experimental and theoretical approach. As part of this project, a metabolomics data set and extracellular flux data were analyzed. These data show that MYCN induces widespread metabolic changes and has a Warburg-like effect on the flux distribution [1].

Subsequently, these data were integrated with additional data from the literature to develop a mathematical model of the energy metabolism in neuroblastoma cells with low MYCN expression. The model was used to analyze the effects of expression changes in known MYCN targets on the overall glycolytic pathway. In addition to individual and pairwise alterations, the effects of simultaneous expression alteration of all MYCN targets were studied. Our analysis sheds light on the complex interplay between the different MYCN targets and the importance of the individual targets for the overall effect of MYCN on the energy metabolism.

[1] Tjaden, B., et al. "N-Myc-induced metabolic rewiring creates novel therapeutic vulnerabilities in neuroblastoma." Scientific reports 10.1 (2020): 1-10.

Modelling mutations in B-cell malignancies reveals how mutations synergise to contribute to aggressive disease.
PRESENTER: Richard Norris

ABSTRACT. Background Diffuse large B-cell lymphoma (DLBCL) is the most common non-Hodgkin lymphoma. Double hit lymphomas harboring mutations affecting cMyc and Bcl2 are particularly aggressive tumours with poor outcomes. It is unclear whether the aggressive pathology of double hit lymphoma is an emergent property of dysregulation of signaling networks controlling cell division and cell death. Multiscale computational systems biology models of B-cell fates have previously been used to disentangle cell proliferation and death in healthy B-cells.

Aims Develop computational models and methods to simulate the regulatory networks controlling the cell cycle and apoptosis in B-cells with and without gain of function mutations in cMyc, Bcl2 and other commonly dysregulated genes in B cell malignancies. Use multiscale models to predict the impact of cMyc and Bcl2 mutations individually and when combined in double hit lymphoma, to provide mechanistic insight into how these mutations combine in aggressive disease. Identify other combinations of mutations that may confer synergistic changes in cell fates and aggressive disease in B cell malignancies.

Methods We use established ordinary differential equation models that simulate reactions involved in the regulation of the cell cycle and apoptosis and solve them using Julia. Mutations are simulated by modifying appropriate parameters. Cell cycle phases and cell death times are calculated from thresholds in molecular species. Multiscale modeling approaches are used to enable individual cells to divide and die.

Results Ranking the predicted impact of mutations in all simulated genes on cell cycle progression we find that cMyc overexpression is predicted to have a relatively moderate effect on cell cycle length. Interestingly, in simulations of the cell cycle in a heterogeneous population of B cells with and without overexpression of cMyc, we find that cMyc overexpression is predicted to switch the fate of a small population of cells from quiescence to rapid proliferation. In simulations of apoptotic signaling, we find a moderate dose-dependent delay in apoptosis compared to wild-type cells with increasing Bcl2 abundance. In multiscale simulations of B-cell populations, we reveal a synergistic effect of cMyc and Bcl2. We find the impact of the double hit mutations is greater than the added effect of the two mutations alone on cell numbers. This is in agreement with experimental results and could account for the aggressive nature of ‘double hit’ DLBCL. Through modeling commonly occurring mutations in multiple myeloma we identify combinations of mutations that are predicted to synergistically combine to confer high disease burden and find these commonly occur in severe disease.

Conclusions Taken together, these results demonstrate the ability of computational modeling to disentangle mutational heterogeneity and to predict how combinations of mutations interact to control disease severity in B cell malignancies.

From Plants to Plants and Beyond: How Modeling Strategies from the Engineering Field Could Benefit Systems Biology
PRESENTER: Maria Krantz

ABSTRACT. Mathematical modeling has become a vital part of data analysis in many fields. Two of these are systems biology/medicine and engineering. Despite some knowledge transfer between these fields, the potential for beneficial exchange is not being fully exploited. The systems of interest in biology and engineering have many aspects in common. Modeling is carried out in both fields with the intention to understand the system’s behavior, identify abnormal behavior and estimate the effect of interventions. Engineering uses models to predict the behavior of machines or simulate logistics networks. At their core, these systems are very similar to a cell’s metabolism and signaling network. This can be exemplified by looking at a modern production plant. Such systems are combinations of mechanical and electrical parts (machines), which process the product, and computational parts, which control the behavior of the machines. This is very similar to the way a cell functions: enzymes (molecular machines) process metabolites (products), and the action of these enzymes is controlled by signaling pathways (the computational parts of a production plant). These parallels in the systems of interest provide an ideal basis for exchange and collaboration between modelers from the respective fields. However, this exchange has so far been rather limited. It would therefore be useful to foster exchange between these two fields of modeling by focusing on modeling approaches, rather than on model outcomes. Furthermore, terms from the engineering field should be linked to terms in the biological field to establish common ground between researchers from both fields. This can be achieved by presenting models from the engineering field and exemplifying similarities to biological models.
An exchange and knowledge transfer between researchers from systems biology/medicine and engineering would be beneficial for researchers in both fields and could help advance modeling in the life sciences.

Moving Horizon Estimation for Patient-Specific Optimal Control of Blood Glucose in Intensive Care Units
PRESENTER: Dilan Pathirana

ABSTRACT. Patients in the intensive care unit of a hospital can have difficulty regulating blood glucose. Current control techniques include intravenous infusion of glucose and insulin, which are manually administered by a medical professional. Patients may also receive additional drugs or food that affect blood glucose levels.

In this project, we develop a software pipeline for patient-specific optimal control of blood glucose levels. During each time interval of the moving horizon problem, new data from bedside patient monitoring systems are combined with older data and used to calibrate a mathematical (ODE) model of blood glucose metabolism, yielding a patient-specific model. The optimal control is then found by defining an objective function with synthetic measurements at the desired glucose level, then estimating the required infusion rates. These infusion rates can be automatically applied via the infusion pumps. The uncertainty of the corresponding blood glucose predictions is automatically assessed, such that a medical professional can intervene in the case of unreliable optimal control results.

The pipeline involves several software components. The BioXM platform (Biomax) is used as the data provisioning layer, between the hardware monitoring systems, the software control system, and the intravenous infusion pump. Priors on patient-specific parameters are provided by pharmacokinetic-pharmacodynamic modeling (esqLABS). Subsequent modeling is performed using: community standards for model (SBML) and parameter estimation (PEtab) specification; and state-of-the-art methods available in open-source tools for simulation and sensitivity analysis (AMICI), optimization and uncertainty quantification (pyPESTO), and infusion rates and their optimal control (PEtab Timecourse; PEtab Control).

A multi-strain model of immunization against influenza A viruses after infection or vaccination
PRESENTER: Lara Bruezière

ABSTRACT. Introduction Influenza A viruses (IAV) are responsible for worldwide seasonal epidemics of flu disease. They are divided into subtypes based on a combination of the viral surface proteins HA and NA. These proteins act as potent antigens in natural infection and are the main antigen components in vaccines against influenza. The challenge of effective long-term vaccination requires an understanding of the immune response triggered by infection or vaccination, and how this response is impacted by the antigenic evolution of viral surface proteins over the years. We addressed these questions through a mechanistic, knowledge-based modeling approach.

Materials and Methods We developed a Multi-Strain Influenza Disease Model (MSIDM) describing viral and in-host dynamics including immunization due to infection and vaccination. The model accounts for cell migration, interactions between cells, antigens, antibodies and cytokines, as well as cross-reactivity of immune cells formed during previous immunizations with antigens of a current strain. Cross-reactivity is modeled with three specific populations of immune memory cells coexisting for each individual: those developed during past infections, those developed against the yearly vaccine strain and those developed against the seasonal circulating strain. Strain-specific immunity is then implemented as the result of different interaction strengths of these three immune subpopulations with the antigens, depending on antigenic distance. To assess the model scope we performed exploratory analysis on a virtual population, built and calibrated to account for inter-patient and virus-specific variability.

Results We succeeded in simulating realistic dynamics in the context of IAV infection using the MSIDM. This offers insights into what affects clinical outcomes such as duration and severity of symptoms, viral load and epithelial damage in response to an exposure following a vaccination or a primary infection. The MSIDM also describes the long-lasting immunization process starting upon new strain encounter, either by vaccination or infection, in particular predicting hemagglutination inhibition (HI) titers. The model reproduces clinical observations made on both H1N1 and H3N2 viral subtypes, such as partial immune escape and lower vaccine efficacy against H3N2. It allows comparing the relative efficacy of different vaccines such as HA recombinant and split vaccines.

Conclusion First analyses show how the model can help in exploring how antigenic distance can explain variability in vaccine efficacy from season to season, and how patient immune phenotype can explain variability in vaccine efficacy within a season. With further calibration and validation using in vivo data, the MSIDM could be used to test vaccination strategies taking into account antigenic drift and shift. It could also help in understanding how vaccine efficacy is impacted by viral- and host-related factors such as virus avidity for host cells and immunosenescence.

BlotIt - Optimal alignment of Western blot and qPCR experiments
PRESENTER: Jens Timmer

ABSTRACT. Biological systems are frequently analyzed by means of mechanistic mathematical models. In order to infer model parameters and provide a useful model that can be employed for systems understanding and hypothesis testing, the model is often calibrated on quantitative, time-resolved data. To do so, it is typically important to compare experimental measurements over broad time ranges and various experimental conditions, e.g. perturbations of the biological system. However, most of the established experimental techniques such as Western blot, or quantitative real-time polymerase chain reaction only provide measurements on a relative scale, since different sample volumes, experimental adjustments or varying development times of a gel lead to systematic shifts in the data. In turn, the number of measurements corresponding to the same scale enabling comparability is limited.

Here, we present a new flexible method to align measurement data that obey different scaling factors. We propose an alignment model to estimate these scaling factors and provide the possibility to adapt this model depending on the measurement technique of interest. In addition, an error model can be specified to adequately weight the different data points and obtain scaling-model-based confidence intervals for the final scaled data points. Our approach is applicable to all sorts of relative measurements and does not need a particular experimental condition that has been measured over all available scales. An implementation of the method is provided with the R package blotIt, including refined ways of visualization.
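
The idea of estimating one scaling factor per measurement scale can be sketched as follows. This is a simplified stand-in, not the blotIt implementation: it assumes a purely multiplicative model (measured value = scale factor × underlying value) with equal weights, fits it by alternating least squares in log space, and fixes the first scale to 1 to make the problem identifiable. The function name, the data layout, and the example numbers are assumptions for illustration.

```python
import math

def align_scales(measurements, iterations=200):
    """Estimate underlying values and per-scale factors.

    measurements: list of (condition, scale_id, value) with value > 0.
    Assumed model: log(value) = log(true[condition]) + log(scale[scale_id]).
    """
    conditions = sorted({c for c, _, _ in measurements})
    scales = sorted({s for _, s, _ in measurements})
    log_true = {c: 0.0 for c in conditions}
    log_scale = {s: 0.0 for s in scales}
    for _ in range(iterations):
        # Update each condition given the current scale factors.
        for c in conditions:
            res = [math.log(v) - log_scale[s] for cc, s, v in measurements if cc == c]
            log_true[c] = sum(res) / len(res)
        # Update each scale factor given the current condition values.
        for s in scales:
            res = [math.log(v) - log_true[c] for c, ss, v in measurements if ss == s]
            log_scale[s] = sum(res) / len(res)
        # Fix the overall scale: the first scale gets factor 1.
        shift = log_scale[scales[0]]
        for s in scales:
            log_scale[s] -= shift
        for c in conditions:
            log_true[c] += shift
    true = {c: math.exp(v) for c, v in log_true.items()}
    factors = {s: math.exp(v) for s, v in log_scale.items()}
    return true, factors

# Hypothetical example: two blots that share only the time point "t1".
data = [
    ("t0", "blot1", 1.0), ("t1", "blot1", 2.0),
    ("t1", "blot2", 4.0), ("t2", "blot2", 8.0),
]
aligned, factors = align_scales(data)
```

In this example no condition is measured on both scales except "t1", yet the overlap is enough to bring "t0" and "t2" onto a common scale, which mirrors the statement that no single condition has to be present on all scales.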

A probabilistic graphical model for taxonomic profiling of viral and microbiome proteome samples
PRESENTER: Tanja Holstein

ABSTRACT. Probabilistic graphical models provide a concise representation for multivariate models whose dependence structure is given by an underlying graph. Their applications range from causal inference and path analysis to expert systems, and they have been used, for example, in clinical settings and in diverse research fields such as physics, economics and biology. Through their graph structure, graphical models provide a factorization of multivariate distributions that allows for efficient inference algorithms with clear attributions to Bayesian conditional and posterior probabilities. In proteomics, the presence of proteins is inferred from a list of peptides that were identified from mass spectra using a database search. Many proteins are homologous, meaning they share peptides, which leads to the so-called protein inference problem. The peptide-protein relationship can be represented as a bipartite graph. Using this structure, Bayesian inference and graphical models have been used successfully for probability-based protein inference. For proteomic samples of unknown taxonomic origin, the presence of certain taxa must additionally be inferred, which is complicated further because proteins share peptides not only within, but also between taxa. Previous taxonomic profiling approaches rely on strategies such as peptide-taxon match counting or the presence of taxon-unique peptides. We present PepGM, a graphical model that uses belief propagation to compute the marginal distributions of peptides and taxa. We represent the peptide-taxon relationships as a bipartite graph where nodes represent peptides and taxa, respectively. The resulting structure serves as a scaffold for a factor graph. The unknown conditional probability distribution between peptides and taxa is represented using a noisy-OR model, whose parameters are evaluated using grid search.
The posterior probabilities are computed through loopy belief propagation. Using various pathogenic viral proteome samples, we show that PepGM successfully classifies viral samples with strain-level resolution using unconstrained reference databases, while providing meaningful confidence estimates. The assigned confidence scores, going beyond simple heuristics, could be particularly useful in the clinical context, where therapeutic decisions depend on them. We additionally show that our approach is extensible to more complex metaproteomic samples, where multiple organisms are present.
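
The noisy-OR conditional distribution mentioned above has a compact closed form, sketched here for a single peptide with several candidate parent taxa. The parameter names `alpha` (probability that a present taxon produces the peptide) and `beta` (probability of observing the peptide with no parent taxon present) and their values are assumptions for illustration; the sketch shows only the conditional probability and does not implement the factor graph or loopy belief propagation.

```python
def noisy_or(parent_states, alpha, beta):
    """P(peptide observed | parent taxa states) under a noisy-OR model.

    parent_states: iterable of booleans, True if the taxon is present.
    alpha: per-taxon emission probability (assumed shared by all taxa).
    beta:  leak probability, i.e. observation without any present parent.
    """
    n_present = sum(1 for present in parent_states if present)
    # The peptide is absent only if the leak and every present parent all fail.
    return 1.0 - (1.0 - beta) * (1.0 - alpha) ** n_present

# Hypothetical parameter values, e.g. one point of a parameter grid.
p_none = noisy_or([False, False], alpha=0.9, beta=0.01)
p_one = noisy_or([True, False], alpha=0.9, beta=0.01)
p_two = noisy_or([True, True], alpha=0.9, beta=0.01)
```

Because the probability depends only on the number of present parents, the conditional table for a peptide shared by many taxa never has to be stored explicitly, which is one reason noisy-OR models are convenient for shared peptides.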

Multiscale Modeling of Dyadic Structure-Function Relation in Ventricular Cardiac Myocytes
PRESENTER: Wilhelm Neubert

ABSTRACT. Cardiovascular disease is often related to defects of subcellular components in cardiac myocytes, specifically in the dyadic cleft, which include changes in cleft geometry and channel placement. Modeling of these pathological changes requires both spatially resolved cleft as well as whole cell level descriptions. We use a multiscale model to create dyadic structure-function relationships to explore the impact of molecular changes on whole cell electrophysiology and calcium cycling. This multiscale model incorporates stochastic simulation of individual L-type calcium channels and ryanodine receptor channels, spatially detailed concentration dynamics in dyadic clefts, rabbit membrane potential dynamics, and a system of partial differential equations for myoplasmic and lumenal free Ca2+ and Ca2+-binding molecules in the bulk of the cell. We found action potential duration, systolic, and diastolic [Ca2+] to respond most sensitively to changes in L-type calcium channel current. The ryanodine receptor channel cluster structure inside dyadic clefts was found to affect all biomarkers investigated. The shape of clusters observed in experiments by Jayasinghe et al. and channel density within the cluster (characterized by mean occupancy) showed the strongest correlation to the effects on biomarkers.

Studying conformational transitions of selected proteins using UNRES coarse-grained replica exchange molecular dynamics simulations with Lorentzian restraints.
PRESENTER: Iga Biskupek

ABSTRACT. Identifying the mechanism of conformational transitions of proteins is essential to understanding their biological functions. Experimental methods provide only fragmentary information about the time evolution of a system during a conformational transition, such as the distance between donor and acceptor groups or a distance distribution; computational methods are therefore needed to interpret the experimental data. The replica exchange molecular dynamics (REMD) method was first introduced in the study of biomolecules by Okamoto and coworkers. It is a hybrid method combining MD simulations with the Monte Carlo algorithm. In REMD simulations, several copies (replicas) of the same system are simulated in parallel using MD simulations at different temperatures. This method generates a generalized ensemble of the simulated system. In this way, REMD can overcome high energy barriers and sample conformational space sufficiently, which allows exploration of the free energy landscape of proteins. In this research, we used the REMD method with the UNRES coarse-grained force field and Lorentzian attractive terms. Two Lorentzian attractive interactions provide a double-well potential with a bounded energy barrier. We implemented this method in coarse-grained UNRES simulations, enabling us to study larger systems at lower computational expense than all-atom simulations. UNRES is a highly reduced protein model with only two interaction sites per residue. Owing to this reduction, it offers a ~1000-fold speed-up compared to all-atom molecular dynamics. The study aims to test the UNRES implementation of the method of modeling conformational transitions with a double-well Lorentzian guiding function. For testing, we selected two proteins with two well-defined states.
The first one is the open conformation of ATP-bound Hsp 70 DnaK chaperone (PDB ID: 4B9Q) and the closed conformation of this protein complexed with ADP substrate (PDB ID: 2KHO). The second system studied is the open (PDB ID: 4AKE) and the closed (PDB ID: 1AKE)form of adenylate kinase. Multiple conformational transitions between the above-mentioned forms were observed in simulations for both proteins. The free energy landscapes were constructed and transition pathways were identified.
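
The two ingredients named in this abstract, a double-well guiding term built from two Lorentzian wells and the temperature-swap step of REMD, can be illustrated with a minimal Python sketch. The functional form of the wells, the parameter names (`depth`, `width`), and the use of a single distance coordinate are assumptions made for illustration; they are not taken from the UNRES implementation. The swap rule is the standard Metropolis criterion for replica exchange.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def lorentzian_well(d, d0, depth=1.0, width=1.0):
    """Attractive Lorentzian term centred at d0 (illustrative form).

    The value is -depth at d == d0 and tends to 0 far away, so the
    energy is bounded from below and from above.
    """
    return -depth * width**2 / ((d - d0) ** 2 + width**2)

def double_well(d, d_open, d_closed, depth=1.0, width=1.0):
    """Sum of two Lorentzian wells: one per reference state."""
    return (lorentzian_well(d, d_open, depth, width)
            + lorentzian_well(d, d_closed, depth, width))

def exchange_probability(e_i, e_j, t_i, t_j):
    """Metropolis acceptance probability for swapping replicas i and j.

    e_i, e_j are the potential energies of the replicas currently
    running at temperatures t_i and t_j.
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    if delta >= 0.0:
        return 1.0  # avoids overflow in exp for large positive delta
    return math.exp(delta)
```

Because each well flattens out away from its centre, the barrier between the two minima can never exceed the well depth, which is the "bounded energy barrier" property the abstract refers to.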

Connecting the Neurovascular coupling and Electrophysiological signaling – a modeling approach
PRESENTER: Henrik Podéus

ABSTRACT. The brain is a vital organ with great structural complexity, consisting of some 100 billion neurons and glial cells. This complexity makes it difficult to isolate the origins of different functionalities in the brain. It is also challenging to explore which irregularities cause diseases, and how. To overcome the structural complexity, mathematical modelling has been used to explore how key aspects of the brain can be explained. Two such key aspects are the electrical signaling of neuron populations and the neurovascular coupling (NVC). The NVC is the mechanism that allows neurons to regulate cerebral blood flow to meet the metabolic demand. Together, these processes constitute the pillars of the brain’s functionality. The electrical signaling is the framework for cognitive function and communication between neurons, while the NVC, initiated by the electrical firings, regulates the availability of metabolites to supply the electrical potentials needed for the signaling. These processes are thus highly dependent on each other. Despite this interdependence, the two processes have thus far only been modelled separately.

In this work we aim to combine the NVC with the electrical signaling in one model. This combination will be achieved by integrating the logic of the neuronal release of vasoactive substances into a spiking neural network, allowing the NVC to be driven by the electrical firing of the network. Adding the means of blood vessel control to the spiking neural network model lays a foundation for further additions of metabolic demands to the model, which would then close the circle of electrical activity, NVC, and metabolite supply. With this framework established, the model representation of the brain comes closer to the real brain, allowing for more powerful predictions and dissections of neuronal alterations and diseases.

Computational Models of Heterogeneous Mutations in Diffuse Large B-cell Lymphoma.
PRESENTER: Arran Pack

ABSTRACT. DLBCL is a very heterogeneous disease: two patients may present with completely different combinations of mutations. Recent studies have been able to categorise patients based on genetic sequencing of their tumour, but this has yet to replace the binary Cell of Origin (COO) classification system, which is most commonly used. Currently patients receive a one-size-fits-all treatment of R-CHOP, which cures only 60%. Patients with totally different mutations could respond poorly to traditional treatment but benefit from other drugs, motivating an exploration of available drugs and cancer phenotypes. We present a mathematical model in which to conduct this exploration. Our mathematical model builds on and combines established models. It contains a downstream NF-κB section, which can predict the proportions of activated NF-κB dimers, and an upstream receptor-proximal section, which contains many proteins that are mutated in DLBCL patients. We can mutate the model by altering parameters in the upstream section and observe the effect on the NF-κB dimers downstream. Using this model we can predict which mutations would confer resistance, or sensitivity, to a drug, and how these cells may respond to varying microenvironmental stimuli. We hypothesise that different mutational profiles react differently to drugs and extracellular signalling. The Toll-Like Receptor (TLR) and B Cell Receptor (BCR) pathways comprise our model’s upstream portion, and they contain several proteins which are commonly mutated in DLBCL. By simulating the impact of mutations affecting TLR and BCR signalling on the kinases that activate NF-κB, we recapitulated the constitutive NF-κB activation that is frequently seen in DLBCL. We then linked models of TLR, BCR and NF-κB signalling together to create a comprehensive model in which combinations of mutations affecting receptor-proximal pathways can be investigated.
We predict that commonly occurring mutations affecting MYD88, CD79B and CARD11 not only activate NF-κB as expected, but also alter which NF-κB dimers are active and how DLBCL cells will respond to microenvironmental stimuli. Recent clinical data suggest that double-mutant MYD88 + CD79B DLBCL may be hyper-sensitive to the BTK inhibitor Ibrutinib. We simulate these combined mutations to generate a mechanistic explanation for this intriguing result. Ongoing work extending this approach to capture the broad mutational heterogeneity in DLBCL is designed to help identify how to get the right treatments to the right patients, improving on the current one-size-fits-all treatment for DLBCL and providing a stepping stone towards personalised medicine in lymphoma.

Deep learning-based kcat prediction enables improved enzyme-constrained model reconstruction
PRESENTER: Le Yuan

ABSTRACT. Enzyme turnover numbers (kcat) are key to understanding cellular metabolism, proteome allocation and physiological diversity, but experimentally measured kcat data are sparse and noisy. Here we provide a deep learning approach (DLKcat) for high-throughput kcat prediction for metabolic enzymes from any organism merely from substrate structures and protein sequences. DLKcat can capture kcat changes for mutated enzymes and identify amino acid residues with a strong impact on kcat values. We applied this approach to predict genome-scale kcat values for more than 300 yeast species. Additionally, we designed a Bayesian pipeline to parameterize enzyme-constrained genome-scale metabolic models from predicted kcat values. The resulting models outperformed the corresponding original enzyme-constrained genome-scale metabolic models from previous pipelines in predicting phenotypes and proteomes, and enabled us to explain phenotypic differences. DLKcat and the enzyme-constrained genome-scale metabolic model construction pipeline are valuable tools to uncover global trends of enzyme kinetics and physiological diversity, and to further elucidate cellular metabolism on a large scale.

A bottom-up investigation of emergence in a self-organizing model of embryonic pattern formation
PRESENTER: Michael Zhao

ABSTRACT. Pattern formation, a process where disorganized groups of cells acquire organized structure, is an important feature of biological development. A central question is how individual cells work together to form patterns. It was recently reported that cells in the developing embryonic mouse tail could self-organize into patterns after dissociation and randomization, with features resembling in vivo patterns such as traveling waves of signaling activity. Here we combine imaging and modeling to investigate how the coupling between signaling and cell dynamics drives the self-organizing phenomenon, with the goal of identifying how interactions between the individual cells lead to larger tissue-scale patterns. Live cell measurements of signaling oscillations, cell density changes, and morphogen gradient formation discriminate an ordered series of events leading up to the formation of self-organized signaling patterns, and our data suggest that a coalescence phenomenon drives nucleation points. We use our observations to develop a preliminary model coupling cell density changes with oscillation dynamics that can explain the previously reported self-organizing behavior of cells derived from the developing mouse tail.

FLOVELO - Pushing Boundaries of Cell Dynamics Inference with Maximum Flow Networks
PRESENTER: Julia Naas

ABSTRACT. Single-cell RNA-sequencing (scRNA-seq) technologies provide impressive new insights into biological samples at single-cell resolution, allowing a deep understanding of the developmental state of a cell. To infer dynamics of cellular processes, such as the cell cycle, additional temporal information is often obtained by performing time course experiments, but it can also be derived from a single static scRNA-seq measurement: because unspliced mRNA is eventually processed to spliced mRNA, the abundance of unspliced and spliced mRNA molecules gives insights into the future expression profile of a cell.

We present FLOVELO, a computational approach that recovers the most likely cell trajectory in two-dimensional unspliced-spliced mRNA expression space (hereafter Unspliced-Spliced Trajectory, UST) for each gene. Interpreting the distribution of cells as a probability measure, a UST appears as a one-dimensional density ridge, which FLOVELO reliably detects by solving multiple, interconnected network flow problems.

Comparable methods, such as the RNA velocity implementations velocyto and scvelo, are strongly constrained by model assumptions such as (piecewise) constant transcription and splicing rates, and are known to potentially lead to incorrect biological conclusions when the data comprise transcriptional bursts, multiple kinetics, or short developmental time spans. FLOVELO, as a model-free approach, can flexibly adapt to such transcriptional contexts as it does not rely on constrained estimation of rate parameters. Moreover, it allows for a much more informed reverse engineering of the underlying differential equation systems explaining the reconstructed UST.

Furthermore, comparing the UST shapes of multiple genes using optimal transport theory provides a novel, intuitive way to classify genes that are assumed to be subject to similar regulatory control mechanisms. In the cell cycle context, this can be directly applied to distinguish between cell cycle-regulated and cell cycle-independent genes. Finally, gene-wise USTs can be combined into a gene-shared cell trajectory, summarizing global cellular developmental trends in the respective sample. We will demonstrate FLOVELO’s performance with illustrative examples.

A powerful modelling approach tailored to cellular signalling
PRESENTER: Clemens Kreutz

ABSTRACT. In systems biology, ordinary differential equations (ODEs) are frequently applied for investigating dynamic processes such as signalling pathways. ODEs are typically defined by translating relevant biochemical interactions into rate equations. One disadvantage of such mechanistic dynamic models is that they can become very large in terms of the number of dynamic variables and parameters if entire cellular pathways are described. Moreover, analytical solutions of the ODEs are not available and the dynamics are nonlinear, which poses challenges for numerical approaches as well as for statistically valid reasoning.

We recently introduced a complementary modeling approach based on curve fitting of a tailored retarded transient response function (RTF) [1]. This approach approximates ODE solutions remarkably well in the case of transient dynamics, as typically observed for cellular signalling. Besides its broad and easy applicability, a benefit of the RTF is the clear-cut interpretation of its parameters as response time, amplitudes, and time constants of a transient and a sustained part of the response. Dose-dependencies of these parameters are described via Hill functions, allowing for the calculation of half-maximal activating (EC50) or inhibitory (IC50) effects on these dynamic parameters.

The presented approach offers a data-driven alternative modelling strategy for situations where classical ODE modeling is cumbersome or even infeasible. Moreover, it enables valuable interpretations of traditional ODE models and is also applicable to the analysis of time-course and/or dose response data from omics experiments. Nine benchmark problems for cellular signaling were analyzed to demonstrate the approach in realistic systems biology settings. The performance of the approach is also demonstrated using dose-dependent time-course data of inflammasome activation.

[1] https://doi.org/10.3389/fphy.2020.00070
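
The idea of a response curve with a sustained and a transient part, whose parameters depend on dose through Hill functions, can be illustrated with a short Python sketch. The curve below is a deliberately simplified stand-in and is not the published RTF from [1]; the function and parameter names (`a_sus`, `a_trans`, `tau_rise`, `tau_decay`, `t_delay`) are hypothetical. Only the Hill function follows the standard textbook form.

```python
import math

def hill(dose, p_max, ec50, h=1.0):
    """Standard Hill function: returns p_max / 2 at dose == ec50."""
    if dose <= 0.0:
        return 0.0
    return p_max * dose**h / (ec50**h + dose**h)

def transient_response(t, a_sus, a_trans, tau_rise, tau_decay, t_delay=0.0):
    """Simplified delayed response with a sustained and a transient part.

    Illustrative only: the response is zero until t_delay, then rises
    with time constant tau_rise; the transient part decays with
    tau_decay, leaving the sustained amplitude a_sus.
    """
    s = t - t_delay
    if s <= 0.0:
        return 0.0
    rise = 1.0 - math.exp(-s / tau_rise)
    return a_sus * rise + a_trans * rise * math.exp(-s / tau_decay)

def dose_response(t, dose):
    """Example: the sustained amplitude depends on dose via a Hill function."""
    a_sus = hill(dose, p_max=1.0, ec50=2.0, h=2.0)  # assumed values
    return transient_response(t, a_sus, a_trans=2.0, tau_rise=1.0, tau_decay=5.0)
```

In this toy setting, fitting `ec50` for an amplitude or a time constant gives the dose at which that dynamic parameter reaches half of its maximal value, which mirrors how the abstract describes EC50 and IC50 being derived.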

Integrational approaches for cross-species analysis of lung pathologies at single-cell resolution.
PRESENTER: Peter Pennitz

ABSTRACT. Single-cell ribonucleic acid sequencing (scRNA-seq) is becoming widely employed to study the transcriptomics of heterogeneous cell types in parallel. Due to the complex micro-anatomical structure of the lungs, which harbor more than 40 different cell types necessary for the homeostasis and maintenance of organ functionality, pneumology research in particular benefits from this technique. The ongoing severe acute respiratory syndrome coronavirus type 2 (SARS-CoV-2) pandemic has instigated an abundance of publicly available scRNA-seq data from various model organisms, enabling the dissection of the cellular immune response to infection. During coronavirus disease 2019 (COVID-19) and other pulmonary infections, the ability to analyze existing and newly generated data together can significantly enhance the quality and value of investigations. Hence, we compared and qualitatively assessed the single-cell transcriptomes of publicly available and unpublished lung datasets derived from six different species: Human (Homo sapiens), African green monkey (Chlorocebus sabaeus), Pig (Sus domesticus), Hamster (Mesocricetus auratus), Rat (Rattus norvegicus) and Mouse (Mus musculus). By employing state-of-the-art tools and methodologies, we demonstrate that different cell isolation protocols adapted for different species do not hinder cross-species comparisons. We established a pipeline for inter-species data integration, applied a single unified gene nomenclature, transformed and normalized all datasets, performed cell-specific clustering and identified top marker genes for every species. Additional in-depth analyses using RNA velocity and intercellular communication estimation, based on ligand–receptor co-expression, were also performed. The code for all cross-species analyses was made publicly available via GitHub (https://github.com/GenStatLeipzig/pulmonologists_interspecies_scRNA).
Collectively, integrational approaches to new and publicly available scRNA-seq data could contribute towards the evaluation of current animal models in biomedical research and the extrapolation of results to human pathologies through exploration of species-specific versus universal responses in pulmonary disease.

Development of a mass-spectrometry-based clinical proteomics pipeline allows absolute quantification of pathway components and provides insights into pathway activation
PRESENTER: Barbara Helm

ABSTRACT. The regulation of cellular decisions depends on activating a multitude of intracellular signaling pathways. Deregulation of these pathways fosters disease progression. Since there is no linear relationship between mRNA and proteins, and since the abundance of signal transduction proteins has been observed to be decisive for information processing, it is essential to quantify pathway components and their activation status reliably. High-performance mass spectrometry-based proteomics offers the opportunity for in-depth analysis of proteins in complex mixtures. Data-dependent acquisition (DDA) has been the established method for acquiring mass spectrometry data. However, the latest developments in data-independent acquisition (DIA) and parallel reaction monitoring (PRM) now facilitate their implementation and usage. As a proof of concept, we profiled the global proteome of ten lung cancer cell lines and three primary cell lines, applying DDA and DIA. DIA showed significantly higher proteome coverage than DDA and allowed the relative quantification of low-abundance proteins, such as membrane receptors. Interestingly, we observed that the abundance of membrane receptors is very heterogeneous among the different cell lines, while the abundance of downstream molecules is relatively constant. To validate these findings, we developed a PRM panel to absolutely quantify different membrane receptors in the attomolar range, information necessary for calibrating dynamic pathway models. Quantitative phosphoproteomics is critical for investigating cell signaling. Therefore, a high-throughput, fully automated phosphoproteome pipeline was implemented, allowing evaluation of the degree of phosphorylation of key receptors implicated in lung cancer, such as EGFR. In sum, implementing new proteomics methods opens new possibilities for analyzing cells and tissue samples and identifying potential markers for cancer development.

Ionizing radiation triggers differential signaling dynamics of p-p53 and p-ERK1/2 in sensitive and resistant HNSCC cell clones

ABSTRACT. Radiotherapy is a main treatment strategy for head and neck cancer, and radioresistance of tumors remains a major, poorly understood problem. To get a better understanding of how therapy resistance emerges through differential signalling activity, we performed time-course mass cytometry (CyTOF) analyses of irradiated (6 Gy) and non-irradiated head-and-neck squamous carcinoma (HNSCC) cells. As a model system of intra-tumoral heterogeneity, we made use of the heterogeneous Cal33 cell line (parental), a radiosensitive and a radioresistant subclone, and examined potential differences in signaling dynamics that could explain the divergent responses to irradiation. Cell cycle classification based on IdU, pH3, Geminin, and Cyclin B1 indicated a delay in cell cycle progression after irradiation, mainly characterized by an accumulation of cells in S-phase and G2-phase, 8 hours and 12 hours after irradiation, respectively. However, the cell cycle dynamics were largely comparable for the three cell lines studied, suggesting that their differential radiation sensitivity is not explicitly linked to distinct cell-cycle dynamics. Interestingly, we observed differential dynamics of p53 [S15] phosphorylation, characterised by: 1) a first pulse 12h after irradiation in the parental Cal33 and the radioresistant subclone in cells with high p-H2AX [S139] signal, and 2) a second pulse 48h after irradiation, which was stronger in the radiosensitive subclone. The cells exhibiting this second p-p53 pulse at 48h showed intermediate levels of p-H2AX [S139], suggesting that these cells had not completely repaired the radiation-induced DNA damage by that time. Additionally, these cells showed high levels of p-ERK1/2 [T202/Y204]. We observed that following the 48h pulse in p-p53 and p-ERK1/2, the levels of cleaved Caspase-3 and p-NF-κB [S536] increased in the radiosensitive subclone.
Altogether, these results allow us to hypothesize that the 12h p-p53 pulse induces DNA repair in the resistant subclone, while the second, 48h p-p53 pulse, accompanied by a p-ERK1/2 pulse and occurring in cells with residual DNA damage, leads to Caspase-3-mediated cell death in the sensitive subclone. To evaluate this, single-cell time-course perturbation experiments will be performed with pharmacological inhibition of Chk1 and/or MEK. Ideally, we will be able to further dissect the underlying mechanisms of radiation resistance and find therapeutic vulnerabilities that will allow targeted radiosensitization of the resistant subclone.

Defining Design Rules for Next-Generation Snakebite Antivenoms
PRESENTER: Natalie Morris

ABSTRACT. Snakebite envenomation is a priority neglected tropical disease, which globally results in around 100,000 deaths and 400,000 cases of disability per year. Venom is a complex mixture of protein toxins of various families and functions. Systemic envenomation is treated using antivenoms, which are currently produced by hyper-immunising large animals against the venom in question. The animal naturally produces protective toxin-neutralising antibodies, which can be harvested to formulate the serum-based antivenom product. There is an urgent need to innovate the way that we design and produce antivenoms, owing to limitations in the cost, efficacy, and safety of these conventional treatments.

The advent of recombinant protein expression and in vitro antibody selection has greatly expanded the antivenom design space, and has facilitated the production of toxin-neutralising recombinant antibodies in a range of conventional and alternative molecular formats. Different antivenom scaffolds may impart pharmacokinetic benefits to treatment under different circumstances. Whilst there are several next-generation formats under active investigation, there has been little work done to quantitatively compare the performance of different scaffolds. The pharmacokinetic venom-antivenom system is complex, owing to the variable distribution and elimination profiles of different venoms and antivenoms. Computational simulations of venom and antivenom pharmacodynamics can be used to explore this interplay and compare the function of different scaffolds under clinically relevant treatment scenarios. These simulations can facilitate the methodical testing of a much wider area of parameter space than would be feasible in vivo.

We have built a two-compartment pharmacokinetic model of systemic snakebite envenomation and treatment in rabbits, which tracks the movement of toxins through separate blood and tissue compartments. The model was parameterised with existing experimental data and enables the simulation of antivenom scaffolds ranging in size from 15 to 150 kDa. The model additionally enables control of other treatment parameters including antivenom dosing, affinity, treatment time, and venom type. We are performing a range of simulations, including local and global sensitivity analysis and global parameter optimisation, to better understand the most important features in antivenom design. Within these studies, we are exploring and defining the optimal combinations of antivenom molecular size, dosing ratio, and affinity parameters. Thus far, global sensitivity analysis has indicated that the most important antivenom parameters in the neutralisation of low molecular weight venoms are the antivenom-to-venom dosing ratio and the binding on-rate (kon). Antivenom molecular size has a much smaller impact on treatment outcome. Global parameter optimisation has indicated that the most effective antivenoms are low molecular weight scaffolds, given at high dosing ratios and with high kon values. This modelling approach can be used to elucidate the dynamics of envenomation-treatment systems, and help inform the development of low-cost, high-coverage antivenoms for snakebite.
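
A toy version of such a two-compartment model can be written in a few lines of Python to show how dosing ratio and on-rate enter the equations. Everything in the sketch is an assumption made for illustration: the rate constants are arbitrary, the antivenom is given at time zero and stays in the blood, and the size-dependent distribution that the abstract describes is omitted. It is not the authors' model.

```python
def free_venom_exposure(dose_ratio, kon=1.0, koff=0.01, hours=24.0, dt=0.001):
    """Time-integrated free venom in blood (arbitrary units), explicit Euler.

    dose_ratio is antivenom amount / venom amount at time zero.
    All rate constants below are hypothetical, in 1/h.
    """
    k_bt, k_tb = 0.5, 0.2            # blood -> tissue, tissue -> blood
    k_el_venom, k_el_av = 0.1, 0.05  # elimination of venom and antivenom
    venom_blood, venom_tissue = 1.0, 0.0
    antivenom, bound = dose_ratio * 1.0, 0.0
    exposure = 0.0
    for _ in range(int(hours / dt)):
        # Net binding flux: association minus dissociation of the complex.
        binding = kon * venom_blood * antivenom - koff * bound
        d_venom_blood = (-k_bt * venom_blood + k_tb * venom_tissue
                         - k_el_venom * venom_blood - binding)
        d_venom_tissue = k_bt * venom_blood - k_tb * venom_tissue
        d_antivenom = -k_el_av * antivenom - binding
        d_bound = binding - k_el_av * bound
        venom_blood += d_venom_blood * dt
        venom_tissue += d_venom_tissue * dt
        antivenom += d_antivenom * dt
        bound += d_bound * dt
        exposure += venom_blood * dt
    return exposure

if __name__ == "__main__":
    for ratio in (0.0, 1.0, 5.0):
        print(ratio, round(free_venom_exposure(ratio), 3))
```

Scanning `dose_ratio` and `kon` over a grid with a function like this is the simplest form of the sensitivity analysis the abstract mentions; a global analysis would vary all parameters jointly.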

Co-expression analysis of transcriptome and proteome identifies LRR-VIII-1 kinase and MAPK-kinase (MEK1) regulatory modules associated with P-deficiency adaptation and P use efficiency in maize
PRESENTER: Mingjie He

ABSTRACT. Maize (Zea mays) is one of the most important crops worldwide. Crop productivity is widely constrained by limited phosphorus (P). Thus, improving P use efficiency (PUE) in newly developed cultivars is one of the long-term goals of breeding programs. However, the function of genes and their regulation for adaptation to P-limitation in maize is largely unknown, and the genetic potential for improving PUE is still under discovery. Therefore, we explored molecular-level regulatory networks under low-P (LP) supply using a multi-OMICs approach. We performed transcriptomic and proteomic analyses using six maize genotypes, which have close genomic backgrounds but several contrasting phenotypic traits, including PUE. We constructed co-expression networks for proteome and transcriptome data, and identified associations between co-expression modules and 31 traits. Within these networks, we further investigated protein kinases as potential regulators, and experimentally verified potential interactions between kinases and their substrates using the split YFP system. We propose the LRR-VIII-1 kinase (Zm00001d038522) as a regulator in roots. This kinase may play a fundamental role in adaptations to LP-stress; its regulatory modules are highly associated with tissue P-concentration and root-to-shoot ratios, linking with biological processes of ROS cleavage, flavonoid and anthocyanin synthesis, cell elongation, and secondary cell wall organization. We propose MAPK-kinase (MEK1, Zm00001d043609) as a regulator of different LP-stress adaptations among different genotypes; its regulatory module is significantly associated with genotype-specific traits, namely root dry weight, root hair length, specific root length, and PUE. We show evidence that MEK1 can interact either with Sucrose synthase 1 (SH1, Zm00001d045042) or eEF1B-γ translation elongation factor 1-gamma 3 (eEF1B-γ, Zm00001d046352), suggesting it is involved in sucrose metabolism and translation elongation.
These proteins are suggested as key candidates to develop breeding targets to improve product yield with less P-fertilizer inputs and improved carbon resource allocation. More importantly, these OMICs profiles contain a wealth of information to be mined by the community and may provide clues for further research beyond the work presented.

Microbial compound bioactivity descriptors enable large-scale prediction of small molecule impact on microbiota species and pathogens
PRESENTER: Nils Kurzawa

ABSTRACT. Chemical descriptors are the workhorse of chemoinformatics and have enabled increasingly accurate prediction of a wide range of compound properties. Recently, our group has introduced the Chemical Checker which extends the principle of numerical descriptors of chemical structures to various layers of compound bioactivity from targets to clinical parameters. Here, we present an extension of this concept to microbial compound bioactivities capturing the effects of compounds on microbes populating the human body and metabolomic responses of Escherichia coli upon drug treatment. Moreover, we provide an approach that allows inference of these descriptors for uncharacterized compounds and we make use of it by exploring diverse compound libraries for microbial bioactivity. Additionally, we show that microbial descriptors can be used orthogonally to chemical and human bioactivity features for different machine learning tasks such as antibiotic mode of action and ESKAPE species compound sensitivity prediction.

Development of a geometric blood vessel model to quantify morphological changes of endothelial cells in 3D during vascular remodeling
PRESENTER: Daniel Seeler

ABSTRACT. Vascular remodeling is a physiological process that continuously ensures sufficient nutrient supply of tissues by long-term changes of the blood vessel architecture. Here, endothelial cells (ECs) lining the blood vessel interior perceive fluid shear stress (FSS) exerted by blood flow. In response to high FSS, ECs elongate and align in flow direction. These cell morphological changes lead to changes in vessel diameter and, subsequently, FSS. To better understand these coupled processes, we aimed to develop a geometric model linking EC shape and dorsal aorta (DA) geometry in zebrafish embryos and use it to quantify morphometric measures of EC morphology in 3D. We obtained endothelial cell contours in the DA of zebrafish embryos (N=7) at 48 hours post fertilization (hpf) and 72hpf by manually annotating 3D data points onto the EC-specific transgenic junctional marker pecam1-EGFP in Imaris software (Bitplane Inc.). As a pre-processing step, we fitted smoothing splines to each cell contour to reduce noise created by the manual annotation process. Then, we locally fitted cross-section shapes along the vessel axis. To describe the family of cross-section shapes, we developed a shape model which accounts for physiologically observed dorsal-ventral asymmetry. We ensured feasibility and locality of the estimation by considering the projection errors of all points in proximity to the cross-section with Gaussian weights decaying with distance. To improve the estimation in case of outliers and data sparsity, we (1) chose the width of the weight function adaptively in space and (2) constrained the deviation of the local cross-section shape’s parameters from the mean shape’s parameters. As a post-processing step, we employed a Gaussian filter in parameter space to smooth the transitions between cross-section shapes along the vessel axis. We interpolated between the fitted cross-section shapes by triangulation.
Each EC surface was triangulated such that its projected contour edges were included as triangle edges, resulting in a 3D mesh with smooth boundary. Using these 3D meshes, we were able to quantify the increase in both EC elongation and alignment in flow direction, and the decrease in EC compactness, between 48hpf and 72hpf. Our estimation process has low projection errors and is robust to annotation errors. Performing the analysis in 3D avoids cartographic distortions, which would be caused by tube unrolling. While our geometric model currently only computes a static description of EC morphology, a dynamic model of EC shape in response to blood flow can be incorporated. This would enable us to study the dynamics of the coupling between EC morphology and vessel geometry. Our long-term aim is to compare EC morphology between wild-type zebrafish and disease models to improve our understanding of pathological vascular remodeling in humans.

Delivering Virtual Reality and Gaming Technologies to the Field of Systems Biology
PRESENTER: Eliott Jacopin

ABSTRACT. Technological innovations in data serialization, simulation engine architecture, and XR devices made in the video game industry are contributing substantially to the establishment of Industry 4.0 (e.g. digital twins). We claim that systems biology can also greatly benefit from those technologies by further tightening the integration between data, modelling, simulation and visualization. We illustrate that claim with two tools under development. The first, ECellDive, takes advantage of the new paradigm introduced with the Metaverse to offer an integrated workspace in virtual reality. In the workspace, users can model, simulate and visualize biological systems in real-time collaboration with their colleagues. We demonstrate the capabilities of ECellDive by reproducing part of the Escher-FBA platform (EG Rowe et al. 2018, doi:10.1186/s12918-018-0607-5). We start by importing a model originally available on Escher-FBA (iJO1366) into our virtual scene and dive into it. Diving transfers us to a new scene containing the metabolic pathway encoded in iJO1366. The metabolic pathway is mapped across a few dozen meters, and the user can explore it by teleporting in the virtual scene or by physically moving to adjust their exact position. The user can highlight the structure of the network by grouping modules together automatically or manually. Grouping helps contextualize the model by, for example, visualizing cellular compartments (outer space, inter-membrane space, cytosol) and metabolic subsystems (glycolysis, pentose phosphate pathway, citric acid cycle, etc.). The user can also perform a Flux Balance Analysis (FBA) of the pathway and update the simulation results by knocking out or activating reactions of interest. Finally, ECellDive is about collaboration: any changes can be exported and shared, and users can also join a session hosted by someone else in real time to modify the same file.
The second, ECellEngine, borrows technologies from video game engines to offer a real-time simulation environment for biological systems. Current simulation environments developed for systems biology can be seen as static processes: they take data and models as input, simulate them all in one go, and output the results. This becomes burdensome when analyzing complex models because we usually only have access to a replay of what happened via logs. A real-time architecture, in contrast, enables analyzing a model directly as it is simulated. It also makes it possible to interact with the simulation to change parameters on the fly and detect critical events responsible for switches in the model. Such advantages are the first step to obtaining digital twins of wet experiments. ECellEngine’s simulation solver is stochastic (Gillespie), and its architecture allows for comparison of concurrent simulations. Another benefit of this architecture is the ability to manipulate the time of the simulation (pause, backward, forward, step-by-step), effectively improving our investigation and understanding of the mechanistic details of the model.
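
For readers unfamiliar with the stochastic solver mentioned above, the following Python sketch shows the Gillespie direct method on a one-species birth-death system. It is a generic textbook illustration, not ECellEngine code; the function name, the reaction system, and the rate constants are assumptions. Returning the full list of events is what makes stepping forward and backward through a simulation possible after the fact.

```python
import random

def gillespie_birth_death(k_prod, k_deg, x0, t_end, seed=0):
    """Gillespie direct method for: 0 -> X (rate k_prod), X -> 0 (rate k_deg * X).

    Returns the trajectory as a list of (time, copy number) events.
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    trajectory = [(t, x)]
    while True:
        a_prod = k_prod          # propensity of production
        a_deg = k_deg * x        # propensity of degradation
        a_total = a_prod + a_deg
        if a_total == 0.0:
            break                # no reaction can fire any more
        # Waiting time to the next reaction is exponentially distributed.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Choose which reaction fires, proportionally to its propensity.
        if rng.random() * a_total < a_prod:
            x += 1
        else:
            x -= 1
        trajectory.append((t, x))
    return trajectory

if __name__ == "__main__":
    events = gillespie_birth_death(k_prod=10.0, k_deg=1.0, x0=0, t_end=5.0)
    print(len(events), events[-1])
```

Changing `k_prod` or `k_deg` between events, rather than only before the run, would be the toy equivalent of the on-the-fly parameter changes described in the abstract.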

Inference and dynamical simulation of the gene-regulatory network in single cells
PRESENTER: Patrick Stumpf

ABSTRACT. During development, the gene regulatory network (GRN) within the cell achieves an extraordinary feat: the robust specification of all somatic cell types in the body from a single fertilized egg through differential expression of a constant set of genes. However, gene mutations that cause loss or gain of function may alter the activity of the GRN and affect the outcome of cell differentiation. A prominent example of this is found in blood cancer, where gene mutations lead to an increased proliferation of stem cells and decreased production of mature blood cells. The consequence is an insufficient oxygen supply to the body or an incapacitated immune system, both of which can be lethal. To better understand how gene mutations may lead to the altered production of cell types, we need to consider them in the context of the gene regulatory network and find a suitable computer model to predict cell fate. Since the gene regulatory network is largely unknown, network inference is typically used to predict gene interactions directly from data. For this purpose, large single-cell expression data obtained from RNA sequencing have proven useful, although the destructive nature of such measurements prevents the recording of time-resolved expression changes from individual cells. Recently, methods have been developed to extract temporal information from individual cells via RNA splicing dynamics. These methods have been used to visualize cell fate within a population as a phase plane. Here, using a simple machine learning model, we learn to encode the gene regulatory network from RNA splicing dynamics, enabling the simulation of cell trajectories for individual cells. The model enables extraction and manipulation of a gene regulatory network and the simulation of diseased cell trajectories due to gene mutation.
We demonstrate that the model can learn the correct regulatory logic from synthetic single-cell gene expression data and recapitulate the appropriate dynamics of gene expression. We further apply this model to gene expression data obtained from bone marrow to understand myeloid blood cell differentiation in cancer. Our modelling strategy can be universally applied to any single-cell data to elucidate cell differentiation in health and disease. We anticipate that this approach will lead to new insights on cell differentiation dynamics for a large range of monogenic diseases.

The transcriptome dynamics of single cells during the cell cycle
PRESENTER: Daniel Schwabe

ABSTRACT. The cell cycle is among the most basic phenomena in biology. Despite advances in single-cell analysis, dynamics and topology of the cell cycle in high-dimensional gene expression space remain largely unknown. We developed a linear analysis of transcriptome data which reveals that cells move along a planar circular trajectory in transcriptome space during the cycle. Non-cycling gene expression adds a third dimension causing helical motion on a cylinder. We find in immortalized cell lines that cell cycle transcriptome dynamics occur largely independently from other cellular processes. We offer a simple method (“Revelio”) to order unsynchronized cells in time. Precise removal of cell cycle effects from the data becomes a straightforward operation. The shape of the trajectory implies that each gene is upregulated only once during the cycle, and only two dynamic components represented by groups of genes drive transcriptome dynamics. It indicates that the cell cycle has evolved to minimize changes of transcriptional activity and the related regulatory effort. This design principle of the cell cycle may be of relevance to many other cellular differentiation processes.

Non-linear mixed effects modeling of the T cell response in individuals infected with SARS-CoV-2
PRESENTER: Clemens Peiter

ABSTRACT. The COVID-19 pandemic has taken a heavy toll on society. First discovered in late 2019, the disease rapidly spread across nations, causing millions of deaths worldwide. Symptoms and disease progression vary widely between patients, leading to difficulties in predicting outcomes. With current vaccines failing to prevent reinfections for newer virus variants, it is of utmost importance to understand the underlying mechanisms of the SARS-CoV-2-specific immune response. Yet, the mechanistic understanding of the dynamics of the immune cell compartment is so far limited. In this work, we focus on the T cell immune response in individuals infected with SARS-CoV-2. We were supplied with data collected in KoCo19-Immu, a longitudinal study of COVID-19 infected individuals in the city of Munich, Germany. Preliminary analysis of the data showed variations between individuals in timing and magnitude of the T cell response. To explain this heterogeneity and analyze the interaction of virus and immune system, we develop a mechanistic model in a non-linear mixed effects modeling framework that captures the activation of T cell responses through viral infection. Unknown model parameters are then estimated from the data using the Stochastic Approximation Expectation Maximization (SAEM) algorithm. Finally, we compare different models to identify features of the model required for an accurate fit to the data. These features will allow us to obtain mechanistic insights into the development of the immune response against SARS-CoV-2.

Analyzing Optimal and Non-Optimal Resource Allocations in Whole-Cell Models
PRESENTER: Diana Széliová

ABSTRACT. Next-generation genome-scale metabolic models allow studying the reallocation of cellular resources upon changing environmental conditions. They allow modeling metabolic flux distributions and predicting expression profiles of the catalyzing proteome. Consequently, the biomass composition can no longer be assumed constant but needs to be computed to account for the variable resource allocation. Although computational methods for identifying optimal solutions are available, unbiased characterization of all feasible solutions was so far missing. Here we introduce elementary growth modes (EGMs) to comprehensively analyze whole-cell models.

EGMs generalize the concept of minimal functional units -- known from elementary flux mode analysis -- to resource balance models. Thus, they provide an understanding of all possible flux distributions and all possible biomass compositions.

First, we demonstrate the power of an EGM analysis by analyzing biomass variations upon nitrogen limitations across multiple growth rates and find that the accumulation of lipid and/or starch upon nitrogen starvation is a feature of balanced growth and not (necessarily) a result of active regulation.

Next, we asked if the experimentally observed ribosome composition of E. coli (2/3 rRNA + 1/3 protein) can be understood as an (evolutionary) resource allocation problem. However, this is only possible if cellular transcription is constrained too, an observation that strongly questions currently established theories.

In summary, EGMs provide unprecedented comprehensive insights into resource allocation in next-generation genome-scale models.

Extracting knowledge from multi-omics and clinical datasets using graph machine learning
PRESENTER: Ferdinand Popp

ABSTRACT. Consortia like The Cancer Genome Atlas (Weinstein et al., 2013)(DOI: 10.1038/ng.2764) have profiled cancer patients using high-throughput modalities, e.g. genomics, transcriptomics, proteomics, etc. Understanding the relationship patterns among and within the different types of data is required in order to define granular subgroups for cancer patients with specific markers, thus enabling precision treatment decisions (Pierre-Jean et al., 2020)(DOI:10.1093/bib/bbz138). A promising way to address this challenge is the integration of different omics datasets combined with clinical record data. As most integration approaches use a grid-based model for patient-data representation, we focus on a graph-based model. Graph machine learning has recently gained traction and its applications have been shown to augment cancer patient subtype discovery in a novel way (Fang et al., 2022)(DOI: 10.1038/s41746-021-00381-z). Graph autoencoders (GAE) are unsupervised learning frameworks that are utilized to learn low-dimensional continuous representations, known as embeddings. Utilizing GAEs, we generated embeddings for each patient from the input data. Then, we clustered the patients into subgroups based on their embeddings. Furthermore, we investigated these subgroups for differential features and clinical outcomes. Based on the novel patient subgroups, we extracted biomarkers for predicting therapy success and survival. Two common subtypes of lung cancer, lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC), have drastically different biological signatures, yet they are often treated similarly and classified together as non-small cell lung cancer (NSCLC) (Chen et al., 2021)(DOI: 10.1038/s41598-021-92725-8). We tested multiple graph machine learning architectures on publicly available NSCLC samples (approx. 1000 samples) integrating multi-omics data (genomic alterations, transcriptomics, epigenomics) and clinical information.
This resulted in obtaining more granular patient subgroups which differed in clinical outcome.

Being noisy in a crowd: differential selective pressure on gene expression noise in model gene regulatory networks
PRESENTER: Nataša Puzović

ABSTRACT. Expression noise, the variability of the amount of gene product among isogenic cells grown in identical conditions, originates from the inherent stochasticity of diffusion and binding of the molecular players involved in transcription and translation. It has been shown that expression noise is an evolvable trait and that central genes exhibit less noise than peripheral genes in gene networks. A possible explanation for this pattern is increased selective pressure on central genes since they propagate their noise to downstream targets, leading to noise amplification. To test this hypothesis, we developed a new gene regulatory network model with inheritable stochastic gene expression and simulated the evolution of gene-specific expression noise under constraint at the network level. Stabilizing selection was imposed on the expression level of all genes in the network and rounds of mutation, selection, replication and recombination were performed. We observed that local network features affect both the probability to respond to selection, and the strength of the selective pressure acting on individual genes. In particular, the reduction of gene-specific expression noise as a response to stabilizing selection on the mean expression is higher in genes with higher centrality metrics. Furthermore, global topological structures such as network diameter, centralization and average degree affect the average expression variance and average selective pressure acting on constituent genes. Our results demonstrate that selection at the network level leads to differential selective pressure at the gene level, and local and global network characteristics are an essential component of gene-specific expression noise evolution.

Mathematical modeling of cytokine interplay in human monocytes during LPS stimulation
PRESENTER: Niloofar Nikaein

ABSTRACT. Inflammation is one of the vital mechanisms through which the immune system responds to harmful stimuli. During the course of inflammation, both anti- and pro-inflammatory signaling pathways are activated to regulate proper responses. The consequent immune responses are finely tuned by the interplay between the expressed anti- and pro-inflammatory cytokines. Imbalance in this interplay may result in immune disorders. However, this complex interplay is not yet fully understood. Here, we use a mathematical modeling approach to study the interplay between two prominent pro- and anti-inflammatory cytokines, i.e. tumor necrosis factor (TNF) and interleukin 10 (IL-10), during the course of inflammation ex vivo in human monocytes. These two cytokines are involved in the NFκB signaling pathway, a central pathway driving inflammation, which is used as the basis for our mathematical model. The system of non-linear Ordinary Differential Equations (ODEs) was trained and evaluated based on several biologically relevant scenarios, generated by a primary human monocyte cell culture model derived ex vivo from several donors. The two scenarios designed to generate training data sets involved stimulating the human cell culture model with: 1) 10ng/mL of lipopolysaccharides (LPS), and 2) both 10ng/mL of LPS and 100ng/mL of IL-10, at time point 0h. The validation data set was generated by stimulating the cell culture model with 10ng/mL of LPS at time point zero and 100ng/mL of IL-10 4 hours after the administration of LPS. In each scenario, four cytokines (TNF, IL-10, IL1Ra, and IL1β) were measured at five timepoints. The proposed mathematical model successfully reproduces the experimental data in all scenarios and explains the dynamics of the TNF and IL-10 cytokines. The model is a step towards understanding the mechanisms governing inflammatory responses and can be used for designing test experiments to further expand the knowledge in the area.
Future work will involve developing the model to a Non-Linear Mixed Effects (NLME) model to account for individual variations between different donors.

Mutually Antagonistic Protein Pairs of Cancer
PRESENTER: Ertugrul Dalgic

ABSTRACT. Cancer could be viewed as a result of switch-like behavior of cells, which could be best understood from a systems-level view. Antagonist protein pairs with mutual inhibition have critical roles in generating bistability. The two proteins of such an antagonist pair negatively regulate each other directly or indirectly. Mutually acting antagonist proteins could show contrasting expression or activity levels in two different stable states. Unlike the extensive analysis of gene expression, the search for protein-level antagonistic pairs has been limited. Here, potential cancer type-specific antagonist protein pairs with mutual inhibition were obtained by a large-scale analysis. Two proteins underlying a bistable switch could show opposite behavior in two different cancer types. Mutually antagonistic protein pairs were identified by selecting pairs of proteins which are ON-OFF in at least one cancer type, and OFF-ON in at least one other cancer type. Some proteins were found to have a high number of antagonistic relationships with other proteins and to participate in most of the associations. The proteins with a highly antagonistic profile could not be obtained from a differential expression or a correlation-based analysis. Protein-protein and protein-DNA interactions between the antagonist proteins were also investigated. Mutually antagonistic protein pairs with direct or indirect interaction were identified. The identified proteins and their connections provide potentially novel mechanisms that could play critical and cancer type-specific roles. Integrative analysis of mutually antagonistic protein pairs contributes to our understanding of systems-level changes in cancer.
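
The ON-OFF / OFF-ON selection rule described above can be sketched as follows. This is an illustrative reading of the rule, not the author's pipeline: it assumes protein levels have already been binarized into ON/OFF per cancer type (the abstract does not say how), and the function name, protein labels and cancer type labels are hypothetical.

```python
from itertools import combinations


def antagonistic_pairs(states):
    """Return protein pairs that are ON-OFF in one cancer type and OFF-ON in another.

    `states` maps cancer type -> {protein: True (ON) / False (OFF)}.
    """
    # Only consider proteins measured in every cancer type.
    proteins = sorted(set.intersection(*(set(s) for s in states.values())))
    pairs = []
    for a, b in combinations(proteins, 2):
        on_off = [c for c, s in states.items() if s[a] and not s[b]]
        off_on = [c for c, s in states.items() if not s[a] and s[b]]
        # The two lists are disjoint by construction, so a hit in both
        # means two different cancer types show opposite patterns.
        if on_off and off_on:
            pairs.append((a, b))
    return pairs


# Toy binarized data (hypothetical labels).
toy_states = {
    "type1": {"P1": True, "P2": False, "P3": True},
    "type2": {"P1": False, "P2": True, "P3": True},
    "type3": {"P1": True, "P2": False, "P3": False},
}
print(antagonistic_pairs(toy_states))
```

Counting how often each protein appears in the returned pairs would give the kind of per-protein antagonism profile the abstract refers to.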

Virtual Cell Modeling and Simulation Software
PRESENTER: Michael Blinov

ABSTRACT. Virtual Cell (VCell, http://vcell.org) is free open-source software for modeling cell biological systems. Its biology-based interface is designed for researchers with no programming/scripting experience: they enter reactions and reaction rules in an intuitive graphical way, and VCell automatically creates the math for them. Models and simulations can be accessed from anywhere using the VCell database; models can be shared among collaborators or made publicly available. Simulations can be run locally (on a user’s computer) or using our server: a job can be sent to the server and results can be retrieved any time later. VCell provides a variety of simulation frameworks. The same reaction network can be simulated using deterministic (compartmental ODE) and/or stochastic (SSA solvers) simulators. By adding a geometry for compartments and diffusion for species, the same reaction network can be simulated as a reaction-diffusion-advection PDE with support for 2D kinematics and/or using spatial stochastic simulation (reaction-diffusion with Smoldyn, http://smoldyn.org). Geometries from 2D or 3D microscope images or from idealized analytical expressions can be used, and membrane flux, lateral membrane diffusion and electrophysiology can be incorporated. If elements of the model are defined by reaction rules (using a novel graphical user interface for defining, visualizing and verifying rule-based models), the model can be simulated using the BioNetGen (http://bionetgen.org) engine and/or network-free agent-based simulations. VCell models can be exchanged with other tools using standard formats such as SBML and BNGL, as well as exported to MATLAB, SED-ML and COMBINE archive formats. The latest developments include better support for SBML features and the ability to fully annotate models, view models online, interact with ImageJ, and send simulations to the runBiosimulations (https://run.biosimulations.org) service for online simulation and visualization with modified parameters.

PlantEd - a serious game about plant growth that aims to support metabolic modeling with citizen science

ABSTRACT. Plant growth is a game of survival. To win the game, the plant has to adjust its developmental programs to the changing environment and, as a result, survive and successfully disperse seeds. This is realized by constant regulation of plant metabolism to reach specific objectives according to the developmental stage, time of day, resource availability and environmental parameters. Several studies have successfully used whole-plant metabolic models to simulate some aspects of that process, and dynamic Flux Balance Analysis (dFBA) has provided an effective computational framework. However, the ability to simulate plant growth currently remains in the hands of experts. Therefore, in PlantEd we implement dFBA as an engine for a simple real-time strategy game, linking the molecular complexity of the system with challenging game mechanics and a user-friendly interface. The game enables players to explore survival strategies of plants and collectively participate in science without deep knowledge of plant metabolism and physiology.

Discovery of potential functional paths by integration of phospho-proteomics data in the PPI network using a RWR framework
PRESENTER: Christine Brun

ABSTRACT. Understanding how cellular signaling flows from the molecular to the cellular level is a key step in identifying regulators of different diseases and revisiting the development of new potential drug targets. For years, biological approaches to signaling did not allow signaling to be probed and controlled at the sub-cellular level with enough accuracy in space and time to directly witness the transfer of information in biological networks. To analyze datasets where signaling is controlled spatio-temporally by optogenetics, we have developed a method that traverses the space of Random Walks with Restart (RWR) models, searching for the optimally biased walk in a given context. It will allow the integration of data on differentially phosphorylated proteins obtained from a longitudinal phospho-proteomics assay, in response to two different modes of optogenetic activation of the Src kinase, in order to reconstruct potential functional paths in the Protein-Protein Interaction (PPI) network.
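
As background for the RWR framework mentioned above, here is a minimal sketch of a plain (unbiased) random walk with restart on a toy undirected network. It does not implement the authors' search over biased walks; the function name, the restart probability, and the node labels other than the seed are assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walker that restarts at the seeds.

    `adjacency` maps node -> list of neighbour nodes (every neighbour must be a key).
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart`, jump back to a seed node.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                # Isolated node: send its walking mass back to the seeds.
                for n in nodes:
                    nxt[n] += (1.0 - restart) * p[u] * p0[n]
                continue
            # Otherwise move to a uniformly chosen neighbour.
            share = (1.0 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path-shaped network seeded at Src; "A", "B", "C" are placeholder proteins.
toy_ppi = {"SRC": ["A"], "A": ["SRC", "B"], "B": ["A", "C"], "C": ["B"]}
scores = random_walk_with_restart(toy_ppi, seeds={"SRC"}, restart=0.5)
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

A biased variant would replace the uniform choice of neighbour with edge- or node-specific weights, for example weights derived from phosphorylation changes.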

On the relation between input and output distributions of scRNA-seq experiments
PRESENTER: Daniel Schwabe

ABSTRACT. Single-cell RNA sequencing determines RNA copy numbers per cell for a given gene. However, technical noise raises the question of how observed distributions (output) are connected to their cellular distributions (input). We model a single-cell RNA sequencing setup consisting of PCR amplification and sequencing, and derive probability distribution functions for the output distribution given an input distribution. We provide copy number distributions arising from single transcripts during PCR amplification with exact expressions for mean and variance. We prove that the coefficient of variation of the output of sequencing is always larger than that of the input distribution. Experimental data reveal that the variance and mean of the input distribution obey characteristic relations, which we specifically determine for a HeLa data set. We can calculate as many moments of the input distribution as are known of the output distribution (up to all). This, in principle, completely determines the input from the output distribution.
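
To illustrate what an exact copy number distribution from a single transcript can look like, the sketch below assumes a standard branching-process model of PCR in which each molecule is duplicated independently with probability p in every cycle. The abstract does not state the authors' model, so this model, the efficiency value and the function names are assumptions. Under this assumed model the mean after n cycles is (1+p)^n and the variance is (1-p)(1+p)^(n-1)((1+p)^n - 1), which the code checks numerically.

```python
from math import comb


def pcr_copy_distribution(p, cycles):
    """Exact distribution of copy number after `cycles`, starting from one molecule."""
    dist = {1: 1.0}
    for _ in range(cycles):
        nxt = {}
        for k, prob in dist.items():
            # Out of k molecules, d are duplicated with binomial probability.
            for d in range(k + 1):
                w = prob * comb(k, d) * p**d * (1 - p) ** (k - d)
                nxt[k + d] = nxt.get(k + d, 0.0) + w
        dist = nxt
    return dist


def mean_and_variance(dist):
    mean = sum(k * q for k, q in dist.items())
    var = sum((k - mean) ** 2 * q for k, q in dist.items())
    return mean, var


# Hypothetical efficiency and cycle number, for illustration only.
p, n = 0.8, 5
mean, var = mean_and_variance(pcr_copy_distribution(p, n))
print(mean, (1 + p) ** n)
print(var, (1 - p) * (1 + p) ** (n - 1) * ((1 + p) ** n - 1))
```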

Investigating brain development using tissue and single-cell gene expression data

ABSTRACT. Background: A convergence of recent technological advances in transcriptomics and computational modeling is enabling the investigation of human brain development in healthy and neuropsychiatric disease conditions. Numerous tissue/spatial and single-cell transcriptomic data sources have emerged to aid the development of brain organogenesis models. Moreover, various time-series gene expression analysis approaches have contributed to modeling the dynamic molecular systems.

Data and Methods: We used a molecular systems approach to reconstruct the regulatory activities in the developing human brain using time-series gene expression and gene regulatory data. We reconstructed the dynamic regulated interaction network using developmental transcriptome data from BrainSpan, which consists of time-series data from post-conception to adulthood in 16 brain regions. In addition to this spatial transcriptome data, we used single-cell transcriptomic data from the PsychENCODE project, which consists of various types of neurons and neuroglia differentiated in the fetal and adult brains. An integrated analysis was conducted using the spatial transcriptomic and static regulatory interaction data to reconstruct dynamic developmental trajectories regulated by transcription factors and miRNAs. Single-cell transcriptomic data was used to develop the cell differentiation trajectories of brain organogenesis.

Results: The results for the spatial transcriptomic analysis are available to download as an interactive visualization tool at: DOI:10.48610/5f24ed4. We identified transcription factors and differential genes associated with human brain organogenesis and cell differentiation. Most regulatory activities were observed during the prenatal stages of brain development. The identified significant regulators can be used to interrogate the disease-associated transcription factors obtained from various neuropsychiatric genetic disease association studies.

INSIDe: Integrative modeling of the spread of serious infectious diseases

ABSTRACT. The modeling of the spread of SARS-CoV-2, and in particular of local outbreaks, has been crucial for analyzing the pandemic and guiding policy. Building surveillance models and pipelines with reliable forecasts is crucial for the prevention of future infectious disease outbreaks. Yet, the insights and forecasts provided by existing models are necessarily limited by their resolution and the data used for inference. Models which are based solely on a single data source, like reported numbers of new infections, might have biased forecasts. Case reports might not be representative, whereas serological studies are costly and might have poor time resolution. Wastewater monitoring, however, has proven to serve as an early indicator for the rise of reported infections and hospitalizations. The current challenge is the integration of different data sources and the interconnection of their respective modeling and simulation frameworks. We will address this with a modular, open-source platform allowing for: (i) the assembly and simulation of complex models consisting of multiple submodels, and (ii) the data-driven inference of unknown model parameters (e.g. the effect of non-pharmaceutical interventions, NPIs) and the design of observation (testing) strategies. The INSIDe platform will combine three state-of-the-art software frameworks: ++SYSTEMS for the fine-grained simulation of flow patterns in wastewater systems, MEmilio for the simulation of the spatio-temporal spread of infectious diseases and pyABC for data-driven modeling of multi-scale processes. Combining these frameworks, we can facilitate the integration of different information. Integrative modeling will improve the assessment of the current state of epi-/pandemics, achieve more robust and reliable predictions, reduce uncertainty and thus allow decision makers to employ more precise NPIs to prevent outbreaks of a disease.

Seroepidemiology And Modeling Of SARS-CoV-2 In Ethiopia: Longitudinal Cohort Study Among Front-line Health-care Workers And Community
PRESENTER: Simon Merkt

ABSTRACT. Background: African countries were spared from an overwhelming burden of COVID-19 during the so-called first wave of the pandemic in 2020, but increasingly experienced impacts on health systems during the second and third waves in 2021. Due to limited surveillance information, the true COVID-19 burden in a country such as Ethiopia remains unknown. We aimed to investigate the seroepidemiology of SARS-CoV-2 among frontline healthcare workers (HCW) and communities in Ethiopia. Methods: We conducted a population-based, longitudinal cohort study involving HCW, urban residents, and rural communities in Jimma and Addis Ababa. Serology was performed in three consecutive rounds to obtain seroprevalence and incidence estimates within the cohorts. Moreover, we constructed SEIR models for the progression of the SARS-CoV-2 epidemic in Ethiopia and used Bayesian approaches for their calibration. Results: SARS-CoV-2 seroprevalence among HCW increased dramatically during the study period. This corresponded with national COVID-19 disease data. The models predicted a saturation level of 50%-70% for wild type virus, which was confirmed by third round data. However, assuming the introduction of variant strains and re-infections, saturation levels of 80%-90% were estimated. Conclusion: SARS-CoV-2 spread in Ethiopia has been highly dynamic among HCW and urban communities. It can be speculated that the greatest wave of SARS-CoV-2 infections is currently evolving in rural Ethiopia and thus requires attention with respect to healthcare burden and disease prevention. These findings should also greatly impact COVID-19 vaccine strategies in African countries, as for most individuals vaccination will represent a booster immunization after prior SARS-CoV-2 exposure. Likely efficient one-shot administrations in combination with seroprevalence assessment might be cost-effective, and especially applicable in the context of limited vaccine availability.
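
For orientation, a minimal deterministic SEIR simulation is sketched below. It only illustrates the model class named in the abstract: the authors' model structure, their Bayesian calibration and their parameter values are not given here, so the function name, the forward-Euler integration and all numbers are assumptions.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, r0, days, dt=0.1):
    """Forward-Euler integration of S -> E -> I -> R; returns one state per day."""
    n = s0 + e0 + i0 + r0
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    steps_per_day = round(1.0 / dt)
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt      # S -> E (transmission)
            new_infectious = sigma * e * dt          # E -> I (end of latency)
            new_recovered = gamma * i * dt           # I -> R (recovery)
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((s, e, i, r))
    return history


# Hypothetical parameters: transmission 0.5/day, 5-day latency, 7-day infectious period.
history = simulate_seir(beta=0.5, sigma=1 / 5, gamma=1 / 7,
                        s0=99_990, e0=0, i0=10, r0=0, days=365)
final_s, final_e, final_i, final_r = history[-1]
print(f"fraction ever infected after one year: {1 - final_s / 100_000:.2f}")
```

In a calibration setting, the fraction of the population that has left the susceptible compartment is the quantity one would compare with measured seroprevalence.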

Disentangling the internal composition of tumour activities through a hierarchical factorization model

ABSTRACT. Genomic heterogeneity represents one of the most distinctive molecular features of any type of cancer, having a considerable impact on the efficacy of available medical treatments, often leading to relapse and subsequent deterioration of patients' health. Tumourigenesis emerges as a strongly stochastic process, producing a variable landscape of genomic configurations organised into cell subpopulations or dominant clones, building the global identity of the tumour. In this context, matrix factorisation techniques represent a suitable approach, as they are able to efficiently capture complex patterns of variability. These methodologies aim to obtain a finite set of latent patterns that represent the basic building blocks of observations, rendering in cancer samples the different molecular strategies that tumours implement to develop the hallmarks of cancer. Furthermore, these patterns are shared between samples belonging to the same phenotypic group, providing valuable insight into the main differences and commonalities between cancer subtypes. From this perspective, we present a protocol[1] designed to explore the different levels of genomic heterogeneity in a cohort of cancer patients. To this end, the protocol is based on a hierarchical factorisation model conceived from a systems biology perspective, which integrates the topology of signalling pathways. For a set of altered biological processes, the model simultaneously decomposes two different matrices representing the activity of genes and signalling pathways, respectively, obtaining for the same group of patients two sets of mutually compatible latent components. The protocol was evaluated using a set of simulations specifically designed to recapitulate the cellular hierarchy between genes and pathways, showing a high degree of accuracy when recovering both the previously introduced components and their weights into the simulated samples. 
In addition, the analysis performed on a real cohort of breast cancer patients recapitulated the internal composition of some of the most relevant altered biological processes in the disease, such as the internal structure of the Her2 subtype in the regulation of epidermal growth factor, the inner composition of the Basal subtype in the Notch signalling pathway and the differences between the Luminal A and Luminal B subtypes in the regulation of oestrogen response and the cell cycle regulation, describing gene- and pathway-level strategies and their combinations across the different breast cancer subtypes. We envisage that hierarchical matrix factorization designs will be essential to better understand the different levels of heterogeneity in tumour cells, revealing how patients who develop the same hallmarks of cancer start from largely different initial genomic configurations.

[1] Carbonell-Caballero, J., López-Quílez, A., Conesa, D., & Dopazo, J. (2021). Deciphering Genomic Heterogeneity and the Internal Composition of Tumour Activities through a Hierarchical Factorisation Model. Mathematics, 9(21), 2833.

Patient- and cell line-specific modeling of diffuse large B-cell lymphoma
PRESENTER: Fabian Konrath

ABSTRACT. Diffuse large B-cell lymphoma (DLBCL) is the most common Non-Hodgkin lymphoma. Due to the intrinsic molecular heterogeneity, patients with DLBCL respond differently to therapy. While around 60% of patients are cured with immunochemotherapy, 40% eventually succumb to their disease. To better understand differences among patients and, more importantly, to improve treatment efficiency, it is necessary to take patient-specific molecular information into account and establish personalized treatment options. To gain a more detailed understanding of patient-specific alterations and their effect on the cellular outcome, we use different modeling approaches. By employing a logical modeling approach, we developed patient-specific models that allowed for individualized prediction of the effect of drugs and drug combinations (Thobe et al., 2021). Here, we use an ordinary differential equation-based model that describes the signaling pathways and gene regulations involved in the activation and differentiation of B-cells and that allows cell fate decisions to be predicted in a time-resolved manner. We implemented mutations, somatic copy number alterations and structural variants of genes that are found to be altered in a representative cohort of patients with DLBCL and lymphoma cell lines. The combination of those genetic alterations is specific for individual patients and cell lines and therefore allows the creation of patient- and cell line-specific models. To train the cell line models and improve their predictive power, we leveraged publicly available functional genomics data obtained from CRISPR knockout screens in combination with drug response data. With our approach, we aim to elucidate the impact of individual alterations on cell fate decisions and thereby predict druggable vulnerabilities.

Neural Circuits Underlying Autism Spectrum Disorders
PRESENTER: Jon Chang

ABSTRACT. Autism spectrum disorder (ASD) is a common and highly heritable psychiatric disorder. Its genetic risk factors have been well studied; however, the perturbed neural circuits responsible for characteristic behaviors are poorly understood. Our analysis uses genetic mutations to ascertain the neural circuits perturbed in ASD, and we find that a strongly interconnected system of neural structures may be the basis for the behavioral phenotypes associated with the disorder. We observe that distal projections constitute a disproportionately large fraction of the network composition, suggesting that the integration of diverse brain regions is a key property of the circuit. We also implicate key cortical and subcortical structures sharing strong functional connections, and we observe that cortical perturbations are associated with more severe intellectual phenotypes. Overall, we present a method that, to our knowledge, is the first unbiased approach to comprehensively discover and identify the neural circuitry affected in ASD.

From genome sequencing to the first draft of the genome scale metabolic model of the symbiont fungus Leucoagaricus gongylophorus LEU18496.

ABSTRACT. In this work, we report the draft genome-scale metabolic model of the basidiomycete fungus Leucoagaricus gongylophorus LEU18496, a symbiont of the ant Atta mexicana, built from its annotated genome. The annotation was obtained from the hybrid genomic assembly, which combines the data from the sequencing platforms MiSeq Illumina and GS+ FL Roche 454. A total of 11,690 predicted genes were found by Augustus using the 48,287 contigs (N50=5241) obtained from the hybrid assembly; the tool was trained using the protein sequences from the Leucoagaricus gongylophorus AC12 genome. In the functional annotation, 96.25% of the sequences aligned to proteins reported in different databases for the division Basidiomycota. Out of the total annotated proteins, it was possible to assign EC numbers to 3150 proteins and to identify 391 CAZymes, 52 FOLymes and 38 possible proteases. Based on the annotation of Leucoagaricus gongylophorus LEU18496, three draft metabolic models were constructed using different computational tools. The draft models included 1219 reactions, 897 metabolites and 455 genes using CarveMe, and 632 reactions, 800 metabolites and 356 genes using Merlin, while AuReMe found 863 reactions, 1030 metabolites and 863 genes. Due to these differences, further refinement of the metabolic models is necessary. In addition to the reconstruction of the genome-scale metabolic model, a comparative genomic analysis was carried out using four genomes of Leucoagaricus. The pangenome of these species comprises a total of 18,052 groups of genes, of which 383 belong to the core genome and 17,669 to the accessory genome.

Using graph theory and recurrent neural network models to investigate the anxiety circuit

ABSTRACT. Psychiatric illnesses like anxiety are increasingly viewed as disorders of underlying neurological circuits. Understanding the mechanisms by which these circuits give rise to symptoms could enable the discovery of a novel electrophysiological biomarker that would help clinicians diagnose and treat the disorder. Here, we use graph theory and modeling with recurrent neural networks to assist with biomarker discovery and reveal mechanisms underlying anxious behavior. One-photon calcium imaging data were recorded in freely behaving mice from three neuroanatomical regions implicated in anxiety: the locus coeruleus, dentate gyrus, and basolateral amygdala. The mice underwent an aversive contextual processing behavioral paradigm in which they were placed in two distinct environments, one in which they experienced an aversive foot shock and one that was neutral, while their freezing behavior was simultaneously recorded. The studies described employ graph theory analyses to examine how underlying correlations in the dynamics within these regions show distinct motifs in the neutral and anxiogenic contexts. By generating graphs using each single-neuron time-series as a node and thresholded pairwise Pearson's correlations between the time-series as edge weights, we found that differential functional connectivity patterns arise upon exposure to anxiogenic and neutral contexts in this mouse model. In addition, we assessed changes in the functional connectivity of graphs constructed depending on the binarized behavior (freezing/moving). Building on these analyses, recurrent neural networks (RNNs) were constructed in order to guide experimentation and investigate mechanisms underlying the anxious responses to stimuli in animal models. The RNNs were able to approximate the dynamics of individual neurons in the recorded populations. With an additional output unit, we trained the RNNs to reconstruct the binarized behavioral time-series in addition to the neural recordings.
We will use these RNNs to probe the mechanisms of aversive contextual processing and to propose experiments that could be used to validate the predictions made about these mechanisms.
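The graph construction described in this abstract (single-neuron time-series as nodes, thresholded pairwise Pearson correlations as edge weights) can be illustrated with a minimal sketch. The threshold value, the choice to keep only positive correlations, the function names, and the toy traces are all assumptions made for illustration; they are not taken from the authors' analysis.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.
    Assumes neither sequence is constant (non-zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def correlation_graph(traces, threshold=0.5):
    """Build a weighted undirected graph from activity traces.

    Each trace is one node. An edge (i, j) with weight r is kept only
    when the pairwise correlation r exceeds the threshold. The threshold
    of 0.5 is a hypothetical value chosen for this example.
    """
    edges = {}
    for i in range(len(traces)):
        for j in range(i + 1, len(traces)):
            r = pearson(traces[i], traces[j])
            if r > threshold:
                edges[(i, j)] = r
    return edges

# Made-up traces: neurons 0 and 1 co-vary, neuron 2 is anti-correlated.
traces = [
    [0.0, 1.0, 2.0, 3.0, 4.0],
    [0.1, 1.1, 1.9, 3.2, 3.9],
    [4.0, 3.0, 2.0, 1.0, 0.0],
]
graph = correlation_graph(traces)
print(graph)  # only the edge between neurons 0 and 1 survives thresholding
```

Graphs built separately for each context (or for freezing versus moving epochs) could then be compared edge by edge.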

Leveraging the Molecular Signatures of Cancer for Dynamic Network Modeling
PRESENTER: Enes Sefa Ayar

ABSTRACT. Molecular heterogeneity and drug resistance are among the obstacles in developing treatment strategies in cancer. Therefore, transforming patient-specific molecular data into clinically interpretable knowledge is fundamental in personalized medicine. However, not all molecular alterations drive cancer. Distinguishing drivers from latent drivers and passengers, and determining their cooperativity and exclusivity and the temporal order in which molecular alterations accumulate, is a crucial yet daunting, unsolved task. Early alterations in the temporal order can inform about network rewiring and direct the identification of drug targets. Temporal modelling of cancer and the elucidation of the chronological order of molecular alterations, especially in terms of the evolution of networks beyond the alterations, remain elusive. This study addresses this challenge directly. We developed an integrative computational modeling approach to reveal the network-based history of tumor progression and to design personalized therapeutic strategies based on the validated models. The core method in this approach is the graph-based cellular automaton (GCA), which is a discrete dynamic model. We adapt GCA to integrate known cancer biology with large-scale omics datasets and molecular interactions. The reference graph is a tissue-specific interactome which consists of both protein-protein interactions and the regulatory network of transcription factor to gene interactions. The state of each protein is determined at each time step based on its current state and the states of its neighboring proteins. Initially, each protein is assigned a state of either cancer-driving or neutral/against cancer cell development. The known passenger and driver mutations are used at this stage. At the end of the simulation, the network-level modeling provides clusters or modules that are either against cancer formation or cancer-driving. 
The rules in the simulation will be determined by the known biology of the molecular alterations. Mutually exclusive and co-occurring alterations from the dataset are utilized to implement the transition rules in the modeling stage. While the co-occurrence of some mutations dilutes the severity of the phenotype, other mutations, usually driver mutations, strongly influence the cellular state. Importantly, temporal network data may also help to explain how some tumors that are initially sensitive to a drug become resistant. We applied this model to the cell line-dependent data in DepMap, which consists of both molecular alterations (mutation profiles, transcriptomic and proteomic data) and drug response. For this purpose, we applied a rigorous comparison of the resulting dynamic networks to construct a network-based taxonomy of the tumors both within each cancer type and across cancer types. We expect that this approach from molecular alterations to dynamic networks will transform the already available large datasets to gain new clinically relevant insights and improve personalized medicine.
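A graph-based cellular automaton of the general kind described in this abstract, where each protein's next state depends on its current state and the states of its neighbors, can be sketched as follows. The update rule (a weighted vote), the `self_weight` parameter, the node names, and the toy interactome are hypothetical illustrations; the abstract states that the actual transition rules are derived from known biology and from mutual exclusivity and co-occurrence patterns.

```python
def step(states, neighbors, self_weight=0.5):
    """One synchronous update of all nodes.

    states: dict mapping node -> +1 (cancer-driving) or -1 (neutral/against).
    neighbors: dict mapping node -> list of adjacent nodes.
    The new state is the sign of the node's own weighted state plus the
    sum of its neighbors' states; a tie keeps the current state.
    This voting rule is an assumption made for illustration.
    """
    new_states = {}
    for node, state in states.items():
        total = self_weight * state + sum(states[n] for n in neighbors[node])
        if total > 0:
            new_states[node] = 1
        elif total < 0:
            new_states[node] = -1
        else:
            new_states[node] = state
    return new_states

def simulate(states, neighbors, max_steps=50):
    """Iterate until a fixed point is reached or max_steps is exhausted."""
    for _ in range(max_steps):
        next_states = step(states, neighbors)
        if next_states == states:
            break
        states = next_states
    return states

# Hypothetical interactome: a triangle P1-P2-P3 linked to a chain P4-P5-P6.
neighbors = {
    "P1": ["P2", "P3"],
    "P2": ["P1", "P3"],
    "P3": ["P1", "P2", "P4"],
    "P4": ["P3", "P5"],
    "P5": ["P4", "P6"],
    "P6": ["P5"],
}
# Hypothetical initial states: P1 and P2 carry driver alterations.
initial = {"P1": 1, "P2": 1, "P3": -1, "P4": -1, "P5": -1, "P6": -1}

final = simulate(initial, neighbors)
driving_module = sorted(n for n, s in final.items() if s == 1)
print(driving_module)  # nodes that end in the cancer-driving state
```

In this toy run the driver state spreads to P3 and then stabilizes, leaving two modules with opposite states.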

TREAT-SGS - a multi-omics approach to treat the rare neuro-developmental disorder Schinzel–Giedion syndrome

ABSTRACT. Schinzel–Giedion syndrome (SGS) is a rare developmental disorder that causes neuronal and brain architectural defects as well as painful epileptic seizures in children leading to their premature death. The molecular cause of SGS has been identified to be a de novo gain of function (GOF) mutation in SETBP1, that leads to an over-activation of downstream pathways in a number of cell types, including activation of AKT, inhibition of DNA repair, and impact on the cell cycle. The TREAT-SGS consortium is aiming to alleviate epileptic seizures in SGS patients by correcting the abnormal activity of SETBP1 and its accumulation within the cell. The consortium is using a range of methods including single-cell omics profiling, single-cell exocytosis in vitro using human iPSC-derived cells, and in vivo using a SETBP1 GOF mouse model. Here, we present an overview of current activities within the project and how they may lead to the effective treatment of this rare disease.

GEARS: Predicting transcriptional outcomes of novel multi-gene perturbations
PRESENTER: Yusuf Roohani

ABSTRACT. Motivation: Transcriptional response to genetic perturbation can reveal fundamental insights into the functioning of a cell. It is central to numerous biomedical applications from identifying genetic interactions involved in cancer to methods for regenerative medicine. Recently, large-scale CRISPR-based perturbational screens (e.g. PerturbSeq) have emerged as an important tool for uncovering these insights. While single-cell transcriptional outcomes of perturbation can now be sampled experimentally, perturbing all possible combinations of genes remains slow, laborious, and expensive. The combinatorial explosion in the number of possible multi-gene perturbations makes computational methods indispensable for prioritizing which perturbations to test experimentally.

However, existing computational approaches face many limitations in fulfilling this potential. They are either limited by the complexity of genetic interactions that they can learn or in their ability to predict outcomes of perturbing combinations of genes not experimentally perturbed. Here, we present GEARS (Graph-Enhanced gene Activation and Repression Simulator), a geometric deep learning method that can predict transcriptional response to both single and multi-gene perturbations using single-cell RNA-sequencing data from perturbational screens.

Results: GEARS is uniquely able to predict outcomes of perturbing combinations consisting of novel genes that were never experimentally perturbed by leveraging geometric deep learning and a knowledge graph of gene-gene relationships. This significantly expands the space of possible combinatorial perturbation outcomes that can be computationally predicted using the same amount of experimental data. GEARS also predicts new biologically meaningful phenotypes that are different from experimentally-observed phenotypes used for model training.

GEARS shows a performance improvement greater than 45% in predicting genetic perturbation outcomes across 3 different datasets. GEARS’ predictions were also found to be significantly more directionally consistent, thus highlighting its ability to detect the correct nature of regulatory relationships. GEARS is able to predict outcomes for combinatorial perturbations consisting of arbitrarily many genes. GEARS has more than 50% higher precision than existing methods in predicting four key genetic interaction subtypes (e.g. synergy, epistasis, redundancy) and can identify the strongest genetic interactions twice as well.

Conclusion: GEARS uses deep learning to combine prior knowledge of biological processes with large single-cell perturbational datasets to create a reliable model of key cellular processes. As CRISPR-based perturbational screens become ubiquitous for discovering drug targets, GEARS is uniquely positioned to exponentially multiply the information gained from these screens. Moreover, since it can predict emergent transcriptional behavior, GEARS is also very useful for discovering tractable routes for engineering cell identity. Thus, GEARS is a systems biology model that can not only impact the discovery of novel small molecules for targeting disease but also push the frontier in the design of the next generation of cell and gene-based therapeutics.

MeDaX - our vision for bioMedical Data eXploration
PRESENTER: Judith Wodke

ABSTRACT. An immense amount of (bio)medical data is collected in everyday clinical practice, providing enormous potential for research and evidence-based medicine. However, this data is usually not standardized and, often, simply not accessible. Systematic sharing and use of clinical data in particular is hindered by several factors, such as i) data complexity and heterogeneity, ii) lack of appropriate tools for storage and comparison, and iii) data security and protection of personal information.

Focusing on integration of standardized data formats (e.g. HL7/FHIR [1], OpenEHR [2], bio-ontologies [3], or COMBINE standards [4]) and generation of FAIR [5] data, we will connect diverse (bio)medical data and semantic information in an integrated, formalized, and standardized knowledge graph. Graph databases are an appropriate tool for processing highly interconnected, heterogeneous data [6-10]. Data integration will be accomplished via ETL (extract, transform, load) processes. Data and information sources include the data integration center at Universitätsmedizin Greifswald, local population studies [11,12], biomedical ontologies [13], and public information portals [14,15]. Implementation of our graph database will be accompanied by integrating and advancing methods for data provenance, quality assurance, and similarity measures. Once data connectivity and accessibility are established, we will design and implement methods and software for data analysis and prediction.

In summary, the MeDaX junior research group will develop an innovative and efficient research platform for biomedical data exploration. This includes i) pipelines for semi-automated storage of and access to (bio)medical data in our graph database, ii) methods for data provenance, quality control, and similarity measures, and iii) tools for data analysis and prediction.

Committed to responsible and reproducible science, we will make our results and code publicly available and will consider measures for data privacy at all project stages. To maximize benefits for researchers, clinicians, and most importantly patients, we are interested in collaborations that inform us about their requirements.

References [1] D. Bender, K. Sartipi, Proceedings of the 26th IEEE international symposium on computer-based medical systems 2013 [2] D. Kalra et al., Studies in health technology and informatics 2005 [3] M. Salvadores et al., Semantic web 2013 [4] D. Waltemath et al., J integrative bioinf 2020 [5] M. D. Wilkinson et al., Scientific data 2016 [6] C. T. Have, L. J. Jensen, Bioinformatics 2013 [7] S. G. Finlayson et al., Sci data 2014 [8] I. Balaur et al., J Comp Biol 2017 [9] A. Fabregat et al., PLoS comp biol 2018 [10] D. S. Himmelstein et al., Elife 2017 [11] U. John et al., Sozial-und Präventivmedizin 2001 [12] H.J. Grabe et al., J translational medicine 2014 [13] N.F. Noy et al., NAR 2009 [14] nfdi4health, https://nfdi4health.de/ [15] S. Thun et al., BMC Med Inform Decis Mak 2020

Using multi-omics data and machine learning to unravel alternative splicing regulation
PRESENTER: Ulf Schmitz

ABSTRACT. Background Intron retention (IR) is a form of alternative splicing that is widespread in cells of vertebrates, insects and plants and is involved in a multitude of cell-physiological processes. IR expands gene regulatory complexity by adding new mRNA isoforms, increasing sophistication in gene expression fine-tuning via nonsense-mediated decay, and by introducing non-linear network-level dynamics. The importance of IR in humans has come into focus following landmark discoveries of dynamic IR programs in immune cell differentiation and aberrant IR patterns in cancer.

Results To investigate how IR is regulated in primary immune cells, we integrated transcriptomics (mRNA-Seq) data with epigenomics data including genome-wide DNA methylation (WGBS), histone modifications (ChIP-Seq), and nucleosome occupancy (NOMe-Seq) data. Using machine learning, we trained two complementary models to determine the role of epigenetic factors in the regulation of IR in cells of the innate immune system. Our results suggest that intrinsic characteristics are key for introns to evade splicing and that epigenetic marks can modulate IR levels. However, cell type-specific IR profiles are largely mediated by changes in chromatin accessibility, whereby predisposed introns in nucleosome-free regions are more likely to be retained. We show that increased chromatin accessibility, as revealed by nucleosome-free regions, contributes substantially to the retention of introns in a cell-specific manner. Dynamically retained introns are involved in immune response mechanisms including immune cell adhesion and activation.

Conclusions Our results have profound implications for the analysis of other forms of alternative splicing regarding their conservation, regulation and role in normal physiology as well as in diseases such as leukemia. Our findings about epigenetic IR regulation coincide with an increasing number of studies describing pathogenic alterations in splicing regulation and therapeutic approaches targeting aberrant splicing. Therefore, our findings could inform novel epigenetic therapy development.

An interconnected multi-level mechanistic model of the human brain

ABSTRACT. 1. Introduction In pursuit of a more comprehensive understanding of the brain, we aim to expand and integrate a set of existing and newly developed mechanistic models that describe different aspects of the neuronal and hemodynamic functions of the brain. The goal is an interconnected multi-level, multi-scale model that can explain mechanisms on different levels of cerebral physiology: starting at the level of ion channel kinetics, where neuronal homeostasis can be explored, and zooming out to large intraneuronal signalling networks, including descriptions of how such signalling activity affects i) the metabolic control and ii) the hemodynamic control of cerebral tissue, allowing changes in local vessels to connect to a global whole-body vessel tree. 2. Materials and Methods The interconnected model of the brain is constructed using ordinary differential equations (ODEs), which are incorporated into large-scale neuronal network modelling structures (NEURON and NetPyNE) [1, 2]. The interconnected model utilizes both qualitative and quantitative information from a wide variety of experimental measurements, such as measurements of action potentials (AP), magnetic resonance spectroscopy (MRS), functional magnetic resonance imaging (fMRI), as well as electrophysiological measurements, both on an ion channel level and a cell population level in the form of local field potential (LFP), multi-unit activity (MUA) and electroencephalography (EEG) measurements. 3. Results The interconnected model can currently offer a detailed mechanistic description of the neurovascular coupling [3], with connections to metabolic responses [4]; the integration of neuronal network activity is in development. Further, a versatile ion channel structure and a mechanistic interpretation of neuron facilitation are being integrated. The existing models can describe experimental data as well as independent validation data not used for model training.
The model's fit to data is further validated by statistical hypothesis testing.

4. Discussion and Conclusions By integrating these aspects, we aim to achieve a model that can offer a detailed intracellular description that also reflects the physiological structure of the human brain. The model framework could be used to study and predict different diseases and physiological alterations, such as whether facilitation of neurons can cause epilepsy, how Alzheimer's disease affects the signalling patterns between neuron populations, and how stroke affects the cerebral tissue. Such an interconnected model would also allow for qualitative information to be gained from multi-species measurements. 5. References [1] Carnevale, N. T., & Hines, M. L. (2006). The NEURON book. Cambridge University Press. [2] Dura-Bernal et al. (2019). NetPyNE, a tool for data-driven multiscale modeling of brain circuits. eLife 2019;8:e44494. [3] Sten S (2021). A multi-data based quantitative model for the neurovascular coupling in the brain. bioRxiv. [4] Sundqvist N (2022). Mechanistic model for human brain metabolism and the neurovascular coupling. bioRxiv.

On data-driven learning of an effective energy landscape from trajectories of nonlinear dynamical biological systems
PRESENTER: Sandip Saha

ABSTRACT. Constructing robust dynamical and predictive state-space models (e.g., ODE, PDE, Boolean) has been and remains at the core of systems biology. Yet, it has proven very challenging due to data limitations. As a result, the putative model architectures are under-determined for a given biological system, and the parameter space is, as a rule, huge. Here we explore the idea of constructing effective energy/potential-based models capturing a system's dynamical landscape without specifying each putative state variable and its associated parameters. Such a potential-like function provides sufficient information about a system's dynamics and stability properties. However, constructing such energy-based models is a challenging problem for an arbitrary system. Physics-Informed Neural Networks (PINNs) have emerged as a promising tool to predict a system's future with the help of neural networks under the guidance of basic physical principles such as the Principle of Least Action, the system Lagrangian, or the Hamiltonian. Here we explore how to construct potential landscapes in a purely data-driven sense by suitably employing PINNs for systems biology. We have designed a pipeline to extract approximate potential landscapes by integrating several models, such as the Lagrangian Neural Network (LNN), the Neural New-Physics Detector (NNPhD), and AI Feynman 2.0, while respecting the consistency of their assumptions. We validated the concept theoretically and computationally for the double pendulum model. We are currently targeting the construction of a Waddington epigenetic landscape using simulated data generated from a coupled transcriptional-epigenetic dynamical systems biology model (Matsushita and Kaneko, PRR, 2020). This pipeline enables us to construct potential landscapes from any data that contains phase-space information.

Transcription start site signal profiling improves transposable element RNA expression analysis at locus-level
PRESENTER: Natalia Savytska

ABSTRACT. The transcriptional activity of Transposable Elements (TEs) has been implicated in numerous pathological processes, including neurodegenerative diseases such as amyotrophic lateral sclerosis and frontotemporal lobar degeneration. TE expression analysis from short-read sequencing technologies is, however, challenging due to the multitude of similar sequences derived from single TE subfamilies and the exaptation of TEs within longer coding or non-coding RNAs. Specialised tools have been developed to quantify the expression of TEs that either rely on probabilistic re-distribution of multimapper count fractions or allow for discarding multimappers altogether. Until now, the benchmarking across those tools was largely limited to aggregated expression estimates over whole TE subfamilies. Here, we compared the performance of recently published tools (SQuIRE, TElocal, SalmonTE) with simplistic quantification strategies (featureCounts in unique, fraction and random modes) at the level of individual loci. Using simulated datasets, we examined the false discovery rate and the primary driver of those false positive hits in the optimal quantification strategy. Our findings suggest that the number of false discoveries exceeds the total number of correctly recovered active loci for all the quantification strategies, including the best performing tool, TElocal. As a remedy, filtering based on the minimum number of read counts or baseMean expression improves the F1 score and decreases the number of false positives. Finally, we demonstrate that additional profiling of Transcription Start Site mapping statistics (using a k-means clustering approach) significantly improves the performance of TElocal while reporting a reliable set of detected and differentially expressed TEs in simulated human RNA-seq data.
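The effect of a minimum read count filter on the F1 score, as reported in this abstract, can be illustrated with a small worked example. The locus names, read counts, ground-truth set, and the cutoff of 10 reads are made-up values for illustration only and do not come from the authors' benchmark.

```python
def f1_score(detected, truly_active):
    """F1 score (harmonic mean of precision and recall) for a set of
    detected loci against a ground-truth set of active loci."""
    true_pos = len(detected & truly_active)
    false_pos = len(detected - truly_active)
    false_neg = len(truly_active - detected)
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)

def detect(counts, min_count):
    """Call a locus 'expressed' when its read count reaches min_count."""
    return {locus for locus, count in counts.items() if count >= min_count}

# Hypothetical per-locus read counts from a quantification tool.
counts = {"L1_a": 120, "L1_b": 3, "Alu_a": 45, "Alu_b": 2, "ERV_a": 1, "ERV_b": 30}
# Hypothetical ground truth from the simulation; SVA_a received no reads.
truly_active = {"L1_a", "Alu_a", "ERV_b", "SVA_a"}

f1_unfiltered = f1_score(detect(counts, min_count=1), truly_active)
f1_filtered = f1_score(detect(counts, min_count=10), truly_active)
print(f"unfiltered F1 = {f1_unfiltered:.3f}, filtered F1 = {f1_filtered:.3f}")
```

In this toy case the low-count loci are all false positives, so the filter raises precision without changing recall; on real data a cutoff can also remove true positives, which is why the cutoff has to be chosen with care.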

COVIDpro: Database for mining protein dysregulation in patients with COVID-19
PRESENTER: Augustin Luna

ABSTRACT. Background The ongoing pandemic of coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), still lacks sufficient treatments. This is partially due to our incomplete understanding of the molecular dysregulation in infected patients. We aimed to generate a repository and data analysis tools to examine the proteins modulated in COVID-19 patients for the discovery of potential therapeutic targets and diagnostic biomarkers.

Methods We built a web server containing proteomic expression data from COVID-19 patients and equipped it with a user-friendly data analysis and visualization toolset. Specifically, we manually curated the proteomics data of COVID-19 patients published before May 2022. Relevant data were collected from depositions on ProteomeXchange and from studies found via PubMed search to produce a comprehensive dataset. Protein expression by disease subgroups across projects was compared. We also visualized differentially changed pathways and proteins. Moreover, circulating proteins that differentiated severe cases were identified as putative predictive biomarkers.

Findings We report a web server COVIDpro (https://www.guomics.com/covidPro/) containing the proteomics data generated by 41 original studies from 32 hospitals worldwide, involving 3077 patients covering 19 types of clinical specimens; the majority from plasma and sera. 53 protein expression matrices were collected, reporting a total of 5434 samples and 14,403 unique proteins. Our analyses showed that the lipopolysaccharide-binding protein, which was identified by the majority of the studies, was highly expressed in the blood samples of severe patients. A panel of significantly dysregulated proteins was identified to separate patients with severe disease from non-severe disease.

Tribus reveals the single-cell tumor architecture and effect of chemotherapy in ovarian cancer
PRESENTER: Julia Casado

ABSTRACT. Multiplexed imaging at single-cell resolution is becoming increasingly useful to decipher the role of the cellular microenvironment in cancer and other complex diseases. To identify spatial patterns of single cells in a tissue, we must first assign accurate descriptions to each cell in a step known as cell-type phenotyping. This step is challenging due to (i) laborious annotation of ground truth, (ii) segmentation artifacts, (iii) fluorescence noise and batch effects, and (iv) the difficulty of reproducing human-biased thresholding. Here we present Tribus, an interactive, knowledge-based classifier that avoids hard-set thresholds and manual labeling, is robust to noise, and takes fewer iterations from the user than standard labeling of clustering results. Interactive analysis is done via integration with the Napari image viewer, and each analysis creates a detailed report to enable reproducibility. In this study we show that Tribus is comparable to human annotation in public benchmarking datasets where manual cell type annotations are supported by the pathology community. We applied Tribus to a dataset consisting of cyclic immunofluorescence (CyCIF) images of six matched ovarian cancer samples collected before and after neoadjuvant chemotherapy. Accurate cell-type phenotyping enabled a high-resolution analysis of cellular phenotypes and their spatial patterns, as well as their temporal dynamics during platinum-taxane chemotherapy. Tribus, an easily integrable open-source package, thus enables accurate phenotyping of single cells to facilitate biological discovery from highly multiplexed images.

SynBioSuite: Web-based modeling and analysis of biological systems
PRESENTER: Lukas Buecherl

ABSTRACT. Systems biology is increasingly relying on computational tools for the modeling, editing, sharing, and analyzing of biological cell functions. Yet, this process often involves multiple tools and can be a tedious and manual process. This work presents SynBioSuite, a cloud-based tool for streamlining the process of modeling and analyzing biological functions. The utility of this tool will be demonstrated by constructing and analyzing a model of the phage lambda genetic switch.

A robust clustering strategy for patient stratification in complex diseases
PRESENTER: Sara Palomino

ABSTRACT. A major limitation in understanding complex diseases is patients' heterogeneity. Such heterogeneity is present in many types of cancer [1], autoimmune diseases such as Multiple Sclerosis [2], or diabetes (where up to five sub-groups have been identified [3]), among many others. The associated challenge is that proper diagnosis, risk assessment and prognosis will depend on the appropriate characterization of such heterogeneity, especially in diseases where environmental and lifestyle factors are also of relevance.

Therefore, it is necessary to characterize the heterogeneity by identifying the possible sub-types of patients as well as the clinical and molecular features that define them [4]. To characterize patient heterogeneity, clustering methodologies such as k-means, among other tools, allow the unsupervised identification of clusters; however, clinical data are afflicted by several handicaps that complicate their analysis, including mixed data types, missing values or correlated features.

In order to address all such limitations at once, we designed a novel clustering strategy: ClustAll. Briefly, ClustAll first analyzes the relation between the clinical variables within a dendrogram. Then, for each "grouping of the variables" defined by the dendrogram, a patient clustering is conducted: (i) groups of variables are summarized through a Principal Component Analysis, and (ii) several combinations of methodologies and distances are considered for the clustering of patients. Finally, we use Jaccard distance to compute distances between bootstrap-validated clusterings; as a result, we identify all possible robust clusterings of patients. Therefore, ClustAll does not aim just to determine the best clustering but to identify all possible robust clusterings of the population (less dependent on parameters). We applied ClustAll to a large cohort of Liver Cirrhosis patients with Acute Decompensation from the PREDICT study (n=766) [5]. ClustAll identified three patient sub-groups, which we characterized by their clinical features and then used to generate a classifier. To validate the results, we applied the classifier to two additional independent cohorts. The classifier was robust in all three cohorts, and survival was significantly different in the sub-groups identified. Our next goal is to add molecular markers as an extra layer of information for the stratification.
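One common way to compute a Jaccard distance between two clusterings of the same patients is to compare the sets of patient pairs that each clustering places in the same cluster. The sketch below follows that definition as an assumption; the abstract does not specify how ClustAll defines the distance, and the function names and toy label vectors are hypothetical.

```python
from itertools import combinations

def co_clustered_pairs(labels):
    """Set of index pairs (i, j), i < j, assigned to the same cluster."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }

def jaccard_distance(labels_a, labels_b):
    """1 - |A ∩ B| / |A ∪ B| over co-clustered pairs.

    0.0 means the two clusterings define the same partition (cluster
    names are irrelevant); 1.0 means they share no co-clustered pair.
    """
    pairs_a = co_clustered_pairs(labels_a)
    pairs_b = co_clustered_pairs(labels_b)
    union = pairs_a | pairs_b
    if not union:
        return 0.0  # both clusterings consist of singletons only
    return 1.0 - len(pairs_a & pairs_b) / len(union)

# Toy clusterings of six patients (made-up labels).
clustering_a = [0, 0, 0, 1, 1, 1]
clustering_b = [1, 1, 1, 0, 0, 0]  # same partition, clusters renamed
clustering_c = [0, 0, 1, 1, 2, 2]  # a genuinely different partition

print(jaccard_distance(clustering_a, clustering_b))
print(jaccard_distance(clustering_a, clustering_c))
```

Clusterings whose pairwise distance is small could then be grouped and treated as one robust stratification, while distant ones represent alternative stratifications.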

Such an approach may also uncover important risk factors in the prognosis of complex diseases with high heterogeneity. In conclusion, we believe that ClustAll addresses a growing need in the field of Precision Medicine [6], and that it is a valuable resource that facilitates further development of the field.

[1] Almendro V, et al. Cancer Res. 2014;74(5):1338-1348. [2] Kotelnikova E, et al. PLoS Comput Biol. 2017;13(10):1-26. [3] Dennis JM, et al. Lancet Diabetes Endocrinol. 2019;7(6):442-451. [4] Roca J, et al. J Transl Med. 2014;12 Suppl 2(Suppl 2):S3. [5] Trebicka J, et al. J Hepatol. 2020;73(4):842-854. [6] Gomez-Cabrero D, Tegnér J. Iterative Systems Biology for Medicine. Curr Opin Syst Biol. 2017;3.

Composing and perturbing large-scale constraint-based metabolic models with COBREXA.jl

ABSTRACT. COBREXA.jl is a constraint-based metabolic modeling and analysis (COBRA) toolbox for Julia [1] that facilitates the construction and running of large-scale analyses on HPC platforms. This poster elaborates the design features of COBREXA.jl, and demonstrates how the package simplifies implementation of complex analyses and efficient construction of large community models.

COBREXA.jl allows many different model representations (such as matrix-like and object-oriented structured models), and provides a system of modifiers that are used to intuitively combine existing algorithms and model perturbations as "building blocks" on top of various model types. Apart from improving the efficiency in parallel processing, this allows the users to easily create complex novel workflows for meaningful use-cases.

Further, we show how this approach simplifies creating complex models composed from smaller parts, such as community and multi-organ models, constrained by various multiomic measurements. Subsequently, we run a large-scale parallel virtual screening that estimates viability of a community in different conditions and with diverse constraints on the community. The available analysis extensions improve the correspondence of the simulation results with reality, benefiting applications in bioengineering and personalized medicine. In the poster, we highlight optimizations that enable the scalability of the process.

Additionally, we summarize our observations from the construction of large community models using COBREXA.jl. We identify specific interoperability deficiencies in the published model metadata that impair the feasibility of performing construction tasks automatically. The poster suggests several ways to reduce the ambiguity in metadata annotation that would further aid both the model reproducibility and the ability to easily create biologically accurate multi-organ and multi-organism models.

[1] Kratochvíl, Heirendt, Wilken, ... & Gu. (2022). COBREXA.jl: constraint-based reconstruction and exascale analysis. Bioinformatics, 38(4).

A disease network-based deep learning approach for characterizing melanoma
PRESENTER: Xin Lai

ABSTRACT. Multiple types of genomic aberrations occur in cutaneous melanoma, and some can impact the prognosis of the disease. Hence, the integration of genomics data with clinical outcomes could facilitate the identification of the most relevant genomic features for melanoma progression. We developed a systems medicine approach that integrates genomics data with a disease network and deep learning model for the prognostic classification of melanoma patients and assessed the impact of different genomic features. Specifically, the deep learning model utilizes clusters (“communities”) identified in the network to effectively reduce the dimensionality of genomics data into a patient score profile. Using this profile, we identified three disease subtypes that differ in survival time. Subsequently, we quantified and ranked the impact of genomic features on the patient score profile using a machine-learning technique. Follow-up analysis of the top-ranking features provided us with a biological interpretation at both pathway and molecular levels, such as their mutation and interactome profiles in melanoma and their involvement in signal transduction, immune response, and cell cycle pathways. Taken together, we demonstrate the power of network-based artificial intelligence to provide personalized prognostic assessment for melanoma patients. The generic nature of the approach suggests that it is applicable to other cancer types.

Characterization of cardiac fibroblast remodeling dynamics after myocardial infarction
PRESENTER: Laura Sudupe

ABSTRACT. The heart tissue healing process after myocardial infarction (MI) is orchestrated by activated cardiac fibroblasts (CFs). In the last few years, high-throughput technologies have demonstrated CF heterogeneity during ventricular remodeling, and the particular role of each subpopulation is becoming increasingly important. Recently, we have described the reparative cardiac fibroblasts (RCF). This activated subtype of CFs is related to the tissue healing process after MI. Our analysis identified RCFs, characterized the associated markers, and quantified their prevalence during MI recovery. However, it was clear that at 7 days post-infarct (7 dpi) the RCFs were already differentiated, and it was not possible to fully characterize their spatial location or early differentiation dynamics. Therefore, we use a Col1α1-GFP mouse model for MI to investigate the ventricular remodeling process by exploring fibroblasts after MI at single-cell resolution and the entire heart by spatial transcriptomics. With this approach, we aim to decipher the spatial location and dynamics of activation of the CFs after MI, particularly the RCFs. We first defined the window of activation (WoA) of the RCF transcriptomic signature between 3 and 5 dpi using bulk RNA-seq. Secondly, using single-cell profiling of the WoA, we characterized RCF dynamics (primarily through the top marker gene Cthrc1). As a result, we identified two significant gene expression dynamics in the RCF-specific signature cluster. Finally, we localized those dynamics using 10x Genomics FF Visium spatial profiling on transversal sections from healthy and 3 and 5 (female and male) dpi hearts. To this end, we manually characterized the different areas of interest (RZ, remote zone; BZ, border zone; and IZ, infarcted zone) and then used an enrichment score analysis to quantify the prevalence of each dynamic at the different time points.
Finally, we validated our data both in a preclinical pig model of MI and in patients with different types of heart failure. In summary, we characterized RCF subtype-specific signatures that advance separately at the different time points of the WoA. Our work uncovers a spatial-dependent response in the damaged tissue that implies a complex mechanism in the remodeling process, thus extending the scope and reach of systems biology.

Genome-wide analysis of genetic variants affecting metabolomic abundance in pigs divergent for feed efficiency
PRESENTER: Haja Kadarmideen

ABSTRACT. The amount of feed eaten to achieve a certain body weight (in kg), expressed as feed conversion rate or residual feed intake (RFI) after adjustment for certain covariables, is widely used in the livestock industry as an indicator of feed efficiency. Feed efficiency determines the economic viability of the livestock industry and has a substantial impact on the sustainability of pork production. Both genetic variation (single nucleotide polymorphisms or SNPs) and metabolomic variation affect phenotypes such as RFI; however, metabolites are considered the crucial link between genetic architecture and phenotype manifestations. In this study, we set up an experiment with 108 pigs wherein each pig was genotyped with the 68K PorcineSNP80 Bead Chip array and subjected to metabolomic profiling by liquid chromatography-mass spectrometry (LC-MS). All pigs had data on RFI measured at two time points during the growth period. Our previous studies using this cohort identified 45 metabolites as candidate biomarkers predictive of feed efficiency. Thus, each animal had an RFI phenotype, metabolomic data and genomic data (MetaboLights accession MTBLS1384 https://www.ebi.ac.uk/metabolights/MTBLS1384 and GEO Accession GSE144064: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE144064). In this particular study we aimed at detecting SNPs that influence the metabolite concentrations (metabolite genome-wide association study or mGWAS), estimating the SNP heritability of the top 45 metabolites, and investigating genetic correlations between RFI and metabolites. The mGWAS found 152 genome-wide significant SNPs (p-value < 1.06 × 10^-6) in association with 17 metabolites, including 90 significant SNPs annotated to 52 genes. Fifty-one significant SNPs on chr 1 were associated with isovalerylcarnitine and propionylcarnitine and were found to be in strong linkage disequilibrium (LD). SNPs in strong LD were annotated to the genes FBXL4 and CCNC.
SNP-based heritability (h^2_SNP) of each metabolite and RFI, and their additive genetic correlations (r_g) were estimated by GCTA software. The results showed that SNP-based heritabilities were high (h^2_SNP ≥ 0.5) for 10 metabolites, moderate (0.2 ≤ h^2_SNP < 0.5) for 18 metabolites and low (0.1 ≤ h^2_SNP < 0.2) for 8 metabolites from two sampling points. Implications of this result are that highly to moderately heritable metabolite biomarkers can be used as a proxy for feed efficiency phenotype. In addition, RFI was in strong genetic correlation (r_g close to -1 or 1) with inosine, isoleucyl proline, leucyl methionine, proline, riboflavin, 1-myristoyl-sn-glycero-3-phosphocholine, acetylcarnitine and lysoPC(16:0), which means they can be used as biomarkers for selecting pigs in breeding for feed efficiency.

Control of COVID-19 Outbreaks under Stochastic Community Dynamics, Bimodality, or Limited Vaccination

ABSTRACT. The COVID-19 pandemic shows that controlling COVID-19 outbreaks remains challenging even in countries with high vaccination levels. Mathematical models are required to identify the limits of control for, and effective measures against, future outbreaks, not least because interactions between humans, as well as virus transmission itself, are stochastic rather than deterministic processes, which affects conclusions drawn only from empirical data on COVID-19 spreading. By building an open-source geospatially referenced, demographic, agent-based model (GERDA) we could repeatedly simulate COVID-19 outbreaks in detailed communities. Based on these SIR+ simulations, we showed that COVID-19 outbreak dynamics are community-specific and depend on the heterogeneity and stochasticity of human-human interactions. When comparing different vaccination strategies, we found that the herd immunity threshold depends strongly on the applied vaccination strategy. Further, if vaccine supply is limited, different vaccination strategies are optimal for reducing fatalities or for confining an outbreak. Prioritizing highly interactive people diminishes the risk of an infection wave, while prioritizing the elderly minimizes fatalities.

On the one hand, the inherent stochasticity of virus spreading can lead to bimodal outcomes, which renders the effect of limited non-pharmaceutical interventions in these scenarios uncertain. On the other hand, the stochasticity reduces the suitability of the reproduction number R0 as a predictor for the behavior of the system or the infectiousness of the virus in low-incidence scenarios.
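The bimodality mentioned above can be illustrated with a much simpler model than GERDA. The following sketch simulates a well-mixed stochastic SIR outbreak; it is not the agent-based model of the abstract, and the function name and all parameter values are hypothetical choices for illustration.

```python
import random


def final_outbreak_size(n, beta, gamma, initial_infected, rng):
    """Simulate one well-mixed stochastic SIR outbreak.

    Only the order of events matters for the final size, so event times
    are not tracked. Returns the total number of people ever infected.
    """
    susceptible = n - initial_infected
    infected = initial_infected
    total = initial_infected
    while infected > 0:
        infection_rate = beta * susceptible * infected / n
        recovery_rate = gamma * infected
        # The next event is an infection with probability proportional to its rate.
        if rng.random() < infection_rate / (infection_rate + recovery_rate):
            susceptible -= 1
            infected += 1
            total += 1
        else:
            infected -= 1
    return total


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so that repeated runs are comparable
    # Hypothetical values: 1000 people, beta/gamma = 1.5, one initial case.
    sizes = [final_outbreak_size(1000, 1.5, 1.0, 1, rng) for _ in range(500)]
    small = sum(size < 50 for size in sizes)
    print(f"{small} of {len(sizes)} outbreaks stayed below 50 cases")
    print(f"largest outbreak: {max(sizes)} cases")
```

With identical parameters, some runs die out after a handful of cases while others grow into a large wave, so the mean outbreak size describes neither group well.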

Exploring the “dark proteome” in hepatocellular carcinoma

ABSTRACT. A key remaining frontier in our understanding of biological systems is the “dark proteome”, that is, proteins encoded by long noncoding RNAs (lncRNAs) whose molecular function is largely unknown. The key aspect of this work is that it combines big data mining and pathology to explore the “dark proteome” in hepatocellular carcinoma (HCC), a highly aggressive cancer with limited therapeutic options. Experiments modulating lncRNA-encoded microprotein expression confirmed a role in proliferation and metastasis in liver cancer. Considering that there are very few accurate molecular biomarkers for HCC detection, understanding the function of the entities involved and their potential role in diagnosis and patient stratification will have a substantial impact on HCC therapy. In this study we identified a subset of HCC-specific lncRNAs that are translated into small functional proteins. Here, abnormal chromatin remodeling in HCC triggers the expression of lncRNA-encoded microproteins. We generated specific antibodies for C20orf204-189AA and Linc013026-68AA, two HCC-specific lncRNA-encoded microproteins. Both proteins promote cancer cell proliferation. At the molecular level, we show that C20orf204-189AA participates in ribosomal RNA transcription, while Linc013026-68AA may be phosphorylated by Epidermal Growth Factor Receptor (EGFR) and extracellular signal-regulated kinase (ERK). Remarkably, C20orf204-189AA protein was detected in 70% of primary HCCs but not in control livers, suggesting that HCC-specific lncRNA-encoded proteins may represent a novel class of biomarkers and HCC targets. Our finding also sheds light on the role of the previously ignored ‘dark proteome’, which originates from noncoding regions, in the maintenance of cancer. Publications: 1. Polenkowski, M., Allister, SB., Burbano de Lara, S., Pierce, A., Geary, B., El Bounkari, O., Wiehlmann, L., Hoffmann, A., Whetton, AD., Tamura, T. and Tran, D.D.H. 
THOC5 Complexes With DDX5, DDX17 and CDK12 Are Essential in Primitive Cell Survival to Regulate R Loop Structures and Transcription Elongation Rate. http://dx.doi.org/10.2139/ssrn.4175592 (under revision in iScience)

2. Polenkowski M, Burbano de Lara S., Allister MB, Nguyen TNQ, Tamura T and Tran DD. Identification of novel micropeptides derived from hepatocellular carcinoma-specific long noncoding RNA. Int. J. Mol. Sci. 2022, 23(1), 58 (IF=6.2)

3. Burbano De Lara, S., Tran, D.D, Allister, A.B, Polenkowski, M., Nashan, B, Koch, M, Tamura, T. C20orf204, a hepatocellular carcinoma-specific protein interacts with nucleolin and promotes cell proliferation. Oncogenesis. 2021 Mar 17;10(3):31. (IF = 6.5)

Cells use molecular working memory to navigate in changing chemoattractant fields
PRESENTER: Robert Lott

ABSTRACT. Cellular migration is guided by local gradients of chemoattractants, which in the complex environments of tissues and organisms change over time and space. However, single cells are able to resolve the conflicting local information and generate persistent directional migration over large distances. We have identified a molecular mechanism relying on a metastable signaling state that enables cells to maintain transient polarisation of the signaling activity and shape in the direction of the last encountered signal, while remaining responsive to changes in signal localisation. We provide experimental evidence from live-cell imaging of epidermal growth factor receptor phosphorylation that this transient memory arises from a remnant of the polarized signaling state, a dynamical ‘ghost’, and further drives memory-guided directional migration. We thus identified a basic mechanism that underlies cellular polarization and navigation in changing chemoattractant fields.

Inferring Gene Networks Rewiring of Naive and Primed Human Embryonic Stem Cells During Cellular Reprogramming Using Single-Cell RNA-seq Data
PRESENTER: Mahdi Alshoyokh

ABSTRACT. One of the recent achievements in stem cell research is the capture of naïve human embryonic stem cells through reprogramming primed human embryonic stem cells (hESCs) (Messmer et al., 2019). Yet, the genomic circuits and the modifications of their wiring during cellular reprogramming are largely unknown. Characterizing them will give us a glimpse of the different dynamics of gene interactions and of the genes involved in the transition between naïve and primed hESCs. In this study, we applied the principles of network biology and constructed gene co-expression networks (GCNs) from single-cell RNA-seq data of naïve and primed hESCs (Messmer et al., 2019). We used three methods to detect co-expressed genes in our GCNs of naïve and primed hESCs: mutual information, Pearson’s correlation, and Spearman’s rank correlation. We chose these methods for their simplicity and ability to preserve the information content of genetic networks (Kiani et al., 2016). In addition, our GCNs were based on four pre-selected gene groups. Initially, we found that GCNs of naïve and primed hESCs exhibit distinct network topological structures. Then, a rewiring analysis of naïve and primed hESC GCNs was conducted using the DyNet tool in Cytoscape. The analysis showed significant rewiring and changes in network structures and behaviours, demonstrating that naïve and primed hESCs are distinct cellular states. In addition, the NANOG, SOX2, and KLF gene circuitry was more active in naïve GCNs and formed more edges with other genes. These genes were also among the most rewired nodes in our GCNs. We also found that KLF5 is a major player in our naïve hESC GCNs. In addition, NANOG and SOX2 formed an edge between each other only in naïve hESC GCNs. The observed activities of the NANOG, SOX2, and KLF circuitry in our GCNs were consistent with the published stem cell literature (Dunn et al., 2014; Xiao et al., 2016).
Importantly, we observed very similar results in naïve and primed hESCs GCNs regardless of the network inference method. In conclusion, our method demonstrates the power of combining GCNs with single-cell RNA-seq data to unfold the rewiring of dynamic gene networks during cellular reprogramming and development, potentially allowing us to understand other biological and pathological processes and pathways.
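The construction and comparison of co-expression networks can be sketched as follows. This is a simplified stand-in, assuming a fixed threshold on the absolute Pearson correlation and a per-gene count of changed edges; it is not the DyNet rewiring score, and the threshold value and function names are hypothetical.

```python
import numpy as np


def coexpression_edges(expr, genes, threshold=0.7):
    """Build a co-expression edge set from a cells x genes expression matrix.

    Two genes are connected when the absolute Pearson correlation of their
    expression across cells reaches the (assumed) threshold.
    """
    corr = np.corrcoef(expr, rowvar=False)  # genes x genes correlation matrix
    edges = set()
    for a in range(len(genes)):
        for b in range(a + 1, len(genes)):
            if abs(corr[a, b]) >= threshold:
                edges.add((genes[a], genes[b]))
    return edges


def rewired_edge_counts(edges_a, edges_b, genes):
    """Count, per gene, the edges that are present in only one of two networks."""
    changed = edges_a ^ edges_b  # symmetric difference of the two edge sets
    return {gene: sum(gene in edge for edge in changed) for gene in genes}
```

Genes with the highest counts are the candidates for the most rewired nodes. Swapping `np.corrcoef` for a rank-based or mutual-information measure changes only the first function.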

New data-driven gene representations using deep autoencoders at multiple omics to identify candidate disease genes and robust classifiers

ABSTRACT. Traditional knowledge-driven approaches for biomarker discovery within the field of systems medicine, by us and others, have often utilized the colocalization of disease genes in disease modules. The results, however, strongly rely on the quality of the available molecular interaction networks, which are known to be partially incomplete and affected by research biases. In this context, novel data-driven methodologies centred around deep artificial neural networks (DNNs) have begun to consolidate. Autoencoders (AEs) are a type of unsupervised DNN that reconstructs its input in its output, after reducing its dimensionality. Here, we hypothesized that the emergent encoding of AEs trained on huge transcriptomic, methylomic and genomic repositories, including hundreds of thousands of samples, could encompass complex non-linear relationships of biological relevance. To constrain the hyperparameters of the AEs, we also tested for co-localization patterns in protein-protein networks to ensure that our selected representation prioritized genes within functional modules. Next, we used these representations to discover candidate disease genes and, through transfer learning, used the latent variables as robust features for machine learning tasks in each of the omics.

Effect of dispersal by inundation on soil bacterial communities depends on soil developmental stage
PRESENTER: Xiu Jia

ABSTRACT. Dispersal is crucial for the dynamics and assembly of bacterial communities during ecological succession. However, the relative importance of dispersal is often not directly measured. Here, a microcosm experiment was performed to directly evaluate the effect of dispersal by seawater inundation on bacterial communities from soils naturally subjected to different inundation regimes, i.e., the early and late stages of a salt marsh ecological succession located on the island of Schiermonnikoog, the Netherlands. Bacterial communities were characterized through 16S rRNA gene sequencing over a treatment period of 20 days. Our results show that bacterial communities from two successional stages responded differently to inundation. Community structure changed systematically with time in the early-stage soil but was relatively stable in the late-stage soil. In the early-stage soil, the richness of bacterial communities significantly increased over time, mainly driven by the increase of low-abundant bacteria. The different influence of inundation on two successional stages may be attributed to both contemporary conditions and historical contingency. Taken together, our results highlight that bacterial communities in the early-successional stage of salt marsh are sensitive to inundation and vulnerable to accelerated sea-level rise.

Analysis of multivariate longitudinal metabolomics data from meal challenges using RM-ASCA+
PRESENTER: Balazs Erdos

ABSTRACT. Meal challenges are increasingly used to study metabolic perturbations in the field of precision nutrition. Time-series of high-dimensional metabolomics data support a systematic view into metabolic resilience. However, current methodology allows limited use of such data due to a lack of tools to deal with temporal dynamics. Therefore, analysis of this type of data typically draws conclusions on the temporal and cross-species relationships independently. Comprehensive analysis must take into account the interrelatedness of the metabolites across species within an individual as well as across time. Here, we extend the RM-ASCA+ methodology to allow quantification of temporal dynamics observed in frequently sampled time-courses while accounting for the multivariate property and the experimental design in the data. We demonstrate the extended RM-ASCA+ methodology on experimental data containing time-series of metabolomics following meal challenge tests.

Towards a Computational Toolkit for Cell Fate Decision Detection
PRESENTER: Ali Balubaid

ABSTRACT. Cellular differentiation and progenitor commitment to lineages occur by a drastic change in gene expression profiles. However, the cellular strategies and the associated transcriptional signatures by which cells exit and enter stable cellular states remain unclear. With the notable proliferation of single-cell data in recent years, a new opportunity is presented to progress our understanding of cell fate decisions. Consequently, different metrics have been formulated to explore the governing dynamics depending on how our research community has conceptualized such transitions.

One early idea is to frame the transition and commitment in the language of dynamic systems theory, which provides a collection of correlation-based metrics. Here cellular differentiation is conceptualized as critical transitions, which can be analyzed using cell-to-cell correlation and gene-to-gene correlation. The ratio between the two correlations has been termed the transition index (Mojtahedi et al., 2016). Another powerful notion regards the cell as an automaton with information processing capacity (Nurse, 2008). This cues in metrics from information theory, such as Shannon entropy (Dussiau et al., 2022).
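The two families of metrics can be sketched in a few lines. The orientation of the ratio (mean absolute gene-to-gene correlation divided by mean cell-to-cell correlation) is an assumption based on a reading of Mojtahedi et al. (2016), since the abstract only states that the index is a ratio of the two correlations; the function names are hypothetical.

```python
import numpy as np


def mean_off_diagonal(matrix):
    """Average of all entries outside the main diagonal of a square matrix."""
    mask = ~np.eye(matrix.shape[0], dtype=bool)
    return float(matrix[mask].mean())


def transition_index(expr):
    """Correlation-based transition index for a cells x genes matrix.

    Assumed form: mean |gene-to-gene correlation| divided by the mean
    cell-to-cell correlation.
    """
    gene_corr = np.corrcoef(expr, rowvar=False)  # genes x genes
    cell_corr = np.corrcoef(expr, rowvar=True)   # cells x cells
    return mean_off_diagonal(np.abs(gene_corr)) / mean_off_diagonal(cell_corr)


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a vector of non-negative counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())
```

Both functions operate on raw matrices, so zero-inflated genes with constant expression should be filtered out first; otherwise `np.corrcoef` produces undefined (NaN) entries.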

Here we perform a comparative analysis of these two approaches. We find that both approaches are now well-suited for single-cell data. However, their respective metrics are sensitive to noise due to zero inflation. Consequently, this prevents the reliable application of these metrics to the rich repertoire of single-cell data. To mitigate this problem, we decompose the covariance matrix and extract the corresponding eigenvalues. The resulting eigenvalue-based indices are more robust in detecting critical transitions (Chen et al., 2019). Furthermore, to augment an information-based analysis, we explore mathematical extensions of entropy like self-organization, emergence, and complexity (Gershenson et al., 2012), which allow a more in-depth exploration of the underlying gene dynamics and regulatory restructuring.

We implement the different metrics on simulated single-cell RNA data of different transition types – linear, bifurcating, and trifurcating trajectories (Pratapa et al., 2020). We then extend to model systems such as the hematopoietic system (Dussiau et al., 2022). Finally, we explore mouse development to unveil stages of interest (Mittnenzweig et al., 2021).

In conclusion, we demonstrate the utility and limitations of the introduced correlation-based and information-based metrics. Furthermore, we pinpoint circumstances in which they fail and then introduce variants of the established metrics. Finally, we show how such exploration not only reveals interesting dynamics at the cellular level but can also highlight events at a physiological level.

FISHing for Correlation

ABSTRACT. The eukaryotic cell division cycle is quite well investigated; the main players and most interactions are known. The most important components of the protein-protein interaction network are a set of phase-specific cyclins interacting with Cdks, inhibitors and transcription factors. The expression of the targets is well investigated at the population level, but neither at the single-cell nor at the single-molecule level. We employ multiplexed FISH labeling in S. cerevisiae to obtain time-resolved or cell cycle position-assigned transcript numbers in single cells. The mRNAs are labeled with three different dyes, which enables imaging of three different mRNA species at the same time. Based on this approach, we determine the correlation of the expression and the mutual information between transcripts of different species in single cells.
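As an illustration of the last step, the following sketch computes the Pearson correlation and a plug-in estimate of the mutual information from per-cell transcript counts of two mRNA species. The abstract does not specify the estimator used, so the plug-in form and the function names are assumptions.

```python
import math
from collections import Counter

import numpy as np


def mutual_information(x, y):
    """Plug-in estimate (in bits) of the mutual information between two
    sequences of discrete values, e.g. per-cell transcript counts."""
    n = len(x)
    joint = Counter(zip(x, y))
    count_x = Counter(x)
    count_y = Counter(y)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a, b) / (p(a) * p(b)) simplifies to c * n / (count_x[a] * count_y[b])
        mi += (c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
    return mi


def pearson_correlation(x, y):
    """Pearson correlation between two count vectors."""
    return float(np.corrcoef(x, y)[0, 1])
```

The plug-in estimator is biased upwards when the number of cells is small relative to the number of distinct count values, so binning the counts or applying a bias correction may be needed for real data.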

Patterns of differentially essential genetic interactions characterize functional modules across cancer types

ABSTRACT. The main goal of this study is to exploit the cancer dependency map (DepMap) to establish a map between differentially essential genetic interactions and the cancer cell line context in which they gain or lose their essentiality. A novel strategy for identifying gene pairs corresponding to genetic interactions with shifting essentiality across contexts is proposed, and interaction sets with context-dependent overlap are revealed as context-specific functional modules. We aim to characterize, at least in part, the underlying genetic, proteomic and phenotypic features associated with differential essentiality, thus providing mechanistic hypotheses for cancer development and targeting. Preliminary analyses indicate that some of these interaction modules, when enriched for biological processes, point to mechanical aspects such as adhesion and cell motility, processes which could be linked to metastatic potential. While genetic rewiring (context-specific synthetic lethal interactions identified from heterogenic loss-of-function screens) has been probed in other labs recently, systematically studying context-dependent changes in the essentiality of genetic interactions reveals new aspects in our understanding of the genetics of cancer.

Molecular regulators of catecholamine response in human pulmonary microvascular endothelial cells

ABSTRACT. Endothelial dysfunction is a systemic disease state of endothelial cells (ECs) occurring in a broad variety of pathologies ranging from atherosclerosis to cancer and more recently COVID-19. Sustained high levels of catecholamines associate with endothelial dysfunction and vascular permeability. Indeed, circulating adrenaline levels predict mortality in trauma patients. Yet, the molecular mechanisms that drive ECs into a pathological state in trauma patients upon elevated catecholamine levels are not well characterized. 

Here we identified the transcriptomic, metabolic and lipidomic responses to high levels of catecholamines in human pulmonary microvascular endothelial cells (HPMECs). We treated HPMECs with a wide range of equimolar concentrations of adrenaline and noradrenaline (0.5, 5 and 50 μM) and sampled cultures for molecular profiling at 4 hours and 24 hours after exposure. We identified a total set of 308 differentially expressed genes upon catecholamine exposure across all conditions. In particular, we found GRAMD1B, AREG, PDK4 and CXCR4 to be the strongest transcriptional responders to treatment. Functional enrichment of responding genes distributes across three major axes: signalling, metabolism and proliferation/differentiation. Representative enriched functions within upregulated genes include cell proliferation, protein kinase B signalling and steroid metabolism. Within the set of repressed genes, we found enrichment in characteristic functions like inflammatory response, response to interleukin-1 and lipopolysaccharide signalling. These identified functions recapitulate well the known response of ECs to catecholamines and point to novel regulation in metabolic functions. Next, towards the identification of the main transcriptional regulators of the differential response, we used a regulon enrichment approach using previously identified regulons in DoRothEA. Furthermore, given the importance of metabolism to endothelial biology, we carried out lipidome quantification of cell cultures under conditions of interest. Finally, we plan to integrate transcriptome and lipidome profiles by the generation of constrained genome-scale metabolic models to predict differential metabolic fluxes under distinct conditions.

Overall, this integrative analysis builds a whole-scale picture of how signalling pathways downstream of catecholamine receptors in ECs trigger a transcriptional state transition that consequently results in metabolic changes that drive ECs out of homeostasis. A precise understanding and prediction of EC molecular states are fundamental to the discovery of more efficient clinical interventions in trauma patients.

Quantification of imaging biomarkers in the extracellular matrix of left and right sided colon cancer tissues
PRESENTER: Bharti Arora

ABSTRACT. The location of a tumour within the colon is gaining traction as a crucial factor in determining disease progression, prognosis and management. Studies focussing on clinicopathological features, protein/genetic biomarkers, composition of gut microbiota and response to therapy have reported distinctive features in tumours originating in the left side of the colon (LSCC) as opposed to right-sided colon cancer (RSCC). However, the characteristics of the tumour microenvironment, particularly the distribution, texture and density of the extracellular matrix (ECM), have not been studied. We used 2-photon laser scanning microscopy (2PLSM) to visualise the intrinsic signal emitted by collagen present in the ECM of human colon tumour tissues in a label-free setting and to identify the imaging biomarkers that can quantitatively distinguish the structure of collagen fibres in LSCC vs. RSCC. Formalin-fixed 50 µm vibratome tissue sections obtained from human RSCC (n=6) and LSCC (n=4) during surgical procedures were scanned by 2PLSM, by acquiring the second-harmonic generation (SHG) signal from collagen fibres and 2-photon excited fluorescence (TPEF). The Ti:Sa laser was tuned at 870 nm; emitted light was filtered by bandpass filters (434/20 nm for SHG; 525/50 nm for TPEF) and collected by photomultiplier detectors in back- and forward direction. 2D overviews were acquired within the tumour stroma. Adjacent paraffin sections were analysed for morphology by H&E and Masson’s Trichrome staining. The collagen content in tumour tissues was quantified by surface rendering of the fibres in IMARIS 9.8.0 (Bitplane). Since fibrillar collagen reorganization has been linked with tumour progression, texture analysis was performed to reveal details about the local orientation and coherence of collagen fibres. This was obtained through a structure tensor-based methodology, wherein the local principal fibre direction and coherence were extracted via a sliding window approach.
For statistical analysis, the t-test was performed using GraphPad Prism 9 with a p-value of 0.05 (*) as the threshold for statistical significance. We observed that the distribution of collagen in LSCC is denser than in RSCC, which is also evident from the distributions in local orientation and coherence of collagen fibres. The standard deviation of the orientation of collagen fibres, which correlates with the waviness of the fibres, is higher in RSCC than in LSCC. The mean coherence can differentiate healthy tissues from tumour tissues; however, it cannot distinguish LSCC from RSCC. The observed dense stroma in LSCC might explain the findings that RSCC responds better to some chemotherapies than LSCC, and correlates with the metastatic potential of the tumour. Our study highlights the relevance of using 2PLSM in extracting imaging biomarkers of collagen, so as to help clinicians understand the role of tumour ECM in the pathophysiology of colorectal cancer as well as to stratify CC patients.
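The structure tensor analysis described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the window size, the step, the use of unweighted averages inside each window and the function names are all assumptions.

```python
import numpy as np


def orientation_and_coherence(patch):
    """Principal fibre direction (degrees, 0-180) and coherence (0-1) of a patch.

    The structure tensor is averaged over the whole patch. Coherence is
    (l1 - l2) / (l1 + l2) for tensor eigenvalues l1 >= l2. The angle is
    measured from the column (x) axis towards the row (y) axis.
    """
    gy, gx = np.gradient(patch.astype(float))  # derivatives along rows, columns
    jxx = (gx * gx).mean()
    jyy = (gy * gy).mean()
    jxy = (gx * gy).mean()
    trace = jxx + jyy
    if trace == 0:  # flat patch: no preferred direction
        return 0.0, 0.0
    gradient_angle = 0.5 * np.arctan2(2.0 * jxy, jxx - jyy)
    # Fibres run perpendicular to the dominant intensity gradient.
    fibre_angle = (np.degrees(gradient_angle) + 90.0) % 180.0
    coherence = np.sqrt((jxx - jyy) ** 2 + 4.0 * jxy ** 2) / trace
    return float(fibre_angle), float(coherence)


def sliding_window_map(image, window=16, step=8):
    """Apply orientation_and_coherence to overlapping square windows."""
    results = []
    for row in range(0, image.shape[0] - window + 1, step):
        for col in range(0, image.shape[1] - window + 1, step):
            patch = image[row:row + window, col:col + window]
            angle, coherence = orientation_and_coherence(patch)
            results.append((row, col, angle, coherence))
    return results
```

Straight, parallel fibres give a coherence close to 1, while isotropic texture gives values close to 0. The spread of the per-window angles could then serve as a waviness measure, bearing in mind that angles wrap around at 180 degrees.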

Analysis of Heterogeneity in the Tumor Microenvironment in the 4NQO HNSCC Mouse Model
PRESENTER: Lina Kroehling

ABSTRACT. Oral squamous cell carcinoma (OSCC) is the sixth most prevalent cancer. Expression of lysine-specific demethylase 1 (LSD1), a nuclear histone demethylase, increases progressively with tumor grade and stage in clinical OSCC, and blocking LSD1 inhibits preneoplasia. However, the mechanisms through which LSD1 promotes carcinoma are unknown. This study uses single-cell RNA-seq analyses to evaluate whether LSD1 inhibition attenuates OSCC by affecting the prevalence of specific cell types within the tumor and the cell-cell interactions occurring between these subtypes.

Bias and reproducibility in a Computational Neurobiology PhD’s journey

ABSTRACT. The aim of this poster is to present some of the key questions I have used during my PhD to examine the ethical standards and reproducibility of computer models in Computational Neurobiology. Any research can be seen as a journey with different milestones. Here, I divide the milestones of research into “design, data collection, data analysis and reporting”, and highlight some of the key questions we can ask ourselves throughout our research journey in order to make it more ethical, accessible and reproducible. This poster serves as a visual representation of questions to be asked about the bias we carry into our research, as well as which key starting resources can be used to make our research more reproducible.

Rather than presenting the results of a study, this poster suggests examples of how to include reproducibility as a key characteristic of a PhD as well as how it is possible to think about biases of our own research as we go along. The presented questions and topics can be taken and applied by anyone in their research journey.

MODalyseR—a novel software for inference of disease module hub regulators identified a putative multiple sclerosis regulator supported by independent eQTL data
PRESENTER: Hendrik de Weerd

ABSTRACT. Motivation

Network-based disease modules have proven to be a powerful concept for extracting knowledge about disease mechanisms, predicting for example disease risk factors and side effects of treatments. Plenty of tools exist for the purpose of module inference, but less effort has been put into simultaneously utilizing knowledge about regulatory mechanisms for predicting disease module hub regulators.

Results

We developed MODalyseR, a novel software for identifying disease module regulators and reducing modules to the most disease-associated genes. This pipeline integrates and extends the previously published software packages MODifieR and ComHub and thereby provides a user-friendly network medicine framework combining the concepts of disease modules and hub regulators for precise disease gene identification from transcriptomics data. To demonstrate the usability of the tool, we designed a case study for multiple sclerosis that revealed IKZF1 as a promising hub regulator, which was supported by independent ChIP-seq data.

Availability and implementation

MODalyseR is available as a Docker image at https://hub.docker.com/r/ddeweerd/modalyser with user guide and installation instructions foun

Benchmarking Phage-Host Prediction Tools using Real Metagenomics Data
PRESENTER: Levi van Doorn

ABSTRACT. Viruses are the most abundant life entities on the planet. They can influence the composition of microbial communities and nutrient flow within ecosystems through infection of their hosts. Identifying which host or hosts a virus infects can help us understand their roles and expand our understanding of microbial ecosystem functioning. Currently, the hosts of most viruses remain unknown. Computational tools play an important role in answering the question of which virus infects which host. Many different computational tools have been developed for phage-host prediction. These tools use different methods, reference databases, and biological signals to make their phage-host predictions. This work aims to provide an independent benchmark of the performance of nine different tools at different taxonomic levels of prediction. We compared their performance using real metagenomic datasets from a tomato soil biome. Soil has a very high microbial diversity and many of the organisms are still unknown. Understanding the microbiome of the soil better could be beneficial for agricultural practices. The samples consist of paired viral and microbial size fractions. The hosts of viruses identified in the viral size fraction are predicted using the nine tools and compared to the microbes identified in the microbial size fraction. Performance of the tools is then assessed based on the similarity between the predicted host abundance profile and the microbial abundance profile, considering the precision, recall, F1 score, and the number of predictions made. As expected, the overall trend shows that all tools perform less well at lower taxonomic ranks. We find that iPHoP is the best-performing tool among the tested tools (average precision: 49.8%, average recall: 85.5%, average F1 score: 59, average prediction percentage: 95.8%). While iPHoP does not have a higher precision or F1 score than RaFAH and WIsH, its higher prediction percentage and recall make it the best-performing tool across taxonomic ranks.

This benchmarking of computational phage-host prediction provides independent insight into the performance of the tools. These results are only a snapshot of the complete analysis, which will take the abundance of the predicted and present microbial hosts into account and will include data from human gut and marine biomes.
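As a rough illustration of the metrics named in this abstract, the sketch below scores one tool's predicted hosts against the taxa found in the microbial size fraction. It assumes a simple presence/absence comparison at a single taxonomic rank; the abstract does not give the exact scoring procedure, and the function name and genus lists are hypothetical toy data.

```python
def precision_recall_f1(predicted, observed):
    """Score predicted host taxa against taxa observed in the microbial fraction."""
    predicted, observed = set(predicted), set(observed)
    true_positives = len(predicted & observed)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(observed) if observed else 0.0
    if true_positives == 0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy example at genus rank (made-up lists, not data from the study).
predicted_hosts = ["Bacillus", "Pseudomonas", "Streptomyces", "Rhizobium"]
observed_hosts = ["Bacillus", "Pseudomonas", "Escherichia"]
p, r, f1 = precision_recall_f1(predicted_hosts, observed_hosts)
```

Repeating the calculation at each rank (for example genus, family, order) would show the rank-dependent trend the abstract reports.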

Characterizing all feasible metabolic conversions of individual cells and microbial communities with elementary conversion modes.

ABSTRACT. The unbiased characterization of all feasible steady-state flux distributions in metabolic models remains limited to small-scale models due to the combinatorial explosion of possibilities. In many applications, such as when studying metabolic interactions in microbial communities, it is not necessary to consider the details of intracellular metabolism. Instead, it suffices to look only at all possible overall conversions each cell can catalyze and study their interactions. Elementary conversion modes (ECMs), easily computable with ecmtool, achieve such a characterization. However, in its current implementation, ecmtool is memory-intensive and cannot be aided appreciably by parallelization. We integrate mplrs -- a scalable parallel vertex enumeration method -- into ecmtool. This integration not only accelerates the computation of ECMs but also drastically reduces the memory requirement of ecmtool and enables its use in both standard and high-performance computing environments benefiting much more from large-scale parallelization than ever before. We demonstrate the new capabilities by enumerating all feasible conversions of the near-complete metabolic model of the minimal cell JCVI-syn3.0 growing on complex media. While an elementary flux mode analysis is out of reach, ecmtool characterized all metabolic interactions within 2.5 weeks using 60 threads. Despite JCVI-syn3.0's minimal character, we find that the model gives rise to 4.2e9 ECMs and still contains several redundant sub-networks. Thus, ecmtool's improved scaling paves the way to, e.g., unbiasedly study emerging properties of microbial metabolic interactions in (small) communities, which we demonstrate by analyzing the metabolic capabilities of a biogas-producing microbial community.

Dissecting the Developing Mouse Brain at Spatial Single-Cell Resolution Using PASTA-seq
PRESENTER: Leon Strenger

ABSTRACT. Spatially resolved transcriptomics allows the study of spatial heterogeneity in tissues; however, for a detailed analysis of complex processes, a resolution of at least the cellular level is crucial. We developed Patterned Array Spatial Transcriptomics Assay sequencing (PASTA-seq), a spatial transcriptomics method with sub-cellular resolution using a patterned Illumina flow cell with a distance of 0.5 μm between barcode clusters. Each barcode sequence and its associated position on the flow cell are obtained in a first sequencing run. In the second sequencing run, the PCR-amplified barcoded cDNA molecules, obtained from locally bound mRNA from a tissue section, are read together with their associated barcodes, providing spatial information for each molecule. Here, we present data from a ~6 mm² area of a mouse E13 brain section to showcase the strengths of PASTA-seq. Processing and quality control of the raw data are performed with Spacemake, a pipeline for analysing spatial transcriptomics sequencing data. We obtain a total of 62M transcripts over 10M spots, which are binned into 49,000 hexagons of the size of an average cell (~100 μm²) to perform spatial analysis with single-cell resolution. By clustering these data we can identify the different brain regions, in particular fore- and hindbrain, and corresponding marker genes. A preliminary further sub-clustering analysis shows a layered structure that hints at a developmental lineage of cell types. The obtained clustering and gene expression patterns resemble those in the well-curated Allen Brain Atlas, indicating that our analysis results are valid.
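The hexagonal binning mentioned in this abstract can be sketched as follows. This is a generic pointy-top hexagon grid with axial coordinates and cube rounding, written under the assumption that spot positions are given in μm; it is not Spacemake's implementation, and all function names and the toy spot list are hypothetical.

```python
import math
from collections import Counter

def hex_side_from_area(area):
    """Side length of a regular hexagon with the given area (area = 3*sqrt(3)/2 * s^2)."""
    return math.sqrt(2.0 * area / (3.0 * math.sqrt(3.0)))

def hex_centre(q, r, side):
    """Centre (x, y) of the pointy-top hexagon with axial coordinates (q, r)."""
    return side * math.sqrt(3.0) * (q + r / 2.0), side * 1.5 * r

def hex_bin(x, y, side):
    """Axial coordinates (q, r) of the hexagon containing point (x, y)."""
    q = (math.sqrt(3.0) / 3.0 * x - y / 3.0) / side
    r = (2.0 / 3.0 * y) / side
    # Cube rounding: round all three cube coordinates, then repair the one
    # with the largest rounding error so that they still sum to zero.
    cx, cz = q, r
    cy = -cx - cz
    rx, ry, rz = round(cx), round(cy), round(cz)
    dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz

side = hex_side_from_area(100.0)  # ~100 um^2 per hexagon, as in the abstract
spots = [(0.0, 0.0), (1.0, -1.0), hex_centre(2, -1, side)]  # toy spot positions in um
counts = Counter(hex_bin(x, y, side) for x, y in spots)
```

In a real analysis each spot would carry transcript counts per gene, and those counts would be summed per hexagon instead of simply counting spots.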

Cell Type view into tomato stem elongation in Shade avoidance response
PRESENTER: Linge Li

ABSTRACT. In nature and in cultivation, plants compete with their neighbours for limited light. Many plants can outgrow their neighbours to avoid further shading; this is called the shade avoidance response (SAR). Under shade, far-red light is enriched compared to normal light conditions. We use two tomato cultivars, M82 and Moneymaker, as a model system. Our research question is: how is the developmental plasticity of tomato cellular anatomy regulated in far-red light? We compared the phenotypic response of these two tomato cultivars under low R:FR conditions (WL+FR) versus control (WL). We found that internode 1 elongated the most after 2 weeks of WL+FR. Furthermore, we quantified the features of all cell types by microscopy and found that the pith has the most significant response. Transcriptomic analysis was performed on internode and pith cell layers. The analysis revealed enrichment of auxin-related GO categories. Therefore, we are currently using auxin to simulate SAR in WL, and are also looking into the pith elongation response from an evolutionary perspective.

Amino acid metabolism of Chinese Hamster Ovary cells: Comparison of growth and production phases in a fed-batch

ABSTRACT. Metabolic pathways in Chinese Hamster Ovary (CHO) cells are suboptimally regulated regarding nutrient uptake rates, which, along with the excessive supply of amino acids (AA) in media, leads to the formation of by-products. Some of these by-products, such as ammonia, are toxic to cells, and their accumulation in the media negatively affects cell growth, productivity, and product quality. Here we use 13C-MFA to analyze AA metabolism in order to identify reformulated media compositions that reduce the formation of toxic by-products. Two CHO cell lines (producer and non-producer) were grown in a fed-batch culture with temperature shift. Cultures were frequently sampled in exponential and stationary phase for cell density, viability, cell diameter, productivity, cell dry mass, and AA/metabolite (glucose, lactate, ammonia) consumption/production rates. Additionally, carbon isotope labeling was used to gain insight into intracellular flux distributions. Comparing cell lines in both phases, the producer in general showed moderately higher exchange rates for high-flux AAs and metabolites and no considerable differences for low-flux AAs. Comparing phases for both cell lines, major differences were observed between the exponential and stationary phases. Specifically, exchange rates dropped significantly for most measured metabolites and AAs, except for alanine and lactate, which switched to consumption, while rates remained constant for ammonia and a few low-flux AAs. In exponential phase, the producer showed higher glutamine consumption than the non-producer, which coincides with higher ammonia and glutamate production. However, in stationary phase glutamine consumption decreased for both cell lines, but the production of glutamate increased for the non-producer without a higher glutamine consumption.
We hypothesize that this results from the non-producer having lower flux from glutamate to alpha-ketoglutarate, which would cause glutamate accumulation shown as higher glutamate production. Alternatively, the producer utilizes glutamine in another pathway that does not produce glutamate. Currently, we are in the process of verifying our hypothesis with additional 13C experiments.

Construction and Decomposition of Cellular Energy Landscapes using Hopfield Neural Networks

ABSTRACT. The dynamics of cellular circuits govern biology, from developmental processes to cellular reprogramming. Ever since Waddington’s conception, these circuits of genes have been thought to control and remodel an effective “energy” landscape controlling biological programs. Here we first ask how to construct such energy landscape models from systems that are inherently non-linear and non-symmetric in their interactions. Recent work on low-dimensional polynomial non-linear systems has demonstrated the feasibility of constructing a more general potential for non-gradient systems (Stumpf 2018). However, these advances require knowledge of explicit generative polynomial dynamical equations. Furthermore, since biological systems are not only non-linear but also non-symmetric in their interactions, a non-gradient part exists in their corresponding energy landscapes. Recent efforts, using different assumptions, have attempted to separate the gradient part from the flux component. We propose two novel ways of performing such a decomposition of the energy landscape. We use a continuous Hopfield Neural Network model supplemented with Hill-type sigmoid kinetics. This is sufficient to reconstruct the energy landscape and the time-dependent dynamics of the system in such a landscape. Here we use a genetic switch (a two-gene inhibitory circuit model) and a genetic oscillator (a three-gene circuit model). Our reconstructed landscape model captures the dynamics of these models. Next, using the partial derivatives of the derived energy function, we show two different decompositions of the energy landscape. First, we disentangle a pure (symmetric) gradient dynamics component where the remainder corresponds to a flux (curl) component. Our curl component captures the dynamical contribution of the asymmetric interactions in a biological system.
Next, we demonstrate an orthogonal-residual decomposition of the energy landscape by taking advantage of the gradient of the energy function. Interestingly, the symmetric and the orthogonal parts in the respective decompositions are not identical. We investigate what these two decompositions correspond to in the case of the two investigated model systems. We compare this decomposition with the work of Jin Wang, which uses a probabilistic Fokker-Planck formulation as a basis for decomposition into a gradient and a flux part. In contrast to Wang’s formulation, our model requires neither the numerical solution of the Fokker-Planck equation for low-dimensional systems nor stochastic simulation for the inference of parameters of the probability distribution with independence assumptions for high-dimensional systems. Our work provides technical insight into reconstructing Waddington landscapes and into how different decompositions and contributions of the landscape correspond to the interactions between the elements of a biological system. Subsequently, we plan to study the use of simple optimization techniques for the inference of the Hopfield network of our system from single-cell RNA sequencing data. This could be useful for cellular differentiation, cellular development, and cellular reprogramming studies.
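The idea of separating symmetric from asymmetric interactions can be made concrete with a small sketch. It assumes a Hopfield-type drift of the form dx/dt = -x + W g(x) with a Hill-type activation g, and splits the interaction matrix W into symmetric and antisymmetric parts. The abstract does not give the authors' equations, so the weight matrices, the Hill parameters and all function names here are hypothetical.

```python
def split_weights(W):
    """Split W into a symmetric part S and an antisymmetric part A, with W = S + A."""
    n = len(W)
    S = [[(W[i][j] + W[j][i]) / 2.0 for j in range(n)] for i in range(n)]
    A = [[(W[i][j] - W[j][i]) / 2.0 for j in range(n)] for i in range(n)]
    return S, A

def hill(x, k=1.0, n=2):
    """Hill-type sigmoid for non-negative x (assumed form of the activation)."""
    return x ** n / (k ** n + x ** n)

def interaction(W, x):
    """Interaction term W g(x) of the assumed drift dx/dt = -x + W g(x)."""
    g = [hill(v) for v in x]
    return [sum(W[i][j] * g[j] for j in range(len(x))) for i in range(len(x))]

# Two-gene mutual inhibition: the interactions are symmetric.
toggle = [[0.0, -1.0], [-1.0, 0.0]]
# Three-gene cyclic inhibition: the interactions are asymmetric.
oscillator = [[0.0, 0.0, -1.0], [-1.0, 0.0, 0.0], [0.0, -1.0, 0.0]]

S_toggle, A_toggle = split_weights(toggle)
S_osc, A_osc = split_weights(oscillator)

x = [0.5, 1.0, 2.0]
full = interaction(oscillator, x)
parts = [s + a for s, a in zip(interaction(S_osc, x), interaction(A_osc, x))]
```

In this toy setting the two-gene switch has no antisymmetric part, while the three-gene cycle does, which matches the intuition that oscillations need a non-gradient (flux) contribution. For symmetric weights and a monotone activation, classic Hopfield theory guarantees an energy function; the antisymmetric remainder is what such an energy cannot account for.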

Evaluation of methods to compute quasi-potential functions and their use as systems biology models
PRESENTER: Subash Balsamy

ABSTRACT. Discovering nonlinear predictive models from data without access to governing equations from first principles is at the heart of science and a central problem in systems biology. Instead of posing the model inference problem in terms of finding a large, parametrized state-variable model, we ask whether the dynamical landscape, e.g., a quasi-potential, can be computed from nonlinear models(1). Specifically, we are interested in landscape models that capture the attractors and stability properties of models of the biological system. There are several methods available in the literature for computing a quasi-potential. Here we analyze: 1) Large Deviation Theory (LDT), 2) Normal decomposition, 3) Probabilistic Landscape, 4) Symmetric-antisymmetric decomposition, 5) Lyapunov function, and 6) a data-driven neural network method. To evaluate the performance of the methods, we use two well-established computational model systems. Interestingly, all the available methods require the existence and feasibility of using a perturbation technique (LDT) to find the transition paths between different states. Decomposing the underlying force field into two parts plays a significant role in constructing quasi-potentials. One part is the pure gradient corresponding to the quasi-potential, similar to a Lyapunov function. The remainder is generally assumed to play no role in the global stability of the system but could drive the flow or transients in the system. Therefore, we analyzed to what extent the remainder part could predict the transition events in quasi-steady-state systems where the Critical Slowing Down (CSD) metrics failed. Finally, we analyze the problem of finding transition paths between the cellular states in these models. First, the different methods find similar but not identical aspects of the potential landscape due to the different assumptions associated with the techniques. Five of the six methods require knowledge of the system's equations.
This is a severe limitation to their use as systems biology models since we, as a rule, do not have access to such equations for biological systems. Interestingly, the machine learning approach requires only access to representative data and can work with systems beyond the usual 2-3 dimensions. The prospect of formulating systems biology models using efficient potential functions holds the promise of mitigating the problem of finding parameters for large state-space models involving numerous state variables. This may be particularly useful when modeling cells. Cell development can, for example, be understood to be governed by a lower-dimensional epigenetic energy landscape. Furthermore, cells can evolve as points within or near stable attractors. Thus, studying the global stability among other viable attractors gives a promising way of understanding biological development and cellular differentiation at a coarse-grained level. Yet, there is a need to develop new data-driven methods that can work with sparse data from systems with a dimensionality beyond two or three.

Metabolic Atlas - extending metabolic networks for model organisms with enzyme turnover predictions with GotEnzymes
PRESENTER: Mihail Anton

ABSTRACT. Models in systems biology are used to understand biological processes by facilitating data interpretation, analysis, and prediction. By joining thousands of reactions, metabolites, and genes into large metabolic networks, genome-scale metabolic models (GEMs) have become valuable tools to study metabolism. The incorporation of enzymatic parameters is, for example, one of the ways to further improve the prediction of GEMs, in addition to the integration of omics data.

Metabolic Atlas, through the web platform freely available at https://metabolicatlas.org, presents the entire content of open GEMs for easy browsing and analysis. This is achieved through both tabular and map views (2D and 3D). In addition, Metabolic Atlas aims to meet the needs of the community through the development of specific tools and features through iterative releases.

The history of Metabolic Atlas began with a focus on the human metabolic model (Pornputtapong et al., 2015). The present website has been re-developed from the ground up, following open-source standards, by first integrating Human1, an integration and extensive curation of the most recent human metabolic models (Robinson et al., 2020), and Yeast8, a consensus metabolic model for S. cerevisiae (Lu et al., 2019). Following a database redesign and the addition of a performant 3D viewer, 5 more models have been integrated (Wang et al., 2021).

Version 3, the latest major release, adds GotEnzymes, an extensive database of enzyme parameter predictions (Li et al., 2022). The 25+ million predictions of turnover numbers, spanning 8099 organisms, aim to further facilitate computational applications and metabolic engineering. Metabolic Atlas links these predictions to the 7 integrated GEMs of model organisms.

Single-cell spatial atlas of high-grade serous ovarian cancer

ABSTRACT. -Background & objective- Every year in Finland, 450 women are diagnosed with ovarian cancer and 320 women die from it. High-grade serous ovarian cancer (HGSC) is the most common and most lethal subtype. Notably, preliminary evidence suggests that tumors deficient in DNA repair by homologous recombination (HR) have a distinct tumor-immune microenvironment (TME), harboring an increased number of tumor-infiltrating lymphocytes compared to HR-proficient tumors. Our objective is to characterize changes in the TME by genotype in HGSC.

-Methods- The dataset consisted of 1000 tissue microarray cores collected from both the tumor center and the tumor border of 250 HGSC patients. We performed cyclic immunofluorescence (tCycIF) utilizing 34 different protein markers. Image analysis was performed using ilastik, CellProfiler, MATLAB and Python scripts. We assessed the BRCA1/2 mutation status and the BRCA1/2 promoter hyper-methylation. For the non-BRCA1/2 mutants, we performed sWGS and estimated the CCNE1 amplification status and HR deficiency using copy number profiles and bioinformatics. The RNA expression of 340 relevant genes was assessed using NanoString technology.

-Results- Using highly multiplexed imaging, we captured a total of 4.8 million single cells. The cells were further annotated and categorized into distinct functional subpopulations within the tumor, immune and stromal compartments. Interestingly, CD8+, CD20+, CD11c+ and CD15+ immune cell infiltration was higher in BRCA1/2-mutated tumors than in CCNE1-amplified tumors, and was associated with longer overall survival. The functional clusters of cancer and stromal cells showed heterogeneity among the tumor genotypes.

-Conclusion- Integration of multi-omics data with the single-cell spatial features will reveal the TME landscapes of HGSC with the potential to discover new biomarkers for precision oncology.

Mitotic Memory as Spontaneous Symmetry Breaking in the Cell
PRESENTER: Arran Hodgkinson

ABSTRACT. During development, lineage-committed cells undergo numerous cell divisions. Mitosis represents a challenge to the inheritance of transcriptional states. During mitosis, chromosomes become condensed, packed with nucleosomes and Pol II, whilst most transcription factors are expelled from this particularly hostile chromatin landscape. How a cell maintains transcriptional fidelity across cell divisions is a fundamental question in biology, in healthy organisms as well as for relentlessly dividing cancerous cells. It is now clear that not all traces of transcriptional activation or repression are erased during mitosis (Festuccia et al., Development 2017). Live imaging of transcription dynamics reveals the extent to which transcriptional status is inherited between cell generations, a phenomenon called mitotic memory. Using this technique in Drosophila embryos, our team has recently visualized transcriptional memory for the first time in a multicellular organism (Ferraro et al., Curr Biol 2016). When a mother nucleus is transcriptionally active, its descendants have a higher probability of activating transcription in the following cycle, compared to descendants of inactive mothers. With this tool in hand, we now seek to employ a mathematical model of mitotic memory to formulate hypotheses on the potential supports and timescales of this memory. Indeed, the support of this mitotic memory could involve mitotic retention of transcription factors or epigenetic modification of histone tails (bookmarking). Because too long a memory would prevent activation of new genes or shutdown of old ones when needed, mitotic memory should be short-term and the processes involved should be dynamic (Bellec, Radulescu, Lagha, Curr Opinion Sys Biol 2018). Of course, this does not preclude other forms of cellular memory that are long-term.
Previous mathematical models developed by our team (Dufourt et al., Nat Com 2018) were based on Markov chains and described transcriptional activation by a small number of discrete limiting transitions. Here we derive a mitotic memory model from very general principles using statistical field theory. In our model, the cell's transcriptional state is represented as one of the attractors of a multistationary Phi4 model. Contrary to our previous Markov chain model, which does not cope with mitotic events, the new model is the first to describe memory transmission across multiple mitoses. In this model, we interpret mitosis using the general concept of symmetry breaking. Seen as such, our model provides the normal form for a whole class of mitotic memory models. We validate our model by using single nuclei data recording the inheritance of transcriptional states down cell lineages, in vivo. For this we employ the MS2/MCP technique and live imaging of developing early Drosophila embryos (Ferraro et al., 2016 Curr Biol; Dufourt et al., Nat Com 2018; Bellec et al., Nat Com 2022).

Multiple-omics based metabolic modelling reveals effects of drug-induced liver toxicity
PRESENTER: Zita Soons

ABSTRACT. Adverse drug events are a major burden in drug development and clinical care. Common pre-clinical assays are based on artificial in vitro exposure times and concentrations which typically do not represent in vivo conditions. The inability to translate previous findings into reliable predictions of human toxicity risks emphasizes the need for physiologically relevant in vitro models to investigate the mechanisms of drug-induced toxicity. In the HeCaTos study, we demonstrate the use of primary human liver micro-tissues to model liver toxicity by exposing the primary human hepatocytes to physiologically relevant drug concentration-time profiles. The pharmacokinetic profiles were designed to mimic a therapeutic exposure according to the drug label or a toxic exposure corresponding to concentration-viability IC20 values after 14 days of incubation. The physiologically relevant assay was conducted for ten well-known hepatotoxic drugs, and time-resolved alterations were studied over a 14-day time course. Mass-spectrometry-based proteomics data combined with mRNA sequencing-based transcriptomics analyses were applied to characterize the cellular alterations after therapeutic and toxic drug exposure and to decipher mechanisms underlying drug-specific toxicities. Overall, we found that the accumulated exposure to drugs over time was driving the cellular response rather than the application of either a toxic or therapeutic dose itself. We will illustrate this based on a toxicity timeline for initiation of apoptosis for each drug. In addition, functional enrichment analysis revealed that metabolic processes related to central carbon and nitrogen metabolism were amongst the top altered processes. Hence, we reconstructed context-specific genome-scale metabolic models using iMAT combined with probabilistic simulations integrating transcriptomics, proteomics, substrate availability, and cell viability measurements to unravel the underlying mechanisms.
In our presentation, we will discuss key findings. The study illustrates a generic framework for further investigations to elucidate the impact of drug concentration-time profiles on organ-specific cellular biochemistry and mechanisms of drug-induced liver toxicity in the future.

BioCypher: an ontology-driven framework for flexible harmonisation of large-scale biomedical knowledge graphs

ABSTRACT. Although biomedical knowledge is increasingly abundant and available, it is fragmented across providers and research groups. Large-scale pipelines have been built by individual researchers and companies to harmonise the data supplied by complementary primary datasets. These pipelines integrate the heterogeneous primary data into large, harmonised knowledge graphs. However, each of these large secondary sources still operates on its own arbitrary schema and technological foundation, making maintenance and interoperability difficult. The ways in which researchers interact with these platforms are equally heterogeneous and arbitrary, most often in the form of a web interface or software package. As a first step towards solving these challenges, we propose a biomedical data interface, which we call BioCypher.

BioCypher aims to facilitate integration and use of biomedical prior knowledge and data via several mechanisms, all implemented in an open-source Python package: 1) Encoding biomedical "objectness" of knowledge graph entities based on a comprehensive public ontology system (the Biolink project); 2) Translation and integration between different data sources and identifier systems using ontological hierarchy and established mapping facilities; 3) Easy-to-use, fast and flexible build mechanism for the creation of individualised task-specific knowledge graphs to allow for rapid prototyping and application-oriented performance.

The BioCypher workflow is simple and hinges on specification of knowledge graph entities via a configuration YAML file that maps the heterogeneous input data to the respective ontological classes (e.g., "Gene" or "SmallMolecule"). Via this file, the user specifies which types of entities and relationships should be represented in the output knowledge graph. Using a simple adapter script, the input data is passed to BioCypher, which then performs harmonisation, integration, and the knowledge graph build procedure.

As a first step, we are focused on migrating secondary knowledge sources such as our own database OmniPath, the Clinical Knowledge Graph, the Dependency Map project, the Open Targets knowledge graph, and others. However, the long-term goal of BioCypher is to represent each of the primary knowledge sources with their own adapter, which can then be combined "on the fly" by specifying a mode of representation, essentially allowing for recreation and recombination of these secondary knowledge sources in a harmonised manner. Down the line, this would allow centralisation of the primary knowledge collections and distributed storage and computing for individualised knowledge graphs.

Once centralised, individualised knowledge graphs can be made available to a wider range of biomedical researchers, not only bioinformatics specialists with access to high-performance computing. Knowledge graph workflows could be provided in a parallelisable cloud environment, for example via Jupyter notebooks. The ontological harmonisation can also allow for further accessibility measures, such as graphical user interfaces or dialogue systems, usable by all biomedical researchers.

Unraveling the role of network motifs to decipher the origin of robust decision-making in biological systems
PRESENTER: Amitava Giri

ABSTRACT. Living cells make precise decisions under any given physiological condition. This kind of robust decision-making has been shown to be dynamically organised by complex tri-stable, Mushroom or Isola types of bifurcation associated with a specific regulatory gene. How these dynamical features emerge from the complex gene regulatory networks that organise such cellular processes, and which minimal network motifs can produce them, remains poorly understood. Herein, by employing bifurcation analysis and Waddington’s potential landscape analysis, we demonstrate that Mushroom and Isola bifurcations can be realised with four minimal network motifs, each constituted by combining a positive feedback motif with an incoherent feed-forward loop (IFFL) of a different type. Our study reveals that the intrinsic bi-stable dynamics originating from the positive feedback motif can be fine-tuned by altering the extent of the incoherence of these minimal networks to produce these complex bifurcations. To further examine the relevance of these findings, we investigated the transcriptional network involving Nanog, Oct4 and Gata, which dynamically governs the cell-fate determination of embryonic stem (ES) cells. We showed that tri-stable Oct4 dynamics and mushroom-like steady-state dynamics of Nanog orchestrate the regulation of ES cell differentiation. The model reconciles a range of experimental observations and predicts ways to fine-tune the developmental dynamics of ES cells by altering the nature and extent of the positive feedback and incoherent feed-forward interactions in the Nanog regulatory network.
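As a point of reference for the bi-stable building block mentioned above, the following sketch locates the steady states of a single self-activating gene. It is not the authors' model and does not reproduce the Mushroom or Isola bifurcations: the Hill-type rate law, all parameter values and the function names are assumptions chosen only so that the example is bi-stable.

```python
def rate(x, basal=0.05, vmax=1.0, k=1.0, n=4, decay=0.5):
    """dx/dt for a self-activating gene: basal + Hill activation - decay."""
    return basal + vmax * x**n / (k**n + x**n) - decay * x


def steady_states(f, lo=0.0, hi=5.0, steps=5000):
    """Locate sign changes of f on a grid and refine each by bisection."""
    roots = []
    width = (hi - lo) / steps
    for i in range(steps):
        a, b = lo + i * width, lo + (i + 1) * width
        if f(a) * f(b) < 0:
            # f > 0 left of the root and f < 0 right of it: flow converges.
            stable = f(a) > 0
            for _ in range(60):
                mid = 0.5 * (a + b)
                if f(a) * f(mid) <= 0:
                    b = mid
                else:
                    a = mid
            roots.append((0.5 * (a + b), stable))
    return roots


# Each entry is (steady-state level, is_stable).
states = steady_states(rate)
```

With these assumed parameters the scan finds two stable states separated by an unstable one, which is the signature of bi-stability that the incoherent feed-forward interactions are then said to reshape.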

Latent variable models of fungal growth considering the data measurement process
PRESENTER: Tara Hameed

ABSTRACT. Fungal infections are common and are often treated with antifungal drugs. With the growing threat of antifungal resistance, extensive research is underway to develop novel antifungal drugs, mainly through in vitro or in vivo experiments. To evaluate and compare the efficacy of antifungal drugs, we often want to estimate the rate of fungal growth in each treatment group by fitting mathematical models of fungal growth to time-course fungal burden data. A popular method for collecting such data is measuring optical density (OD), because it is fast and can be automated. However, OD data are often affected by measurement errors, and ignoring these errors may bias the estimated growth rates. Flexible non-parametric models are not an appropriate remedy: they do not explicitly include a fungal growth rate as a model parameter, so fitting them to the data would not provide a biologically interpretable growth rate. To address this issue, we developed a latent variable model of fungal growth that incorporates the measurement process of the OD data. In this model, the measured fungal burden is treated as an observed variable distributed around a latent true fungal burden whose dynamics are described by, for example, logistic growth. We tested the model’s ability to infer a fungal growth rate from data we collected in vitro under different treatment conditions, using a Bayesian workflow with prior predictive, fake-data and posterior predictive checks. We then evaluated the model’s performance in predicting fungal growth using 5-fold cross-validation. Our model outperformed the baselines (a random walk and a basic logistic model) in terms of estimated log predictive density, and inferred a biologically interpretable fungal growth rate. Incorporating the data measurement process in a fungal growth model improved inference and interpretability of fungal growth rates.
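The structure of such a latent variable model can be sketched as follows. This is an illustration under stated assumptions rather than the presented model: the Gaussian measurement error, the parameter values and the synthetic readings are invented, and a simple grid search over the growth rate stands in for the Bayesian workflow described in the abstract.

```python
import math


def logistic(t, y0, capacity, r):
    """Closed-form logistic growth: latent fungal burden at time t."""
    return capacity / (1.0 + (capacity - y0) / y0 * math.exp(-r * t))


def log_likelihood(times, observed, y0, capacity, r, sigma):
    """Assumed measurement model: observed OD ~ Normal(latent burden, sigma)."""
    total = 0.0
    for t, od in zip(times, observed):
        resid = od - logistic(t, y0, capacity, r)
        total += -0.5 * math.log(2 * math.pi * sigma**2) - resid**2 / (2 * sigma**2)
    return total


times = [0, 2, 4, 6, 8, 10, 12]
# Synthetic, noise-free readings generated with r = 0.5 (illustration only).
observed = [logistic(t, 0.05, 1.0, 0.5) for t in times]
# Grid search over candidate growth rates; a Bayesian fit would replace this.
candidates = [0.1 * i for i in range(1, 11)]
best_r = max(
    candidates,
    key=lambda r: log_likelihood(times, observed, 0.05, 1.0, r, 0.02),
)
```

Because the growth rate `r` is an explicit parameter of the latent dynamics, the fitted value remains directly interpretable, which is the property the abstract contrasts with non-parametric alternatives.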

Web-based analysis of indirect calorimetric data arising from metabolic phenotyping systems
PRESENTER: Stephan Grein

ABSTRACT. Indirect calorimetry is widely applied in pre-clinical and clinical research to quantify energy expenditure. While the principles and technologies are well established, users would profit from a versatile tool for data exploration and analysis, including suitable statistical routines to account for differences in body mass and body composition. Therefore, we developed a standalone open-source web application framework for the analysis of indirect calorimetry data, which typically arise from metabolic phenotyping systems. This web application can be deployed either to cloud infrastructures or to self-hosted services at departmental facilities, providing full control and sovereignty over research data. Through a reactive graphical user interface, the user can conduct fast explorative data analysis with interactive visualization and plotting of results, as well as in-depth analysis of energy balance, total energy expenditure (TEE), resting metabolic rate (RMR) and activity energy expenditure (AEE) with established methods such as ANOVA or ANCOVA provided through GNU R. Since our code base is in GNU R, one can readily make use of additional analysis packages such as respR. Based on its modular design, availability in public repositories (i.e., GitHub) and conformity with the FAIR principles, our application provides transparency, repeatability and reproducibility of analysis to the energy metabolism community. Because the user interface is separated from the backend code that provides the analysis tools, the application can be extended ad libitum. In particular, we have streamlined the user interface for rapid data analysis workflows, which can be saved for reproducibility and used as templates in subsequent analysis runs. When calculating the RMR from indirect calorimetric measurements, activity-related data are instrumental to discern between RMR and AEE as the main contributors to TEE.
Lack of activity-related data thus jeopardizes the determination of RMR. Therefore, we added a novel functionality to the web application that allows RMR and AEE to be extracted from indirect calorimetric measurements without the need for activity-related data. We used standardized data sets containing activity data to validate this novel method. Input data provided in diverse formats can be consolidated into common file formats (e.g., CalR, Sable, TSE) and re-exported to be shared with collaboration partners, or analyzed with our framework. Export of plotting results as high-quality graphics is directly embedded in the web application. By providing a holistic, open-source data analysis framework for indirect calorimetric data, we strive to improve trust and rigor of data analysis in the field of energy metabolism.
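To illustrate what accounting for body mass with ANCOVA amounts to, the sketch below fits energy expenditure against a group factor and a body-mass covariate by ordinary least squares. It is written in Python with NumPy rather than in the application's GNU R code base, and the data, coefficients and variable names are synthetic assumptions, not output of the presented tool.

```python
import numpy as np

# Synthetic data (illustration only): body mass, a two-level group indicator,
# and energy expenditure generated from known coefficients without noise.
mass = np.array([22.0, 25.0, 28.0, 31.0, 24.0, 27.0, 30.0, 33.0])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
expenditure = 2.0 + 0.5 * mass + 1.5 * group

# Design matrix: intercept, covariate (body mass), factor (group).
design = np.column_stack([np.ones_like(mass), mass, group])
coef, *_ = np.linalg.lstsq(design, expenditure, rcond=None)
intercept, mass_slope, group_effect = coef
```

The group effect is estimated at equal body mass, so a difference in expenditure that merely reflects heavier animals in one group is not attributed to the treatment.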

Quantitative Spatial Profiling Reveals Tumor Microenvironment Heterogeneity and Prognostic Biomarkers Associated with Immune Population Architectures
PRESENTER: Haoyang Mi

ABSTRACT. Pancreatic ductal adenocarcinoma (PDAC) is a highly aggressive disease with poor 5-year survival rates, necessitating identification of novel therapeutic targets. Elucidating the biology of the tumor immune microenvironment (TiME) can provide vital insights into mechanisms of tumor progression. In this study, we developed a quantitative image processing platform to analyse sequential multiplexed immunohistochemistry data from archival PDAC tissue resection specimens. A 27-plex marker panel was employed to simultaneously phenotype cell populations and their functional states, followed by a computational workflow to interrogate the immune contextures of the TiME in search of potential biomarkers. The PDAC TiME reflected a low-immunogenic ecosystem with high intratumoral and intertumoral heterogeneity. Spatial analysis revealed that the relative distances between IL-10+ myelomonocytes, PD-1+ CD4+ T cells, and Granzyme B+ CD8+ T cells correlated significantly with survival; from these distances we derived a spatial proximity signature, termed imRS, that correlated with PDAC patient survival. Furthermore, spatial enrichment of CD8+ T cells in lymphoid aggregates was linked to improved survival. Altogether, these findings indicate that the PDAC TiME, generally considered immuno-dormant or immunosuppressive, is a spatially nuanced ecosystem orchestrated by ordered immune hierarchies. This new understanding of spatial complexity may guide novel treatment strategies for PDAC.
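One simple way to quantify the relative distance between two cell populations is the mean nearest-neighbour distance, sketched below. This is not the definition of imRS, which the abstract does not give; the function name, the choice of metric and the coordinates are assumptions for illustration.

```python
import math


def mean_nearest_distance(sources, targets):
    """Average distance from each source cell to its closest target cell."""
    return sum(
        min(math.dist(s, t) for t in targets) for s in sources
    ) / len(sources)


# Hypothetical cell centroids in micrometres (illustration only).
cd8_t_cells = [(0.0, 0.0), (10.0, 0.0)]
myeloid_cells = [(3.0, 4.0), (10.0, 5.0)]
score = mean_nearest_distance(cd8_t_cells, myeloid_cells)
```

A per-sample score of this kind could then be related to survival, although the statistical procedure used in the study is not specified here.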