09:00-11:00 Session 17B: WILDCARDS *.* [Part II]

Summary: This session features talks on any hot topic or landmark work selected by the session chairs and organizers of ICSB 2022. They can come from any field of systems biology or associated fields. We will consider contributed wildcard talks and will also approach researchers who have done or are conducting exciting, groundbreaking work.

Data-driven modeling of cell-cell communication networks
PRESENTER: Philipp Burt

ABSTRACT. Differentiation of CD4+ T helper cells into specialized effector cell subsets such as Th1, Th2 and Tfh cells is a key element of the adaptive immune system, but deriving data-driven mathematical descriptions for specific physiological scenarios is still a challenging task because of the complexity of the process and the sparsity of available kinetic data. Traditionally, the primary Th cell lineage emerging from the differentiation process is associated with decision-making processes that affect the type and strength of an immune response. Recently, that view has been challenged by the use of single-cell technologies, which revealed surprisingly strong phenotypic heterogeneity among Th cells even within each lineage, culminating in the notion of a quasi-continuous phenotypic space instead of discrete lineage identities. Here, using time-course transcriptomics of in vitro generated Th1, Th2 and Th1/2 hybrid cells over the entire differentiation period, we found evidence for strong separation between the canonical cell lines, and to a lower degree also between the canonical cell lines and the hybrid cells [1]. Thus, our data suggest that despite phenotypic plasticity, strong association to lineage identities is revealed if detailed differentiation dynamics are considered. Response-time modeling [2] offered a way to integrate both views in a quantitative description of Th cell differentiation. Specifically, we implemented a model by considering stochastic variation in terms of an a priori arbitrary variance in the response time to a stimulus, while lineage identity is preserved by using lineage-specific model parameters. To design a data-driven response-time model of Th cell differentiation in acute and chronic inflammation, we first combined and analyzed several available single-cell data sets that allowed us to infer response-time distributions for several parameters.
Model analysis of conceptual proliferation and differentiation circuits using the measured distributions revealed qualitative and quantitative properties regarding robustness, reaction times and magnitude of the Th cell response which go beyond analysis with traditional rate-equation models [3]. Finally, we found that by annotating our model based on high-resolution in vitro data, also sparsely annotated ex vivo data from acute and chronic viral infection scenarios are well-described. Our fully annotated model predicted different windows of opportunity for influencing the Th1/Tfh fate-decision, i.e. through timed cytokine blockade, in acute and chronic infection.

References: [1] Burt*, Peine*, …, Löhning**, Thurley**. High-resolution kinetic gene expression analysis of T helper cell differentiation reveals a STAT-dependent, unique transcriptional program in Th1/2 hybrid cells (2022). bioRxiv. Accepted in Frontiers in Immunology. *equal contribution. **corresponding.

[2] Thurley, Wu, Altschuler, Modeling Cell-to-Cell Communication Networks Using Response-Time Distributions (2018). Cell Systems 6:355.

[3] Burt and Thurley, manuscript in preparation.

Modelling of Pharmacokinetics and Pharmacodynamics for first-in-man dose selection of targeted alpha therapy
PRESENTER: Andrea Gruber

ABSTRACT. In preclinical development of a new drug candidate, the prediction of a safe and efficacious dose is an important element to facilitate the selection of the dosing steps in a phase I trial. To address this critical question, we aim to understand the dose-exposure-response relationship, hence the pharmacokinetics (PK) and pharmacodynamics (PD) of a drug. By applying a PKPD model to preclinical in vitro and in vivo data, we can gather valuable insights and are then able to better translate our preclinical knowledge to humans. I will present an example of a PKPD model for targeted tumor therapy with alpha radiation. The example compound consists of a tumor-targeting antibody conjugated to the alpha-emitting radionuclide Thorium-227. The preclinical model was evaluated with blood and tumor concentration-time profiles in mice as well as tumor volume measurements over time. Once translated to humans, the model can be used for simulations of the effect of different radioactive and antibody doses on tumor growth inhibition, and we can therefore predict a suitable dose range for the first-in-man study.

A quantum circuit model for inferring gene regulatory networks with application to single-cell transcriptomic data

ABSTRACT. Quantum computing holds the promise to achieve certain types of computation that would otherwise be unachievable by classical computers. Advances in the development of quantum algorithms have enabled a variety of applications in chemistry, finance, and cryptography. In this work, we present a quantum circuit model for constructing gene regulatory networks (GRNs) from single-cell transcriptomic data. The model is based on the idea of using qubit-qubit entanglement to simulate the interactions between genes. Each qubit in the circuit represents a gene, and qubits are entangled to simulate the interaction between genes. The strength of gene interactions is estimated using the rotation angle of controlled unitary gates between qubits. We provide preliminary results that suggest our quantum single-cell GRN (qscGRN) modeling method is competitive and warrants further investigation. Specifically, we present the preliminary results derived from the single-cell RNA sequencing (scRNA-seq) data of human lymphoblastoid cell lines, focusing on genes in the nuclear factor-kappa B (NF-κB) signaling pathway. We demonstrate that our qscGRN model can recover known and detect novel regulatory relationships, setting the stage for further investigations on GRNs, given that relationships between fully interconnected genes are approached more effectively by quantum modeling than by statistical correlations. Our quantum circuit model enables the modeling of the vast feature space occupied by cells in different transcriptionally activating states, simultaneously tracking activities of thousands of interacting genes and constructing more realistic single-cell GRNs without relying on statistical correlation or regression. We anticipate that quantum computing algorithms based on our circuit model will find more applications in data-driven life sciences, paving the way for the future of predictive biology and precision medicine.
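
The idea that the rotation angle of a controlled gate encodes interaction strength can be illustrated with a minimal two-qubit sketch in plain Python. This is not the authors' qscGRN implementation: the choice of R_y gates, the circuit layout, and all function and parameter names below are assumptions made purely for illustration.

```python
import math

def ry(theta):
    """2x2 matrix of a single-qubit R_y rotation by angle theta."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

def two_gene_state(activation, interaction):
    """Amplitudes [a00, a01, a10, a11] for (regulator, target) qubits.

    Assumed toy circuit: R_y(activation) on the regulator qubit, then a
    controlled-R_y(interaction) that rotates the target only when the
    regulator is |1>.
    """
    r = ry(activation)
    # Start in |00>; after R_y on the regulator the target is still |0>.
    reg0, reg1 = r[0][0], r[1][0]
    t = ry(interaction)
    # The target rotates only in the branch where the regulator is |1>.
    return [reg0, 0.0, reg1 * t[0][0], reg1 * t[1][0]]

def p_target_on(activation, interaction):
    """Probability of measuring the target qubit in |1>."""
    amps = two_gene_state(activation, interaction)
    return amps[1] ** 2 + amps[3] ** 2
```

In this toy circuit the probability that the target "gene" is on equals sin²(activation/2) · sin²(interaction/2), so a larger interaction angle couples the target more strongly to the regulator, and an angle of zero leaves the two qubits independent.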

Foundations for a theory of IP3 induced Ca2+ signalling - re-inventing a classic.
PRESENTER: Martin Falcke

ABSTRACT. Recent research revealed the qualitative properties of IP3 induced Ca2+ spiking common to all cell types, such as an exponential stimulation response of the average interspike interval (ISI) Tav, random spike timing with a robust relation between Tav and the ISI standard deviation s, large cell-to-cell variability of Tav, sensitive dependency on diffusion properties and non-oscillatory local dynamics. Current theory of Ca2+ spiking does not capture these aspects despite its paradigmatic role for cellular biophysics. Our theory remedies this deficiency. The mathematical structure of Ca2+ spiking is a first passage process driven by Ca2+ induced Ca2+ release (CICR) to spike generation combined with deterministic dynamics of a global inhibitory variable and a response function of the IP3 pathway. The first passage process provides the ISI distribution, cell-to-cell variability, sensitive response to diffusion properties and the robustness of the moment relation. The dynamics of the inhibitor sets the slope of the moment relation. The first passage process and the IP3 pathway shape the exponential stimulation response and strongly affect Tav. We also provide an understanding of the specific appearance of cell-to-cell variability in the stimulation response relation. Cell type specific properties enter mainly via the agonist sensitivity and inhibitor dynamics.

What are the functions of the short open reading frame-encoded peptides in monocytes? An interactomic approach.

ABSTRACT. Over the past decades, hundreds of thousands of short open reading frames (sORFs) have been identified on most eukaryotic RNAs. Some of these ubiquitous elements are conserved across species and may encode functional peptides. Most of these peptides, called sORF-encoded peptides (sPEPs), have failed to be annotated, notably due to their short length (< 100 residues) and the use of alternative start codons (other than AUG). So far, the roles of only a few sPEPs have been characterized. sPEPs whose function has been determined are involved in a wide range of key biological processes such as apoptosis, DNA repair, mTOR signaling, transcriptional regulation, antigen presentation in eukaryotes or cardiac activity regulation in D. melanogaster. This broad range of functions suggests sPEPs may constitute a new pool of therapeutic targets. Nonetheless, most sPEPs remain unknown or poorly characterized. In order to scrutinize their roles, we studied the way sPEPs interact with canonical proteins (RefProts), whose functions are known. To that end, we gathered and homogenized publicly available data characterizing the human sORFs into a database, MetamORF, which has been exploited to study the functions of sPEPs. 10,475 sPEPs encoded by sORFs identified in human monocytes have been recovered from this database. The interactions between these sPEPs and RefProts expressed in monocytes have then been inferred using mimicINT, a computational method we developed for predicting protein-protein interactions. It identifies the short linear motifs (SLiMs, motifs of 3-10 contiguous amino acids, usually located in disordered regions) and the globular domains, two major protein interaction interfaces, on sPEPs and RefProts. mimicINT then uses experimentally validated patterns of SLiM-domain and domain-domain interactions to infer a network of sPEP-protein interactions.
Finally, Monte-Carlo simulations are used to assess the SLiM functionality by estimating the likelihood of each SLiM observed in the sequences to occur by chance. We inferred the first sPEP-RefProts interactome in human monocytes, which contains nearly 150,000 binary interactions. The SLiMs and domains used by sPEPs as interaction interfaces suggest sPEPs may be involved in a broad range of processes, notably related to metabolism and immunology. Topological analysis of the network allowed us to annotate the sPEPs with Gene Ontology terms, most of which were related to metabolism and central regulatory functions (signal transduction, etc.). We noticed that sPEPs encoded by genes annotated to a particular biological process (BP) preferentially interact with the RefProts of this same BP for most of the BPs (for 72% of BP generic GO terms). These results suggest that sPEPs may be involved in the regulation of many biological processes, either ubiquitous (protein metabolism etc.) or specific to monocytes (immune responses).
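
The Monte-Carlo step can be pictured with a minimal sketch: shuffle a sequence many times, keeping its amino-acid composition, and count how often the motif appears at least as often as in the original. This is an assumed, simplified null model and not the actual mimicINT procedure; the function names, the example sequence and the motif pattern are hypothetical.

```python
import random
import re

def count_motif(sequence, pattern):
    """Count (possibly overlapping) matches of a regex motif in a sequence."""
    return len(re.findall(f"(?={pattern})", sequence))

def motif_chance_pvalue(sequence, pattern, n_shuffles=1000, seed=0):
    """Empirical probability that shuffled sequences contain the motif
    at least as often as the observed sequence."""
    rng = random.Random(seed)
    observed = count_motif(sequence, pattern)
    residues = list(sequence)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # keeps composition, destroys residue order
        if count_motif("".join(residues), pattern) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_shuffles + 1)

# Hypothetical peptide and motif, for illustration only.
observed, p_value = motif_chance_pvalue("MAPTLSGKPELRRA", "P.L")
```

A small empirical p-value would indicate that the motif occurrences are unlikely to arise from composition alone; short, degenerate motifs typically need this kind of check because they match many sequences by chance.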

Reconstruction of a catalogue of enzyme constrained models of metabolism using GECKO 2.0 and its applications to eukaryote metabolism
PRESENTER: Iván Domenzain

ABSTRACT. Genome-scale metabolic models (GEMs) have been widely used for quantitative exploration of the relation between genotype and phenotype. Streamlined integration of enzyme constraints and proteomics data into such models was first enabled by the GECKO toolbox, allowing the study of phenotypes constrained by protein limitations. In this work, we upgrade the toolbox in order to enhance models with enzyme and proteomics constraints for any organism with a compatible GEM reconstruction. The functionality of GECKO is expanded with an automated web-based framework, ecModels container, for continuous and version-controlled update of enzyme-constrained GEMs. With this, a catalogue of high-quality models has been reconstructed for the budding yeasts Saccharomyces cerevisiae, Yarrowia lipolytica and Kluyveromyces marxianus, the model bacterium Escherichia coli, and Homo sapiens metabolism.

Moreover, we present applications of these models for: 1) accurate prediction of specific growth rates and exometabolome exchange of 11 different cancer cell-lines, by solely using glucose exchange data and 2) prediction of metabolic engineering strategies, driving an experimental 70-fold increase in intracellular heme levels in S. cerevisiae cells.

Overall, we facilitate the use of enzyme-constrained GEMs in basic science, metabolic engineering and synthetic biology.

Related papers: https://doi.org/10.1038/s41467-022-31421-1 https://www.science.org/doi/full/10.1126/scisignal.aaz1482 https://doi.org/10.1073/pnas.2108245119

The COmputational MOdeling in BIology NEtwork in 2022: Evolving needs in standardisation and model reuse
PRESENTER: David Nickerson

ABSTRACT. The “Computational Modeling in Biology” Network (https://combine-org.github.io/) [1] coordinates the development and dissemination of community standards and formats in systems biology and related fields. It ensures that previously independent standardisation initiatives develop a set of interoperable and non-overlapping standards covering all aspects of modeling in biology and medicine. The global effort is led by the COMBINE coordination board with representatives of all COMBINE standards. Building on the experience of mature projects, which already have stable specifications, software support, user-base and community governance, COMBINE helps foster and support fledgling efforts. As those efforts mature, they may become part of the core set of COMBINE standards.

Our presentation will introduce the COMBINE governance and core standards for modeling (CellML [2], NeuroML [3], SBOL [4], SBML [5]), graphical network representation (SBGN [6], SBOL Visual [7]), simulation encoding (SED-ML [8]), dissemination of modeling studies (COMBINE archive [9]), and handling of metadata (OMEX [10]). We will discuss the available resources and community support. An overview of ongoing projects to extend the COMBINE standards will be given, as well as examples of implementation. We will explain how to contribute to the standards’ development and evaluation. We show how COMBINE supports FAIR and TRUSTed efforts and infrastructures in systems medicine, thereby fostering reuse of simulation studies, reproducibility and interoperability of model-based results. We share our experiences of cooperating with model repositories such as BioModels [11] or the Physiome Model Repository [12], and with international efforts such as the COVID19 Disease Map [13]. The development of open software and libraries, standard-enabled workflows [14], and the contribution to open science efforts (EOSC, FAIRsharing [15]) are also part of our work.

[1] Waltemath D et al. https://doi.org/10.1515/jib-2020-0005

[2] Cuellar AA et al. https://doi.org/10.1177/0037549703040939

[3] Gleeson P et al. https://doi.org/10.1371/journal.pcbi.1000815

[4] McLaughlin JA et al. https://doi.org/10.3389/fbioe.2020.01009

[5] Keating SM et al. https://doi.org/10.15252/msb.20199110

[6] Beltrame L et al. https://doi.org/10.1093/bioinformatics/btr339

[7] Baig H et al. https://doi.org/10.1515/jib-2021-0013

[8] Waltemath D et al. https://doi.org/10.1186/1752-0509-5-198

[9] Bergmann FT et al. https://doi.org/10.1186/s12859-014-0369-z

[10] Neal ML et al. https://doi.org/10.1515/jib-2020-0020

[11] Malik-Sheriff RS et al. https://doi.org/10.1093/nar/gkz1055

[12] Yu T et al. https://doi.org/10.1093/bioinformatics/btq723

[13] Ostaszewski M et al. https://doi.org/10.15252/msb.202110387

[14] Myers CJ et al. https://doi.org/10.1042/BST20160347

[15] Sansone S-A et al. https://doi.org/10.1038/s41587-019-0080-8

Spatially-Coordinated Collective Protein Behaviors Enable Robust Circadian Timekeeping
PRESENTER: Seokjoo Chae

ABSTRACT. The mammalian circadian clock is based on a self-sustaining transcriptional-translational negative feedback loop. This machinery is expected to suffer from the heterogeneous distribution of clock protein arrival times at the nucleus, which results from the noisy intracellular environment. However, mammals exhibit robust daily rhythms of physiological and behavioral processes, including sleep and hormone secretion. Using a modeling approach, we explore under which conditions the circadian clock compensates for this heterogeneity. We show that spatially coordinated collective behaviors of proteins reduce the heterogeneity in the arrival time distribution at the nucleus, allowing the circadian clock to maintain its rhythm under a wide range of heterogeneity.

A workflow for the creation of regulatory networks integrating miRNAs and lncRNAs associated with exposure to ionizing radiation using open source data and tools

ABSTRACT. The importance of microRNAs (miRNAs) and long non-coding RNAs (lncRNAs) in the DNA-damage response (DDR) is well established, and several studies have already investigated changes in the expression of genes, miRNAs or lncRNAs after exposure to ionizing radiation (IR). However, their interplay following IR exposure has yet to be elucidated. Because the generation of omics data using IR-exposed cells is very expensive and time-consuming, our methodology focuses on using open source data and tools. We propose a workflow that enables the creation of regulatory networks by integrating transcriptomics data as well as regulatory data in order to better understand the interplay between genes, transcription factors (TFs), miRNAs, and lncRNAs in the cellular response to IR. We preprocessed and analyzed publicly available gene expression profiles and then applied our consensus and integration approach using open source data and tools. To exemplify the benefits of our proposed workflow, we identified a total of 32 differentially expressed transcripts corresponding to 20 unique differentially expressed genes (DEGs) and, using these DEGs, constructed a regulatory network consisting of 106 interactions and 100 nodes (11 DEGs, 78 miRNAs, 1 DEG acting as a TF, and 10 lncRNAs). Over-representation analyses (ORAs) furthermore linked our DEGs and miRNAs to annotations pertaining to the DDR and to IR. Our results show that MDM2 and E2F7 function as network hubs, and that E2F7, miR-25-3p, let-7a-5p, and miR-497-5p are the four nodes with the highest betweenness centrality. In brief, our workflow, which is based on open source data and tools and generates a regulatory network, provides novel insights into the regulatory mechanisms involving miRNAs and lncRNAs in the cellular response to IR.

Mathematical modelling of the regulatory mechanisms of keratinocyte differentiation for epidermal homeostasis

ABSTRACT. The epidermis is formed by layers of keratinocytes with increasing levels of differentiation towards the outer skin. Regulation of keratinocyte differentiation across the epidermis is crucial for the homeostasis of the skin barrier that protects our body from environmental stressors and dehydration. Keratinocytes need to be terminally differentiated when they reach the outer layer to express components that constitute the skin barrier. Timely keratinocyte differentiation across all epidermal layers requires robust regulatory mechanisms. Keratinocyte differentiation is triggered by skin barrier damage via changes in extracellular calcium at a lower epidermal layer. Inflammation and immune responses can modulate this differentiation process by interfering with the calcium-activated signalling networks. Understanding how these micro-environmental conditions shape keratinocyte differentiation is necessary to unravel how epidermal homeostasis is maintained. However, such regulatory mechanisms are still not clearly understood due to difficulties in performing quantitative experiments at the individual cell level in a stratified multi-layered tissue. Here we investigate the key regulatory mechanisms of keratinocyte differentiation through mathematical modelling. We developed a minimal mechanistic model of keratinocyte differentiation by integrating experimental results from 96 manually curated publications and then applying model reduction and global parameter optimization. The key regulatory structure of the model is characterized by positive feedback with cooperativity between Np63 and Stat3, two master regulators of keratinocyte differentiation. This control structure gives rise to a history-dependent and switch-like dose-response behaviour between extracellular calcium and the expression of terminal differentiation markers, which is consistent with the in vitro differentiation of keratinocytes observed in calcium switch experiments.
Our model analysis demonstrated that immune responses and inflammation perturb keratinocyte differentiation by shifting the thresholds for differentiation and de-differentiation towards lower calcium concentrations, suggesting that environmental aggressors increase the sensitivity of the keratinocytes to skin barrier perturbations. Understanding how skin microenvironments trigger the dynamic regulation of keratinocyte differentiation will help clarify the consequences of skin diseases such as atopic dermatitis and psoriasis for skin barrier homeostasis.
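
The history-dependent, switch-like dose response that cooperative positive feedback can produce may be illustrated with a generic one-variable toy model. This sketch is not the authors' Np63/Stat3 model: the equation dx/dt = input + β·xⁿ/(1+xⁿ) − x, the parameter values and all names are assumptions chosen only to show two distinct thresholds.

```python
def steady_state(x, calcium, beta=1.5, n=4, dt=0.05, steps=4000):
    """Integrate dx/dt = calcium + beta*x^n/(1+x^n) - x with forward Euler."""
    for _ in range(steps):
        x += dt * (calcium + beta * x**n / (1.0 + x**n) - x)
    return x

def sweep(levels, x0):
    """Follow the steady state while the input is changed step by step."""
    x, branch = x0, []
    for c in levels:
        x = steady_state(x, c)   # start from the previous steady state
        branch.append(x)
    return branch

levels = [i / 10 for i in range(7)]          # input from 0.0 to 0.6
up = sweep(levels, x0=0.0)                   # increasing input
down = sweep(levels[::-1], x0=up[-1])[::-1]  # decreasing input, re-ordered
```

Comparing `up` and `down` at the same input level shows the hysteresis: at intermediate inputs the marker stays low when the input is being raised but stays high when it is being lowered, while at low and high inputs both sweeps agree. In such a picture, a perturbation that shifts both thresholds to lower inputs would make the switch easier to trigger.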

Numerical approaches for the rapid analysis of prophylactic efficacy against HIV
PRESENTER: Lanxin Zhang

ABSTRACT. HIV remains a major public health threat. Currently, neither a cure nor an efficient vaccine is available. However, antiretroviral drugs have been used successfully to prevent HIV infection. An important method for HIV self-protection is pre-exposure prophylaxis (PrEP). To improve PrEP, many next-generation regimens, including long-acting formulations, are currently under investigation. However, the identification of parameters that determine prophylactic efficacy from clinical, ex vivo or in vitro data is extremely difficult. Clues about these parameters could prove essential for the design of next-generation PrEP compounds. Mathematical models that integrate pharmacological, viral and host factors are frequently used to complement our knowledge about the prophylactic efficacy of antiviral compounds. Stochastic simulation methods are currently the gold standard for estimating prophylactic efficacy from these models. However, many stochastic simulations need to be conducted to determine the sample statistics accurately. To remedy this shortcoming of stochastic simulation, we developed a numerical method to directly compute the efficacy of arbitrary prophylactic regimens in a single run, without the need for sampling. Based on several examples with dolutegravir (DTG)-based short- and long-term PrEP, as well as post-exposure prophylaxis, the correctness of this new method and its outstanding computational performance are demonstrated. For example, a continuous 6-month prophylactic profile is computed within a few seconds on a laptop computer. Due to the method’s computational performance, we envision that the approach can greatly expand the scope of analysis with regard to estimating prophylactic efficacy, by allowing analysis of the long-term effect of prophylaxis as well as sensitivity analysis.

Deep learning-based kcat prediction enables improved enzyme-constrained model reconstruction

ABSTRACT. Enzyme turnover numbers (kcat) are key to understanding cellular metabolism, proteome allocation and physiological diversity, but experimentally measured kcat data are sparse and noisy. Here we provide a deep learning approach (DLKcat) for high-throughput kcat prediction for metabolic enzymes from any organism merely from substrate structures and protein sequences. DLKcat can capture kcat changes for mutated enzymes and identify amino acid residues with a strong impact on kcat values. We applied this approach to predict genome-scale kcat values for more than 300 yeast species. Additionally, we designed a Bayesian pipeline to parameterize enzyme-constrained genome-scale metabolic models from predicted kcat values. The resulting models outperformed the corresponding original enzyme-constrained genome-scale metabolic models from previous pipelines in predicting phenotypes and proteomes, and enabled us to explain phenotypic differences. DLKcat and the enzyme-constrained genome-scale metabolic model construction pipeline are valuable tools to uncover global trends of enzyme kinetics and physiological diversity, and to further elucidate cellular metabolism on a large scale.

Exploring the missing heritability in SPG7 heterozygous carriers with Whole Genome Sequencing
PRESENTER: Marie Coutelier

ABSTRACT. SPG7 biallelic mutations are the most frequent cause of autosomal recessive spastic paraplegia. The associated clinical picture is either a pure spastic paraplegia, or a complex phenotype encompassing mitochondrial features, optic atrophy, and cerebellar signs. In recent years, the phenotype has been widened to include cerebellar ataxia more generally, with or without pyramidal signs, and clinical presentations associating extrapyramidal features, mimicking Parkinson’s disease or Multisystemic Atrophy of the cerebellar type in some cases.

In 731 patients with cerebellar ataxia, we sequenced known ataxia genes, either with amplicon-based panel sequencing (n=412) or whole exome sequencing (n=319). We found biallelic mutations in 23 patients (3.1%), often associating spastic or mitochondrial presentations. We also identified 19 heterozygous carriers of loss of function or previously described missense variants in SPG7, without a second mutation. Dominant transmission has been discussed in the literature. While it is suggested in some patients, a recent report described a deep intronic change responsible for an alteration of SPG7 expression, in trans with a missense mutation, advising genetic reexamination of heterozygous carriers.

We performed short-read Whole Genome Sequencing in 13 patients from 12 pedigrees, with a phenotype characteristic of SPG7-related cerebellar ataxia, associating either spasticity or parkinsonism, and an established causative but monoallelic SPG7 variant. In two patients, we identified deletions encompassing exons or enhancers that could explain the missing heritability and were missed with exome sequencing. In three other index cases, we identified conserved and rare mutations in SPG7 brain enhancers. In a family with two patients, three coding variants explain the presentation. Finally, one patient carried a CAG expansion in NOP56, detected with ExpansionHunter.

While further functional validation is required, whole genome sequencing appears to be a promising approach to reach molecular diagnosis in patients carrying heterozygous variants in recessive genes, when the clinical presumption is sufficiently high.

Dissecting the Developing Mouse Brain at Spatial Single-Cell Resolution Using PASTA-seq
PRESENTER: Leon Strenger

ABSTRACT. Spatially resolved transcriptomics allows the study of spatial heterogeneity in tissues; however, a detailed analysis of complex processes requires at least cellular resolution. We developed Patterned Array Spatial Transcriptomics Assay sequencing (PASTA-seq), a spatial transcriptomics method with sub-cellular resolution using a patterned Illumina flow cell with a distance of 0.5 μm between barcode clusters. Each barcode sequence and its associated position on the flow cell are obtained in a first sequencing run. In the second sequencing run, the PCR-amplified barcoded cDNA molecules, obtained from locally bound mRNA from a tissue section, are read together with their associated barcodes, providing spatial information for each molecule. Here, we present data from a ~6 mm² area of a mouse E13 brain section to showcase the strengths of PASTA-seq. Processing and quality control of the raw data are performed with Spacemake, a pipeline for analysing spatial transcriptomics sequencing data. We obtain a total of 62M transcripts over 10M spots, which are binned into 49,000 hexagons of the size of an average cell (~100 μm²) to perform spatial analysis with single-cell resolution. By clustering these data we can identify the different brain regions, in particular fore- and hindbrain, and corresponding marker genes. A preliminary sub-clustering analysis shows a layered structure that hints at a developmental lineage of cell types. The obtained clustering and gene expression patterns resemble those in the well-curated Allen Brain Atlas, indicating that our analysis results are valid.

Evaluation of methods to compute quasi-potential functions and their use as systems biology models
PRESENTER: Subash Balsamy

ABSTRACT. Discovering nonlinear predictive models from data without access to governing equations from first principles is at the heart of science and a central problem in systems biology. Instead of posing the model inference problem in terms of finding a large, parametrized state-variable model, we ask whether the dynamical landscape, e.g., a quasi-potential, can be computed from nonlinear models(1). Specifically, we are interested in landscape models that capture the attractors and stability properties of models of the biological system. There are several methods available in the literature for computing a quasi-potential. Here we analyze: 1) Large Deviation Theory (LDT), 2) Normal decomposition, 3) Probabilistic Landscape, 4) Symmetric-antisymmetric decomposition, 5) Lyapunov function, and 6) a data-driven neural network method. To evaluate the performance of the methods, we use two well-established computational model systems. Interestingly, all the available methods require the existence and feasibility of using a perturbation technique (LDT) to find the transition paths between different states. Decomposing the underlying force field into two parts plays a significant role in constructing quasi-potentials. One part is the pure gradient corresponding to the quasi-potential, similar to a Lyapunov function. The remainder is generally assumed to play no role in the global stability of the system but could drive the flow or transients in the system. Therefore, we analyzed to what extent the remainder part could predict the transition events in the quasi-steady state systems where the Critical Slowing Down (CSD) metrics failed. Finally, we analyze the problem of finding transition paths between the cellular states in these models. First, the different methods find similar but not identical aspects of the potential landscape due to the different assumptions associated with the techniques. Five of the six methods require knowledge of the system's equations.
This is a severe limitation to their use as systems biology models since we, as a rule, do not have access to such equations for biological systems. Interestingly, the machine learning approach requires only access to representative data, and it can work with larger dimensions beyond the usual 2-3 dimensional systems. The prospect of formulating systems biology models using efficient potential functions holds the promise of mitigating the problem of finding parameters for large state-space models involving numerous state variables. This may be particularly useful when modeling cells. Cell development can, for example, be understood to be governed by a lower-dimensional epigenetic energy landscape. Furthermore, cells can evolve as points within or near stable attractors. Thus, studying the global stability among other viable attractors gives a promising way of understanding biological development and cellular differentiation at a coarse-grained level. Yet, there is a need to develop new data-driven methods that can work with sparse data from systems with a dimensionality beyond two or three.
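
The force-field decomposition described in this abstract can be written compactly. The notation below (U for the quasi-potential, F_r for the remainder) is introduced here for illustration, and the orthogonality condition shown is the one commonly associated with the normal decomposition; the other methods listed make different choices for the remainder.

```latex
\dot{x} = F(x) = -\nabla U(x) + F_r(x), \qquad
\nabla U(x) \cdot F_r(x) = 0
\;\Longrightarrow\;
\frac{dU}{dt} = \nabla U \cdot \dot{x} = -\lVert \nabla U \rVert^{2} \le 0 .
```

Under this assumption U never increases along a trajectory, which is why it behaves like a Lyapunov function, while F_r can still move the state along level sets of U and so shape flows and transients.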


Summary: This session covers the exciting recent advances that are emerging at the intersection of machine learning and systems biology. One example is the formulation of predictive machine learning models in which model structure and/or selection of parameters can be effectively guided by reference maps of biological structures and their functional state transitions. Such calibration is a critical component of formulating biological models, including models of molecular structures, pathways, cells, tissues, and human populations. Calibration requires not only in-depth understanding of the applied model and phenomena but also application of proper optimization algorithms, where the long-term goal is to find avenues for incorporating and applying methods of machine learning and artificial intelligence. 

Location: Grenander I+II
Hypergraph for predicting adverse drug reactions

ABSTRACT. Drug-drug interactions (DDIs), i.e. adverse drug reactions (ADRs) caused by drug combinations, are a serious problem in pharmaceutical and medical sciences. Computational prediction of DDIs now attracts considerable attention not only in bio- and chemoinformatics but also in machine learning. Existing methods represent DDIs as a graph in which nodes are drugs and each edge is a DDI between the two drugs it connects, labeled by a binary vector indicating the DDI types. The cutting-edge approach for learning the DDI graph is graph neural networks (GNNs), in which the multiple labels on an edge are treated largely independently, even though relationships among labels are likely to matter for prediction, particularly for minor labels. We therefore model DDIs as a hypergraph in which each hyperedge is a triple of two drugs and one DDI type. We then build learning methods for hypergraph neural networks that address the above limitation of GNNs. In this talk, I will describe the motivation and idea behind our hypergraph neural networks and optimization methods. I will further report the performance advantage of our hypergraph neural networks over existing methods, obtained empirically on benchmark datasets.
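
To make the change of representation concrete, here is a minimal sketch of converting a label-vector graph into hyperedge triples. The drug names, DDI type names, and helper functions are hypothetical and chosen only for illustration; this is not the presenter's data format or code.

```python
from collections import defaultdict

# Hypothetical toy data: each drug pair carries a binary vector over DDI types.
DDI_TYPES = ["nausea", "bleeding", "hypotension"]  # illustrative labels only
edge_labels = {
    ("drugA", "drugB"): [1, 0, 1],
    ("drugA", "drugC"): [0, 1, 0],
    ("drugB", "drugC"): [0, 0, 1],
}


def to_hyperedges(edge_labels, ddi_types):
    """Emit one (drug, drug, DDI type) triple per active label on each edge."""
    hyperedges = []
    for (u, v), labels in edge_labels.items():
        for ddi_type, flag in zip(ddi_types, labels):
            if flag:
                hyperedges.append((u, v, ddi_type))
    return hyperedges


def incidence(hyperedges):
    """Map every node (drug or DDI type) to the indices of its hyperedges."""
    index = defaultdict(list)
    for i, edge in enumerate(hyperedges):
        for node in edge:
            index[node].append(i)
    return dict(index)


hyperedges = to_hyperedges(edge_labels, DDI_TYPES)
node_to_edges = incidence(hyperedges)
```

In this form a DDI type is itself a node, so two hyperedges that share a type are directly connected through it, which is one way relationships among labels can enter the model.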

Building a Mind for Cancer

ABSTRACT. Most drugs entering clinical trials fail, often related to an incomplete understanding of the mechanisms governing drug response. Machine learning techniques hold immense promise for better drug response predictions, but most have not reached clinical practice due to their lack of interpretability and their focus on monotherapies. To address these challenges I will describe development of DrugCell, an interpretable deep learning model of human cancer cells trained on the responses of thousands of tumor cell lines to thousands of approved or exploratory therapeutic agents. The structure of the model is built from a knowledgebase of molecular pathways important for cancer, which can be drawn from literature or formulated directly from integration of data from genomics, proteomics and imaging. Based on this structure, alterations to the tumor genome induce states on specific pathways, which combine with drug structure to yield a predicted response to therapy. The key pathways in capturing a drug response lead directly to design of synergistic drug combinations, which we validate systematically by combinatorial CRISPR, drug-drug screening in vitro, and patient-derived xenografts. We also explore a recently developed technique, few-shot machine learning, for training versatile neural network models in cell lines that can be tuned to new contexts using few additional samples. The models quickly adapt when switching among different tissue types and in moving to clinical contexts, including patient-derived xenografts and clinical samples. These results begin to outline a blueprint for constructing interpretable AI systems for predictive medicine.

Prediction of enzyme kinetic parameters and substrate scopes using artificial intelligence
PRESENTER: Alexander Kroll

ABSTRACT. The Michaelis constant KM and the enzyme turnover number kcat are crucial parameters when studying enzyme kinetics and cellular physiology. The function of all enzyme-encoding genes as well as kinetic parameters for enzymatic reactions are required for genome-scale metabolic models that account for cellular resource allocation. We developed general prediction models for KM, kcat, and for enzyme-substrate pairs, which can be applied to any enzyme with known protein sequence. The machine learning models for KM and kcat achieve coefficients of determination of R²=0.53 and R²=0.42, respectively, on independent test sets. Our binary prediction model that predicts whether a metabolite is a substrate for a given enzyme achieves an accuracy over 90% on an independent test set. In part, this accuracy was achieved by representing enzymes through a modified transformer model with a trained, task-specific token, and by representing small molecules with graph neural networks. Our methods outperform previous approaches developed for the tasks of predicting the substrate scope of enzymes and of predicting Michaelis constants KM, which are moreover limited to small groups of enzymes and require dense training data sets. Our prediction of the turnover number kcat performs similarly to a previous study, which was however limited to enzymatic reactions from Escherichia coli and which requires much more detailed input data, such as metabolic flux estimates and active site information, which are not available for the vast majority of enzymes. To allow easy use of our trained models, we implemented Python functions and web servers.

DeepCellRegMap: An interpretable deep learning framework for mapping genetic effects at cell subtype resolution using large-scale single-cell sequencing data
PRESENTER: Danai Vagiaki

ABSTRACT. Expression quantitative trait loci (eQTL) studies aim to associate genetic variants with changes in gene expression patterns. Recent eQTL studies utilise single-cell RNA sequencing (scRNA-seq) data to capture genetic effects at cellular resolution [Cuomo 2020, Jerber 2021, Yazar 2022]. Nevertheless, those studies require a priori defined cell types, which limits their ability to capture genetic effects on more granular and/or continuous cell-states. Moreover, they rely on linear (mixed) models, thus they are afflicted by the burden of multiple testing and cannot assess jointly the effect of variants both closely (cis-eQTLs) and distantly (trans-eQTLs) located to genes. Furthermore, these methods do not incorporate sequence-driven epigenomic variation, e.g. from chromatin accessibility data, which can provide mechanistic regulatory links for inferred genetic effects. To tackle the aforementioned issues, we developed DeepCellRegMap, an interpretable deep learning framework that allows fast, large-scale mapping of genetic effects on both discrete and continuous (sub-)cellular phenotypes. DeepCellRegMap builds on and extends CellRegMap [Cuomo 2022], a context-specific eQTL mapping framework based on linear mixed models. DeepCellRegMap uses an autoencoder based model with linear decoder, which permits end to end integration of genetic and single-cell transcriptomic data from multiple individuals, experiments and cell-types. The model explicitly decomposes the variation in observed scRNA-seq data into canonical gene expression patterns and context-specific genetic effects, while accounting for batch effects and non-genetic sources of inter-individual variability. Optionally, the model can also accept single-cell chromatin accessibility data as input for a more robust definition of the cell-state and the simultaneous quantification of genetic effects on chromatin state using sequence based deep-learning. 
We apply the model on two large previously published datasets, each comprising more than one million cells from hundreds of different donors [Jerber 2021, Yazar 2022] and identify both cis and cis-mediated trans genetic effects. In addition, we show how genetic variants affect expression in a context-specific manner by disrupting and/or altering the accessibility of putative regulatory regions.

References
Cuomo ASE, Seaton DD, McCarthy DJ, et al. (2020) Single-cell RNA-sequencing of differentiating iPS cells reveals dynamic genetic effects on gene expression. Nature Communications, 11(1):1572
Jerber J, Seaton DD, Cuomo ASE, et al. (2021) Population-scale single-cell RNA-seq profiling across dopaminergic neuron differentiation. Nature Genetics, 53(3):304–12
Yazar S, Alquicira-Hernandez J, et al. (2022) Single-cell eQTL mapping identifies cell type–specific genetic control of autoimmune disease. Science, 376
Cuomo ASE, Heinen T, et al. (2022) CellRegMap: A statistical framework for mapping context-specific regulatory variants using scRNA-seq. Molecular Systems Biology

A novel drug response network analysis on supercomputers uncovered gastric cancer drug sensitivity mechanism
PRESENTER: Heewon Park

ABSTRACT. Understanding drug sensitivity and identifying related markers are critical tasks in precision medicine. Drug sensitivity prediction has drawn considerable attention as a way to understand the mechanisms underlying the drug sensitivity of cancer cell lines, because cancer-related mechanisms are caused by disturbances in complex gene regulatory systems. However, relatively little attention has been paid to gene network-based prediction. Furthermore, existing network-based analyses rely on pre-constructed gene networks and therefore cannot extract gene networks specific to the status of a cell line, which makes it difficult to understand drug sensitivity mechanisms and to identify markers. We propose a novel computational methodology, called PredictiveNetwork, for estimating gene networks that are predictive of a clinical characteristic (e.g., drug sensitivity of cell lines). The objective function of PredictiveNetwork consists of loss functions for gene network estimation and for prediction, so that the gene network is estimated and the clinical characteristic is predicted simultaneously. The network is thus optimized not only for network estimation but also to explain the clinical characteristic, which allows us to identify molecular interplays specific to predicting that characteristic. We extend PredictiveNetwork to network-based classification and develop a gene regulatory network-based classifier (GRN-classifier) that, in line with PredictiveNetwork, estimates the gene network so as to minimize errors in both network estimation and classification of cell lines. The proposed strategies require a huge amount of computation for parameter tuning and thus the use of supercomputers (e.g., Fugaku: No. 1 in the TOP500; No. 2 as of June 2022). We apply the proposed strategies to the estimation of response-predictive networks for gastric cancer drugs and to the identification of related markers, focusing in particular on molecular interplays involved in drug resistance. 
PredictiveNetwork is applied to estimate response-predictive networks for gastric cancer drugs (doxorubicin, mitomycin-c, 5-fluorouracil (5-FU), and docetaxel), and GRN-classifier is applied to classify 5-FU-sensitive/resistant and 5-FU target/non-target cell lines. Our results suggest that an active regulatory system among AKR family members (e.g., AKR1C1 and AKR1C3) is a crucial clue to the mechanism of acquired drug resistance in gastric cancer.

LipiDetective: a Deep Learning Model for the Detection of Lipid Species in Mass Spectra
PRESENTER: Vivian Würf

ABSTRACT. Technical advancements in the field of mass spectrometry now enable a higher resolution of the lipid species contained within a sample than ever before. However, most labs still use mainly in-house solutions for the identification of lipids from mass spectra leading to low reproducibility and comparability between labs. It is imperative that the lipid identification in a lipidomics dataset is as precise as possible to ensure that correct conclusions will be drawn. An accurate identification of molecular lipid species from mass spectra via artificial neural networks could help to accelerate the identification process and harmonize inter-lab results.

The aim of this pilot study was to test the ability of artificial neural networks to identify lipids based on their characteristic fragmentation patterns in mass spectra. To explore the feasibility of this approach, a pilot model called LipiDetective was implemented and trained on a reference dataset containing MS2 spectra of 57 different phospholipid standards. Since fragmentation patterns are influenced by collision energy, the standards were measured at various collision energies from 10.0 eV up to 50.0 eV. In the framework of this project the performance of feedforward, convolutional, LSTM, BERT, and transformer networks was evaluated. To test the quality of the predictions, two methods of cross-validation were implemented. One splits the data into training and test sets by leaving out whole lipid species and the other by leaving out spectra acquired at certain collision energies. A comparison of the different models showed that the tri-transformer reached the highest accuracy. This model is an adjusted version of a regular transformer in which the problem of identifying glycerophospholipids was split into three subproblems by introducing a separate model for the independent prediction of the headgroup and two side chains. The tri-transformer achieved an accuracy of 94% for cross-validation via collision energy and 42% for cross-validation via lipid species when solving the problem in the form of a regression task. By reformulating the problem as a traditional classification task this was further improved to 98% and 53% accuracy for validation via collision energy and lipid species, respectively. As the goal is to differentiate between 57 lipid species, the probability of guessing the correct lipid is around 1.75%. This means the tri-transformer performs considerably better than random chance. 
Additionally, visualizing the attention weights for each layer also showed that the model seems to pay high attention to human interpretable peaks that can be directly correlated to specific lipid fragments.

In general, a higher resolution is still necessary to consistently distinguish fatty acids that differ by a single double bond. Nevertheless, the tested deep learning models, especially the tri-transformer, show great potential for their application to the problem of identifying lipid species from mass spectra.
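
The chance baseline quoted in this abstract follows from uniform guessing over 57 classes. The short check below reproduces that arithmetic and expresses the reported lipid-species accuracies as multiples of chance; the variable names are illustrative, and uniform guessing is an assumption about what "random chance" means here.

```python
# Uniform-guess baseline for a 57-class problem (assumption: all classes equally likely).
n_species = 57
chance = 1 / n_species

# Accuracies reported in the abstract for cross-validation via lipid species.
reported = {"regression": 0.42, "classification": 0.53}

# How many times better than chance each reported accuracy is.
fold_over_chance = {task: acc / chance for task, acc in reported.items()}

print(f"chance = {chance:.2%}")
for task, fold in fold_over_chance.items():
    print(f"{task}: {fold:.1f}x chance")
```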

NETISCE: A dynamical systems and control theory approach to cell reprogramming

ABSTRACT. The search for effective therapeutic targets in fields like regenerative medicine and cancer research has generated interest in cell fate reprogramming. This cellular reprogramming paradigm can drive cells to the desired target state from any initial state. However, methods for identifying reprogramming targets remain limited for biological systems that lack large sets of experimental data or a dynamical characterization. We present NETISCE, a novel computational tool for identifying cell fate reprogramming targets in static networks. NETISCE identifies reprogramming targets through the innovative use of control theory within a dynamical systems framework. In combination with machine learning algorithms, NETISCE estimates the attractor landscape and predicts reprogramming targets using signal flow analysis and feedback vertex set control, respectively. Through validations in studies of cell fate reprogramming from developmental, stem cell, and cancer biology, we show that NETISCE can predict previously identified cell fate reprogramming targets and identify potentially novel combinations of targets. NETISCE extends cell fate reprogramming studies to larger-scale biological networks without the need for full model parameterization and can be implemented by experimental and computational biologists to identify parts of a biological system that are relevant for the desired reprogramming task.

Fast and scalable machine learning approach for dynamic metabolic engineering

ABSTRACT. Metabolic engineering aims to produce chemicals in genetically modified microorganisms. Whereas traditional methods use constitutive or inducible promoters to express pathway enzymes, dynamic control methods aim to build regulatory circuits that respond to changes in cellular conditions. The implementation of these systems is costly and requires many trial-and-error iterations between system design and prototyping. At their core, these systems employ pathway intermediates to sense and actuate enzyme expression over the course of the culture. The key design challenge is to determine circuit architectures, i.e. which metabolite to sense and which enzymes to control, as well as the dose-response curves of the metabolite biosensors, such that yield improves with only a moderate burden on the production host. Computational simulation using differential equation models provides a low-cost option for circuit design; however, existing methods cannot simultaneously optimize both architectures and continuous biosensor parameters.

Here we present a machine learning approach to rapidly explore and identify optimal gene circuits for metabolic control. We employ a Bayesian optimization framework, which is commonly employed for tuning deep learning algorithms, to find circuit architectures and biosensors that optimize relevant design objectives. We test our method on four different models of dynamically engineered pathways, including models of allosteric and reversible reactions, systems with multiple regulatory loci, and examples of nested metabolic and genetic control. We illustrate the efficiency and scalability of this method by applying it to study robustness to growth conditions, parameter sloppiness, kinetic perturbations, and chemical toxicity factors. This method can serve as a fast screening method for dynamic control architectures prior to experimental testing.

Efficient brute-force model selection by iterative elimination of less useful model subspaces
PRESENTER: Dilan Pathirana

ABSTRACT. Model selection is a common task in systems biology, and more broadly statistical inference, wherein different models are compared to find the most useful model. This can often involve a single superset model, from which the model space is generated by disabling model components. The model space grows exponentially with the number of model components; hence, a brute-force strategy is often computationally infeasible. In such model spaces, the likelihood of any model is an upper bound on the likelihood of all subset models.

We will present algorithms that use this bound to efficiently eliminate less useful model subspaces, to find the most useful model in a model space more efficiently. The algorithms are suitable for use with model selection criteria, with specific formulations provided for the Akaike information criterion (AIC), the corrected AIC (AICc), and the Bayesian information criterion (BIC). We will also present application examples in systems biology.
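
The bound stated in this abstract lends itself to a branch-and-bound search. The sketch below is one possible reading, not the presenters' algorithm: `log_likelihood` is a hypothetical stand-in for a real maximum-likelihood fit (additive, so that removing a component never raises it), and `GAINS`, `BASE`, and the function names are invented for illustration. Using AIC = 2k - 2 ln L, every model reachable from a node has at least the node's fixed components and at most the node's likelihood, which gives a lower bound on its AIC.

```python
from itertools import combinations

GAINS = [5.0, 0.2, 3.0, 0.1]  # hypothetical log-likelihood gain per component
BASE = -20.0                  # hypothetical log-likelihood of the empty model


def log_likelihood(active):
    """Stand-in for a fitted model; a subset never scores higher than its superset."""
    return BASE + sum(GAINS[i] for i in active)


def aic(active):
    return 2 * len(active) - 2 * log_likelihood(active)


def brute_force(n):
    """Evaluate all 2**n models and return the best one with the evaluation count."""
    models = [frozenset(c) for k in range(n + 1) for c in combinations(range(n), k)]
    return min(models, key=aic), len(models)


def pruned_search(n):
    """Depth-first search from the superset model, skipping hopeless subspaces."""
    best = {"model": None, "aic": float("inf")}
    evaluated = 0

    def visit(active, first_removable):
        nonlocal evaluated
        evaluated += 1
        score = aic(active)
        if score < best["aic"]:
            best["model"], best["aic"] = active, score
        # Components below first_removable are kept in every descendant model.
        n_fixed = sum(1 for i in active if i < first_removable)
        lower_bound = 2 * n_fixed - 2 * log_likelihood(active)
        if lower_bound >= best["aic"]:
            return  # no descendant can improve on the current best
        for j in sorted(i for i in active if i >= first_removable):
            visit(active - {j}, j + 1)

    visit(frozenset(range(n)), 0)
    return best["model"], best["aic"], evaluated


best_model, best_aic, n_evaluated = pruned_search(len(GAINS))
reference_model, n_total = brute_force(len(GAINS))
```

Removing components in increasing index order ensures each submodel is generated at most once, so pruning a node discards its whole subspace without any duplicate work elsewhere. AICc and BIC would need their own penalty terms in the bound.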

The insidious trappings of gene set enrichments

ABSTRACT. Gene set enrichments remain one of the main tools linking statistical results from high-throughput techniques with biological interpretation. In short, they rely on categorizing genes into a number of gene sets and using an appropriate statistical test to examine a given gene set as a whole. For example, we may ask whether interferon-stimulated genes (ISG) are more likely to be differentially expressed between patients and healthy controls.

However, the apparent simplicity of gene set enrichments is misleading. Recently, we have shown that a widespread analysis is incorrect: when two conditions (e.g. patients and healthy controls) are compared independently in two groups of patients, incorrect use of gene set enrichments may lead to false positives. Moreover, we show that such positives are related to the differences between conditions, but not groups, and thus they seem to be "reasonable" in the given context. For example, we might come to the conclusion that ISG are stimulated more strongly by one particular strain of the virus, whereas in reality there is no statistical difference between the groups.

A second common issue is the widespread use of randomization-based tests for gene set enrichments. There are two main approaches for using a randomization test in this context: to estimate the null distribution it is possible to randomize either the samples or the genes. Randomizing samples is effective and correct, but requires a sufficient number of samples. We show that the popular alternative, randomizing genes rather than samples, leads to false positives and spurious results.
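
As an illustration of the sample-randomization variant described above, here is a minimal sketch of a permutation test that shuffles sample labels. The expression values, gene names, and the set-level statistic (mean difference in group means) are assumptions made for this example; real enrichment tools use other statistics.

```python
import random

# Hypothetical expression matrix: gene -> values for 8 samples.
expr = {
    "g1": [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1],
    "g2": [2.0, 2.1, 1.9, 2.2, 6.0, 6.1, 5.9, 6.2],
    "g3": [3.0, 2.9, 3.1, 3.0, 3.0, 3.1, 2.9, 3.0],
}
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = control, 1 = patient


def set_score(expr, labels, gene_set):
    """Mean over the gene set of (group-1 mean minus group-0 mean)."""
    g1 = [j for j, lab in enumerate(labels) if lab == 1]
    g0 = [j for j, lab in enumerate(labels) if lab == 0]
    diffs = []
    for gene in gene_set:
        row = expr[gene]
        mean1 = sum(row[j] for j in g1) / len(g1)
        mean0 = sum(row[j] for j in g0) / len(g0)
        diffs.append(mean1 - mean0)
    return sum(diffs) / len(diffs)


def sample_permutation_pvalue(expr, labels, gene_set, n_perm=2000, seed=0):
    """Two-sided p-value from shuffling sample labels; genes are never shuffled."""
    rng = random.Random(seed)
    observed = abs(set_score(expr, labels, gene_set))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(set_score(expr, shuffled, gene_set)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


p_shifted = sample_permutation_pvalue(expr, labels, ["g1", "g2"])
```

Shuffling labels keeps each sample's genes together, so gene-gene correlation is preserved under the null. With only eight samples there are just 70 distinct labelings, which illustrates why this approach needs enough samples to reach small p-values.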

Inference of differential gene regulatory networks from gene expression data using boosted differential trees
PRESENTER: Gihanna Galindez

ABSTRACT. Diseases can be caused by molecular perturbations that induce specific changes in regulatory interactions and their coordinated expression, also referred to as network rewiring. However, the detection of complex changes in regulatory connections remains a challenging task and would benefit from the development of novel non-parametric approaches. We developed a new ensemble method called BoostDiff (boosted differential regression trees) to infer a differential network discriminating between two conditions. BoostDiff builds an adaptively boosted (AdaBoost) ensemble of differential trees with respect to a target condition. To build the differential trees, we propose differential variance improvement as a novel splitting criterion. Variable importance measures derived from the resulting models are used to reflect changes in gene expression predictability and to build the output differential networks. We first applied BoostDiff on simulated data in comparison to existing differential network methods. We then demonstrate the power of our approach when applied to real transcriptomics data in COVID-19 and Crohn’s disease. BoostDiff identifies context-specific networks that are enriched with genes of known disease-relevant pathways and complements standard differential expression analyses. BoostDiff is available at https://github.com/gihannagalindez/boostdiff_inference.

Metabolic profile predictions using efficient and interpretable data descriptors generated with relational learning

ABSTRACT. Metabolic profiles are arguably the type of biological data that most closely represents the functional readout of the physiological state of an organism, and thus, increased understanding of what controls and defines the accumulated abundances of these biochemicals is of high scientific interest. While the yeast Saccharomyces cerevisiae is an extremely well-studied model organism, the amount of high-quality data available on its metabolome is still lacking.

One of the keys to success in applying machine learning to scientific research tasks is the use of meaningful data representations. While popular methods such as deep neural networks (DNNs) are very successful in extracting rich internal representations from seemingly simple inputs, they have poor interpretability and explainability, which are of the utmost importance in systems biology. More explainable models improve the understanding of the implications of systems biology models, and enable models to be rationally improved.

There is a wide range of available knowledge on yeast physiology contained in databases such as the Saccharomyces Genome Database, and in highly curated genome-scale metabolic models such as Yeast8. Being the product of decades' worth of experiments across multiple modalities, these are rich in information and adhere to semantically meaningful ontologies. By representing this prior knowledge in a richly expressive Datalog database, we use relational learning to generate data descriptors that make more efficient use of existing propositional data and improve both model predictions and their interpretability.

Construction and Decomposition of Cellular Energy Landscapes using Hopfield Neural Networks

ABSTRACT. The dynamics of cellular circuits govern biology, from developmental processes to cellular reprogramming. Ever since Waddington’s conception, these circuits of genes have been thought to control and remodel an effective “energy” landscape controlling biological programs. Here we first ask how to construct such energy landscape models from systems that are inherently non-linear and non-symmetric in their interactions. Recent work on low-dimensional polynomial non-linear systems has demonstrated the feasibility of constructing a more general potential for non-gradient systems (Stumpf 2018). However, these advances require knowledge of explicit generative polynomial dynamical equations. Furthermore, since biological systems are not only non-linear but also non-symmetric in their interactions, a non-gradient part exists in their corresponding energy landscapes. Recent efforts, using different assumptions, have attempted to separate the gradient part from the flux component. We propose two novel ways of doing such a decomposition of the energy landscape. We use a continuous Hopfield Neural Network model supplemented with Hill-type sigmoid kinetics. This is sufficient to reconstruct the energy landscape and the time-dependent dynamics of the system in such a landscape. Here we use a genetic switch (a two-gene inhibitory circuit model) and a genetic oscillator (a three-gene circuit model). Our reconstructed landscape model captures the dynamics of these models. Next, using the partial derivatives of the derived energy function, we show two different decompositions of the energy landscape. First, we disentangle a pure (symmetric) gradient dynamics component where the remainder corresponds to a flux (curl) component. Our curl component captures the dynamical contribution of the asymmetric interactions in a biological system. 
Next, we demonstrate an orthogonal-residual decomposition of the energy landscape by taking advantage of the gradient of the energy function. Interestingly, the symmetric and the orthogonal parts of the respective decompositions are not identical. We investigate what these two decompositions correspond to in the case of the two investigated model systems. We compare this decomposition with the work of Jin Wang, which uses a probabilistic Fokker-Planck formulation as a basis for decomposition into a gradient and a flux part. In contrast to Wang’s formulation, our model requires neither the numerical solution of the Fokker-Planck equation for low-dimensional systems nor stochastic simulation to infer the parameters of the probability distribution under independence assumptions for high-dimensional systems. Our work provides technical insight into reconstructing Waddington landscapes and into how different decompositions and contributions of the landscape correspond to the interactions between the elements of a biological system. Next, we plan to study the use of simple optimization techniques to infer the Hopfield network of our system from single-cell RNA sequencing data. This could be useful for cellular differentiation, cellular development, and cellular reprogramming studies.
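
The idea of separating a symmetric gradient part from an asymmetric flux part can be illustrated on a linear toy system dx/dt = W x. This is a deliberately simplified sketch, not the Hopfield construction with Hill-type kinetics described in the abstract; the matrix `W`, the state `x`, and the helper names are hypothetical.

```python
def decompose(W):
    """Split W into symmetric S and antisymmetric A with W = S + A."""
    n = len(W)
    S = [[(W[i][j] + W[j][i]) / 2 for j in range(n)] for i in range(n)]
    A = [[(W[i][j] - W[j][i]) / 2 for j in range(n)] for i in range(n)]
    return S, A


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def potential(S, x):
    """U(x) = -1/2 x^T S x, so that -grad U(x) = S x."""
    return -0.5 * dot(x, matvec(S, x))


# Hypothetical asymmetric interaction matrix for a two-gene circuit.
W = [[-1.0, -2.0], [0.5, -1.0]]
S, A = decompose(W)

x = [0.3, -0.7]
gradient_part = matvec(S, x)    # equals -grad U at x
rotational_part = matvec(A, x)  # flux-like part; satisfies x . (A x) = 0
```

In this linear case the antisymmetric part only rotates the state, since x · (A x) = 0 for any x. For nonlinear systems the split is not unique, consistent with the abstract's observation that its two decompositions differ.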

A training strategy for hybrid models to break the curse of dimensionality: An application in mortality estimation for cohorts of COVID-19 patients
PRESENTER: Moein E. Samadi

ABSTRACT. A hybrid mechanistic/data-driven model combines mechanistic or physics-based equations that describe available process knowledge with data-driven approaches such as Machine Learning. In comparison to sole Machine Learning models, hybrid models promise a low demand for training data alongside the ability to extrapolate beyond the validation data domain.

In this work, we introduce a supervised learning strategy for tree-structured hybrid models to perform a binary classification task. Given a set of binary labeled data, the challenge is to use them to develop a model that accurately assesses labels of new unlabeled data. Our strategy employs graph-theoretic methods to analyze the data and deduce a function that maps input features to output labels.

Our focus here is on data sets represented by binary features in which the label assessment of unlabeled data points is always extrapolation. Our strategy shows the existence of small sets of data points within given binary data for which knowing the labels allows for extrapolation to the entire valid input space. An implementation of our strategy yields a notable reduction of training-data demand in a binary classification task compared with different supervised machine learning algorithms.

As an application, we have fitted a tree-structured hybrid model to the vital status of a cohort of COVID-19 patients requiring intensive-care unit treatment and mechanical ventilation. Our learning strategy identifies patient cohorts for whom knowing the vital status enables extrapolation to the entire valid input space of the developed hybrid model.

Predicting developmental states in zebrafish using transfer learning

ABSTRACT. Understanding how cells make decisions and change over time is an important question in developmental biology. Recent advances in single-cell technologies allow for the thorough and unbiased characterization of molecular states across developmental stages. Yet, these techniques can only provide static snapshots of the cellular dynamics, revealing a ‘cell state’ in gene expression space. Many computational methods for trajectory inference have been developed to construct a pseudo-temporal ordering of the cells according to their transcriptomic profiles. But these approaches are descriptive in nature and unable to produce in-sample or out-of-sample predictions. Recently, generative models have shown great success in out-of-sample predictions, but remain limited to perturbation response and batch removal. To address these limitations, we present Dcp (deep cell predictor), a transfer learning approach based on a variational autoencoder and normalizing flows over single-cell transcriptomic data. Dcp models cell transitions in distinct lineages during early zebrafish development. We show that the model accurately predicts gene expression changes across developmental stages. We apply Dcp to embryonic development and to an adult stem cell system. Further, we demonstrate that the predictability of cell states depends upon shared information between lineages in a biological system.

New data-driven gene representations using deep autoencoders at multiple omics to identify candidate disease genes and robust classifiers

ABSTRACT. Traditional knowledge-driven approaches for biomarker discovery within the field of systems medicine, by us and others, have often utilized colocalization of disease genes in disease modules. The results, however, strongly rely on the quality of the available molecular interaction networks, which are known to be partially incomplete and affected by research biases. In this context, novel data-driven methodologies centred around deep artificial neural networks (DNNs) have begun to consolidate. Autoencoders (AEs) are a type of unsupervised DNN that reconstructs its input in its output, after reducing its dimensionality. Here, we hypothesized that the emergent encoding of AEs trained on huge transcriptomic, methylomic and genomic repositories, including hundreds of thousands of samples, could encompass complex non-linear relationships of biological relevance. To constrain the hyperparameters of the AEs, we also tested for co-localization patterns in protein-protein networks to ensure that our selected representation prioritized genes within functional modules. Next, we used the AEs to discover candidate disease genes and applied transfer learning, using the latent variables as robust features for machine learning tasks in each of the omics.


Summary: Network biology is a fundamental branch of systems biology, which views, represents, and analyzes biological processes as networks of interacting components. Examples of these networks are protein-protein interaction networks, metabolic networks and gene regulatory networks. In this session, we will cover innovative large-scale network biology approaches involving single- and multi-omics technologies that reveal novel interactions and regulatory mechanisms that control the phenotypes of normal and diseased cells. We will showcase how applications of this concept in the field of precision medicine can be used to guide personalized therapeutic approaches.

Location: Alexander
Proteomes in 3D: Structural proteome snapshots as a new functional ‘omics readout

ABSTRACT. Proteomics has been broadly applied to detect changes in protein levels in response to perturbations and derive information on altered pathways. Beyond protein expression changes, however, biological processes are also regulated by events such as intermolecular interactions, protein aggregation, chemical modification and protein conformational changes. These events do not affect protein levels and therefore escape detection in classical proteomic screens. I will present how a global, in situ analysis of protein structures can detect various types of protein functional alterations concomitantly. The approach, relying on the LiP-MS technique, monitors structural changes in thousands of proteins simultaneously in situ and across multiple conditions. Such a readout concomitantly captures enzyme activity changes, allosteric regulation, phosphorylation and protein complex formation and pinpoints regulated functional sites, thus substantially expanding the coverage of functional ‘omics screens and supporting the generation of mechanistic hypotheses. I will present applications of this approach to the global identification of altered protein-protein interactions and to the detection of novel regulatory events in biochemical pathways.

Cancer cell fate decisions

ABSTRACT. Our lab is focused on understanding cancer cell fate decisions in response to drug treatment. I will present two recent studies combining single-cell analysis and mathematical modeling to understand mechanisms and consequences of these decisions. First, we investigated why some cancer cells choose to remain proliferative while others become senescent in non-lethal doses of chemotherapy. We analyzed cell signaling dynamics before, during, and days after drug treatment and identified a counter-intuitive role for early p21 dynamics in the proliferation/senescence cell-fate decision. Second, we investigated how some cancers can make use of dynamic state switching as a mechanism to increase population fitness. A persistent puzzle is why certain mutations, with no known function, are commonly found. In the context of AML, we found that loss of TET2 function alters the dynamics of transitions between differentiated and stem-like states. A conceptual mathematical model and experimental validation suggest that these altered cell-state dynamics benefit the cell population by slowing population decay during drug treatment and lowering the number of survivor cells needed to re-establish the initial population. These studies illustrate how cancer evolution can select for mutations that alter a cell’s phenotypic plasticity.

Disentangling Proliferation Signaling in Breast Cancer Subtypes based on an integrative modeling approach
PRESENTER: Svenja Kemmer

ABSTRACT. Breast cancer subtypes are characterized by the expression and activity of estrogen-, progesterone- and HER2-receptors and differ by their treatment as well as patient prognosis. Tumors of the HER2-subtype overexpress this receptor and are successfully targeted with anti-HER2 therapies. We wanted to know whether the HER2-receptor and the downstream signaling network act similarly also in the other subtypes and whether this network could potentially be a therapeutic target beyond the HER2-positive subtype. To this end, we quantitatively assessed the wiring of signaling events in the individual subtypes to unravel the characteristics of HER-signaling. Using an integrative modeling approach that combines mechanistic ODE modeling with regression modeling, we gained insights into the response mechanisms of breast cancer cells to different pharmacologic perturbations. Our multi-pathway model, capturing ERBB receptor signaling as well as downstream MAPK and PI3K pathways, was calibrated on time-resolved data of the luminal breast cancer cell lines MCF7 and T47D across an array of four growth factors and five therapeutic drugs. The same model was then successfully extended to triple negative and HER2-positive breast cancer cell lines, requiring adjustments mostly for the respective receptor compositions within these cell lines. The additional relevance of cell-line-specific mutations in the MAPK and PI3K pathway components was identified via L1 regularization. Based on this unified mathematical model, we effectively predicted the proliferation response of the cancer cells to combination drug treatments, validated by a proliferation array. Our model thus enabled us to predict the sensitivity of individual tumors to specific drugs only based on their receptor composition and occurring mutations. Finally, this study suggests that alterations in this network could render anti-HER therapies relevant beyond the HER2-positive subtype.

Combinatorial effects of ligands are predicted by a genome-scale model of signaling based on artificial neural networks
PRESENTER: Avlant Nilsson

ABSTRACT. Immune cells mount an appropriate response to threats by integrating signals from numerous ligands that bind their receptors. This depends on a network of thousands of signaling proteins that activate transcription factors (TFs), which trigger different gene programs. Simulations of this flow of information could help predict transcriptional phenotypes and the effects of mutations and drugs. However, it has been challenging to parametrize systems-wide models using traditional methods.

To address this, we developed LEMBAS (Large-scale knowledge EMBedded Artificial Signaling networks). It represents these processes as a recurrent neural network with signaling molecules as hidden nodes and established protein-protein interactions as weights. Applied to synthetic data of ligand-stimulated cells, LEMBAS rapidly parameterizes models that predict unseen test data (Pearson correlation r=0.98) and the effects of knocking out signaling nodes (r=0.8).

To test LEMBAS performance on data from human macrophages, we measured their transcriptional response to more than 350 unique combinations of 20 ligands, up to 5 at a time, that were chosen by an algorithm to maximize information content. Models trained on these data attained a good fit (r=0.8) that generalized well to unseen data (r=0.73 under cross validation). We systematically probed the model for interaction effects among all combinations of three ligands. To give an example, the model predicts a strong response to the combination of interferon gamma (IFNg) and a synthetic Toll-like receptor 2 (TLR2) agonist (PAM3CYK4) for expression of tumor necrosis factor (TNF) that is selectively suppressed by interleukin 10 (IL10). We extract the predicted causative signaling cascades using a combination of sensitivity analysis and simulated perturbations; for the previous example, the model predicts a role for Ras-related C3 botulinum toxin substrate 1 (RAC1) and c-Jun N-terminal kinases (JNKs). This work demonstrates the feasibility and utility of genome-scale simulations of intracellular signaling. In future work we are looking to integrate neural network modules of signaling, metabolism, and gene regulation for a more complete mechanistic description of cellular activities.
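The idea of a recurrent network whose connectivity is restricted to known interactions can be illustrated with a minimal sketch. This is not the LEMBAS implementation: the four-node chain, the `tanh` activation, the random weights, and the function names `make_masked_weights` and `steady_state` are all assumptions chosen for illustration.

```python
import math
import random

def make_masked_weights(n_nodes, edges, seed=0):
    """Assign a random weight only where the prior-knowledge network has an
    edge (source -> target); every other entry stays exactly zero."""
    rng = random.Random(seed)
    weights = [[0.0] * n_nodes for _ in range(n_nodes)]
    for source, target in edges:
        weights[target][source] = rng.uniform(-1.0, 1.0)
    return weights

def steady_state(weights, ligand_input, n_steps=100):
    """Iterate h <- tanh(W h + input) for a fixed number of steps.
    tanh is a stand-in activation, not the one used by LEMBAS."""
    n = len(weights)
    h = [0.0] * n
    for _ in range(n_steps):
        h = [
            math.tanh(sum(weights[i][j] * h[j] for j in range(n)) + ligand_input[i])
            for i in range(n)
        ]
    return h

# Hypothetical chain: ligand (0) -> receptor (1) -> kinase (2) -> TF (3)
edges = [(0, 1), (1, 2), (2, 3)]
W = make_masked_weights(4, edges)
state = steady_state(W, [1.0, 0.0, 0.0, 0.0])
print(state)
```

In a trained model the nonzero weights would be fitted to data rather than drawn at random; the mask is what keeps every learned parameter interpretable as a specific interaction.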

Implementation of multiplexed single cell proteomics for the investigation of cellular heterogeneity in mammalian cell lines
PRESENTER: Craig Barry

ABSTRACT. Cellular diversity is a ubiquitous property of biological systems and is exemplified in tumour cell heterogeneity. Investigating cellular heterogeneity requires single cell resolution in order to identify population subtypes or temporal phenomena, such as cellular differentiation. Single cell resolution is afforded by well-established sc-RNASeq methods, where transcript abundance is frequently taken as a proxy for protein abundance. Realistically, protein abundance is the integral of mRNA translation rate which limits the application of mRNA as a proxy of protein abundance to constitutively expressed genes. Methods for high throughput single cell proteomics using LC-MS have gained recent traction with developments in peptide multiplexing reagents and high resolution Orbitrap instruments.

LC-MSn throughput of single cells is afforded by sample multiplexing where Orbitrap instruments facilitate the resolution of up to 18 TMTpro tags. Here, we present an implementation of a multiplexed single cell proteomics (scMS) workflow to semi-quantitatively investigate the diversity in intracellular protein abundances of oxygen-deprived HEK293 cells. From this data, we were able to identify anoxia-driven heterogeneity in single cell proteomes. Meaningful protein profiles can be drawn from this data which corroborate literature findings from bulk-sampling methods. Here, we outline our implementation of an scMS workflow and discuss conclusions drawn from its utility in investigating anoxia-driven stress in industrially relevant HEK293 cells.

DecryptM: decrypting drug actions and protein modification by dose- and time-resolved proteomics
PRESENTER: Matthew The

ABSTRACT. Most molecularly targeted cancer drugs modulate activity of enzymes, such as kinases and histone deacetylases, that post-translationally modify proteins. Owing to the complexity of the cellular systems they work on and the fact that such drugs usually have more than one target, the drug mechanism of action is often not well understood. Here, we present decryptM, a method to quantitatively characterize the proteome-wide changes on the level of post-translational modifications (PTM) for cancer drugs in live cells in a dose- and time-dependent manner. By encoding each dose or time point of a drug treatment by tandem mass tags, changes in PTM-peptide abundance as a result of the drug perturbation are quantified at high precision.

31 drugs representing 6 drug classes in 14 human cell lines were profiled with decryptM, demonstrating its wide applicability. This resulted in 1.8 million drug-response curves, each curve representing the modulation of a PTM-peptide (phosphorylation, acetylation, ubiquitinylation) by a specific drug in a particular cellular system. The high precision allows the identification of 10s or 100s of regulated PTM-peptides per experiment in a background of 20,000 unregulated PTM-peptides. Applying stringent criteria for calling regulation, 55,000 (out of 136,000 identified) unique PTM-peptides were regulated in ≥1 experiment. Nevertheless, as expected, most were regulated in only one or a few experiments, reflecting common and drug-specific effects. This data has been integrated into ProteomicsDB, where it can be interactively explored and cross-referenced to UniProt and PhosphoSitePlus. PTM-peptides can be filtered by up- or down-regulation, EC50 range, curve quality and sequence motifs.

A comparative analysis of 10 kinase inhibitors applied to the A549 lung cancer cell line showed distinct clusters of phosphopeptides regulated by mTOR/PI3K versus those regulated by MAPK, in accordance with the known drug targets. The dose-dependent dimension allowed deconvolution of pathways for drugs with multiple targets at different potencies. For example, for the mTOR/PI3K inhibitor Dactolisib, a bimodal distribution of EC50s was observed. Phosphopeptides regulated at high potency could be attributed to the mTOR/PI3K pathway, whereas those regulated at low potency could be attributed to off-target binding of Dactolisib to ATR/ATM/PRKDC, confirmed by an enrichment of the SQ/TQ motif. Based on EC50 and motif information, “orphaned”, i.e. previously unannotated, phosphopeptides can be putatively linked to kinases and pathways through guilt-by-association. Additionally, the importance of the specific molecular wiring of different cancer cells was illustrated by marked differences in phosphopeptide regulation for the EGFR/HER2 inhibitor Lapatinib in three breast cancer cell lines with different expression levels of EGFR/HER2/HER3. These findings highlight how decryptM improves the understanding of drug mechanisms of action and its importance as a novel tool in phosphosite annotation and drug discovery.
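How an EC50 is read off a single dose-response curve can be sketched as follows. The abstract does not state which curve model or fitting routine decryptM uses, so the log-logistic form, the fixed slope, the grid search, the function names, and the synthetic doses below are all assumptions for illustration.

```python
def logistic_fraction(dose, ec50, slope=1.0):
    """Fraction of the maximal effect reached at a dose
    (0 at dose 0, 0.5 at dose == ec50, 1 at saturation)."""
    ratio = (dose / ec50) ** slope
    return ratio / (1.0 + ratio)

def fit_ec50(doses, responses, slope=1.0):
    """Fit response = 1 + (bottom - 1) * fraction(dose) by scanning EC50 on a
    log grid and solving the amplitude in closed form (linear least squares).
    Responses are assumed to be normalized to the untreated control (= 1)."""
    best = None
    for k in range(-100, 501):  # log10(EC50) from -1 to 5 in steps of 0.01
        ec50 = 10 ** (k / 100)
        frac = [logistic_fraction(d, ec50, slope) for d in doses]
        amplitude = (
            sum((y - 1.0) * f for y, f in zip(responses, frac))
            / sum(f * f for f in frac)
        )
        sse = sum((1.0 + amplitude * f - y) ** 2 for f, y in zip(frac, responses))
        if best is None or sse < best[0]:
            best = (sse, ec50, 1.0 + amplitude)
    return best[1], best[2]

# Synthetic, noise-free curve for one hypothetical PTM-peptide (doses in nM)
doses = [1, 3, 10, 30, 100, 300, 1000, 3000, 10000]
true_ec50, true_bottom = 100.0, 0.2
responses = [
    1.0 + (true_bottom - 1.0) * logistic_fraction(d, true_ec50) for d in doses
]
ec50, bottom = fit_ec50(doses, responses)
print(ec50, bottom)
```

Collecting the fitted EC50 of every regulated peptide for one drug gives the kind of potency distribution in which a bimodal shape, as described for Dactolisib, would become visible.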

Computational modeling of DLBCL predicts response to BH3-mimetics
PRESENTER: Simon Mitchell

ABSTRACT. In healthy cells, pro- and anti-apoptotic BCL2 family and BH3-only proteins are expressed in a delicate equilibrium. In contrast, this homeostasis is frequently perturbed in cancer cells due to the overexpression of anti-apoptotic BCL2 proteins. Variability in the expression and sequestration of these proteins in Diffuse Large B cell Lymphoma (DLBCL) likely contributes to variability in response to BH3-mimetics, which has prevented their widespread clinical adoption. While several highly specific BH3-mimetics have been developed, successful deployment of BH3-mimetics in DLBCL requires reliable predictions of which lymphoma cells will respond. Here we show that a computational systems biology approach enables accurate prediction of the sensitivity of DLBCL cells to BH3-mimetics. We found that fractional killing of DLBCL, and the presence of treatment-resistant cells within a cell population, can be explained by cell-to-cell variability in the molecular abundances of signaling proteins. Importantly, by combining protein interaction data with a knowledge of genetic lesions in DLBCL cells we could accurately predict in silico the sensitivity to BH3-mimetics in vitro. Furthermore, the library of virtual DLBCL cells we created was able to predict novel synergistic combinations of BH3-mimetics, which were then experimentally validated. These results show that when computational systems biology models of apoptotic signaling are constrained by experimental data, they can facilitate the rational assignment of efficacious targeted inhibitors in B cell malignancies paving the way for development of more personalised approaches to treatment.

Data-driven mathematical modeling of human skin aging and its application for natural compound screening
PRESENTER: Masatoshi Haga

ABSTRACT. Internal and external environmental factors cause skin aging, and the resulting functional decline reduces the barrier function, leading to skin dysfunction. Therefore, to maintain skin homeostasis, it is desirable to reduce the rate of skin aging. In this study, we identified genes whose expression was altered both in vivo and in vitro over time by examining two independent time-course public RNA-seq datasets: in vivo data of primary human skin fibroblasts obtained from a wide range of ages, representing external factors (Fleischer et al., Genome Biol. (2018)), and in vitro data from human foreskin fibroblasts (HFF-1) cultured for long-term cell passage, representing internal factors (Marthandan et al., PLoS One. (2016)). Pathway analysis of the genes revealed TGFβ signaling as a skin aging-related pathway. Thrombospondin-1 (THBS1) and Fibromodulin (FMOD) were selected as key regulators that show a high correlation between the age of the in vivo samples and the population doubling level of the in vitro samples. Validation revealed that THBS1 increases and FMOD decreases with aging in human dermal tissue and in senescence-induced HFF-1 cells. We also found that THBS1 induces SA-β-gal, a senescence marker, while FMOD suppresses the induced SA-β-gal level. Thus, we identified a novel regulatory network of skin aging controlled by THBS1 and FMOD.

Based on the experimental findings, we developed an ordinary differential equation model that reproduces the dynamics of the gene expression to comprehensively and quantitatively understand the global regulatory mechanism associated with skin aging. Data fitting of temporal changes in HFF-1 protein expression with and without TGFβ1 stimulation showed that TGFβ1, which increases with skin aging, sustainably activates the SMADs complex, increasing THBS1. FMOD, conversely, was found to be suppressed via persistently activated Akt induced by TGFβ1. The model showed that THBS1 expression is sensitive to changes in TGFβ1 while FMOD expression is associated with a robust regulatory mechanism. These results suggest that THBS1 is a promising drug target for skin aging, and the sensitivity analysis and siRNA-based in vitro validation suggest that SMAD4 is the target for THBS1 regulation.

A natural compound library screening found that retinoic acid (RA) is effective in suppressing THBS1 in HFF-1. We have further found that the RA signaling pathway is involved in the transcriptional repression of THBS1 in the nucleus. This research has clarified one of the mechanisms of action of RA, which is used as an anti-wrinkle active ingredient.

Ionizing radiation triggers differential signaling dynamics of p-p53 and p-ERK1/2 in sensitive and resistant HNSCC cell clones

ABSTRACT. Radiotherapy is a main treatment strategy for head and neck cancer, and radioresistance of tumors remains a major problem as it is not well understood. To get a better understanding of how therapy resistance emerges through differential signalling activity, we performed time-course mass cytometry (CyTOF) analyses of irradiated (6 Gy) and non-irradiated head-and-neck squamous carcinoma (HNSCC) cells. As a model system of intra-tumoral heterogeneity, we made use of the heterogeneous Cal33 cell line (parental), a radiosensitive and a radioresistant subclone, and examined potential differences in signaling dynamics that could explain the divergent responses to irradiation. Cell cycle classification based on IdU, pH3, Geminin, and Cyclin B1 indicated a delay in cell cycle progression after irradiation, mainly characterized by an accumulation of cells in S-phase and G2-phase, 8 hours and 12 hours after irradiation, respectively. However, the cell cycle dynamics were largely comparable for the three cell lines studied, suggesting that their differential radiation sensitivity is not explicitly linked to distinct cell-cycle dynamics. Interestingly, we observed differential dynamics of p-p53 [S15] phosphorylation as characterised by: 1) a first pulse 12h after irradiation in the parental Cal33 and the radio-resistant subclone in cells with high p-H2AX [S139] signal, and 2) a second pulse 48h after irradiation, which was stronger in the radio-sensitive subclone. The cells exhibiting this second p-p53 pulse at 48h showed intermediate levels of phosphorylated p-H2AX [S139], suggesting that these cells did not completely repair the radiation-induced DNA damage by that time. Additionally, these cells showed high levels of p-ERK1/2 [T202/Y204]. We observed that following the 48h pulse in p-p53 and p-ERK1/2, the levels of cleaved Caspase-3 and pNF-κB[S536] increased in the radio-sensitive subclone. 
Altogether, these results allow us to hypothesize that the 12h p-p53 pulse induces DNA repair in the resistant subclone, while the second 48h p-p53 pulse, accompanied by a pERK1/2 pulse occurring in cells with residual DNA damage, leads to Caspase-3-mediated cell death in the sensitive subclone. In order to evaluate this, single-cell time-course perturbation experiments will be performed with pharmacological inhibition of Chk1 and/or MEK. Ideally, we will be able to further dissect the underlying mechanisms of radiation resistance and find therapeutic vulnerabilities that will allow targeted radiosensitization of the resistant subclone.

Being noisy in a crowd: differential selective pressure on gene expression noise in model gene regulatory networks
PRESENTER: Nataša Puzović

ABSTRACT. Expression noise, the variability of the amount of gene product among isogenic cells grown in identical conditions, originates from the inherent stochasticity of diffusion and binding of the molecular players involved in transcription and translation. It has been shown that expression noise is an evolvable trait and that central genes exhibit less noise than peripheral genes in gene networks. A possible explanation for this pattern is increased selective pressure on central genes since they propagate their noise to downstream targets, leading to noise amplification. To test this hypothesis, we developed a new gene regulatory network model with inheritable stochastic gene expression and simulated the evolution of gene-specific expression noise under constraint at the network level. Stabilizing selection was imposed on the expression level of all genes in the network and rounds of mutation, selection, replication and recombination were performed. We observed that local network features affect both the probability to respond to selection, and the strength of the selective pressure acting on individual genes. In particular, the reduction of gene-specific expression noise as a response to stabilizing selection on the mean expression is higher in genes with higher centrality metrics. Furthermore, global topological structures such as network diameter, centralization and average degree affect the average expression variance and average selective pressure acting on constituent genes. Our results demonstrate that selection at the network level leads to differential selective pressure at the gene level, and local and global network characteristics are an essential component of gene-specific expression noise evolution.

Multi-Omics Visible Drug Activity prediction, interpreting the biological processes underlying drug sensitivity
PRESENTER: Luigi Ferraro

ABSTRACT. Cancer is a genetic disease resulting from the accumulation of genomic alterations in living cells. Large-scale genomics studies have been instrumental in understanding the recurrent somatic genetic alterations within a cell and in characterizing their functional effects in transformed cells. One of the main challenging questions in this field is how to exploit all this molecular information to identify therapeutic targets and to develop personalized therapies by understanding which molecular features influence sensitivity to drugs. Machine learning models are able to exploit multi-modal screening datasets to develop predictive algorithms useful to associate omics features with response. The basic approach is to use the data from these screenings to train a machine learning “black box” model that predicts the 50% inhibitory concentration (IC50) of a drug from the multi-omics profile of a cell line, without the possibility to interpret the biological mechanisms underlying predicted outcomes and without exploiting the unbalanced nature of the data. In order to address these limitations, we propose a Multi-Omics Visible Drug Activity prediction (MOViDA) neural network model that extends the visible network approach by incorporating functional information, in terms of pathway activity from gene expression and copy number data, into a neural network. We have identified which pathways and drug features are good predictors for high sensitivity of a cell line to a drug. This explanation is the basis to hypothesize drug combinations, cell editing and properties of new drugs aimed at the identification of cell vulnerabilities.

Molecular regulators of catecholamine response in human pulmonary microvascular endothelial cells

ABSTRACT. Endothelial dysfunction is a systemic disease state of endothelial cells (ECs) occurring in a broad variety of pathologies ranging from atherosclerosis to cancer and more recently COVID-19. Sustained high levels of catecholamines associate with endothelial dysfunction and vascular permeability. Indeed, circulating adrenaline levels predict mortality in trauma patients. Yet, the molecular mechanisms that drive ECs into a pathological state in trauma patients upon elevated catecholamine levels are not well characterized. 

Here we identified the transcriptomic, metabolic and lipidomic responses to high levels of catecholamines in human pulmonary microvascular endothelial cells (HPMECs). We treated HPMECs with a wide range of equimolar concentrations of adrenaline and noradrenaline (0.5, 5 and 50 μM) and sampled cultures for molecular profiling at 4 hours and 24 hours after exposure. We identified a total set of 308 differentially expressed genes upon catecholamine exposure across all conditions. In particular, we found GRAMD1B, AREG, PDK4 and CXCR4 as the strongest transcriptional responders to treatment. The enriched functions of responding genes distribute across three major axes: signalling, metabolism and proliferation/differentiation. Representative enriched functions within upregulated genes include cell proliferation, protein kinase B signalling and steroid metabolism. Within the set of repressed genes, we found enrichment in characteristic functions like inflammatory response, response to interleukin-1 and lipopolysaccharide signalling. These identified functions recapitulate well the known response of ECs to catecholamines and point to novel regulation in metabolic functions. Next, towards the identification of the main transcriptional regulators of the differential response, we used a regulon enrichment approach using previously identified regulons in DoRothEA. Furthermore, given the importance of metabolism to endothelial biology, we carried out lipidome quantification of cell cultures under conditions of interest. Finally, we plan to integrate transcriptome and lipidome profiles by the generation of constrained genome-scale metabolic models to predict differential metabolic fluxes under distinct conditions.

Overall, this integrative analysis builds a whole-scale picture of how signalling pathways downstream of catecholamine receptors in ECs trigger a transcriptional state transition that consequently results in metabolic changes that drive ECs out of homeostasis. A precise understanding and prediction of EC molecular states is fundamental to the discovery of more efficient clinical interventions in trauma patients.

Analysis of multivariate longitudinal metabolomics data from meal challenges using RM-ASCA+
PRESENTER: Balazs Erdos

ABSTRACT. Meal challenges are increasingly used to study metabolic perturbations in the field of precision nutrition. Time-series of high-dimensional metabolomics data support a systematic view into metabolic resilience. However, current methodology allows limited use of such data due to a lack of tools to deal with temporal dynamics. Therefore, analysis of this type of data typically concludes on the temporal and cross-species relationships independently. Comprehensive analysis must take into account the interrelatedness of the metabolites across species within an individual as well as across time. Here, we extend the RM-ASCA+ methodology to allow quantification of temporal dynamics observed in frequently sampled time-courses while accounting for the multivariate property and the experimental design in the data. We demonstrate the extended RM-ASCA+ methodology on experimental data containing time-series of metabolomics following meal challenge tests.

FISHing for Correlation

ABSTRACT. The eukaryotic cell division cycle is quite well investigated; the main players and most interactions are known. The most important components of the protein-protein interaction network are a set of phase-specific cyclins interacting with Cdks, inhibitors and transcription factors. The expression of the targets is well investigated on the population level, but neither on the single-cell nor on the single-molecule level. We employ multiplexed FISH labeling in S. cerevisiae to obtain time-resolved or cell cycle position-assigned transcript numbers in single cells. The mRNAs are labeled with three different dyes, which makes it possible to image three different mRNA species at the same time. Based on this approach, we determine the correlation of the expression and the mutual information between transcripts of different species in single cells.
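The two quantities named at the end of the abstract can be computed from paired per-cell transcript counts as in the sketch below. The abstract does not specify the estimators used, so the plug-in mutual information estimate, the function names, and the toy counts are assumptions for illustration only.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def mutual_information(x, y):
    """Plug-in estimate (in bits) from the empirical joint distribution of
    discrete counts. It is biased upward when few cells are available."""
    n = len(x)
    joint = Counter(zip(x, y))
    marg_x, marg_y = Counter(x), Counter(y)
    return sum(
        (c / n) * math.log2(c * n / (marg_x[a] * marg_y[b]))
        for (a, b), c in joint.items()
    )

# Toy per-cell mRNA counts for two hypothetical transcript species
counts_a = [0, 1, 2, 3, 0, 1, 2, 3]
counts_b = [1, 3, 5, 7, 1, 3, 5, 7]  # fully determined by counts_a

r = pearson(counts_a, counts_b)
mi = mutual_information(counts_a, counts_b)
print(r, mi)
```

Correlation captures only linear co-variation, whereas mutual information also registers non-linear dependence, which is one reason to report both for transcript pairs.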

Personalized medicine for Multiple Sclerosis treatment strategies at the individual and population level
PRESENTER: Roberta Bursi

ABSTRACT. The landscape of treatment options for Relapsing-remitting Multiple Sclerosis patients has been completely transformed over the last few decades. Over a dozen disease-modifying therapies are now available, targeting several distinct pathways and yielding varying degrees of efficacy and, inevitably, of side effects. Currently, because the specific benefits, disadvantages and optimal patient profiles for each treatment are poorly understood, the available therapies are not being optimally leveraged to improve patients’ wellbeing. Thus, translating the increased availability of therapeutic options into treatment strategies and policies that benefit patients’ health remains a major challenge. We have developed a cloud-based MS simulator (MS TreatSim, https://mstreat.insiliconeuro.com) to support understanding of the efficacy and safety profiles of various Relapsing-remitting Multiple Sclerosis therapies at both the personal and population level. The simulator builds on an agent-based model that combines four key elements: (1) immune system architecture and dynamics, (2) Relapsing-remitting Multiple Sclerosis etiology, (3) four commonly prescribed first and second line treatments for Multiple Sclerosis, and (4) immune system heterogeneity. The immune system forms the basis of the model, incorporating fundamental processes and cell types of both the innate and adaptive immune systems. Multiple Sclerosis etiology was incorporated by extending the model with an explicit white matter compartment, in which oligodendrocytes are attacked and destroyed by the autoresponsive immune system during active disease. The four treatment options (interferon β-1a, teriflunomide, natalizumab and ocrelizumab) are each incorporated through their pharmacokinetic characteristics and their mechanism of action.
Finally, heterogeneous virtual Relapsing-remitting Multiple Sclerosis patients are created by mapping demographic and clinical parameters (e.g., age at disease onset, lesion load, immune variability) to underlying mechanistic model parameters, and subsequently selecting the patients of interest with the aid of disease history characteristics. The simulator includes both individual level and population level workflows, allowing user-friendly evaluation of the efficacy of the integrated treatment options. In addition to demonstrating treatment effects on clinical outcomes such as relapse rates, MS TreatSim also allows investigation of the underlying effects on immune system variables. MS TreatSim can thus be a valuable tool to investigate the variability in treatment response, and to inform individual and policy-level treatment guidance. With further clinical validation and advanced options for personalization, in the future MS TreatSim may be applied for personalized treatment planning.

Comparative analysis of molecular fingerprints in prediction of drug combination effects
PRESENTER: Bulat Zagidullin

ABSTRACT. Application of machine and deep learning methods in drug discovery and cancer research has gained a considerable amount of attention in the past years. As the field grows, it becomes crucial to systematically evaluate the performance of novel computational solutions in relation to established techniques. To this end, we compare rule-based and data-driven molecular fingerprints in prediction of drug combination sensitivity and drug synergy scores using standardized results of 14 HTS studies, comprising 64 200 unique combinations of 4 153 drug-like small molecules screened in 112 cancer cell lines. We also evaluate the clustering performance of drug representations and quantify their similarity by adapting the Centered Kernel Alignment metric. Our work demonstrates that to identify an optimal drug representation type, it is necessary to take into account quantitative metrics together with qualitative considerations, such as model interpretability and robustness requirements that vary between and throughout stages of a drug development project.

Proteomics profiling revealed a combination of protein biomarkers for predicting disease trajectory of multiple sclerosis
PRESENTER: Julia Åkesson

ABSTRACT. There is a high demand for clinically useful and easily detectable protein biomarkers for predicting disease trajectory and response to treatment in multiple sclerosis (MS) patients. We previously used mRNAs as proxies, combined with protein-protein interaction networks, to identify MS modules whose state was measured by a few secreted proteins and used to predict the disease trajectory. Here, for the first time, we have used the highly sensitive proximity extension assay (OLINK Explore) to measure 1,463 proteins in both plasma and cerebrospinal fluid (CSF) from 143 patients at the early stages of the disease and 43 healthy controls. The patients were divided into a discovery cohort from Linköping University Hospital (n=92) and a replication cohort from Karolinska University Hospital (n=51). Consistent with previous studies, differential protein levels are clearly detectable in CSF, but these changes are mostly not reflected in plasma at the individual protein level. Using clinical information about patients' progressing disease activity, we identified proteins whose baseline expression in CSF could predict two different measures of disease progression: NEDA-3 and nARMSS. In line with previous studies, we found NFL to be the only protein needed to predict whether a patient would show signs of disease activity (NEDA-3) 2 years after sampling, with a replication AUC of 0.77 (p = 0.02). The severity of disability worsening (nARMSS) proved to be more complex, with the best prediction (replication AUC = 0.81, p = 2 × 10^-5) achieved using NFL and 10 additional proteins. Importantly, this model could also predict disability worsening from plasma, with an AUC of 0.66 (p = 2 × 10^-3). The suggested models could be used to identify patients with a favorable disease course who do not need highly effective treatment, thereby reducing costs and side effects. Lastly, by considering network connectivity, we identified a proteomics MS module in plasma that significantly overlapped with the differential proteins in CSF, which provides a functional context for the differentially expressed proteins.
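
For readers unfamiliar with the AUC values reported above, the sketch below shows one common way to compute the area under the ROC curve, as the fraction of positive-negative pairs that a score ranks correctly. The function name `roc_auc` and the toy scores and labels are assumptions for illustration only; they are not study data, and the abstract does not state how the authors computed their AUCs.

```python
import numpy as np


def roc_auc(scores, labels):
    """AUC as the probability that a positive outranks a negative.

    scores: predicted risk per patient; labels: 1 for the positive
    class, 0 for the negative class. Ties count as half a correct pair.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]
    neg = scores[~labels]
    # Compare every positive score with every negative score.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


# Toy example: three of the four positive-negative pairs are ordered correctly.
toy_auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.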

Data-driven modeling of mitochondrial metabolism

ABSTRACT. N-acetylaspartate (NAA) is the second most abundant metabolite in the brain and is linked to Canavan disease, gestational diabetes and cancer. NAA has been shown to affect mitochondrial metabolism.

Our aim is to improve understanding of the functional role of NAA, in particular in mitochondrial metabolism. To this end, we are developing a data-driven mechanistic mathematical model of mitochondrial metabolism.

This large-scale compartmental model allows us to combine time-resolved stable isotope labelling data with growth medium metabolite level measurements and to analyze the dynamics of cellular metabolism. We use a rule-based modelling approach and thermodynamics-based parameterization. A computational pipeline for model simulation and parameter estimation has been established to facilitate iterative model development.

We will use this model to check the plausibility of different hypotheses about the effect of NAA on mitochondrial metabolism.

11:00-11:25 Coffee Break
11:25-12:00 Session K10: CLINICAL KEYNOTE: Leif Erik Sander

Abstract: Growing population density and mobility, urbanization, decreasing biodiversity and changing ecosystems substantially contribute to an elevated risk of epidemic or pandemic outbreaks. COVID-19, and more recently the global MPX (monkeypox) outbreak, have demonstrated the high pandemic potential of zoonotic pathogens. Vaccine-induced population immunity plays a major role in preventing and limiting pandemics and their detrimental medical and socioeconomic impact. Here I will review recent advances in developing protective vaccines, their mechanisms of action and future directions for transmission-reducing vaccines for respiratory pathogens.

Location: Alexander
Preventing & limiting pandemics through vaccination: what have we learned from COVID-19 and MPX?

12:00-12:45 Session K11: KEYNOTE XI: Eske Willerslev

Abstract: The field of ancient genomics has transformed our understanding of human history by uncovering how we obtained our present-day genetic diversity through past migrations and admixture events. It has also changed our understanding of ecosystem change in space and time by providing plant and animal ancient DNA directly from past environments, and of pathogen evolution by reconstructing the genomes of infectious diseases of the past. One future direction of the field is to understand the emergence of differences in disease risk across human populations. In this talk, I will cover these topics by providing research examples from the Centre for GeoGenetics.

Location: Alexander
What we can learn from ancient genomics? Human history, ecosystem change, infectious disease, and disease susceptibility. (EXCEPTION - ZOOM)
