ASI2025: 3RD WORKSHOP ON ADVANCES IN SYSTEMS IMMUNOLOGY
PROGRAM FOR SUNDAY, OCTOBER 12TH


09:00-10:00 Session 1: Keynote
09:00
Understanding nano-bio interactions to enable next-generation vaccines and drug delivery systems

ABSTRACT. In this presentation, we will explore nano-bio interactions of engineered nanoparticles with cellular and sub-cellular spatiotemporal resolutions. Using quantitative analytical techniques, such as elemental analysis and super-resolution microscopy, we determined the delivery of administered nanoparticles to individual cell populations. Our research enables the engineering of safer and more effective inorganic and organic nanomedicines for diagnosis, therapy, and vaccination applications.

10:30-12:30 Session 2: Session I
10:30
Reconstructing the co-evolution of tumors and their microenvironment from multiomic data

ABSTRACT. This talk will describe progress on methods development for reconstructing the process of somatic evolution in cancers and the concurrent co-evolution of the microenvironment, with specific focus on immune involvement. The work will specifically examine the problem from the perspective of defining computational constrained optimization models that can accommodate an ever-growing complexity of data modalities and biological mechanisms, making use of our most current understanding of the biology and of continuing advances in biotechnology for profiling somatic heterogeneity and evolution. It will describe progress on adapting these approaches to several complementary directions in multimodal data integration. It will also consider how such methods can contribute to the increasingly urgent need for better experimental design methods in multimodal spaces. It will conclude with a consideration of unmet needs and future prospects.

10:50
Efficient Encoding of TCR-pMHC Structure for Improved Structure-Based Binding Prediction

ABSTRACT. The complex interaction between the T-cell receptor (TCR) and the peptide-major histocompatibility complex (pMHC), required for the T-cell-mediated adaptive immune response, poses a challenging task for computational approaches that predict the activity of novel peptides. To effectively leverage the discriminatory signal hidden in the TCR-pMHC complex, research efforts – long dominated by sequence-based TCR-pMHC activity classification – have recently turned to structure-based approaches, in which predicted TCR-pMHC complex structures are processed to classify binding vs. non-binding samples. One such approach learns the classification rule from a TCR-pMHC representation encoded by a graph neural network (GNN). However, its performance on peptides unseen during GNN training is only slightly better than that of a random classifier. This talk covers our effort to improve this performance through an efficient encoding strategy for TCR-pMHC complex structures. Specifically, we propose a dual-encoder approach in which two GNN encoders project the complex-structure graph and its peptide-masked subgraph, respectively, into corresponding residue-level embedding representations. In addition to the binary classification loss, our prediction network with the dual graph encoders is trained with an alignment loss term that enforces the learned embeddings of the complex graph and its masked subgraph to be similar or dissimilar depending on the training sample's class label (binding or not). Through evaluations with two different graph encoder architectures, the equivariant graph neural network (EGNN) and edge-variable transformer convolution (EVTConv), we demonstrate that our dual-encoder approach often improves prediction performance (especially for EGNN) over the corresponding single-encoder implementation, in which only the TCR-pMHC interface graph is considered without masking.
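A minimal sketch of the dual-encoder idea described in this abstract, not the authors' implementation: one shared-architecture GNN encodes the full TCR-pMHC graph, a second encodes the peptide-masked subgraph, and training combines a binary classification loss with a label-dependent alignment term. Node features (e.g., one-hot residue types), the simple mean-neighbour message passing, and the loss weights are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGNNEncoder(nn.Module):
    """Two rounds of mean-neighbour message passing over a dense adjacency matrix."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hid_dim)
        self.lin2 = nn.Linear(hid_dim, hid_dim)

    def forward(self, x, adj):
        deg = adj.sum(-1, keepdim=True).clamp(min=1.0)
        h = F.relu(self.lin1(adj @ x / deg))
        h = F.relu(self.lin2(adj @ h / deg))
        return h.mean(dim=0)            # graph-level embedding via mean pooling

class DualEncoderClassifier(nn.Module):
    def __init__(self, in_dim=20, hid_dim=64):
        super().__init__()
        self.enc_full = SimpleGNNEncoder(in_dim, hid_dim)    # full complex graph
        self.enc_masked = SimpleGNNEncoder(in_dim, hid_dim)  # peptide-masked subgraph
        self.head = nn.Linear(2 * hid_dim, 1)

    def forward(self, x_full, adj_full, x_mask, adj_mask):
        z_full = self.enc_full(x_full, adj_full)
        z_mask = self.enc_masked(x_mask, adj_mask)
        logit = self.head(torch.cat([z_full, z_mask]))
        return logit, z_full, z_mask

def loss_fn(logit, z_full, z_mask, label, margin=1.0, lam=0.5):
    """BCE classification loss plus a contrastive-style alignment term:
    pull the two embeddings together for binders, push them apart for non-binders."""
    bce = F.binary_cross_entropy_with_logits(logit.view(1), label.view(1))
    dist = F.pairwise_distance(z_full.unsqueeze(0), z_mask.unsqueeze(0))
    align = label * dist.pow(2) + (1 - label) * F.relu(margin - dist).pow(2)
    return bce + lam * align.mean()
```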

11:10
Post-hoc Explanation for TCR-pMHC Prediction

ABSTRACT. T cells recognize peptide-MHC (pMHC) complexes through T Cell Receptors (TCRs), a process central to adaptive immunity and immunotherapy. Transformer models such as TULIP predict TCR–pMHC binding with high accuracy, but their black-box nature limits mechanistic insight. We present Quantifying Cross-Attention Interaction (QCAI), a post-hoc method for interpreting the inter-chain interactions modeled by decoder cross-attention in TCR–pMHC models. To evaluate interpretability, we introduce TCR-XAI, a benchmark of 274 solved TCR–pMHC structures that links residue-level distances with model explanations. Experiments show that QCAI delivers state-of-the-art interpretability while maintaining predictive performance, providing a new framework and benchmark for transparent modeling of T cell responses.
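Illustrative sketch only, not the QCAI algorithm: one generic way to turn decoder cross-attention weights into residue-level importance scores and compare them against residue distances from a solved structure, in the spirit of the TCR-XAI evaluation. The tensor layout and the Spearman comparison are assumptions made for the example.

```python
import numpy as np
from scipy.stats import spearmanr

def residue_importance(cross_attn):
    """cross_attn: array (n_layers, n_heads, n_tcr_positions, n_peptide_positions).
    Average over layers and heads, then over TCR positions, giving one score per
    peptide residue."""
    return cross_attn.mean(axis=(0, 1)).mean(axis=0)

def interpretability_score(importance, min_distances):
    """Correlate attention-derived importance with (negated) minimal distances between
    each peptide residue and the TCR; closer residues should receive more attention if
    the explanation is structurally faithful."""
    rho, _ = spearmanr(importance, -np.asarray(min_distances))
    return rho

# toy usage
attn = np.random.rand(4, 8, 25, 9)          # 4 layers, 8 heads, 25 TCR x 9 peptide positions
dists = np.random.uniform(3, 20, size=9)    # per-residue minimal distances (angstroms)
print(interpretability_score(residue_importance(attn), dists))
```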

11:30
An Integrated Computational Antigen Discovery Pipeline with Hierarchical Filtering for Emerging Viral Variants

ABSTRACT. Emerging and evolving viral diseases, such as SARS-CoV-2, continue to pose significant global health challenges, underscoring the urgent need for rapid and scalable antigen discovery pipelines. This work presents a computational pipeline that integrates diverse computational tools and machine learning (ML) models to accelerate the identification and optimization of antigen candidates. The pipeline employs efficient filtering and consensus-based strategies to highlight epitopes with high therapeutic potential. We demonstrate its utility by significantly narrowing the antigen search space for Rift Valley fever virus (RVFV) and Mayaro virus (MAYV), and by effectively identifying conserved neutralizing epitopes in SARS-CoV-2. Our proposed computational antigen pipeline offers a powerful framework for expediting the development of future vaccines and therapeutics in response to emerging pathogens.
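A hedged sketch of a hierarchical, consensus-style epitope filter in the spirit of the pipeline described above; the column names, thresholds, and majority-vote rule are illustrative assumptions, not the authors' actual tools or criteria.

```python
import pandas as pd

def consensus_filter(df, score_cols, score_cutoff=0.5, min_votes=2,
                     conservation_cutoff=0.9):
    """Keep peptides that (1) pass a conservation filter and (2) are called positive
    by at least `min_votes` of the prediction tools in `score_cols`."""
    conserved = df[df["conservation"] >= conservation_cutoff].copy()
    votes = (conserved[score_cols] >= score_cutoff).sum(axis=1)
    return conserved[votes >= min_votes].sort_values(score_cols[0], ascending=False)

# toy usage with placeholder peptides and scores
peptides = pd.DataFrame({
    "peptide":      ["NITNLCPFG", "YQAGSTPCN", "GVYYHKNNK"],
    "conservation": [0.98, 0.95, 0.70],
    "tool_a_score": [0.81, 0.40, 0.90],   # e.g. a binding-affinity-derived score
    "tool_b_score": [0.66, 0.71, 0.88],   # e.g. an immunogenicity score
})
print(consensus_filter(peptides, ["tool_a_score", "tool_b_score"]))
```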

11:50
scMultiNODE: Integrative and Scalable Framework for Multi-Modal Temporal Single-Cell Data

ABSTRACT. Profiling single-cell genomics across multiple developmental stages and measurements (or modalities) reveals dynamics, but integrative multi-modal analysis is limited by the difficulty of jointly measuring multiple modalities in the same cells. We introduce scMultiNODE, an unsupervised deep learning model that integrates gene expression and chromatin accessibility measurements in developing single cells, while preserving cell type variations and cellular dynamics when aligning a large number of cells across different measurements. On six real-world datasets, scMultiNODE outperforms existing methods in integrating temporally profiled multi-modal single-cell data, while its joint latent space enables analyses such as complex trajectory inference and cross-modal label transfer. The preprint, data, and code supporting this work are publicly available at https://github.com/rsinghlab/scMultiNODE.

12:10
GraphMatch: Knowledge Graphs for Allogeneic Stem Cell Matching

ABSTRACT. Allogeneic bone marrow and umbilical cord stem cell transplants often provide the best hope for curing many patients with leukemia, lymphoma, and over 70 other diseases. Matching patients to unrelated donors requires flexible and timely searches as matching criteria change. Matching systems should scale to accommodate the diversity in patient and donor typing resolution, as well as the growing number of donors.

We developed GraphMatch, a scalable graph database solution for storing and searching variable-resolution HLA genotype markers. As a test set, we expanded the World Marrow Donor Association (WMDA) validation set based on the IPD-IMGT/HLA Database to create a synthetic production data set comprising 1 million patients and 10 million donors. Single-patient identity search times range from 218.5 milliseconds per patient for 2 million donors to 1201.4 milliseconds per patient for 10 million donors. Search time scales linearly with the number of edges, even at production scale.

We anticipate practical extensions to the GraphMatch platform, including transactionally coherent maintenance, horizontally scalable performance, and schema additions to accommodate additional search criteria. Ultimately, GraphMatch demonstrates the utility of graph databases as a flexible platform for scalable matching solutions.
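A minimal sketch of the kind of graph-based matching described above, not the GraphMatch schema: patients and donors are linked to the HLA marker nodes they carry (typing resolution may vary per node), and matching donors are those that share a required number of marker nodes with the patient. The locus names and match threshold are illustrative assumptions.

```python
import networkx as nx
from collections import Counter

G = nx.Graph()
# donors connected to the HLA markers they carry
G.add_edges_from([("D1", "A*02:01"), ("D1", "B*07:02"),
                  ("D2", "A*02:01"), ("D2", "B*08:01"),
                  ("D3", "A*01"),    ("D3", "B*07:02")])   # D3 typed at lower resolution

def matching_donors(graph, patient_markers, min_shared=2):
    """Count, for every donor adjacent to the patient's markers, how many marker
    nodes are shared, and return donors meeting the threshold."""
    counts = Counter()
    for marker in patient_markers:
        if marker in graph:
            for donor in graph.neighbors(marker):
                counts[donor] += 1
    return [(d, c) for d, c in counts.most_common() if c >= min_shared]

print(matching_donors(G, ["A*02:01", "B*07:02"], min_shared=2))   # -> [('D1', 2)]
```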

12:30-13:30 Session 3: Poster session
Location: TBD
Cancer Immunology: Mechanisms, Clinical Impact, and Future Directions

ABSTRACT. Background
Cancer immunology investigates the interactions between malignant cells and the host immune system. Decades of evidence show that immune surveillance controls early tumor development, while immune evasion drives progression to clinically significant disease. The cancer immunoediting paradigm describes this duality through three phases: elimination, equilibrium, and escape. Understanding these has enabled the development of immunotherapies now standard in multiple malignancies.

Immune Surveillance and Tumor Escape
• Elimination: Innate immune cells (natural killer cells, macrophages, dendritic cells) detect stress ligands or tumor antigens and initiate cytotoxic responses, which adaptive immunity amplifies via tumor-specific CD8⁺ T cells and CD4⁺ helper cells.
• Equilibrium: Persistent immune pressure selects tumor variants with reduced immunogenicity, allowing microscopic persistence.
• Escape: Tumor clones develop immune evasion strategies including:
  • Down-regulation of MHC class I and antigen processing machinery
  • Up-regulation of inhibitory checkpoint ligands (PD-L1, Galectin-9) to inactivate effector T cells
  • Secretion of immunosuppressive factors (TGF-β, IL-10) and recruitment of regulatory T cells (Tregs) and myeloid-derived suppressor cells (MDSCs)

Tumor Microenvironment (TME)
The TME consists of tumor cells, stromal fibroblasts, endothelial cells, immune infiltrates, extracellular matrix, and soluble mediators. Key aspects include:
• Hypoxia and metabolic alterations (e.g., lactate accumulation) impairing T-cell function
• Polarization of tumor-associated macrophages (TAMs) to an M2-like, pro-tumoral phenotype
• Chronic inflammation sustaining tumor growth and DNA damage

Advances in Cancer Immunotherapy
1. Immune Checkpoint Inhibitors (ICIs): Antibodies targeting CTLA-4, PD-1, or PD-L1 restore T cell activity and show durable responses in melanoma, NSCLC, renal cell carcinoma, etc.
2. Adoptive Cell Therapy (ACT):
  • Expansion and reinfusion of tumor-infiltrating lymphocytes (TILs)
  • Engineered CAR-T cells targeting antigens (e.g., CD19, BCMA) achieving high remission rates in hematologic cancers
3. Cancer Vaccines and Neoantigen Approaches: Personalized mRNA or peptide vaccines induce new T-cell responses.
4. Oncolytic Viruses: Selectively replicate in tumors causing immunogenic cell death and antigen release.
5. Combinatorial Strategies: Combining ICIs with radiotherapy, chemotherapy, or targeted agents to enhance efficacy and overcome resistance.

Biomarkers and Precision Medicine
Predicting immunotherapy responders is complex; promising biomarkers include:
• PD-L1 expression levels
• Tumor mutational burden and neoantigen load
• T-cell receptor clonality and interferon-γ gene signatures (indicating “hot” tumors)
• Liquid biopsies for circulating tumor DNA and immune markers

Current Challenges
• Primary and acquired resistance via alternative checkpoints (TIM-3, LAG-3)
• Immune-related adverse events (irAEs), requiring careful management
• Tumor heterogeneity and antigen loss complicating targeted approaches
• Cost and access barriers, especially in low- and middle-income countries

Future Directions
Research focuses on:
• Next-generation cell therapies (armored CAR-T, CAR-NK) for solid tumors
• Bispecific T-cell engagers enabling MHC-independent tumor recognition
• Microbiome modulation to improve systemic immunity and therapy response
• AI and multi-omics integration for personalized tumor-immune profiling
• Preventive immunology for premalignant lesion targeting

Conclusion
Cancer immunology has transformed oncology by harnessing the immune system therapeutically. Despite challenges in resistance, toxicity, and access, ongoing translational research and multidisciplinary collaboration promise improved durable responses and possible cures.

Deep learning for immune network inference from single-cell omics data and modeling tumor-immune system interaction

ABSTRACT. Systems immunology is increasingly empowered by single-cell omics technologies, enabling detailed characterization of cellular complexity within the tumor microenvironment. However, integrating and interpreting these large multi-omics datasets remains a major challenge to understanding immune networks and their interactions with tumor cells. We propose a deep learning framework combining convolutional and recurrent architectures to analyze and integrate single-cell transcriptomic, proteomic, and epigenomic data. This model aims to infer intra- and intercellular regulatory immune networks and characterize the spatio-temporal dynamics of tumor-immune system interactions.

Our approach includes advanced preprocessing steps to correct biases and normalize heterogeneous data. By integrating multi-omics information, we seek to extract signatures relevant to immune cell differentiation and states and discover potential molecular targets for immunotherapy. The interpretable nature of the model facilitates identification of key mechanisms and may guide optimized immunotherapy and vaccine design.

This work aligns with current advances in computational immunology, combining advanced bioinformatics and artificial intelligence to address complex immunological questions. Preliminary results demonstrate the model’s ability to capture functional relationships within immune cell populations related to cancer.

Orthogonal Concept Representation for Traceable Deductive Reasoning: A Neuro-Symbolic Framework for Continual Learning with Applications in Oncology

ABSTRACT. This work introduces a novel neuro-symbolic framework to address fundamental limitations in AI, namely shallow understanding, catastrophic forgetting, and a lack of traceable reasoning. We propose that concepts can be represented as Zernike Concept Vectors (ZCVs) in an orthogonal basis, enabling a factored, compositional encoding of knowledge. This representation is embedded in a hybrid architecture where a neural front-end maps data to ZCVs and a symbolic engine performs deductive reasoning via geometric operations. To enable continual learning, we develop the Orthogonal Concept Update (OCU) algorithm, which mitigates catastrophic forgetting. The framework’s utility is demonstrated through a case study in oncology, deconstructing the unmet needs in breast cancer to generate actionable, explainable insights.
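Illustrative sketch only: the OCU algorithm itself is not specified in the abstract, so the code below shows one standard way an "orthogonal update" can reduce interference, namely removing from a new concept vector any component lying in the span of previously stored concept directions. This is an assumption used to make the idea concrete, not the authors' method.

```python
import numpy as np

def orthogonal_update(new_vec, stored_concepts):
    """Subtract from `new_vec` its projection onto each stored concept direction,
    assuming the stored concepts are mutually orthogonal (an orthogonal basis),
    so incorporating the result does not overwrite what those concepts encode."""
    v = np.asarray(new_vec, dtype=float)
    for c in stored_concepts:
        c = np.asarray(c, dtype=float)
        v = v - (v @ c) / (c @ c) * c
    return v

concepts = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
print(orthogonal_update([2.0, 3.0, 4.0], concepts))   # -> [0. 0. 4.]
```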

LSTM Prediction of Next-Gene Mutations Using Graph-Derived Orders in Colon Adenocarcinoma

ABSTRACT. Accurate prediction of tumor evolution and the sequential appearance of somatic mutations can improve our understanding of cancer progression. We trained a recurrent neural network with Long Short-Term Memory (LSTM) units to predict, at each step, whether the next gene in a specified order is mutated, using colon adenocarcinoma samples. This global model achieved high overall accuracy but low true positive rates (TPR) due to strong class imbalance. We then applied hierarchical clustering by mutational burden and trained and evaluated cluster-specific LSTM models, which improved TPR and positive predictive value (PPV) relative to the global baseline. Building on this, we constructed directed mutation-order graphs from co-occurrence statistics, removed cycles, and extracted topological orders and longest paths as alternative mutation sequences. Using these graph-derived orders as input sequences for the LSTM, rather than the initial global order, further improved recall. Both the cluster-specific modeling and the graph-derived ordering contribute measurable gains to predicting whether the next gene in the sequence is mutated, though class imbalance remains a key limitation.
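A hedged sketch of the graph-derived ordering step described above: build a directed mutation-order graph from pairwise co-occurrence statistics, break cycles (here with a simple lowest-weight-edge heuristic; the authors' cycle-removal rule is not specified), then extract a topological order and the longest path to use as LSTM input sequences. The gene names and weights are toy placeholders.

```python
import networkx as nx

def build_order_graph(pair_weights):
    """pair_weights: {(gene_earlier, gene_later): weight} from co-occurrence statistics."""
    G = nx.DiGraph()
    for (u, v), w in pair_weights.items():
        G.add_edge(u, v, weight=w)
    return G

def break_cycles(G):
    """Greedily drop the lowest-weight edge of each remaining cycle until the graph is a DAG."""
    while True:
        try:
            cycle = nx.find_cycle(G)
        except nx.NetworkXNoCycle:
            return G
        u, v = min(cycle, key=lambda e: G[e[0]][e[1]]["weight"])[:2]
        G.remove_edge(u, v)

weights = {("APC", "KRAS"): 5.0, ("KRAS", "TP53"): 4.0,
           ("APC", "TP53"): 3.0, ("TP53", "APC"): 0.5}   # toy co-occurrence weights
dag = break_cycles(build_order_graph(weights))
print(list(nx.topological_sort(dag)))                    # one mutation order
print(nx.dag_longest_path(dag, weight="weight"))         # longest-path order
```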

Evaluation of Nuclei Isolation Techniques for Single-Nucleus RNA Sequencing in Tissue and Cell Suspensions

ABSTRACT. The nucleus, the largest organelle in eukaryotic cells, encloses DNA within a double-membrane envelope that organizes chromatin for compaction and gene regulation, thereby shaping both epigenetic and transcriptomic profiles. Single-nucleus RNA sequencing (snRNA-seq) provides a powerful approach to investigate cellular diversity within complex tissues. By isolating nuclei instead of whole cells, snRNA-seq overcomes key limitations of conventional single-cell RNA-seq. A central step in this process is the isolation of intact nuclei. Multiple protocols have been developed for nuclei isolation from both tissues and cell suspensions, and each source presents its own unique challenges and limitations. To address these complexities, this review examines and compares some of the most evaluated protocols across diverse biological systems, including human umbilical cord blood (CD34+ cells), kidney tissue, organoids, mouse brain, heart tissue, and plant tissue. Our analysis highlights that the choice of nuclei isolation method is a critical experimental variable influencing snRNA-seq data quality, with direct implications for the reliability, reproducibility, and interpretability of downstream analyses. Ultimately, this review aims to provide guidance for selecting the most appropriate nuclei isolation protocol, thereby ensuring robust data generation and advancing transcriptomic studies across a wide range of biological contexts.

Enhancing computational immunometabolomic analysis with domain knowledge

ABSTRACT. Metabolomics plays a crucial role in understanding mechanisms of immune-mediated diseases, effects of diet on immune systems, etc. Existing studies often identify biomarkers—individual metabolites, metabolic pathways, etc.—of different immune response parameters, immunological diseases, etc. These analyses mostly rely on experimental metabolomic profiles to detect differentiating patterns across biologically distinct sample groups. A limitation of such analyses is that the metabolomic system in any organism embodies intricate dynamics across heterogeneous entities. For example, multiple metabolites co-participate in reactions, multiple reactions comprise pathways, genes translate enzymes that affect reactions, etc. Such rich prior domain knowledge representing holistic metabolic systems is absent in current immunometabolomic analyses.

This work aims to investigate the impact of utilizing domain knowledge together with experimental immunometabolomic profiles. In addition, metabolomic datasets tend to be much smaller than those in other biomedical domains such as transcriptomics, electronic health records, clinical trials, etc. Our earlier works have presented several methods to exploit domain knowledge for augmenting metabolomic analyses, especially to address the challenges of small-scale data. Our studies designed and utilized diverse knowledge networks to facilitate such analyses.

We presented the first study to integrate prior knowledge with nutritional metabolomics for assessing dietary intake. In that study, we introduced several novel feature engineering strategies to generate interpretable heuristic representations for metabolic reactions and subsystems. We showed that the proposed features yielded novel insights into fine-grained biomechanisms and improved empirical performance. We utilized the presented approach to improve diet-specific metabolite biomarker identification and to present ranked subsets of individual diet-relevant metabolic reactions and subsystems.

Another of our works was the first study to utilize domain knowledge to model and analyze longitudinal metabolic systems. Our approach generated interpretable, longitudinal representations for metabolic reactions and subsystems by inspecting time-series metabolic profiles. We demonstrated that the proposed features could better capture biologically relevant signals and improve downstream performance. The proposed method also identified top subsets of biologically relevant metabolic reactions and subsystems.

In another study, we focused on generating synthetic metabolomic data, as metabolomic data are often small and private. We presented a method that utilizes feature dependency graphs as regularizing constraints to prune down search spaces of learnable distributions, which makes the training convenient with a small volume of data. The graphs capture feature interrelationships—based on correlations in real data or domain knowledge. By using our method, the quality of synthetic data—evaluated from multiple perspectives—improved across many evaluation metrics and datasets compared to state-of-the-art baselines.

In this work, we aim to investigate the impact of the abovementioned methods on immunological analyses. These techniques are powerful for exploiting domain knowledge to provide detailed insights into underlying biodynamics, which is impactful for advancing computational immunology. The interpretable feature design strategies can enable the discovery of biomarker metabolic pathways for immune-mediated phenomena. Also, several studies have focused on understanding how diets impact immune systems. Metabolomics is closely connected to the biomechanisms of dietary intake, and our methods have already proven useful for dietary and nutritional analyses. This study aims to extend that work to understand multi-dimensional relationships among diet, the immune system, diseases, and metabolism. We hypothesize that applying our earlier approaches in immunological contexts will yield novel hypotheses and improve empirical performance.

GeNeo2: a suite of bioinformatics tools for identifying tumor-specific neoepitopes with enhanced features for personalized cancer immunotherapies

ABSTRACT. The identification and prioritization of cancer-specific neoepitopes from next-generation sequencing data for personalized immunotherapies such as cancer vaccines remains challenging and requires the use of complex bioinformatics approaches. Here, we present GeNeo2, a new version of the GeNeo toolbox with enhanced features for predicting neoepitopes from matched tumor/normal exome sequencing data coupled with tumor RNA-Seq data (Al Seesi et al., 2023). A distinguishing feature of GeNeo2 is its integrated mass spectrometry immunopeptidomics tools, which can reveal neoantigens derived from various canonical and noncanonical sources. We have also added tools for predicting neoepitopes encoded by validated indels. In addition, GeNeo2 introduces a novel machine-learning approach to improve the accuracy of somatic variant and indel calling.

GeNeo2 tools can be accessed via web-based interfaces deployed on a Galaxy portal accessible at https://neo.engr.uconn.edu/. A virtual machine image for running GeNeo locally is also available to academic users upon request.

13:30-15:30 Session 4: Session II
13:30
Toward Automated and Scalable Hematopoietic Cell Sorting: A Deep Learning Approach

ABSTRACT. Background
Hematopoietic stem cells (HSCs) and multipotent progenitors (MPPs) are central to the lifelong process of hematopoiesis. These populations display functional heterogeneity, with subpopulations such as long-term (LT)-HSCs, short-term (ST)-HSCs, and MPPs differing in self-renewal capacity, proliferation, and lineage potential. Traditional classification relies on fluorescence-activated cell sorting (FACS), which, while effective, is labor-intensive, technically demanding, and limited in scalability. Advances in multi-channel fluorescence microscopy now allow high-content imaging, but accurate automated classification remains a major challenge. Deep learning (DL), particularly transfer learning, offers an opportunity to accelerate and improve the identification of rare subpopulations.

Purpose
This study aims to develop and evaluate a deep learning-based framework to classify functional subpopulations of HSCs and MPPs from multi-channel microscopic images. By leveraging transfer learning with state-of-the-art convolutional neural network (CNN) architectures, we seek to:
1. Improve classification accuracy at both image and single-cell levels.
2. Introduce an optimized preprocessing pipeline to integrate multi-slice images into robust RGB representations.
3. Benchmark performance against existing methods and provide an open-source pipeline for reproducibility.

Material and Method
Cell samples of ST-HSCs, LT-HSCs, and MPPs were imaged using multi-channel microscopy. Each sample contained 2–4 grayscale slices, which were consolidated into single RGB images by channel mapping. This produced standardized 2048×2048 RGB representations. The curated dataset consisted of 1,457 ST-HSC, 1,047 LT-HSC, and 1,035 MPP samples. Two classification strategies were designed:
• Image-level classification: Entire images were input into pre-trained ResNet and DenseNet models.
• Cell-level classification: Individual cells were segmented via blob detection (Laplacian of Gaussian), cropped, and resized (64×64). This produced ~74,500 cell patches across the three classes.
Transfer learning was applied with ResNet18–152 and DenseNet121–201 variants. Models were trained on NVIDIA A100 GPUs using Adam optimization, categorical cross-entropy loss, and extensive data augmentation (flips, rotations, scaling). Performance was evaluated using balanced accuracy and area under the receiver operating characteristic curve (AUROC). Model explainability was studied via Grad-CAM.
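A hedged sketch of two stages of the pipeline described above: (1) segmenting individual cells from a channel-merged RGB image with a Laplacian-of-Gaussian blob detector and cropping fixed-size patches, and (2) setting up a pre-trained DenseNet169 with a new 3-class head for transfer learning. The hyperparameters and helper names are illustrative assumptions, not the authors' exact settings.

```python
import torch
import torch.nn as nn
from skimage.feature import blob_log
from skimage.color import rgb2gray
from torchvision import models

def crop_cells(rgb_image, patch=64, max_sigma=30, threshold=0.1):
    """Detect blobs in the grayscale projection and return patch x patch crops."""
    gray = rgb2gray(rgb_image)
    blobs = blob_log(gray, max_sigma=max_sigma, threshold=threshold)
    half, crops = patch // 2, []
    for y, x, _ in blobs:
        y, x = int(y), int(x)
        if half <= y < gray.shape[0] - half and half <= x < gray.shape[1] - half:
            crops.append(rgb_image[y - half:y + half, x - half:x + half])
    return crops

def build_classifier(n_classes=3):
    """DenseNet169 with ImageNet weights and a replaced classification head."""
    model = models.densenet169(weights=models.DenseNet169_Weights.DEFAULT)
    model.classifier = nn.Linear(model.classifier.in_features, n_classes)
    return model

model = build_classifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()   # categorical cross-entropy, as in the abstract
```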

Results
Both strategies demonstrated strong classification performance. At the image level, DenseNet169 achieved the best results with a balanced accuracy of 86.9% and an AUROC of 97.7%. At the cell level, DenseNet169 again outperformed other models, reaching a balanced accuracy of 89.3% and an AUROC of 99.5%. These results surpass prior studies, including Wang et al.’s ResNet50 approach. Confusion matrices indicated near-perfect performance for ST-HSCs, with slightly lower but robust accuracy for LT-HSCs and MPPs. Grad-CAM visualization revealed that the models primarily focused on cell interiors and distinctive textural features, underscoring biologically meaningful decision-making.

Conclusion
We present a robust and scalable deep learning pipeline for the classification of hematopoietic subpopulations. By integrating multi-channel imaging, optimized preprocessing, and transfer learning, our DenseNet169-based framework significantly improves classification accuracy at both image and cell levels. The open-source release of our pipeline facilitates reproducibility and adoption. This work highlights the potential of automated image-based classification to complement or replace traditional FACS, paving the way for high-throughput, reproducible, and precise stem cell analysis in regenerative medicine and hematology research.

13:50
Exploring Large Language Models for Parameter Estimation: Insights from the Moran Process

ABSTRACT. The Moran process is a foundational model for studying stochastic evolutionary dynamics in finite populations. While traditionally used in forward simulations, the inverse problem, i.e., retrospectively inferring initial parameters from generational outcomes, remains largely unexplored. In this work, we introduce a novel framework that leverages large language models (LLMs), specifically ChatGPT, to recover the initial mutant count i and relative fitness r from complete evolutionary trajectories. We develop a fully automated pipeline that simulates synthetic Moran processes, encodes full birth-death histories into structured prompts, and extracts parameter estimates via natural language queries.
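A minimal sketch, assuming a standard Moran birth-death process: simulate a full trajectory for a given initial mutant count i and relative fitness r, then encode the event history into a structured prompt of the kind that could be sent to an LLM for parameter recovery. The prompt wording is an illustrative assumption, not the authors' template.

```python
import random

def simulate_moran(N, i, r, seed=0):
    """Return the list of mutant counts after each birth-death event until fixation or loss."""
    rng = random.Random(seed)
    counts, k = [i], i
    while 0 < k < N:
        p_mutant_birth = (r * k) / (r * k + (N - k))   # fitness-weighted birth
        birth_is_mutant = rng.random() < p_mutant_birth
        death_is_mutant = rng.random() < k / N         # uniform death
        k += int(birth_is_mutant) - int(death_is_mutant)
        counts.append(k)
    return counts

def build_prompt(N, counts):
    history = ", ".join(str(c) for c in counts)
    return (f"A Moran process ran in a population of size {N}. The mutant count after "
            f"each birth-death event was: {history}. Estimate the initial mutant count i "
            f"and the relative fitness r. Answer as 'i=<int>, r=<float>'.")

trajectory = simulate_moran(N=20, i=3, r=1.5)
print(build_prompt(20, trajectory)[:200])
```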

Our empirical evaluation across a wide range of configurations demonstrates that ChatGPT can infer both i and r accurately, often achieving near-exact recovery and revealing sensitivity to evolutionary dynamics. Notably, the model exhibits robust generalization and consistent trends across replicates, suggesting emergent capabilities for probabilistic reasoning in stochastic biological systems. These results introduce a new paradigm for integrating LLMs into model-based inference pipelines and open promising directions for AI-assisted discovery in computational biology.

Our study offers two main contributions. First, it introduces the reverse inference problem in the Moran process as a novel task for evaluating the inferential capabilities of LLMs under uncertainty. Second, it provides empirical evidence that ChatGPT can nontrivially approximate ground truth parameters based solely on generational event histories, without access to explicit probabilistic models. This suggests the presence of latent model-based reasoning capabilities in LLMs.

Several observations emerged from this study. The model performs more reliably on estimating the initial mutant count than on estimating relative fitness, and inference accuracy exhibits sensitivity to the amount of history presented. These findings underscore both the promise and the current limitations of LLMs when applied to structured generative processes. They also highlight the importance of carefully designed prompts and the potential role of hybrid modeling strategies that combine symbolic simulations with language-based inference.

14:10
Predicting Healthcare-Associated Infections in Emergency Department Patients with Solid Tumor History

ABSTRACT. Developing healthcare-associated infections (HAI) during a hospital stay can lead to increased costs and negative outcomes such as in-hospital mortality, while posing risks for patients and healthcare workers. Individuals with a compromised immune system are considered to be particularly vulnerable because their likelihood of developing infection is higher and treatment is more challenging. Among them, patients with a medical history of solid tumors (ST) are considered a high-risk group for HAIs [5,6]. As a consequence of oncological treatments, the immune system is generally weakened; furthermore, frequent healthcare visits and prolonged stays in the emergency department (ED) further increase their susceptibility to HAIs. Thus, in the ED, the timely identification of high-risk patients can help direct preventive measures and improve outcomes. Earlier studies have examined general risk factors and prediction tasks for HAI, but HAI in the ST subgroup has not been studied in detail [1,3,4,7].

In our work, we assessed whether a medical history of ST is associated with a higher risk of HAI. Using a comorbidity extraction algorithm, we identified patients with an ST medical history from free-text patient reports written in Italian. We then compared patients with an ST medical history to other patients and highlighted predictors linked to infection risk. By analyzing comorbidity information, we aim to provide a more detailed understanding of infection risks in immunocompromised patients.

For comorbidity extraction, we evaluated a rule-based approach and an LLM-based approach (using ChatGPT-4 and LLaMA-3-8B, with and without preprocessing). Using 200 annotated samples provided by ED physicians, we compared the approaches in terms of precision, recall, and F1-score. For the rule-based approach, we first removed negation expressions and then applied regular expression patterns (the rule set is available at https://github.com/aeyc/ComorbidityExtractionRegEx). The rule-based approach achieved the best overall performance, with the highest recall (90.65%) and F1-score (80.77%), and precision (72.83%) comparable to the highest precision obtained by ChatGPT-4 with preprocessing (75.64%). This comorbidity extraction algorithm was subsequently applied to define the ST group and to derive the Charlson Comorbidity Index, a well-established assessment tool for mortality prediction [2].
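Illustrative sketch only (the actual rule set is in the linked repository): the two stages described above, stripping negated mentions and then matching solid-tumor terms with regular expressions. The Italian patterns and negation cues below are simplified assumptions for the example, not the study's rules.

```python
import re

NEGATION = re.compile(r"\b(?:non|nega|assenza di|senza)\b[^.;,]*", re.IGNORECASE)
SOLID_TUMOR = re.compile(r"\b(?:neoplasia|carcinoma|tumore|adenocarcinoma)\b",
                         re.IGNORECASE)

def has_solid_tumor_history(report: str) -> bool:
    """Remove negated spans, then test for any solid-tumor mention."""
    cleaned = NEGATION.sub(" ", report)
    return bool(SOLID_TUMOR.search(cleaned))

print(has_solid_tumor_history("Pregresso carcinoma mammario, in follow-up."))   # True
print(has_solid_tumor_history("Nega neoplasia o patologie oncologiche."))       # False
```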

This retrospective study was conducted using data from the official registry of the Local Health Agency of Romagna, Forlì, covering the period between 1 January 2017 and 31 December 2022. A total of 37,263 patients aged >18 years with no missing data were hospitalized. Of these, 7,366 stayed in hospital less than 3 days, an insufficient time frame for HAI, and 1,094 were transferred to another acute hospital. The resulting cohort of 28,803 patients was divided into two main groups: 4,628 (16.1%) patients with ST medical history (ST group), and 24,175 (83.9%) patients without ST medical history (non-ST group). In the ST group, 499 (10.8%) patients developed HAI, a 2.3% higher rate than in the non-ST group.

We compared demographics, comorbidity profiles, National Early Warning Score, length of stay, age, gender, diagnosis code distributions, priority level of ED visit, trauma, and seasonal variables (year, month, season, COVID-19 period) between the ST and non-ST groups using odds ratios, 95% confidence intervals, and p-values. For descriptive analysis, Kaplan–Meier survival analysis of in-hospital mortality up to 30 days was performed for both groups. A Cox proportional hazards model was used to estimate the hazard ratio of in-hospital mortality for patients with an ST medical history who developed HAI.

Due to the imbalanced nature of the data, different balancing strategies were considered. Oversampling, which creates synthetic samples, can introduce bias in the medical domain. We therefore tested random undersampling with sampling ratios ranging from 0.1 to 1, in increments of 0.1. Ultimately, we selected a sampling ratio of 1 for the final model, meaning an equal number of majority-class samples was randomly selected to match the minority class size.

Logistic regression, random forest, extreme gradient boosting, and light gradient boosting models were then tested for predicting the development of HAI in the ST group. Precision, recall, F1-score, and AUC were evaluated for model selection. The selected model was combined with stepwise regression for feature selection.
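A hedged sketch of the modeling steps described in the two preceding paragraphs: random undersampling of the majority class at a 1:1 ratio, a logistic regression classifier, and selection of an operating cutoff from the ROC curve (here via Youden's J statistic; the authors' cutoff criterion is not stated). The features and data are placeholders, not the study cohort.

```python
import numpy as np
from imblearn.under_sampling import RandomUnderSampler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))                       # placeholder predictors
y = (rng.random(2000) < 0.11).astype(int)            # ~11% positive class (HAI)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
X_bal, y_bal = RandomUnderSampler(sampling_strategy=1.0,        # 1:1 ratio after undersampling
                                  random_state=0).fit_resample(X_tr, y_tr)

clf = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
proba = clf.predict_proba(X_te)[:, 1]
print("AUC:", round(roc_auc_score(y_te, proba), 3))

fpr, tpr, thresholds = roc_curve(y_te, proba)
best = np.argmax(tpr - fpr)                          # Youden's J as an example criterion
print("cutoff:", round(thresholds[best], 3),
      "sensitivity:", round(tpr[best], 3),
      "specificity:", round(1 - fpr[best], 3))
```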

Advanced age, male sex, and in-hospital mortality rates were higher in the ST group; no significant differences were observed in National Early Warning Score, length of stay, priority levels of visit, or seasonal variables. Trauma-related visits were more common in the non-ST group. Diseases of the respiratory system, circulatory system, digestive system, genitourinary system, and infectious and parasitic diseases were the most observed diagnosis categories; digestive system, genitourinary system, and infectious and parasitic diseases were more prevalent in the ST group. Except for connective tissue disease, hemiplegia, dementia, and AIDS, all comorbidities were more frequent in the ST group. Top comorbidities in the ST group were diabetes mellitus, chronic pulmonary disease, chronic kidney disease, and liver disease, which also showed the largest percentage difference compared to the non-ST group.

Kaplan–Meier survival analysis showed that the ST group had lower survival rates for 30-day in-hospital mortality. The largest differences were observed during the first 25 days.

The Cox model showed that patients with a history of ST who developed HAI faced a 73% higher risk of in-hospital death compared with patients without ST who did not develop HAI. Even those with ST who did not develop HAI had a 67% higher risk.

Patients in the ST group had a higher chance of developing HAI and showed stronger associations (in terms of p-values and differences) than those in the non-ST group. This was especially evident for HAI categories such as sepsis (with ICD9 code starting with 995.91), infections related to internal prosthetic devices (996.6), septicemia with specific organisms (038, 038.4), influenza with pneumonia (487.0), urinary tract infections (599.0), bronchopneumonia of unspecified organism (485), and unspecified septicemia or bacteremia (038.8, 790.7).

Logistic regression was selected as the main model, with an AUC of 0.694 ± 0.025. At the optimal cutoff point (0.489), sensitivity was 0.627 and specificity 0.671. Selected features were infectious and parasitic diseases, visit during the COVID-19 period, advanced age (80+), and diseases of the genitourinary and digestive systems.

This study highlights the elevated risk of developing HAI for patients visiting the ED with a ST medical history. Survival analysis confirms poorer outcomes in this group, underscoring the need for early identification of high-risk patients. Our predictive model, based on diagnosis, age, and COVID-19 parameters, achieved moderate performance, suggesting that early detection of HAI in ST patients could inform preventive strategies, optimize resource allocation, and ultimately improve patient outcomes. Future work should explore integration of such models into clinical decision support systems to support timely interventions.

14:30
Computational and Experimental Integration Reveals Multi-scale Effects of Peroxidase Mutations

ABSTRACT. Peroxidase enzymes are central to immune defense, where they generate reactive oxygen species that regulate both microbial killing and tissue development. To bridge molecular-level mutations with organismal immune and developmental outcomes, we applied multi-scale modeling to Drosophila melanogaster Curly Su (dMPO), a homolog of human myeloperoxidase. Using computational saturation mutagenesis, we assessed over 11,000 missense variants and identified destabilizing mutations, including G378W and W621R, that compromise protein stability. CRISPR-engineered flies carrying these mutations displayed profound phenotypic consequences including abnormal wing morphology, reduced lifespan, and altered immune competence. Transcriptomic profiling further revealed consistent disruption of metabolic and immune pathways, linking structural perturbations to systemic regulatory changes. Extending this framework to other human peroxidases highlights recurrent destabilizing mutations associated with immunodeficiency, autoimmunity, and cancer. Our integrative approach demonstrates how computational protein modeling, genome editing, and transcriptomic data can be unified into a multi-scale platform, connecting molecular destabilization of immune enzymes to organism-level phenotypes. This work provides a foundation for predictive modeling of immune responses across scales and for identifying pathogenic variants with translational relevance.
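A minimal sketch of the computational saturation mutagenesis step mentioned above: enumerating every possible single-residue missense substitution of a protein sequence so each variant can be passed to a stability predictor. The scoring model itself (e.g., a ΔΔG predictor) is outside the scope of this illustration, and the sequence shown is a placeholder.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def saturation_variants(sequence):
    """Yield variants in standard notation, e.g. 'G378W' (wild type, position, mutant)."""
    for pos, wild_type in enumerate(sequence, start=1):
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:
                yield f"{wild_type}{pos}{mutant}"

toy_seq = "MGW"   # placeholder; a full-length enzyme yields 19 x (sequence length) variants
print(list(saturation_variants(toy_seq))[:5])
```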

14:50
CoDER: A new framework for Consistent Drug Efficacy Ranking

ABSTRACT. Drug discovery is slow and expensive, often taking more than a decade for a single approved therapy. Drug repurposing provides a faster alternative by reusing existing compounds, but ranking candidates is complicated by tissue heterogeneity: a drug effective in one tissue may fail in another due to differences in gene expression and regulation. Existing ranking methods typically average across tissues or focus on a single context, masking this variability. We present CoDER (Consistent Drug Efficacy Ranking), a framework that finds drugs whose relative ranking is preserved across at least λ (lambda) tissues, where λ denotes the minimum number of tissues in which an intersection must occur. Formulated as a λ-valid path optimization, the problem is NP-hard, motivating heuristic methods. In a case study on Alzheimer’s disease, CoDER recovered consistent drug sequences including insulin, tamoxifen, and celecoxib.
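A minimal sketch of the consistency notion described above, not the CoDER algorithm: given per-tissue efficacy rankings, keep ordered drug pairs whose relative order is preserved in at least λ tissues. A full λ-valid path would then be assembled from these pairs; only the pairwise filtering step is shown, with toy rankings.

```python
from itertools import combinations

def consistent_pairs(rankings, lam):
    """rankings: list of per-tissue lists, best drug first. Returns (a, b) pairs with
    a ranked above b in at least `lam` tissues."""
    drugs = set().union(*rankings)
    pairs = []
    for a, b in combinations(sorted(drugs), 2):
        shared = [r for r in rankings if a in r and b in r]
        a_first = sum(r.index(a) < r.index(b) for r in shared)
        b_first = len(shared) - a_first
        if a_first >= lam:
            pairs.append((a, b))
        if b_first >= lam:
            pairs.append((b, a))
    return pairs

tissue_rankings = [["insulin", "tamoxifen", "celecoxib"],
                   ["insulin", "celecoxib", "tamoxifen"],
                   ["tamoxifen", "insulin", "celecoxib"]]
print(consistent_pairs(tissue_rankings, lam=2))
```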

15:10
On the Intractability of Evolutionary Dynamics in Layered Graphs

ABSTRACT. Understanding the computational complexity of ecological and evolutionary dynamics is crucial for delineating when fixation probabilities can be efficiently computed. However, ecological systems, such as those involving population structure, multiplayer games, or evolving topologies, often exhibit complex spatial, temporal, and combinatorial properties. Previous work has established that even simplified ecological dynamics lead to NP- and #P-complete problems, showing that no general closed-form formulas can exist for takeover probabilities in structured populations.

Building on this foundation, we investigate a restricted yet biologically motivated setting: a layered population structure represented by planar graphs, where connectivity is maintained by a single inter-layer edge. Despite the structural simplicity, we prove that the fixation problem, that is, deciding whether an invading species can eventually take over, remains NP-complete. Our reduction shows hardness persists even when out-degree is limited to two and invasion is subject to adjacency constraints.

This result sharpens the tractability boundary by demonstrating that intractability holds under stronger restrictions than previously known. Beyond theoretical interest, our work informs immunological and ecological modeling, where spatial structures are layered (e.g., tissue compartments, microbial strata). The results suggest that algorithmic barriers are inherent and highlight the need for approximation heuristics or simulation-based methods in applied settings.
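A simulation-based sketch motivated by the closing remark above: when exact fixation probabilities are intractable, a Monte Carlo estimate of Moran birth-death dynamics on a layered graph (here two path-graph layers joined by a single inter-layer edge) is a practical fallback. The graph construction and parameters are illustrative assumptions, not the paper's constructions.

```python
import random
import networkx as nx

def layered_graph(n_per_layer=10):
    """Two path-graph layers connected by a single inter-layer edge."""
    G = nx.Graph()
    for layer in (0, 1):
        nodes = [(layer, i) for i in range(n_per_layer)]
        G.add_edges_from(zip(nodes, nodes[1:]))
    G.add_edge((0, n_per_layer - 1), (1, 0))          # the single inter-layer edge
    return G

def fixation_probability(G, r=1.5, start=(0, 0), trials=500, seed=0):
    """Estimate the probability that a single mutant at `start` takes over under
    birth-death Moran dynamics with relative fitness r."""
    rng, nodes, fixed = random.Random(seed), list(G.nodes), 0
    for _ in range(trials):
        mutants = {start}
        while 0 < len(mutants) < len(nodes):
            weights = [r if v in mutants else 1.0 for v in nodes]
            parent = rng.choices(nodes, weights=weights)[0]    # fitness-weighted birth
            child = rng.choice(list(G.neighbors(parent)))      # offspring replaces a random neighbour
            (mutants.add if parent in mutants else mutants.discard)(child)
        fixed += len(mutants) == len(nodes)
    return fixed / trials

print(round(fixation_probability(layered_graph()), 3))
```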

16:00-17:00 Session 5: Session III
16:00
Regulatory Drift in Immune Aging: Enhancer Logic and Methylation-Based Predictors Across Species

ABSTRACT. Immunosenescence is a hallmark of aging, characterized by the progressive decline and dysregulation of immune function. It manifests through structural remodeling of lymphoid organs, reduced regenerative capacity, altered cytokine profiles, and impaired responses to pathogens and vaccines. These changes are driven by complex, multi-layered regulatory shifts across genomic, epigenomic, and transcriptional domains. Among these, DNA methylation has emerged as a robust biomarker of biological age, yet its application to immune-specific aging remains underexplored, particularly in the context of enhancer-mediated regulation and transcriptional resilience.

This project proposes a biologically interpretable framework for modeling age-linked methylation drift in immune tissues using enhancer-aware CpG annotation and transcription factor motif scanning. The approach emphasizes immune-relevant regulators such as FOXO3, NF-κB, STAT1, and IRF8, integrating cross-species conservation and motif density to prioritize CpG sites within enhancer regions. By combining regulatory genomics with interpretable machine learning, the goal is to uncover conserved logic underlying immune aging and build predictive models with translational utility.

Scientific Objectives:

Enhancer-aware CpG prioritization: Annotate CpG sites using ENCODE immune enhancer tracks, UCSC conservation scores, and motif enrichment analysis to identify regulatory hotspots linked to immune aging.

Interpretable modeling of methylation drift: Apply gradient boosting and deep learning frameworks with SHAP and integrated gradients to extract biologically meaningful patterns of age-related methylation change in T-cell and myeloid lineages.

Cross-species validation: Compare methylation drift signatures across human and murine datasets to assess evolutionary conservation and tissue specificity, leveraging resources such as Tabula Muris Senis and ImmGen.

Translational applications: Explore the utility of enhancer-linked CpG signatures in predicting vaccine responsiveness, inflammaging risk, and age-associated immune dysfunction.

Methodological Framework:

The study will leverage publicly available methylation datasets from GEO, MethAgingDB, and ImmGen, focusing on immune-relevant tissues such as peripheral blood mononuclear cells (PBMCs), spleen, lymph nodes, and sorted immune cell populations. CpG sites will be annotated using a modular pipeline that integrates enhancer context, transcription factor motif scanning, and cross-species conservation metrics. Motif libraries will include curated immune regulators, and enhancer maps will be derived from ENCODE and Roadmap Epigenomics datasets.

Machine learning models will be trained to predict biological age and immune resilience, with feature attribution methods used to identify key CpGs and regulatory motifs driving drift. Visualization tools will be developed to map methylation trajectories and enhancer logic across age cohorts. Cross-species comparisons will be performed using matched human and mouse datasets to identify conserved aging signatures and assess the generalizability of the models.
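A hedged sketch of the modeling component outlined above: a gradient-boosting regressor predicting age from CpG beta values, with SHAP used to attribute predictions to individual CpGs. The data shapes and the toy drift signal are placeholders, not drawn from the datasets listed above.

```python
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n_samples, n_cpgs = 300, 50
X = rng.uniform(0, 1, size=(n_samples, n_cpgs))        # CpG methylation beta values
age = 40 + 30 * X[:, 0] - 20 * X[:, 1] + rng.normal(0, 3, n_samples)  # toy drift signal

model = GradientBoostingRegressor(random_state=0).fit(X, age)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
cpg_importance = np.abs(shap_values).mean(axis=0)      # mean |SHAP| per CpG
top = np.argsort(cpg_importance)[::-1][:5]
print("top age-associated CpG indices:", top)
```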

Significance and Impact:

This work bridges computational epigenomics and systems immunology by offering a transparent, mechanistically grounded framework for immune aging clocks. It advances the field by integrating enhancer logic into methylation modeling, enabling the identification of regulatory drift patterns that are both interpretable and evolutionarily conserved. The resulting models have potential applications in personalized immunosenescence profiling, age-stratified vaccine design, and early detection of immune dysfunction.

By aligning computational rigor with biological relevance, this project contributes to the development of next-generation biomarkers for aging and immune health. It also opens pathways for collaborative validation in experimental immunology settings and integration with multi-omics platforms, including transcriptomics and proteomics. The framework is designed to be modular and reproducible, supporting future extensions into host-pathogen interaction modeling and immune repertoire analysis.

16:20
Multi-View Graph–Text Alignment for Immunology

ABSTRACT. Many components of the immune system can be represented as graphs, where cells, molecules, and pathways act as nodes, and their interactions form the edges. In this work, we present a framework that connects immune-related graphs with textual descriptions by aligning them in a shared embedding space. The key idea is to analyze graphs from different views. On the text side, we train a model with multi-view pooling to produce view-specific embeddings from literature, ontologies, or disease associations. On the graph side, we use a graph model with multiple projection heads to generate embeddings for each view. Finally, we integrate the two modalities using a CLIP-style contrastive objective, with orthogonality regularization to keep the views disentangled.
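A minimal sketch of the training objective described above, under our own assumptions about tensor shapes: view-specific graph and text embeddings are aligned with a CLIP-style symmetric contrastive loss, and an orthogonality penalty discourages the per-view embeddings from collapsing onto each other.

```python
import torch
import torch.nn.functional as F

def clip_loss(graph_emb, text_emb, temperature=0.07):
    """graph_emb, text_emb: (batch, dim) embeddings for matched graph-text pairs."""
    g = F.normalize(graph_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = g @ t.T / temperature
    targets = torch.arange(g.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))

def orthogonality_penalty(view_embs):
    """view_embs: (n_views, batch, dim). Penalize cosine similarity between views."""
    v = F.normalize(view_embs, dim=-1)
    n_views, penalty = v.size(0), 0.0
    for i in range(n_views):
        for j in range(i + 1, n_views):
            penalty = penalty + (v[i] * v[j]).sum(-1).abs().mean()
    return penalty / (n_views * (n_views - 1) / 2)

# toy usage: 3 views, batch of 8, 128-dim embeddings
g_views = torch.randn(3, 8, 128)
t_views = torch.randn(3, 8, 128)
loss = sum(clip_loss(g_views[k], t_views[k]) for k in range(3)) \
       + 0.1 * (orthogonality_penalty(g_views) + orthogonality_penalty(t_views))
print(loss.item())
```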

16:40
Multi-Agent LLM Frameworks for Multimodal Identification of Alzheimer’s Disease

ABSTRACT. Alzheimer’s disease (AD) poses major challenges for patients, caregivers, and healthcare systems. Its complexity and heterogeneity—spanning genetics, neuroimaging, biomarkers, and cognitive factors—make early and accurate diagnosis difficult. While biomedical data collection has advanced rapidly, existing AI approaches often rely on narrow or unimodal models with limited generalization and interpretability. At the same time, large language models (LLMs) show strong reasoning abilities but are not yet adapted to AD’s multimodal landscape. We propose a vision for using multi-agent LLM frameworks to address this gap. Unlike single-agent models, multi-agent systems enable specialized agents for different modalities to collaborate dynamically. Adaptive, task-driven topologies allow these agents to integrate heterogeneous signals and provide robust, interpretable diagnostic reasoning. This blueprint highlights the potential of multi-agent LLMs to advance AD identification and support clinical decision-making.