ICCABS 2026: THE 14TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL ADVANCES IN BIO AND MEDICAL SCIENCES
PROGRAM FOR SUNDAY, FEBRUARY 15TH

10:20-11:20 Session 3: ICCABS-I
Location: Room A/B
10:20
Benchmarking foundation models for splice site and exon annotation

ABSTRACT. Recent foundation and deep learning models have brought a generational leap in improving the quality of genome annotation, particularly for identifying genes and their structural elements, including exons and splice sites. However, they are trained on reduced data sets that may not capture biological complexity, such as coding versus non-coding, terminal versus internal exons, constitutive versus alternatively spliced exons, and transposable element-derived elements. We evaluate several foundation models for gene and splice site annotation, including the transformer-based SegmentNT, Enformer and Borzoi, coupled with a segmentation head for per-base resolution, and the CNN-based SpliceAI and AlphaGenome, along with a newly developed fine-tuned expert model, on different classes of gene elements as described above. We found that the performance of all methods is highest for the classes of exons represented in their training data and decreases drastically for poorly represented classes. In particular, performance is highest for protein-coding genes, coding exons, and constitutive exons, and decreases drastically by up to 2-3 fold for non-coding internal exons, terminal exons, and exons that undergo alternative splicing. Similarly, performance is impaired on LINE-1 and Alu-derived exons. In contrast, a locally developed expert CNN fine-tuned on a comprehensive dataset showed improved performance across multiple categories. Our study highlights the outstanding challenges in gene and exon annotation when leveraging powerful foundation models, and the need for further fine-tuning on judiciously selected classes of data or task-specific models to capture a broader, more diverse spectrum of gene features.

10:40
Cross-Attention Transformer for Prostate Cancer Grading with LLM-Guided Histopathology Descriptions

ABSTRACT. Accurate prostate cancer grading from whole slide images (WSIs) is clinically important yet challenging due to heterogeneous tissue morphology, weak slide-level supervision, and limited interpretability in current deep learning systems. Existing MIL-based grading models rely exclusively on visual patch features and do not incorporate the Gleason semantics that define gland structure and tumor progression. We propose a cross-modal transformer framework that performs prostate grading by aligning slide-level visual embeddings with automated textual descriptions generated by a frozen large language model (LLM). For each Gleason grade, the LLM provides descriptive prompts that summarize characteristic glandular patterns, enabling semantic supervision without expert-written annotations. Refined patch embeddings are aggregated into a slide representation using a convex attention mechanism, while a cross-attention transformer conditions class-specific text tokens on relevant visual regions to produce slide-aware textual embeddings. Classification is performed using cosine similarity between normalized slide and text embeddings in a shared latent space, improving weakly supervised learning and interpretability. Experiments on prostate WSIs demonstrate competitive performance relative to state-of-the-art MIL approaches, while offering clinically meaningful insights through automated pathology semantics.
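
The final classification step described above (cosine similarity between normalized slide and text embeddings) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the class labels, and the toy two-dimensional vectors are all assumptions made for illustration, and a real system would obtain the embeddings from the trained model.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def classify(slide_embedding, text_embeddings):
    """Pick the label whose text embedding has the highest cosine
    similarity with the slide embedding.

    text_embeddings maps a (hypothetical) class label to its vector.
    Returns (best_label, {label: cosine_similarity}).
    """
    slide = l2_normalize(slide_embedding)
    scores = {}
    for label, text_vec in text_embeddings.items():
        text = l2_normalize(text_vec)
        # The dot product of two unit vectors is their cosine similarity.
        scores[label] = sum(s * t for s, t in zip(slide, text))
    best = max(scores, key=scores.get)
    return best, scores

# Toy usage with made-up embeddings and labels.
label, scores = classify(
    [1.0, 0.0],
    {"pattern_a": [0.9, 0.1], "pattern_b": [0.0, 1.0]},
)
```

Because both vectors are normalized first, the score depends only on direction in the shared latent space, not on embedding magnitude.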

11:00
NeuroGAN-3D: Enhancing Intrinsic Functional Brain Networks via High-Fidelity 3D Generative Super-Resolution

ABSTRACT. Recent advances in neuroimaging have deepened our understanding of the brain’s complex functional and structural organization. Among these, functional Magnetic Resonance Imaging (fMRI) – particularly resting-state fMRI (rs-fMRI) – has emerged as a critical tool for identifying biomarkers of intrinsic brain connectivity and delineating large-scale neural networks. These networks are typically represented as volumetric spatial maps that capture functionally coherent brain regions and reflect individual differences in brain activity and structure. The spatial resolution of these maps plays a crucial role, as it determines the ability to localize functional units with precision, perform reliable brain parcellation, and detect subtle, spatially specific neurobiological alterations associated with development, aging, or disease. Therefore, improving the effective resolution of neuroimaging-derived maps holds significant promise for enabling more detailed insights into brain architecture and its relationship to behavior and pathology. To address this need, we propose NeuroGAN-3D, a novel 3D generative super-resolution model tailored to the computational demands of volumetric neuroimaging. Our model leverages a generative adversarial network architecture to enhance the spatial resolution of rs-fMRI spatial maps, significantly outperforming a conventional baseline

11:20-11:40 Coffee Break
11:40-13:00 Session 4A: CASCODA-I
Location: Room C/D
11:40
Methods for Spatial Analysis of Immune Cell Interaction in Breast Cancer

ABSTRACT. NA

12:00
Noise reduction methods for single cell RNA and VDJ sequencing data

ABSTRACT. Single-cell sequencing data is prone to noise such as ambient contamination and multiplets (doublets, triplets, etc.). Tools like CellBender for ambient RNA removal and scDblFinder for multiplet removal are very effective at de-noising scRNA-Seq data. However, no such tools have been developed for de-noising scVDJ-Seq data (T-cell and/or B-cell receptor sequencing data). In this work, we show that high levels of noise are present in some datasets and present preliminary computational methods for reducing this noise, which is a confounding factor in downstream analyses.

12:20
Diversity and Distinctive Characteristics of the Global RNA Virome in Urban & Peri-urban Environments

ABSTRACT. RNA viruses represent an integral component of human-associated environments and human health. However, the ecology of environmental RNA viruses remains largely unexplored. Here, we analyzed 2,922 metatranscriptomic samples collected from urban and surrounding environments across 102 cities in 31 countries, and constructed the Urban & Peri-urban RNA Virus Atlas (UPVAtlas), comprising 54,945 RNA viruses, 77% of which had not been previously observed. Phylogenetic reconstruction and directional selection analyses based on RNA-dependent RNA polymerases from UPVAtlas greatly expanded the known evolutionary diversity and features of RNA viruses, leading to the identification of two potential candidate phyla, one candidate class, and several unclassified clades, as well as providing additional insights into the multiple origins of double-stranded RNA viruses. Host association analyses further revealed the ecological complexity of environmental RNA viruses, with the diversity of vertebrate-related and ESKAPE pathogen-related viruses underscoring the importance of continued monitoring and mapping of urban environments for tracking RNA viral prevalence and dynamics, with direct relevance to future public health.

12:40
Inferring mutational order via a novel dependency graph-based approach

ABSTRACT. Accurate prediction of tumor evolution and the sequential appearance of somatic mutations can improve our understanding of cancer progression. In this study, we compare a Recurrent Neural Network (RNN) model based on Long Short-Term Memory (LSTM) against a classical time-series model, Autoregressive Integrated Moving Average (ARIMA), for predicting the presence of a mutation in a specific gene in a colon adenocarcinoma sample based on a sequence of temporally-ordered mutations. For gene order inference, we propose a novel approach creating a dependency directed acyclic graph from which we infer local and global mutational gene orders and validate the proposed gene order using colon cancer mutational profiles.

11:40-13:00 Session 4B: CAMeRA-I
Location: Room A/B
11:40
AmpliconHunter2: a SIMD-Accelerated In-Silico PCR Engine

ABSTRACT. In this talk, we present AmpliconHunter2 (AHv2), a highly scalable in silico PCR engine written in C that can handle degenerate primers and uses a highly accurate melting temperature model. AHv2 implements a bit-mask IUPAC matcher with AVX2 SIMD acceleration, supports user-specified mismatches and 3’ clamp constraints, calls amplicons in all four primer pair orientations (FR/RF/FF/RR), and optionally trims primers and extracts fixed-length flanking barcodes into FASTA headers. The pipeline packs FASTA into 2-bit batches, streams them in 16 MB chunks, writes amplicons to per-thread temp files and concatenates outputs, minimizing peak RSS during amplicon finding. We also summarize updates to the Python reference (AHv1.1).

AmpliconHunter2 is freely available as a webserver at: https://ah2.uconn.engr.edu. Source code is available at: https://github.com/rhowardstone/AmpliconHunter2 under an MIT license. AHv2 was implemented in C; AHv1.1 was implemented in Python 3 with Hyperscan.
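
To illustrate the idea behind a bit-mask IUPAC matcher with user-specified mismatches and a 3' clamp, here is a minimal scalar sketch in Python. It is not AmpliconHunter2's code (which is written in C with AVX2): the function name, parameters, and the interpretation of the clamp as "the last `clamp` primer positions must match exactly" are assumptions for illustration only.

```python
# Each concrete base gets one bit; an IUPAC code is the OR of the bases it allows.
BASE_BITS = {"A": 1, "C": 2, "G": 4, "T": 8}
IUPAC_BITS = {
    "A": 1, "C": 2, "G": 4, "T": 8,
    "R": 1 | 4, "Y": 2 | 8, "S": 2 | 4, "W": 1 | 8, "K": 4 | 8, "M": 1 | 2,
    "B": 2 | 4 | 8, "D": 1 | 4 | 8, "H": 1 | 2 | 8, "V": 1 | 2 | 4,
    "N": 1 | 2 | 4 | 8,
}

def find_primer_sites(template, primer, max_mismatches=0, clamp=0):
    """Return start offsets where `primer` (IUPAC, 5'->3') matches `template`.

    A template base matches a primer position when its bit is set in the
    primer code's mask. Up to `max_mismatches` mismatches are tolerated,
    but none within the last `clamp` primer positions (assumed 3' clamp
    semantics). Unknown template characters never match.
    """
    masks = [IUPAC_BITS[c] for c in primer.upper()]
    seq = [BASE_BITS.get(c, 0) for c in template.upper()]
    m = len(masks)
    hits = []
    for start in range(len(seq) - m + 1):
        mismatches = 0
        ok = True
        for j, mask in enumerate(masks):
            if seq[start + j] & mask:
                continue  # base is allowed by the degenerate code
            mismatches += 1
            if j >= m - clamp or mismatches > max_mismatches:
                ok = False
                break
        if ok:
            hits.append(start)
    return hits

# Toy usage: R (A or G) matches the leading A at both sites.
sites = find_primer_sites("ACGTACGT", "RCGT")
```

The single AND per position is what makes this representation amenable to SIMD: many positions can be tested in one vector instruction.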

12:00
All Edges Lead to Nodes: Can Multiple Methods Lead to the Same Hi-C Contact Maps?

ABSTRACT. Hi-C (High-Throughput Chromosome Conformation Capture) sequencing has emerged as a valuable approach for understanding the dynamics of antimicrobial resistance in bacterial populations. In environmental microbiology, valuable samples are frequently collected at remote sites and transported to laboratories for sequencing. Hi-C relies on the preparation of two libraries per sample: one capturing the crosslinked connections between microbes and resistance plasmids, and one providing shotgun metagenomic data used as input to MAG (metagenome-assembled genome) assembly. These Hi-C experiments rely on short-read sequencing, and producing sufficient read data is often more expensive than in standard metagenomics because of the need for two distinct run types. To reduce the barriers to using the Hi-C approach with environmental samples, we tested two experimental variations. First, is the Proximeta protocol tolerant of starting material collected and preserved on electronegative filters (HA filtration) rather than a fresh pellet as recommended by the protocol? The ability to use a filtered product that is common in applications like wastewater epidemiology would decrease logistical barriers to sample collection and distribution. Second, will long-read Oxford Nanopore (ONT) shotgun sequencing used in place of Illumina sequencing produce superior results? The flexibility to replace the metagenomic component of the sequencing protocol with ONT has the potential to reduce costs and make Hi-C more accessible to labs in the environmental genomics space. Additionally, what adjustments are needed to make these analyses work? The necessary read depth of the ONT and Illumina sequences was tested using rarefaction curves. The quality of MAGs will also be compared between both Hi-C approaches. Preliminary analysis using both Kraken and MetaPhlAn shows no significant difference in detected alpha diversity as a result of changes in either sampling or sequencing methods. Sufficient depth was identified to capture most of the community with multiple taxonomic techniques for both ONT and Illumina. Future work will benchmark these methods using Hi-C derived comparisons of the communities to determine consistency and potential interoperability.

12:20
Branch lengths inference and its application in beta diversity computation

ABSTRACT. Distance-guided tree construction with unknown tree topology and branch lengths has been a long studied problem. In contrast, distance-guided branch lengths assignment with fixed tree topology has not yet been systematically investigated, despite having significant applications. In this paper, we provide a formal mathematical formulation of this problem and propose two representative methods for solving this problem, each with its own strength. We evaluate the performance of these two methods under various settings using simulated data, providing guidance for the choice of methods in respective cases. We demonstrate a practical application of this operation through an extension we termed FunUniFrac, which quantifies the differences in functional units between metagenomic samples over a functional tree with assigned branch lengths, allowing clustering of metagenomic samples by functional similarity instead of taxonomic similarity in traditional methods, thus expanding the realm of comparative studies in metagenomics.

12:40
Foundation Models for Gut Microbiome Representation Learning

ABSTRACT. The human gut microbiome is strongly linked to health and disease, yet modeling metagenomic data remains challenging due to high dimensionality, sparsity, and substantial cross-study heterogeneity. Recent work has proposed transformer-based foundation models for microbiome data, but their advantages over simpler baselines are not well established. In this work, we investigate whether foundation model architectures adapted from single-cell transcriptomics can learn meaningful representations for gut microbiome data. Using the Human Microbiome Compendium, we characterize dataset quality and identify pronounced batch effects driven by study-specific confounders. We evaluate pretrained and non-pretrained transformer models on multiple downstream prediction tasks and compare them against XGBoost baselines. While pretraining yields modest and task-dependent improvements, tree-based models consistently outperform transformers. Our results suggest that current microbiome foundation models are limited by data sparsity and shortcut learning, highlighting the need for evaluation protocols that emphasize cross-study generalization and biologically meaningful structure.

13:00-14:00 Lunch Break (lunch provided)
15:10-15:30 Coffee Break
15:30-17:10 Session 6A: CASCODA-II
Location: Room C/D
15:30
Computational Inference of CRC Cancer Gene Mutation Order

ABSTRACT. The ability to understand the temporal order in which driver gene mutations accumulate is essential for modeling cancer progression and improving diagnostic strategies. In colorectal cancer, tumor development is described as a multistep evolutionary process. However, mutation ordering can vary across samples, leading to different gene mutation orders. This thesis proposes the use of a graph-based approach for finding the mutation order relationships between driver genes using binary mutation data. Given a gene-by-sample mutation dataset, directed edges are derived based on statistical support computed across 200 bootstrap datasets. Cycles present in the resulting directed graph are removed using multiple support threshold strategies, yielding a Directed Acyclic Graph (DAG). The longest path in the DAG is then computed to identify the most likely sequence of gene mutations involved in tumor progression. The derived mutation order is compared against several alternative gene orderings, including a mutation frequency-based gene order, the gene order derived using an established order score method, the gene order from the densest graph, and a gene order generated by a generative AI model. The results demonstrated that the proposed DAG approach captures meaningful mutation relationships while supporting diverse tumor evolutionary patterns.
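
The last two steps described above (thresholding edge support, then taking the longest path in the resulting DAG) can be sketched as follows. This is an illustrative sketch, not the thesis's implementation: the function names, the single fixed threshold, the gene names, and the support values are all hypothetical, and the sketch simply raises an error if the thresholded graph still contains a cycle rather than applying a cycle-removal strategy.

```python
from collections import defaultdict, deque

def filter_edges(support, threshold):
    """Keep directed edges whose (hypothetical) bootstrap support meets the threshold."""
    return [edge for edge, s in support.items() if s >= threshold]

def longest_path(edges):
    """Return one longest path (as a list of nodes) in a DAG given as (u, v) edges."""
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))
    if not nodes:
        return []
    # Kahn's algorithm: repeatedly emit nodes with no remaining incoming edges.
    queue = deque(sorted(n for n in nodes if indeg[n] == 0))
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph still contains a cycle")
    # Dynamic programming over the topological order.
    best = {n: 1 for n in nodes}     # number of nodes on the longest path ending at n
    prev = {n: None for n in nodes}  # predecessor on that path
    for u in order:
        for v in succ[u]:
            if best[u] + 1 > best[v]:
                best[v] = best[u] + 1
                prev[v] = u
    node = max(order, key=lambda n: best[n])
    path = []
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]

# Toy usage with made-up support values (not results from the study).
toy_support = {
    ("APC", "KRAS"): 0.9, ("KRAS", "TP53"): 0.8, ("APC", "TP53"): 0.7,
    ("TP53", "APC"): 0.2, ("TP53", "SMAD4"): 0.6,
}
order = longest_path(filter_edges(toy_support, threshold=0.5))
```

In this toy input the low-support edge TP53 -> APC is the only one that would create a cycle, so a single threshold is enough to obtain a DAG.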

15:50
Manifold learning reveals cell cycle plasticity underlying fractional resistance to palbociclib in ER+/HER2− breast tumor cells

ABSTRACT. The CDK4/6 inhibitor palbociclib blocks cell cycle progression in estrogen receptor–positive, human epidermal growth factor receptor 2–negative (ER+/HER2−) breast tumor cells. Despite the drug’s success in improving patient outcomes, a small percentage of tumor cells continues to divide in the presence of palbociclib, a phenomenon we refer to as fractional resistance. It is critical to understand the cellular mechanisms underlying fractional resistance because the precise percentage of resistant cells in patient tissue is a strong predictor of clinical outcomes. Here, we hypothesize that fractional resistance arises from cell-to-cell differences in core cell cycle regulators that allow a subset of cells to escape CDK4/6 inhibitor therapy. We used multiplex, single-cell imaging to identify fractionally resistant cells in both cultured and primary breast tumor samples resected from patients. Resistant cells showed premature accumulation of multiple G1 regulators including E2F1, retinoblastoma protein, and CDK2, as well as enhanced sensitivity to pharmacological inhibition of CDK2 activity. Using trajectory inference approaches, we show how plasticity among cell cycle regulators gives rise to alternate cell cycle “paths” that allow individual tumor cells to escape palbociclib treatment. In additional work, we posit that spherical manifold approximations represent these single-cell populations, suggesting that significant changes in latent lower dimensional manifold structures correspond to distinct cell cycle behaviors. Leveraging an existing manifold approximation method, we fit single-cell data to a hypersphere and establish an empirical hypothesis testing framework to quantify differences in these spheres across conditions. Our model-agnostic approach enables the direct quantification of the effect of single-cell perturbations, treatments, or differences between patient tumors, revealing cellular behaviors in a novel paradigm. Understanding drivers of cell cycle plasticity, and how to eliminate resistant cell cycle paths, could lead to improved cancer therapies targeting fractionally resistant cells to improve patient outcomes.

16:10
A Comprehensive Benchmark of Discrepancies Across Microbial Genome Reference Databases

ABSTRACT. Metagenomic profiling of microbial communities relies heavily on the quality and completeness of reference genomes. However, the reliability of these analyses is often compromised by substantial discrepancies across existing genomic resources, including variations in assembly quality, fragmentation, and taxonomic annotation. While these inconsistencies are known to introduce bias, the extent of divergence between major databases remains largely unquantified. Here, we present a comprehensive benchmark of discrepancies across multiple widely used microbial genome reference resources. To facilitate this evaluation, we developed the Cross-DB Genomic Comparator (CDGC), a framework that encodes genome alignments to capture base-level matches, insertions, and deletions, enabling precise and reproducible quantification of genomic similarity. Our benchmarking revealed significant inconsistencies dependent on domain and assembly quality. While 99% of viral genomes were identical across databases, fungal genomes showed greater variability, with 96.81% reaching at least 90% similarity. In bacteria, discrepancies were strongly linked to fragmentation: while 86% of single-contig genomes matched perfectly, this dropped to just 41% for genomes with more than 10 contigs, with many displaying partial or no similarity. Our findings quantify the severe impact of assembly fragmentation on database consistency and emphasize the critical need to consolidate fragmented contigs to ensure reproducible metagenomic analyses.

16:30
Compactly Representing a set of Tumor Phylogenies

ABSTRACT. TBA

16:50
Cell-to-Cell Communication Analysis in Skin Cancer Using Cell Chat

ABSTRACT. Skin cancer is the most prevalent cancer in the United States, and its development is known to depend on interactions among tumor, immune, and stromal cells within the tumor microenvironment (TME). Single-cell RNA sequencing (scRNA-seq) has made cell-level expression analysis of these cells possible. However, very little is known about how therapeutic interventions reshape cell–cell communication networks at the signaling level. Treatments such as RTX (resiniferatoxin) and 6OHDA (6-hydroxydopamine) have different effects on communication patterns in tumor-infiltrating lymphocytes (TILs) versus tumor-draining lymph nodes (TDLNs).

In this study, we examine the intercellular communication networks between groups of cells under Control, RTX, and 6OHDA conditions using single-cell RNA sequencing data from mouse skin cancer tissues obtained with the 10x Genomics Chromium platform.

Using this framework, we can identify key sender and receiver cell populations, dominant signaling pathways, and treatment-driven changes in communication patterns within the TME. Focusing on intercellular signaling rather than gene expression alone can provide a system-level view of how therapies modulate immune and tumor communication in skin cancer. Overall, this work demonstrates the use of CellChat for comparative cell–cell communication analysis and highlights the importance of signaling network inference for understanding therapeutic response in cancer.

15:30-17:10 Session 6B: CAME-I
Location: Room A/B
15:30
Predicting clonal progression in cancers with blood-based genomics

ABSTRACT. This talk will describe efforts to better predict clonal evolution from analysis of blood-based genomic data, particularly circulating tumor DNA (ctDNA), which has emerged as an important tool for bringing genomics to the clinic in cancer care. We will examine the problem of ctDNA signal and its interpretation through mechanistic computational models. We will then explore statistical and machine learning approaches to leverage this signal to better predict disease progression outcomes. We will conclude with work in progress on using ctDNA signal for real-time tracking of clonal progression and the prospects of leveraging this to further improve clinical decision-making with blood-based genomics.

15:50
Identification of ADAR editing signatures across bacterial, viral, and fungal infections

ABSTRACT. RNA editing through adenosine deaminases acting on RNA (ADARs), A-to-I editing, introduces transcriptomic diversity, allowing for adaptive and dynamic responses to internal and external factors. ADAR editing plays a critical role in neurodevelopment, central nervous system regulation, and responses to infections. Deviations in editing have been observed in various diseases, including Parkinson’s and congenital Zika syndrome. Despite this, it is unknown whether editing within a diseased state is wholly stochastic or specific to a disease. To address this question, an exploratory application of Benford’s Law was used to collect condition-specific editing signatures. We analyzed patterns of ADAR editing from the whole blood transcriptomes of patients infected with Gram-negative bacteria Rickettsia or Leptospira, a fungus Candida, and ssRNA Dengue virus. Our in-silico results showed that the ADAR1 and ADAR3 genes were differentially expressed in the Rickettsia versus Candida comparison, and the ADAR1 and ADAR2 genes in the Rickettsia versus Dengue comparison. Global editing levels differed significantly between all cohorts (p < 0.05), and significant differentially edited sites were identified. Condition-specific editing signatures composed of 10 sites were collected through Benford’s Law and validated using the K-nearest neighbors machine learning algorithm. Balanced accuracy results of the models ranged from 63% for Dengue to 75% for Candida. In addition to the collected editing signatures, differences in enriched gene ontology pathways harboring editing targets, particularly between bacterial infections, further suggest that ADAR editing is pathogen-specific. Notably, pathways relevant to neurologic conditions, such as Huntington’s, Parkinson’s and Alzheimer’s diseases, were found to be enriched by edited genes in infections. Shifts in ADAR editing within such pathways may contribute to clinical symptoms during and post-infection, although experimental validation is needed. Ultimately, our findings suggest that ADAR editing is condition-specific and that the differences in editing landscapes may contribute to the development of differing symptoms and post-infection health outcomes.
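
For readers unfamiliar with Benford's Law, it states that in many naturally occurring datasets the leading digit d appears with probability log10(1 + 1/d). The sketch below computes that expected distribution and a simple deviation score for a list of positive values. It does not reproduce the authors' signature-collection procedure, which the abstract does not detail; the function names and the choice of mean absolute deviation as the score are assumptions for illustration.

```python
import math
from collections import Counter

def benford_expected():
    """Expected leading-digit probabilities under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(x):
    """First significant decimal digit of a nonzero number."""
    x = abs(x)
    if x == 0:
        raise ValueError("zero has no leading digit")
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

def benford_deviation(values):
    """Mean absolute difference between observed and expected digit frequencies."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    counts = Counter(leading_digit(v) for v in values)
    expected = benford_expected()
    n = len(values)
    return sum(abs(counts.get(d, 0) / n - expected[d]) for d in range(1, 10)) / 9

# Toy usage: powers of two are a classic Benford-like sequence.
score = benford_deviation(2 ** k for k in range(100))
```

A small score indicates leading digits close to the Benford distribution; a dataset of values that all start with the same digit would score much higher.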

16:10
Embedding-based methodology for Balanced Minimum Evolution phylogenetic inference

ABSTRACT. N/A

16:30
Benchmarking Deep Generative Models for Antibody Design and Optimization: A Systematic Evaluation

ABSTRACT. Deep generative models have emerged as powerful tools for computational antibody design. However, the rapid proliferation of methods, spanning diffusion models, flow matching, graph neural networks, and protein language models, has limited our ability to compare them fairly. Existing evaluations rely on inconsistent datasets, incompatible metrics, and isolated tasks, making it difficult to assess true progress in the field. Therefore, we present a systematic benchmark evaluating recent methods across standardized tasks, including CDR-H3 design, full variable region generation, and binding affinity optimization. We assess structural quality, sequence recovery, binding affinity, novelty, and computational efficiency under controlled conditions to evaluate their performance and identify systematic failure modes. We release our evaluation pipeline to enable reproducible comparison of future methods and identify the most promising directions for future research in AI-driven antibody engineering.

16:50
Uncovering Hierarchical Structure in LLM Embeddings with $\delta$-Hyperbolicity, Ultrametricity, and Neighbor Joining

ABSTRACT. The rapid advancement of large language models (LLMs) has enabled significant strides in various fields. This paper introduces a novel approach to evaluate the effectiveness of LLM embeddings through comprehensive geometric and topological analysis. We investigate the structural properties of these embeddings through six complementary metrics: $\delta$-hyperbolicity and ultrametricity quantify tree-like hierarchical structure; Neighbor Joining assesses algorithmic tree reconstruction quality; Ollivier-Ricci curvature captures local geometric properties; persistent homology measures topological stability across scales; and fractal dimension estimates intrinsic complexity. By analyzing the embeddings generated by LLMs using these metrics, we uncover the extent to which the embedding space reflects underlying hierarchical, geometric, and topological organization.
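
As an illustration of the first metric, $\delta$-hyperbolicity can be computed from a pairwise distance matrix with the four-point condition: for every quadruple of points, form the three pairwise-sum pairings and take half the gap between the two largest; $\delta$ is the maximum over all quadruples, and it is 0 for a tree metric. The brute-force sketch below is an assumption-laden illustration rather than the paper's code: the function name is hypothetical, conventions for $\delta$ differ by constant factors across the literature, and the O(n^4) loop is practical only for small samples of embeddings.

```python
from itertools import combinations

def four_point_delta(dist):
    """Four-point delta-hyperbolicity of a finite metric.

    dist is a symmetric n x n matrix (list of lists) of pairwise distances.
    For each quadruple the three pairings of summed distances are sorted,
    and half the difference between the two largest is recorded.
    """
    delta = 0.0
    for x, y, z, w in combinations(range(len(dist)), 4):
        sums = sorted((
            dist[x][y] + dist[z][w],
            dist[x][z] + dist[y][w],
            dist[x][w] + dist[y][z],
        ))
        delta = max(delta, (sums[2] - sums[1]) / 2)
    return delta

# Toy usage: points on a line form a tree metric (delta = 0),
# while the shortest-path metric of a 4-cycle does not (delta = 1).
line = [[abs(i - j) for j in range(5)] for i in range(5)]
cycle = [[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]]
line_delta = four_point_delta(line)
cycle_delta = four_point_delta(cycle)
```

Smaller values relative to the diameter of the point set indicate a more tree-like, and therefore more hierarchical, geometry.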