EfficientNet in Digital Twin-based Cardiac Arrest Prediction and Analysis
ABSTRACT. Cardiac arrest is one of the biggest global health problems, and early identification and management are key to improving patients' prognosis. In this paper, we propose a novel framework that combines an EfficientNet-based deep learning model with a digital twin system to improve the early detection and analysis of cardiac arrest. We use EfficientNet, with its compound scaling, to learn features from cardiovascular images. In parallel, the digital twin builds a realistic, individualized model of the patient's cardiovascular system from data received from Internet of Things (IoT) devices attached to the patient, supporting continuous assessment of the patient and of the impact of possible treatment plans. Our experiments show that the proposed system achieves high predictive accuracy while remaining efficient. Combining advanced techniques such as deep learning and digital twin (DT) technology opens the possibility of a proactive, individualized approach to predicting cardiac disease.
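To make "compound scaling" concrete, the sketch below implements the generic EfficientNet rule, which scales network depth, width, and input resolution together with a single coefficient phi as depth = alpha^phi, width = beta^phi, resolution = gamma^phi, with alpha * beta^2 * gamma^2 close to 2. The coefficients alpha = 1.2, beta = 1.1, gamma = 1.15 are those reported for EfficientNet-B0; the base values (16 layers, 32 channels, 224-pixel input) and the function name `compound_scale` are illustrative assumptions. This is not the configuration used in this work, and the released B1 to B7 models use tuned values that do not exactly follow these rounded numbers.

```python
import math

# Scaling coefficients reported for EfficientNet-B0 (found by grid search).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi, base_depth, base_width, base_resolution):
    """Scale depth, width and input resolution jointly with one coefficient phi."""
    depth = math.ceil(base_depth * ALPHA ** phi)        # number of layers
    width = round(base_width * BETA ** phi)             # channels per layer
    resolution = round(base_resolution * GAMMA ** phi)  # input image side in pixels
    # FLOPs grow roughly with depth * width^2 * resolution^2, i.e. about 2 ** phi here.
    flops_factor = (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi
    return depth, width, resolution, flops_factor

# Illustrative base network (hypothetical values, roughly B0-sized).
for phi in range(4):
    print(phi, compound_scale(phi, base_depth=16, base_width=32, base_resolution=224))
```

Because alpha * beta^2 * gamma^2 is about 1.92, each unit increase of phi roughly doubles the computational cost, which is what lets a single knob trade accuracy against efficiency.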
Improved Graph-Based Antibody-Aware Epitope Prediction with Protein Language Model-Based Embeddings
ABSTRACT. The accurate identification of B-cell epitopes is critical in antibody design, diagnostics, and immunotherapies. Many \textit{in silico} approaches have recently been proposed to predict epitopes, but they struggle, primarily because of the variable and conformational nature of epitopes. Deep learning-based approaches, however, have recently shown great promise for improving performance on the epitope prediction task. In this paper, we employ a graph convolutional network (GCN) coupled with pre-trained protein language model (PLM)-based embeddings for epitope prediction on a benchmark antibody-specific epitope prediction (AsEP) dataset. We explore different PLM-embedding methods for the epitope prediction task and show that the choice of PLM embeddings affects performance. Specifically, we find that using antibody-specific PLMs such as AntiBERTy for antibodies and general PLMs such as ProtTrans and ESM-2 for antigens improves epitope prediction performance, with an AUC-ROC of $0.65$, precision of $0.28$, and recall of $0.46$. The source code is available at: \url{https://github.com/mansoor181/walle-pp.git}.
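As a rough illustration of how a GCN combines a residue graph with PLM embeddings, the sketch below applies one standard Kipf-and-Welling propagation step, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), to per-residue feature vectors and then maps each residue to an epitope probability with a sigmoid. It covers the antigen side only and omits the antibody-aware part; the contact graph, the 3-dimensional stand-in "embeddings" (real ESM-2 or ProtTrans vectors are far larger), the weights, and the function names are all hypothetical and are not taken from the linked repository.

```python
import math

def normalized_adjacency(adj):
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2 of a contact graph."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # >= 1 thanks to the self-loops
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def gcn_layer(adj_norm, h, w):
    """One propagation step: each residue aggregates its neighbours' features."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(adj_norm, h), w)]

def epitope_scores(adj, residue_embeddings, w_hidden, w_out):
    """Per-residue epitope probabilities from a contact graph plus embeddings."""
    h = gcn_layer(normalized_adjacency(adj), residue_embeddings, w_hidden)
    logits = matmul(h, w_out)
    return [1.0 / (1.0 + math.exp(-row[0])) for row in logits]

# Toy antigen: 4 residues with a chain of contacts (hypothetical data and weights).
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
emb = [[0.2, 0.1, 0.0], [0.9, 0.4, 0.3], [0.8, 0.5, 0.1], [0.1, 0.0, 0.2]]
w_hidden = [[1.0, -0.5], [0.5, 1.0], [-0.2, 0.3]]
w_out = [[1.5], [-1.0]]
print(epitope_scores(adj, emb, w_hidden, w_out))
```

In a trained model the weights would be learned from labeled epitope residues; swapping one PLM for another changes only the `residue_embeddings` input, which is why the choice of embedding can be compared with the graph model held fixed.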
Enhancing Privacy Preservation and Reducing Analysis Time with Federated Transfer Learning in Digital Twins-based CT Scan Analysis
ABSTRACT. Integrating Digital Twin (DT) technology and Federated Learning (FL) is an emerging paradigm for biomedical image analysis, especially for Computed Tomography (CT) scans. Here, a digital twin is a digital representation of each clinic's or hospital's CT scanning systems and patient data. Traditional centralized data collection for training and supporting digital twin models is becoming increasingly unattractive because of privacy concerns. This paper proposes a novel Federated Transfer Learning (FTL) paradigm for Digital Twin-based CT scan analysis. FTL combines pre-trained models with knowledge transfer across hospitals, addressing challenges such as data privacy, limited computational resources, and data heterogeneity. The proposed architecture enables effective real-time interaction between cloud servers and digital twin-enabled CT scanning systems while ensuring patient data protection and security. We assess the performance of FTL using metrics such as convergence time, model accuracy, and computational efficiency on large-scale, heterogeneous CT scan datasets. The results indicate that FTL converges faster than both conventional federated learning and centralized learning. The method is well suited to medical diagnostic environments with non-IID data, offering reliability, flexibility, and security. This work can help improve decision-making in Digital Twin-based CT scan analysis using Federated Transfer Learning, and it opens possibilities for new applications in healthcare and medicine.
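The general idea of federated transfer learning can be sketched as follows, under loud assumptions: each hospital runs a frozen, pre-trained backbone locally, so only small feature vectors are used for training; each site fine-tunes a logistic-regression head on its own data; and a server combines the heads with sample-size-weighted federated averaging (FedAvg). FedAvg and the logistic head are stand-ins chosen for illustration, not the FTL protocol proposed in this paper, and the toy features, learning rate, and function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def local_update(head, data, lr=0.1, epochs=5):
    """Train only the classification head on one hospital's (features, label) pairs.

    `features` stand for outputs of a frozen, pre-trained backbone computed on site.
    """
    w = list(head)
    for _ in range(epochs):
        for features, label in data:
            grad = sigmoid(sum(wi * xi for wi, xi in zip(w, features))) - label
            w = [wi - lr * grad * xi for wi, xi in zip(w, features)]
    return w

def fed_avg(client_heads, client_sizes):
    """Sample-size-weighted average of the head parameters (FedAvg)."""
    total = sum(client_sizes)
    return [sum(h[k] * n for h, n in zip(client_heads, client_sizes)) / total
            for k in range(len(client_heads[0]))]

def run_rounds(hospitals, dim, rounds=10):
    head = [0.0] * dim
    for _ in range(rounds):
        # Only head parameters leave each site; raw images and features stay local.
        updates = [local_update(head, data) for data in hospitals]
        head = fed_avg(updates, [len(data) for data in hospitals])
    return head

# Toy features: [bias term, backbone score]; label 1 when the score is positive.
hospital_a = [([1.0, 2.0], 1), ([1.0, -2.0], 0)]
hospital_b = [([1.0, 1.5], 1), ([1.0, -1.0], 0), ([1.0, 3.0], 1)]
head = run_rounds([hospital_a, hospital_b], dim=2)
print(head)
```

Training only a small head on top of shared pre-trained features keeps per-round communication and local computation low, which is one plausible reason transfer-based federated schemes can converge faster than training a full model from scratch.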
Simulation of tumor evolution and metastasis at single-cell resolution
ABSTRACT. Cancer is a dynamic disease characterized by complex processes that lead to genetic heterogeneity and metastasis, both of which present considerable challenges for therapeutic intervention. A better understanding of the mechanisms that drive clonal expansion and cell migration is critical to improving prognosis and treatment. To study these mechanisms, computational methods applied to next-generation sequencing (NGS) data are used to reconstruct cell lineages and migration histories. As the capability of NGS technologies and the number of computational methods continue to increase, there is a growing need for versatile simulation tools that can both explore the impact of various mechanisms on tumor growth and metastasis and help evaluate the accuracy of inference methods under different conditions.
We present a new agent-based simulation framework for modeling the evolution of cancer cells and metastatic spread. The key strength of our simulator is the flexibility it provides to study a multitude of cancer mechanisms, from the role of different mutation processes to migration patterns, under conditions specific to a cohort or to individual patients. The simulator can be used to: (i) analyze the impact of mutation properties, selection parameters, and anatomy-specific migration rates on tumor growth dynamics and metastatic patterns, and (ii) evaluate and benchmark methods for phylogenetic reconstruction and migration history inference by generating raw or processed sequencing data from sampled cells. To meet specific experimental needs, users can mix and match models of somatic mutation (e.g., copy number aberrations, single-nucleotide variants), selection (e.g., driver genes, fitness-driven migration), and sequencing technology (e.g., single-cell vs. bulk, raw data vs. count and mutation data), using either built-in or custom models. To demonstrate the utility and potential of our simulator, we evaluate the accuracy of migration inference methods on data generated with parameters fit to a large cohort.
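For readers unfamiliar with agent-based tumor simulation, the sketch below shows the basic loop in its simplest form: every cell is an agent that may die, divide (with each daughter possibly acquiring a new mutation under an infinite-sites assumption), or stay quiescent, and may then migrate to another anatomical site at a per-step rate. It is a hypothetical toy, not the simulator described here: mutations are neutral, there is no selection or sequencing model, and the parameter names, default rates, and the "primary"/"liver" sites are illustrative assumptions.

```python
import random
from dataclasses import dataclass

@dataclass
class Cell:
    site: str                 # anatomical site the cell currently lives in
    mutations: tuple = ()     # IDs of (neutral) mutations carried by the lineage

def simulate(steps, seed=0, division_prob=0.3, death_prob=0.1,
             mutation_prob=0.5, migration_rates=None, max_cells=5000):
    """Minimal agent-based tumor growth with infinite-sites mutations and migration."""
    rng = random.Random(seed)
    if migration_rates is None:
        migration_rates = {"primary": {"liver": 0.01}}  # per-cell, per-step rates
    cells = [Cell("primary")]
    next_mutation = 0
    for _ in range(steps):
        offspring = []
        for cell in cells:
            r = rng.random()
            if r < death_prob:
                continue                          # cell dies
            if r < death_prob + division_prob:
                for _ in range(2):                # two daughter cells
                    muts = cell.mutations
                    if rng.random() < mutation_prob:
                        muts = muts + (next_mutation,)  # fresh ID per new mutation
                        next_mutation += 1
                    offspring.append(Cell(cell.site, muts))
            else:
                offspring.append(cell)            # quiescent this step
        for cell in offspring:                    # anatomy-specific migration
            for target, rate in migration_rates.get(cell.site, {}).items():
                if rng.random() < rate:
                    cell.site = target
                    break
        cells = offspring[:max_cells]             # crude cap to bound run time
        if not cells:
            break                                 # lineage went extinct
    return cells

cells = simulate(40, seed=1)
print(len(cells), sum(c.site == "liver" for c in cells))
```

Sampling cells from the returned population and reading off their mutation tuples and sites gives ground-truth lineages and migration events, which is the kind of data needed to benchmark phylogenetic and migration-history inference.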
Cancer Disease Classification with Sparse Neural Networks: An Information-Theoretic Approach
ABSTRACT. Machine learning is indispensable for biomedical data modeling and classification. Tasks involving large, high-dimensional datasets are nevertheless computationally intensive, and approximation methods are often sought to reduce the volume of raw data or the model size without losing substantial information embedded in the data. However, previous approximation methods have yielded mixed results and have yet to establish a clear framework linking feature selection and model sparsification. In this paper, we present an information-theoretic approach to cancer classification that addresses two prominent questions in data model approximation: how to identify a minimal set of critical features in cancer microarray data, and how to design sparse neural networks that are effective and efficient for cancer classification. Our study highlights a key connection between these two challenges.
In particular, we introduce a mutual information (MI)-based method to select a highly informative subset of genes from extensive microarray gene expression data. Each selected gene subset, up to two orders of magnitude smaller than the original gene set, outperforms the full dataset in cancer classification. Additionally, the MI-based method enables the design of sparsified neural networks that consistently maintain or even improve classification performance compared to fully connected networks. Our test results reveal that sparsified networks selectively retain connections to the critical genes identified by the MI-based filtering method, effectively ignoring contributions from irrelevant genes.
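A minimal sketch of MI-based gene filtering is given below: each gene's expression values are discretized, its mutual information with the class label is computed in bits, and the top-k genes are kept. The equal-width binning, the number of bins, the toy genes `GENE_A` to `GENE_C`, and the function names are assumptions made for illustration; the paper's actual estimator and selection procedure may differ.

```python
import math
from collections import Counter

def discretize(values, bins=3):
    """Equal-width binning of one gene's expression values across samples."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(xs, ys):
    """I(X; Y) in bits for two discrete sequences of equal length."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def select_genes(expression, labels, k, bins=3):
    """Rank genes by MI with the class label and keep the top k."""
    scores = {gene: mutual_information(discretize(vals, bins), labels)
              for gene, vals in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical toy data: 4 samples, 2 classes.
expression = {
    "GENE_A": [0.1, 0.2, 5.0, 5.1],  # tracks the label
    "GENE_B": [3.0, 3.0, 3.0, 3.0],  # constant, uninformative
    "GENE_C": [1.0, 2.0, 1.0, 2.0],  # varies, but independently of the label
}
labels = [0, 0, 1, 1]
print(select_genes(expression, labels, k=1))
```

With balanced binary labels, a gene that separates the classes perfectly carries 1 bit of information, while constant or label-independent genes score 0. The selected genes could then define which input connections a sparsified network keeps, in the spirit of the connection described above.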
Diversity and Distinctive Characteristics of the Global RNA Virome in Urban Environments
ABSTRACT. RNA viruses are the primary catalysts for infectious disease outbreaks, epidemics, and pandemics across multiple hosts, including humans. Yet the role of RNA viruses in urban areas remains largely unexplored. This study analyzed the metatranscriptomes of 3,326 urban samples from 102 cities in 31 countries, uncovering 54,945 RNA viral units, 77% of which were previously unknown. Two new phyla were discovered, expanding our understanding of RNA virome phylogenetic diversity. The research also supports the polyphyletic nature of Duplornaviricota and identifies 104 amino acid sites in RNA polymerase that affect virus replication and host interaction. A distinct biogeographical pattern of RNA viruses was observed, indicating potential transmission routes within cities. The study revealed interactions between RNA viruses and ESKAPE pathogens (Staphylococcus aureus, Klebsiella pneumoniae, etc.), highlighting urban areas as significant reservoirs for RNA viruses. These findings underscore the need for continuous surveillance and mapping of urban environments to track RNA virus prevalence and dynamics, which is crucial for public health.
The study of the effects of the full-scale Russian-Ukrainian war on the epidemiological situation and respiratory and intestinal diseases in the Kharkiv region of Ukraine from 2022 to 2024
ABSTRACT. In this work, the effects of the Russian-Ukrainian war on respiratory and intestinal infections are studied in detail in the Kharkiv region of Ukraine, one of the regions most heavily affected by the war. In particular, some parts of the Kharkiv region experienced active hostilities and were under the direct control of the Russian army for a prolonged period. We also examine the general epidemiological situation in the region over the past ten years, including the two years of the ongoing war. Particular attention is given to COVID-19, influenza, and salmonella infections.
ABSTRACT. The accumulation of large-scale genomic data, such as whole-genome RNA sequencing, copy number, and mutation profiles for tens of thousands of samples, together with screens of thousands of small molecules and other perturbagens, raises the question of how best to leverage partially overlapping datasets generated at different facilities. As research groups across the world continue to generate drug screens of variable size and quality, approaches that can learn from such partially overlapping experiments and improve the signal-to-noise ratio are becoming increasingly important. We present an application of a Bayesian group factor analysis model in which a drug-centric prior transfers information about drugs screened on the same samples across multiple datasets. We show that, overall, joint models leveraging partially overlapping pharmacogenomic datasets from the Broad and Sanger institutes can improve drug signature identification.
Evaluating $\delta$-Hyperbolicity of LLM Embeddings for Personalized Health Recommendations
ABSTRACT. The rapid advancement of large language models (LLMs) has enabled significant strides in various fields, including health recommendation systems. This paper introduces a novel approach to evaluating the effectiveness of LLM embeddings for personalized health recommendations by assessing their $\delta$-hyperbolicity. $\delta$-hyperbolicity, a measure derived from geometric group theory, quantifies how far a metric space deviates from a tree-like structure. By applying this measure to embeddings generated by LLMs, we aim to understand the underlying geometric properties of these embeddings and their implications for recommendation accuracy. Our results show that LLM embeddings exhibit varying degrees of hyperbolicity, offering insights into their structure and their suitability for personalized health recommendation tasks.
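The $\delta$ described above can be computed for a finite point set with Gromov's four-point condition: for every quadruple of points, form the three pairwise distance sums d(x,y)+d(z,w), d(x,z)+d(y,w), d(x,w)+d(y,z); $\delta$ is the largest value of (largest sum minus second-largest sum)/2, and it is 0 exactly for tree metrics. The sketch below assumes embeddings are compared with Euclidean distance and reports, optionally, the normalization 2$\delta$/diameter that is commonly used to compare datasets of different scale; the distance choice, the normalization, and the function names are assumptions rather than the paper's stated procedure.

```python
import itertools
import math

def delta_from_distances(d):
    """Four-point (Gromov) delta of a finite metric given as a distance matrix."""
    delta = 0.0
    for x, y, z, w in itertools.combinations(range(len(d)), 4):
        sums = sorted(
            [d[x][y] + d[z][w], d[x][z] + d[y][w], d[x][w] + d[y][z]],
            reverse=True,
        )
        delta = max(delta, (sums[0] - sums[1]) / 2.0)
    return delta

def delta_hyperbolicity(embeddings, relative=True):
    """Delta of embedding vectors under Euclidean distance (optionally 2*delta/diam)."""
    d = [[math.dist(p, q) for q in embeddings] for p in embeddings]
    delta = delta_from_distances(d)
    if not relative:
        return delta
    diameter = max(max(row) for row in d)
    return 2.0 * delta / diameter if diameter > 0 else 0.0

# Points on a line form a tree metric; the corners of a square do not.
print(delta_hyperbolicity([(0.0,), (1.0,), (3.0,), (6.0,)], relative=False))
print(delta_hyperbolicity([(0, 0), (1, 0), (1, 1), (0, 1)], relative=False))
```

The exhaustive loop is O(n^4) in the number of points, so for realistic numbers of LLM embeddings one would typically estimate $\delta$ on random subsamples.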
HLA class I escape drives the evolution of SARS-CoV-2 in the human population
ABSTRACT. SARS-CoV-2 evolution is shaped by human adaptive immunity: mutations that allow escape from the B-cell response confer a selective advantage on the virus and spread through the population. By contrast, the role of escape from the cytotoxic T lymphocyte (CTL) response remains controversial. Here, we study the origin and spread of SARS-CoV-2 variants that allow escape from presentation by HLA class I alleles common in human populations. We find that 35% of mutations characteristic of the variants of concern, and 34% of mutations that have reached high (>5%) frequencies in viral populations, facilitate escape of viral epitopes from presentation by HLA class I alleles. Mutations associated with escape from common HLA alleles reach higher frequencies than those that allow escape from less common HLA alleles, indicating that they are favored by selection. Moreover, viral escape mutations reach higher frequencies in countries where the causal HLA alleles are more frequent, indicating that viral escape is driven by the local genetic composition of the human host population. The observed escape cannot be explained by selection in immunosuppressed individuals alone and instead indicates selection over the course of acute SARS-CoV-2 infections. Together, these data indicate that CTL escape is a major driver of SARS-CoV-2 evolution and an epidemiological concern, and they reveal a novel facet of selection on this virus.