ISBRA 2023: THE 19TH INTERNATIONAL SYMPOSIUM ON BIOINFORMATICS RESEARCH AND APPLICATIONS
PROGRAM FOR WEDNESDAY, OCTOBER 11TH

09:00-10:00 Session 6: Keynote Talk.

Mark Robinson, University of Zurich

On the care and feeding of (computational method) benchmarks.

10:20-12:20 Session 7A: Single-Cell Sequencing
10:20
scGASI: A graph autoencoder-based single-cell integration clustering method
PRESENTER: Tian-Jing Qiao

ABSTRACT. Single-cell RNA sequencing (scRNA-seq) technology offers the opportunity to study biological issues at the cellular level. The identification of single-cell types by unsupervised clustering is a basic goal of scRNA-seq data analysis. Although a number of single-cell clustering methods have been proposed recently, only a few of them consider both shallow and deep potential information. Therefore, we propose scGASI, a graph autoencoder-based single-cell integration clustering method. Based on multiple feature sets, scGASI unifies deep feature embedding and data affinity recovery in a uniform framework to learn a consensus affinity matrix between cells. scGASI first constructs multiple feature sets. Then, to extract the deep potential information embedded in the data, scGASI uses graph autoencoders (GAEs) to learn the low-dimensional latent representation of the data. Next, to effectively fuse the deep potential information in the embedding space and the shallow information in the raw space, we design a multi-layer kernel self-expression integration strategy. This strategy uses a kernel self-expression model with multi-layer similarity fusion to learn a similarity matrix shared by the raw and embedding spaces of a given feature set, and a consensus learning mechanism to learn a consensus affinity matrix across all feature sets. Finally, the consensus affinity matrix is used for spectral clustering, visualization, and identification of gene markers. Large-scale validation on real datasets shows that scGASI achieves higher clustering accuracy than many popular clustering methods.
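
A minimal sketch (not the authors' implementation) of the final step described above: once a consensus cell-cell affinity matrix has been learned, spectral clustering can be run directly on it with scikit-learn; the matrix and the number of cell types here are placeholders.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_cells(consensus_affinity: np.ndarray, n_cell_types: int) -> np.ndarray:
    """Cluster cells from a symmetric, non-negative consensus affinity matrix."""
    # 'precomputed' tells scikit-learn to treat the input as an affinity matrix
    # rather than a feature matrix.
    model = SpectralClustering(n_clusters=n_cell_types,
                               affinity="precomputed",
                               assign_labels="kmeans",
                               random_state=0)
    return model.fit_predict(consensus_affinity)

# Toy usage: a random symmetric affinity over 100 "cells", 4 assumed cell types.
rng = np.random.default_rng(0)
A = rng.random((100, 100))
A = (A + A.T) / 2
labels = cluster_cells(A, n_cell_types=4)
```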

10:40
Integrative analysis of gene expression and alternative polyadenylation from single-cell RNA-seq data

ABSTRACT. Single-cell RNA-seq (scRNA-seq) is a powerful technique for assaying the transcriptional profile of individual cells. However, the high dropout rate and overdispersion inherent in scRNA-seq hinder the reliable quantification of genes. Recent bioinformatic studies have switched the conventional gene-level analysis to the APA (alternative polyadenylation) isoform level, revealing cell-to-cell heterogeneity in APA usage and APA dynamics in different cell types. This additional layer of APA isoforms creates immense potential for cost-efficient approaches that dissect cell types by integrating multiple modalities derived from existing scRNA-seq experiments. Here we propose a pipeline called scAPAfuse for enhancing cell type clustering and identifying novel/rare cell types by combining gene expression and APA profiles from the same scRNA-seq data. scAPAfuse first maps gene expression and APA profiles to a shared low-dimensional space using partial least squares. Anchors (i.e., similar cells) between the gene and APA profiles are then identified by constructing the nearest neighbors of cells in the low-dimensional space, using algorithms such as hyperplane locality-sensitive hashing and shared nearest neighbors. Finally, the gene and APA profiles are integrated into a fused matrix using a Gaussian kernel function. Applying scAPAfuse to four public scRNA-seq datasets, including human peripheral blood mononuclear cells (PBMCs) and Arabidopsis roots, revealed new subpopulations of cells that were undetectable using the gene expression or APA profile alone. scAPAfuse provides a unique strategy to mitigate the high sparsity of scRNA-seq by fusing gene expression and APA profiles to improve cell type clustering, and it can be incorporated into many other routine scRNA-seq pipelines.
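
An illustrative sketch only (scAPAfuse itself is not reproduced here): project gene-expression and APA matrices for the same cells into a shared low-dimensional space with partial least squares, find nearest-neighbour anchors between the two views, and weight those anchors with a Gaussian kernel. Dimensions, neighbour counts, and the bandwidth sigma are assumptions.

```python
import numpy as np
from sklearn.cross_decomposition import PLSCanonical
from sklearn.neighbors import NearestNeighbors

def shared_space(gene_expr, apa_usage, n_components=20):
    """gene_expr, apa_usage: cells x features matrices for the same cells."""
    pls = PLSCanonical(n_components=n_components)
    gene_lowd, apa_lowd = pls.fit_transform(gene_expr, apa_usage)
    return gene_lowd, apa_lowd

def anchor_weights(gene_lowd, apa_lowd, k=10, sigma=1.0):
    """Gaussian-kernel weights between each cell's gene-view point and its k
    nearest APA-view neighbours in the shared space."""
    nn = NearestNeighbors(n_neighbors=k).fit(apa_lowd)
    dist, idx = nn.kneighbors(gene_lowd)
    weights = np.exp(-(dist ** 2) / (2 * sigma ** 2))
    return idx, weights / weights.sum(axis=1, keepdims=True)

# Toy usage with random data standing in for real scRNA-seq profiles.
rng = np.random.default_rng(1)
genes = rng.random((200, 500))   # 200 cells x 500 genes
apa = rng.random((200, 300))     # 200 cells x 300 APA isoforms
g, a = shared_space(genes, apa, n_components=10)
anchor_idx, anchor_w = anchor_weights(g, a)
```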

11:00
Inferring Boolean networks from single-cell human embryo datasets

ABSTRACT. This study aims to understand human embryonic development and cell fate determination, specifically in relation to trophectoderm (TE) maturation. We utilize single-cell transcriptomics (scRNA-seq) data to develop a framework for inferring computational models that distinguish between two developmental stages. Our method selects pseudo-perturbations from scRNA-seq data, since actual perturbations are impractical due to ethical and legal constraints. These pseudo-perturbations consist of input-output discretized expressions for a limited set of genes and cells. By combining these pseudo-perturbations with prior regulatory networks, we infer Boolean networks that accurately align with the scRNA-seq data for each developmental stage. Our publicly available method was tested on several benchmarks, demonstrating the feasibility of the approach. Applied to the real dataset, we infer Boolean network families corresponding to the medium and late TE developmental stages. Their structures reveal contrasting regulatory pathways, offering valuable biological insights and hypotheses within this domain.
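
A hypothetical sketch of the core consistency check such an inference implies: a candidate Boolean rule for a gene is kept only if it reproduces the discretized output state of every pseudo-perturbation (input/output pair). The rule and gene names below are purely illustrative and are not taken from the paper's networks.

```python
def rule_consistent(rule, observations):
    """rule: function mapping a dict of regulator states (0/1) to 0/1.
    observations: list of (input_state_dict, output_value) pseudo-perturbations."""
    return all(rule(inputs) == output for inputs, output in observations)

# Example candidate rule: target gene is ON iff TFAP2C is ON and SOX2 is OFF.
candidate = lambda s: int(s["TFAP2C"] and not s["SOX2"])
obs = [({"TFAP2C": 1, "SOX2": 0}, 1),
       ({"TFAP2C": 1, "SOX2": 1}, 0),
       ({"TFAP2C": 0, "SOX2": 0}, 0)]
print(rule_consistent(candidate, obs))  # True: the rule fits all pseudo-perturbations
```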

11:15
CHLPCA: Correntropy-Based Hypergraph Regularized Sparse PCA for Single-cell Type Identification

ABSTRACT. Over the past decade, high-throughput sequencing technologies have driven a dramatic increase in single-cell RNA sequencing (scRNA-seq) data. The study of scRNA-seq data has widened the scope and depth of researchers' understanding of cellular heterogeneity. A prerequisite for studying heterogeneous cell populations is accurate cell type identification. However, the highly noisy and high-dimensional nature of scRNA-seq data poses a challenge for existing methods seeking to further improve the success rate of cell type identification. Principal component analysis (PCA) is an important data analysis technique that is widely used to identify cell subpopulations. Building on PCA, we propose correntropy-based hypergraph regularized sparse PCA (CHLPCA) for accurate cell type identification. In addition to using correntropy to reduce the effect of noise, CHLPCA considers higher-order relationships between samples by constructing a hypergraph, compensating for PCA's limited ability to capture local structure. Furthermore, we introduce the L2,1/5-norm into the model to enhance the interpretability of the principal components (PCs), which further improves model performance. CHLPCA achieves superior clustering accuracy, outperforming the best competing method by 5.13% and 8.00% on the ACC and NMI metrics, respectively. The results of clustering visualization experiments also confirm that CHLPCA performs the cell type identification task better.

11:30
Simulating tumor evolution from scDNA-seq as an accumulation of both SNVs and CNAs

ABSTRACT. Ever since single-cell sequencing (scDNA-seq) was named 'method of the year' in 2013, it has provided many insights into the evolution of tumors, viewed as a branching process of accumulating cancerous mutations initiated by a single driver mutation, a model of clonal evolution theorized almost half a century ago (Nowell, 1976). This has been accompanied by an explosion of methods for inferring the histories of such evolution, often in the form of a phylogenetic tree, from single-cell sequencing data. While the first methods modeled such evolution as an accumulation of point mutations (SNVs), copy number aberrations (CNAs, i.e., duplications or deletions of large genomic regions) are an important factor to consider. As a result, later methods began to bolster cancer phylogeny inference with bulk sequencing data in order to account for CNAs. Despite the dozens of such inference methods available, there still does not exist much in the form of a unified benchmark for all of them.

This paper moves to initiate such a benchmark, which can be built upon, by proposing a simulator that jointly models SNVs and CNAs in generating an evolutionary scenario that can be interpreted as a scDNA-seq/matched bulk sample pair. The simulator models the accumulation of SNVs and the duplication or deletion of chromosomal segments. We test this simulator on three methods: (a) a method that accounts for SNVs only, under the infinite sites assumption (ISA); (b) a second, more general method that models only SNVs but allows relaxations of the ISA; and (c) a third, most general method that accounts for both SNVs and CNAs (and violations of the ISA). The results are consistent with the generality of these methods. This work is a step toward developing a de facto benchmark for cancer phylogeny inference methods.
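
A toy illustration (not the paper's simulator) of the joint accumulation idea: each daughter clone inherits its parent's mutations and adds new point mutations and copy-number events at random. Rates, segment counts, and the flat clone list are simplifying assumptions.

```python
import random

def simulate_clone_tree(n_clones=8, snv_rate=3, cna_rate=1, n_segments=22, seed=0):
    random.seed(seed)
    # Each clone: (snvs: set of mutation ids, copy_numbers: list, one entry per segment)
    clones = [(set(), [2] * n_segments)]              # founder clone, diploid genome
    next_snv = 0
    for _ in range(n_clones - 1):
        parent_snvs, parent_cn = random.choice(clones)
        snvs, cn = set(parent_snvs), list(parent_cn)
        for _ in range(random.randint(1, snv_rate)):   # new point mutations (SNVs)
            snvs.add(next_snv); next_snv += 1
        for _ in range(random.randint(0, cna_rate)):   # gain/loss of a whole segment (CNA)
            seg = random.randrange(n_segments)
            cn[seg] = max(0, cn[seg] + random.choice([-1, +1]))
        clones.append((snvs, cn))
    return clones

for i, (snvs, cn) in enumerate(simulate_clone_tree()):
    print(f"clone {i}: {len(snvs)} SNVs, copy numbers of first segments {cn[:5]}")
```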

10:20-12:20 Session 7B: Classification
10:20
Multi-Class Cancer Classification of Whole Slide Images through Transformer and Multiple Instance Learning

ABSTRACT. Whole slide images (WSIs) are high-resolution images rich in detail; however, they lack localized annotations. WSI classification can be treated as a multiple instance learning (MIL) problem when only slide-level labels are available. We introduce an approach for WSI classification that leverages MIL and the Transformer, effectively eliminating the requirement for localized annotations. Our method consists of three key components. First, we use ResNet50, pre-trained on ImageNet, as an instance feature extractor. Second, we present a Transformer-based MIL aggregator that adeptly captures contextual information within individual regions and correlation information among diverse regions within the WSI. Our approach mitigates the high computational complexity of the Transformer architecture by integrating linear attention. Third, we introduce a global average pooling (GAP) layer to strengthen the mapping between WSI features and category features, further improving classification accuracy. To evaluate our model, we conducted experiments on the CPTAC dataset. The results demonstrate the superiority of our approach compared to previous MIL-based methods. Our proposed method achieves state-of-the-art performance in WSI classification without reliance on localized annotations. Overall, our work offers a robust and effective approach that overcomes the challenges posed by high-resolution WSIs and limited annotation availability.
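
A schematic PyTorch sketch, not the paper's exact architecture: a slide is treated as a bag of patch features from a frozen ResNet50, a softmax-free linear-attention layer mixes patch context in O(N), and global average pooling over patches yields slide-level logits. Layer sizes, head counts, and class counts are assumptions.

```python
import torch
import torch.nn as nn

class LinearAttention(nn.Module):
    """Softmax-free attention with cost linear in the number of patches."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.heads, self.dk = heads, dim // heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.out = nn.Linear(dim, dim)

    def forward(self, x):                           # x: (batch, n_patches, dim)
        b, n, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(b, n, self.heads, self.dk).transpose(1, 2) for t in (q, k, v))
        q, k = torch.nn.functional.elu(q) + 1, torch.nn.functional.elu(k) + 1
        kv = torch.einsum("bhnd,bhne->bhde", k, v)                  # sum_j phi(k_j) v_j^T
        z = 1.0 / (torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)) + 1e-6)
        out = torch.einsum("bhnd,bhde,bhn->bhne", q, kv, z)         # normalized outputs
        return self.out(out.transpose(1, 2).reshape(b, n, -1))

class MILClassifier(nn.Module):
    def __init__(self, feat_dim=2048, dim=512, n_classes=5):
        super().__init__()
        self.proj = nn.Linear(feat_dim, dim)
        self.attn = LinearAttention(dim)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, patch_feats):                  # (batch, n_patches, 2048)
        h = self.attn(self.proj(patch_feats))
        return self.head(h.mean(dim=1))              # GAP over patches -> slide logits

logits = MILClassifier()(torch.randn(2, 1000, 2048))  # two slides, 1000 patches each
```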

10:40
Efficient Sequence Embedding For SARS-CoV-2 Variants Classification

ABSTRACT. Kernel-based methods, such as Support Vector Machines (SVM), have demonstrated their utility in various machine learning (ML) tasks, including sequence classification. However, these methods face two primary challenges: (i) the computational complexity associated with kernel computation, which involves an exponential time requirement for dot product calculation, and (ii) the scalability issue of storing the large $n \times n$ matrix in memory when the number of data points (n) becomes too large. Although approximate methods can address the computational complexity problem, scalability remains a concern for conventional kernel methods.

This paper presents a novel and efficient embedding method that overcomes both the computational and scalability challenges inherent in kernel methods. To address the computational challenge, our approach involves extracting the $k$-mers/nGrams (consecutive character substrings) from a given biological sequence, computing a sketch of the sequence, and performing dot product calculations using the sketch. By avoiding the need to compute the entire spectrum (frequency count) and operating with low-dimensional vectors (sketches) for sequences instead of the memory-intensive $n \times n$ matrix or full-length spectrum, our method can be readily scaled to handle a large number of sequences, effectively resolving the scalability problem.

Furthermore, conventional kernel methods often rely on limited algorithms (e.g., kernel SVM) for underlying ML tasks. In contrast, our proposed fast and alignment-free spectrum method can serve as input for various distance-based (e.g., $k$-nearest neighbors) and non-distance-based (e.g., decision tree) ML methods used in classification and clustering tasks. By applying our method solely to real-world biological sequences, specifically those of the coronavirus spike/Peplomer, we achieve superior predictive performance without the need for full-length genome sequences. Moreover, our proposed method outperforms several state-of-the-art embedding and kernel methods in terms of both predictive performance and computational runtime.
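
A minimal, generic sketch of the sketching idea (the paper's exact scheme may differ): hash each k-mer of a sequence into a fixed-length signed count vector, so that dot products between sketches approximate dot products between full k-mer spectra without ever materializing the spectrum.

```python
import numpy as np

def kmer_sketch(seq: str, k: int = 3, dim: int = 1024) -> np.ndarray:
    """Signed feature-hashing sketch of a sequence's k-mer counts."""
    sketch = np.zeros(dim)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        h = hash(kmer)                 # note: Python's hash() is stable only within one run
        sketch[h % dim] += 1 if (h >> 1) % 2 == 0 else -1
    return sketch

s1 = kmer_sketch("MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDK")
s2 = kmer_sketch("MFVFLVLLPLVSSQCVNLITRTQSYTNSFTRGVYYPDK")
similarity = s1 @ s2 / (np.linalg.norm(s1) * np.linalg.norm(s2) + 1e-9)
print(round(float(similarity), 3))     # approximate spectrum similarity from sketches
```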

11:00
Unveiling the Robustness of Machine Learning Models in Classifying COVID-19 Spike Sequences

ABSTRACT. In the midst of the global COVID-19 pandemic, a wealth of data has become available to researchers, presenting a unique opportunity to investigate the behavior of the virus. This research aims to facilitate the design of efficient vaccinations and proactive measures to prevent future pandemics through the utilization of machine learning (ML) models for decision-making processes. Consequently, ensuring the reliability of ML predictions in these critical and rapidly evolving scenarios is of utmost importance. Notably, studies focusing on the genomic sequences of individuals infected with the coronavirus have revealed that the majority of variations occur within a specific region known as the spike (or S) protein. Previous research has explored the analysis of spike proteins using various ML techniques, including classification and clustering of variants. However, it is imperative to acknowledge the possibility of errors in spike proteins, which could lead to misleading outcomes and misguide decision-making authorities. Hence, a comprehensive examination of the robustness of ML and deep learning models in classifying spike sequences is essential. In this paper, we propose a framework for evaluating and benchmarking the robustness of diverse ML methods in spike sequence classification. Through extensive evaluation of a wide range of ML algorithms, ranging from classical methods like naive Bayes and logistic regression to advanced approaches such as deep neural networks, our research demonstrates that utilizing k-mers for creating the feature vector representation of spike proteins is more effective than traditional one-hot encoding-based embedding methods. Additionally, our findings indicate that deep neural networks exhibit superior accuracy and robustness compared to non-deep-learning baselines. To the best of our knowledge, this study is the first to benchmark the accuracy and robustness of machine-learning classification models against various types of random corruptions in COVID-19 spike protein sequences. The benchmarking framework established in this research holds the potential to assist future researchers in gaining a deeper understanding of the behavior of the coronavirus, enabling the implementation of proactive measures and the prevention of similar pandemics in the future.
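
A toy sketch of the kind of random corruption such a robustness benchmark injects: random amino-acid substitutions at a fixed error rate. The error rate and alphabet here are illustrative, not the paper's benchmark settings.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def corrupt(seq: str, error_rate: float = 0.02, seed: int = 0) -> str:
    """Replace each residue with a random amino acid with probability error_rate."""
    rng = random.Random(seed)
    out = [rng.choice(AMINO_ACIDS) if rng.random() < error_rate else c for c in seq]
    return "".join(out)

spike_fragment = "MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHS"
print(corrupt(spike_fragment, error_rate=0.05))
```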

11:20
MPFNet: ECG Arrhythmia Classification Based on Multi-Perspective Feature Fusion

ABSTRACT. Arrhythmia is a common cardiovascular disease that can cause sudden cardiac death. The electrocardiogram (ECG) signal is often used to diagnose the state of the heart. However, most existing ECG diagnostic methods only use information from a single perspective, ignoring the extraction of fused information. In this paper, we propose a novel Multi-Perspective feature Fusion Network (MPFNet) for ECG arrhythmia classification. In this model, two independent feature extraction modules are first deployed to learn one-dimensional and two-dimensional ECG features from the original one-dimensional ECG signals and their corresponding recurrence plots. At the same time, an interactive feature extraction module based on a bidirectional encoder-decoder is designed to further capture the interrelationships between the one-dimensional and two-dimensional perspectives, and to combine them with the independent features from the two perspectives, enhancing the completeness and accuracy of the final representation by exploiting the correlation and complementarity between perspectives. We evaluate our method on a large public ECG dataset, and the experimental results demonstrate that MPFNet outperforms state-of-the-art approaches.
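
A minimal sketch of the two-dimensional view the abstract refers to: a thresholded recurrence plot computed from a 1-D ECG segment (the network itself is not reproduced, and the synthetic signal and threshold are placeholders).

```python
import numpy as np

def recurrence_plot(signal: np.ndarray, eps: float = 0.1) -> np.ndarray:
    """Binary recurrence matrix R[i, j] = 1 if |x_i - x_j| < eps."""
    diff = np.abs(signal[:, None] - signal[None, :])
    return (diff < eps).astype(np.uint8)

t = np.linspace(0, 2 * np.pi, 200)
ecg_like = np.sin(5 * t) + 0.1 * np.random.default_rng(0).standard_normal(200)
rp = recurrence_plot(ecg_like, eps=0.2)
print(rp.shape, rp.mean())   # 200x200 image-like matrix fed to the 2-D branch
```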

11:40
Hist2Vec: Kernel-Based Embeddings for Biological Sequence Classification

ABSTRACT. Biological sequence classification is vital in various fields, such as genomics and bioinformatics. The advancement and reduced cost of genomic sequencing have drawn researchers' attention to protein and nucleotide sequence classification. Traditional approaches face limitations in capturing the intricate relationships and hierarchical structures inherent in genomic sequences, and numerous machine-learning models have been proposed to tackle this challenge. In this work, we propose Hist2Vec, a novel kernel-based embedding generation approach for capturing sequence similarities. Hist2Vec combines histogram-based kernel matrices with Gaussian kernel functions. It constructs histogram-based representations using the unique $k$-mers present in the sequences. By leveraging the power of Gaussian kernels, Hist2Vec transforms these representations into high-dimensional feature spaces, preserving important sequence information. Hist2Vec aims to address the limitations of existing methods by capturing sequence similarities in a high-dimensional feature space while providing a robust and efficient framework for classification. We employ kernel Principal Component Analysis (kernel PCA) to generate embeddings, which are then used with standard machine-learning algorithms for efficient classification. Experimental evaluations on protein and nucleotide datasets demonstrate the efficacy of Hist2Vec in achieving high classification accuracy compared to state-of-the-art methods, which it outperforms with $>76\%$ and $>83\%$ accuracy on DNA and protein datasets, respectively. Hist2Vec provides a robust framework for biological sequence classification, enabling better classification and promising avenues for further analysis of biological data.
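
A sketch of the pipeline the abstract describes, with parameter values as assumptions: build k-mer histograms, apply a Gaussian (RBF) kernel between histograms, then use kernel PCA to obtain fixed-length embeddings for a downstream classifier.

```python
import numpy as np
from itertools import product
from sklearn.decomposition import KernelPCA

def kmer_histogram(seq, k=3, alphabet="ACGT"):
    """Normalized histogram over all k-mers of the given alphabet."""
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    hist = np.zeros(len(index))
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:
            hist[index[kmer]] += 1
    return hist / max(hist.sum(), 1)

seqs = ["ACGTACGTGGCTA", "ACGTTCGTGGCTT", "TTTTGGGGCCCCAAAA", "TTTCGGGGCCCCAAAT"]
H = np.vstack([kmer_histogram(s) for s in seqs])
embeddings = KernelPCA(n_components=2, kernel="rbf", gamma=1.0).fit_transform(H)
print(embeddings.shape)   # (4, 2): embedding fed to a standard classifier
```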

14:00-15:00 Session 9: Keynote Talk.

Sagi Snir, Department of Evolutionary and Environmental Biology, University of Haifa

Assembling the Tree of Life in Light of Conflicting Signals

15:20-17:40 Session 10A: Learning
15:20
SGMDD: Subgraph Neural Network-Based Model for Analyzing Functional Connectivity Signatures of Major Depressive Disorder
PRESENTER: Yan Zhang

ABSTRACT. Biomarkers extracted from brain functional connectivity (FC) can assist in diagnosing various psychiatric disorders. Recently, several deep learning-based methods have been proposed to facilitate the development of biomarkers for the auxiliary diagnosis of depression and to promote automated depression identification. Although they have achieved promising results, deficiencies remain: current methods overlook subgraph-level structure in brain graphs and rely on rudimentary network architectures, resulting in poor accuracy, and conducting FC analysis with a low-accuracy model can render the results unreliable. In light of these deficiencies, this paper presents a subgraph neural network-based model named SGMDD for analyzing FC signatures of depression and for depression identification. Our model surpasses many state-of-the-art depression diagnosis methods with an accuracy of 73.95%. To the best of our knowledge, this study is the first attempt to apply subgraph neural networks to FC analysis in depression and depression identification. We visualize and analyze the FC networks of depression at the node, edge, motif, and functional brain region levels and discover several novel multi-level FC features. The most prominent finding is that hyperconnectivity of the postcentral gyrus and thalamus may be the most crucial neurophysiological feature associated with depression, which may guide the development of biomarkers for the clinical diagnosis of depression.

15:40
TCSA: A Text-guided Cross-view Medical Semantic Alignment Framework for Adaptive Multi-view Visual Representation Learning
PRESENTER: Hongyang Lei

ABSTRACT. Recently, in the medical domain, visual-language (VL) representation learning has demonstrated potential effectiveness in diverse medical downstream tasks. However, existing works are typically pre-trained on one-to-one corresponding medical image-text pairs, disregarding variation in the number of views corresponding to a report (e.g., chest X-rays typically involve 1 to 3 projection views). This limitation results in sub-optimal performance in scenarios with varying numbers of views (e.g., arbitrary multi-view classification). To address this issue, we propose a novel Text-guided Cross-view Semantic Alignment (TCSA) framework for adaptive multi-view visual representation learning. For an arbitrary number of views, TCSA learns view-specific private latent sub-spaces and then maps them to a scale-invariant common latent sub-space, enabling individual treatment of each view type and normalization of an arbitrary number of views to a consistent scale in the common sub-space. In the private sub-spaces, TCSA leverages word context as guidance to match semantically corresponding sub-regions across multiple views via cross-modal attention, facilitating alignment of different types of views in the private sub-spaces. This promotes the combination of information from arbitrary multiple views in the common sub-space. To the best of our knowledge, TCSA is the first VL framework for arbitrary multi-view visual representation learning. We report the results of TCSA on multiple external datasets and tasks. Compared with state-of-the-art frameworks, TCSA achieves competitive results and generalizes well to unseen data.

16:00
A Convolutional Denoising Autoencoder for Protein Scaffold Filling
PRESENTER: Richard Annan

ABSTRACT. De novo protein sequencing is a valuable task in proteomics, yet it is not a fully solved problem. Many state-of-the-art approaches use top-down and bottom-up tandem mass spectrometry (MS/MS) to sequence proteins. However, these approaches often produce protein scaffolds, which are incomplete protein sequences with gaps to fill between contiguous regions. In this paper, we propose a novel convolutional denoising autoencoder (CDA) model that fills gaps in protein scaffolds to complete the final step of protein sequencing. We demonstrate our results on both a real dataset and eleven randomly generated datasets based on the MabCampath antibody. Our results show that the proposed CDA outperforms a recently published hybrid convolutional neural network and long short-term memory (CNN-LSTM) sequence model. We achieve 100% gap-filling accuracy and 95.32% full-sequence accuracy on the MabCampath protein scaffold.
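
A schematic 1-D convolutional denoising autoencoder, with sizes that are illustrative rather than the paper's architecture: the input is a one-hot encoded scaffold in which gap positions are zeroed out, and the network is trained to reconstruct the full amino-acid sequence position by position.

```python
import torch
import torch.nn as nn

N_AA = 20          # amino-acid alphabet size
SEQ_LEN = 128      # assumed fixed scaffold window length

class ConvDenoisingAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(N_AA, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Conv1d(128, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, N_AA, kernel_size=5, padding=2),   # per-position residue logits
        )

    def forward(self, x):                 # x: (batch, N_AA, SEQ_LEN), gaps zeroed out
        return self.decoder(self.encoder(x))

model = ConvDenoisingAE()
noisy = torch.zeros(4, N_AA, SEQ_LEN)     # toy batch standing in for gapped scaffolds
logits = model(noisy)                     # (4, N_AA, SEQ_LEN)
targets = torch.randint(0, N_AA, (4, SEQ_LEN))          # toy full-sequence labels
loss = nn.CrossEntropyLoss()(logits, targets)
loss.backward()
```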

16:15
Enhancing t-SNE Performance for Biological Sequencing Data through Kernel Selection

ABSTRACT. The genetic code for many different proteins can be found in biological sequencing data, which offers vital insight into the genetic evolution of viruses. While machine learning approaches are becoming increasingly popular for many ``Big Data'' situations, they have made little progress in comprehending the nature of such data. One such approach is t-distributed Stochastic Neighbour Embedding (t-SNE), a general-purpose method for representing high-dimensional data in a low-dimensional (LD) space while preserving similarity between data points. Traditionally, the Gaussian kernel is used with t-SNE. However, since the Gaussian kernel is not data-dependent, it determines each local bandwidth based on a single local point. This makes it computationally expensive, and hence limited in scalability, and it can misrepresent some structures in the data. An alternative is the isolation kernel, a data-dependent method with a single parameter to tune. Although the isolation kernel yields better performance in terms of scalability and preserving similarity in the LD space, it may still not perform optimally in some cases. This paper presents a perspective on improving the performance of t-SNE and argues that kernel selection can impact this performance. We evaluate the impact of 9 different kernels on the performance of t-SNE, using SARS-CoV-2 ``spike'' protein sequences. With three different embedding methods, we show that the cosine similarity kernel gives the best results and enhances the performance of t-SNE.
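
A minimal sketch of the comparison the abstract motivates: run t-SNE on k-mer feature vectors with the default Euclidean metric versus a cosine metric. The data below is random and only stands in for real spike-sequence features; the feature dimensions are assumptions.

```python
import numpy as np
from sklearn.manifold import TSNE

X = np.random.default_rng(0).random((300, 400))   # 300 sequences x 400 k-mer features

emb_euclidean = TSNE(n_components=2, metric="euclidean", random_state=0).fit_transform(X)
emb_cosine = TSNE(n_components=2, metric="cosine", random_state=0).fit_transform(X)
print(emb_euclidean.shape, emb_cosine.shape)       # both (300, 2); compare cluster quality
```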

16:30
PDB2Vec: Using 3D Structural Information For Improved Protein Analysis

ABSTRACT. In recent years, machine learning methods have shown remarkable results in various protein analysis tasks, including protein classification, folding prediction, and protein-protein interaction prediction. However, most studies focus only on 3D structures or sequences for the downstream classification task; analyzing the combination of both remains comparatively unexplored. This study investigates how incorporating protein sequence and 3D structure information influences protein classification performance. We use two well-known datasets, STCRDAB and PDB Bind, for the classification tasks. To this end, we propose an embedding method called PDB2Vec to encode both the 3D structure and protein sequence data to improve the predictive performance of the downstream classification task. We performed protein classification in three different experimental settings: only 3D structural embeddings (PDB2Vec); sequence embeddings using alignment-free methods from the biology domain, including $k$-mers, position weight matrices, minimizers, and spaced $k$-mers; and the combination of both structural and sequence-based embeddings. Our experiments demonstrate the importance of incorporating both three-dimensional structural information and amino acid sequence information for improving the performance of protein classification, and show that the combination of structural and sequence information leads to the best performance. Both types of information are complementary and essential for classification tasks.
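
A bare-bones sketch of the third experimental setting, with placeholder feature names and sizes: concatenate a structural embedding with a sequence embedding per protein and train an ordinary classifier on the combined representation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
struct_emb = rng.random((150, 64))    # stand-in for per-protein 3D-structure embeddings
seq_emb = rng.random((150, 128))      # stand-in for k-mer / spaced k-mer embeddings
labels = rng.integers(0, 2, 150)      # toy binary labels

combined = np.hstack([struct_emb, seq_emb])   # combined structure + sequence representation
scores = cross_val_score(RandomForestClassifier(random_state=0), combined, labels, cv=5)
print(scores.mean())
```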

16:45
DCNN: Dual-Level Collaborative Neural Network for Imbalanced Heart Anomaly Detection

ABSTRACT. The electrocardiogram (ECG) plays an important role in assisting clinical diagnosis, such as arrhythmia detection. However, traditional techniques for ECG analysis are time-consuming and laborious. Recently, deep neural networks have become a popular technique for automatically analyzing ECG signals and have been shown to be competitive with human experts. However, the minority class of life-threatening arrhythmias causes model training to skew towards the majority class. To address this problem, we propose a dual-level collaborative neural network (DCNN), which includes data-level and cost-sensitive-level modules. In the Data Level module, we utilize a generative adversarial network with a U-Net generator to synthesize ECG signals. The Cost-sensitive Level module then employs focal loss to increase the cost of incorrectly predicting the minority class. Empirical results show that the Data Level module generates highly accurate ECG signals with fewer parameters. Furthermore, DCNN significantly improves ECG classification.
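
A sketch of the cost-sensitive ingredient named in the abstract: a standard focal loss that down-weights easy majority-class beats. The gamma and alpha values are the commonly used defaults, not necessarily those used in DCNN.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """logits: (batch, n_classes); targets: (batch,) integer class labels."""
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)   # log p of true class
    pt = probs.gather(1, targets.unsqueeze(1)).squeeze(1)           # p of true class
    return (-alpha * (1 - pt) ** gamma * log_pt).mean()             # easy examples down-weighted

logits = torch.randn(8, 5, requires_grad=True)       # 8 beats, 5 arrhythmia classes
targets = torch.randint(0, 5, (8,))
focal_loss(logits, targets).backward()
```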

15:20-17:40 Session 10B: Imaging/Signal Processing
15:20
SaID: Simulation-aware Image Denoising Pre-trained Model for Cryo-EM Micrographs
PRESENTER: Zhidong Yang

ABSTRACT. Cryo-Electron Microscopy (cryo-EM) is a revolutionary technique for determining the structures of proteins and macromolecules. Physical limitations of the imaging conditions cause a very low Signal-to-Noise Ratio (SNR) in cryo-EM micrographs, resulting in difficulties in downstream analysis and accurate ultrastructure determination. Hence, effective denoising algorithms for cryo-EM micrographs are in demand to improve the quality of macromolecular analysis. However, lacking a rich and well-defined dataset with ground truth images, supervised image denoising methods generalize poorly to experimental micrographs. To address this issue, we present a Simulation-aware Image Denoising (SaID) pre-trained model for improving the SNR of cryo-EM micrographs, trained only on an accurately simulated dataset. First, we devise a calibration algorithm for the simulation parameters of cryo-EM micrographs to fit experimental micrographs. Second, with the accurately simulated dataset, we train a general deep denoising model that generalizes well to real experimental cryo-EM micrographs. Extensive experimental results demonstrate that our pre-trained denoising model performs outstandingly on experimental cryo-EM micrographs and simplifies downstream analysis. This indicates that a network trained only with accurately simulated noise patterns can reach the capability of one trained with rich real data. Code and data will be available at https://github.com/ZhidongYang/SaID.

15:40
Attention-Guided Residual U-Net with SE Connection and ASPP for Watershed-based Cell Segmentation in Microscopy Images

ABSTRACT. Time-lapse microscopy imaging is an important method used in biomedical studies to observe how cells behave over time. This technique provides valuable data on cell numbers, sizes, shapes, and interactions. Manual analysis of hundreds or thousands of cells is impractical, necessitating automated cell segmentation approaches. Due to their success, deep learning (DL) based methods, particularly those using U-Net-based networks, have gained popularity in medical and microscopy image segmentation. However, accurately segmenting touching cells in images with low signal-to-noise ratios remains challenging. Existing methods often simplistically combine low-level and high-level features, leading to model confusion. To address these issues, we propose a novel framework called RA-SE-ASPP-Net, which incorporates Residual Blocks (RB), an Attention Mechanism (AM), Squeeze-and-Excitation (SE) connections, and Atrous Spatial Pyramid Pooling (ASPP) for precise and robust cell segmentation. We evaluate the proposed architecture using an induced pluripotent stem (iPS) cell reprogramming dataset, which has received limited attention in this field. Additionally, we compare our model with different ablation experiments to demonstrate its robustness. On our dataset, U-Net, Att-U-Net, ResU-Net, ResAtt-U-Net, ResU-Net-SE, ResU-Net-ASPP, and RA-SE-ASPP-Net achieve mean Jaccard scores of 0.835, 0.854, 0.846, 0.862, 0.871, 0.889, and 0.89, respectively. The proposed architecture outperforms the baseline models in all evaluated metrics, providing the most accurate semantic segmentation results. Finally, we applied the watershed method to the semantic segmentation results to obtain precise segmentations with specific information for each cell. The source code is publicly available at https://github.com/jovialniyo93/cell-segmentation.
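
A sketch of the final post-processing step only (the segmentation network itself is not reproduced): split touching cells in a binary semantic mask with a distance-transform-seeded watershed. The toy mask and the minimum peak distance are assumptions.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

def split_touching_cells(binary_mask: np.ndarray, min_distance: int = 10) -> np.ndarray:
    """binary_mask: 2-D array with 1 for cell pixels; returns a labelled instance map."""
    distance = ndi.distance_transform_edt(binary_mask)
    peaks = peak_local_max(distance, min_distance=min_distance, labels=binary_mask)
    markers = np.zeros_like(binary_mask, dtype=int)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)   # one seed per cell centre
    return watershed(-distance, markers, mask=binary_mask)

mask = np.zeros((64, 64), dtype=np.uint8)
mask[10:30, 10:30] = 1
mask[25:45, 25:45] = 1                      # two overlapping "cells" in the semantic mask
instances = split_touching_cells(mask)
print(instances.max(), "cell instances found")
```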

16:00
Multi-modality MRI Feature Interaction for Pseudoprogression Prediction of Glioblastoma

ABSTRACT. Pseudoprogression (Psp) prediction for glioblastoma (GBM) is a challenging task in clinical practice. Currently, Psp prediction of GBM is mainly performed by scanning different magnetic resonance imaging (MRI) modalities. However, how to effectively make use of the complementary information between modalities to improve Psp prediction of GBM remains a challenge. To address this, we propose a multi-modality MRI feature interaction method for Psp prediction of GBM using T1 and T2 MRI. To mine multi-modality, multi-scale features, we design a multi-scale feature extraction network based on a three-branch asymmetric convolution (TAC) block. In particular, to make full use of the complementary information between T1 and T2 MRI, we propose a multi-modality MRI feature interaction (MMFI) module. Our proposed method is evaluated on a private dataset from Hunan Cancer Hospital comprising 10 subjects with Psp and 42 subjects with relapse. The experimental results show that the average accuracy (ACC) and area under the receiver operating characteristic curve (AUC) of the proposed method are 0.954 and 0.929, respectively. Compared with several existing methods, the proposed method obtains better results. In summary, our proposed method has potential for Psp prediction of GBM in clinical practice.

16:20
NeoMS: Identification of Novel MHC-I Peptides with Tandem Mass Spectrometry

ABSTRACT. The study of immunopeptidomics requires the identification of both regular and mutated MHC-I peptides from mass spectrometry data. For the efficient identification of MHC-I peptides with one or no mutation from a sequence database, we propose a novel workflow, NeoMS. It employs three main modules: generating an expanded sequence database with a tagging algorithm, a machine learning-based scoring function to maximize search sensitivity, and a careful target-decoy implementation to control the false discovery rates (FDR) of both regular and mutated peptides. Experimental results demonstrate that NeoMS both improves the identification rate of regular peptides over other database search software and identifies hundreds of mutated peptides that have not been identified by any current method. Further analysis supports the validity of these novel peptides.
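
A minimal sketch of the target-decoy FDR control mentioned in the abstract (scores, the sort order, and the cut rule are a standard simplification, not NeoMS's exact implementation): peptide-spectrum matches are sorted by score and the accepted set is cut where the running decoy/target ratio exceeds the FDR threshold.

```python
def filter_at_fdr(psms, fdr=0.01):
    """psms: list of (score, is_decoy) tuples; returns accepted target PSMs."""
    accepted, n_target, n_decoy = [], 0, 0
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            n_decoy += 1
        else:
            n_target += 1
        if n_target and n_decoy / n_target > fdr:
            break                                   # estimated FDR exceeded: stop here
        if not is_decoy:
            accepted.append((score, is_decoy))
    return accepted

example = [(9.1, False), (8.7, False), (8.2, True), (7.9, False), (7.5, True)]
print(len(filter_at_fdr(example, fdr=0.5)))         # 3 target PSMs pass at this toy threshold
```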

16:40
Radiology Report Generation via Visual Recalibration and Context Gating-aware

ABSTRACT. The task of radiology report generation aims to analyze medical images, extract key information, and then assist medical personnel in generating detailed and accurate reports. Automatic radiology report generation therefore plays an important role in medical diagnosis and healthcare. However, radiology data face problems of visual and textual data bias: medical images are similar to each other, and the distribution of normal features dominates that of abnormal features; moreover, accurately locating lesions and generating accurate, coherent long-text reports are important challenges. In this paper, we propose a Visual Recalibration and Context Gating-aware model (VRCG) to alleviate visual and textual biases and enhance report generation. We employ a medical visual recalibration module to enhance the extraction of key lesion features. We use a context gating-aware module to combine lesion location and report context information, addressing the problem of long-distance dependence in diagnostic reports. Meanwhile, the context gating-aware module can identify text fragments related to lesion descriptions, improving the model's perception of lesion-related text and thereby generating coherent, consistent medical reports. Extensive experiments demonstrate that our proposed model outperforms existing baseline models on the publicly available IU X-Ray dataset.