Program for Tuesday, October 10th

08:30-09:00 Registration

09:00-09:30 Session 1: Official Opening of the Conference

09:30-10:30 Session 2: Keynote Talk.

Teresa Przytycka NIH National Library of Medicine, National Center for Biotechnology Information

Delineating relation between mutagenic signatures, cellular processes, and environment through computational approaches

Chair:

Alex Zelikovsky

10:30-10:50 Coffee Break

10:50-12:20 Session 3A: Cancer/Health

Chair:

Alex Zelikovsky

10:50	Rui Gao, Zixue Liu, Mei Meng and Jian He Neurogenesis-associated Protein, a Potential Prognostic Biomarker in anti-PD-1 based kidney renal clear cell carcinoma patients therapeutics PRESENTER: Zixue Liu ABSTRACT. Background: TKTL1 is an essential factor that plays an important role in brain development. Some studies have shown the influence of TKTL1 in cancers, but it is rarely reported in kidney cancer. Furthermore, the relationship of TKTL1 to prognosis and to tumor-infiltrating immune cells in different cancers, especially kidney cancer, remains unclear. Methods: TKTL1 expression and its clinical characteristics were evaluated on various databases. In addition, the correlation between TKTL1 and TILs in the tumor and normal adjacent tissue of patients with three types of renal cancer was assessed using various bioinformatics approaches. Next, the association between TKTL1 and immune infiltrates in various types of cancer was investigated via TIMER. Furthermore, we studied the relationship between TKTL1 expression and response to PD-1 blocker immunotherapy in renal cancer and performed molecular docking to screen TKTL1 agonists. Results: We constructed a systematic prognostic landscape across various types of cancer and found that TKTL1 significantly affects prognosis in patients with different types of kidney cancer. The underlying mechanism might be that the expression level of TKTL1 was positively associated with diverse immunocytes in kidney renal clear cell carcinoma (KIRC) rather than in kidney renal papillary cell carcinoma (KIRP) and kidney chromophobe (KICH). Moreover, this recruitment may result from the upregulation of the mTOR signaling pathway affecting T-cell metabolism. We also found that TKTL1 might be an immunomodulator in KIRC patients' response to anti-PD-1 therapy. Finally, we found that 3-hydroxyflavone was a stronger candidate TKTL1 agonist than other flavonoids.
Conclusions: Our findings imply that TKTL1 is a promising prognostic biomarker for KIRC patients who respond to anti-PD-1 therapy. Moreover, flavonols might be a potential therapeutic combination with anti-PD-1-based immunotherapy.
11:10	Bikram Sahoo and Alex Zelikovsky Deep Learning Reveals Biological Basis of Racial Disparities in Quadruple-Negative Breast Cancer PRESENTER: Bikram Sahoo ABSTRACT. Triple-negative breast cancer (TNBC) lacks crucial receptors. Quadruple-negative breast cancer (QNBC), which additionally lacks androgen receptors, is more aggressive. Racial disparities emerge, with African Americans facing worse QNBC outcomes. Our study deploys deep neural networks to identify QNBC ancestral biomarkers. Achieving 0.85 accuracy and 0.928 AUC, the model displays robust learning, optimized through hyperparameter tuning. Top genes are chosen via ANOVA rankings and hypothesis testing, highlighting ABCD1 as significant post-correction. Effect sizes suggest important shifts in other genes. This approach enhances QNBC understanding, particularly racial aspects, potentially guiding targeted treatments.
11:25	Bikram Sahoo and Alex Zelikovsky Exploring Racial Disparities in Triple-Negative Breast Cancer: Insights from Feature Selection Algorithms PRESENTER: Bikram Sahoo ABSTRACT. Triple-negative breast cancer (TNBC) represents an aggressive and heterogeneous form of breast cancer with poor clinical outcomes. It lacks estrogen, progesterone, and human epidermal growth factor receptors, which limits treatment options. Notably, the incidence of TNBC is higher in African American (AA) women than in European American (EA) women, resulting in worse clinical outcomes. The racial disparity observed in TNBC can be attributed to socioeconomic factors, lifestyle, and tumor biology. In this study, we explored feature selection algorithms, including filters, wrappers, and embedded methods, to identify significant genes associated with racial disparities. Our findings reveal that genes such as LOC90784, LOC101060339, XRCC6P5, and TREML4 were consistently selected by both correlation and information gain-based filter methods. Moreover, in our two-stage embedded-based feature selection algorithm, we consistently identified LOC90784, STON1-GTF2A1L, and TREML4 as crucial genes across high-performing machine learning algorithms. Particularly noteworthy is the consistent selection of LOC90784 by all three filter selection methods. These comprehensive results, obtained through the implementation of three different feature selection algorithms, offer valuable insights to researchers studying racial disparities.
11:40	Yulong Li, Hongming Zhu, Xiaowen Wang and Qin Liu HetBiSyn: Predicting Anticancer Synergistic Drug Combinations Featuring Bi-perspective Drug Embedding with Heterogeneous Data PRESENTER: Yulong Li ABSTRACT. Synergistic drug combination is a promising solution to cancer treatment. Since the combinatorial space of drug combinations is too vast to be traversed through experiments, computational methods based on deep learning have shown huge potential in identifying novel synergistic drug combinations. Meanwhile, the feature construction of drugs has been viewed as a crucial task within drug synergy prediction. Recent studies shed light on the use of heterogeneous data, but most studies make independent use of relational data on drug-related biomedical interactions and structural data on drug molecules, thus ignoring the intrinsic association between the two perspectives. In this study, we propose a novel deep learning method termed HetBiSyn for drug combination synergy prediction. HetBiSyn models the drug-related interactions between biomedical entities and the structure of drug molecules as different heterogeneous graphs, and designs a self-supervised learning framework to obtain a unified drug embedding that simultaneously contains information from both perspectives. In detail, two separate heterogeneous graph attention networks are adopted for the two types of graph, and their outputs are used to form a contrastive learning task for drug embedding that is enhanced by hard negative mining. We also obtain cell line features by exploiting gene expression profiles. Finally, HetBiSyn uses a DNN with batch normalization to predict the synergy score of a combination of two drugs on a specific cell line. The experimental results show that our model outperforms other state-of-the-art DL and ML methods on the same synergy prediction task.
The ablation study also demonstrates that our drug embeddings with bi-perspective information learned through the end-to-end process are significantly informative, which ultimately helps to predict the synergy scores of drug combinations.

10:50-12:20 Session 3B: RNA/Transcriptomics

Chair:

Magda Mielczarek

10:50	Jingjing Zhang, Md. Tofazzal Hossain, Zhen Ju, Wenhui Xi and Yanjie Wei Identification and functional annotation of circRNAs in neuroblastoma based on bioinformatics ABSTRACT. Neuroblastoma is a prevalent solid tumor affecting children, with a low 5-year survival rate in high-risk patients. Previous studies have shed light on the involvement of specific circRNAs in neuroblastoma development. However, there is still a pressing need to identify novel therapeutic targets associated with circRNAs. In this study, we performed an integrated analysis of two circRNA sequencing datasets. The results revealed dysregulation of 36 circRNAs in neuroblastoma tissues, with their parental genes likely implicated in tumor development. In addition, we identified three specific circRNAs, namely hsa_circ_0001079, hsa_circ_0099504, and hsa_circ_0003171, that interact with miRNAs, modulating the expression of genes associated with neuroblastoma. Furthermore, by analyzing the translational potential of differentially expressed circRNAs, we uncovered seven circRNAs with the potential capacity for polypeptide translation. Notably, structural predictions suggest that the protein product derived from hsa_circ_0001073 belongs to the TGF-beta receptor protein family, indicating its potential involvement in promoting neuroblastoma occurrence.
11:10	Xuehua Bi, Chunyang Jiang, Cheng Yan, Kai Zhao, Linlin Zhang and Jianxin Wang Identifying miRNA-disease Associations based on Simple Graph Convolution with DropMessage and Jumping Knowledge ABSTRACT. MiRNAs play an important role in the occurrence and development of human disease. Identifying potential miRNA-disease associations is valuable for disease diagnosis and treatment. Therefore, there is an urgent need to develop efficient computational methods for predicting potential miRNA-disease associations in order to reduce the cost and time associated with biological wet experiments. In addition, although graph neural network methods have achieved good performance in predicting miRNA-disease associations, they still face the risk of oversmoothing and have room for improvement. In this paper, we propose a novel model named nSGC-MDA, which employs a modified Simple Graph Convolution (SGC) to predict miRNA-disease associations. Specifically, we first construct a bipartite attributed graph for miRNAs and diseases by computing multi-source similarity. Then we adapt SGC to extract the features of miRNAs and diseases on the graph. To prevent over-fitting, we randomly drop messages during message propagation and employ Jumping Knowledge (JK) during feature aggregation to enhance feature representation. Furthermore, we utilize a feature crossing strategy to obtain the features of miRNA-disease pairs. Finally, we calculate the prediction scores of miRNA-disease pairs by using a fully connected neural network decoder. In five-fold cross-validation, nSGC-MDA achieves a mean AUC of 0.9502 and a mean AUPR of 0.9496, outperforming six compared methods. The case study of cardiovascular disease also demonstrates the effectiveness of nSGC-MDA.
11:30	Sarah von Loehneysen, Thomas Spicher, Yuliia Varenyk, Hua-Ting Yao, Ronny Lorenz, Ivo Hofacker and Peter F. Stadler Phylogenetic Information as Soft Constraints in RNA Secondary Structure Prediction PRESENTER: Sarah von Loehneysen ABSTRACT. Pseudo-energies are a generic method to incorporate extrinsic information into energy-directed RNA secondary structure predictions. Consensus structures of RNA families, usually predicted from multiple sequence alignments, can be treated as soft constraints in this manner. In this contribution we first revisit the theoretical framework and then show that pseudo-energies for the centroid base pairs of the consensus structure result in a substantial increase in folding accuracy. In contrast, only a moderate improvement can be achieved if only the information that a base is predominantly paired is utilized.
11:50	Rafał Stępień, Joanna Szyda, Bartosz Czech and Magda Mielczarek The effect of transcriptomic annotations in breast cancer DGE study PRESENTER: Magda Mielczarek ABSTRACT. Gene expression profiling is crucial for understanding breast cancer biology and treatment individualization. The aim of this study was to elucidate the transcriptome annotation effect on differential gene expression (DGE) and breast cancer survival prognosis. DGE analyses were performed for MCF7 breast cancer (case) and normal tissues (control). The pipeline comprised quality control, quality-based data editing, transcript expression quantification, and DGE analysis. Two quantified transcript expression outputs were used to apply four approaches defining DGE between (A1) case and control samples quantified based on the GRCh37 assembly, (A2) case and control on GRCh38, (A3) case on GRCh37 and case on GRCh38, and (A4) control on GRCh37 and control on GRCh38. Gene Set Enrichment Analysis yielded identical Hallmark pathways for A1 and A2, except for Pancreas beta cells, which appeared in A1 only. The Kyoto Encyclopedia of Genes and Genomes pathways that appeared in only one approach were: Melanoma and Prostate cancer (A1) and ABC transporters, Acute myeloid leukaemia, Glycerophospholipid and retinol metabolism, Hedgehog and p53 signalling (A2). Principal Component Analysis determined that the greatest variability (97%) was found between cancer and normal samples (A1, A2) and between GRCh37 and GRCh38 annotations (A3). For A4 the variability determined by the annotations was lower (40%). The difference between the average expression of prognostic genes associated with survival in breast cancer (NADERI) between GRCh37 and GRCh38 was not statistically significant (P-value=0.91). The overall DGE outcomes were not identical between GRCh37 and GRCh38 annotations; however, the transcriptome annotation had no effect on survival prognosis in breast cancer.

12:20-13:00 Lunch Break

13:00-14:00 Matchmaking event | Registration required

A matchmaking event is a quick and easy way to meet potential cooperation partners. Via the b2Match platform, participants can schedule short one-to-one meetings on site (during the conference at PORT) or online. Twenty minutes pass quickly, but they are enough to make first connections before the meeting ends and the next talk starts.

We cordially invite you to the "Neuroscience meets Bioinformatics for Horizon Europe" matchmaking sessions organized by the Industry Contact Point for Medical Technologies and Health. Sessions will take place on October 10-11, 2023, at PORT during the ISBRA conference, as well as online until October 18, 2023. Matchmaking will allow participants to get to know each other and talk. Its goal is to strengthen collaboration by building project consortia and implementing joint interdisciplinary projects. Registration for the event is open until October 10 via a dedicated form. Please note: when registering, your email is case-sensitive, so use only lowercase letters. If you have any questions, please contact the BPK TMiZ team at our Institute: Monika Ślęzak and Katarzyna Banyś.

14:00-15:00 Session 4: Keynote Talk.

Anna Gambin Faculty of Mathematics, Informatics and Mechanics, University of Warsaw

Statistical modeling in proteomics.

Chair:

Murray Patterson

15:00-15:20 Coffee Break

15:20-17:40 Session 5A: Theory

Chair:

Carlile Lavor

15:20	Enrico Rossignolo and Matteo Comin USTAR: Improved Compression of k-mer Sets with Counters Using De Bruijn Graphs ABSTRACT. A fundamental operation in computational genomics is to reduce the input sequences to their constituent k-mers. Finding a space-efficient way to represent a set of k-mers is important for improving the scalability of bioinformatics analyses. One popular approach is to convert the set of k-mers into a de Bruijn graph and then find a compact representation of the graph through the smallest path cover. In this paper, we present USTAR, a tool for compressing a set of k-mers and their counts. USTAR exploits the node connectivity and density of the de Bruijn graph, enabling a more effective path selection for the construction of the path cover. We demonstrate the usefulness of USTAR in the compression of read datasets. USTAR improves on the compression of UST, the best algorithm, by 2.3% to 26.4%, depending on the k-mer size. The code of USTAR and the complete results are available at the repository https://github.com/enricorox/USTAR
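As a rough illustration of the first step this abstract mentions (reducing a sequence to its k-mers with counts and linking them into a de Bruijn graph), a minimal Python sketch follows. It is not the USTAR algorithm: the function names `count_kmers` and `de_bruijn_edges` are hypothetical, the graph is the simple node-centric variant over the DNA alphabet, and the path-cover step is not shown.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every length-k substring (k-mer) of the sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence shorter than k yields an empty range, hence an empty Counter.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def de_bruijn_edges(kmers) -> set:
    """Node-centric de Bruijn graph (illustrative assumption): k-mers are nodes,
    and u -> v is an edge when the last k-1 characters of u equal the first
    k-1 characters of v."""
    kmer_set = set(kmers)
    edges = set()
    for u in kmer_set:
        for base in "ACGT":
            v = u[1:] + base
            if v in kmer_set:
                edges.add((u, v))
    return edges


counts = count_kmers("ACGTACG", 3)
edges = de_bruijn_edges(counts)
```

A path through such a graph spells several overlapping k-mers as one longer string, which is why a small path cover gives a compact representation of the set.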
15:40	Gabriel Siqueira, Alexsandro Oliveira Alexandrino, Andre Rodrigues Oliveira, Géraldine Jean, Guillaume Fertin and Zanoni Dias Approximating Rearrangement Distances with Replicas and Flexible Intergenic Regions PRESENTER: Gabriel Siqueira ABSTRACT. Many tools from Computational Biology compute distances between genomes by counting the number of genome rearrangement events, such as reversals of a segment of genes. Most approaches to modeling these problems adopt simplifications such as ignoring nucleotides outside genes (the so-called intergenic regions), or assuming that just a single copy of each gene exists in the genomes. Recent works have made advances in more general models, considering replicated genes and intergenic region information. Our work aims at adapting those results by adding some flexibility to the representation of intergenic region information. We propose the Signed Flexible Intergenic Reversal Distance problem, which seeks the minimum number of reversals necessary to transform one genome into the other and encodes the genomes using flexible intergenic region information while also allowing multiple copies of a gene. We show the relationship of this problem with the Signed Minimum Common Flexible Intergenic String Partition problem and use a 2k-approximation to the partition problem to obtain an 8k-approximation to the distance problem, where k is the maximum number of copies of a gene in the genomes.
16:00	Joyanta Basak, Ahmed Soliman, Nachiket Deo, Kenneth Haase, Anup Mathur, Krista Park, Rebecca Steorts, Daniel Weinberg, Sartaj Sahni and Sanguthevar Rajasekaran On Computing the Jaro Similarity Between Two Strings PRESENTER: Sanguthevar Rajasekaran ABSTRACT. Jaro similarity is widely used in computing the similarity (or distance) between two strings of characters. For example, record linkage is an application of great interest in many domains for which Jaro similarity is popularly employed. Existing algorithms for computing the Jaro similarity between two given strings take quadratic time in the worst case. In this paper, we present an algorithm for Jaro similarity computation that takes only linear time. We also present experimental results that reveal that our algorithm outperforms existing algorithms.
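For readers unfamiliar with the measure, the sketch below computes Jaro similarity from its classical definition. It is the straightforward baseline, whose worst-case running time is quadratic, and not the linear-time algorithm announced in the abstract. The function name `jaro_similarity` is an illustrative choice.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Classical Jaro similarity; quadratic time in the worst case."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Two equal characters match only if their positions differ by at most `window`.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Transpositions: half the number of matched characters that are out of order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) / 2

    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3
```

For the textbook pair "MARTHA" and "MARHTA", all six characters match and one transposition occurs, giving (1 + 1 + 5/6) / 3 = 17/18.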
16:20	Guy Katriel, Udi Mahanaymi, Christoph Koutschan, Doron Zeilberger, Mike Steel and Sagi Snir Using Generating Functions to Prove Additivity of Gene-Neighborhood Based Phylogenetics PRESENTER: Sagi Snir ABSTRACT. Prokaryotic evolution is often described as the Spaghetti of Life due to massive genome dynamics (GD) events of gene gain and loss, resulting in different evolutionary histories for the set of genes comprising the organism. These different histories, dubbed gene trees, provide confounding signals, hampering the attempt to reconstruct the species tree describing the main trend of evolution of the species under study. The synteny index (SI) between a pair of genomes combines gene order and gene content information, allowing comparison of unequal gene content genomes, together with order considerations of their common genes. Recently, GD has been modelled as a continuous-time Markov process. Under this formulation, the distance between genes along the chromosome was shown to follow a birth-death-immigration process. Using classical results from birth-death theory, we recently showed that the SI measure is consistent under that formulation. In this work we provide an alternative, stand-alone combinatorial proof of the same result. By using generating function techniques we derive explicit expressions of the system’s probabilistic dynamics in the form of rational functions of the model parameters. This, in turn, allows us to infer analytically the expected distances between organisms based on a transformation of their SI. Although the expressions obtained are rather complex, we establish additivity of this estimated evolutionary distance (a desirable property yielding phylogenetic consistency). This approach relies on holonomic functions and the Zeilberger Algorithm in order to establish additivity of the transformation of SI.
16:40	Sumaira Zaman and Mukul S. Bansal Reducing the impact of domain rearrangement on sequence alignment and phylogeny reconstruction ABSTRACT. Existing computational approaches for studying gene family evolution generally do not account for domain rearrangement within gene families. However, it is well known that protein domain architectures often differ between genes belonging to the same gene family. In particular, domain shuffling can lead to out-of-order domains which, unless explicitly accounted for, can significantly impact even the most fundamental of tasks such as multiple sequence alignment and phylogeny inference. In this work, we make progress towards addressing this important but often overlooked problem. Specifically, we (i) demonstrate the impact of protein domain shuffling and rearrangement on multiple sequence alignment and gene tree reconstruction accuracy, (ii) propose two new computational methods for correcting gene sequences and alignments for improved gene tree reconstruction accuracy and evaluate them using realistically simulated datasets, and (iii) assess the potential impact of our new methods and of two existing approaches, MDAT and ProDA, in practice by applying them to biological gene families. We find that the methods work very well on simulated data but that performance of all methods is mixed, and often complementary, on real biological data, with different methods helping improve different subsets of gene families.
17:00	Huixiu Xu, Xin Tong, Haitao Jiang, Lusheng Wang, Binhai Zhu and Daming Zhu On Sorting by Flanked Transpositions PRESENTER: Lusheng Wang ABSTRACT. Transposition is a well-known genome rearrangement event that switches two consecutive segments on a genome. The problem of sorting permutations by transpositions has attracted a great amount of interest since it was introduced by Bafna and Pevzner in 1995. However, empirical evidence shows that, in many genomes, the participation of repeat segments is inevitable during genome evolution and the breakpoints where a transposition occurs are most likely accompanied by a triple of repeated segments. For example, a transposition will transform r x r y z r into r y z r x r, where r is a relatively short repeat appearing three times and x and y are long segments involved in the transposition. For this transposition event, the neighbors of segments x and y remain the same before and after the transposition. This type of transposition is called flanked transposition. In this paper, we investigate the problem of sorting by flanked transpositions, which requires a series of flanked transpositions to transform one genome into another. First, we present an O(n) expected running time algorithm to determine if a genome can be transformed into the other genome by a series of flanked transpositions for a special case, where each adjacency (roughly two neighbors of two elements in the genome) appears once in both input genomes. We then extend the decision algorithm to work for the general case with the same expected running time O(n). Finally, we show that the new version, sorting by minimum number of flanked transpositions, is also NP-hard.
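The example in this abstract can be replayed with a short Python sketch. It only illustrates the operation and the property stated in the text, namely that neighbors are unchanged, and is not the authors' decision algorithm. The helper names `transpose` and `adjacencies` are hypothetical, and modelling "same neighbors" as an unchanged multiset of consecutive pairs is an assumption made for illustration.

```python
from collections import Counter


def transpose(genome: list, i: int, j: int, k: int) -> list:
    """Swap the two consecutive segments genome[i:j] and genome[j:k]."""
    if not 0 <= i < j < k <= len(genome):
        raise ValueError("need 0 <= i < j < k <= len(genome)")
    return genome[:i] + genome[j:k] + genome[i:j] + genome[k:]


def adjacencies(genome: list) -> Counter:
    """Multiset of consecutive (left, right) pairs in the genome."""
    return Counter(zip(genome, genome[1:]))


before = ["r", "x", "r", "y", "z", "r"]
# Swap the segments "x r" (positions 1-2) and "y z r" (positions 3-5).
after = transpose(before, 1, 3, 6)
# Because both segments start right after an r and end with an r,
# every adjacency is preserved.
unchanged = adjacencies(before) == adjacencies(after)
```

Here `after` spells r y z r x r, matching the abstract's example, and `unchanged` is true.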
17:20	Michael Souza, Nilton Maia and Carlile Lavor The Ordered Covering Problem in Distance Geometry PRESENTER: Michael Souza ABSTRACT. This study is motivated by the Discretizable Molecular Distance Geometry Problem (DMDGP), a specific category in Distance Geometry, where the search space is discrete. We address the challenge of ordering the DMDGP constraints, a critical factor in the performance of the state-of-the-art SBBU algorithm. To this end, we formalize the constraint ordering problem as a vertex cover problem, which diverges from traditional covering problems due to the substantial importance of the sequence of vertices in the covering. In order to solve the covering problem, we propose a greedy heuristic and compare it to the ordering of the SBBU. The computational results indicate that the greedy heuristic outperforms the SBBU ordering by an average factor of 1,300x.

15:20-17:40 Session 5B: Interaction/Binding/Function Prediction

Chair:

Murray Patterson

15:20	Ming Chen, Bin Yao, Xiujuan Lei, Chunyan Ji, Zitao Hu and Yi Pan Predicting Comprehensive Drug-Drug Interactions by Magnetic Signed Graph Neural Network ABSTRACT. Drug combination is a common means of clinical treatment, but detection and evaluation of drug-drug interactions (DDI) can be expensive. Graph neural network (GNN) is a popular method for DDI prediction, which has achieved encouraging performance in the scenarios of mono-type and multi-type DDI. However, most studies ignore the comprehensive information of DDI, such as the signs of DDI and the asymmetric roles of drugs in pharmacological changes. In this article, we model DDI on signed and directed graphs with node attributes and define multiple tasks, which include sign & direction prediction beyond the individual tasks. Furthermore, we put forward a framework, called MSGNN-DDI, which uses spectral information of DDI networks and builds GNN models based on magnetic signed Laplacians. The framework not only facilitates the prediction of signs and directions in pharmacological changes but is also adaptable for their combination task. Our experiments use the drug data extracted from the DrugBank and PubChem databases, and the results show the feasibility of our method on multiple tasks. The case study further verifies its effectiveness in sign & direction prediction.
15:40	Hui Feng, Guishen Wang and Chen Cao BiRNN-DDI: A Drug-drug Interaction Event Type Prediction Model based on Bidirectional Recurrent Neural Network and Graph to Sequence Representation PRESENTER: Hui Feng ABSTRACT. Drug-drug interaction (DDI) prediction is helpful for better understanding drug adverse reactions and drug combinations. Recent works reveal the importance of DDI event-type prediction. Hence, this paper proposes a Bidirectional Recurrent Neural Network for drug-drug interaction event type prediction (BiRNN-DDI). The BiRNN-DDI model first constructs drug feature graphs based on drug feature similarity. To mine contextual information in DDI, the BiRNN-DDI model uses a graph-to-sequence model to transform drug feature homogeneous graphs into drug sequence representations. Then, a two-channel structure model consisting of the BiRNN is proposed to get contextual DDI sequence representations. Finally, a feedforward neural network is used to predict the DDI event type. To test the effectiveness of our BiRNN-DDI model, representative state-of-the-art models are compared on two drug-drug interaction event type benchmarks. Extensive experimental results show that our BiRNN-DDI model outperforms the other compared models in precision, recall, and F1 value. Meanwhile, the experimental results demonstrate that our model has a smaller parameter space. This indicates that our model is able to learn drug feature representations and predict possible drug-drug interaction event types more effectively.
16:00	Md. Tofazzal Hossain, Md. Selim Reza, Yin Peng, Shengzhong Feng and Yanjie Wei PCPI: Prediction of circRNA and protein interaction using machine learning method ABSTRACT. Circular RNA (circRNA) is an RNA molecule that differs from linear RNA in having a covalently closed loop structure. CircRNAs can act as miRNA sponges and can interact with RNA-binding proteins. Previous studies have revealed that circRNAs play an important role in the development of different diseases. The biological functions of circRNAs can be investigated with the help of circRNA-protein interaction. Due to scarce circRNA data, long circRNA sequences and the sparsely distributed binding sites on circRNAs, far fewer efforts have been made to study circRNA-protein interaction than interaction between linear RNA and protein. With the increase in experimental data on circRNA, machine learning methods have recently been widely used for predicting circRNA-protein interaction. The existing methods use either the RNA sequence or the protein sequence for predicting the binding sites. In this paper, we present a new method, PCPI (Predicting CircRNA and Protein Interaction), to predict the interaction between circRNA and protein using a support vector machine (SVM) classifier. We have used both the RNA and protein sequences to predict their interaction. The circRNA sequences were converted into pseudo-peptide sequences based on codon translation. The pseudo-peptide and the protein sequences were classified based on dipole moments and the volume of the side chains. The 3-mers of the classified sequences were used as features for training the model. Several machine learning models were used for classification. Comparing their performance, we selected the SVM classifier for predicting circRNA-protein interaction. Our method achieved 93% prediction accuracy.
16:20	Wanyi Yang, Chuanfang Wu and Jinku Bao PDFll: Intrinsic protein disorder and function prediction from the language of life PRESENTER: Wanyi Yang ABSTRACT. Identification of intrinsically disordered proteins and their function relies in large part on computational predictors, which demands that their quality be high. Here we present a series of computational predictors, PDFll, that provide accurate disorder and disorder function predictions based on protein sequences. PDFll is generated in two main steps. The first step relies on large protein language models (pLMs), which are trained on billions of protein sequences. The second step is to feed the embeddings obtained from the pLMs into small and simple deep-learning models to get predictions. These predictions are substantially better than the results of the state-of-the-art predictors that predict disorder and function while training without evolutionary information.
16:40	Usama Sardar, Sarwan Ali, Muhammad Sohaib Ayub, Muhammad Shoaib, Khurram Bashir, Imdadullah Khan and Murray Patterson Sequence-Based Nanobody-Antigen Binding Prediction ABSTRACT. Nanobodies (Nb) are monomeric heavy-chain fragments derived from heavy-chain only antibodies naturally found in camelids and sharks. Their considerably small size (∼3-4 nm; 13 kDa) and favorable biophysical properties make them attractive targets for recombinant production. Furthermore, their unique ability to bind selectively to specific antigens, such as toxins, chemicals, bacteria, and viruses, makes them powerful tools in cell biology, structural biology, medical diagnostics, and future therapeutic agents in treating cancer and other serious illnesses. However, a critical challenge in nanobody production is the unavailability of nanobodies for a majority of antigens. Although some computational methods have been proposed to screen potential nanobodies for given target antigens, their practical application is highly restricted due to their reliance on 3D structures. Moreover, predicting nanobody-antigen interactions (binding) is a time-consuming and labor-intensive task. This study aims to develop a machine-learning method to predict Nanobody-Antigen binding solely based on the sequence data. We curated a comprehensive dataset of Nanobody-Antigen binding and non-binding data and devised an embedding method based on gapped k-mers to predict binding based only on sequences of nanobody and antigen. Our approach achieves up to 90% accuracy in binding prediction and is significantly more efficient compared to the widely-used computational docking technique.
17:00	Ahtisham Fazeel, Muhammad Nabeel Asim, Johan Trygg, Andreas Dengel and Sheraz Ahmed Deep Learning Architectures For the Prediction of YY1-Mediated Chromatin Loops PRESENTER: Ahtisham Fazeel ABSTRACT. YY1-mediated chromatin loops play substantial roles in basic biological processes like gene regulation, cell differentiation, and DNA replication. YY1-mediated chromatin loop prediction is important for understanding diverse types of biological processes, which may lead to the development of new therapeutics for neurological disorders and cancers. Existing deep learning predictors are capable of predicting YY1-mediated chromatin loops in two different cell lines; however, they show limited performance for the prediction of YY1-mediated loops in the same cell lines and suffer significant performance deterioration in the cross-cell-line setting. To provide computational predictors capable of performing large-scale analyses of YY1-mediated loop prediction across multiple cell lines, this paper presents two novel deep learning predictors. The two proposed predictors make use of Word2vec and one-hot encoding for sequence representation, together with long short-term memory and a convolutional neural network along with a gradient flow strategy similar to DenseNet architectures. Both predictors are evaluated on two different benchmark datasets of two cell lines, HCT116 and K562. Overall, the proposed predictors outperform the existing DEEPYY1 predictor with an average maximum margin of 4.65% and 7.45% in terms of AUROC and accuracy, respectively, across both datasets over the independent test sets, and 5.1% and 3.2% over 5-fold validation. In terms of cross-cell evaluation, the proposed predictors achieve maximum performance gains of up to 9.5% and 27.1% in terms of AUROC over the HCT116 and K562 datasets.
17:20	Boxin Guan, Anqi Wang, Yahan Li, Feng Li, Jin-Xing Liu and Junliang Shang ABCAE: Artificial Bee Colony Algorithm with Adaptive Exploitation for Epistatic Interaction Detection ABSTRACT. The detection of epistatic interactions among multiple single-nucleotide polymorphisms (SNPs) in complex diseases has posed a significant challenge in genome-wide association studies (GWAS). However, most existing methods still suffer from algorithmic limitations, such as high computational requirements and low detection ability. In this paper, we propose an artificial bee colony algorithm with adaptive exploitation (ABCAE) to address these issues in epistatic interaction detection for GWAS. An adaptive exploitation mechanism is designed and used in the onlooker stage of ABCAE. By using the adaptive exploitation mechanism, ABCAE can locally optimize the promising SNP combination area, thus effectively coping with the challenges brought by high-dimensional complex GWAS data. To demonstrate the detection ability of ABCAE, we compare it against four existing algorithms on eight epistatic models. The experimental results demonstrate that ABCAE outperforms the four existing methods in terms of detection ability.

18:00-21:00 Dinner at Łukasiewicz - PORT