Ana Teresa Freitas Department of Computer Science and Engineering at Técnico Lisbon, University of Lisbon
Turning Data into Genomic Medicine
10:20 | CSA-MEM: Enhancing Circular DNA Multiple Alignment through Text Indexing Algorithms ABSTRACT. In the realm of Bioinformatics, the comparison of DNA sequences is essential for tasks such as phylogenetic identification, comparative genomics, and genome reconstruction. Methods for estimating sequence similarity have been successfully applied in this field. The application of these methods to circular genomic structures, common in nature, poses additional computational hurdles. In the advancing field of metagenomics, innovative circular DNA alignment algorithms are vital for accurately understanding circular genome complexities. Aligning circular DNA is more intricate than aligning linear sequences: circularity demands more sophisticated algorithms and increases computational requirements and runtime. This paper proposes CSA-MEM, an efficient text indexing algorithm to identify the most informative region to rotate and cut circular genomes, thus improving alignment accuracy. The algorithm uses a circular variation of the FM-Index and identifies the longest chain of non-repeated maximal subsequences common to a set of circular genomes, enabling the most adequate rotation and linearisation for multiple alignment. The effectiveness of the approach was validated in five sets of mitochondrial, viral and bacterial DNA. The results show that CSA-MEM significantly improves the efficiency of multiple sequence alignment, consistently achieving top scores compared to other state-of-the-art methods. This tool enables more realistic phylogenetic comparisons between species, facilitates large metagenomic data processing, and opens up new possibilities in comparative genomics. |
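The rotation and linearisation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a shared anchor substring is already known; CSA-MEM itself derives the cut region from a circular FM-Index and a chain of maximal common subsequences, which is not reproduced here. The function name `rotate_to_anchor` and the example sequences are hypothetical.

```python
def rotate_to_anchor(circular_seq: str, anchor: str) -> str:
    """Linearise a circular sequence so that it starts at `anchor`.

    Assumption: `anchor` is a substring shared by all genomes in the set.
    How to choose it is the hard part and is not shown here.
    """
    n = len(circular_seq)
    if not anchor or len(anchor) > n:
        raise ValueError("anchor must be non-empty and no longer than the sequence")
    # Appending the first len(anchor) - 1 characters exposes matches that
    # span the arbitrary cut point of the stored linear representation.
    extended = circular_seq + circular_seq[: len(anchor) - 1]
    start = extended.find(anchor)
    if start == -1:
        raise ValueError("anchor not found in circular sequence")
    return circular_seq[start:] + circular_seq[:start]


# Two stored representations of the same circular molecule, cut at different points.
genome_a = "GATTACA"
genome_b = "TTACAGA"
linearised = [rotate_to_anchor(g, "AG") for g in (genome_a, genome_b)]
```

After rotation both representations begin at the same position, so a standard linear multiple aligner can be applied to them.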
10:35 | Genetic Algorithm with Evolutionary Jumps PRESENTER: Alex Zelikovsky ABSTRACT. It has recently been noticed that dense subgraphs of SARS-CoV-2 epistatic networks correspond to future unobserved variants of concern. This phenomenon can be interpreted as multiple correlated mutations occurring almost simultaneously, resulting in a new variant relatively distant from the current population. We refer to this phenomenon as an evolutionary jump and propose to use it for enhancing genetic algorithms. Evolutionary jumps were implemented using the CliqueSNV algorithm, which finds cliques in the epistatic network. We have applied the genetic algorithm with evolutionary jumps (GA+EJ) to the 0-1 Knapsack problem, and found that evolutionary jumps allow the genetic algorithm to escape local minima and find solutions closer to the optimum. |
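To judge how close a heuristic such as a genetic algorithm gets to the optimum of a small 0-1 Knapsack instance, the exact optimum can be computed by dynamic programming. The sketch below is only that reference computation, not the GA+EJ method from the abstract; the function name and the instance are hypothetical.

```python
def knapsack_optimum(weights, values, capacity):
    """Return the best total value of a 0-1 Knapsack instance.

    Classic dynamic programme over capacities, O(len(weights) * capacity).
    Assumes non-negative integer weights and an integer capacity.
    """
    best = [0] * (capacity + 1)  # best[c] = best value using capacity at most c
    for weight, value in zip(weights, values):
        # Iterate capacities downwards so each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]


# Hypothetical instance: the gap between a heuristic's value and this optimum
# measures how far the heuristic is from the best possible solution.
optimum = knapsack_optimum([1, 3, 4, 5], [1, 4, 5, 7], capacity=7)
```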
10:50 | A Brief Study of Gene Co-Expression Thresholding Algorithms ABSTRACT. The thresholding problem is considered in the context of high-throughput biological data. Several approaches are reviewed, implemented, and tested over an assortment of transcriptomic data. |
11:05 | Graph-Based Motif Discovery in Mimotope Profiles of Serum Antibody Repertoire PRESENTER: Hossein Saghaian ABSTRACT. The phage display technique has a multitude of applications, such as epitope mapping, organ targeting, therapeutic antibody engineering and vaccine design. One area of particular importance is the detection of cancers in early stages, where the discovery of binding motifs and epitopes is critical. While several techniques exist to characterize phages, Next Generation Sequencing (NGS) stands out for its ability to provide detailed insights into antibody binding sites on antigens. However, when dealing with NGS data, identifying regulatory motifs poses significant challenges. Existing methods often lack scalability for large datasets, rely on prior knowledge about the number of motifs, and exhibit low accuracy. In this paper, we present a novel approach for identifying regulatory motifs in NGS data. Our method leverages results from graph theory to overcome the limitations of existing techniques. |
10:20 | ricME: long-read based mobile element variant detection using sequence realignment and identity calculation PRESENTER: Huidong Ma ABSTRACT. Mobile element variants are an important class of structural variants, accounting for a quarter of all structural variants, and are closely related to issues such as genetic diseases and species diversity. However, few algorithms have been developed for detecting mobile element variants in third-generation sequencing data. We propose an algorithm, ricME, that combines sequence realignment and identity calculation for detecting mobile element variants. ricME first performs an initial detection to obtain the positions of insertions and deletions and extracts the variant sequences; it then applies sequence realignment and identity calculation to obtain the transposon classes related to the variant sequences; finally, it adopts a multi-level judgment rule to achieve accurate detection of mobile element variants based on the transposon classes and identities. Compared with rMETL, a representative long-read based mobile element variant detection algorithm, ricME improves the F1 scores by 11.5% and 21.7% on simulated and real datasets, respectively. The proposed algorithm ricME is freely available at https://github.com/mhuidong/ricME. |
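The identity calculation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which identity formula ricME uses; the code assumes one common definition (matching columns divided by aligned columns) and takes an alignment that has already been computed. The function name and the example strings are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two rows of a pairwise alignment.

    Assumption: identity = matching columns / aligned columns, where columns
    that are gaps in both rows are ignored. Other definitions exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y) for x, y in zip(aligned_a, aligned_b) if not (x == gap and y == gap)
    ]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


# Hypothetical variant sequence aligned against a hypothetical transposon consensus.
identity = percent_identity("ACGT-A", "ACCTTA")
```

A variant sequence could then be assigned to the transposon class with the highest identity, subject to whatever thresholds the judgment rule applies.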
10:40 | Reconciling Inconsistent Molecular Structures from Biochemical Databases ABSTRACT. Information on the structure of molecules, retrieved via biochemical databases, plays a pivotal role in various disciplines, such as metabolomics, systems biology, and drug discovery. However, no such database can be complete, and the chemical structure for a given compound is not necessarily consistent between databases. This paper presents StructRecon, a tool for resolving unique and correct molecular structures from database identifiers. StructRecon traverses the cross-links between database entries in different databases to construct what we call an identifier graph, which offers a more complete view of the total information available on a particular compound across all the databases. In order to reconcile discrepancies between databases, we first present an extensible model for chemical structure which supports multiple independent levels of detail, allowing standardisation of the structure to be applied iteratively. In some cases, our standardisation approach results in multiple structures for a given compound, in which case a random walk-based algorithm is used to select the most likely structure among incompatible alternates. We applied StructRecon to the EColiCore2 model, resolving a unique chemical structure for 85.11 % of identifiers. StructRecon is open-source and modular, which enables the potential support for more databases in the future. |
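The traversal of cross-links described in the abstract can be sketched as a breadth-first search. This is an illustration only: the cross-link table, the identifier strings and the function name are hypothetical, and StructRecon's actual data sources and graph model are not reproduced.

```python
from collections import deque


def collect_identifiers(start, cross_links):
    """Return every identifier reachable from `start` via cross-links.

    `cross_links` maps an identifier to the identifiers its database entry
    refers to (a hypothetical in-memory stand-in for database lookups).
    """
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for linked in cross_links.get(current, ()):
            if linked not in seen:
                seen.add(linked)
                queue.append(linked)
    return seen


# Hypothetical cross-links between entries of three databases.
links = {
    "dbA:1": ["dbB:7"],
    "dbB:7": ["dbA:1", "dbC:3"],
    "dbC:3": [],
    "dbA:2": ["dbC:9"],
}
component = collect_identifiers("dbA:1", links)
```

The set of reachable identifiers forms the node set of one identifier graph; the structures attached to those entries are what would then be standardised and compared.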
11:00 | Clique-based topological characterization of chromatin interaction hubs ABSTRACT. Chromatin conformation capture technologies are a vital source of information about the spatial organization of chromatin in eukaryotic cells. Of these technologies, Hi-C and related methods have been widely used to obtain reasonably complete contact maps in many cell lines and tissues under a wide variety of conditions. This data allows for the creation of chromatin interaction graphs from which topological generalizations about the structure of chromatin may be drawn. Here we outline and utilize a clique-based approach to analyzing chromatin interaction graphs which allows for both detailed analysis of strongly interconnected regions of chromatin and the unraveling of complex relationships between genomic loci in these regions. We find that clique-rich regions are significantly enriched in distinct gene ontologies as well as regions of transcriptional activity compared to the entire set of links in the respective datasets, and that these cliques are also not entirely preserved in randomized Hi-C data. We conclude that cliques and the denser regions of connectivity in which they are common appear to indicate a consistent pattern of chromatin spatial organization that resembles transcription factories, and that cliques can be used to identify functional modules in Hi-C data. |
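Finding cliques in a chromatin interaction graph can be illustrated with the basic Bron-Kerbosch algorithm for enumerating maximal cliques. The abstract does not say which clique enumeration method the authors use, so this is a generic sketch; the locus labels and function names are hypothetical.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) interaction pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def maximal_cliques(adjacency):
    """Enumerate all maximal cliques (basic Bron-Kerbosch, no pivoting)."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adjacency[v], excluded & adjacency[v])
            # v has been fully explored: move it from candidates to excluded.
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adjacency), set())
    return cliques


# Hypothetical interactions between four genomic loci.
graph = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
cliques = maximal_cliques(graph)
```

For genome-scale Hi-C graphs a pivoting or degeneracy-ordered variant would usually be preferred, since the basic version can be slow on dense graphs.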
11:15 | On the Realisability of Chemical Pathways ABSTRACT. The exploration of pathways and alternative pathways that have a specific function is of interest in numerous chemical contexts. A framework for specifying and searching for pathways has previously been developed, but a focus on which of the many pathway solutions are realisable, or can be made realisable, is missing. Realisable here means that there actually exists some sequencing of the reactions of the pathway that will execute the pathway. We present a method for analysing the realisability of pathways based on the reachability question in Petri nets. For realisable pathways, our method also provides a certificate encoding an order of the reactions which realises the pathway. We present two extended notions of realisability of pathways, one of which is related to the concept of network catalysts. We exemplify our findings on the pentose phosphate pathway. Lastly, we discuss the relevance of our concepts for elucidating the choices often implicitly made when depicting pathways. |
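The notion of realisability can be made concrete with a small sketch: treat molecules as Petri net places, reactions as transitions, and search for a firing order in which every reaction of the pathway fires the required number of times. The exhaustive search, the data representation and the names below are assumptions made for illustration, not the authors' method, and the search is exponential in the worst case.

```python
from collections import Counter


def find_firing_order(reactions, counts, marking):
    """Return an order of reactions that realises the pathway, or None.

    reactions: name -> (consumed, produced), each a mapping molecule -> amount
    counts:    name -> number of times the pathway uses the reaction
    marking:   molecule -> amount initially available
    """

    def search(current, remaining, order):
        if sum(remaining.values()) == 0:
            return order  # every reaction fired as often as required
        for name in sorted(remaining):
            if remaining[name] == 0:
                continue
            consumed, produced = reactions[name]
            # A reaction is enabled only if all its inputs are present.
            if all(current[m] >= k for m, k in consumed.items()):
                next_marking = current - Counter(consumed) + Counter(produced)
                next_remaining = remaining.copy()
                next_remaining[name] -= 1
                result = search(next_marking, next_remaining, order + [name])
                if result is not None:
                    return result
        return None

    return search(Counter(marking), Counter(counts), [])


# Hypothetical two-step pathway A -> B -> C.
pathway = {
    "r1": ({"A": 1}, {"B": 1}),
    "r2": ({"B": 1}, {"C": 1}),
}
certificate = find_firing_order(pathway, {"r1": 1, "r2": 1}, {"A": 1})
```

The returned list plays the role of the certificate mentioned in the abstract; a result of `None` means no ordering exists from the given initial molecules.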