WCB10:Papers with Abstracts

Abstract. The idea of constraint-based modeling in systems biology is to describe a biological system by a set of constraints, i.e., by pieces of partial information about its structure and dynamics. Using constraint-based reasoning one may then draw conclusions about the possible system behaviors.

In this talk, we will focus on constraint-based modeling techniques for regulatory networks starting from the discrete logical formalism of René Thomas. In this framework, logic and constraints arise at two different levels. On the one hand, Boolean or multi-valued logic formulae provide a natural way to represent the structure of a regulatory network, which is given by positive and negative interactions (i.e., activation and inhibition) between the network components. On the other hand, temporal logic formulae (e.g. CTL) may be used to reason about the dynamics of the system, represented by a state transition graph or Kripke model.
Abstract. This paper, concerned with the protein structure prediction problem, aims at automatically selecting the Constraint Satisfaction algorithm best suited to the problem instance at hand. The contribution is twofold. Firstly, the selection criterion is the quality (minimal cost) in expectation of the solution found after a fixed amount of time, as opposed to the expected runtime. Secondly, the presented approach, based on supervised Machine Learning algorithms, considers the original description of the protein structure problem, as opposed to the features related to the SAT or CSP encoding of the problem.
Abstract. Pedigrees are `family trees' relating groups of individuals which can usefully be seen as Bayesian networks. The problem of finding a maximum likelihood pedigree from genotypic data is encoded as an integer linear programming problem. Two methods of ensuring that pedigrees are acyclic are considered. Results on obtaining maximum likelihood pedigrees relating 20, 46 and 59 individuals are presented. Running times for larger pedigrees depend strongly on the data used but generally compare well with those in the literature. Solving is particularly fast when allele frequency is uniform.
Abstract. X-ray crystallography is one of the main methods to establish the three-dimensional structure of biological macromolecules. In an X-ray experiment, one can measure only the magnitudes of the complex Fourier coefficients of the electron density distribution under study, but not their phases. The problem of recovering the lost phases is called the phase problem. Building on earlier work by Lunin/Urzhumtsev/Bockmayr, we extend their constraint-based approach to the phase problem by adding further 0-1 linear programming constraints. These constraints describe geometric properties of proteins and increase the quality of the solutions. The approach has been implemented using SCIP and CPLEX, first computational results are presented here.
Abstract. In the goal of genetic improvement of livestock by marker assisted selection, we aim at reconstructing the haplotypes of sires from their offspring. We reformulate this problem into a weighted binary constraint satisfaction problem with only equality and disequality soft constraints. Our results show these problems have a small treewidth and can be solved optimally, improving haplotype reconstruction compared to previous approaches especially for small families (<10 descendants).
Abstract. In systems biology, identifying vital functions like glycolysis from a given metabolic pathway is important to understand living organisms. In this paper, we particularly focus on the problem of finding minimal sub-pathways producing target metabolites from source metabolites. We represent laws of biochemical reactions in propositional formulas and use a state-of-the-art SAT solver as a minimal model generator to solve the problem efficiently. An advantage of our method is that it can treat reversible reactions represented by cycles. Moreover recent advances of SAT technologies enables us to obtain solutions for large pathways. We have applied our method to a whole Escherichia coli pathway. As a result, we found 5 sets of reactions including the conventional glycolysis sub-pathway described in a biological database EcoCyc.
Abstract. Sequence-structure alignment of RNAs with arbitrary secondary structures is Max-SNP-hard. Therefore, the problem of RNA alignment is commonly restricted to nested structure, where dynamic programming yields efficient solutions. However, nested structure cannot model pseudoknots or even more complex structural dependencies. Nevertheless those dependencies are essential and conserved features of many RNAs. Only few existing approaches deal with crossings of limited complexity or arbitrary crossing structures. Here, we present a constraint approach for alignment of structures in the even more general class of structures with unlimited complexity. Our central contribution is a new RNA alignment propagator. It is based on an efficient O(n<sup>2</sup>) relaxation of the RNA alignment problem. Specifically, our approach Carna solves the alignment problem for sequences with given input structure of unlimited complexity. Carna is implemented using Gecode.
Abstract. Building on a technique for associating Hybrid Systems (HS) to stochastic programs written in a stochastic extension of Concurrent Constraint Programming (sCCP), we will discuss several aspects of performing such association. In particular, as we proved an sCCP program can be mapped in a HS varying in a lattice at a level depending on the amount of actions to be simulated continuously, we will discuss what are the problems involved in a semi-automatic choice of such level. Decidability, semantic, and efficiency issues will be taken into account, with special emphasis on their links with biological applications. We will also discuss about the role of constraints and of the constraint store is this construction.
Abstract. To find the best lattice model representation of a given full atom protein structure is a hard computational problem. Several greedy methods have been suggested where results are usually biased and leave room for improvement. In this paper we formulate and implement a Constraint Programming method to refine such lattice structure models. We show that the approach is able to provide better quality solutions. The prototype is implemented in COLA and is based on limited discrepancy search. Finally, some promising extensions based on local search are discussed.
Abstract. A large number of species cannot be distinguished via standard non genetic analysis in the lab. In this paper we address the problem of finding minimum sets of restriction enzymes that can be used to unequivocally identify the species of a yeast specimen by analyzing the size of digested DNA fragments in gel electrophoresis experiments. The problem is first mapped into set covering and then solved using Constraint Programming techniques. Although the data sets used are relatively small (23 yeast species and 331 enzymes), a similar approach might be applicable to larger ones and to a number of variants as discussed in the conclusion. The subject of this paper has already raised the interest of our biologist partners and may become a benchmark for the application of Constraint Programming techniques to Bioinformatics.