ABSTRACT. The notion of ED string was introduced as a simple alternative to variation and sequence graphs for representing a pangenome, that is, a collection of genomic sequences to be analyzed jointly or to be used as a reference. It can also be seen as a directed acyclic graph, whose edges are labeled by strings.We define notions of matching statistics of two ED strings, and we show how these can be used to conceive similarity measures for two pangenomes, and also consequently infer distance measures. We then show that both of these measures can be computed efficiently, both in theory and in practice, by employing the notion of an “intersection graph” of two ED strings.We used real SARS-CoV-2 datasets and our matching statistics method to reproduce a well-established clades classification of SARS-CoV-2, thus showing that the classification obtained by our method is in accordance with the existing one.

On computing sets of integers with maximum number of pairs summing to powers of 2

ABSTRACT. We address the problem of finding sets of integers of a given size with a maximum number of pairs summing to powers of 2. By fixing particular pairs, this problem reduces to finding a labeling of the vertices of a given graph with pairwise distinct integers such that the endpoint labels for each edge sum to a power of 2. We propose an efficient algorithm for this problem, which at its core relies on another algorithm that, given two sets of linear homogeneous polynomials with integer coefficients, computes all variable assignments to powers of 2 that nullify polynomials from the first set but not from the second. With the proposed algorithms, we determine the maximum size of graphs of order n that admit such a labeling for all n ≤ 20. We also identify the minimal forbidden subgraphs of order ≤ 11, whose presence prevents the
graphs from having such a labeling.

Efficient Algorithms for Decomposing Integers as Sums of Few Tetrahedral Numbers

ABSTRACT. Pollock conjectures that every natural number can be expressed as a sum of at most five tetrahedral numbers. It remains unknown whether this conjecture holds, and Watson proved that sums of at most eight tetrahedral numbers suffice to express all natural numbers.

We devise two algorithms to decompose integers as sums of few tetrahedral numbers. Our first algorithm can decompose any given integer $n$ into a sum of at most eight tetrahedral numbers in $O(\log^3 n/\log\log n)$ time with probability $1-1/n^{\Omega(1)}$, assuming the extended Riemann hypothesis. Our second algorithm can deterministically decompose all integers in $[1, \ell]$ into sums of the fewest possible tetrahedral numbers in $O(\ell)$ time using $O(\ell^{2/3})$ space, assuming a conjecture over the integers in $[1, \ell]$ and the Pollock's conjecture on tetrahedral numbers. While the conjectures for all numbers is unproven, its validity for all numbers in $[1, \ell]$ can be verified using $O(\ell)$ time.

As a result of our second algorithm, we can show that the Pollock's conjecture holds for all natural numbers up to $2.82 \times 10^{21}$. This significantly improves upon the previous known bound of $3.77 \times 10^{15}$.

Simple random sampling of binary forests with fixed number of nodes and trees

ABSTRACT. We generalize the classical algorithm of Rémy for random sampling of full binary trees with given number of leaves. As a result, we give a simple linear time algorithm for random generation of full binary forests with known number of trees and leaves. The algorithm is obtained from an elegant bijection that we construct in order to give a direct prove of the well-known fact that these forests are counted by the k-th fold self-convolution of the Catalan numbers. Via some well-known bijections, the given algorithm can be used to sample random objects from several other classes enumerated by self-convolutions of the Catalan numbers, e.g., binary forests with given number of trees, lists of given number of balanced strings and others.

Approximation Algorithms for Node-Weighted Directed Steiner Problems

ABSTRACT. Guha et al. [STOC, 1999] and Moss and Rabani [SIAM J. Comput., 2007] introduced two variants of the Steiner problem in undirected graphs in which the nodes are associated with two values, called costs and prizes. In the \emph{budgeted rooted node-weighted Steiner tree} problem, we are given an undirected graph $G$ with $n$ nodes, a predefined node $r$, costs and prizes associated to the nodes of $G$, and a budget $B$. The aim is to find a tree in $G$ rooted at $r$ such that the total cost of its nodes is at most $B$ and the total prize is maximized. In the \emph{quota rooted node-weighted Steiner tree} problem, we are given a quota $Q$, instead of the budget, and we aim at minimizing the cost of a tree rooted at $r$ whose overall prize is at least $Q$.

If the graph is undirected both problems can be approximated within polylogarithmic factors, possibly with a constant-factor budget violation, [Bateni et al. SIAM J. Comput., 2018][Moss and Rabani SIAM J. Comput., 2007]. If the graph is directed, the budgeted problem can be approximated within a $O\left(\frac{\log n'}{\log \log n'}\right)$ factor in quasi-polynomial time, where $n'$ is the number of vertices in an optimal solution [Ghuge and Nagarajan~SODA\ 2020], and within a factor $O(\frac{1}{\epsilon^{3}}\sqrt{B})$ with a budget violation of $1+\epsilon$, for any $\epsilon \in (0, 1]$, in polynomial time [D'Angelo, Delfaraz, and Gilbert~ISAAC~2022].

In this paper, we provide two algorithms for the budgeted and quota problems on directed graphs that achieve, respectively, an $O(\frac{1}{\epsilon^2}n^{2/3}\ln{n})$-approximation at the cost of a budget violation of a factor of at most $1+\epsilon$, for any $\epsilon \in (0, 1]$, and an approximation factor of $O(n^{2/3}\ln{n})$ at the cost of a violation of the quota constraint by a factor of at most $2$. We develop a technique resorting on a standard flow-based linear programming relaxation to compute a tree with good trade-off between prize and cost, which allows us to provide polynomial time approximation algorithms for both problems. We provide the first approximation algorithms for these problems that run in polynomial time and guarantee an approximation factor depending only on the number of vertices $n$.

Complexity Framework for Forbidden Subgraphs~IV: The Steiner Forest Problem

ABSTRACT. We study {Steiner Forest} on H-subgraph-free graphs: graphs that do not contain some fixed graph H as a (not necessarily induced) subgraph. We are motivated by a recent framework that completely characterizes the complexity of many problems on H-subgraph-free graphs. However, in contrast to e.g.\ the related {Steiner Tree} problem, {Steiner Forest} falls outside this framework. Hence, the complexity of { Steiner Forest} on H-subgraph-free graphs remained tantalizingly open.

In this paper, we make significant progress towards determining the complexity of { Steiner Forest} on H-subgraph-free graphs. Our main results are four novel polynomial-time algorithms for different excluded graphs $H$ that are central to further understanding its complexity. Along the way, we study the complexity of { Steiner Forest} for graphs with a small c-deletion set, that is, a small set X of vertices such that each component of G-X has size at most c. Using this parameter, we give two noteworthy algorithms that we later employ as subroutines. First, we present a significantly faster parameterized algorithm for {Steiner Forest} parameterized
by |X| when c=1 (i.e.\ the vertex cover number).
Second, we prove that {\sc Steiner Forest} is polynomial-time solvable for graphs with a 2-deletion set of size at most 2.
The latter result is tight, as the problem is \NP-complete for graphs with a 3-deletion set of size 2.

Approximating Spanning Tree Congestion on Bounded Degree Graphs

ABSTRACT. Given a graph $G$ and a spanning tree $T$ of $G$, the congestion of an edge
$e\in E(T)$, with respect to $G$ and $T$, is the number of edges $uv$ in $G$
such that the unique path in $T$ connecting the vertices $u$ and $v$ traverses
the edge $e$. Given a connected graph $G$, the {\em spanning tree congestion}
problem is to construct a spanning tree $T$ that minimizes its maximum edge
congestion.

It is known that the problem is NP-hard, and that {\em every} spanning tree is
an $n/2$-approximation, but it is not even known whether an
$o(n)$-approximation is possible in polynomial time; by $n$ we denote the number
of vertices in the graph $G$.

We consider the problem on graphs with maximum degree bounded by
$\Delta=polylog(n)$ and describe an $o(n)$-approximation algorithm; note that
even on this restricted class of graphs the spanning tree congestion can be of
order $n\cdot polylog(n)$.

Minimizing distances between vertices and edges through tree $t$-spanners

ABSTRACT. A tree $t$-spanner of a graph $G$ is a spanning tree $T$ of $G$ in which any two adjacent vertices of $G$ have distance at most $t$ in~$T$. We say that $G$ is $t$-admissible if it admits a tree $t$-spanner, and $\sigma$ is the smallest $t$ for which $G$ is $t$-admissible. It is well-known that deciding whether $G$ has a tree $t$-spanner (the {\sc $t$-admissibility} problem) is in \P \ for $t\leq 2$, it is \NP-complete for $t\geq 4$, and it is a long open problem to decide {\sc $3$-admissibility}. {\sc Edge $t$-admissibility} is a variation of the former problem, where the goal is decide whether the line graph of $G$ contains a tree $t$-spanner $T$, indicating that adjacent edges of $G$ have distance at most $t$ in $T$.
It is known that {\sc edge $t$-admissibility} is in \P \ for $t \leq 3$, while it is \NP-complete for $t\geq 8$. We investigate the complexity of dealing with minimizing distances at same time vertices and edges. This is the {\sc Total admissibility} problem. We prove that {\sc total $t$-admissibility} is \NP-complete, even for bipartite or planar graphs. Besides, total graphs include middle-graphs and almost-total graphs as subgraphs, which satisfy the operation we define as \emph{clique-augmenting} of graphs $G$, denoted as $CA(G)$. We prove that deciding {\sc tree $t$-admissibility} for clique-augmenting graphs is \NP-complete. We also prove that graphs $CA(G)$ can be classified into two types ($\sigma(CA(G)) = \sigma(G)$ or $\sigma(CA(G)) = \sigma(G) +1$) and it is \NP-complete to decide whether $\sigma(CA(G)) = \sigma(G)$. Moreover, by showing special properties, we present several tractable cases to obtain $\sigma(CA(G))$.

Computing maximal palindromes in non-standard matching models

ABSTRACT. Palindromes are popular and important objects in textual data processing, bioinformatics, and combinatorics on words. Let S = XaY be a string where X and Y are of the same length, and a is either a single character or the empty string. Then, there exist two alternative definitions for palindromes: S is said to be a palindrome if S is equal to its reversal S^R (Reversal-based definition); or if its right-arm Y is equal to the reversal of its left-arm X^R (Symmetry-based definition). It is clear that if the "equality" (\approx) used in both definitions is exact character matching (=), then the two definitions are the same. However, if we apply other string-equality criteria \approx, including the complementary-matching model for biological sequences, the Cartesian-tree model [Park et al., TCS 2020], the parameterized model [Baker, JCSS 1996], the order-preserving model [Kim et al., TCS 2014], and the palindromic-structure model [I et al., TCS 2013], then are the reversal-based palindromes and the symmetry-based palindromes the same? To the best of our knowledge, no previous work has considered or answered this natural question. In this paper, we first provide answers to this question, and then present efficient algorithms for computing all maximal palindromes under the non-standard matching models in a given string. After confirming that Gusfield's offline suffix-tree-based algorithm for computing maximal symmetry-based palindromes can be readily extended to the aforementioned matching models, we show how to extend Manacher's online algorithm for computing maximal reversal-based palindromes in linear time for all the aforementioned matching models.

Computing Minimal Absent Words and Extended Bispecial Factors with CDAWG Space

ABSTRACT. A string w is said to be a minimal absent word (MAW) for a string S if w does not occur in S and any proper substring of w occurs in S. We focus on non-trivial MAWs which are of length at least 2. Finding such non-trivial MAWs for a given string is motivated for applications in bioinformatics and data compression. Fujishige et al. [TCS 2023] proposed a data structure of size Θ(n) that can output the set MAW(S) of all MAWs for a given string S of length n in O(n+|MAW(S)|) time, based on the directed acyclic word graph (DAWG). In this paper, we present a more space efficient data structure based on the compact DAWG (CDAWG), which can output MAW(S) in O(|MAW(S)|) time with O(e) space, where e denotes the minimum of the sizes of the CDAWGs for S and for its reversal S^R. For any strings of length n, it holds that e < 2n, and for highly repetitive strings e can be sublinear (up to logarithmic) in n. We also show that MAWs and their generalization minimal rare words have close relationships with extended bispecial factors, via the CDAWG.

Computing Longest Common Subsequence under Cartesian-Tree Matching Model

ABSTRACT. Two strings of the same length are said to Cartesian-tree match (CT-match) if their Cartesian-trees are isomorphic [Park et al., TCS 2020]. Cartesian-tree matching is a natural model that allows for capturing similarities of numerical sequences. Oizumi et al. [CPM 2022] showed that subsequence pattern matching under CT-matching model can be solved in polynomial time. This current article follows and extends this line of research: We present the first polynomial-time algorithm that finds the longest common subsequence under CT-matching of two given strings S and T of length n, in O(n^6) time and O(n^4) space for general ordered alphabets. We then show that the problem has a faster solution in the binary case, by presenting an O(n^2 / log n)-time and space algorithm.

ABSTRACT. Leaf powers and k-leaf powers have been studied for over 20 years, but there are still several aspects of this graph class that are poorly understood. One such aspect is the leaf rank of leaf powers, i.e. the smallest number k such that a graph G is a k-leaf power. Computing the leaf rank of leaf powers has proved a hard task, and furthermore, results about the asymptotic growth of the leaf rank as a function of the number of vertices in the graph have been few and far between. We present an infinite family of rooted directed path graphs that are leaf powers, and prove that they have leaf rank exponential in the number of vertices (utilizing a type of subtree model first presented by Rautenbach [Some remarks about leaf roots]). This answers an open question by Brandstädt et al. [Rooted directed path graphs are leaf powers].