CM2021: 20TH COPPER MOUNTAIN CONFERENCE ON MULTIGRID METHODS
PROGRAM FOR THURSDAY, APRIL 1ST

15:00-16:40 Session 10A: Algebraic multigrid (Part 2 of 2)
Chair:
Location: Bighorn A
15:00
Space-time block preconditioning for incompressible flow, and nonsymmetric AMG
PRESENTER: Ben Southworth

ABSTRACT. Parallel-in-time (PinT) methods introduce parallelism to the temporal dimension of numerical ODEs/PDEs to allow for the efficient use of more processors. Most often in PinT methods, the temporal dimension is treated largely separately from the spatial dimension. However, another way to pose parallel-in-time is to treat time as a spatial variable -- for example, time-dependent advection-diffusion in d dimensions can be seen as a steady advection-diffusion problem in d+1 dimensions. On single-variable advection-diffusion problems, such a "space-time" approach has been shown to achieve greater speedups over sequential time stepping, as well as speedups at lower processor counts, compared with traditional PinT methods. In this talk, we first discuss new developments in a classical AMG method for nonsymmetric problems that incorporates a simple local measure of advection versus diffusion in designing the restriction operator. The goal is a black-box-like AMG solver that is naturally robust from pure (upwinded) advection to diffusion, with a focus on time-dependent advection-diffusion problems.

Although progress has been made on space-time solvers for single-variable PDEs, very little work has been done on effective space-time solvers for time-dependent systems of PDEs. In the second part of this talk, we introduce the concept of space-time block preconditioning for the incompressible Navier-Stokes equations. Building on significant research on block preconditioning for steady PDEs and sequential time stepping, each space-time preconditioning iteration requires (approximately) solving a space-time advection-diffusion equation, followed by a set of independent spatial problems at each time step, to approximate the Schur complement. Theoretical motivation is provided for the new method, and its robustness and scalability are demonstrated on multiple model problems in incompressible flow. Overall, the new method permits fast, PinT solution of incompressible flow, given a fast and robust PinT solver for advection-diffusion.
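The reinterpretation of time as an extra, upwinded spatial dimension can be sketched in a few lines. The following toy assembly (illustrative sizes and coefficients, not the authors' code) builds the nonsymmetric "all-at-once" space-time matrix for 1D advection-diffusion with backward Euler time stepping; a space-time solver treats this as a single steady (d+1)-dimensional problem.

```python
import numpy as np
import scipy.sparse as sp

# Illustrative parameters: nx spatial points, nt time steps, step dt,
# advection speed a, diffusion nu.
nx, nt, dt, a, nu = 50, 40, 0.01, 1.0, 0.05
h = 1.0 / (nx + 1)
# 1D spatial operator: central diffusion + first-order upwind advection (a > 0)
K = (nu / h**2) * sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(nx, nx)) \
  + (a / h) * sp.diags([-1.0, 1.0], [-1, 0], shape=(nx, nx))
Ix, It = sp.identity(nx), sp.identity(nt)
# Backward Euler: u^n - u^{n-1} + dt*K u^n = dt*f^n  ->  lower-bidiagonal in time
B = sp.diags([1.0, -1.0], [0, -1], shape=(nt, nt))
# The steady "(d+1)-dimensional" operator: time difference + spatial operator
A_spacetime = sp.kron(B, Ix) + dt * sp.kron(It, K)
```

The time direction enters only through the (upwinded) bidiagonal block structure, which is why a nonsymmetric AMG method robust for upwinded advection is the natural solver here.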

15:25
A geometrically-informed algebraic multigrid preconditioned iterative approach for solving high-order finite element systems
PRESENTER: Songzhe Xu

ABSTRACT. Algebraic multigrid (AMG) is conventionally applied in a black-box fashion, agnostic of the underlying geometry. In this work, we propose that using geometric information -- when available -- to assist in setting up the AMG hierarchy is beneficial, especially for solving linear systems resulting from high-order finite element discretizations. For geometric multigrid, it is known that using p-coarsening before h-coarsening can provide better scalability, but setting up p-coarsening is not trivial in AMG. Our method, called geometrically-informed algebraic multigrid (GIAMG), requires only minimal geometric information from the user and is able to set up a grid hierarchy that includes p-coarsening on the finest levels. A major advantage of using p-coarsening with AMG -- beyond the benefits known in the context of GMG -- is the increased sparsification of coarse grid operators. We extensively evaluate GIAMG by testing on the 3D Helmholtz and incompressible Navier--Stokes operators, and demonstrate mesh-independent convergence and excellent parallel scalability. We also compare the performance of GIAMG with existing AMG packages, including hypre and ML.
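As a toy illustration of what a p-coarsening transfer looks like (not the GIAMG construction itself): because the polynomial spaces are nested, the prolongation from a p=1 to a p=2 nodal space on a single 1D element simply evaluates the coarse (linear) basis at the fine (quadratic) nodes.

```python
import numpy as np

# p = 2 nodal points on the reference element [-1, 1] (illustrative)
fine_nodes = np.array([-1.0, 0.0, 1.0])
# p = 1 basis on [-1, 1]: phi0 = (1 - x)/2, phi1 = (1 + x)/2
# Prolongation P: rows = p=2 nodes, cols = p=1 basis functions
P = np.column_stack([(1 - fine_nodes) / 2, (1 + fine_nodes) / 2])
```

Since P1 is a subspace of P2, this transfer is exact for coarse functions and reproduces constants, which is the property an AMG hierarchy with p-coarsening at the top needs to preserve.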

15:50
Matrix-Matrix based AMG interpolations and the GPU implementations
PRESENTER: Ruipeng Li

ABSTRACT. A new class of distance-two interpolation methods for algebraic multigrid (AMG) that can be formulated in terms of sparse matrix-matrix multiplications is presented and analyzed. Compared with existing distance-two prolongation operators, the proposed algorithms exhibit improved efficiency and portability to various computing platforms, since they make it easy to exploit existing high-performance sparse matrix kernels. The new interpolation methods have been implemented in hypre, a widely-used parallel multigrid solver library. With the proposed interpolations, the overall time of hypre's BoomerAMG setup can be considerably reduced, while sustaining equivalent, and sometimes improved, convergence rates. Numerical results for a variety of test problems on parallel machines equipped with GPUs are presented that demonstrate the advantages of the proposed interpolation operators.
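A toy sketch of the underlying idea (illustrative C/F splitting on a 1D Laplacian; not hypre's actual algorithm): once interpolation weights are written as products of submatrices of A, extending the stencil from distance-one to distance-two neighbours costs only one extra sparse matrix-matrix multiply (SpGEMM), which maps well to GPU kernels.

```python
import numpy as np
import scipy.sparse as sp

n = 8
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
C = np.arange(0, n, 2)                  # coarse points (illustrative splitting)
F = np.arange(1, n, 2)                  # fine points
A_FF = A[F][:, F].tocsr()
A_FC = A[F][:, C].tocsr()
Dinv = sp.diags(1.0 / A_FF.diagonal())
W1 = -(Dinv @ A_FC)                     # distance-one (direct) weights
off = A_FF - sp.diags(A_FF.diagonal())  # F-F couplings
W2 = -(Dinv @ (A_FC + off @ W1))        # distance-two: one extra SpGEMM
# Assemble P: identity at C-points, weights W2 at F-points
perm = np.argsort(np.concatenate([C, F]))
P = sp.vstack([sp.identity(len(C), format="csr"), sp.csr_matrix(W2)]).tocsr()[perm]
```

In this 1D example the F-points have no F-F couplings, so the distance-two weights coincide with the distance-one ones; on matrices with wider stencils the extra SpGEMM pulls in second-layer coarse neighbours.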

16:15
Aggregative Multiscale AMG: A robust interpolation approach for (plain) aggregative AMG based on ideas of Algebraic Multiscale

ABSTRACT. A growing range of application areas poses new challenges for algebraic multigrid (AMG) solvers. We especially consider problems with highly inhomogeneous coefficients and with an increasing density of their matrix stencils. Such applications include, for instance, complex sub-surface simulations in geo-engineering and data analysis, e.g., social network graphs. That is, we require AMG setups with rather low operator complexities but, at the same time, rather robust interpolation schemes.

The coarsening and interpolation strategy for Aggregative Multiscale AMG (AM-AMG) is a generalization of Algebraic Multiscale (AMS) from reservoir simulations [1], which relies on the coarsening process for a (uniform) structured grid, in combination with the basic principles of plain aggregative AMG [2]. The AM-AMG coarsening strategy is based on the construction of aggregates, which makes it purely algebraic. Just as in AMS, three point classes (vertices, edges and interiors) are introduced. These point classes are used to structure the coarsening process, as the aggregates have shared “borders” and, thus, are directly connected to some other aggregates (up to a maximum number). Additionally, these point classes are necessary for the interpolation calculation. The interpolation is obtained by directly solving the linear system restricted to each of the created aggregates, inspired by the physical background of AMS.
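Our reading of this local-solve interpolation, as a minimal sketch (1D Laplacian, a single aggregate; not the authors' code): the weights of an aggregate's interior points come from a direct solve of the system restricted to the aggregate, i.e., a discrete harmonic extension of the border values into the interior.

```python
import numpy as np

# 1D Laplacian on 5 nodes; one aggregate with interior {1, 2, 3} whose
# "border" points 0 and 4 carry the coarse values (all illustrative).
n = 5
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
interior, border = [1, 2, 3], [0, 4]
A_II = A[np.ix_(interior, interior)]
A_IG = A[np.ix_(interior, border)]
# Local direct solve: A_II w_I = -A_IG per border basis vector
W = -np.linalg.solve(A_II, A_IG)   # rows: interior points, cols: border DOFs
```

For the Laplacian this recovers linear interpolation between the border points, and since each aggregate is solved independently, the computation is embarrassingly parallel, which is the parallelization advantage claimed below.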

This new strategy for the interpolation significantly improves the robustness of plain aggregative AMG [3,4], especially in cases with inhomogeneous material coefficients. Yet, the method is not as robust as Ruge-Stüben AMG [5]. However, the parallelization potential is much higher than for Ruge-Stüben AMG, as the interpolation is calculated independently for each aggregate. Thus, this interpolation strategy is naturally parallelizable. In contrast, the coarsening process is more challenging to parallelize than for plain aggregative AMG due to the shared aggregate borders.

In contrast to AMG in general, the AM-AMG setup construction does not rely on a distinction between strong and weak couplings. The strength of the coupling is implicitly incorporated in the aggregate-based construction of the interpolation. This is particularly advantageous when the coupling strength is not symmetric. A further advantage of AM-AMG is the improved controllability of the hierarchy's operator complexity, i.e., its memory consumption. This advantage is based on the aggregation-inspired coarsening strategy, which allows for an adaptation of the coarsening rate. That is especially important with increasing density of the matrix stencils. Furthermore, it is possible to easily reuse parts of the setup, since the interpolation is calculated separately for each aggregate.

For symmetric M-matrices, we will embed this interpolation in the theoretical framework described in [5]. That is, we show that the AM-AMG approach fulfils the “accuracy” of interpolation (respectively, the approximation property) as explained in detail in [5].

[1] Hui, Z.: Algebraic Multiscale Finite-Volume Methods for Reservoir Simulation. Ph.D. thesis, Stanford University (2010)
[2] Ehrmann, S., Gries, S., Schweitzer, M. A.: Transition of Algebraic Multiscale to Algebraic Multigrid. In: ECMOR XVI - 16th European Conference on the Mathematics of Oil Recovery, European Association of Geoscientists & Engineers, 2018, pp. 1-18
[3] Notay, Y.: Aggregation-Based Algebraic Multilevel Preconditioning. SIAM Journal on Matrix Analysis and Applications 27(4), 998-1018 (2006)
[4] Vaněk, P., Mandel, J., Brezina, M.: Algebraic Multigrid by Smoothed Aggregation for Second and Fourth Order Elliptic Problems. Computing 56(3), 179-196 (1996)
[5] Ruge, J.W., Stüben, K.: Algebraic Multigrid. In: Multigrid Methods, pp. 73-130. SIAM (1987)

15:00-16:40 Session 10B: Multigrid and factorization methods
Location: Bighorn B
15:00
Nonnegative Unimodal Matrix Factorization

ABSTRACT. We introduce a new Nonnegative Matrix Factorization (NMF) model called Nonnegative Unimodal Matrix Factorization (NuMF), which adds to NMF the condition that the columns of the basis matrix be unimodal. NuMF finds applications, for example, in analytical chemistry. We first propose a simple brute-force heuristic based on accelerated projected gradient. It is then improved using multigrid, for which we prove that the restriction operator preserves unimodality. We also present two preliminary results regarding the uniqueness of the solution, that is, the identifiability, of NuMF. Empirical results on synthetic and real datasets confirm the effectiveness of the algorithm and illustrate the theoretical results on NuMF.
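For readers unfamiliar with the constraint, here is a small illustrative helper (not from the paper): a column w is "unimodal" if it is nonnegative and rises to a single peak, then falls. NuMF imposes this on every column of the basis matrix W in X ≈ WH.

```python
import numpy as np

def is_unimodal(w, tol=1e-12):
    """True if w is nonnegative and nondecreasing up to its peak,
    then nonincreasing afterwards."""
    w = np.asarray(w, dtype=float)
    if (w < -tol).any():
        return False
    p = int(np.argmax(w))
    rising = np.all(np.diff(w[:p + 1]) >= -tol)
    falling = np.all(np.diff(w[p:]) <= tol)
    return bool(rising and falling)

print(is_unimodal([0, 1, 3, 2, 0]))   # True: single peak
print(is_unimodal([0, 2, 1, 2, 0]))   # False: two peaks
```

The algorithmic difficulty in NuMF is not checking this condition but projecting onto the set of unimodal nonnegative vectors inside the projected gradient iterations, which is where the multigrid acceleration enters.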

15:25
New Two-level GPU-Accelerated Incomplete LU Preconditioners in hypre
PRESENTER: Tianshi Xu

ABSTRACT. This talk presents a parallel preconditioning approach based on incomplete LU (ILU) factorizations in the framework of Domain Decomposition (DD) for general sparse linear systems. We focus on distributed memory parallel architectures, specifically those equipped with graphics processing units (GPUs). In addition to block Jacobi, we present robust two-level ILU Schur complement-based approaches, where different strategies are presented to solve the coarse-level reduced system. We study the construction of multilevel ILU via interpolation and restriction, and combine these strategies with modified ILU methods in the construction of the coarse-level operator, in order to effectively remove smooth errors. We leverage available GPU-based sparse matrix kernels to accelerate the setup and the solve phases of the proposed ILU preconditioners. We evaluate the robustness and efficiency of the proposed methods as a smoother for algebraic multigrid (AMG) and as a preconditioner for Krylov subspace methods on some well-documented test examples and several more challenging problems.

15:50
A multigrid-inspired approach for the Augmented Block Cimmino Distributed solver
PRESENTER: Philippe Leleux

ABSTRACT. The Augmented Block Cimmino Distributed Solver (ABCD Solver, http://abcd.enseeiht.fr/) is a hybrid method designed to solve large sparse unsymmetric linear systems of the form Ax=b, where A is a full row rank m x n sparse matrix, x is a vector of size n and b is a vector of size m. The approach is based on the block Cimmino row projection method, which is applied to the system partitioned into row blocks. Starting from an arbitrary initial estimate, a block Cimmino iteration improves the estimated solution by summing the projections of the current iterate on the subspaces spanned by the blocks of rows, converging to a solution. The convergence of block Cimmino is known to be slow, so we instead solve the symmetric positive definite system Hx = k obtained by considering the fixed point of the iterations. To accelerate the convergence of the block Cimmino iterations, we solve this system using a stabilized block conjugate gradient (BCG). This acceleration of the block Cimmino iterations is simply called the block Cimmino method (BC). The solver also offers the possibility to construct a larger system through an augmentation of the matrix such that the numerical orthogonality between partitions is enforced. As a result, the augmented block Cimmino method converges in one single iteration, and we obtain the solution of the original system. This results in a pseudo-direct method (ABCD), where the solution depends on projections, as in BC, and on the direct solution of a condensed system involving a relatively small condensed matrix S. This matrix can be interpreted as the Schur complement of a domain decomposition method applied to the normal equations of the augmented system.
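The fixed point mentioned above can be made concrete in a few lines (a minimal sketch, not the ABCD Solver itself): partitioning A into row blocks A_i, the fixed point of the block Cimmino iteration solves Hx = k with H the sum of the orthogonal projectors onto the row spaces of the blocks. The sizes and the two-block partition below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 8
A = rng.standard_normal((m, n)) + 4 * np.eye(m)   # full-rank test matrix
x_true = rng.standard_normal(n)
b = A @ x_true

# H = sum_i A_i^+ A_i (sum of orthogonal projections, SPD for full-rank A),
# k = sum_i A_i^+ b_i, with A_i^+ the Moore-Penrose pseudoinverse.
H = np.zeros((n, n))
k = np.zeros(n)
for blk in (slice(0, 4), slice(4, 8)):
    Ai, bi = A[blk], b[blk]
    pinv = Ai.T @ np.linalg.inv(Ai @ Ai.T)        # A_i^+ for full-row-rank A_i
    H += pinv @ Ai
    k += pinv @ bi

x = np.linalg.solve(H, k)   # in ABCD this SPD solve is accelerated with block CG
```

The plain fixed-point iteration on this system is exactly the (slow) block Cimmino sweep; BCG accelerates it, and the augmentation described next makes the projections mutually orthogonal so that one iteration suffices.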

While the ABCD method can be labeled as a pseudo-direct solver, the solution of the Schur complement S with a direct solver can be expensive in terms of computation and memory, due to its possibly large size and density. For large PDE problems, we therefore propose an extension of the augmentation approach, in which we relax the strict orthogonality between blocks so as to reduce the size of the augmentation, and thus of S. The goal is better control of the memory needs and computations. To do so, we exploit ideas from the multigrid framework. We assume that we have at hand several levels of increasingly refined grids for the discretization of the problem, as well as the prolongation operator P to transfer information from a coarse level to the finest level. The original system is then augmented very similarly to the ABCD method, by enforcing the orthogonality between partitions on a coarse level, thanks to the prolongation operator. While in the exact ABCD approach the size of the augmentation can be large for highly connected subdomains, this new approach gives a way to control this size explicitly through the choice of a more or less coarse grid level. The result is an augmented system with approximate orthogonality on the fine-grid discretized operator. The convergence of the block Cimmino method on this system is no longer obtained in one iteration, since the orthogonality is only approximate. The expectation is that this new augmentation (which is actually performed on the representation of the PDE problem on the coarse grid, developed in the fine-grid finite element basis) will be enough to capture the low-frequency interactions between partitions, and the partitions will be appropriately decoupled. Thus the iterative BC method should converge quickly. Finally, additional constraints are added, similarly to the ABCD method, to recover the solution of the original system.
The central issue of this relaxed approach is the construction and solution of these additional constraints in the system, including the Schur complement matrix. We introduce a construction method based on the application of BC, with linear convergence, on canonical vectors. The downside of this method is that the obtained constraints are very dense. Alternatively, we can apply the inverse of S directly, without constructing it, using a Krylov solver with internal projections computed with the BC method. The latter approach requires the construction of a good preconditioner for the matrix S.

We apply our approach to three challenging 2D PDE problems: a Helmholtz problem, a convection-diffusion problem with recirculating wind, and a diffusion problem with jumping coefficients in the domain. We demonstrate the fast linear convergence of the BC method on all problems when applied to the matrix augmented with the relaxed augmentation. Even using a very coarse grid level for the augmentation changes the convergence from one characterized by plateaus, when BC is applied to the original system, to a linear (predictable) convergence on the augmented system. In choosing the coarse level, a trade-off must be found between very fast convergence and a large augmentation, which induces the construction and solution of a Schur complement matrix S of corresponding size.

16:15
A New Block Preconditioner for Implicit Runge-Kutta Methods for Parabolic PDE Problems
PRESENTER: Md Masud Rana

ABSTRACT. A new preconditioner based on a block $LDU$ factorization with algebraic multigrid subsolves for scalability is introduced for the large, structured systems appearing in implicit Runge--Kutta time integration of parabolic partial differential equations. This preconditioner is compared in condition number and eigenvalue distribution, and in numerical experiments with others in the literature: block Jacobi, block Gauss-Seidel, and the optimized block Gauss-Seidel method of Staff, Mardal, and Nilssen [Modeling, Identification and Control, 27 (2006), pp. 109--123]. Experiments are run on two test problems, a $2D$ heat equation and a model advection-diffusion problem, using implicit Runge--Kutta methods with two to seven stages. We find that the new preconditioner outperforms the others, with the improvement becoming more pronounced as spatial discretization is refined and as temporal order is increased.
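The block structure these preconditioners act on can be sketched as follows (illustrative, not the paper's code): for the linear test problem u' = Ju arising from a discretized heat equation, an s-stage implicit Runge-Kutta method with Butcher matrix A_rk yields one coupled stage system with s x s blocks, (I_s (x) I_n - h A_rk (x) J) k = 1_s (x) (Ju).

```python
import numpy as np

n, h = 20, 0.01
# 1D Laplacian with Dirichlet BCs (heat equation), illustrative spatial operator
J = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)) * (n + 1) ** 2
# 2-stage Gauss (order 4) Butcher matrix and weights
r3 = np.sqrt(3)
A_rk = np.array([[1 / 4, 1 / 4 - r3 / 6],
                 [1 / 4 + r3 / 6, 1 / 4]])
bw = np.array([1 / 2, 1 / 2])
s = A_rk.shape[0]
u = np.sin(np.pi * np.arange(1, n + 1) / (n + 1))   # smooth initial data

# Coupled stage system: block Jacobi/Gauss-Seidel/LDU preconditioners
# all act on this s x s block structure.
M = np.kron(np.eye(s), np.eye(n)) - h * np.kron(A_rk, J)
rhs = np.kron(np.ones(s), J @ u)
k = np.linalg.solve(M, rhs)
u_next = u + h * (bw @ k.reshape(s, n))
```

The off-diagonal blocks -h a_ij J couple the stages; the preconditioners compared in the talk differ precisely in how much of this coupling they retain.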

17:20-19:00 Session 11A: Structured and matrix-free methods (Part 2 of 2)
Location: Bighorn A
17:20
Non-invasive multigrid for semi-structured grids

ABSTRACT. In this talk, we look beyond multigrid (MG) methods for hierarchical hybrid grids (HHG) to develop a more general and less invasive approach. Methods involving HHG utilize regular refinements of a coarse unstructured mesh to inform the multigrid algorithm of structure on advanced computer architectures. We present a new mathematical framework for describing our method that allows for more complex meshes and increased flexibility. We then show the relation between this method and traditional MG, and we present numerical results to validate our new approach.

17:45
Improving the fast diagonalization method for non-separable problems with subspace correction
PRESENTER: Pablo Brubeck

ABSTRACT. For problems with smooth solutions, high-order methods have very good convergence properties, and in some cases they do not exhibit locking phenomena found in low-order methods. Moreover, due to data-locality and high arithmetic intensity, they are better suited to make efficient use of modern parallel hardware architectures. Unfortunately, the conditioning of the Galerkin matrices is severely affected by $p$, the polynomial degree of the approximation. In order to obtain practical iterative solvers, we require good preconditioners.

As relaxation, we employ a novel combination of an approximate fast diagonalization method and subspace correction. The scheme is essentially point-block Jacobi in the space of eigenfunctions of a separable approximation to the local stiffness matrix of each cell. The relaxation depends on the tensor-product structure of quadrilateral and hexahedral elements, in a similar manner to sum factorization. We apply this relaxation within a hybrid additive Schwarz / $p$-multigrid method with optimizable penalization of the facet degrees of freedom. This yields a contractive smoother which also preserves the symmetry of the problem, unlike the more standard weighted approaches.
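A toy sketch of the fast diagonalization idea on an exactly separable 2D operator A2 = A (x) I + I (x) A (the talk's point is to use it as a relaxation when the local stiffness matrix is only approximately separable; sizes are illustrative): diagonalizing the small 1D matrix once lets us apply the inverse with only dense 1D transforms, i.e., point-block Jacobi in the eigenfunction basis.

```python
import numpy as np

p = 7                                                   # illustrative 1D size
A = 2 * np.eye(p) - np.eye(p, k=1) - np.eye(p, k=-1)    # 1D model stiffness
lam, V = np.linalg.eigh(A)                              # A = V diag(lam) V^T

def fdm_solve(F):
    """Solve A U + U A = F, i.e. (A (x) I + I (x) A) vec(U) = vec(F)."""
    G = V.T @ F @ V                      # transform to the eigenbasis
    G = G / (lam[:, None] + lam[None, :])  # pointwise (diagonal) solve
    return V @ G @ V.T                   # transform back

rng = np.random.default_rng(1)
F = rng.standard_normal((p, p))
U = fdm_solve(F)
```

The cost is a one-time O(p^3) eigendecomposition plus O(p^3) per application in 2D (O(p^{d+1}) in d dimensions), matching the complexities quoted below.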

We employ this relaxation method in two algorithms: a $p$-MG preconditioner and a full approximation scheme nonlinear solver. We demonstrate how to combine these two efficiently in a nested iteration with a cascadic outer cycle and inner V-cycles. Other solvers, such as geometric and algebraic multigrid, may be employed for the $p$-coarse level. The associated computational costs are $O(p^d)$ to approximate the local stiffness matrix in $d$ dimensions, $O(p^{d+1})$ to apply or update the relaxation, while memory requirements are kept at $O(p^d)$. We present nonlinear examples such as the $p$-Laplacian and incompressible hyperelasticity.

18:10
A Space-Time Multigrid Preconditioner for CFD

ABSTRACT. The goal of our research is the construction of efficient Jacobian-free preconditioners for high order Discontinuous Galerkin (DG) discretizations with implicit time integration. We are motivated by three-dimensional unsteady compressible flow applications, which often result in large stiff systems. Implicit time integrators avoid restrictive CFL conditions but require the solution of huge nonlinear systems. Space-time discontinuous Galerkin methods provide such implicit solvers and are of variable order in all spatial directions and in time, reducing to an implicit Euler time stepping method in the case of a first order DG discretization in time.

We propose using a preconditioned Jacobian-free Newton-Krylov method and construct the preconditioner using multigrid (MG) methods, which we base on a lower order replacement operator (Birken, Gassner, Versbach 2019), i.e., a first order Finite Volume (FV) discretization defined on a subcell grid. To do so, the nodal DOFs from the DG method need to be reinterpreted as DOFs for the FV method. This ansatz allows us to base the construction of the preconditioner on the FV Jacobian, which is a relatively cheap approximation of the DG Jacobian. Moreover, we are able to use available knowledge about fast multigrid methods for FV discretizations.

In this talk we present the idea of constructing FV based MG preconditioners for space-time DG solvers, and suggest different transfer options for the DOFs in the DG and the FV operator. So far, the MG preconditioner has been constructed and tested for implicit Euler time stepping, and we show numerical results for this case. In order to extend the preconditioning idea to space-time problems with higher-order temporal discretizations, we present a Local Fourier Analysis (LFA) for a new space-time multigrid algorithm for solving an advection model problem, similar to (Gander, Neumüller 2016). We show numerical results for this analysis, obtained with the Distributed and Unified Numerics Environment (DUNE).

18:35
Optimization of two-level methods for DG discretizations of reaction-diffusion equations

ABSTRACT. We analyze and optimize two-level methods applied to a symmetric interior penalty discontinuous Galerkin finite element discretization of a singularly perturbed reaction-diffusion equation. Previous analyses of such methods have been performed numerically by Hemker et al. for the Poisson problem. Our main innovation is that we obtain explicit formulas for the optimal relaxation parameter of the two-level method for the Poisson problem in 1D, and very accurate closed form approximation formulas for the optimal choice in the reaction-diffusion case in all regimes. Our Local Fourier Analysis, which we perform at the matrix level to make it more accessible to the linear algebra community, shows that for DG penalization parameter values used in practice, it is better to use cell block-Jacobi smoothers of Schwarz type, in contrast to earlier results suggesting that point block-Jacobi smoothers are preferable, based on a smoothing analysis alone. Our analysis also reveals how the performance of the iterative solver depends on the DG penalization parameter, and what value should be chosen to get the fastest iterative solver, providing a new, direct link between DG discretization and iterative solver performance. We illustrate our analysis with numerical experiments and comparisons in higher dimensions and different geometries.

17:20-19:00 Session 11B: Multigrid and Schwarz methods
Location: Bighorn B
17:20
Additive Schwarz methods for serendipity elements

ABSTRACT. While solving Partial Differential Equations (PDEs) with the finite element method (FEM), serendipity elements allow us to obtain the same order of accuracy as rectangular tensor-product elements with many fewer degrees of freedom (DOFs). To realize the possible computational savings, we develop additive Schwarz methods (ASM) for the p-version FEM. Adapting arguments from Pavarino for the tensor-product case, we prove that patch smoothers give conditioning estimates independent of the polynomial degree for a model problem. We also combine this with a low-order global operator to give an optimal two-grid method, with conditioning estimates independent of the mesh size and polynomial degree. The theory holds for serendipity elements in two and three dimensions, and can be extended to full multigrid algorithms. Numerical experiments using Firedrake and PETSc confirm this theory and demonstrate efficiency relative to standard elements.

17:45
Scalable Chebyshev-Accelerated Schwarz Preconditioning for GPUs
PRESENTER: Malachi Phillips

ABSTRACT. The pressure solve resulting from the spectral element discretization of the incompressible Navier-Stokes equation requires fast, robust, and scalable preconditioning. In the current work, a novel Chebyshev-accelerated Schwarz preconditioning scheme is introduced, targeting GPU architectures. Convergence properties of Chebyshev-accelerated Jacobi, Schwarz, and Chebyshev-accelerated Schwarz schemes are compared. Performance and scalability of the various preconditioning strategies are presented, indicating that the Chebyshev-accelerated Schwarz scheme provides better convergence and performance than the other preconditioning schemes.
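A minimal sketch of the Chebyshev acceleration ingredient, applied here to a Jacobi smoother (the same three-term recurrence wraps a Schwarz sweep by replacing the diagonal solve with a subdomain solve; matrix, bounds, and degree below are illustrative, not the authors' implementation):

```python
import numpy as np

def chebyshev_jacobi(A, b, x, degree, lmin, lmax):
    """Chebyshev-accelerated Jacobi; lmin/lmax bound the eigenvalues of
    D^{-1} A that the smoother should damp."""
    d = A.diagonal()
    theta = 0.5 * (lmax + lmin)
    delta = 0.5 * (lmax - lmin)
    sigma = theta / delta
    rho = 1.0 / sigma
    p = (b - A @ x) / d / theta          # first (preconditioned) direction
    for k in range(degree):
        x = x + p
        if k == degree - 1:
            break
        r = (b - A @ x) / d              # preconditioned residual
        rho_new = 1.0 / (2.0 * sigma - rho)
        p = rho_new * rho * p + (2.0 * rho_new / delta) * r
        rho = rho_new
    return x

# Demo: 1D Laplacian; eigenvalues of D^{-1} A lie in [1 - c, 1 + c]
n = 30
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
c = np.cos(np.pi / (n + 1))
rng = np.random.default_rng(2)
x_true = rng.standard_normal(n)
x = chebyshev_jacobi(A, A @ x_true, np.zeros(n), 100, 1 - c, 1 + c)
```

As a smoother one would target only the upper part of the spectrum (e.g., [lmax/4, lmax]) with a small degree; the demo uses the full spectral interval so that the accelerated iteration converges as a solver.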

18:10
Adaptive coarse spaces for Schwarz methods based on decompositions of the domain decomposition interface
PRESENTER: Jascha Knepper

ABSTRACT. We propose robust coarse spaces for two-level overlapping Schwarz preconditioners which are based on decompositions of the domain decomposition interface. These coarse spaces can be seen as extensions of energy-minimizing coarse spaces of GDSW (Generalized–Dryja–Smith–Widlund) type. The resulting two-level methods are robust for second-order elliptic problems, even in the presence of highly heterogeneous coefficient functions, and reduce to standard GDSW type algorithms if no additional coarse basis functions are used. Numerical results are presented for heterogeneous linear elasticity model problems in three dimensions.

18:35
Three-level Fast and Robust Overlapping Schwarz (FROSch) preconditioners

ABSTRACT. The GDSW preconditioner is a two-level overlapping Schwarz preconditioner with an energy-minimizing coarse space, which can be constructed in an algebraic fashion from the fully assembled stiffness matrix. A parallel implementation of the GDSW preconditioner, as well as an implementation of a reduced dimensional coarse space (RGDSW), are contained in the Fast and Robust Overlapping Schwarz framework of the Trilinos software library. To improve the parallel scalability, a three-level extension has recently been introduced to the framework by recursively applying the GDSW/RGDSW preconditioner to the coarse problem. Results, obtained on up to 85,000 cores of the SuperMUC-NG supercomputer, are presented, showing the extended parallel scalability.

19:00-19:15 Break