previous day
next day
all days

View: session overviewtalk overview

07:30-08:30Breakfast Buffet
08:00-10:05 Session 7A: Parallel time integration (Part 2 of 3)
Location: Bighorn C
On selecting coarse-grid operators for Parareal and MGRIT applied to linear advection

ABSTRACT. We consider the parallel time integration of the linear advection equation with the Parareal and two-level multigrid-reduction-in-time (MGRIT) algorithms. Our aim is to develop a better understanding of the convergence behaviour of these algorithms for this problem, which is known to be poor relative to the diffusion equation, its model parabolic counterpart. Using Fourier analysis, we derive new convergence estimates for these algorithms which, in conjunction with existing convergence theory, provide insight into the origins of this poor performance. We then use this theory to explore improved coarse-grid time-stepping operators. For several high-order discretizations of the advection equation, we demonstrate that there exist non-standard coarse-grid time stepping operators that yield significant improvements over the standard choice of rediscretization.

Convergence analysis for parallel-in-time solution of hyperbolic systems

ABSTRACT. Hardware trends and scaling limits have driven the development of algorithms that allow space-time parallelism. These methods consider the solution of time-dependent systems of partial differential equations (PDEs), and allow simultaneous solution across multiple time steps, in contrast to classical time-stepping approaches considering the sequential solution of one time step after the other. While this is not a new idea, the interest in developing new space-time and time-parallel approaches for a variety of PDEs has grown in recent years. With such intense effort in the development of new schemes and with the observation that convergence of these methods is poor for hyperbolic problems relative to parabolic-type PDEs, there is a pressing need for complementary analysis tools, to provide understanding of the relative performance and to inform the optimization of algorithmic parameters as schemes are adapted to new problems.

In this talk, we compare and contrast three such analysis schemes for parallel-in-time algorithms of the Parareal and multigrid-reduction-in-time (MGRIT) methodologies. These three mode analysis tools differ, in particular, in the treatment of the time dimension: (1) space-time local Fourier analysis uses a Fourier ansatz in space and time, (2) semi-algebraic mode analysis couples standard local Fourier analysis approaches in space with algebraic computation in time, and (3) in a two-level reduction-based theory, error propagation is considered only on the coarse time grid. We compare and contrast their predictions for two specific hyperbolic model problems, linear advection in one dimension and incompressible linear elasticity in two dimensions.

Hyperbolic PDEs and Parallel-In-Time: A Revival in Reduction Methods

ABSTRACT. Recently, there has been renewed interest in reduction-based multigrid methods. The multigrid reduction in time algorithm (MGRIT) is a geometric-type multigrid algorithm to add parallelism to the inherently sequential process of time integration, and reduction-based algebraic multigrid (AMG) based on an approximate ideal restriction (AIR) is a new AMG method that has proven highly effective on upwind discretizations of linear hyperbolic PDEs. This talk introduces new norm-based convergence theory for both algorithms.

Exact bounds for two-grid error convergence of MGRIT and Parareal are developed in the l^2- and A*A-norms. In particular, this theory provides insight on why hyperbolic PDEs, particularly using explicit time integration, are difficult for MGRIT and Parareal. It further motivates a rediscretization of the coarse-grid time-propagation operator, and numerical results offer exciting new potential for hyperbolic parallel-in-time (PIT) algorithms. For AIR, theory demonstrates why AIR is effective on certain highly nonsymmetric (hyperbolic-type) matrices, which other solvers struggle with. Numerical results complement the theory, and AIR is then shown as an alternative, fully algebraic PIT method, able to achieve fast convergence on full space-time discretizations of linear hyperbolic PDEs.

Computation of Gaussian quadrature rules using a multi-level Parareal-like parallel approach applied to the Stieltjes procedure
SPEAKER: Thibaut Lunet

ABSTRACT. Gaussian quadrature rules are nowadays a powerful tool to compute integrals with particular measures that arise in many scientific applications (e.g radiation transfer for the Earth’s atmosphere). To a given measure, one associates a sequence of orthogonal polynomials, uniquely determined by a set of three-term recurrence coefficients (a_k, b_k), k in N, from which one retrieves the quadrature nodes and weights using standard linear algebra methods. For some measures, the associated orthogonal polynomials are well known and studied (e.g Lagrange, Chebyshev, Laguerre, Hermite). However, often non-classical measure arise in applications, and therefore one needs to compute the associated unknown coefficients (a_k, b_k).

To do so, W. Gautschi developed the "discretization method" [W. Gautschi, 1991], that approximates the scalar product associated to the unknown measure, and uses this approximation to retrieve the three-term recurrence coefficients. The Stieltjes procedure is a well-known straightforward approach which allows to compute the (a_k, b_k) coefficients from any approximated scalar product. But this approach also suffers from severe numerical instabilities when used with finite precision arithmetic.

In this talk, we present a new approach to perform the Stieltjes procedure based on Parareal [Lions et al., 2001], a well known algorithm allowing to compute the solution of time-dependent problems while inducing parallelism in the time dimension. It was shown in [Gander and Vanderwalle, 2007] that Parareal can be interpreted as a Newton method using a Jacobian matrix approximated on a coarse time grid. We apply the same idea to the Stieltjes procedure, where a two level grid is build for the (a_k, b_k) coefficients and/or the approximate scalar product. We illustrate this new approach (stability, induced parallelism) using numerical experiments.

Analysis of an overlapping variant of parareal
SPEAKER: Felix Kwok

ABSTRACT. In this talk, we introduce a variant of the parareal algorithm in which there is overlap between the coarse subintervals. We show in the linear case that this method is equivalent to the MGRIT algorithm with the F(CF)^\nu smoother, where \nu determines the size of the overlap. We give a proof of general superlinear convergence of the resulting method, and explain under which conditions this overlapping variant leads to a faster algorithm.

08:00-10:05 Session 7B: Multigrid and discretization
Location: Bighorn B
Adaptive FEM for the fractional Laplacian: a priori and a posteriori error estimates, efficient implementation and multigrid solver

ABSTRACT. We explore the connection between fractional order partial differential equations in two or more spatial dimensions with boundary integral operators to develop techniques that enable one to efficiently tackle the integral fractional Laplacian. We develop all of the components needed to construct an adaptive finite element code that can be used to approximate fractional partial differential equations, on non-trivial domains in \(d\geq 1\) dimensions. Our main approach consists of taking tools that have been shown to be effective for adaptive boundary element methods and, where necessary, modifying them so that they can be applied to the fractional PDE case. Improved a priori error estimates are derived for the case of quasi-uniform meshes which are seen to deliver sub-optimal rates of convergence owing to the presence of singularities. Attention is then turned to the development of an a posteriori error estimate and error indicators which are suitable for driving an adaptive refinement procedure. We assume that the resulting refined meshes are locally quasi-uniform and develop efficient methods for the assembly of the resulting linear algebraic systems and their solution using multigrid. The storage of the dense matrices along with efficient techniques for computing the dense matrix vector products needed for the iterative solution is also considered. Importantly, the approximation does not make any strong assumptions on the shape of the underlying domain and does not rely on any special structure of the matrix that could be exploited by fast transforms. The performance and efficiency of the resulting algorithm is illustrated for a variety of examples.

p-multigrid based solvers for Isogeometric Analysis
SPEAKER: Roel Tielen

ABSTRACT. Over the years, Isogeometric Analysis (IgA) [1] has shown to be a successful alternative to the Finite Element Method, both in academia and industrial applications. However, solving the resulting linear systems efficiently remains a challenging task. For instance, the condition number of the stiffness matrix scales quadratically with the mesh width h, but, in contrast to standard Finite Elements, exponentially with the order of the approximation p [2]. The performance of (standard)iterative solvers thus decreases fast for larger values of p.

In this talk we present a solution strategy for IgA discretizations that is based on p-multigrid techniques used both as a solver and as a preconditioner in an outer Krylov subspace iteration method. In contrast to (geometric) h-multigrid methods, where a hierarchy of coarser and finer meshes is constructed, the approach makes use of a hierarchy of B-spline based discretizations of different approximation orders. The ‘coarse grid’ correction is determined at level p = 1, which enables us to use well established solution techniques developed for low-order Lagrange finite elements. Since both prolongation and restriction operators are defined as mappings between arbitrary spline spaces, combined coarsening in both h and p is possible, leading to a flexible hp-multigrid.

We present numerical results obtained with both p- and hp-multigrid methods for different benchmark problems on non-trivial geometries. It follows from a spectral analysis, that the coarse grid correction and the smoothing procedure complement each other quite well for both methods. In particular we investigate the performance of a non-standard smoother based on a ILUT factorization leading to convergence rates independent of both h and p. Furthermore, the application of p-multigrid within a heterogeneous HPC framework is discussed.

REFERENCES [1] T.J.R. Hughes, J.A. Cottrell and Y. Bazilevs, ”Isogeometric Analysis: CAD, Finite Elements,NURBS, Exact Geometry and Mesh Refinement”, Computer Methods in Applied Mechanics and Engineering, 194, 4135 − 4195, 2005 [2] K.P.S. Gahalaut, J.K. Kraus, S.K. Tomar. ”Multigrid methods for isogeometric discretization”, Computer Methods in Applied Mechanics and Engineering, 253, 413 − 425, 2013

A Multilevel Approach for Trace System in HDG Discretizations

ABSTRACT. We propose a multilevel approach for trace system resulting from hybridized discontinuous Galerkin (HDG) methods. The key is to blend ideas from nested dissection, domain decomposition, and high-order characteristic of HDG discretizations. Specifically, we first create a coarse solver by eliminating and/or limiting the front growth in nested dissection. This is accomplished by projecting the trace data into a sequence of same or high-order polynomials on a set of increasingly $h-$coarser edges/faces. We then combine the coarse solver with a block-Jacobi fine scale solver to form a two-level solver/preconditioner. Numerical experiments indicate that the performance of the resulting two-level solver/preconditioner depends only on the smoothness of the solution and is independent of the nature of the PDE under consideration. While the proposed algorithms are developed within the HDG framework, they are applicable to other hybrid(ized) high-order finite element methods. Moreover, we show that our multilevel algorithms can be interpreted as a multigrid method with specific intergrid transfer and smoothing operators. With several numerical examples from Poisson, pure transport, and convection-diffusion equations we demonstrate the robustness and scalability of the algorithms.

Nonlinear multigrid solvers for finite volume discretization of subsurface flow

ABSTRACT. We present nonlinear multigrid solvers using full approximation scheme (FAS) for some subsurface flow problems discretized by the finite volume method. One difficulty is how to build coarse spaces with approximation properties. For finite element methods, approximation properties can ensured by including appropriate polynomials in the approximation space. However, this option is not available for finite volume discretizations. Our approach is to construct basis functions for the coarse spaces based on spectral decomposition of the local discrete operator. The performance of the proposed nonlinear multigrid solver will be presented.

10:05-10:25Coffee and Tea Break
10:25-12:30 Session 8A: Multilevel methods for stochastic problems
Location: Bighorn C
Advanced Multi-Level Sampling Methods for Large-Scale Bayesian Inverse Problems
SPEAKER: Tiangang Cui

ABSTRACT. Exploring the posterior distribution in Bayesian inverse problems can quickly exceed available computationally resources if each forward-model solve is computationally demanding. In many situations, however, there is not only the expensive full forward model available. Rather, there exists a hierarchy of models that describe the same phenomenon as the full model but with varying computational costs and accuracies. For instance, approximations obtained by the grid coarsening. We present some recent advances on accelerating Markov chain Monte Carlo based and non-Markov chain based samplers for solving Bayesian inverse problems by exploiting the structure of available model hierarchies.

A hierarchical multilevel Markov chain Monte Carlo approach for large-scale Bayesian inference

ABSTRACT. Markov chain Monte Carlo (MCMC), while a standard approach for nonlinear Bayesian inference of high-dimensional problems, is not feasible for large-scale applications. With the increase in problem size and thus simulation cost, MCMC becomes intractable, and acceleration approaches become necessary. Of particular interest within the scope of acceleration methods is the recently developed hierarchical multilevel MCMC approach in Dodwell et al. 2015 based on the Karhunen-Loeve expansion (KLE) that performs MCMC on the coarse grid problem and adds MCMC estimates of multilevel correction terms to account for the error between solutions on adjacent grids. While it has been shown to reduce the cost in comparison to standard MCMC for model 2D problems, a drawback is that the KLE-based hierarchy is not well-suited (and not even feasible) for 3D problems, as the KLE-based hierarchical sampling approach does not scale with the size of the problem. In this work, we explore a new PDE-based hierarchical sampling formulation based on algebraic multigrid and apply it to this existing multilevel MCMC framework. In particular, we form a scalable hierarchical sampling method for a 3D subsurface flow example, and investigate the cost reduction and scaling of this approach in comparison to standard MCMC.

T.J. Dodwell, C. Ketelsen, R. Scheichl, and A.L. Teckentrup. A hierarchical multilevel Markov chain Monte Carlo algorithm with applications to uncertainty quantification in subsurface flow. SIAM/ASA Journal on Uncertainty Quantification, 3(1):1075–1108, 2015.

Multilevel Monte Carlo methods for statistical solutions of hyperbolic conservation laws
SPEAKER: Kjetil Lye

ABSTRACT. An open question in the field of hyperbolic conservation laws is the question of well-posedness. Recent theoretical and numerical evidence have indicated that multidimensional systems of hyperbolic conservation laws exhibit random behavior, even with deterministic initial data. We use the framework of statistical solutions to model this inherit randomness. We review the theory of statistical solutions for conservation laws.

Afterwards, we introduce a convergent numerical method for computing the statistical solution of conservation laws, and prove that it converges in the Wasserstein distance through narrow convergence for the case of scalar conservation laws. We furthermore show that employing a multilevel Monte Carlo algorithm can lead to a significant speed up in computational time for scalar hyperbolic conservation laws.

For the scalar case, we validate our theory by computing the structure functions of the Burgers’ equation with random initial data. We especially focus on Brownian initial data, and the measurement of the scalings of the structure functions. The results agree well with the theory, and we get the expected convergence rate. We furthermore show that we can get faster computations using Multilevel Monte-Carlo for computing the statistical solutions of scalar conservation laws.

In the case of systems of equations in multiple space dimensions, we test our theory against the compressible Euler equations in two space dimensions. We furthermore show examples where multilevel Monte Carlo yields a speed up, and cases where it does not work for computing statistical solutions.

An asymptotic preserving multilevel Monte Carlo method for particle based simulation of kinetic equations
SPEAKER: Emil Loevbak

ABSTRACT. An asymptotic preserving multilevel Monte Carlo method for particle based simulation of kinetic equations

Applications in particle simulation often suffer from the effects of a time-scale separation and stiffness. One has to perform simulations with a small time step in order to track the system's fast dynamics, but one also has to simulate over a long time-horizon in order to capture the system's slow dynamics. This time-scale separation is commonly characterized by means of a scaling parameter related to the mean free particle path, for example in hyperbolic particle-based transport equations. In the limit of a vanishing scaling parameter, the hyperbolic transport equation converges to a parabolic diffusion equation. However, simulations of the transport equation in this region suffer from extreme time step restriction constraints, and this because of stability reasons.

In [1], a new Monte Carlo scheme was developed that converges in the diffusive limit, and avoids these stability constraints. This so-called asymptotic preserving Monte Carlo methods achieves its favorable convergence properties by introducing a certain linear and vanishing modelling error in the time step size. In the present work, we apply the multilevel Monte Carlo method [2] to this asymptotic preserving scheme. As such, the precise, but expensive, Monte Carlo simulation is replaced with a cheap, but biased, coarse simulation followed by a sequence of corrections that remove the bias. We will demonstrate that this multilevel method is able to accelerate the original scheme by one or two orders of magnitude. We also show that the bias corrections and variances in this scheme have a very different structure from what is typically observed in other multilevel Monte Carlo applications. In this talk some analytical results will be showcased for a two-speed equation, as well as an experimental verification thereof. An extension of the multilevel asymptotic preserving Monte Carlo method to more general transport equations will also be presented.


[1] G. Dimarco, L. Pareschi and G. Samaey. Asymptotic-Preserving Monte Carlo methods for transport equations in the diffusive limit. SIAM Journal on Scientific Computing, 40: A504–A528, 2018. [2] M.B. Giles Multilevel Monte Carlo methods. Acta Numerica, 24: 259–328, 2015.

MG/OPT and multilevel Monte Carlo for robust PDE constrained optimization

ABSTRACT. We present an algorithm based on the MG/OPT framework (a multigrid optimization framework) to solve optimization problems constrained by PDEs with uncertain coefficients. For stochastic problems, the relevant quantities, such as the gradient and Hessian, contain expected value operators. In order to be able to deal with high dimensional or irregular stochastic spaces, these are evaluated using a multilevel Monte Carlo (MLMC) method. Each of the MG/OPT levels then contains multiple underlying MLMC levels. A finer MG/OPT level still corresponds to a finer discretization level, since the gradient is returned on a finer discretization level. The MG/OPT hierarchy allows the algorithm to exploit the structure inherent in the PDE, speeding up the convergence to the optimum (regardless of the problem being deterministic or stochastic). In contrast, the MLMC hierarchy exists to exploit structure present in the stochastic dimensions of the problem. One can observe a large reduction in the number of samples required on expensive levels, and therefore in computational time.

10:25-12:30 Session 8B: Multilevel methods
Location: Bighorn B
Rounding error analysis for mixed precision multigrid solvers

ABSTRACT. This talk is motivated by a simple, yet relatively unexplored question: Do multigrid methods suffer from ill-conditioning ? To answer this question, we present a normwise forward error analysis as well as numerical examples confirming that multigrid methods do indeed suffer from ill-conditioning. Using classic ideas based on iterative refinement, we show how to alleviate this problem by using mixed-precision arithmetic. Multigrid methods have previously been used as preconditioners in mixed-precision solvers, but without theoretical proofs of convergence. This talk establishes a theoretical framework with rigorous proofs for a mixed-precision version of multigrid for solving the algebraic equations that arise from discretizing linear elliptic partial differential equations (PDEs). These PDEs lead to equations whose matrices are sparse and symmetric positive definite, which enables the use of the so-called energy or A norm to establish convergence and error estimates. Concretely, it allows us to develop and analyze bounds on the convergence behavior of multigrid as a function of the matrix condition number. Both theoretical and numerical results are developed confirming that convergence to the level of discretization accuracy can be achieved by mixed-precision versions of V-cycles and full multigrid (FMG) even when the condition number times unit roundoff is at or above one. Our framework is inspired by the results of E. Carson and N. J. Higham in "A New Analysis of Iterative Refinement and Its Application to Accurate Solution of Ill-Conditioned Sparse Linear Systems", but ultimately provides tighter bounds for many PDEs.

Engineering multilevel support vector machines

ABSTRACT. The computational complexity of solving nonlinear support vector machine (SVM) is prohibitive on large-scale data. In particular, this issue becomes very sensitive when the data represents additional difficulties such as highly imbalanced class sizes. Typically, nonlinear kernels produce significantly higher classification quality to linear kernels but introduce extra kernel and model parameters which requires computationally expensive fitting. This increases the quality but also reduces the performance dramatically. In this talk we will introduce several versions of fast multilevel methods for regular and weighted SVM and discuss some of its algorithmic components including strict and weighted aggregation (inspired by AMG) coarsening. The multilevel method opens several attractive ways that accelerate SVM solvers including trade-off between balanced and imbalanced data, control over performance measures, and fine validation of aggregated training. Our framework is implemented using PETSc which allows an easy integration with scientific computing tasks. The experimental results demonstrate significant speed up compared to the state-of-the-art nonlinear SVM libraries. Our source code, documentation and parameters are available at . The talk is based on the preprint Ehsan Sadrfaridpour, Talayeh Razzaghi, Ilya Safro "Engineering fast multilevel support vector machines", 2018,

A blocked adaptive cross approximation algorithm and its hierarchical generalization

ABSTRACT. This talk presents a hierarchical low-rank decomposition algorithm assuming any matrix element can be rapidly computed. The proposed algorithm computes rank-revealing decompositions of sub-matrices with a blocked adaptive cross approximation (BACA) algorithm, followed by a hierarchical merge operation via truncated singular value decompositions (H-BACA). The proposed algorithm significantly improves the convergence of the baseline ACA algorithm and achieves low computational complexity.

Performance of the Scheduled Relaxation Jacobi method in a geometric multilevel setting

ABSTRACT. The Scheduled Relaxation Jacobi (SRJ) method has been recently shown to lead to a substantial acceleration of the standard Jacobi relaxation in single-level systems [1]. The core principle inspiring this method is a dynamical tuning of the overrelaxation parameter so as to achieve an optimal path to convergence. The performance of overrelaxation in a multigrid setting, however, can be nontrivial to establish in general. In this talk, I will report on the ongoing progress in the application of SRJ to a nonlinear, geometric multilevel solver [2] based on the Cactus framework [3].

[1] X.I.Yang & R.Mittal, Acceleration of the Jacobi iterative method by factors exceeding 100 using Scheduled Relaxation, Journal of Computational Physics 274, 695 – 708 (2014) [2] E. Bentivegna, Solving the Einstein constraints in periodic spaces with a multigrid approach, Classical and Quantum Gravity 31, 035004 (2014) [3] T. Goodale et al, The Cactus Framework and Toolkit: Design and Applications, In Vector and Parallel Processing -- VECPAR'2002, 5th International Conference, Lecture Notes in Computer Science (2003)

Local Fourier Analysis and Robust Optimization
SPEAKER: Yunhui He

ABSTRACT. Local Fourier analysis (LFA) is a useful analysis tool to predict and analyze actual performance of multigrid methods and domain decomposition algorithms. A major use of local Fourier analysis is to optimize algorithmic parameters, such as relaxation weights, to achieve good convergence properties. This can be posed as a ``minimax'' optimization problem, which is nonconvex and nonsmooth. Analytical solution is possible for simple minimax problems in LFA. However, there are often many algorithmic parameters to be determined to obtain efficient algorithms, resulting in minimax problems for which analytical solution and brute-force sampling are impractical. In this talk, we consider robust optimization algorithms to solve these minimax problems, focusing on more efficient methods than brute-force sampling, especially those that utilize gradients. Different examples, with known and unknown analytical solutions, are presented to show the effectiveness of our approach.

12:30-16:00Lunch Break
16:00-16:30Coffee and Tea Break
16:30-18:35 Session 9A: Parallel algebraic multigrid software & performance
Location: Bighorn C
The Analysis of SA-AMG Method by Applying Hybrid MPI/OpenMP Parallelization on Cluster Supercomputer System
SPEAKER: Naoya Nomura

ABSTRACT. A smoothed aggregation algebraic multigrid (SA-AMG) method is among the fastest and scalable solver for large-scale linear equation Ax = b. In this talk, we demonstrate the convergence and parallel performance of our SA-AMG library by applying Hybrid MPI/OpenMP parallelization on cluster computing system. SA-AMG was proposed by Petr Vanek et al. in 1992. SA-AMG achieves good convergence and scalability by damping various wavelength components efficiently. To damp these components, SA-AMG creates multi-level matrices with arbitrary policies. The created matrices are hierarchically smaller than the dimension of the original coefficient matrix. By using these multi-level matrices, error components are damped efficiently. To parallelize the SA-AMG method, we applied an OpenMP/MPI hybrid parallelization with a domain decomposition technique. Target applications of our study are large-scale and have geometrical information. On these applications, the hybrid parallelization and the domain decomposition technique are suitable. In this talk, we demonstrate numerical evaluations using several cluster supercomputer systems.

Multi-step Node-Aware Communication in Parallel AMG
SPEAKER: Amanda Bienz

ABSTRACT. Parallel algebraic multigrid (AMG) lacks scalability due to large costs associated with MPI point-to-point communication. At large scales, communication dominates the cost of both the initial construction of the multi-grid hierarchy as well as the iterative solve phase. The main source of MPI costs throughout AMG comes from the communication of sparse matrices and their associated vectors. This cost can be greatly reduced by limiting the number and size of messages injected into the network, trading inter-node messages for less costly intra-node.

This talk investigates multiple methods of node-aware communication and analyzes the effects of each on the communication of sparse matrices as well as their associated vectors. The node-aware communication is applied to various methods of both the setup and solve phases of AMG. Furthermore, the effects on both Ruge-Stuben and Smoothed Aggregation solvers are analyzed.

Fast distributed triple-matrix multiplication on GPUs in AMG setup
SPEAKER: Ruipeng Li

ABSTRACT. Modern many-core processors such as graphics processing units (GPUs) are becoming an integral part of many high performance computing systems nowadays. These processors yield enormous raw processing power in the form of massive SIMD parallelism. Accelerating algebraic multigrid (AMG) methods on GPUs has drawn a lot of research attention in recent years. For instance, in the current release of HYPRE's BoomerAMG package, the solve phase has been ported to GPUs, whereas the setup can still be computed on CPUs only. In this talk, we will present our work on accelerating BoomerAMG's setup on multiple GPUs. In particular, we focus on the computation of distributed triple-matrix multiplications (i.e., the Galerkin product), which often represents a significant cost of the entire setup phase of AMG. We will discuss in detail the composing parts of this computation that include several fast GPU sparse matrix kernels and communications between GPUs. The recent results as well as future work will also be included.

Improving Scalability of Algebraic Multigrid by a Specialized MATVEC and Divide and Conquer MatMat Product
SPEAKER: Majid Rasouli

ABSTRACT. Algebraic Multigrid (AMG) is a popular linear system solver and preconditioner for large sparse linear systems, especially the ones obtained from the discretization of elliptic operators. The large-scale performance and scalability of AMG depend on two linear algebra primitives, matrix-vector product (MATVEC) and matrix-matrix product (MATMULT). MATVEC is the dominant operation in the solve phase of AMG and an important one in the setup phase. For large linear systems, the communication during MATVEC becomes significant and is usually the bottleneck for scalability. We propose four ways to reduce the cost of MATVEC.

A key bottleneck for the scalability of the setup phase of AMG is the MATMULT required for computing the sparse grid representation. While significant work has been done on optimizing sparse MATMULT, in our solver, we optimize it in the context of the matrix multiplications encountered in the setup phase of AMG. Specifically, we designed a new divide and conquer sparse MATMULT, that is scalable and makes use of the specific structure of the restriction and prolongation matrices. This combined with an efficient communication pattern during parallel MATMULT provides better scalability for our AMG setup.

16:30-18:35 Session 9B: Applications (Part 1 of 2)
Location: Bighorn B
Application of Algebraic Multigrid Preconditoned Block Solvers in a Validation Study of Simulating Human Heart Function

ABSTRACT. Simulating total heart function requires computational fluid dynamics (CFD) as well as sophisticated almost incompressible nonlinear elasticty solvers. Finite elements are the natural choice for solving nonlinear mechanics problems. In recent years the attention for finite element based CFD solvers has also increased. The discretization of both problems lead to saddle-point like algebraic equations which have to be solved in a robust way. In this study we present our results stemming from two validation studies for our software CARP conducted in collaboration with the Medical University of Vienna, Austria and the University of Graz, Austria. The CFD equations are discretized using a residual based variational multiscale (RBVMS) approach whereas the elasticity equations are discretized using the MINI element as a simple stable finite element pairing. In both cases algebraic multigrid preconditioned GMRES was used to solve the resulting algebraic systems. We will demonstrate the effectiveness, and scalability of our software. This research was supported by BioTechMed Graz and NAWI Graz.

Multigrid for Bundle Adjustment

ABSTRACT. Bundle Adjustment is a nonlinear least-squares optimization problem used in computer vision to align images and create maps. The common optimization method uses repeated solves of linear systems with a special block structure. Usually choices for the linear system solver are Cholesky factorization or conjugate gradients with block jacobi preconditioning. However, as problem sizes become bigger (pushed, for example, by the need of maps for self driving cars), existing methods start to falter. We examine using existing smoothed aggregation Multigrid methods for these linear systems and propose some improvements for this specific problem.

Nitsche's Method for Abstract Variational Constrained Minimization, with Applications to Plates and Shells.

ABSTRACT. Multigrid methods aimed at elliptic PDEs are among the fastest solution techniques known today. Kirchhoff-Love plates and shells are governed by such elliptic PDEs, but the penalty or Lagrange multiplier methods often used for enforcing boundary conditions result in either badly ill-conditioned matrices or less tractable saddle point formulations. In fact, these difficulties arise for the much larger class of variationally constrained minimization problems. This talk presents a new framework that uses Nitsche's method to enforce boundary conditions for a broad class of variationally constrained minimization problems suitable for any PDE arising from an Euler-Lagrange equation. The key components in the framework are a generalized Green's identity, a trace inequality, and a generalized Cauchy-Schwarz inequality. Using the proposed method we arrive at SPD systems that are better suited to multigrid techniques.

Specifically with an eye towards spline-based isogeometric discretizations of elasticity, we show that our approach recovers an existing Nitsche formulation of the Kirchhoff-Love plate and generalizes to a new Nitsche formulation for the Kirchhoff-Love shell. Moreover, we show that our approach leads to a symmetric positive definite linear system with variationally consistent weak-enforcement of Dirichlet- and Neumann-type boundary conditions. This is accomplished by the introduction of "ersatz forces" and their complementary "corner forces" arising from integration by parts along the domain boundary. In addition, the abstract variational framework provides lower-bound estimates for the penalty parameter, which is the culprit of parasitic ill-conditioning in classical penalty methods. These lower-bound estimates rely on trace inequality constants approximated by way of a generalized eigenproblem that ensures coercivity of the Nitsche formulation. Using the method, we prove optimal convergence rates in the energy norm and in lower-order Sobolev norms with respect to polynomial degree, and we present supporting numerical results for a range of shell problems.

Multigrid Optimization for Large-Scale Ptychographic Phase Retrieval

ABSTRACT. Ptychography is a popular imaging technique that combines diffractive imaging with scanning microscopy. The technique consists of a coherent beam that is scanned across an object in a series of overlapping positions, leading to reliable and improved reconstructions. Ptychographic microscopes allow for large fields to be imaged at high resolution at the cost of additional computational expense. In this work, we propose a multigrid-based optimization framework to reduce the computational burdens of large-scale ptychographic phase retrieval. Our proposed method exploits the inherent hierarchical structures in ptychography through tailored restriction and prolongation operators for the object and data domains. Our numerical results show that our proposed scheme accelerates the convergence of its underlying solver and outperforms the Ptychographic Iterative Engine, a workhorse in the optics community.