View: session overviewtalk overview

08:00 | Three Multigrid Interpretations of the Parareal Algorithm SPEAKER: Martin J. Gander ABSTRACT. The original parareal algorithm is a two level method. If one wants to extend it to a multilevel method, it is instructive to first interpret the parareal algorithm as a two grid method, and to identify the corresponding multigrid components like the smoother, the restriction and prolongation and the coarse grid operator. I will explain three different such interpretations in my presentation, which all can be used to obtain multilevel variants of the parareal algorithm. The third variant leads to the MGRIT algorithm when the F-smoother is replaced by an FCF-smoother. |

08:25 | Space-Time Finite Element Solvers of Parabolic Evolution Problems with Non-smooth Solutions ABSTRACT. We consider locally stabilized, conforming finite element schemes on completely unstructured simplicial space-time meshes for the numerical solution of parabolic initial-boundary value problems with variable coefficients that may be discontinuous in space and time. Discontinuous coefficients, non-smooth boundaries, changing boundary conditions, non-smooth or incompatible initial conditions, and non-smooth right-hand sides can lead to non-smooth solutions. We present new a priori estimates for low-regularity solutions. In order to avoid reduced convergence rates appearing in the case of uniform mesh refinement, we also consider adaptive refinement procedures based on residual a posteriori error indicators. The huge system of space-time finite element equations is then solved by means of GMRES preconditioned by algebraic multigrid. In particular, in the 4D space-time case that is 3D in space, simultaneous space-time adaptivity and parallelization can considerably reduce the computational time. The space-time finite element solver was implemented in the framework of MFEM. We present different numerical experiments. The numerical results nicely confirm our theoretical findings. The parallel version of the code shows an excellent parallel performance. |

08:50 | Multilevel convergence theory for multigrid-reduction-in-time (MGRIT) SPEAKER: Andreas Hessenthaler ABSTRACT. Parallel-in-time integration methods explore parallelism in the temporal domain and can offer a reduction in wall clock time compared to sequential time stepping algorithms. One such method is multigrid-reduction-in-time (MGRIT), which is an O(N) solver that employs a hierarchy of time grids and a parallel, iterative coarse-grid correction scheme based on multigrid reduction [1]. In this talk, we will present a multilevel convergence framework for MGRIT [2] as a generalization of previous two-grid estimates [3,4]. The framework provides a priori upper bounds on the convergence of MGRIT V- and F-cycles with F- and FCF-relaxation for linear partial differential equations, under the assumption that the time stepping operators on each time grid level are stable and simultaneously diagonalizable. Residual and error propagation operators are derived, analyzed directly and bounded in norm, both numerically and analytically. We further present analytic formulae for approximating the convergence factor of multilevel V-cycles with F- and FCF-relaxation. The theoretical results are validated using parabolic and hyperbolic model problems and A-stable and L-stable Runge-Kutta time integration schemes of orders 1-4. The sharpness of the proposed bounds is demonstrated, highlighting their benefits for exploring MGRIT performance a priori. We further discuss the potential of the theory with respect to developing extensions of MGRIT for current and future applications. [1] Falgout et al. SISC (2014). [2] Hessenthaler et al. In preparation, LLNL-JRNL-763460-DRAFT. [3] Dobrev et al. SISC (2017). [4] Southworth. Submitted. This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. |

09:15 | Multigrid Reduction in Time preconditioning for parallel-in-time Krylov solver SPEAKER: Ryo Yoda ABSTRACT. In a massively parallel environment, the space direction parallelism of time integration is exhausted rapidly. Multigrid Reduction in Time (MGRIT) is an algorithm for extracting the time direction parallelism of the time integration. This is a multilevel method of constructing a coarse level in the time direction with an enlarged time-step width. An enlarged time-step width commonly results in unstable problems. We propose the method that uses MGRIT as a preconditioner of the Krylov subspace method. Compared with the MGRIT solver, this proposed method is expected to be more stable. We described the MGRIT preconditioning matrix and confirmed the effect of preconditioning via the eigenvalue distribution. In our test problem, we confirmed that the number of iterations and execution time was reduced to more than 50\%, and showed an example of becoming a more stable solver. |

09:40 | Space-time adaptivity with multigrid reduction in time for the compressible Navier-Stokes equations SPEAKER: Joshua Christopher ABSTRACT. A parallelized fully-adaptive space-time mesh refinement algorithm using multigrid reduction-in-time (MGRIT) is applied to the unsteady compressible Navier-Stokes equations to solve fluids problems. Previously, fully-adaptive space-time methods have primarily used sequential time marching to integrate the time domain. Despite parallelization in the spatial domain, wall clock times have remained high due to the computational cost per time-step and the large number of desired time-steps. Furthermore, architectural trends in high-performance computing have shifted from ever-increasing clock speeds to greater concurrency. This motivates a need to parallelize the time domain. The spatial parallelization consists of a partitioning of the domain into a nested hierarchy of Cartesian grids that is adaptively refined at regions and flow features of interest. The MGRIT algorithm is demonstrated using an explicit time integration scheme applied to Couette flow, where error is compared to the analytic solution and a convergence study is performed. The eventual goal, after further development and optimization, is to conduct a performance comparison against a sequential-in-time version of the algorithm. |

08:00 | Error Analysis of Inline ZFP Compression for Multigrid Methods SPEAKER: Alyson Fox ABSTRACT. Multigrid methods represent a class of scalable and efficient iterative solvers for linear systems, which are typically used in the numerical solution of many large-scale application. In large-scale computations, communication costs have become a limiting factor. We propose using ZFP, a lossy compression algorithm, inline within Multigrid methods to significantly reduce the communication costs while maintaining sufficient accuracy. Due to the local structure of ZFP, ZFP can easily be integrated into numerical simulations without changing the underlying algorithms. However, since ZFP is a lossy algorithm it will introduce some error, thus, it is important to understand if the error caused by ZFP overwhelms other traditional sources of error, such as discretization error. The goal of this work is to analyze the stability of using ZFP in Multigrid methods. We use previous error of ZFP to establish error bounds for the continual use of ZFP when integrated into Multigrid methods. |

08:25 | Fast and backward stable parallel MGS-GMRES SPEAKER: Katarzyna Swirydowicz ABSTRACT. Krylov subspace solvers are widely used in applications. Modern HPC architectures based on accelerators are changing the solver performance landscape. Scalable Krylov and AMG algorithms are of paramount importance. In this talk, we present new results on low-synchronization variants of GMRES. The communication bottleneck in the parallel implementation of GMRES is the Gram-Schmidt process. Our new approach completely eliminates this bottleneck. In the talk we show how the combination of the low-synchronization Gram-Schmidt and fine-tuned parallel implementation yields significant speedups. To speed up the computations further, we accelerate a fast GPU implementation of GMRES with a lightweight HYPRE-AMG preconditioner. We also extend our work to other Krylov iterative solver algorithms that require orthogonality among vectors. |

08:50 | Optimizing communication of Krylov subspace methods using blocking and pipelining techniques SPEAKER: Nils-Arne Dreier ABSTRACT. In exascale programs the communication overhead will more and more increase as more processors are getting involved. Especially in Krylov subspace methods, global communications, like computing a scalar product are getting more expensive and introduce global synchronization. We are tackling this problems by combining the concepts of block Krylov subspace methods and pipelining techniques. The former decreases the communication by computing multiple scalar products in one communication scheme. The latter optimizes the computation by overlapping communication and computation. Block Krylov subspace methods appear in several contexts, like multiple right-hand sides, multiple preconditioners or so-called enlarged Krylov subspace methods. In this talk we compare and discuss these methods and highlight some practical aspects. Furthermore we will show some technical details of the implementation in the C++ framework DUNE. |

09:15 | adaFACx - smoothed restriction to damp additive multigrid SPEAKER: Charles Murray ABSTRACT. A prominent challenge Multigrid faces on parallel computers is limited concurrency due to small coarse grid solves. Additive Multigrid can achieve greater parallelism than Multiplicative Multigrid since smoothing on coarser levels is decoupled from that on finer levels. As the number of coarse levels increases, stability decreases however due to additive coarse grid levels duplicating the smoothing of errors. We could damp corrections based on their level but then the rate of convergence might suffer. Inspired by AFACx and MultAdditive Multigrid we propose the introduction of an additional grid space using a smoothed intergrid transfer operator. Corrections from this additional grid damp the additive corrections of specific resolution levels reducing overcorrection. Further stability is gained from BoxMG intergrid transfer operators. Our use of HTMG to realise a Full Approximation Storage implementation allows a straightforward realisation of arbitrary dynamic AMR on geometric multiscale grids with algebraic operators. Fine grid rediscretisation with algebraic coarse grids means we can obtain a Quasi Matrix-free implementation. The code realises a single-touch policy. Our approach shows promise in diffusion equations with complex discontinuous material parameters. |

09:40 | Toward interoperability support between scalable linear solver libraries in hypre SPEAKER: Sarah Osborn ABSTRACT. Hypre is a software library that provides high performance multilevel preconditioners and solvers for the solution of large sparse linear systems on massively parallel computers. One of its unique features is the notion of conceptual (linear system) interfaces, which allow users to access hypre in the way they naturally think about their problems. However, there are other independently developed packages that also provide scalable linear solvers and preconditioners using different underlying data structures. There is a need for interoperability among linear solver libraries, which will easily allow for more solvers to be available to users and facilitate comparison between solver performance. This talk will present current progress and plans for developing interoperability support in hypre. This feature will generate the necessary data structures within the hypre framework for a user to call sequential and distributed LU factorization routines from SuperLU and SuperLU_Dist, and multigrid and Krylov solvers from PETSc and Trilinos. Numerical results will be presented to demonstrate the performance of this new functionality. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. |

10:25 | Learning Prolongation for Diffusion Problems with Random Coefficients SPEAKER: Irad Yavneh ABSTRACT. The overwhelming surge in world-wide interest and development of Machine Learning techniques has led to the availability of powerful tools for Neural Network (NN) based learning. These new tools are becoming more and more prominent in the solution of hard problems in artificial intelligence, including computer vision, speech and text understanding, and they are propagating into many other areas of science and engineering. We propose to harness NN tools for automatically learning multigrid components (prolongation, restriction, relaxation, and ultimately coarse-grid variables). As a first step, we construct NN architectures for learning rules for devising prolongation operators for two-dimensional diffusion problems with random coefficients on structured grids, locally based on elements of the discretization matrix. The prolongation rules are learned in an offline training phase, imposing the standard 9-point prolongation sparsity pattern of bilinear interpolation. The resulting solver is then used efficiently with no further learning for any PDE within the prescribed class. Numerical experiments demonstrate that the resulting solver exhibits faster convergence factor per cycle than the classical Black-Box multigrid algorithm. We discuss the prospects of learning other components, such as the relaxation operator, and extending our approach to unstructured grids. |

10:50 | Multilevel Methods for Neural Networks SPEAKER: Colin Ponce ABSTRACT. Neural networks have become an extremely popular machine learning technique in the past decade, having effectively solved a number of problems previously considered difficult. Despite this success, it is computationally challenging to scale to large neural networks. Properly training at scale sometimes requires weeks, even on sizeable computer clusters. This computational bottleneck limits the utility of such neural networks to those with access to such computational resources, and even then constrains the number of experiments one can run at scale. The current workhorse approach to training neural networks is stochastic gradient descent. It is in some sense not particularly surprising that this technique scales poorly to scale neural networks, as it is somewhat analogous to the one-level Jacobi iteration for linear systems of equations. In this talk, we will present a technique for constructing a hierarchy of neural networks and using this hierarchy to perform a multilevel training iteration. We coarsen by aggregating neurons within a layer, and we smooth using the traditional stochastic gradient descent. We will discuss recent results, as well as future work. |

11:15 | Can we do better? Using machine learning to optimize multigrid methods SPEAKER: Alexander Katrutsa ABSTRACT. This presentation discusses recent results on using machine learning approach to optimize parameters of the multigrid method. As parameters of multigrid method we consider restriction/prolongation operators and relaxation parameters in post- and pre-smoothing iterative processes. The main idea of the proposed method is to represent one iteration of multigrid method in the form of a computational graph such that every node represents elementary operation (e.g. matrix by vector product or Galerkin projection) that can be differentiated w.r.t. parameters. After that, we introduce stochastic functional which is an unbiased estimation of upper bound for spectral radius of the iteration matrix of multigrid method. Thus, we state minimization problem to find parameters of multigrid method such that convergence speed to be as fast as possible. To solve the stated minimization problem we use the modern modification of stochastic gradient descent. The stochastic gradient of the proposed functional can be computed with automatic differentiation tool like Autograd. Although the cost of the computational procedure in its current form is very high, it allows us to show that even for simple problems we can find better parameters in the standard multigrid methods. The developed framework can be considered as a numerical tool for the justification of "how optimal" are the parameters used. Development of fast (surrogate) models for computation of these optimal parameters is a topic of future research. |

11:40 | SPEAKER: Jonas Schmitt ABSTRACT. For many linear and nonlinear systems that arise from the discretization of partial differential equations the construction of an efficient multigrid solver is a challenging task. Here we present a novel approach for the optimization of geometric multigrid methods that is based on evolutionary computation, a generic program optimization technique inspired by the principle of natural evolution. A multigrid solver is represented as a tree of mathematical expressions which we generate based on a tailored grammar. The quality of each solver is evaluated in terms of convergence and compute performance using automated Local Fourier Analysis (LFA) and roofline performance modeling, respectively. Based on these objectives a multi-objective optimization is performed using strongly typed genetic programming with a non-dominated sorting based selection. To evaluate the model-based prediction and to target concrete applications, scalable implementations of an evolved solver can be automatically generated with the ExaStencils code generation framework. We demonstrate our approach by constructing multigrid solvers for Poisson's equation with constant and variable coefficients. |

10:25 | PCPATCH - A new preconditioner in PETSc to express subspace correction methods SPEAKER: Florian Wechsung ABSTRACT. Small block overlapping, and non-overlapping, Schwarz methods are theoretically highly attractive as multilevel smoothers for a wide variety of problems that are not amenable to point relaxation methods. Examples include monolithic Vanka smoothers for Stokes, overlapping vertex-patch decompositions for H(div) and H(curl) problems, along with nearly incompressible elasticity and augmented Lagrangian schemes. While it is possible to manually program these different schemes, their use in general purpose libraries has been held back by a lack of generic, composable interfaces. We present a new approach to the specification and development such Schwarz methods in PETSc that cleanly separates the topological space decomposition from the discretisation and assembly of the equations. Our preconditioner is flexible enough to support overlapping and non-overlapping, multiplicative and additive Schwarz methods, and can be used to formulate line, and plane smoothers, Vanka iterations, amongst others. We illustrate these new features with some examples utilising the Firedrake finite element library. |

10:50 | A Reynolds-robust augmented Lagrangian preconditioner for the 3D stationary Navier-Stokes SPEAKER: Patrick Farrell ABSTRACT. In Benzi & Olshanskii (SIAM SISC 28 (2006)) a preconditioner of augmented Lagrangian type was presented for the two-dimensional stationary incompressible Navier--Stokes equations that exhibits convergence almost independent of Reynolds number. It relies on a highly specialized multigrid scheme involving a custom prolongation operator and is tightly coupled to the use of piecewise constant finite elements for the pressure. In this talk we present three extensions of their work. First, we adapt their scheme to three dimensions; the element employed by Benzi & Olshanskii is not inf-sup stable in three dimensions, and its obvious generalization is extremely expensive. We propose an alternative finite element for the velocity space and an associated modification of the prolongation operator that is much cheaper. Second, using a newly developed module for subspace correction methods in PETSc, we have released a general implementation of the solver on arbitrary unstructured meshes; to the best of our knowledge no other implementations of the full scheme have been undertaken since the initial development in 2006. Third, we have extended the approach to the Scott-Vogelius element pair, yielding the first Reynolds-robust preconditioner for a Reynolds-robust discretisation of the stationary Navier--Stokes equations. We present numerical results running on up to 25,000 cores demonstrating convergence independent of the mesh and nearly independent of the Reynolds number from Re = 10 to Re = 5000. |

11:15 | Braess-Sarazin relaxation for H(div) and H(curl) SPEAKER: Scott MacLachlan ABSTRACT. Multigrid methods for H(div) and H(curl) are well-known in the literature and in use in many practical applications. Typical approaches for finite-element discretizations using Raviart-Thomas or Nédélec elements are based either on the Arnold-Falk-Winther approach of collective relaxation or the Hiptmair-Xu approach that makes use of mimetic relationships within the de Rham cohomology associated with H(div) and H(curl). In this talk, we consider constrained problems on H(div) and H(curl) and discuss the relationship of Arnold-Falk-Winther and Hiptmair-Xu relaxation to the classical Vanka and Braess-Sarazin relaxation schemes for fluids in this context. In particular, we prove that Hiptmair’s original relaxation schemes for H(div) and H(curl) can be viewed as Braess-Sarazin relaxation schemes. Furthermore, this viewpoint suggests generalizations of the (inexact) Braess-Sarazin framework suitable for use for these problems, which are confirmed to be effective in numerical results. |

11:40 | A monolithic mixed-dimensional multigrid method for single-phase flow in fractured porous media SPEAKER: Carmen Rodrigo ABSTRACT. A monolithic multigrid method is proposed for the efficient numerical solution of single-phase flow problems in fractured porous media. Two-dimensional arbitrary fracture networks with vertical and/or horizontal possibly intersecting fractures are considered. The key point in the design of the multigrid solver is to combine two-dimensional multigrid components (smoother and inter-grid transfer operators) in the porous matrix with their one-dimensional counterparts within the fractures, giving rise to a mixed-dimensional multigrid method. Numerical experiments demonstrate the robustness of the monolithic mixed-dimensional multigrid method with respect to the permeability of the fractures, the grid-size and the number of fractures in the network. |

12:05 | On Scalable Block Preconditiooners for Implicit / IMEX FE Continuum Plasma Physics Models SPEAKER: John Shadid ABSTRACT. John N. Shadid, Roger P. Pawlowski, Edward P. Phillips, Paul T. Lin, Sidafa Conde, R. Tuminaro, SIbu Mabuza Continuum models of plasma physics systems require the solution of the governing partial differential equations describing conservation of mass, momentum, and energy, along with various forms of approximations to Maxwell's equations. The resulting systems are nonsymmetric, strongly nonlinear, and exhibit a significant range of time- and length-scales. To enable accurate and stable approximation of these systems a range of spatial and temporal discretization methods are employed. For finite element methods these include variational multiscale methods and structure-preserving approaches. For time integration two well-structured approaches are fully-implicit and implicit-explicit type methods. The requirement to accommodate disparate spatial discretizations, and allow the flexible assignment of mechanisms as explicit or implicit operators, implies a wide variation in unknown coupling, ordering, and the conditioning of the implicit sub-system. Our approach to help overcome these challenges has been the development of robust, scalable, and efficient fully-coupled physics-based multilevel preconditioned Newton-Krylov type iterative solvers. To discuss the structure of these algorithms, and to demonstrate the flexibility of this approach various forms of magnetohydrodynamic and multifluid electromagnetic plasma models are considered. Results are presented on robustness, efficiency, and the parallel and algorithmic scaling of the methods with weak scaling results up to 1M cores. |

16:30 | Algebraic Multgirid: theory and practice ABSTRACT. This talk focuses on developing a generalized bootstrap algebraic multigrid algorithm for solving linear sparse matrix equations. As a motivation of the proposed generalization, we consider an optimal form of classical algebraic multigrid interpolation that has as columns eigenvectors with small eigenvalues of the generalized eigen-problem involving the system matrix and its symmetrized smoother. We use this optimal form to design an algorithm for choosing and analyzing the suitability of the coarse grid. In addition, it provides insights into the design of the bootstrap algebraic multigrid setup algorithm that we propose, which uses as a main tool a multilevel eigensolver to compute approximations to these eigenvectors. A notable feature of the approach is that it allows for general block smoothers and, as such, is well suited for systems of partial differential equations. In addition, we combine the GAMG setup algorithm with a least-angle regression coarsening scheme that uses local regression to improve the choice of the coarse variables. These new algorithms and their performance are illustrated numerically for scalar diffusion problems with highly varying (discontinuous) diffusion coefficient, Maxwell equations and for the linear elasticity system of partial differential equations. |

16:55 | An Adaptive AMG preconditioner for ill-conditioned structural problems SPEAKER: Carlo Janna ABSTRACT. Across a broad range of structural mechanics applications, the use of numerical simulations is pervasive with the demand for more and more accurate, complex and reliable models exponentially increasing. In this context, the Finite Element Method (FEM) still remains the most widely used approach in biomechanical, civil, aerospace, mechanical, reservoir engineering and so on. To approximate the continuous solution, the partial differential equations (PDEs) governing the mechanical problem are discretized giving rise to an algebraic linear (or linearized) system of equations. The solution of this system is often the most time-consuming part of the entire simulation process and may take up to $70\%-80\%$ of the total computational time [1]. Accelerating the solution to linear systems of equations is of paramount importance, considering that nowadays models with billions of unknowns are targeted. In the context of structural mechanics, models giving rise to severely ill-conditioned systems are quite common. This may happen when the model is characterized by materials with significantly different rheological properties, e.g., when soft tissues and a bone are connected together. Otherwise, a system may be ill-conditioned in case of highly distorted grids, with several elements having non-regular shapes, e.g., when creating an automatic unstructured mesh directly from CAD or discretizing a complex geometry such as a geological formation. Due to their intrinsic ill-conditioning, structural problems are traditionally solved through direct solution methods as they are always reliable and do not require an experienced user. On the other side, iterative methods, whenever a suitable preconditioner is available, are often faster and more scalable on parallel computers. Ill-conditioned structural problems prove generally challenging to classical AMG methods and usually aggregation based AMG is preferred in elasticity [2,3]. This communication focuses on Symmetric Positive Definite (SPD) linear systems and a classical AMG method based on adaptive smoothing and prolongation is proposed as a preconditioner for structural problems. We follow and extend the idea outlined in [4] with the aim of improving its performance and usability. We propose as smoother the Adaptive Factorized Sparse Approximate Inverse (aFSAI) which gives an approximation of the system matrix inverse in factored form. Its main advantage is the flexibility as the smoother strength can be auto-tuned on the problem at hand, by distinction with the commonly used weighted Jacobi and Gauss-Seidel methods. The construction of the coarse problem follows the path of bootstrap and adaptive AMG which is to automatically build an approximation of the near-null space components of the smoothed matrix. In mechanical problems, we construct the test space starting from the rigid body bodes of the model and then use it to compute an affinity-based strength of connections. Finally, the interpolation operator is computed in a least-squares sense to include the test space in its range and maximize sparsity at the same time. The main drawback of this kind of methods, that we are currently addressing in this communication, is their usability, which is not straightforward since they depend on a high number of user-specified parameters. The developed algorithm has been applied to the solution of large linear systems arising from the discretization of several challenging mechanical problems, and the results prove the efficiency and robustness of the method. References: [1] S. Koric and A. Gupta, Sparse matrix factorization in the implicit finite element method on petascale architecture, Computer Methods in Applied Mechanics and Engineering, 302 (2016), 281-292. [2] M. Brezina, C. Tong and R. Becker, Parallel Algebraic Multigrids for Structural Mechanics, SIAM Journal on Scientific Computing, 27 (2006), 1534–1554. [3] M. Adams, Evaluation of three unstructured multigrid methods on {3D} finite element problems in solid mechanics, International Journal for Numerical Methods in Engineering, 55 (2002), 519-534. [4] V.A. Paludetto Magri, A. Franceschini and C. Janna, A novel algebraic multigrid approach based on adaptive smoothing and prolongation for ill-conditioned system, SIAM Journal on Scientific Computing, 41 (2019), A190-A219. |

17:20 | Automatic Construction of AMG Smoothers SPEAKER: Lisa Claus ABSTRACT. Algebraic Multigrid (AMG) is used to speed up linear system solves in a wide variety of applications. This paper is concentrated on expanding AMG’s applicability to important new classes of problems through algorithms that automatically construct advanced smoothing techniques when needed. AMG algorithms are usually designed by first assuming that the smoother is a simple pointwise smoother, then great effort is put into constructing an interpolation and corresponding coarse-grid correction that complements the smoother and leads to fast O(N) convergence. The so-called weak approximation property and basic two-grid theory are used to guide algorithm development. However, for some classes of problems, pointwise smoothers are not sufficient for achieving the desired O(N) computational complexity. In this paper, we use two-grid theory to motivate the development of new algorithms for automatically constructing more complex (non-pointwise) smoothers. In addition, we introduce the idea of automatically constructing complementary interpolation operators. As a relevant application, we consider a curl-curl problem (commonly referred to as the second-order definite Maxwell equations) that often arises in time-domain electromagnetic simulations. We use a Nédélec H(curl)-conforming finite element approach to discretize the problem and demonstrate how our new AMG smoother algorithms recover the well known Arnold-Falk-Winther and Hiptmair smoothers. We also discuss future directions and the potential of these AMG smoothers in more general application settings. |

17:45 | Algebraic Multigrid for Systems of Elliptic Boundary-Value Problems ABSTRACT. This talk presents an algebraic multigrid (AMG) method for solving systems of elliptic boundary-value problems. It is well known that multigrid for systems of elliptic equations faces many challenges that do not arise in most scalar equations. These challenges include strong inter-variable couplings, multi-dimensional and possibly large near-nullspaces, analytically unknown near nullspaces, delicate selection of coarse degrees of freedom (cdofs), complex construction of intergrid operators, etc. On top of these are also non-self-adjointness and well-posedness issues of the system. In this talk, we will assume that the boundary-value operator is one-to-one and onto, and hence, the system is well-posed. We will consider only the selection of cdofs and the construction of the interpolation operator. This selection of cdofs is an extension of the Ruge-Stuben algorithm based on a new strength of connection between the nodal degrees of freedom, i.e. all degrees of freedom located at a gridpoint. It is based on a local correlation matrix measure applied to a set of smoothed test vectors derived from a relaxation-based procedure. With this measure, selection of the cdofs is then determined by the number of strongly correlated connections at each node, and this can be processed by Ruge-Stuben procedure. With this selection, the interpolation operator is constructed using a BAMG procedure. We apply this procedure either over the smoothed test vectors to obtain an inter-variable interpolation scheme, or over the like-variable components of the smoothed test vectors to obtain an intra-variable interpolation. Moreover, defining also a measure on the correlation between only the intra-variable couplings and comparing it to the measure on the full correlation matrix, we also consider mixed intra- and inter-variable interpolation. |

18:10 | Parallel performance of algebraic multigrid with domain decomposition ABSTRACT. The need for constant communication between processors in order to perform distributed matrix-vector multiplications during a multigrid cycle is a significant performance bottleneck for algebraic multigrid (AMG) as core counts continue to grow on modern machines. This talk examines the design, implementation, and parallel performance of a novel algorithm designed specifically to limit communication, Algebraic Multigrid with Domain Decomposition (AMG-DD). The goal of AMG-DD is to provide a low-communication alternative to standard AMG V-cycles by enabling significantly more independent computational work to be done between communication steps. Thus, AMG-DD is particularly well suited to computational environments where the cost of communication is high compared to the cost of computation. Parallel performance results for AMG-DD are shown for a variety of algorithm design choices and for a variety of elliptic PDE problems in 2 and 3 dimensions. |

16:30 | Highly Parallel Additive Multigrid with Scaled Correction SPEAKER: Syam Vangara ABSTRACT. Although highly parallel, convergence rate of additive multigrid is typically observed to be poor in comparison to multiplicative multigrid. In this work, convergence rate of additive multigrid is improved through scaling the correction such that the upper bound of the difference between residuals of two-level additive and multiplicative multigrid cycles is minimized. Proof of convergence for the proposed additive multigrid with scaled correction is presented. Convergence rate, when solving Poisson equation, is estimated through local Fourier analysis and compared to multiplicative multigrid. We also provide examples of application of such a highly parallel algorithm with the improved convergence. |

16:55 | Parallel Multigrid with Adaptive Multilevel hCGA on Manycore Clusters SPEAKER: Kengo Nakajima ABSTRACT. The parallel multigrid method is expected to play an important role in large-scale scientific computing on post-peta/exa-scale supercomputer systems, and it also includes serial and parallel communication processes which are generally expensive. In the previous work [1], new format for sparse matrix storage based on sliced Ellpack-Itpack (ELL) format was proposed for optimization of serial communication in data transfer through memories, and hierarchical coarse grid aggregation (hCGA) was introduced for optimization of parallel communication by message passing. The proposed methods were implemented for pGW3D-FVM, a parallel code for 3D groundwater flow simulations using parallel conjugate gradient (CG) solver with multigrid preconditioner (MGCG). The parallel MGCG solver using the sliced ELL format and hCGA provided excellent performance improvement on the Fujistu FX10 supercomputer system at the University of Tokyo [2]. Because the hCGA can only handle 2-hierarchica-levels, we are developing AM-hCGA (Adaptive Multilevel hCGA) for multiple hierarchical levels (more than three). In this presentation, we will present preliminary results of AM-hCGA on the Oakforest-PACS, Joint Center for Advanced High Performance Computing (JCAHPC) [3], which consists of 8,208 nodes of Intel Xeon Phi (Knights Landing). Effects of pipelined algorithms on conjugate gradient method [4] are also evaluated. [1] Nakajima, K., Optimization of Serial and Parallel Communications for Parallel Geometric Multigrid Method, Proceedings of the 20th IEEE International Conference for Parallel and Distributed Systems (ICPADS 2014) 25-32, Hsin-Chu, Taiwan, 2014 [2] Oakleaf-FX, Information Technology Center, The University of Tokyo, https://www.cc.u-tokyo.ac.jp/supercomputer/fx10/ (in Japanese) [3] Oakforest-PACS, Joint Center for Advanced High Performance Computing (JCAHPC): https://www.cc.u-tokyo.ac.jp/en/supercomputer/ofp/service/ [4] Ghysels, P., Vanroose, W., Hiding global synchronization latency in the preconditioned Conjugate Gradient algorithm, Parallel Computing 40, 224-238, 2014 |

17:20 | revisiting Householder orthogonalization scheme for iterative methods ABSTRACT. Orthogonalization schemes are critical for the convergence and the performance of many iterative solvers. (Typically Arnoldi-based Krylov methods.) In this talk we revisit the use of Householder reflections in the context of Krylov solvers. We provide efficient implementations in the context of one right-hand side but also in the context of block iterative methods. Our orthogonalization suite also has application in the context of dense linear algebra. |

17:45 | An efficient asynchronous incomplete LU iteration for convection-diffusion problems on structured grids SPEAKER: Aditya Kashi ABSTRACT. Asynchronous iterations, or chaotic relaxation as proposed initially by Chazan and Miranker, have recently been investigated in the context of fine-grain parallel solvers. Some researchers have investigated the performance of chaotic relaxation and asynchronous ILU preconditioning, mainly focused on solving linear problems from the University of Florida sparse matrix collection. As such, the focus has been on iterations for general matrix structures. For industrial design applications, especially in the computational fluid dynamics community, there are ongoing efforts to find fine-grain parallel implicit solvers for fluid flow problems on structured grids. In such applications, structured grid solvers on traditional computer architectures are reasonably mature. However, structured-grid solvers are also well-suited for porting to massively parallel processor architectures such as graphics processing units and other many-core processors with wide vector units. In this work, we present an asynchronous variant of an ILU(0) factorization well-suited for structured grids. The form of the ILU factorization we use is non-standard, even though it is not new. It is, however, well suited for conversion to an asynchronous iteration, and in our knowledge such a conversion has not been demonstrated before. We show that our asynchronous iteration converges to the ILU(0) factorization whenever the latter exists. We investigate the performance of the solver over a range of number of parallel threads and Peclet numbers (degree of dominance of convection over diffusion). We will demonstrate experimentally that the asynchronous iteration sufficiently converges to the ILU(0) factorization in only very few iterations in many cases. For completeness, we also study the asynchronous application of the ILU factors during the solve phase. |

18:10 | Algorithm Exploitation & Evolving AI/Cognitive Examples on IBM’s Data Centric Systems ABSTRACT. The volume, variety, velocity and veracity of data is pushing how we think about computer systems. IBM Research’s Data Centric Solutions organization has been developing systems that handle large data sets shortening time to solution. This group has created a data centric architecture initially delivered to the DoE labs at the end of 2017 and being completed in 2018. As various features to improve data handling now exist in these systems, we need to begin to rethink the algorithms and their implementations to exploit these features. This data centric view is also relevant for Artificial Intelligence (AI) and Machine Learning (ML). In this talk, I will briefly describe the architecture and point out some of hardware and software features ready for exploitation. I will show how we are using these data centric AI/cognitive computing systems to address some industrial and societal challenges in new ways actually coupling HPC and AI presenting a few case studies. |