View: session overviewtalk overview

09:00 | Algebraic multigrid using a stencil-CSR hybrid format on GPUs ABSTRACT. We present a new sparse matrix format which captures the matrix structure typical for discretized partial differential equations (PDEs) with piecewise-constant coefficients. The format uses a stencil representation for some blocks of matrix rows, typically corresponding to regions with constant coefficients, whereas other rows are encoded in the general Compressed Sparse Row (CSR) format. The stencil representation saves memory and is suitable for SIMD-like parallelism as available on GPUs. Further, we present a proof-of-concept GPU-accelerated aggregation-based algebraic multigrid solver for linear systems whose matrices are represented in the new format. The solver is compared with the CSR-based solver AmgX from NVIDIA on a number of model problems. The results suggest that for systems with over a million of unknowns the solver significantly outperforms AmgX in terms of both run time and memory usage. |

09:25 | Solving dense linear systems on the GPU with hierarchical-matrix approximations PRESENTER: Chao Chen ABSTRACT. Hierarchical matrices find their use in various science and engineering applications. We approximate dense matrices with a hierarchically compressed format, which reduces both the flops and the memory footprint for various linear algebra operations. One advantage of the compressed format is that it enables solving much larger problems thanks to the storage reduction. Another significant advantage is that it enables using batched linear algebra kernels to achieve high performance on GPUs. Consider the solution of linear systems as an example. A significant portion of the calculation can be cast into batched GEMM. As a result, the GPU algorithm achieves significant speedups over CPU algorithms parallelized on multicores. |

09:50 | Reducing Communication Costs in ECG with Optimal Node Aware Communication PRESENTER: Shelby Lockhart ABSTRACT. Krylov methods are a key way of solving large sparse linear systems of equations, but suffer from poor strong scalabilty on distributed memory machines due to high synchronization costs from large numbers of collective communication calls alongside a low computational workload. Enlarged Krylov methods address this issue by decreasing the total iterations to convergence, an artifact of splitting the initial residual that results in operations on block vectors. In this talk, we present a performance study of an Enlarged Krylov Method, Enlarged Conjugate Gradients, noting the impact of block vectors on parallel performance at scale. We introduce a new point-to-point communication approach based on node aware communication techniques that decreases the amount of time spent in the point-to-point communication of block vector values, increasing the efficiency of the method at scale on modern machines. |

10:15 | Resilient Algorithmic Building Blocks for Decentralized Iterative Computing in the Skynet Software PRESENTER: Colin Ponce ABSTRACT. In any distributed, network-connected system today, either large or small, computers of various types are present throughout the system, leading to “edge computing” as an increasingly important paradigm. Often these edge devices are low-powered, inexpensive, and unreliable, either due to harsh physical conditions, the threat of cyber intrusions, or simple cost considerations. Collaborative Autonomy is an emerging class of computational techniques and software that enables these edge devices to self-organize and work together as a collective, without the presence of a single central master computer, to solve computational problems and make decisions. A key principle of Collaborative Autonomy is the focus on inherent algorithmic resilience, so that the collective system can adapt around problems in the computational environment, such as hardware malfunctions, network failures, and cyber intrusions. This algorithmic resilience aims to ensure the system as a whole has a high degree of reliability even when no devices do individually. In this edge computing setting, numerical problems are often best solved through iterative, possibly asynchronous, methods. In this talk, we will present the Skynet software under development at LLNL and its novel approach to enabling algorithmic resilience for decentralized iterative methods. Specifically, Skynet focuses on providing a suite of resilient variants of algorithmic “building blocks,” such as reduce and averaging methods, that are widespread across iterative algorithms. By implementing an iterative method using this building-block interface, a user can often achieve algorithmic resilience to problems in the computational environment “for free” with little-to-no additional implementation or algorithm design work. We will present a number of common algorithmic building blocks supported by the Skynet software and will show experiments demonstrating the impact of making these building blocks resilient. Funded by LLNL LDRD projects 21-FS-007 and 22-ERD-045. Prepared by LLNL under Contract DE-AC52-07NA27344. LLNL-ABS-xxxxxx. |

10:40 | Optimal Size of the Block in Block GMRES on GPUs: Computational Model and Experiments PRESENTER: Andrew Higgins ABSTRACT. The block version of GMRES (BGMRES) is most advantageous over the single right hand side (RHS) counterpart when the cost of communication is high while the cost of floating point operations is not. This is the case on modern Graphics Processing Units (GPUs), while it is generally not the case on traditional Central Processing Units (CPUs). In this talk, experiments on both GPUs and CPUs are shown that compare the performance of BGMRES against GMRES as the number of RHS increases. The experiments indicate that there are many cases in which BGMRES is slower than GMRES on CPUs, but faster on GPUs. Furthermore, when varying the number of RHS on the GPU, there is an optimal number of RHS where BGMRES is clearly most advantageous over GMRES. A computational model is developed using hardware specific parameters, showing qualitatively where this optimal number of RHS is, and this model also helps explain the phenomena observed in the experiments. |

09:00 | Preconditioning for linear systems arising in implicit time integration schemes for Maxwell equations on locally refined spatial grids PRESENTER: Pratik Mahadeo Kumbhar ABSTRACT. In this talk, we consider the efficient implementation of implicit time integration schemes for spatially discretized linear Maxwell equations on locally refined meshes. In particular, our interest is in problems where only a few of the mesh elements are small while the majority of the elements is much larger. We suggest to approximate the solution of the linear systems arising in each time step by a preconditioned Krylov subspace method, e.g., the quasi minimal residual method by Freund and Nachtigal [1]. Motivated by the analysis of locally implicit methods by Hochbruck and Sturm [2], we show how to construct a preconditioner in such a way that the number of iterations required to achieve a certain accuracy is bounded independently of the diameter of the small mesh elements. The proof is done by using Faber polynomials and complex approximation theory. The cost to apply the preconditioner consists of the solution of a small linear system, whose dimension corresponds to the degrees of freedom within the fine part of the mesh (and its next coarse neighbors). If this dimension is small compared to the size of the full mesh, the preconditioner is very efficient. We conclude by verifying our theoretical results with numerical experiments for a fourth order Gauss-Legendre Runge--Kutta method. References: [1] R. W. Freund, and N. M. Nachtigal, QMR: a quasi-minimal residual method for non-Hermitian linear systems, Numer. Math., 60 (1991), pp. 315-339. [2] M. Hochbruck, and A. Sturm, Error Analysis of a Second-Order Locally Implicit Method for Linear Maxwell’s Equations, SIAM J. Numer. Anal., 54 (2016), pp. 3167-3191. |

09:25 | Iteration Methods for Nonlinear Frequency-Dependent Phonon Transport Problems PRESENTER: Nicholas Whitman ABSTRACT. Mesoscopic heat transfer models are based on the nonlinear Boltzmann transport equation (BTE) for phonons. The phonon BTE is an integro-differential equation for a multidimensional single-particle distribution function that depends on the position, phonon frequency, direction of motion, phonon polarization, and time. We consider steady-state models with the single-mode relaxation-time (SMRT) approximation for the collision integral of the phonon BTE. The discretization of the phonon BTE leads to a very large system of nonlinear equations which requires efficient iteration methods. In this study we apply a linearization approach, borrowed from proven and frequently utilized methods for thermal radiation transport problems in high-energy density physics, to iteratively solve non-linear temperature-and-frequency dependent phonon transport problems – simulating heat transport at the microscale. The resulting phonon intensities are consistent with the material temperature. The linearization approach requires iteration on both the pseudo-scattering source (as an inner iteration) and the material temperature (as an outer iteration). Numerical investigations have revealed that the iterative performance significantly degrades as the acoustic thickness of the problem increases. As expected, the unaccelerated inner pseudo-scattering iterations were inefficient due to the purely scattering nature of the problems. The outer material temperature iterations, based on a Newton’s method approach, became extremely inefficient as well. A gray diffusion synthetic acceleration (DSA) methodology was applied to accelerate the convergence rate of the pseudo-scattering source iterations. Gray DSA method was found to be highly beneficial at reducing the number of inner iterations required per temperature iteration. To address the slow convergence on the outer material temperature iteration result, an Anderson acceleration (AA) scheme was employed. The obtained results showed that the AA scheme reduces the number of material temperature iterations required to reach convergence by about a factor of two. By combining both the Gray DSA and AA, we have been able to demonstrate reductions in the computational time required by a factor of 60 – 130. These findings highlight a major difference in the linearization method’s performance compared to that previously demonstrated in thermal radiation transport problems. An important distinction between these two physics is the phonon mean-free-path’s dependence on temperature. As temperature increases, the phonon mean-free-path generally decreases whereas in thermal radiation transport the photon mean-free-path increases. We argue that this is the primary reason for the difference in observed iterative efficiency when these acceleration techniques are applied to phonon and thermal radiation transport. To demonstrate the performance of the proposed iteration method, we will present numerical results for multi-group phonon transport problems. |

09:50 | On constructing a physics-based initial iterate for solving drift-diffusion equations PRESENTER: Hengbin An ABSTRACT. The drift-diffusion equations are a class of nonlinear systems in the application of semiconductor devices. High efficient solution methods for solving these equations play a significant role in numerical simulations. Some nonlinear iterative methods, including Newton method and Gummel method, are usually used to solve the discretized drift-diffusion equations. Therefore, a nonlinear initial iterate should be given. The nonlinear initial iterate has a strong influence on the solution method. Typically, the related physical values in the equilibrium state, including the potential, the electron concentration, and the hole concentration, are used as the initial iterate for solving the equations with a given bias voltage. Due to the strong nonlinearity, by using the typical initial iterate, the nonlinear iterative methods converge slowly or even diverge when the given bias voltage is large. In this paper, an efficient initial iterate is constructed by exploiting the application features of the drift-diffusion equations. The key idea is to construct an approximate equation of the drift-diffusion equations by neglecting the recombination effect and casting the effects of the bias voltage on carriers' quasi-Fermi potential. And the initial iterate is obtained by solving the approximate equation. Some physical interpretations and theoretical analyses for the proposed method are given. Numerical results show that the proposed initial iterate is effective, and the number of nonlinear iterations can be reduced by about 90\% at most compared to the typical initial iterate. In some cases, the nonlinear iteration does not converge when the typical initial iterate is used, while the iteration converges if the new initial iterate is used. |

10:15 | Flexible infinite GMRES for parameterized linear systems PRESENTER: Siobhan Correnty ABSTRACT. We consider parameterized linear systems $A(\mu) x(\mu) = b$ for many different $\mu$, where $A(\mu)$ is large, sparse and nonsingular, with a nonlinear analytic dependence on $\mu$. In this work we propose to compute a partial parameterization $\tilde{x} \approx x(\mu)$, where $\tilde{x}(\mu)$ is cheap to evaluate for many values of $\mu$. Our method is based on a companion linearization (without truncation) where $\mu$ appears only linearly. This allows us to combine the well-established Krylov subspace method for linear systems, preconditioned GMRES, with algorithms for nonlinear eigenvalue problems (NEPs) to generate a basis for the Krylov subspace. Moreover, in this work we consider a flexible framework for GMRES, allowing for an inexact iteration dependent application of the preconditioner. This novel approach leads to a significant improvement in complexity over the method infinite GMRES, as proposed in [Jarlebring, Correnty 2021]. The competitiveness of the method is illustrated with large-scale problems arising from a finite element discretization of a Helmholtz equation with parameterized material coefficient. |

10:40 | The role of conservative and consistent iterative methods for conservation laws PRESENTER: Viktor Linders ABSTRACT. Iterative methods play an integral part in science, not least as a way of approximating the solutions to implicit discretizations of physical models. Frequently, the success of such discretizations relies on the ability to preserve physical invariants such as mass, momentum, energy and/or entropy. While this serves as a design principle for discretizations of partial differential equations, iterative methods are typically not designed based on such ideas. In this talk we consider the application of iterative methods in the context of hyperbolic conservation laws, and in particular fluid flow models. We focus on the question whether iterative methods conserve mass, both globally and locally, and what implications mass conservation has on the convergence of the iterations. As it turns out, many commonly used methods are able to preserve the global conservation of an underlying implicit scheme. This includes pseudo-time iterations, Newton's method, Krylov subspace methods etc. However, there are prominent exceptions, in particular the Jacobi and Gauss-Seidel iterations. The stronger requirement on the discretization of local conservation is a critical ingredient in the important Lax-Wendroff theorem, which expresses conditions under which the limit of a convergent discretization is a weak solution of the conservation law. We show that local conservation is preserved by pseudo-time iterations, Krylov subspace methods and Newton’s method, under certain restrictions on the space-time discretization. Further, we present an extension of the Lax-Wendroff theorem that incorporates pseudo-time iterations. The proof reveals that the resulting method in general is inconsistent, unless special care is taken in the choice of pseudo-time steps. This inconsistency materializes in the form of slowed down time. A simple technique based on the explicit Euler method can alleviate the inconsistency introduced by the pseudo-time iterations. Further, the same technique can be used to generate improved initial guesses for other iterative methods. We explore the technique in the context of Newton and Newton-Krylov methods to gain new insights into the role of consistency and convergence for iterative methods. |

09:00 | AMG-preconditioned primal-dual Newton-Krylov solvers for viscous-plastic sea-ice models ABSTRACT. We present a scalable, robust and convergent solver for the viscous-plastic Hibler seaice model, which is commonly used to model floating ice on large scales with a continuum model. The Hibler model involves a viscous-plastic constitutive relation to model the continuous deformation of sea-ice as well as its breakup, which results in so- called leaks. The strong nonlinearity in the rheology makes the robust numerical solution of these models challenging. We propose a novel Newton method to for the viscous-plastic momentum equation whose cost per iteration is identical to the cost of standard Newton or Picard methods. In contrast to existing method, it converges faster and robustly with respect to mesh refinement, and thus allows high-resolution and fully resolved seaice modeling. Combined with an algebraic multigrid-based preconditioned Krylov method for the Newton linearized systems, which have highly varying coefficients, the resulting solver scales well and can be used in parallel. We present highly resolved benchmark solutions and solve problems with up to 8.4 million spatial unknowns. |

09:25 | ILU Smoothers for C-AMG with Scaled Triangular Factors PRESENTER: Kasia Swirydowicz ABSTRACT. ILU smoothers can be effective in the algebraic multigrid $V$-cycle. However, direct triangular solves are comparatively slow on GPUs. Previous work by Chow and Patel (2015) and Antz et al. (2015) proposed an iterative approach to solve these systems. Unfortunately, when the Jacobi iteration is applied to highly non-normal upper or lower triangular factors, the iterations will diverge. An ILU smoother is introduced for classical Ruge-Stuben C-AMG that applies row and/or column scaling to mitigate the non-normality of the upper triangular factor. Our approach facilitates the use of Jacobi iteration in place of the inherently sequential triangular solve. Because the scaling is applied to the upper triangular factor as opposed to the global matrix, it can be done locally on an MPI-rank for a diagonal block of the global matrix. Numerical results and parallel performance are presented for the Nalu-Wind pressure continuity and PeleLM nodal projection pressure solvers. An ILUT Schur complement smoother, based on Xu et al (2021), that iteratively solves the Schur system along subdomain (MPI rank) boundaries using GMRES, is also introduced to maintain a constant iteration count and improve strong-scaling. For large problem sizes, GMRES+AMG with iterative triangular solves executes at least five times faster than when using direct solves on the ORNL Summit supercomputer. |

09:50 | Investigation of a Preconditioned Conjugate Gradient Method applied to Navier-Stokes Discretizations ABSTRACT. This paper concerns the investigation of a Preconditioned Conjugate Gradient Method developed to solve Divergence Conforming Navier-Stokes Discretizations. A previously constructed Multigrid Scheme is integrated as a preconditioner to the Conjugate Gradient Method. This Multigrid approach harnesses Isogeometric Analysis with divergecne conforming B-Splines as basis functions. 24 numerical simulations are analyzed to determine if Preconditioned Conjugate Gradient is an effective and robust solution method for the tested discretizations (the Generalized Oseen Flow problem). Additionally, the performance of this method was compared to a baseline Conjugate Gradient code to determine the utility of adding the complexity of preconditioning. It was determined that, for this problem, preconditioning is robust and provides several advantages over baseline Conjugate Gradient Method, including order of magnitude decreases in iterations to convergence. |

10:15 | ROM Closures and Stabilizations for Under-Resolved Turbulent Flows PRESENTER: Traian Iliescu ABSTRACT. In this talk, I will survey reduced order model (ROM) closures and stabilizations for under-resolved turbulent flows. Over the past decade, several closure and stabilization strategies have been developed to tackle the ROM inaccuracy in the convection-dominated, under-resolved regime, i.e., when the number of degrees of freedom is too small to capture the complex underlying dynamics. I will present regularized ROMs, which are stabilizations that employ spatial filtering to alleviate the spurious numerical oscillations generally produced by standard ROMs in the convection-dominated, under-resolved regime. I will also survey three classes of ROM closures, i.e., correction terms that increase the ROM accuracy: (i) functional closures, which are based on physical insight; (ii) structural closures, which are developed by using mathematical arguments; and (iii) data-driven closures, which leverage available data. Throughout my talk, I will highlight the impact made by data on classical numerical methods over the past decade. I will also emphasize the role played by physical constraints in data-driven modeling of ROM closures and stabilizations. |

10:40 | Residual Tracking and Stopping for Iterative Random Sketching PRESENTER: Nathaniel Pritchard ABSTRACT. Iterative random sketching (IRS) offers a computationally expedient approach to solving linear systems. However, IRS' benefits can only be realized if the procedure can be appropriately tracked and stopped---otherwise, the algorithm may stop before the desired accuracy is achieved, or it may run longer than necessary. Unfortunately, IRS solvers cannot access the residual norm without undermining their computational efficiency. While iterative random sketching solvers have access to noisy estimates of the residual, such estimates turn out to be insufficient to generate accurate estimates and confidence bounds for the true residual. Thus, in this work, we propose a moving average estimator for the system's residual, and rigorously develop practical uncertainty sets for our estimator. We then demonstrate the accuracy of our methods on a number of linear systems problems. |