08:00 | A finite-grid stable implicit particle-in-cell algorithm for gyrokinetic electromagnetic simulations of magnetized plasmas PRESENTER: Benjamin Sturdevant ABSTRACT. Gyrokinetic simulation models have emerged as the standard approach for studies of low-frequency (compared to gyrofrequency) dynamics in well-magnetized plasmas. In such regimes, gyrokinetic models can substantially relax resolution requirements in simulations due to an analytical elimination of the shortest/fastest spatio-temporal scales supported in the Vlasov-Maxwell system. Numerical robustness, however, is a long-standing challenge for gyrokinetic particle-in-cell (PIC) simulations. For example, there is a well-known cancellation problem in Ampère’s law for electromagnetic gyrokinetic PIC simulations in high-beta (ratio of plasma pressure to magnetic pressure) regimes. At low-beta, simulations are plagued by both restrictive CFL conditions due to the shear Alfvén wave and by finite-grid (aliasing) instabilities. Here, we present an implicit PIC method based on the work of G. Chen and L. Chacón [1], which is free of the Ampère cancellation problem and can overcome CFL restrictions due to the shear Alfvén wave. In addition, finite-grid instabilities are eliminated by enforcing discrete charge conservation [2,3]. To handle the resulting system of nonlinear equations from the implicit discretization, we employ the strategy outlined in [1] to formulate the system in terms of field equation residuals. This formulation requires far less solver memory than an equivalent residual formulated in terms of particle quantities. A preconditioned Picard iteration scheme, accelerated with Anderson mixing, is then applied. The preconditioner is derived from an electron fluid model, which captures the stiff modes of the kinetic system, and includes additional effects to account for the numerics of the PIC method. [1] G. Chen, L. Chacón, Comput. Phys. Commun. 197, 73-87 (2015). [2] B. Sturdevant, S. Ku, L. Chacón, et al., Phys. Plasmas, 28, 072505 (2021). [3] B. Sturdevant and L. Chacón, J. Comput. Phys., 464, 111330 (2022). |
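The outer solver described above is a preconditioned Picard iteration accelerated with Anderson mixing. As a generic illustration of that kind of outer loop (not the authors' implementation), a minimal Anderson-accelerated fixed-point iteration in Python might look as follows; the map g stands in for one preconditioned Picard update of the field-residual system and is a placeholder.

```python
import numpy as np

def anderson_picard(g, x0, m=5, tol=1e-8, max_iter=200):
    # Anderson-accelerated fixed-point (Picard) iteration for x = g(x).
    # g is assumed to already include the preconditioner application.
    xk = np.array(x0, dtype=float)
    gk = g(xk)
    fk = gk - xk                       # fixed-point residual
    G, F = [gk], [fk]                  # histories of map values and residuals
    for k in range(1, max_iter + 1):
        if np.linalg.norm(fk) < tol:
            break
        mk = min(m, len(F) - 1)        # number of stored differences to use
        if mk == 0:
            x_new = gk                 # plain Picard step
        else:
            dF = np.column_stack([F[-j] - F[-j - 1] for j in range(1, mk + 1)])
            dG = np.column_stack([G[-j] - G[-j - 1] for j in range(1, mk + 1)])
            gamma, *_ = np.linalg.lstsq(dF, fk, rcond=None)
            x_new = gk - dG @ gamma    # Anderson-mixed update
        xk = x_new
        gk = g(xk)
        fk = gk - xk
        G.append(gk)
        F.append(fk)
        if len(G) > m + 1:             # keep only the most recent m+1 entries
            G.pop(0)
            F.pop(0)
    return xk

# usage sketch on a toy contraction x = cos(x) (placeholder for the PIC field map)
sol = anderson_picard(np.cos, np.array([0.5]))
```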
08:25 | An implicit particle code with exact energy and charge conservation for dynamically compressing plasmas: Solver scaling and performance PRESENTER: Justin Angus ABSTRACT. A collisional particle code based on implicit energy- and charge-conserving methods is used to study the dynamic compression of plasmas, a phenomenon typical in many areas of plasma physics including inertial confinement fusion and Z-pinch plasmas. The particle-suppressed Jacobian-Free Newton-Krylov (JFNK) method is implemented as a fixed-point iteration scheme for the particle positions. In this implementation, the computational cost of a function evaluation in the linear stage of JFNK is reduced by changing an operation that scales with the number of particles to a grid-based operation using mass matrices. Additionally, the elements of the mass matrices are used to construct an exact kinetic-based preconditioner. The algorithm is implemented in PICNIC, an MPI-based code for particle methods. The preconditioned JFNK method is implemented using the SNES/KSP/PC modules in PETSc. The preconditioner matrices are solved using the Additive Schwarz method (ASM) with local incomplete LU (ILU) on each block. Near ideal wall time improvements are obtained with increasing time step size for values that under-resolve the plasma period and the CFL associated with light waves by factors of 100. Steps taken to achieve this performance are discussed. Finally, future work to implement this method in the GPU-capable WarpX particle code is discussed. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 and was supported by the LLNL-LDRD Program under Project No. 23-ERD-007. LLNL-ABS-859702 |
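For readers less familiar with the PETSc composition named above (SNES/KSP/PC with additive Schwarz and local ILU), a minimal petsc4py configuration sketch is shown below. It sets up only the solver stack, with no PICNIC physics or mass-matrix preconditioner attached; the option names are standard PETSc ones.

```python
from petsc4py import PETSc

# Solver composition only: Newton-Krylov outer solver, GMRES inner solver,
# additive Schwarz preconditioner with ILU on each subdomain block.
snes = PETSc.SNES().create()
snes.setType("newtonls")               # Newton with line search

ksp = snes.getKSP()
ksp.setType("gmres")

pc = ksp.getPC()
pc.setType("asm")                      # Additive Schwarz method (ASM)
PETSc.Options()["sub_pc_type"] = "ilu" # local ILU on each block
snes.setFromOptions()

# In an actual application one would also attach the nonlinear residual via
# snes.setFunction(...) and the (here, kinetic mass-matrix based)
# preconditioner matrix via snes.setJacobian(...).
```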
08:50 | A multiscale hybrid particle-Maxwellian Coulomb-Collision Algorithm for Hybrid Kinetic-Fluid Simulations PRESENTER: Guangye Chen ABSTRACT. In many plasma systems, Coulomb collisions, the primary interactions between charged particles, often exhibit significant time-scale separation. Traditional Monte Carlo (MC) methods [1] have a timestep accuracy constraint ν∆t ≪ 1 to resolve the collision frequency (ν) effectively [2]. This constraint becomes particularly stringent in scenarios involving self-collisions in the presence of high-Z species or inter-species collisions with substantial mass disparities, rendering such simulations extremely expensive or impractical. To overcome these challenges, we propose a multiscale hybrid particle-Maxwellian model for hybrid kinetic-fluid simulations. Specifically, we introduce a collisional algorithm that utilizes Maxwellians (i.e., isotropic Gaussians) [4] for both highly collisional kinetic species and fluid entities. We employ the Lemons method [3] for particle-Maxwellian collisions, enhanced with a more careful treatment of low-relative-speed particles. Additionally, we introduce a new scheme that extends the standard Takizuka-Abe (TA) method for pairing particle-particle binary collisions to accommodate arbitrary particle weights without compromising conservation properties. The method is strictly conservative and may significantly outperform standard MC methods, with orders of magnitude improvement in computational efficiency. We will substantiate the accuracy and performance of the proposed method through several examples of varying complexity, encompassing both relaxation and transport problems. [1] Takizuka, T., Abe, H., Journal of Computational Physics (1977) 25: 205-219. [2] Dimits, A. M., Wang, C., Caflisch, R., Cohen, B. I., & Huang, Y., Journal of Computational Physics (2009) 228: 4881-4892. [3] Lemons, D. S., Winske, D., Daughton, W., & Albright, B., Journal of Computational Physics (2009) 228(5): 1391-1403. [4] Echim, M. M., Lemaire, J., & Lie-Svendsen, Ø., Surveys in Geophysics (2011) 32: 1-70. |
09:15 | Nonlinearly partitioned time integration for multiphysics PRESENTER: Ben Southworth ABSTRACT. In multiscale and multiphysics simulation, a predominant challenge is the accurate coupling of physics of different scales, stiffnesses, and dimensionalities. The underlying problems are usually time dependent, making the time integration scheme a fundamental component of the accuracy. Remarkably, most large-scale multiscale or multiphysics codes use a first-order operator split or (semi-)implicit integration scheme. Such approaches often yield poor accuracy, and can also have poor computational efficiency. There are, however, technical reasons that more advanced and higher-order time integration schemes have not been adopted. One major challenge in realistic multiphysics is the nonlinear coupling of different scales or stiffnesses. Here I present a new class of nonlinearly partitioned Runge-Kutta (NPRK) methods that facilitate high-order integration of arbitrary nonlinear partitions of ODEs. Order conditions for an arbitrary number of partitions are derived via a novel edge-colored rooted-tree analysis. I then apply NPRK methods to novel nonlinearly partitioned formulations of thermal radiative transfer and radiation hydrodynamics, demonstrating orders-of-magnitude improvement in wallclock time and accuracy compared with current standard (semi-)implicit and operator split approaches, respectively. These examples demonstrate the significant overlap between linear and nonlinear preconditioning and the development of nonlinear partitions for use with NPRK methods, as well as the potential computational benefit of using approximations within a nonlinearly partitioned integration scheme rather than fully resolving implicit equations. |
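For readers new to the idea, a "nonlinear partition" replaces the usual additive splitting f = f_stiff + f_nonstiff by a two-argument function (a sketch of the general setup, not the specific schemes of the talk):

\[ y' = f(y) = F(y, y), \qquad F(u, u) = f(u), \]

where the first argument carries the stiff dependence (treated implicitly) and the second the nonstiff dependence (treated explicitly). The simplest member of such a family is the first-order semi-implicit step

\[ y^{n+1} = y^{n} + \Delta t\, F\bigl(y^{n+1}, y^{n}\bigr), \]

and NPRK methods extend this to high order, with stage values entering both arguments and order conditions organized by the edge-colored rooted trees mentioned above.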
09:40 | Solving the 1D-space collisional electron-Boltzmann transport equation PRESENTER: Milinda Fernando ABSTRACT. The collisional Boltzmann kinetic equation for low-temperature plasmas finds important applications in the semiconductor industry, materials science, and many other areas of science and engineering. We present a framework to solve the 1D-space Boltzmann equation using the Chebyshev collocation method in space and a Galerkin approximation in velocity space. The developed solver supports a user-specified multi-term expansion approximation for the electron distribution function with electron-heavy and electron-electron Coulomb interactions. This extends the traditional state-of-the-art two-term approximation (i.e., distribution function approximation using isotropic and single anisotropic terms) used in the field. In this presentation, we provide details on the implicit scheme developed, convergence studies, and cross-verification studies between the developed solver and the particle-in-cell DSMC method. |
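For context, a multi-term expansion of the kind referred to above resolves the angular dependence of the electron distribution function in Legendre polynomials (standard form; the solver's exact conventions may differ):

\[ f(x, v, t) \;\approx\; \sum_{l=0}^{N_l} f_l(x, |v|, t)\, P_l(\cos\theta), \]

where θ is the angle between the velocity and the field (or spatial) axis; the classical two-term approximation keeps only the isotropic f_0 and the single anisotropic f_1 term (N_l = 1).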
08:00 | Stochastic Average Model Methods PRESENTER: Stefan Wild ABSTRACT. We consider iterative methods for finite-sum minimization problems in which the summand functions are computationally expensive, making it undesirable to evaluate all summands on every iteration. We present the idea of stochastic average model methods, which sample component functions according to a probability distribution on component functions that minimizes an upper bound on the variance of the resulting stochastic model. We present promising numerical results on a corresponding extension to the derivative-free model-based trust-region solver POUNDERS. |
08:25 | Low-rank Gradient Flow – a First Order Algorithm for Non-convex Optimization ABSTRACT. We introduce a novel class of first-order methods for unconstrained optimization, called low-rank gradient flows (LRGFs). The idea behind these methods is to construct at every optimization step a low-rank quadratic surrogate for the cost function, followed by an analytic solve for the gradient flow on the surrogate model; the optimization step concludes with a line search on the curve representing the gradient flow. It is shown that the above steps are condensed in a very simple formula for the gradient flow, at a cost per step that is comparable to that of a nonlinear conjugate gradient algorithm. The fact that the line search is conducted along a curve distinguishes LRGF from other first order optimization methods, where the line search is conducted along a search direction, that is, a straight line. This may also help LRGF better navigate the geometry of the cost function, allowing the method to avoid local minima more often than other first-order methods, as shown by numerical experiments. For higher dimensional problems the convergence can be accelerated using a multilevel strategy based on reduced order models. |
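To make the "analytic solve for the gradient flow" concrete, consider a quadratic surrogate about the current iterate x_k with gradient g and a symmetric low-rank Hessian model (a sketch consistent with the description above; the actual LRGF surrogate construction may differ):

\[ m(x) = f(x_k) + g^{T}(x - x_k) + \tfrac{1}{2}(x - x_k)^{T} H (x - x_k), \qquad H = V \Lambda V^{T}. \]

The gradient flow \(\dot{x}(t) = -\nabla m(x(t))\), \(x(0) = x_k\), then has the closed form

\[ x(t) = x_k - V \Lambda^{-1}\bigl(I - e^{-\Lambda t}\bigr) V^{T} g \;-\; t\,\bigl(I - V V^{T}\bigr) g, \]

and the line search is carried out over t ≥ 0 along this curve rather than along a straight line.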
08:50 | A Multilevel Method for Self-Concordant Minimization PRESENTER: Nick Tsipinakis ABSTRACT. The analysis of second-order optimization methods based either on sub-sampling, randomization or sketching has two serious shortcomings compared to the conventional Newton method. The first shortcoming is that the analysis of the iterates has been shown to be scale-invariant only under specific assumptions on the problem structure. The second shortfall is that the fast convergence rates of second-order methods have only been established by making assumptions regarding the input data. In this work, we propose a randomized Newton method for self-concordant functions to address both shortfalls. We propose a Self-concordant Iterative-minimization-Galerkin-based Multilevel Algorithm (SIGMA) and establish its super-linear convergence rate using the theory of self-concordant functions. Our analysis is based on the connections between multigrid optimization methods, and the role of coarse-grained or reduced-order models in the computation of search directions. We take advantage of the insights from the analysis to significantly improve the performance of second-order methods in machine learning applications. We report encouraging initial experiments that suggest SIGMA outperforms other state-of-the-art sub-sampled/sketched Newton methods for both medium and large-scale problems. |
10:25 | Solvers and preconditioners enabling flexible IMEX integration of fusion plasmas in COGENT* PRESENTER: Lee Ricketson ABSTRACT. We present progress on the COGENT code, designed to simulate kinetic edge plasmas in tokamaks in complex geometries spanning both sides of the magnetic separatrix. The code is unique both for its high-order, mapped multiblock meshing that allows alignment with both magnetic field lines and flux surfaces, and for its flexible IMEX time integration framework that permits stepping over various stiff timescales. This talk will focus on the latter, discussing our solver and preconditioning strategies for the linear systems arising from the Jacobian-Free Newton Krylov methods COGENT uses to solve the nonlinear systems arising from IMEX discretization. We will highlight preconditioning case studies, including recent progress on an electromagnetic model and treatment of the electrostatic Alfven wave. We will show verification results demonstrating high-order accuracy with multiple implicit terms, and discuss open problems and opportunities for improving robustness and efficiency of these solvers. *This material is based on work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics Program under Contract DE-AC52-07NA27344 at Lawrence Livermore National Laboratory. |
10:50 | Scalable multiphysics block preconditioning of a fully-implicit VMS visco-resistive MHD formulation with application to Magnetic Confinement Fusion PRESENTER: Jesus Bonilla ABSTRACT. A stabilized finite element (FE) discretization of the single fluid low Mach number compressible resistive magnetohydrodynamics (MHD) equations is presented. The resulting method is intended to model macroscopic plasma instabilities and disruptions in complex 3D Tokamak devices used for exploring magnetic confinement fusion (MCF). The stabilized FE description is developed from a variational multiscale stabilization (VMS) approach to handle different shortcomings of the standard Galerkin FE discretization that arise at the regimes of interest. Namely, the resulting VMS FE operators are used for the stabilization of: strongly convective flow effects, the low Mach number nearly incompressible flow limit, and the saddle point structure of the Lagrange multiplier constraint used for enforcing divergence-free magnetic fields. The multiphysics block structure of the Newton linearized discrete system will be described along with the challenges it induces on scalable iterative solvers. The results of several numerical tests in evaluating the effectiveness of the VMS formulation will be presented. These will include computations of a vertical displacement event (VDE) and a few plasma instabilities in realistic ITER geometries. * This work was partially supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics Program. It has also been partially supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research and Office of Fusion Energy Sciences, Scientific Discovery through Advanced Computing (SciDAC) program. |
11:15 | A block preconditioner for resistive MHD simulations PRESENTER: Peter Ohm ABSTRACT. We discuss a block preconditioner for the visco-resistive low Mach number compressible magnetohydrodynamics model. This model requires the solution of the governing partial differential equations (PDEs) describing conservation of mass, momentum, and thermal energy, along with various reduced forms of Maxwell’s equations for the electromagnetic fields. The resulting systems are characterized by strong nonlinear and nonsymmetric coupling of fluid and electromagnetic phenomena, as well as the significant range of time- and length-scales that these interactions produce. These characteristics make scalable and efficient iterative solution, of the resulting poorly-conditioned discrete systems, extremely difficult. The block preconditioner considered here is based on an approximate operator splitting approach which can isolate certain coupled systems, allowing them to be handled independently. Here we use the splitting to create two independent 2x2 block systems, a magnetics-flow system and a magnetics-constraint system. Targeting ARM architectures, the supercomputer Fugaku in particular, we investigate various preconditioner reuse options to reduce setup costs. We demonstrate this approach for various resistive MHD problems that are relevant to magnetic confinement fusion applications. |
11:40 | An energy minimization approach to H-curl AMG PRESENTER: Raymond Tuminaro ABSTRACT. A new AMG algorithm is devised for H-curl electromagnetics problems discretized via first-order edge elements. The key feature of the algorithm is the construction of interpolation operators that satisfy a commuting relationship originally highlighted by Reitzinger and Schoberl. Specifically, interpolation of coarse nodal quantities followed by application of an appropriate discrete gradient operator should be equivalent to first applying the discrete gradient on the coarse level and then interpolating the resulting edge quantities. It is possible to show that satisfaction of this commuting relationship guarantees that the coarse gradient operator lies within the null space of the coarse level curl-curl matrix, which is obtained by Galerkin projection using the edge interpolation operator. The new algorithm leverages an already-computed nodal interpolation operator to then define an edge interpolant that satisfies commuting and energy minimization properties. Specifically, the method is based on an adaptation of energy minimization AMG ideas. This new algorithm can be viewed as a generalization of earlier ideas by Reitzinger and Schoberl, which were restricted to leveraging a somewhat subpar constant nodal interpolation operator. The new algorithm can essentially start with any standard nodal interpolation operator. Thus, it is possible to leverage traditional AMG schemes to first generate nodal interpolation operators. We show that the new algorithm can produce "ideal geometric" edge interpolants in some limited cases. Numerical results are provided showing the overall efficacy, comparing with traditional Reitzinger and Schoberl schemes and with traditional auxiliary-based AMG methods. |
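In symbols (notation ours), with P_n and P_e the nodal and edge interpolation operators and G_h, G_H the fine- and coarse-level discrete gradients, the commuting relationship described above reads

\[ G_h P_n = P_e G_H , \]

so that the Galerkin coarse curl-curl matrix \(K_H = P_e^{T} K_h P_e\) satisfies \(K_H G_H = P_e^{T} K_h G_h P_n = 0\), i.e., the coarse gradients lie in its null space, exactly as stated above.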
12:05 | A Newton-based Grad-Shafranov solver for tokamak equilibrium PRESENTER: Qi Tang ABSTRACT. Equilibria in magnetic confinement devices result from a force balance between the Lorentz force and the plasma pressure gradient. In an axisymmetric configuration like a tokamak, such an equilibrium is described by an elliptic equation for the poloidal magnetic flux, commonly known as the Grad-Shafranov equation. It is challenging to develop a scalable and accurate Grad-Shafranov solver, since it is a fully nonlinear optimization problem with a free boundary. In this work, we develop a Newton-based Grad-Shafranov solver based on adaptive finite elements and physics-based preconditioning. The free-boundary interaction leads to the evaluation of a domain-dependent nonlinear form, whose contribution to the Jacobian matrix is obtained through shape calculus. The optimization problem aims to minimize the distance between the plasma boundary and specified control points while satisfying two non-trivial constraints, corresponding to the nonlinear finite element discretization of the Grad-Shafranov equation and a constraint on the total plasma current involving a nonlocal coupling term. The linear system is solved by a block factorization and physics-based preconditioning, which relies on a reformulation using a matrix identity to overcome the nonlocal coupling. Finally, AMG is called for the subblock elliptic operators. The unique contributions of this work include the treatment of a global constraint, physics-based preconditioning, nonlocal reformulation, and the implementation of adaptive high-order finite elements. It is found that the resulting Newton solver is robust, successfully addressing the challenging case of finding a Taylor-state equilibrium, where conventional Picard-based solvers fail to converge. |
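For reference, the Grad-Shafranov equation for the poloidal flux ψ(R, Z) reads, in standard form,

\[ \Delta^{*}\psi \equiv R\,\frac{\partial}{\partial R}\!\left(\frac{1}{R}\,\frac{\partial \psi}{\partial R}\right) + \frac{\partial^{2}\psi}{\partial Z^{2}} = -\mu_{0} R^{2}\, p'(\psi) - F(\psi)\, F'(\psi), \]

with p(ψ) the pressure profile and F(ψ) = R B_φ the poloidal current function; the free plasma boundary and the total-plasma-current constraint described above are imposed on top of this elliptic equation.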
10:25 | Improving Multilevel Monte Carlo Scaling with Level-Dependent Redistribution PRESENTER: Hillary Fairbanks ABSTRACT. In this work we develop multilevel methods -- building from multigrid and multilevel Monte Carlo (MLMC) -- to accelerate uncertainty quantification (UQ) for partial differential equations with random coefficients in large-scale settings. In algebraic multigrid, the fine grid mesh is distributed and then locally coarsened to form the multigrid hierarchy. In MLMC, simulations at all levels in this hierarchy are utilized to accelerate the estimation of statistical moments. As these approaches move to large-scale settings, it becomes necessary to exploit multiple forms of parallelism. In MLMC, parallel simulations at the coarse level are commonly accomplished through job scheduling, as there are too many cores over which to distribute the coarse mesh. In this work, we avoid job scheduling for MLMC, and instead develop approaches that can extend the multilevel hierarchy by distributing coarse levels to a reduced number of cores within the same job. To do this, we develop a level-dependent redistribution technique using parallel sparse-matrix operations, allowing us to augment our multilevel hierarchy with additional coarse levels. By doing so, we optimize the coarse level scaling and reduce the overall CPU time of MLMC. In this presentation, we will provide a high-level overview of our PDE-based random coefficient sampling approach used for MLMC, the developed redistribution method, and provide numerical examples to show improvements in coarse grid scaling. |
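For context, the estimator behind MLMC is the telescoping identity

\[ \mathbb{E}[Q_L] \;=\; \mathbb{E}[Q_0] \;+\; \sum_{\ell=1}^{L} \mathbb{E}\bigl[Q_\ell - Q_{\ell-1}\bigr], \]

with each term estimated by an independent Monte Carlo average, so that most samples are taken on cheap coarse levels; the redistribution strategy described above controls how many cores each coarse-level sample occupies.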
10:50 | Faster solution of linear Bayesian smoothing problems using model reduction for ensemble Kalman inversion PRESENTER: Pavlos Stavrinides ABSTRACT. We consider the Bayesian smoothing problem of inferring the initial state of a linear dynamical system, given noisy linear output measurements after the initial time. The ensemble Kalman inversion (EKI) method is an adjoint-free iterative method for estimating the posterior distribution of this inference problem. However, accuracy of EKI depends on having a large ensemble of particles, where each particle requires evolving the dynamical system. When the system is high-dimensional, the cost per particle is high, leading to EKI either having prohibitive cost or high sampling error. In this work, we use Balanced truncation for Bayesian smoothing (BTBS) to accelerate solution of the smoothing problem via EKI. BTBS is a model reduction method that adapts balanced truncation, a system-theoretic projection-based model reduction method, to the Bayesian smoothing problem. Numerical results show that reduced EKI models achieve the same accuracy as the full EKI algorithm with multiple-orders-of-magnitude reduction in computational cost. |
11:15 | Multigrid Monte Carlo revisited: fast sampling of Gaussian Random Fields PRESENTER: Eike Mueller ABSTRACT. The fast simulation of Gaussian random fields plays a pivotal role in many research areas such as Uncertainty Quantification, data science and spatial statistics. In theory, it is a well understood and solved problem, but in practice the efficiency and performance of traditional sampling procedures degenerates quickly when the random field is discretised on a grid with spatial resolution going to zero. Most existing algorithms, such as Cholesky factorisation samplers, do not scale well on large-scale parallel computers. On the other hand, stationary, iterative approaches such as the Gibbs sampler become extremely inefficient at high grid resolution. Already in the late 1980s, Goodman, Sokal and their collaborators wrote a series of papers aimed at accelerating the Gibbs sampler using multigrid ideas. The key observation is the intricate connection of random samplers, such as the Gibbs method, with iterative solvers for linear systems: both methods have very similar convergence properties. Based on this observation, they proposed the so-called multigrid Monte Carlo (MGMC) method - a random analogue of the multigrid method for solving discretised partial differential equations. We revisit the MGMC algorithm and provide rigorous theoretical justifications for the optimal scalability of the method for large scale problems: we show that the cost for generating an independent sample grows linearly with the problem size. Our analysis also includes the important setting of a Gaussian random field conditioned on noisy data via a Bayesian approach; in this case, the precision matrix is a finite-rank perturbation of some differential operator. By using a bespoke variant of the Gibbs sampler with a low-rank update on each level of the multigrid hierarchy we are able to generate Markov chains with a fixed, grid-independent integrated autocorrelation time. Our theoretical analysis is confirmed by numerical experiments for sampling rough Gaussian random fields in two and three dimensions. In the latter case and at high resolution MGMC is able to produce independent samples at a faster rate than a Cholesky sampler based on the CHOLMOD and Eigen libraries. |
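The sampler-solver connection mentioned above is easy to see in code: for a Gaussian with sparse precision matrix A and mean μ defined by Aμ = b, a lexicographic Gibbs sweep is a Gauss-Seidel sweep plus per-component noise. A minimal dense-matrix illustration follows (illustration only, not the MGMC implementation; the toy precision matrix is a placeholder).

```python
import numpy as np

def gibbs_sweep(A, b, x, rng):
    # One lexicographic Gibbs sweep targeting N(mu, A^{-1}) with A mu = b.
    # Identical to a Gauss-Seidel sweep for A x = b, except that each
    # component update is perturbed by N(0, 1/A_ii) noise.
    n = len(b)
    for i in range(n):
        r = b[i] - A[i, :] @ x + A[i, i] * x[i]   # exclude the diagonal term
        x[i] = r / A[i, i] + rng.standard_normal() / np.sqrt(A[i, i])
    return x

# usage sketch: precision matrix of a 1D Gaussian Markov random field
rng = np.random.default_rng(0)
n = 200
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1) + 0.1 * np.eye(n)
x = np.zeros(n)
for _ in range(100):
    x = gibbs_sweep(A, np.zeros(n), x, rng)
```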
11:40 | Accelerating Multi-Level Markov Chain Monte Carlo Methods Using Machine Learning Models PRESENTER: Sohail Reddy ABSTRACT. Uncertainty quantification (UQ) studies for high-risk applications, for example Bayesian inversion by Markov Chain Monte Carlo (MCMC), typically require accurate modeling of the system dynamics, which are often described by a system of partial differential equations (PDEs) and solved numerically. These high-fidelity simulations require fine-scale discretization of the spatial and temporal domain, and hence are often computationally expensive, which renders their use in UQ studies intractable. Hence, machine learning models are often used to reduce the computing cost at a loss of accuracy. However, training/constructing such machine learning models requires a dataset whose size grows with the dimensionality of the problem, thereby limiting their use to coarse-grained discretizations. In such cases, multi-level/multi-fidelity methods, where low-fidelity models augment or are used as filters to the fine-scale, high-fidelity simulations, offer a viable alternative. A natural choice for a hierarchy of varying-fidelity models for multi-level sampling is through the (geometric) multigrid hierarchy used to accelerate the solution of large-scale problems. The coarsest level in the hierarchy can be further extended using a machine learning surrogate, since training such models for small-scale problems is more feasible. Using such a surrogate-augmented multigrid hierarchy, we develop an accelerated, multi-level Markov Chain Monte Carlo (multi-level MCMC) approach with application to large-scale problems in groundwater (Darcy) flow, which is used to estimate the posterior distribution and statistical moments of the permeability field and mass flux (quantity-of-interest). We couple the surrogate model and the PDE model on the coarsest level using a surrogate-transition delayed acceptance MCMC scheme and demonstrate a substantial increase in the coarse-level acceptance rate. We present an efficient, scalable approach for hierarchical, conditioned sampling of Gaussian Random Fields (GRF), which serves as a parameterization of the permeability field. We present results for cases with a varying number of levels in the hierarchy and compare the multi-level approach to the traditional single-level MCMC. We also compare our surrogate-augmented multi-level MCMC approach to the standard PDE-based multi-level approach and demonstrate a significant speedup and increase in acceptance rate over the PDE-based approach while maintaining comparable accuracy. Finally, we prove that our machine learning accelerated approach yields a consistent MCMC algorithm that preserves detailed balance and ergodicity. |
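For orientation, in the standard two-stage (Christen-Fox) form of delayed acceptance that such surrogate-transition schemes build on, a proposal y ~ q(·|x) is first screened with the cheap surrogate posterior \(\tilde{\pi}\); only proposals that pass are promoted to the fine (PDE) posterior π and accepted with probability

\[ \alpha_{2}(x, y) \;=\; \min\!\left(1,\; \frac{\pi(y)\,\tilde{\pi}(x)}{\pi(x)\,\tilde{\pi}(y)}\right), \]

which preserves detailed balance with respect to π while evaluating the expensive model only for pre-screened proposals.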
16:30 | New Developments in the AMLI-cycle Method ABSTRACT. The algebraic multilevel iteration (AMLI) cycle is a variant of the multilevel cycle that uses Chebyshev polynomials to define the coarse-level solver. Other types of polynomials, such as the polynomial of best approximation to 1/x, can also be used in the AMLI-cycle method. The AMLI-cycle method was proposed to remedy the degeneration of the standard V- or W-cycle for difficult problems. However, it is not widely used in practice since it requires accurate estimation of the extreme eigenvalues, and its implementation can be somewhat involved. In this talk, motivated by momentum acceleration, we propose new polynomials that can be used in the AMLI-cycle method without estimating the extreme eigenvalues and prove that the resulting AMLI-cycle can be as good as the AMLI-cycle using the Chebyshev polynomials. In addition, the implementation is more straightforward, which makes it easy to use in practice. Finally, numerical experiments validate our theoretical results. This is joint work with Chunyan Niu and Yunhui He. |
16:55 | Toward an algebraic multigrid method for the indefinite Helmholtz equation PRESENTER: Clément Richefort ABSTRACT. It is well known that multigrid methods are very competitive in solving a wide range of SPD problems. However, achieving such performance for non-SPD matrices remains an open problem. In particular, three main issues may arise when solving a Helmholtz problem. First, some eigenvalues become negative or even complex, requiring the choice of an adapted smoother for capturing them. Second, because the near-kernel space is oscillatory, the geometric smoothness assumption cannot be used to build efficient interpolation rules. Third, the coarse correction is not equivalent to a projection method since the indefinite matrix does not define a norm. We present investigations into designing a method that converges in a constant number of iterations with respect to the wavenumber. The method builds on an ideal reduction-based framework and related theory for SPD matrices to correct an initial least-squares minimization coarse selection operator formed from a set of smoothed random vectors. A new coarse correction is proposed to minimize the residual in an appropriate norm for indefinite problems. |
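For reference, the problem in question is

\[ -\Delta u - k^{2} u = f , \]

whose (undamped) eigenvalues are \(\lambda_j - k^2\) for Laplacian eigenvalues \(\lambda_j\), so modes with \(\lambda_j < k^2\) are negative, and absorbing or impedance boundary conditions push the spectrum into the complex plane, which is behind the smoothing and coarse-correction difficulties listed above.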
17:20 | Adaptive Composite AMG Solvers Utilizing Graph Modularity Coarsening PRESENTER: Austen Nelson ABSTRACT. We provide a theoretical justification for the construction of adaptive composite solvers based on a sequence of AMG (algebraic multigrid) μ-cycle methods that exploit error components that the current solver cannot damp efficiently. Each component is an aggregation-based AMG method in which the aggregates are constructed using the modularity matrix popular in graph community detection. The latter utilizes the given matrix and the error component vector that the current solver cannot handle. The performance of the resulting adaptive composite solver is illustrated on a set of difficult sparse matrices, both from discretized PDEs and of a more general nature. |
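For readers outside the graph-clustering literature, the standard modularity matrix referred to above is

\[ B = A - \frac{d\, d^{T}}{2m}, \]

where A is the graph adjacency matrix, d the vector of vertex degrees, and \(2m = \sum_i d_i\); aggregates are chosen to (approximately) maximize the associated modularity. In the adaptive setting described above, the graph weights are instead built from the given matrix and the error component the current solver cannot handle.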
17:45 | Generalized Optimal AMG Convergence Theory for Nonsymmetric and Indefinite Problems PRESENTER: Ahsan Ali ABSTRACT. Algebraic multigrid (AMG) is known to be an effective solver for many sparse symmetric positive definite (SPD) linear systems. For SPD systems, the convergence theory of AMG is well understood in terms of the $A$-norm, but in a nonsymmetric setting such an energy norm is non-existent. For this reason, convergence of AMG for nonsymmetric systems of equations remains an open area of research. Existing nonsymmetric AMG algorithms for this setting mostly rely on heuristics motivated by SPD convergence theory. In the SPD setting, the classical form of optimal AMG interpolation provides useful insight into the two-grid convergence rate of the method. In this work, we discuss a generalization of the optimal AMG convergence theory targeting nonsymmetric problems by constructing a $2\times 2$ block symmetric indefinite system so that the Petrov–Galerkin AMG process for the nonsymmetric matrix $A$ can be recast as a Galerkin AMG process for a symmetric system. We show that, using this generalization of the optimal interpolation theory, one can obtain the same identity for the convergence rate of the resulting two-grid method as derived in the SPD setting. We also provide supporting numerical results for nonsymmetric advection-diffusion problems. |
18:10 | Semi-structured algebraic multigrid (SSAMG) PRESENTER: Wayne Mitchell ABSTRACT. The semi-structured algebraic multigrid (SSAMG) algorithm implemented in hypre is designed to solve scalar PDEs discretized on meshes with structured regions that may be arbitrarily coupled together. SSAMG exploits the structure to achieve high performance while retaining robustness by employing algebraic multigrid principles to handle the unstructured couplings. This talk will cover the implementation difficulties associated with SSAMG, recent work to improve the interpolation operators used in SSAMG, and the performance of SSAMG on problems coming from a radiation-diffusion application. |
16:30 | Stability Analysis of Two-Level Nonlinear Acceleration Method for the Linear Boltzmann Transport Equation ABSTRACT. Mathematical models of particle transport in a physical system are based on the Boltzmann transport equation (BTE). The BTE is applied for modeling neutron transport in nuclear reactors, high-energy density physics, astrophysics phenomena, plasma dynamics, propagation of photons in planetary atmospheres, and climate and global atmospheric phenomena. It is an integro-differential equation for the particle distribution function in the phase space and time. The differential operator is hyperbolic. The integral term describes redistribution of particles in angle and energy due to their collisions and involves integration over the whole phase space. Iteration methods are used to solve the BTE. Basic iteration techniques can converge very slowly in a large class of important problems. A family of iteration methods for the linear BTE consists of two main groups: (i) acceleration methods for additive corrections and (ii) iterative methods formulated specifically for the solution. Efficient two-level iterative (TLI) methods are defined by a system of the high-order BTE and low-order equations for moments of the transport solution (distribution function). The additive correction TLI methods are stable provided that the spatial discretization of the low-order equations is algebraically consistent with the scheme for the BTE. On the contrary, the TLI methods for the solution can be unconditionally stable when the BTE and low-order equations are discretized independently. The methods with independent discretization are not purely acceleration methods. The solutions of the high-order transport and low-order equations differ by truncation error on finite spatial meshes. The two grid functions of the solution tend to each other with mesh refinement. Independent discretization provides flexibility in solving the BTE and gives a significant advantage in many classes of transport problems. Acceleration TLI methods based on a system of the BTE and low-order equations discretized independently are of significant interest for a variety of applications. This talk presents a Fourier analysis of a nonlinear acceleration TLI method with independent discretization. The iterative method applies elements of the macro-track transport acceleration method and the Quasidiffusion (VEF) method developed for reactor physics and high-energy density physics applications. The system of low-order equations is formulated by a projection operator approach and consists of two parts: (1) the low-order problem that evaluates a measure of discrepancy between a discrete low-order equation and the discretized high-order transport equation, and (2) the low-order problem that generates data for the prolongation operator using the obtained grid function of the discrepancy measure. The projection and prolongation operators are defined in a way to formulate a pure acceleration method. Exact moment closures are defined to close the nonlinear system of the high-order BTE and low-order equations. In this study, one-group particle transport problems in 1D slab geometry are considered. The Fourier analysis is applied to the linearized equations of the iterative method in the vicinity of the solution in an infinite medium. Numerical results demonstrating the performance of the TLI algorithm are presented. |
16:55 | A Data Driven, Eigenvalue Based Adaptive Time Stepping Method for Nonlinear Physics Simulation PRESENTER: Ethan Smith ABSTRACT. We have developed a novel, data-driven approach to adaptive time stepping for nonlinear physics simulations. This iterative method uses a variant of the Dynamic Mode Decomposition that permits a variable time step to compute a best approximate linear operator which evolves the system. This process requires computing at each step the SVD of a snapshot data matrix, whose size is controlled using a user-defined sliding time window and, optionally, through the use of sampling methods to further reduce computation costs. In this way, we can estimate the largest, most restrictive eigenvalue present in the underlying system without knowing anything about the physics involved and define a timestep smaller than the inverse of this eigenvalue. This guarantees that the solver advances in time with respect to the time scale of the most restrictive eigenvalue, while gaining the advantage of taking large steps when the underlying system will tolerate them. We will demonstrate the benefits and tradeoffs of using such a methodology on several nonlinear systems of engineering interest. |
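A minimal sketch of the eigenvalue estimate underlying such a scheme is shown below: standard exact DMD on a sliding snapshot window with a uniform step, whereas the talk's variant permits variable steps; the window, tolerance, and safety factor are placeholders.

```python
import numpy as np

def dmd_max_rate(snapshots, dts):
    # Estimate the fastest continuous-time rate from a snapshot window.
    # snapshots: (n_state, n_snap) array of states; dts: steps between
    # consecutive snapshots (assumed equal here, unlike the variable-step
    # DMD variant described in the abstract).
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    dt = dts[0]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)   # best-fit operator Y ~ A X
    r = np.sum(s > 1e-10 * s[0])                       # numerical rank
    U, s, Vh = U[:, :r], s[:r], Vh[:r, :]
    A_tilde = U.conj().T @ Y @ Vh.conj().T / s         # projected linear operator
    mu = np.linalg.eigvals(A_tilde)                    # discrete-time eigenvalues
    lam = np.log(mu) / dt                              # continuous-time eigenvalues
    return np.max(np.abs(lam))

def next_dt(snapshots, dts, safety=0.5):
    # choose the next step from the stiffest estimated mode
    return safety / dmd_max_rate(snapshots, dts)

# tiny synthetic check: decaying oscillator sampled uniformly
t = np.linspace(0.0, 1.0, 21)
snaps = np.vstack([np.exp(-3.0 * t) * np.cos(10 * t),
                   np.exp(-3.0 * t) * np.sin(10 * t)])
print(next_dt(snaps, np.diff(t)))
```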
17:20 | Dynamic and Deterministic Neutron Transport on GPUs using One-cell Inversion PRESENTER: Joanna Piper Morgan ABSTRACT. Finding deterministic solutions to the transient, energy-dependent, discrete ordinates (Sn) neutron transport equation typically requires iterative schemes to treat the scattering and fission source terms. We explore the performance of the one-cell inversion iteration scheme, also called parallel-block Jacobi, on graphics processing units (GPUs) and make comparisons to a source iteration scheme, also called transport sweeps or Richardson iterations. We consider a higher-order discretization method with simple corner balance (in space) and multiple balance (in time) to add more work to the threads and improve accuracy. We have previously shown that this discretization scheme is second-order accurate. We also use Gauss–Legendre quadrature to discretize the integral component of the Sn transport equation. One-cell inversion uses a single linear-algebra step to compute all angular fluxes (which describe neutron flow in space and time over energy and direction of travel) within a cell, assuming that the angular fluxes incident on the surfaces of the cell are known from a previous iteration. This is done by forming a dense linear system in every cell, which scales with the number of angles in a quadrature set times the number of groups in energy. Within an iteration these systems are solved partially independently of one another, then information about a cell’s nearest neighbor is communicated between iterations. This makes the scheme readily parallelizable over the number of cells, and ideal for use with batched linear algebra solvers, commonly implemented by off-the-shelf vendor libraries. On the other hand, source iteration allows the scattering source to lag while solving angular flux in every ordinate via transport sweeps through the spatial domain. This scheme is readily parallelized over the number of angles in a quadrature set, as the source term is known from the previous iteration, allowing the angular flux in each ordinate to be computed independently. This also makes the problem non-local: reading and writing information across the whole problem space within an iteration. In GPU computations, this can dramatically decrease performance. Also, parallelizing over the dimension with the most degrees of freedom will often lead to the best performance on many-core systems. Neutron transport problems often have many more cells than angles in a quadrature set. These two schemes differ in both the number of iterations required to converge to the same tolerance and the time it takes to compute each iteration. Because there is no communication of information between cells within an iteration, one-cell inversion can require more iterations to converge for some problems. Specifically, as cellular optical thickness decreases, one-cell inversion’s relative performance degrades compared with source iteration. However, each iteration runs faster on a GPU, so the solution may converge in less time. We show in mono-energetic problems that, despite one-cell inversion requiring additional iterations to converge, those iterations are computationally faster, leading to a significant overall speedup over source iteration for quadrature sets of up to 128 angles (where there are many more cells than angles). 
In multi-group problems, we show that one-cell inversion on the GPU is significantly faster than implementations on the CPU, and we are currently comparing performance to source iteration on the GPU. Potential future work includes exploring synthetic acceleration methods, preconditioners (including multi-grid methods for energy-dependent problems), and higher spatial dimension implementations for the one-cell inversion iterative scheme. These developments may further decrease the complexity of GPU-enabled, transient, energy-dependent, deterministic neutron-transport applications while also potentially decreasing compute times. |
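As a purely structural illustration of the batched per-cell solves described above (placeholder names and matrices, not the actual Sn discretization), a parallel-block-Jacobi update can be written with batched dense linear algebra:

```python
import numpy as np

def one_cell_inversion(A_cells, rhs_from_neighbors, psi, n_iter=50):
    # Parallel-block-Jacobi (one-cell inversion) iteration, structure only.
    # A_cells: (n_cells, m, m) dense per-cell systems, m ~ n_angles * n_groups
    #          times the number of local spatial unknowns.
    # rhs_from_neighbors: builds each cell's right-hand side from the previous
    #          iterate's cell-interface angular fluxes (placeholder callable).
    # psi:     (n_cells, m) current iterate of the local unknowns.
    for _ in range(n_iter):
        b = rhs_from_neighbors(psi)                           # uses only the old iterate
        psi = np.linalg.solve(A_cells, b[..., None])[..., 0]  # batched dense solves, one per cell
    return psi

# trivial usage showing the shapes (no physics)
n_cells, m = 8, 4
A = np.tile(2.0 * np.eye(m), (n_cells, 1, 1))
psi = one_cell_inversion(A, lambda p: np.ones((n_cells, m)), np.zeros((n_cells, m)))
```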
17:45 | Decoupled Diffusion Synthetic Acceleration for Deterministic Neutral Particle Transport PRESENTER: Joseph Coale ABSTRACT. Deterministic neutral particle transport typically involves an iterative solution of the discretized Boltzmann transport equation (BTE). The particle energy variable is discretized with a multigroup approximation along with a discrete-ordinates discretization of the particle direction, together with a variety of spatial discretizations. When a problem is dominated by scattering interactions, the so-called diffusion regime, the iterative convergence rate can be unacceptably slow. A well-known and very effective technique that can be used to improve convergence in such situations is the diffusion synthetic acceleration (DSA) method. It works by attenuating the low-frequency error modes that dominate the convergence of unaccelerated fixed-point iterations. Those modes are linear in angle, which, for neutral particle transport, means the diffusion approximation will be effective in attenuating the slowly converging error modes. Of course, DSA increases the per-iteration computational cost. This could be an issue when there is group-to-group scattering present in the neutral-particle interaction data (nuclear cross-sections), in which case the diffusion equations are coupled across energy groups. This means that either a full space-energy matrix must be solved or that the coupling terms are moved to the right-hand side and the diffusion equations must then be solved in their own inner-outer iteration. Either way, in large-scale problems, the cost of solving the fully coupled DSA equations could potentially outweigh the improvement in convergence rate. A new DSA method has been developed to decouple the energy-dependent diffusion equations such that the diffusion equation for each energy group can be solved independently while still maintaining the rapid and theoretically optimal convergence rate of the fully coupled DSA method. The new method, decoupled-DSA (DDSA), makes use of a change of variables based on a diagonalization of the group-to-group scattering operator defined for each material in a problem. It is important to note that the change of variables introduces coupling among energy groups in the diffusion equation boundary conditions, but techniques are introduced that indicate some or all of the coupling can be ignored without a significant adverse impact on acceleration effectiveness. A long-established and well-known concern associated with the DSA method in general is spatial discretization “consistency” between the diffusion equations and the BTE. Lack of consistency will result in a reduction in effectiveness, while consistent discretizations will result in an optimal convergence rate. Solving the diffusion equations in first order (or “P1”) form can allow certain spatial discretizations to be used that are consistent or nearly consistent with higher-order or more robust BTE spatial discretizations. A similar change of variables technique is applied to the first order form of the energy-dependent diffusion equations and associated boundary conditions to obtain the theoretically optimal convergence rate. Fourier analysis and numerical experiments confirm and illustrate the rapid rate of convergence associated with the less costly DDSA method compared to the standard fully coupled DSA approach. Los Alamos report LA-UR-23-33430. This work was supported by the U.S. Department of Energy through the Los Alamos National Laboratory. 
Los Alamos National Laboratory is operated by Triad National Security, LLC, for the National Nuclear Security Administration of U.S. Department of Energy (Contract No. 89233218CNA000001). The content of the information does not necessarily reflect the position or the policy of the federal government, and no official endorsement should be inferred. |