View: session overviewtalk overview

09:00 | GPMR: An Iterative Method for Unsymmetric Partitioned Linear Systems PRESENTER: Alexis Montoison ABSTRACT. We introduce an iterative method named GPMR for solving 2x2 block unsymmetric linear systems. GPMR is based on a new process that simultaneously reduces two rectangular matrices to upper Hessenberg form and is closely related to the block-Arnoldi process. We compare the performance of GPMR with GMRES on linear systems from the SuiteSparse Matrix Collection. In our experiments, GPMR terminates significantly earlier than GMRES on a residual-based stopping condition with an improvement ranging from around 10% up to 50% in terms of number of iterations. |

09:25 | Convergence of GMRES with respect to the right-hand-side vector PRESENTER: Shikhar Shah ABSTRACT. The generalized minimum residual method (GMRES) has a rich body of convergence theory related to the spectrum and normality of the coefficient matrix. In preconditioning, GMRES can be used to perform linear solves with the preconditioning matrices (e.g., the incomplete LU factors). Here, the GMRES polynomial must be fixed between preconditioning iterations. In this case, the convergence of GMRES can have strong dependence on the right-hand-side vector. Specifically, the locations of the roots of the GMRES polynomial with respect to the pseudospectrum of the preconditioning matrix affects the quality of the linear solve. We characterize this dependence and employ it to choose a robust preconditioner. |

09:50 | Linear Asymptotic Convergence Analysis of Anderson Acceleration, with Krylov Formulation in the Linear Case PRESENTER: Hans De Sterck ABSTRACT. We consider Anderson acceleration (AA) with a moving window of size m, and investigate the linear asymptotic convergence behaviour of AA(m) applied to linear and nonlinear fixed-point methods. Anderson acceleration has been shown empirically to be very effective for accelerating nonlinear solvers and optimization methods, with broad applications in computational science, data analysis and machine learning, but there is no theory to explain and quantify the asymptotic convergence improvement that is observed in practice. We first observe numerically that the root-linear convergence factor of sequences generated by AA(m) strongly depends on the initial condition, and that the acceleration coefficients oscillate while the approximation converges to the fixed point. To shed light on this behaviour, we write AA(m) itself as an augmented fixed-point method and establish that the iteration function of the AA(m) fixed-point iteration is not differentiable at the fixed point (but the directional derivatives exist). This allows the root-linear convergence factor to be strongly dependent on the initial guess. We also find that the acceleration coefficient function is not continuous, thus allowing the coefficients to oscillate while the approximation converges. To further investigate AA(m) convergence, we consider the case of accelerating linear fixed-point methods and write AA(m) with the usual recursive initial guess process as a Krylov space method. We obtain polynomial residual update formulas for AA(m) and derive an (m+2)-term recurrence relation for the AA(m) polynomials. A direct consequence is that k steps of AA(m) cannot produce a residual that is smaller in the 2-norm than the residual obtained by GMRES(k). AA(m) with m arbitrary initial guesses is a multi-Krylov space method. The recurrence relations also reveal that AA(m) possesses a memory effect: as a result of the windowing in AA(m), the damping effect provided by fixed-point iterations accumulates multiplicatively (contrary to the case of restarted GMRES(m)), and is combined with polynomial acceleration to lead to effective convergence. We further derive orthogonality relations for the AA(m) residuals, and for AA(1) we find a lower bound for the acceleration coefficient. Results are also obtained for the influence of the initial guess on the convergence speed of AA(1). While these results reveal many interesting findings about asymptotic convergence of AA(m), the main question to quantify the asymptotic convergence acceleration provided by AA(m) remains an open problem. Specifically, we now know numerically that the root-linear convergence factor of sequences generated by AA(m) strongly depends on the initial condition and for many problems the worst-case convergence factor over all initial guesses is much smaller than the convergence factor of the initial fixed-point method, but we don't know how to compute this worst-case convergence factor. |

10:15 | GCR equivalent multiple right-hand side GMRES PRESENTER: Bulat Valiakhmetov ABSTRACT. In this paper we present a GMRES-like iterative Krylov subspace method for solving a system of linear equations with a non-symmetric matrix and multiple right-hand sides. It minimizes residual norms and in exact arithmetic is equivalent to a GCR method. Unlike GCR, the new method requires only one basis storage. Also, as it does not use a blockwise matrix-vector product, it can be applied to sequentially accessed right-hand sides. We provide a detailed algorithm description and its template implementation in C++ with MPI and OpenMP parallelism. Several numerical experiments and a comparison with a GCR implementation are presented as well. |

10:40 | Post-Modern PM-GMRES PRESENTER: Stephen Thomas ABSTRACT. The GMRES algorithm of Saad and Schultz (1986) for nonsymmetric linear systems relies on the Arnoldi expansion for the Krylov basis. The algorithm computes the $QR$ factorization of the matrix $B = [\: \vect{r}_0, AV_m\:]$. Despite an ${\cal O}(\eps)\kappa(B)$ loss of orthogonality, the modified Gram-Schmidt (MGS) formulation was shown to be backward stable in the seminal papers by Paige, et al. (2006) and Paige and Strako\v{s} (2002). Classical Gram-Schmidt (CGS) exhibits an ${\cal O}(\eps)\kappa^2(B)$ loss of orthogonality, whereas DCGS-2 (CGS with delayed reorthogonalization) reduces this to ${\cal O}(\eps)$ in practice (without a formal proof). We present a post-modern (viz not classical) GMRES algorithm based on Ruhe (1983) and the low-synch algorithms of Swirydowicz et al (2020) that achieves ${\cal O}(\eps) \|Av_k \|_2 / h_{k+1,k}$ loss of orthogonality. By projecting the vector $Av_m$ with Gauss-Seidel onto the orthogonal complement of the space spanned by the computed Krylov vectors $\bv_m$ where $\bv_m^T\bv_m = I + L_m + L_m^T$, we can further demonstrate that the loss of orthogonality closely follows ${\cal O}(\eps)$. For a broad class of matrices, unlike MGS-GMRES, significant loss of orthogonality does not occur and the relative residual no longer stagnates for highly non-normal systems. The Krylov vectors remain linearly independent and the smallest singular value of $\tv_m$ is close to one. We also demonstrate that Henrici's departure from normality of the lower triangular matrix $T_m \approx (\:V_m^TV_m\:)^{-1}$ in the Gram-Schmidt projector $P = I - V_mT_mV_m^T$ is an appropriate quantity for detecting the loss of orthogonality. |

09:00 | An adaptive scalable fully implicit algorithm based on stabilized finite element for reduced visco-resistive MHD PRESENTER: Qi Tang ABSTRACT. The magnetohydrodynamics (MHD) equations are continuum models used in the study of a wide range of plasma physics systems, including the evolution of complex plasma dynamics in tokamak disruptions. However, efficient numerical solution methods for MHD are extremely challenging due to disparate time and length scales, strong hyperbolic phenomena, and nonlinearity. Therefore the development of scalable, implicit MHD algorithms and high-resolution adaptive mesh refinement strategies is of considerable importance. In this work, we develop a high-order stabilized finite-element algorithm for the reduced visco-resistive MHD equations based on the MFEM finite element library (mfem.org). The scheme is fully implicit, solved with the Jacobian-free Newton-Krylov (JFNK) method with a physics-based preconditioning strategy. Our preconditioning strategy is a generalization of the physics-based preconditioning methods in [Chac ́on, et al, JCP 2002] to adaptive, stabilized finite elements. Algebraic multigrid methods are used to invert sub-block operators to achieve scalability. A parallel adaptive mesh refinement scheme with dynamic load-balancing is implemented to efficiently resolve the multi-scale spatial features of the system. Our implementation uses the MFEM framework, which provides arbitrary-order polynomials and flexible adaptive conforming and non- conforming meshes capabilities. Results demonstrate the accuracy, efficiency, and scalability of the implicit scheme in the presence of large scale disparity. The potential of the AMR approach is demonstrated on an island coalescence problem in the high Lundquist-number regime (>=1e7) with the successful resolution of plasmoid instabilities and thin current sheets. |

09:25 | Augmented Lagrangian preconditioners for incompressible resistive magnetohydrodynamics PRESENTER: Patrick Farrell ABSTRACT. The equations of magnetohydrodynamics are generally known to be difficult to solve numerically. They are highly nonlinear and exhibit strong coupling between the electromagnetic and hydrodynamic variables, especially for high Reynolds and coupling numbers. In this work, we present a scalable augmented Lagrangian preconditioner for a finite element discretization of the single-fluid incompressible viscoresistive MHD equations. For stationary problems, our solver achieves robust performance with respect to the Reynolds and coupling numbers in two dimensions and good results in three dimensions. We extend our method to fully implicit methods for time-dependent problems which we solve robustly in both two and three dimensions. Our approach relies on specialized parameter-robust multigrid methods for the hydrodynamic and electromagnetic blocks. The scheme ensures exactly divergence-free approximations of both the velocity and the magnetic field up to solver tolerances. We confirm the robustness of our solver by numerical experiments in which we consider fluid and magnetic Reynolds numbers and coupling numbers up to 10,000 for stationary problems and up to 100,000 for transient problems in two and three dimensions. |

09:50 | Block Preconditioning and a Monolithic AMG Method for Magnetic Confinement Fusion Relevant Resistive MHD Simulations PRESENTER: Peter Ohm ABSTRACT. The mathematical basis for the continuum fluid modeling of resistive magnetohydrodynamic of multifluid plasma physics systems is the solution of the governing partial differential equations (PDEs) describing conservation of mass, momentum, and thermal energy, along with various reduced forms of Maxwell’s equations for the electromagnetic fields. The resulting systems are characterized by strong nonlinear and nonsymmetric coupling of fluid and electromagnetic phenomena, as well as the significant range of time- and length-scales that these interactions produce. These characteristics make scalable and efficient iterative solution, of the resulting poorly-conditioned discrete systems, extremely difficult. In this talk we consider the use of both block preconditioners and an algebraic monolithic multigrid approach for solving the coupled physics block systems. Monolithic multigrid methods can also benefit from this block linear structure. In this context we present a framework for the construction of an algebraic monolithic multigrid utilizing the natural block linear structure arising from coupled multiphysics problems. Multigrid components are constructed first on matrix subblocks corresponding to individual physics. The resulting AMG sub-components are then composed together to define a monolithic AMG preconditioner. We demonstrate this approach through an implementation in MueLu for various Resistive MHD problems that that are relevant to magnetic confinement fusion applications and compare the performance with alternative preconditioning methods. Time permitting, we will discuss a block preconditioner based on an approximate operator splitting that factors the system into block sub-systems, with approximate Schur complements that explicitly encode critical coupling into the sub-systems. In the case of MHD this is the Alfven wave physics. Early results indicate a better scaling of iterations for the linear solves with longer time steps compared to the Alfven time-scale. |

10:15 | Beyond radiation-hydrodynamics: Coupling kinetic plasma and radiation transport PRESENTER: Hans Hammer ABSTRACT. Radiation-hydrodynamics (rad-hydro) is insufficient to describe accurately weakly collisional environments in high-energy-density physics (HEDP) experiments. These environments are common, for instance, in the hohlraum in inertial confinement fusion (ICF) indirectly-driven experiments, which is a key component of the energy delivery system to the capsule that controls, for instance, implosion symmetry. For high-fidelity, a kinetic description of the plasma is necessary, which additionally must be coupled with a (in principle, also kinetic) radiation model. We consider a hybrid plasma description [1], comprised of kinetic ions and fluid electrons, to be coupled with thermal radiative transfer fully implicitly and nonlinearly. Tight coupling is orchestrated via a low-order (LO) moment description for all involved physics: ions, electrons, and radiation. This LO system (which is largely equivalent to a multifluid rad-hydro system) is informed by the high-order (HO) kinetic descriptions via so-called consistency terms, which close the moment equations and enforce consistency with the fully kinetic solution. This HOLO approach is algorithmically efficient [2] because the stiff nonlinear coupling is addressed in the (computationally cheaper) LO solver, while allowing the computationally expensive HO solvers to decouple from one another. We present here results from the coupling of the hybrid-kinetic plasma model with a gray radiation-diffusion model (i.e., neglecting kinetic radiation physics) [3]. Our implementation features a novel mesh-motion strategy for the coupled plasma-radiation system [4] that is able to effectively adapt to solution features dynamically in time. We demonstrate that our implementation is able to deal with strong radiative shocks (Mach 45), delivering excellent agreement with a self-similar rad-hydro shock test problem. We will characterize the efficiency of the solver, as well as convergence of the solution with timestep and mesh refinement. [1] W. T. Taitano et al., Comput. Phys. Comm., 263, 107861 (2021) [2] L. Chacón et al. J. Comput. Phys., 330, 21-25 (2017) [3] Hans Hammer et al., Proceedings of the ANS Mathematics & Computation (M&C) 2021, Raleigh, North Carolina, October 3–7, 2021, pp. 1153-1162 [4] Hans Hammer et al., Transactions of the American Nuclear Society. ANS Winter Meeting. Washington, D.C.: ANS, Nov. 17–21, 2019. |

10:40 | Monolithic Multigrid Methods for Higher-Order Discretizations of Time-Dependent Maxwell’s Equations PRESENTER: Razan Abu-Labdeh ABSTRACT. Maxwell’s equations arise in many challenging applications in computational science. Several paths exist to achieve high-order discretizations, including the use of specialized finite-element basis (such as Raviart-Thomas, Brezzi-Douglas-Marini, and both first- and second-kind Nédélec elements) and high-order implicit temporal integration schemes. While significant effort has been invested in the past 25 years in developing efficient multigrid methods for various spatial discretizations, much of the work in the time-dependent case has been focused on multi-step (or diagonally implicit) temporal discretizations. In this talk, we present recent work on extending monolithic multigrid methods to fully implicit Runge-Kutta temporal discretizations. Particular focus is paid to extending the common overlapping Schwarz relaxation strategy to these discretizations, as well as their use on non-nested multigrid hierarchies, as needed to accurately model complex geometries. |

09:00 | From challenges to edge-preserving methods for large-scale dynamic inverse problems PRESENTER: Mirjeta Pasha ABSTRACT. Inverse problems are ubiquitous in many scientific fields such as engineering, biology, medical imaging, atmospheric science, and geophysics. Three emerging challenges on obtaining meaningful solutions to large-scale and data-intensive inverse problems are ill-posedness of the problem, large dimensionality of the parameters, and the complexity of the model constraints. In this talk we discuss efficient methods for computing solutions to dynamic inverse problems, where both the quantities of interest and the forward operator (measurement process) may change at different time instances. We consider large-scale ill-posed problems that are made more challenging by their dynamic nature and, possibly, by the limited amount of available data per measurement step. To remedy these difficulties, we apply efficient regularization methods that enforce simultaneous regularization in space and time (such as edge enhancement at each time instant and proximity at consecutive time instants) and achieve this with low computational cost and enhanced accuracy. Numerical examples from a wide range of applications, such as limited-angle computerized tomography (CT), space-time image deblurring, and photoacoustic tomography (PAT), will be used to illustrate the effectiveness of the described approaches. |

09:25 | An ℓp Variable Projection Method for Large-Scale Separable Nonlinear Inverse Problems PRESENTER: Malena Espanol ABSTRACT. Variable projection methods are among the classical and efficient methods to solve separable nonlinear least squares problems such as blind deconvolution, system identification, and machine learning. In this talk, we present a modified variable projection method for large-scale separable nonlinear inverse problems, that promotes edge-preserving and sparsity properties on the desired solution, and enhances the convergence of the parameters that define the forward problem. Specifically, we adopt a majorization minimization method that relies on constructing quadratic tangent majorants to approximate an ℓp regularization term, by a sequence of ℓ2 problems that can be solved by the aid of generalized Krylov subspace methods at a relatively low cost compared to the original unprojected problem. In addition, more potential generalized regularizers including total variation (TV), framelet, and wavelet operators can be used, and the regularization parameter can be defined automatically at each iteration with the aid of generalized cross validation. Numerical examples on large-scale two-dimensional imaging problems arising from blind deconvolution are used to highlight the performance of the proposed method in both quality of the reconstructed image as well as the reconstructed forward operator. |

09:50 | Bayesian Level Set Approach for Inverse Problems with Piecewise Constant Reconstructions PRESENTER: William Reese ABSTRACT. There are several challenges associated with inverse problems in which we seek to reconstruct a piecewise constant field, and which we model using multiple level sets. Adopting a Bayesian viewpoint, we impose prior distributions on both the level set functions that determine the piecewise constant regions as well as the parameters that determine their magnitudes. We develop a Gauss-Newton approach with a backtracking line search to efficiently compute the maximum a priori (MAP) estimate as a solution to the inverse problem. We use the Gauss-Newton Laplace approximation to construct a Gaussian approximation of the posterior distribution and use preconditioned Krylov subspace methods to sample from the resulting approximation. To visualize the uncertainty associated with the parameter reconstructions we compute the approximate posterior variance using a matrix-free Monte Carlo diagonal estimator, which we develop in this paper. We will demonstrate the benefits of our approach and solvers on synthetic test problems (photoacoustic and hydraulic tomography, respectively a linear and nonlinear inverse problem) as well as an application to X-ray imaging with real data. |

10:15 | Hybrid Projection Methods for Solution Decomposition in Large-scale Bayesian Inverse Problems PRESENTER: Jiahua Jiang ABSTRACT. In this work, we develop hybrid projection methods for computing solutions to large-scale inverse problems, where the solution represents a sum of different stochastic components. Such scenarios arise in many imaging applications (e.g., anomaly detection in atmospheric emissions tomography) where the reconstructed image can be represented as a combination of two or more images and each image contains different smoothness or stochastic properties. In an inversion or inverse modeling framework, these assumptions correspond to different regularization terms for each image in the sum. Although various prior assumptions can be included in our framework, we focus on the scenario where the solution is a sum of a sparse image and a smooth image; thus, we require $\ell_1$ and $\ell_2$ regularization for separate components, respectively. For computing solution estimates, we develop hybrid projection methods for solution decomposition that are based on a combined flexible and generalized Golub-Kahan process. This approach integrates techniques from the generalized Golub-Kahan bidiagonalization and the flexible Krylov methods. The benefits of the proposed methods are that the decomposition of the solution can be done iteratively, and the regularization terms and regularization parameters are adaptively chosen at each iteration. Numerical examples from image processing demonstrate the potential for these methods to be used for anomaly detection. |

10:40 | Minimisation of || b-Ax||_max ||b- Ax||_1 with Krylov-Simplex and column generation PRESENTER: Wim Vanroose ABSTRACT. We minimise the residual, r=b−Ax, over a subspace in ||r||_\max, i.e. maximum of the absolute residuals, and the ||r||_1-norm, i.e least absolute residuals. Optimised over a Krylov subspace this leads to a small linear programming problem that can be solved by a specialised simplex algorithm that finds the optimal linear combination of Krylov basis vectors to approximate the solution. The resulting simplex algorithm requires the solution of a series of small dense linear systems that only differ by rank-one updates. We compare the method with a column generation approach where the basis vectors of the subspace are selected based on the steepest descent direction that is orthogonal to the current subpace. We illustrate the methods with applications from inverse problems. |

11:25 | On Iterative Sparse Triangular Solves ABSTRACT. Sparse triangular solves are a key kernel in numerical linear algebra, especially for preconditioning with incomplete factorizations. The traditional sequential algorithm is not suitable for parallel computing. Recently, several authors have suggested iterative solution methods, which are more suitable for GPUs and similar architectures. Unfortunately, these methods are not very robust and may not converge. We discuss blocking and scaling strategies that improve robustness. We further discuss reordering schemes that are useful when solving a sequence of systems. Our work applies to both the synchronous and the asynchronous case. |

11:50 | Asynchronous Jacobi Methods with Resilience to Data Corruption and Robustness to Data Delay PRESENTER: Christopher Vogl ABSTRACT. Over the past decade, the proliferation of smart devices ranging from residential smart thermostats to industrial smart power grid meters has motivated research into migrating scientific computing from high-performance computing (HPC) and cloud computing (CC) to these devices. Computing on these smart devices, also referred to as edge devices, allows for computation on data in-situ, avoiding the need to aggregate that data to HPC and/or CC facilities for processing. One bottleneck of this migration is that the same fault tolerance approaches that provide reliable results on HPC and CC systems do not necessarily apply to edge computing environments. As an example, consider checkpoint and restart functionality is a standard approach for mitigating the disruption of an HPC or CC node failing. Such an approach is not feasible for many edge computing environments where data storage is more limited and synchronization more costly than with the massive file systems and high-speed interconnects in HPC and CC systems. Thus, new fault tolerance approaches are needed to reliably perform standard scientific computing tasks on the edge. This work leverages recently developed approaches in the literature to formulate asynchronous Jacobi (ASJ) methods that can solve linear systems in the presence of data corruption or communication delay. Motivated by the ideas of the alternating direction method of multipliers to enable rejection of corrupt information, this work derives a rejection criterion for the ASJ methods from convergence theory. The resulting ASJ method is shown to restore convergence that is lost in the original ASJ method when data corruption is introduced. Motivated by work on average consensus reformulated to use evolution information, this work derives a reformulated ASJ method to incorporate the push-sum approach. The resulting ASJ method is shown to restore the convergence rate that is degraded in the original ASJ method when data delay is introduced. The results of this work serve as a proof of concept for similar resilience and robustness improvements in more powerful solvers, such as the conjugate gradient method. Funded by LLNL LDRD projects 21-FS-007 and 22-ERD-045. Prepared by LLNL under Contract DE-AC52-07NA27344. LLNL-ABS-831426. |

12:15 | Asynchronous Chebyshev Methods PRESENTER: Jordi Wolfson Pou ABSTRACT. Iterative methods typically contain many synchronization points which can scale poorly on massively parallel computers. Additionally, these synchronization costs can be further amplified when some cores are delayed, e.g., when faults occur or load imbalances are present. While asynchronous methods have recently gained interest, asynchronous versions of certain state-of-the-art iterative solvers have yet to be developed. We present the first asynchronous Chebyshev method which uses Chebyshev to accelerate an asynchronous version of the BPX multigrid method. Our initial experiments show that, as the problem scales up, using the Jacobi preconditioner within asynchronous Chebyshev can result in divergence. This indicates the need for a preconditioner that provides grid-size independent convergence which is why we use a multigrid preconditioner. We present experimental results from an OpenMP implementation of asynchronous BPX-Chebyshev. |

12:40 | The Algorithmic Development of a Fully Asynchronous Conjugate Gradient Method PRESENTER: Zachary Atkins ABSTRACT. Decentralized computing environments (DCE), or environments, such as edge computing, which utilize spatially distributed, non-hierarchical, and potentially heterogeneous computational units, present additional challenges such as communication delays and data locality which require asynchronous and resilient iterative methods to be practical. Such algorithms must be capable of progressing toward a solution even if one or more computational units is experiencing slowdowns, either due to poor communication connections or under-performing hardware. Previous work has shown that these challenges can be overcome for stationary iterative methods, such as asynchronous Jacobi (ASJ). In high-performance computing (HPC) settings, Krylov subspace iterative solvers, such as the conjugate gradient method (CG), are preferred for solving linear systems with symmetric positive-definite matrices due to their strong convergence guarantees. Despite requiring synchronous communication across all processors at every iteration, CG performs exceptionally well in shared memory and HPC environments. In DCE, the communication costs of synchronization may be considerably higher, as delays from even one computational unit prevent any other units from continuing. Further, the connection speeds between devices in edge computing will be considerable slower than the high-speed interconnects found in the HPC setting. We have developed an asynchronous CG (ACG) algorithm which removes these synchronization points by utilizing the partial direction vectors received at each iteration to form the next, mutually orthogonal search direction. Initial numerical results demonstrate the number of iterations required for convergence scales with the square root of the condition number of the matrix, as in the traditional CG algorithm. Further, we will present numerical results which compare ACG to CG and ASJ in both reliable computing environments and with injected delays and corruption to demonstrate the algorithmic performance of each for different computing environments. Prepared by LLNL under Contract DE-AC52-07NA27344. Funded by LLNL LDRD project 22-ERD-045. LLNL-ABS-831528 |

13:05 | Dynamic Non-Uniform Randomization in Asynchronous Linear Solvers PRESENTER: Evan Coleman ABSTRACT. One approach to improving the performance of stationary linear solvers is to utilize asynchronous communication between the processors. In order to establish bounds on convergence rate, randomized various of asynchronous linear solvers have been studied. This approach has been examined previously for the case where the random selection is done uniformly [1, 5]. A non-uniform random selection has been used for the case of synchronous algorithms [2, 3], however in both studies the distribution remains fixed and does not dynamically change. The idea behind the solvers considered here is for each processor to select the next component to update randomly, using a distribution that more heavily weights selection of components that are somehow more important to the solution. The main contribution is to analyze the potential performance benefit for using a non-uniform distribution, to evaluate the viability of changing the update order dynamically, and to investigate methods for ranking the contribution of each component. This updating procedure is motivated in part by the Southwell iteration, which selects the component with the largest contribution to the residual at each iteration and which can converge in fewer iterations than traditional relaxation schemes; previous work has also shown that Southwell type iterations can converge faster than uniform random selection [4]. There is a balance between the extra computational time required to intelligently select components, and the savings in total iterations. Updating all the residuals every iteration likely introduces too much computational overhead to be of practical use, and techniques are explored for lessening this computational burden. The thrust of the new algorithms is to focus on techniques that achieve this dynamic focus on components that contribute more to the residual but do so in an efficient manner. [1] Haim Avron, Alex Druinsky, and Anshul Gupta. Revisiting asynchronous linear solvers: Provable convergence rate through randomization. Journal of the ACM (JACM), 62(6):51, 2015. [2] Michael Griebel and Peter Oswald. Greedy and randomized versions of the multiplicative schwarz method. Linear Algebra and its Applications, 437(7):1596–1610, 2012. [3] Dennis Leventhal and Adrian S Lewis. Randomized methods for linear constraints: convergence rates and conditioning. Mathematics of Operations Research, 35(3):641–654, 2010. [4] Julie Nutini, Mark Schmidt, Issam Laradji, Michael Friedlander, and Hoyt Koepke. Coordinate descent converges faster with the gauss-southwell rule than random selection. In International Conference on Machine Learning, pages 1632–1641, 2015. [5] John C Strikwerda. A probabilistic analysis of asynchronous iteration. Linear algebra and its applications, 349(1-3):125–154, 2002. |

11:25 | An implicit, conservative, asymptotic-preserving electrostatic particle-in-cell algorithm for arbitrarily magnetized plasmas PRESENTER: Guangye Chen ABSTRACT. We introduce a new electrostatic particle-in-cell algorithm capable of using large timesteps compared to particle gyro-period under a (to begin, uniform) large external magnetic field. The algorithm extends earlier electrostatic fully implicit PIC implementations [1] with a new asymptotic-preserving (AP) particle-push scheme [2] that allows timesteps much larger than particle gyro-periods. In the large-timestep limit, the AP integrator preserves all the averaged particle drifts, while recovering the standard CN scheme for small timesteps, while recovering particle full orbits with small timesteps. The scheme allows for a seamless, efficient treatment of particles in coexisting magnetized and unmagnetized regions, conserves energy and charge exactly, and does not spoil implicit solver performance. Key to the approach is the generalization of the particle sub-stepping approach introduced in Ref. [1] to allow for orbit segments much larger than cell sizes without spoiling conservation properties. The uniform-magnetic-field assumption allows us to use the standard CN update in Ref. [1] without modification, which is a necessary preliminary step to demonstrate the viability of the approach for more general magnetic field topologies (which will require the implementation in Ref. [2], currently underway). We demonstrate by numerical experiment with several strongly magnetized problems (e.g., diocotron instability, modified two-stream instability, drift instability, etc.) that two orders of magnitude wall-clock-time speedups are possible vs. the standard fully implicit electrostatic PIC algorithm [1] without sacrificing solution quality and while preserving strict charge and energy conservation. We will also discuss possible extensions to the electromagnetic context. [1] Chen, Guangye, Luis Chacón, and Daniel C. Barnes. "An energy-and charge-conserving, implicit, electrostatic particle-in-cell algorithm." Journal of Computational Physics 230.18 (2011): 7018-7036. [2] Ricketson, Lee F., and Luis Chacón. "An energy-conserving and asymptotic-preserving charged-particle orbit implicit time integrator for arbitrary electromagnetic fields." Journal of Computational Physics 418 (2020): 109639. |

11:50 | Fluid Preconditioning for a Fully Implicit Electromagnetic Gyrokinetic Particle-in-Cell Method PRESENTER: Benjamin Sturdevant ABSTRACT. A fully implicit particle-in-cell (PIC) method, based on the work in [1], has been implemented in the full-volume fusion plasma code XGC [2] and was recently demonstrated to provide numerically stable and accurate solutions to the electromagnetic gyrokinetic equations in tokamak geometry [3]. In this talk, we will present a preconditioned Picard iteration scheme, accelerated with Anderson mixing, for handling the resulting system of nonlinear equations at each timestep. The preconditioner is designed from a simplified electron fluid model, which captures the stiff modes of the kinetic system, and accounts for additional numerical effects originating from the PIC method. In addition, we will present our recent work in designing and implementing a discrete formulation for eliminating finite-grid instabilities, which can be problematic in certain physical parameter regimes [4]. The new discrete formulation has promise for enabling the use of a Schur complement form of the fluid preconditioner equations, which can be solved using a semi-coarsening multigrid approach. [1] G. Chen and L. Chacón, Comput. Phys. Comm., 197, 73-87, 2015. [2] S. Ku, C.S. Chang, and P.H. Diamond, Nucl. Fusion, 49, 115021, 2009. [3] B. Sturdevant, S. Ku, L. Chacón, et al. Phys. Plasmas, 28, 072505, 2021. [4] B. Sturdevant and L. Chacón. submitted to J. Comput. Phys., 2021. |

12:15 | Block Preconditioning of a Semi-Implicit Gyrokinetic Model of Fusion Plasmas PRESENTER: Lee Ricketson ABSTRACT. We describe the block preconditioning approach used in our COGENT code, whose primary application is the simulation of the edge plasma region of tokamak fusion reactors. The underlying model is a continuum gyrokinetic system describing the evolution of plasma species distribution functions in 4D axisymmetric or 5D phase space, where the configuration space geometry includes open and closed field line regions spanning the magnetic separatrix and X-point. COGENT combines a high-order, finite-volume, mapped-multiblock spatial discretization with an additive Runge-Kutta (ARK) time integrator to advance the distribution functions, fluid and neutral species moments, and an electrostatic potential, all with a variety of collision operators. Central to the ARK approach is the identification of fast time-scales to be included in the implicit component of the fully coupled system, which is advanced consistently with the slower explicit terms to a specified temporal accuracy. Updates of the implicit ARK stages require the solution of nonlinear systems, for which COGENT employs a Newton-Krylov algorithm. Block preconditioners are employed to accelerate the Krylov iteration, where the blocks approximate the operators responsible for the various fast time scales being treated implicitly. In addition to disjoint operators resulting in diagonal preconditioner blocks, the framework anticipates the possible overlap of implicit terms, coordinating the construction of linear systems and their solution using multigrid solvers provided by the Hypre library. We present several examples of the application of this approach and the preconditioning strategies required for each, including the simulation of kinetic microturbulence in diverted tokamak geometries. *This material is based on work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics Program under Contract DE-AC52-07NA27344 at Lawrence Livermore National Laboratory. |

12:40 | Enabling optimizations for reduced kinetic spectral models PRESENTER: Oleksandr Koshkarov ABSTRACT. Efficient coupling of the microscopic physics into the macroscopic system-scale dynamics (called "fluid-kinetic" coupling) is probably the most important and unresolved problem of computational plasma physics. It impacts most of plasma physics areas including space physics and fusion systems. Majority of conventional simulation tools capable of describing large scale dynamics are usually limited to simplified fluid/magnetohydrodynamics description of the plasma, because of the large spatial and temporal scale separation typical of plasmas. Yet, fluid models lack the microscopic physics, which is known to be important in many applications (e.g., reconnection, shock physics, etc.) A way forward is to build models that combine kinetic and fluid description in one consistent framework. The development of such methods can bridge the scale gap to successfully handle coupling of large-scale dynamics and microscopic processes. In this presentation, we describe a novel simulation method, where the kinetic equation is solved using a spectral expansion of the plasma distribution function. The low-order terms in the expansion capture the large-scale dynamics of the system, while higher-order terms add microscopic physics incrementally, similar to classical fluid-moment expansion. Such a method is ideally suited for problems involving fluid-kinetic coupling, since the number of expansion terms could be adapted in space and time. Furthermore, the spectral basis itself adaptively changes in space and time, adjusting to plasma mean flow and temperature, thus making the representation of the particle distribution function very efficient. We show that our reduced kinetic model with just a ~(4-6)^3 velocity-space moments agrees well with results from fully-kinetic simulations on some examples. In addition to describing the method, we will present several examples illustrating its application to various problems, including solar wind-magnetosphere interaction. |

13:05 | Newton-Krylov-Multigrid approaches for mixed formulations of smectic A liquid crystals PRESENTER: Abdalaziz Hamdan ABSTRACT. In recent years, energy-minimization finite-element methods have been proposed for the computational modelling of equilibrium states of several types of liquid crystals (LCs). Here, we consider a four-field formulation for models of smectic A liquid crystals, based on the free-energy functionals proposed by Pevnyi, Selinger, and Sluckin, and by Xia et al. The Euler-Lagrange equations for these models include fourth-order terms acting on the smectic order parameter (or density variation of the LC) and second-order terms acting on the Q-tensor or director field. While $H^2$ conforming or $C^0$ interior penalty methods can be used to discretize the fourth-order terms, we investigate introducing the gradient of the smectic order parameter as an explicit variable and constraining its value using a Lagrange multiplier. In this talk, we focus on the construction of solvers for the nonlinear systems that result from the discretization of these models. We consider a Newton-Krylov-Multigrid approach, using Newton's method to linearize the systems, and developing monolithic geometric multigrid preconditioners for the resulting saddle-point systems. We demonstrate this to be an effective solver strategy when using a “star” relaxation scheme for the coupled system. |

11:25 | Golub-Kahan bidiagonalization with hybrid projection for streaming problems ABSTRACT. We use the Golub-Kahan bidiagonalization algorithm with hybrid projection to solve inverse problems with regularization with only a limited part of the matrix available at a time and using only limited memory. The algorithm allows accessing the matrix one block at a time, this can be a block of rows or a block of columns. The possibility of solving a linear system with only a subset of columns of the matrix available at a time also allows a range of interesting solution strategies for general matrices. |

11:50 | Fast algorithms for initial value control problems in diffeomorphic registration PRESENTER: Andreas Mang ABSTRACT. We propose fast numerical methods for optimal control problems with initial value control. Our contributions are the design of efficient second-order optimization methods and their analysis. The inverse problem we consider is referred to as diffeomorphic image registration. Here, we seek a diffeomorphism $y$ that establishes a meaningful spatial correspondence between two views (images) of the same scene. In our formulation the diffeomorphism $y$ is parametrized by its velocity $v$. The control is the initial momentum $u_0$ of an integro-differential equation---the Euler--Poincar\'e equation associated with the diffeomorphism group. This equation together with a transport equation for the image intensities represent the state equations of our problem. We present a Newton--Krylov method for numerical optimization. The bottleneck of our method is the computation of the search direction, i.e., the solution of the reduced space Hessian system. We present unconditionally stable numerical methods for evaluating the forward and adjoint operators. We study the spectral properties of the Hessian operator and introduce different strategies for preconditioning the reduced space Hessian. More precisely, we explore the performance of low-rank approximations that exploit randomized linear algebra ideas, multi-level/multi-grid approaches, as well as their combination. We showcase results for applications in medical imaging sciences. |

12:15 | A new robust collective multigrid SQP-type approach for risk-adverse optimal control problems PRESENTER: Tommaso Vanzan ABSTRACT. In this contribution, a class of risk-adverse Optimal Control Problems (OCPs) under uncertainty is considered, involving the conditional value at risk measure of level $\beta$, that is the expectation of a quantity of interest conditioned above the $\beta$ quantile. Such problem can be approximated by a nonlinear optimization problem over the classical state, adjoint and control variables plus a further scalar unknown. Classical full-space Sequential Quadratic Programming approaches (SQP) are not robust since they may lead to singular Hessian matrices along the iterations. To overcome this difficulty, a new SQP procedure is proposed, where a first SQP is performed on the state, control and adjoint variables, and the scalar quantity is sequentially updated. The SQP step requires the solution of a large saddle-point systems for which few preconditioners have been studied recently. In this talk, the Collective MultiGrid (CMG) algorithm is extended to OCP under uncertainty. Each step of CMG involves the solution a reduced system of dimension N times N, where N is the number of collocation points in the discretization of the continuous expectation. It is shown that this reduced system can be solved with linear complexity in $N$. Numerical experiments support the proposed methodology, showing robustness of the global nonlinear procedure and interior linear solver with respect to the several parameters of the problem. |

12:40 | History Matching Relative Permeability Curves with Bayesian Optimization PRESENTER: Steven Samoil ABSTRACT. In the field of reservoir engineering one of the primary time-consuming steps is process of history matching to update the mathematical models describing fluid flow in the subsurface reservoir to match the existing historical production data more closely. This task usually involves either data assimilation or the repeated tuning of parameters through numerous simulations (Oliver 2011). Bayesian Optimization has not yet seen much use in the petroleum industry and is a useful approach to optimization that allows finding an optimal set of parameters for cost functions that are especially expensive (Brochu 2010). This ability to optimize cost functions that may take extensive run time may be very well suited to the task of history matching reservoir models. In this study a framework has been developed to assist with the evaluation of Bayesian Optimization for the task of history matching relative permeability curves. This framework was designed to enable testing on both local and remote machines with a range of capabilities from desktop workstations to large scale clusters. Simulations are conducted utilizing the Reservoir Simulation Group parallel black oil simulator (Wang 2015). This simulator is designed to run on distributed memory computers and utilizes the finite difference method for discretization of the black oil model. At each timestep in the simulation, the simulator utilizes the inexact Newton method to approximately solve the linearized form of the nonlinear systems in the black oil model (Wang 2015). The coupling between the unknowns in the system (pressure and saturation/bubble point pressure) needs to first be weakened through the application of a multi-stage preconditioner (Wang 2015). The preconditioner chosen for this study is the constrained pressure residual (CPR) method that separates the pressure block from the full system and utilizes the algebraic multigrid solver (AMG) to solve the pressure block before utilizing a global smoother applied to the full system (Wang 2015, Wallis 1985, Cao 2005). The restarted general minimal residual method (GMRES) is then used to find the solution to the linear system (Wang 2015, Saad 2003). This whole process is repeated for the next timestep in the simulation until a stopping condition is met. When conducting history matching of a reservoir model, a search of possible parameters is conducted to find the best possible fit. This task typically involves either the iterative assimilation of data during a simulation or the repeated tuning and re-running of simulations until an appropriate match is found (Oliver 2011). Bayesian Optimization has been chosen for this study due to its suitability for optimization of the extremely long running simulations required in reservoir simulation. Bayesian Optimization utilizes Bayes theorem to determine the likelihood of an output based on prior knowledge of the cost function. Several sets of initial parameters are chosen and then the costs are found by running a simulation with each set of parameters. Based on this newly found knowledge of the cost function the Bayesian Optimization algorithm will use an acquisition function to choose a new set of parameters that are expected to provide an improvement. At this stage the optimization process can be configured to be more exploratory in selecting parameters to avoid local minimums or more exploitative to conduct a more directed search of the parameter space. Multiple iterations of this process are conducted until a stopping condition is met (Brochu 2010). For the cost function the fluid production and injection rate curves for the water, oil, gas, and injected fluids are compared against the production data curves using FastDTW (Salvador 2007). The distances between the test and production curves are combined using the Euclidean norm to find the cost of the current set of parameters. At the end of the optimization process, the parameters with the minimum cost are chosen as the optimal parameter set. Due to the randomness in selecting new parameters the average number of required epochs for the optimization is found through repeated runs of the entire process. The SPE 9 black oil model has been utilized for the initial development in this study as a test and benchmark model due to its reasonable complexity but still manageable simulation time (Killough 1995). To simulate the real-world task of history matching, the standard SPE 9 model is set as the “real” production data. For the remainder of the study the relative permeability curves are assumed to be unknown and history matching will be conducted to determine these curves. To aid the process of history matching the relative permeability curves are reduced to a smaller set of 10 parameters with the application of Stone’s Model II (Reynolds 2004). In this model the curves are parameterized to an approximate analytical form that closely matches the values of the relative permeability curves (Reynolds 2004). Current early results are showing promising behavior to find the optimal parameters, however further work is in progress to finalize the experiments. We have begun to evaluate the performance of this Bayesian Optimization framework on larger scale models that require a significant amount of simulation time (on the order of days/weeks on standard workstations or hours on large-scale computation resources). Due to the expensive time cost of these simulations, it is expected the Bayesian Optimization framework will show benefits over more traditional approaches such as the Ensemble Kalman Filter (ENKF) method (Lorentzen 2001, Aanonsen 2009). A full comparison between the Bayesian Optimization history matching framework and the ENKF method will need to be conducted. |

13:05 | Multigrid preconditioning for distributed optimal control of fractional parabolic equations PRESENTER: Andrei Draganescu ABSTRACT. Optimal control problems constrained by fractional parabolic operators have attracted a lot of attention over the last couple of years. Such problems arise naturally in optimal control of anomalous diffusion and in machine learning. While solving fractional parabolic equations poses significant challenges on its own, having them appear as constraints in an optimization problem increases the complexity dramatically, further limiting the size of the problems that can be solved in practice. In general there are two approaches for solving optimal control problems constrained by PDEs, namely the all-at-once pathway, and the reduced-type methods. In the all-at-once approach one has to solve the KKT system representing the first order optimality conditions. The main advantage lies in the fact that for PDE constraints this system is sparse, and no PDE solves are necessary during the process, as the PDE and its adjoint equation are solved as a coupled system. Several works favor this approach for classical parabolic control. However, for fractional parabolic control, the KKT system is no longer sparse, and we prefer the reduced approach using the control-to-state map. Building on our earlier work on multigrid preconditioning for classical parabolic control and on fractional elliptic control, we show how multigrid can be used to solve fractional parabolic control problems very efficiently. A partial analysis and numerical results support our claim. |