08:00 | A Multi Scale Two Grid Strategy for Kinetic Modeling of Burning Plasmas PRESENTER: William Taitano ABSTRACT. In laboratory thermonuclear burning plasma experiments [1] such as those encountered in inertial and magnetic confinement fusion (I/MCF) devices, deuterium-tritium (DT) nuclear fuel at approximately 10 keV generates nearly monoenergetic 3.5 MeV alpha particles, driving the internal energy production mechanism. The Vlasov-Fokker-Planck (VFP) kinetic modeling of such plasmas presents significant multiscale challenges. Specifically, the 3.5 MeV alpha particles slow down to form a highly localized 10 eV to 10 keV ash population, resulting in a large velocity-space structure separation spanning factors of 30 to 600. This disparity renders conventional grid-based techniques for solving the VFP equations computationally intractable. To address this challenge, we introduce a novel, consistent, and conservative multiscale two-grid strategy for modeling fusion-born alpha particles in thermonuclear plasmas, building on the approach pioneered by Peigney [2]. Our method employs separate velocity-space grids for the alpha particle and thermal ash populations. We introduce a physics-based sink term that removes particles transitioning into the ash thermal-speed range to avoid the formation of unresolved singular structures on the alpha grid. This removed population is subsequently reconstructed as a source term in the ash VFP equation in a manner that ensures detailed balance and consistency with the original equations, while rigorously conserving mass, momentum, and energy. We have implemented this new capability in the iFP VFP spherical implosion code [3] and demonstrate its effectiveness in modeling challenging multiscale thermonuclear burning ICF problems. [1] H. Abu-Shawareb et al., Phys. Rev. Lett., 129 (2022) 075001. [2] B. E. Peigney et al., J. Comp. Phys., 278 (2014) 416-444. [3] W. Taitano et al., Comp. Phys. Comm., 263 (2021) 107861. |
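The sink/source transfer between the two velocity grids can be illustrated with a toy 1D calculation. This is not the iFP implementation (which is multi-dimensional and spherical); the grids, the step-function sink rate, and all numerical values below are illustrative assumptions. The sketch removes the slowed-down part of the alpha distribution and rebuilds it on the ash grid as a Maxwellian whose mass, momentum, and energy match what was removed.

```python
import numpy as np

# Hypothetical 1D velocity grids (illustrative only; units of the ash thermal speed)
v_a = np.linspace(0.0, 60.0, 2048)       # wide grid resolving the fast alpha population
v_s = np.linspace(-8.0, 8.0, 256)        # narrow grid resolving the thermal ash bulk
f_a = np.exp(-(v_a - 40.0) ** 2 / 4.0) + 0.05 * np.exp(-v_a / 10.0)  # fast peak + slowing-down tail

v_cut = 3.0                               # sink activates near the ash thermal speed
nu_sink = np.where(v_a < v_cut, 1.0, 0.0)  # crude step-function sink rate
dt = 0.1

removed = nu_sink * f_a * dt              # population taken off the alpha grid this step
f_a -= removed

# Moments of the removed population (m = k_B = 1 for brevity)
dva = v_a[1] - v_a[0]
n = removed.sum() * dva                           # mass
p = (v_a * removed).sum() * dva                   # momentum
E = (0.5 * v_a ** 2 * removed).sum() * dva        # kinetic energy

# Reconstruct a Maxwellian source on the ash grid with the same mass, momentum, energy
u = p / n
T = 2.0 * E / n - u ** 2                          # 1D temperature from the energy balance
src = n / np.sqrt(2.0 * np.pi * T) * np.exp(-(v_s - u) ** 2 / (2.0 * T))
# src enters the ash VFP equation as a source term; by construction its first three
# moments match those removed from the alpha grid, so mass, momentum, and energy
# are conserved by the transfer.
```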
08:25 | Multigrid preconditioning of implicit integration of electromagnetic effects in the gyrokinetic code COGENT PRESENTER: Lee Ricketson ABSTRACT. Gyrokinetic simulation of tokamak fusion reactors represents an extremely multi-scale problem, both temporally and spatially. The temporal scales manifest as stiff terms in the governing PDEs, thus motivating IMEX time integration and the need for efficient solvers for the implicit systems these schemes give rise to. COGENT has long leveraged algebraic multigrid solvers as preconditioners for many of the stiff terms appearing in electrostatic gyrokinetic simulations, all in complex geometry. We show here that extension to an electromagnetic model – important due to steep gradients in edge regions – presents new challenges from both high-order terms and large spatial anisotropy. We present two approaches to these challenges. In the first, an approximate inversion is performed analytically to simplify the system. This results in a well-conditioned linear system and an effective preconditioner if the electrostatic scalar and magnetic vector potentials have the same boundary conditions. However, there are physical motivations for these boundary conditions to be different, which breaks the advantageous cancellation. We thus present another, more general approach that works around the fact that, due to large spatial anisotropy, the linear system has a near-null space. *This material is based on work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics Program under Contract DE-AC52-07NA27344 at Lawrence Livermore National Laboratory. |
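As context for the kind of solver the abstract describes, the following sketch shows algebraic multigrid used as a preconditioner for a Krylov solve of an anisotropic elliptic problem. It is a generic Python illustration using PyAMG and SciPy, not COGENT's actual (C++) solver stack; the anisotropy parameter, grid size, and model problem are placeholder choices.

```python
import numpy as np
import pyamg
from scipy.sparse.linalg import gmres

# Toy stand-in for one stiff implicit block: a 2D anisotropic diffusion operator.
stencil = pyamg.gallery.diffusion_stencil_2d(epsilon=1e-3, theta=np.pi / 6)
A = pyamg.gallery.stencil_grid(stencil, (200, 200), format="csr")
b = np.random.default_rng(0).random(A.shape[0])

ml = pyamg.smoothed_aggregation_solver(A)   # build the AMG hierarchy
M = ml.aspreconditioner(cycle="V")          # expose one V-cycle as a preconditioner
x, info = gmres(A, b, M=M)                  # AMG-preconditioned Krylov iteration
```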
08:50 | Optimization and Analytical Approaches for Plasma Instability Control PRESENTER: Martin Guerra ABSTRACT. Plasma dynamics exhibit a rich multiscale structure. In time evolution, it is common to observe exponential growth of the electric energy and plasma instabilities -- a major challenge in fusion energy research. One promising strategy for controlling these instabilities is to apply a suitable external electric field. However, determining the optimal field configuration requires solving a PDE-constrained optimization problem, where the objective function quantifies the instability. In this work, we investigate, using the Vlasov-Poisson model, various metrics for assessing instability, evaluate different optimization solvers, and explore analytical approaches to generate effective initial guesses. This provides insights for improved control of plasma confinement. This is joint work with Yukun Yue, Qin Li, and Leonardo Zepeda-Núñez. |
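A common instability metric of the kind the abstract refers to is the electric-field energy, which grows exponentially during a linear instability. The sketch below computes it for a 1D periodic Vlasov-Poisson setting by solving Gauss's law with an FFT; the function name and normalization (eps0 = 1) are illustrative assumptions, not the authors' objective function.

```python
import numpy as np

def electric_energy(rho, L):
    """Electric-field energy (1/2) * integral of E^2 dx for charge density rho
    on a periodic grid of length L, with Gauss's law dE/dx = rho and eps0 = 1."""
    n = rho.size
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=L / n)   # wavenumbers
    rho_hat = np.fft.fft(rho)
    E_hat = np.zeros_like(rho_hat)
    nz = k != 0
    E_hat[nz] = rho_hat[nz] / (1j * k[nz])         # ik * E_hat = rho_hat; k = 0 mode dropped
    E = np.real(np.fft.ifft(E_hat))
    return 0.5 * np.sum(E ** 2) * (L / n)
```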
09:15 | Graph Lineages and Skeletal Graph Products PRESENTER: Cory Scott ABSTRACT. Graphs, and sequences of successively larger interrelated graphs, can be used to specify the architecture of mathematical models in many fields, including machine learning and computational science. Here we define structured graph "lineages" that grow in a hierarchical fashion, so that: (1) the number of graph vertices and edges increases roughly exponentially in level number; (2) as in algebraic multigrid methods, prolongation and restriction maps in the form of sparse orthogonal matrices relate successive levels; (3) a category of "graded graphs" can be defined, and using it, substantially lower-cost "skeletal" variants of standard algebraic type constructors (cross product, box product, and function types) can be derived for graded graphs and hence hierarchical graph lineages. The result is an algebraic type theory for graded graphs and (hierarchical) graph lineages, which we validate with category-theoretic proofs. Skeletal graph product lineages retain important structural features of graph products but grow more slowly, avoiding the curse of dimensionality. The approach is expected to be particularly well suited to defining hierarchical model architectures - "hierarchitectures" - and local sampling, search, or optimization algorithms on them. We give examples of constructing classic computational architectures (such as Laplacian pyramids in image processing, the "butterfly diagram" graphs used in the fast Fourier transform, and convolutional neural networks) with skeletal products. Additionally, we validate this method by using skeletal graph products to reproduce and train a common machine learning architecture for image processing. Finally, we demonstrate that these constructions generalize to continuum analogs of graph lineages and skeletal products, which may have implications for efficient solvers for problems defined on continuous spatial domains. |
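For reference, the standard (non-skeletal) box product of two graphs can be built from adjacency matrices with Kronecker products, as in the sketch below; the skeletal variants the talk introduces keep only a sparse subset of these vertices and edges, and their construction is not reproduced here.

```python
import numpy as np

def box_product(A1, A2):
    """Adjacency matrix of the standard box (Cartesian) product G1 [] G2:
    A = A1 (x) I + I (x) A2, where (x) is the Kronecker product."""
    n1, n2 = A1.shape[0], A2.shape[0]
    return np.kron(A1, np.eye(n2)) + np.kron(np.eye(n1), A2)

# Example: the box product of two 3-vertex paths is the 3x3 grid graph.
P3 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
grid = box_product(P3, P3)
print(grid.sum() / 2)   # 12 edges, as expected for the 3x3 grid
```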
09:40 | Accelerating Multi-Level Markov Chain Monte Carlo Methods Using Machine Learning Models and Dimensionality Reduction PRESENTER: Sohail Reddy ABSTRACT. Uncertainty quantification (UQ) studies, for example Bayesian inversion by Markov Chain Monte Carlo (MCMC), for high-risk applications typically require accurate modeling of the system dynamics, which are often described by a system of partial differential equations (PDEs) and solved numerically. These high-fidelity simulations require fine-scale discretization of the spatial and temporal domain, increasing the dimensionality of the sample space and the associated computational cost, which renders their use in UQ studies intractable. Hence, machine learning models are often used to reduce the computing cost at a loss of accuracy. However, training/constructing such machine learning models requires a dataset whose size grows with the dimensionality of the problem, thereby limiting their use to coarse-grained discretizations. In such cases, model order reduction and multi-level/multi-fidelity methods, where low-fidelity models augment or act as filters for the fine-scale, high-fidelity simulations, offer a viable alternative. A natural choice for a hierarchy of varying-fidelity models for multi-level sampling is the (geometric) multigrid hierarchy used to accelerate the solution of large-scale problems. The coarsest level in the hierarchy can be further extended using a machine learning surrogate, since training such models for small-scale problems is more feasible, and the dimensionality of the sample space can be reduced using standard model order reduction techniques, such as the Karhunen-Loève (KL) expansion. Using such a surrogate-augmented multigrid hierarchy, we develop an accelerated, multi-level Markov Chain Monte Carlo (multi-level MCMC) approach with application to large-scale problems in groundwater (Darcy) flow, which we use to estimate the posterior distribution and statistical moments of the permeability field and mass flux (quantity-of-interest). We couple the surrogate model and the PDE model on the coarsest level using a surrogate-transition delayed acceptance MCMC scheme and demonstrate a substantial increase in the coarse-level acceptance rate. We present an efficient, scalable approach for hierarchical, conditioned sampling of Gaussian Random Fields (GRFs), which serves as a parameterization of the permeability field. We present results for cases with a varying number of levels in the hierarchy and compare the multi-level approach to traditional single-level MCMC. We also compare our surrogate-augmented multi-level MCMC approach to the standard PDE-based multi-level approach and demonstrate a significant speed-up and increase in acceptance rate over the PDE-based approach while maintaining comparable accuracy. We prove that our machine learning accelerated approach yields a consistent MCMC algorithm that preserves detailed balance and ergodicity. Finally, we present some preliminary work on hierarchical sampling of GRFs using a coupled multigrid-KL decomposition that reduces the dimensionality of the sample space while preserving the discretization accuracy and ergodicity of the multi-level MCMC sampler. |
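The surrogate-transition delayed-acceptance idea can be sketched generically: a cheap (coarse or surrogate) posterior screens proposals, and only survivors are evaluated with the expensive fine-level posterior, with the coarse ratio divided back out so the chain still targets the correct distribution. The sketch below is a standard two-stage Metropolis scheme with a symmetric random-walk proposal, not the authors' multilevel implementation; all function and parameter names are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def delayed_acceptance_mcmc(log_post_coarse, log_post_fine, x0, n_steps, step=0.1):
    """Two-stage (delayed-acceptance) Metropolis with a symmetric random-walk proposal.
    Stage 1 screens proposals with a cheap coarse/surrogate posterior; only survivors
    are passed to the expensive fine posterior, which keeps the chain exact."""
    x, lc_x, lf_x = x0, log_post_coarse(x0), log_post_fine(x0)
    chain = [x]
    for _ in range(n_steps):
        y = x + step * rng.standard_normal(x.shape)
        lc_y = log_post_coarse(y)
        # Stage 1: cheap screening step
        if np.log(rng.uniform()) < lc_y - lc_x:
            lf_y = log_post_fine(y)
            # Stage 2: correction with the fine model (coarse ratio divided back out)
            if np.log(rng.uniform()) < (lf_y - lf_x) - (lc_y - lc_x):
                x, lc_x, lf_x = y, lc_y, lf_y
        chain.append(x)
    return np.array(chain)
```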
10:25 | Multilevel Algorithms for Training Kolmogorov-Arnold Networks PRESENTER: Graham Harper ABSTRACT. The Kolmogorov Superposition Theorem (KST) is a well-known result in analysis that proves any multivariate continuous function can be written as a sum of compositions of one-dimensional functions. Comparing the KST to the universal approximation property (UAP) of multi-layer perceptrons (MLPs) motivates the definition of a new accurate and interpretable network called the Kolmogorov-Arnold Network (KAN). In contrast to MLPs, KANs learn activation functions on network graph edges, enhancing their accuracy and interpretability. These learnable functions, which were pathological in the original KST construction, are often parameterized as B-splines in KANs due to their approximation qualities. Because these one-dimensional splines have compact support and a geometric interpretation, it is possible to construct hierarchies of KANs with geometric interpolation. We use this to construct multilevel training algorithms for KANs, and we present numerical results showing greatly improved convergence over standard training methods for KANs. |
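One way to realize the geometric interpolation between KAN levels is to transfer the learned B-spline coefficients of each edge activation from a coarse knot vector to a finer one. The sketch below does this by least-squares collocation with SciPy; it is an illustrative prolongation under assumed knot vectors, not necessarily the transfer operator used in the talk.

```python
import numpy as np
from scipy.interpolate import BSpline

def prolong_spline_coeffs(c_coarse, t_coarse, t_fine, k=3):
    """Map coarse B-spline coefficients to a finer knot vector by least-squares
    collocation: evaluate the coarse spline and fit the fine basis to it."""
    coarse = BSpline(t_coarse, c_coarse, k)
    n_fine = len(t_fine) - k - 1
    x = np.linspace(t_fine[k], t_fine[-k - 1], 4 * n_fine)   # collocation points
    # Collocation matrix with one column per fine basis function
    B = np.column_stack([BSpline(t_fine, np.eye(n_fine)[j], k)(x) for j in range(n_fine)])
    c_fine, *_ = np.linalg.lstsq(B, coarse(x), rcond=None)
    return c_fine

# Example: prolong a cubic spline from 8 to 15 interior knot intervals on [0, 1]
t_c = np.concatenate(([0.0] * 3, np.linspace(0.0, 1.0, 8), [1.0] * 3))
c_c = np.random.default_rng(0).random(len(t_c) - 4)
t_f = np.concatenate(([0.0] * 3, np.linspace(0.0, 1.0, 15), [1.0] * 3))
c_f = prolong_spline_coeffs(c_c, t_c, t_f)
```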
10:50 | Block Gauss-Newton and Newton Methods for Non-Convex Optimization Problems Arising from ReLU Neural Network Approximations ABSTRACT. This talk will present our recent work on modified Gauss-Newton and Newton methods for iteratively solving non-convex optimization problems. These nonlinear problems result from the discretization of the physics-preserved ReLU neural network (P2NN) methods for PDEs, whose solutions are non-smooth and/or discontinuous. The modified Gauss-Newton and Newton methods exploit the algebraic structure of ReLU neural networks and the geometric and physical meaning of the network parameters. This presentation will use some materials from joint work with J. Chen, J. Choi, T. Ding, A. Doktorova, R. Falgout, C. Herrera, M. Liu, X. Liu, and J. Xia. |
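For orientation, the baseline update that such methods modify is the damped Gauss-Newton step for a nonlinear least-squares loss; the sketch below shows that generic step with dense linear algebra. The structured, block solves that exploit the ReLU network parameterization are the subject of the talk and are not reproduced here; all names are illustrative.

```python
import numpy as np

def gauss_newton_step(theta, residual, jacobian, lam=1e-8):
    """One damped Gauss-Newton update for the loss 0.5 * ||r(theta)||^2:
    theta <- theta - (J^T J + lam I)^{-1} J^T r."""
    r = residual(theta)          # residual vector r(theta)
    J = jacobian(theta)          # Jacobian dr/dtheta
    H = J.T @ J + lam * np.eye(theta.size)   # Gauss-Newton Hessian approximation
    return theta - np.linalg.solve(H, J.T @ r)
```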
11:15 | Fast solvers for shallow ReLU neural network approximations: the 2D case PRESENTER: Tong Ding ABSTRACT. The Structure-Guided Gauss-Newton (SgGN) method was originally designed to leverage structured matrix properties in shallow neural networks, achieving superior accuracy. Extending this framework to two-dimensional problems introduces new computational challenges, particularly in efficiently handling matrix operations. Instead of relying on conventional costly methods, we employ hierarchical semiseparable (HSS) fast direct solvers to reduce the computational complexity of matrix inversions from O(n^2) to O(n). This advancement paves the way for achieving truly linear-time complexity per iteration. A key aspect of this work is the efficient formation of mass and Hessian matrices while preserving their inherent structures, ensuring stability and computational feasibility. Our approach demonstrates how numerical linear algebra techniques can be systematically integrated into machine learning optimization, offering significant improvements in both efficiency and scalability. |
11:40 | Sparsification of Neural Networks Inspired by Optimal Strongly Attack Tolerant Network Configurations PRESENTER: Vladimir Boginski ABSTRACT. Deep neural network (DNN) architectures typically involve several successive layers of neurons, where the patterns of connections between pairs of successive layers are often represented by complete (fully connected) bipartite graphs. Although fully connected neural network architectures may provide certain benefits, they also have potential drawbacks in terms of being computationally expensive and prone to overfitting. In this work, we explore a structural sparsification (pruning) technique for fully connected neural networks, which is inspired by graph-theoretic concepts of cliques and clique relaxations. Cliques in real-world graphs/networks (e.g., social, communication, and other types of networks) represent fully connected groups of nodes with all possible links, whereas clique relaxations are increasingly popular graph-theoretic concepts that "relax" certain characteristics of cliques by removing links, while still maintaining a certain level of structural connectivity/cohesion. Specifically, we propose to use optimal R-robust 2-club network configurations, which have been proven in our previous work to possess the so-called "strong attack tolerance" property: after the removal of any R-1 nodes and/or links, these structures are guaranteed to maintain not only the overall connectivity but also short (two-hop) distances between any pair of nodes, while requiring the minimum possible number of links for the respective network design. We adapt the concept of optimal R-robust 2-club network configurations to develop a structural sparsification method for DNNs, where sparsification levels are explicitly determined by the parameter R. We conduct computational experiments using the well-known FashionMNIST dataset, training a multi-layer perceptron with varying sparsification levels over 20 epochs. The performance is evaluated based on test accuracy, revealing the trade-offs between connectivity and computational efficiency. Preliminary results demonstrate that our structured sparsification approach maintains high classification accuracy, while significantly reducing the network complexity. |
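The structural sparsification can be realized as a fixed connectivity mask applied to each fully connected layer, so that pruned links stay at zero throughout training. The PyTorch sketch below shows such a masked layer with a generic 0/1 mask; constructing the specific R-robust 2-club connectivity pattern is the subject of the authors' prior work and is not reproduced here.

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Linear):
    """Fully connected layer with a fixed 0/1 connectivity mask between successive
    layers; masked weights stay zero during training (a generic stand-in for the
    R-robust 2-club connectivity pattern described in the talk)."""
    def __init__(self, in_features, out_features, mask):
        super().__init__(in_features, out_features)
        self.register_buffer("mask", mask.float())   # shape (out_features, in_features)
        with torch.no_grad():
            self.weight *= self.mask                 # zero out pruned connections
        # Keep gradients of pruned connections at zero (the forward mask already
        # ensures this; the hook is an extra safeguard against reactivation).
        self.weight.register_hook(lambda g: g * self.mask)

    def forward(self, x):
        return nn.functional.linear(x, self.weight * self.mask, self.bias)
```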
12:05 | Machine Learning for Image Segmentation of Impact Damage in Composite Materials PRESENTER: Pavlo Krokhmal ABSTRACT. In this work, an unsupervised machine learning method for image segmentation of low-velocity impact damage in carbon fiber reinforced polymer (CFRP) composites has been developed and compared with supervised methods, such as CNNs. The method relies on the use of non-parametric statistical models in conjunction with so-called intensity-based segmentation, enabling one to isolate the damage regions in the grayscale images from the micro computed tomography (micro-CT) scans of the impacted composite plates. The proposed unsupervised approach involves minimization of f-divergence metrics, such as the Kullback–Leibler divergence, the Hellinger distance, and the Rényi divergence. It is demonstrated that the introduced method yields superior results compared to other unsupervised and supervised machine learning techniques. |
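The f-divergences named in the abstract compare probability distributions, here the grayscale intensity histograms used for intensity-based segmentation. The sketch below evaluates two of them (Kullback-Leibler and Hellinger) for normalized histograms; it is a generic illustration of the metrics, not the authors' segmentation procedure.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence between two histograms (normalized internally)."""
    p, q = p + eps, q + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def hellinger(p, q):
    """Hellinger distance between two histograms (normalized internally)."""
    p, q = p / p.sum(), q / q.sum()
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))
```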
16:30 | One and two level parallel-in-time schemes for problems with time varying coefficients PRESENTER: Josh Hope-Collins ABSTRACT. The ParaDiag method enables time-parallelism by adding a small periodic perturbation to the all-at-once system. This makes the system Jacobian circulant, and therefore block-diagonalisable in time and solvable efficiently in parallel. For constant coefficient problems, a Krylov method preconditioned with the circulant matrix converges extremely quickly. However, for variable coefficient problems the circulant matrix must be constructed from time-averaged coefficients to maintain diagonalisability. This additional approximation means that Krylov iterations increase for larger coefficient variations or longer time-horizons. ParaDiag convergence is well established for constant coefficient problems, or for a nonlinear waveform relaxation variant. In the first part of the talk, we present ParaDiag convergence estimates for linear variable coefficient problems, and compare to numerical results using the open-source asQ library. In the second part of the talk, we present an extension of a two-level scheme for variable coefficients. This scheme resembles multigrid-in-time but uses ParaDiag itself as the C-propagator between each C-point. This means, firstly, that the coefficients can be time-averaged locally within each C-interval. Secondly, the “C”-level is not coarsened in time, which potentially provides a route to avoiding the issues coarsening can cause for advection-dominated problems. We present preliminary convergence rate estimates for the two-level scheme, and compare to numerical results. |
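The key mechanism, diagonalization of a circulant all-at-once matrix by the discrete Fourier transform, can be shown on a scalar toy problem. The sketch below applies the inverse of the circulant preconditioner obtained from the periodic perturbation for u' = a(t) u with backward Euler and a time-averaged coefficient a_bar; the practical ParaDiag/asQ setup involves an alpha-circulant weighting and a block (spatial) solve for each time frequency, neither of which is shown here.

```python
import numpy as np

def paradiag_precond_solve(r, dt, a_bar):
    """Apply the inverse of the circulant ParaDiag preconditioner for the scalar
    test problem u' = a(t) u, discretized with backward Euler and built from the
    time-averaged coefficient a_bar.  A circulant solve is a circular deconvolution:
    one FFT in time, a pointwise divide by the eigenvalues, and an inverse FFT."""
    n = r.size
    c = np.zeros(n, dtype=complex)
    c[0] = 1.0 / dt - a_bar      # diagonal of the all-at-once Jacobian
    c[1] = -1.0 / dt             # sub-diagonal coupling u_{n-1} -> u_n, made periodic
    lam = np.fft.fft(c)          # eigenvalues of the circulant preconditioner
    return np.fft.ifft(np.fft.fft(r) / lam)
```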
16:55 | Parallelism in time-integration with fully implicit Runge-Kutta methods ABSTRACT. We consider the time-integration of ODEs x'=f(t,x) and of certain types of differential-algebraic equations (DAEs) R(t,x,x')=0 by fully implicit Runge-Kutta (FIRK) methods. FIRK methods are well suited for stiff ODEs and DAEs and for parallel computing, contrary to explicit or diagonally implicit RK (DIRK) methods, which are sequential in nature. We propose to apply FIRK methods with constant step sizes h in some artificial time after time-rescaling. The use of constant step sizes has several advantages over variable step sizes: - backward error results can be formulated; - global error estimation is possible; - no recomputation of the Jacobian due to step size changes; - no hyperparameters for step size selection. The only major disadvantage of using constant step sizes h is of course their possible inefficiency, but this is addressed by the use of time-rescaling functions r(t,x) where h*r(t,x) acts like a variable step size in the original time t. FIRK methods have multiple avenues for efficient implementation and parallelism, which will be briefly reviewed: - vector field and constraint evaluations; - low-cost starting algorithms; - low-cost predictors of arbitrarily high order to improve on starting algorithms and even on the iterates of Newton-type/fixed-point methods; - for the solution of the inner linear systems of equations by Newton-type methods there exist asymptotically correct parallel preconditioners based on the Jacobian of the implicit Euler method. These parallelization techniques can be used in conjunction with a parallel-in-time approach; they simply add another level of parallelism. For certain DAEs of index 3 and stiffly accurate FIRK methods, such as Radau IIA methods, we will also present a simple projection trick to preserve all constraints and to recover superconvergence of the methods. |
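The time-rescaling idea can be sketched concretely: integrate the augmented system dt/ds = r(t,x), dx/ds = r(t,x) f(t,x) with a constant step h in the artificial time s, so that h*r(t,x) plays the role of a variable step in the original time t. In the sketch below an implicit Euler step (solved with scipy.optimize.fsolve) stands in for the fully implicit Runge-Kutta stage solves; the function names and the augmentation are illustrative choices, not the speaker's implementation.

```python
import numpy as np
from scipy.optimize import fsolve

def rescaled_implicit_step(f, r, t, x, h):
    """One constant step h in artificial time s for the rescaled system
        dt/ds = r(t, x),   dx/ds = r(t, x) * f(t, x),
    so that h * r(t, x) acts like a variable step in the original time t.
    Implicit Euler stands in for a fully implicit RK (e.g. Radau IIA) stage solve."""
    def residual(z):
        t_new, x_new = z[0], z[1:]
        rr = r(t_new, x_new)
        return np.concatenate(([t_new - t - h * rr],
                               x_new - x - h * rr * f(t_new, x_new)))
    z = fsolve(residual, np.concatenate(([t + h], x)))
    return z[0], z[1:]
```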
17:20 | New space-time parallel algorithms combining multigrid-reduction-in-time and splitting techniques PRESENTER: Iñigo Jimenez-Ciga ABSTRACT. On account of increasingly complex computer architectures, parallelization has become an essential tool when dealing with large problems that need to be solved as fast as possible. In the framework of evolutionary differential problems, the most suitable algorithms involve parallelization in both space and time so as to save CPU runtime. The main contribution of this work is to introduce a new type of space-time parallel method that results from the combination of the multigrid-reduction-in-time strategy and suitable splitting techniques, which permit parallelization in space. A description of the proposed algorithms is provided, and several numerical experiments are given in order to illustrate the optimal use of available processors and related convergence properties. |
17:45 | Layer-Parallel Training with PyMGRIT for Deep Residual Neural Networks PRESENTER: Ryo Yoda ABSTRACT. Layer-parallel training is a parallel training method for deep residual neural networks (ResNets) in addition to traditional model and data parallelism. This training strategy leverages the fact that ResNets can be viewed as a forward Euler time discretization scheme for nonlinear initial value problems. By treating the propagation between layers as a classical time-sequential step, one can extract layer parallelism by applying parallel-in-time approaches, such as multigrid-reduction-in-time (MGRIT), which is a promising multigrid-based parallel-in-time method. The TorchBraid library is designed for PyTorch-based applications and currently implements MGRIT through the Python interface of XBraid, which is an MPI/C code. In contrast, this work implements MGRIT through PyMGRIT, a pure Python MGRIT implementation using mpi4py, with the goal of reducing call overhead and optimizing training costs in comparison to classic TorchBraid. Numerical experiments demonstrate parallel results on CPUs and GPUs for representative MNIST examples. |
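The forward Euler interpretation that layer-parallel training exploits is easy to show in code: each residual block advances the hidden state by one Euler step, and a coarse MGRIT level can be formed from a subset of the layers with a larger step. The PyTorch sketch below is a minimal illustration of that viewpoint under assumed layer widths and step size; it is not the TorchBraid or PyMGRIT machinery itself.

```python
import torch
import torch.nn as nn

class EulerResNet(nn.Module):
    """ResNet whose forward pass is forward Euler on x' = F(x, theta(t)):
    x_{n+1} = x_n + h * F(x_n, theta_n).  MGRIT-style layer parallelism treats the
    layer index as a time index; a coarse level can reuse every other layer with step 2h."""
    def __init__(self, width, num_layers, h=0.1):
        super().__init__()
        self.h = h
        self.layers = nn.ModuleList(
            [nn.Sequential(nn.Linear(width, width), nn.Tanh()) for _ in range(num_layers)]
        )

    def forward(self, x, coarse=False):
        stride, h = (2, 2 * self.h) if coarse else (1, self.h)
        for layer in self.layers[::stride]:
            x = x + h * layer(x)          # one forward Euler step per residual block
        return x
```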
18:10 | Multilevel Training of Transformers PRESENTER: Eric Cyr ABSTRACT. Transformers utilize self-attention and feed-forward mechanisms to achieve state-of-the-art performance on a variety of language modeling tasks. However, as these networks become increasingly larger and deeper, additional sources of parallelism are needed to accelerate their training. By leveraging the residual structure inherent in transformers, we interpret the network as an Euler time step and apply parallel-in-time techniques to achieve speed-up. We demonstrate this speed-up on a variety of problems, including BERT and GPT-style networks. |