CCIMM26: COLORADO CONFERENCE ON ITERATIVE AND MULTIGRID METHODS
PROGRAM FOR TUESDAY, JUNE 23RD

08:00-09:40 Session 7A: Algebraic Multigrid (Part 1 of 2)
08:00
LS-AMG-DD: An AMG/DD Solver for Least-Squares-Style Systems
PRESENTER: Oliver Krzysik

ABSTRACT. We discuss LS-AMG-DD, a new solver for least-squares-style systems that blends ideas from algebraic multigrid and domain decomposition. Specifically, the solver combines overlapping Schwarz smoothing with spectral-based coarse grids. The talk focuses on two themes. First, we show how the method applies to conforming finite element discretizations. In particular, we identify a structural condition under which the global operator admits the local Gram structure required by the solver. Second, we discuss aspects of convergence theory, combining analysis of local generalized eigenvalue problems, overlapping Schwarz smoothing, and classical multigrid approximation properties.

08:25
H2-MG: A Multigrid Method for Hierarchical Rank Structured Matrices
PRESENTER: Edmond Chow

ABSTRACT. The H2 matrix format provides linear scaling storage and matrix-vector multiplications for dense kernel matrices from engineering and machine learning. The format uses a hierarchical clustering of the kernel matrix points and naturally defines a hierarchical structure. We propose using this hierarchical structure to define a multilevel solver, like an aggregation-based multigrid method. The prolongation operator can be constructed from the basis vectors for the block rows of the H2 matrix. The coarse grid matrices also have H2 structure. A major question is how to obtain smoothing that is complementary to coarse grid correction.
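
A minimal two-level sketch in Python of the aggregation-based structure mentioned above. It uses a generic piecewise-constant prolongation and a Galerkin coarse operator rather than the H2 block-row basis construction of the talk; all names and parameters are illustrative.

```python
# Illustrative only: a generic aggregation-based two-level cycle with a
# Galerkin coarse operator, standing in for the H2-based construction.
import numpy as np

def two_level_solve(A, b, agg, omega=0.7, n_sweeps=2, n_cycles=20):
    """agg[i] = index of the aggregate (coarse dof) containing fine dof i."""
    n, nc = A.shape[0], agg.max() + 1
    P = np.zeros((n, nc))
    P[np.arange(n), agg] = 1.0                  # piecewise-constant prolongation
    Ac = P.T @ A @ P                            # Galerkin coarse-grid operator
    Dinv = 1.0 / np.diag(A)
    x = np.zeros(n)
    for _ in range(n_cycles):
        for _ in range(n_sweeps):               # pre-smoothing: damped Jacobi
            x += omega * Dinv * (b - A @ x)
        r = b - A @ x
        x += P @ np.linalg.solve(Ac, P.T @ r)   # coarse-grid correction
        for _ in range(n_sweeps):               # post-smoothing
            x += omega * Dinv * (b - A @ x)
    return x

# Toy kernel matrix (narrow Gaussian kernel plus ridge) on 1D points.
pts = np.linspace(0.0, 1.0, 64)
A = np.exp(-((pts[:, None] - pts[None, :]) / 0.02) ** 2) + 1e-2 * np.eye(64)
b = np.ones(64)
x = two_level_solve(A, b, agg=np.arange(64) // 8)
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```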

08:50
Agglomeration-based multigrid solver in a lazy-evaluated array framework
PRESENTER: Matt Smith

ABSTRACT. Lazy evaluation in array-based codes exposes opportunities to sidestep many of the performance pitfalls associated with traditional eagerly-evaluated array frameworks (e.g., NumPy, CuPy, PyOpenCL). By deferring evaluation, transformations can be employed to fuse loops, avoid the creation of unnecessary intermediate arrays, and perform numerous other problem-domain-targeted optimizations.

Variations on this approach have already seen success in some domains, such as JAX for AI/ML, and applications of such techniques to PDE-based numerical simulation codes and array computations more broadly are areas of active interest.

We introduce a framework for lazy array evaluation in PDE-based simulations that has been implemented as the array substrate for a Discontinuous Galerkin simulation code and used successfully to simulate combusting supersonic flows in a scramjet-like configuration. Additionally, we present an agglomeration-based multigrid solver built on top of this framework to enable support for elliptic problems as well as implicit time integration.
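
To illustrate the deferred-evaluation idea in miniature (this is not the authors' framework; the class and method names below are hypothetical), here is a toy lazy array that records elementwise operations and fuses them into a single traversal at evaluation time:

```python
# Toy deferred evaluation: operations are recorded, then fused into one
# traversal at evaluation time, avoiding intermediate arrays.
import numpy as np

class LazyArray:
    def __init__(self, data, ops=()):
        self.data, self.ops = data, ops

    def map(self, f):
        # Defer: append the operation to the recipe, compute nothing yet.
        return LazyArray(self.data, self.ops + (f,))

    def evaluate(self):
        # Fuse: apply the whole chain element by element in a single pass,
        # so no intermediate array is materialized between the steps.
        out = np.empty_like(self.data)
        for i, v in np.ndenumerate(self.data):
            for f in self.ops:
                v = f(v)
            out[i] = v
        return out

x = LazyArray(np.linspace(0.0, 1.0, 5))
y = x.map(np.sin).map(lambda v: 2.0 * v).map(np.exp)   # still unevaluated
print(y.evaluate())                                    # one fused traversal
```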

09:15
Energy minimization multigrid for the eddy current equations
PRESENTER: Christian Glusa

ABSTRACT. The solution of Maxwell's equations using multigrid methods requires specialized approaches that preserve the null-space properties of the curl-curl operator. Combining ideas from the Reitzinger and Schoeberl algorithm and energy minimization, we present a new algebraic multigrid algorithm that enforces the commuting relationship between the grid-transfer and discrete gradient operators.
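
In symbols, and in the notation usual for the Reitzinger-Schoeberl construction (assumed here, not taken from the talk): with nodal and edge grid transfers $P_n$, $P_e$ and fine/coarse discrete gradients $G_h$, $G_H$, the enforced relation is

```latex
% Commuting relation between grid transfers and discrete gradients
% (notation assumed): prolongating a coarse gradient field must equal
% taking the gradient of the prolongated nodal field.
\[
  P_e \, G_H \;=\; G_h \, P_n ,
\]
% which ensures the coarse levels inherit the null space of the
% curl-curl operator.
```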

08:00-09:40 Session 7B: Iterative methods in science and engineering (Part 1 of 2)
08:00
Optimization problems with Hölder continuous gradients
PRESENTER: Carl Kelley

ABSTRACT. We consider projected gradient descent methods for strongly convex constrained optimization problems with Hölder continuous, but not globally Hölder continuous, gradients. Existing complexity results do not apply in this situation. We prove complexity results for simple gradient descent and extend Nesterov's fast gradient methods. Our estimates indicate that there is an opportunity for preconditioning, and we explore that possibility with experiments.
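
For reference, a minimal sketch of projected gradient descent on a box-constrained, ill-conditioned quadratic; the problem and step size are illustrative, not taken from the talk.

```python
# Projected gradient descent: gradient step followed by projection onto the
# feasible set. Problem data and step size are illustrative.
import numpy as np

def projected_gd(grad, project, x0, step, iters=500):
    x = project(x0)
    for _ in range(iters):
        x = project(x - step * grad(x))   # step, then project back
    return x

# min 0.5 x'Ax - b'x   subject to   0 <= x <= 1
A = np.diag([1.0, 10.0, 100.0])           # ill-conditioned Hessian
b = np.ones(3)
x_star = projected_gd(grad=lambda x: A @ x - b,
                      project=lambda x: np.clip(x, 0.0, 1.0),
                      x0=np.zeros(3), step=1.0 / 100.0)
print(x_star)                             # approaches [1, 0.1, 0.01]
```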

08:25
Iterative execution of discrete and inverse discrete Fourier transforms with applications for signal denoising via sparsification

ABSTRACT. We describe a family of iterative algorithms that involve the repeated execution of discrete and inverse discrete Fourier transforms. One interesting member of this family is motivated by the discrete Fourier transform uncertainty principle and involves applying a sparsification operation to both the real-domain and frequency-domain data, with convergence obtained once the real-domain sparsity pattern stabilizes. This sparsification variant has practical utility for signal denoising, in particular the recovery of periodic or structured signals in the presence of noise. Performance is demonstrated on vector-, matrix-, and tensor-valued data, with the denoising of deterministic networks and spatial transcriptomics data as specific motivating examples. More details can be found in the preprints https://arxiv.org/abs/2211.09284 and https://arxiv.org/abs/2602.00790.
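
A minimal sketch of the two-domain iteration, assuming hard-thresholding as the sparsification operation; thresholds, sizes, and the stopping rule below are illustrative choices, not taken from the preprints.

```python
# Alternate hard-thresholding in frequency and real domains, stopping once
# the real-domain sparsity pattern stabilizes. All parameters illustrative.
import numpy as np

def hard_threshold(v, keep):
    k = max(1, int(keep * v.size))
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v).ravel())[-k:]   # keep k largest magnitudes
    out.ravel()[idx] = v.ravel()[idx]
    return out

def sparsify_denoise(y, keep_real=0.5, keep_freq=0.05, max_iter=100):
    x, support = y.copy(), None
    for _ in range(max_iter):
        X = hard_threshold(np.fft.fftn(x), keep_freq)        # frequency domain
        x = hard_threshold(np.fft.ifftn(X).real, keep_real)  # real domain
        new_support = x != 0
        if support is not None and np.array_equal(support, new_support):
            break                   # real-domain sparsity pattern is stable
        support = new_support
    return x

rng = np.random.default_rng(0)
t = np.arange(256)
clean = np.sin(2 * np.pi * 5 * t / 256)                # periodic test signal
denoised = sparsify_denoise(clean + 0.3 * rng.standard_normal(256))
print(np.linalg.norm(denoised - clean) / np.linalg.norm(clean))
```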

08:50
Symplectic nonlinear splitting methods for Hamiltonian Monte Carlo

ABSTRACT. Hamiltonian Monte Carlo (HMC) has become a central tool for sampling from high-dimensional probability distributions in Bayesian inference, statistical learning, and stochastic simulation. Its efficiency depends critically on the numerical integrator used to generate proposals. Standard explicit splittings, such as leapfrog, can become severely step-size limited on targets with strong anisotropy, curvature, or nonlinear coupling. Fully implicit symplectic methods can improve robustness, but at a significantly higher computational cost. This motivates the search for integrators that retain the geometric structure needed by HMC while allowing more flexible treatment of difficult nonlinear target geometry. In this talk I will describe a general framework for symplectic nonlinear splitting methods for HMC based on two-point discrete variational models. We introduce a nonlinear two-point discrete potential that couples the current and next positions and defines a discrete force consistent with the original target on the diagonal. The resulting update map is then obtained from a discrete variational principle. This provides a systematic way to build proposal integrators that are symplectic by construction and that can be designed to reflect geometric features of the target distribution. Our approach accommodates nonlinear and geometry-adapted splittings that are difficult to express in standard operator-splitting form. Our proposed framework suggests natural opportunities for multilevel and surrogate constructions. We explore the design of coarse models to approximate stiff nonlinear components, low-rank or quadratic surrogates to define cheaper implicit stages, and multilevel correction ideas to accelerate the inner solves while respecting the outer HMC structure. We present results on a collection of standard benchmark targets exhibiting common challenges such as anisotropy, strong coupling, and curved or locally stiff geometry. We also discuss practical heuristics for selecting efficient splittings based on such features, with the goal of balancing robustness of the implicit stage solves against the acceptance behavior needed for effective HMC proposals.
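
For reference, a minimal sketch of the standard leapfrog HMC baseline that the abstract contrasts against, on an illustrative anisotropic Gaussian target; all parameters are illustrative.

```python
# Baseline HMC with the standard explicit leapfrog splitting.
import numpy as np

rng = np.random.default_rng(1)
prec = np.diag([1.0, 100.0])                # precision of the Gaussian target

def U(q):     return 0.5 * q @ prec @ q     # potential = -log density
def gradU(q): return prec @ q

def leapfrog(q, p, eps, L):
    p = p - 0.5 * eps * gradU(q)            # half kick
    for _ in range(L - 1):
        q = q + eps * p                     # drift
        p = p - eps * gradU(q)              # full kick
    q = q + eps * p                         # final drift
    p = p - 0.5 * eps * gradU(q)            # final half kick
    return q, p

def hmc(q, n_samples, eps=0.09, L=20):
    samples = []
    for _ in range(n_samples):
        p = rng.standard_normal(q.size)     # resample momentum
        q_new, p_new = leapfrog(q, p, eps, L)
        dH = U(q_new) + 0.5 * p_new @ p_new - U(q) - 0.5 * p @ p
        if rng.random() < np.exp(-dH):      # Metropolis accept/reject
            q = q_new
        samples.append(q)
    return np.array(samples)

print(hmc(np.ones(2), 2000).std(axis=0))    # roughly (1.0, 0.1)
```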

09:15
The Wilson-Dirac Operator and its Spectrum
PRESENTER: Patrick Oare

ABSTRACT. Lattice quantum chromodynamics (QCD) is an ab-initio numerical framework for studying QCD, the theory of the strong nuclear force, in which quantum fields are discretized on a spacetime lattice and represented as finite-dimensional complex vectors. The Dirac operator acts on the space of these fields and, upon discretization, is realized as a large, sparse matrix. Its spectral properties carry direct physical significance: real eigenvalues correspond to topologically non-trivial field configurations, and its low modes limit convergence in Krylov-based linear solvers.

One of the most common discretizations of the Dirac operator is the Wilson-Dirac operator. This operator is non-normal but enjoys a symmetry called gamma-5 hermiticity, which constrains its eigenvalues to either be real or come in complex-conjugate pairs, and permits the construction of an associated hermitian operator which is amenable to the Lanczos algorithm. Of particular interest is understanding the interior eigenmodes of the Wilson-Dirac operator, which are difficult to compute with conventional Krylov solvers.
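
The symmetry in the preceding paragraph, written out in the notation standard in lattice QCD:

```latex
% Gamma-5 hermiticity of the Wilson-Dirac operator D:
\[
  D^{\dagger} = \gamma_5 \, D \, \gamma_5 , \qquad \gamma_5^2 = I ,
\]
% so D is similar to D^{\dagger} (its spectrum is closed under complex
% conjugation: eigenvalues are real or come in conjugate pairs), and
\[
  Q = \gamma_5 D = Q^{\dagger}
\]
% is the associated hermitian operator amenable to the Lanczos algorithm.
```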

In this talk, I will discuss ongoing work to understand the relationship between the complex spectrum of the Wilson-Dirac operator and the real spectrum of its hermitian counterpart. I will present the first application of the Krylov-Schur algorithm and its extension to harmonic Ritz values to lattice QCD, and demonstrate that the computed spectra visually and unambiguously reveal topological sector changes across different gauge field configurations.

09:40-10:20 Coffee and Tea Break
10:20-12:00 Session 8A: Machine Learning and Neural Networks (Part 1 of 2)
10:20
Structure-Guided Gauss-Newton Method: LSNN for Linear Advection-Reaction Equation
PRESENTER: César Herrera

ABSTRACT. The structure-guided Gauss-Newton (SgGN) method was introduced in Cai et al. (2024) for solving the non-convex optimization problem arising from the least-squares approximation to a given function using a shallow ReLU neural network (NN). SgGN alternates between the linear and nonlinear parameters, updating the former by a linear solve and the latter by a modified Gauss-Newton (GN) method. The modified GN properly handles the singularity of the GN matrix and is much more efficient than the generic Levenberg-Marquardt method. By overcoming a major difficulty stemming from the reduced smoothness of ReLU NN functions, this work extends SgGN to the non-convex optimization problem arising from the least-squares NN (LSNN) method for the two-dimensional linear advection-reaction equation. The resulting SgGN achieves highly accurate NN approximation in significantly fewer iterations than commonly used optimizers such as Adam, BFGS, and Levenberg-Marquardt.

10:45
Preconditioning Via Spectral Density Driven Graph Neural Networks
PRESENTER: Francesco Brarda

ABSTRACT. We propose a learning-based framework for constructing sparse approximate inverse preconditioners that promote spectral clustering. The method combines two key components. First, we introduce a Line-Graph with Virtual Node (LG-VN) architecture that transforms a matrix adjacency graph into its line graph, enabling message passing directly on matrix entries. A virtual node provides global context at constant communication depth, enhancing expressivity for long-range interactions. Second, we approximate the spectral density of the preconditioned operator and penalize spectral mass away from unity to encourage eigenvalue clustering. To stabilize optimization with nonconvex spectral objectives, we employ two-stage training: a Frobenius-norm-based phase to obtain a robust initial factorization, followed by spectral fine-tuning. The preconditioner is kept in factorized form, with a dual-head architecture predicting the two factors for nonsymmetric matrices, so that inference requires only sparse matrix-vector multiplications. Experiments on representative PDE problems demonstrate consistent reductions in Krylov iteration counts and improved robustness compared to classical factored sparse approximate inverse methods.

11:10
Spatial and Channel Refinement of Convolutional Neural Networks
PRESENTER: Jonas Actor

ABSTRACT. While machine learning methods are playing an increasingly prominent role in accomplishing grid-based and imaging tasks, they still lack practical theory to guide decisions about their architecture, parameterization, and hyperparameter selection. To address this issue, we use Kolmogorov-Arnold Networks (KANs), and their connections to multichannel multilayer perceptrons (MLPs), to create a convolutional architecture that allows for simultaneously performing geometric refinement of activation functions and spatial refinement of convolution operators. By performing this refinement, we achieve better performance in terms of accuracy and efficiency. We demonstrate this method on a handful of examples that highlight the method’s capabilities, including a principled nested iteration method to refine the popular Xception architecture, and an application for a convolutional PDE learning problem defined on a regular grid.

11:35
What Makes a Good Preconditioner for Data Science?
PRESENTER: Mitchell Scott

ABSTRACT. Stochastic Gradient Descent (SGD) often slows in the late stage of training due to anisotropic curvature and gradient noise. We analyze preconditioned SGD in the geometry induced by a symmetric positive definite matrix $\mathbf{M}$. Our bounds make explicit how both the convergence rate and the stochastic noise floor depend on $\mathbf{M}$. For nonconvex objectives, we establish a basin-stability guarantee in a local $\mathbf{M}$-metric neighborhood around a minimizer set: under local smoothness and a local PL condition, we give an explicit lower bound on the probability that the iterates remain in the basin up to a time horizon. Our framework covers both diagonal/adaptive and curvature-aware preconditioners and yields a practical criterion: choose $\mathbf{M}$ to improve local conditioning while attenuating noise in the $\mathbf{M}^{-1}$-norm. While we originally considered scientific machine learning (SciML) problems, where reaching small training losses under stochastic updates is closely tied to physical fidelity, numerical stability, and constraint satisfaction, we now use tools from random matrix theory (RMT) to study generalizability to other machine learning models. Specifically, we analyze the theoretical properties of covariance preconditioners to obtain, a priori, a metric that can inform the benefit of preconditioning. Experiments on a quadratic diagnostic and three SciML benchmarks support the predicted rate-floor behavior, while experiments on elastic regression reinforce RMT's efficacy in understanding complicated loss surfaces.
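
A minimal sketch of the preconditioned update $x \leftarrow x - \eta \mathbf{M}^{-1} g$ on an illustrative diagonal quadratic; it exhibits the rate/noise-floor tradeoff qualitatively, nothing more.

```python
# Preconditioned SGD on f(x) = 0.5 * sum h_i x_i^2 with noisy gradients.
# Problem data, noise level, and step sizes are illustrative.
import numpy as np

rng = np.random.default_rng(2)
h = np.array([1.0, 10.0, 100.0])        # diagonal Hessian

def noisy_grad(x):
    return h * x + 0.1 * rng.standard_normal(x.size)

def psgd(x, Minv, eta, iters=2000):
    for _ in range(iters):
        x = x - eta * Minv * noisy_grad(x)   # preconditioned SGD step
    return x

x0 = np.ones(3)
plain = psgd(x0, Minv=np.ones(3), eta=0.01)  # M = I: rate limited by conditioning
curv  = psgd(x0, Minv=1.0 / h, eta=0.5)      # M = Hessian: uniform rate,
print(np.abs(plain), np.abs(curv))           # but a different noise floor
```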

10:20-12:00 Session 8B: Matrix approximations and decompositions
10:20
Parametric Hierarchical Matrix Approximations to Kernel Matrices
PRESENTER: Abraham Khan

ABSTRACT. Kernel matrices are ubiquitous in computational mathematics, often arising from applications in machine learning and scientific computing. In two or three spatial or feature dimensions, such matrices can be approximated efficiently by a class of matrices known as hierarchical matrices. A hierarchical matrix consists of a hierarchy of small near-field blocks (or sub-matrices) stored in a dense format and large far-field blocks approximated by low-rank matrices. Standard methods for forming hierarchical matrices do not account for the fact that kernel matrices depend on specific hyperparameters; for example, in the context of Gaussian processes, hyperparameters must be optimized over a fixed parameter space. We introduce a new class of hierarchical matrices, namely, parametric (parameter-dependent) hierarchical matrices. Members of this new class are parametric $\mathcal{H}$-matrices and parametric $\mathcal{H}^{2}$-matrices. The construction of a parametric hierarchical matrix follows an offline-online paradigm. In the offline stage, the near-field and far-field blocks are approximated using polynomial approximation and tensor compression. In the online stage, for a particular hyperparameter, the parametric hierarchical matrix is instantiated efficiently as a standard hierarchical matrix. The asymptotic costs for storage and computation in the offline stage are comparable to the corresponding standard approaches for forming a hierarchical matrix. However, the online stage of our approach requires no new kernel evaluations, and the far-field blocks can be computed more efficiently than with standard approaches. Numerical experiments show over $100\times$ speedups compared with existing techniques.

10:45
A Discrete-Sine-Transform based Preconditioner for Adaptive Meshes
PRESENTER: Kate Wall

ABSTRACT. We present a preconditioner for solving fractional differential equations (FDEs) on an adaptive mesh. Adaptive refinement of the problem domain results in a stiffness matrix with Toeplitz blocks along the main diagonal, while the FDE yields a dense stiffness matrix whose off-diagonal blocks are stored as low-rank approximations. Our adaptive preconditioner utilizes ideas from the circulant preconditioner of Chan and Strang [SIAM Journal on Scientific Computing, 1989] and the discrete-sine-transform based preconditioner of Bini and Di Benedetto [Association for Computing Machinery, 1990]. It takes advantage of the Toeplitz blocks on the diagonal and accounts for the low-rank nature of the off-diagonal blocks. We demonstrate the adaptive preconditioner's effectiveness at accelerating convergence for our systems and emphasize its efficient application. This work presents theoretical results about the spectral clustering of the preconditioned system. In order to prove these results, special consideration is given to how the low-rank blocks perturb the eigenvalues of the Toeplitz block-diagonal system. Numerical tests are used to verify our assumptions and validate our results.
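
A minimal sketch of a discrete-sine-transform solve for a single tridiagonal symmetric Toeplitz block, which DST-I diagonalizes exactly; the adaptive, low-rank block machinery of the talk is not reproduced here.

```python
# Fast solve with a tridiagonal symmetric Toeplitz matrix via DST-I,
# using its known eigendecomposition. Sizes and entries are illustrative.
import numpy as np
from scipy.fft import dst, idst

n = 200
a, b = 2.0, -1.0                          # T = tridiag(b, a, b)
k = np.arange(1, n + 1)
lam = a + 2.0 * b * np.cos(k * np.pi / (n + 1))   # DST-I eigenvalues of T

def apply_Tinv(r):
    """z = S diag(lam)^{-1} S r, with S the orthonormal DST-I (S = S^T = S^{-1})."""
    return idst(dst(r, type=1, norm='ortho') / lam, type=1, norm='ortho')

# Verify against a direct solve.
T = a * np.eye(n) + b * (np.eye(n, k=1) + np.eye(n, k=-1))
r = np.random.default_rng(3).standard_normal(n)
print(np.linalg.norm(apply_Tinv(r) - np.linalg.solve(T, r)))   # ~ machine eps
```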

11:10
Caracal: A GPU-Resident Sparse LU Solver with Lightweight Fine-Grained Scheduling
PRESENTER: Jie Ren

ABSTRACT. We address inefficiencies in task scheduling, memory management, and scalability in GPU-resident sparse LU factorization with a two-level approach: coarse-grained blocks are scheduled sequentially, while the fine-grained blocks they contain are managed by a lightweight static scheduler that enables multi-stream parallelism. Additionally, we design an intelligent memory caching mechanism for the fine-grained scheduler, which retains frequently accessed data in GPU memory. To further enhance scalability, we introduce a distributed memory design that partitions the input matrix using a 1D block-cyclic distribution and optimizes inter-GPU communication via NVLink. The multi-GPU design reaches a computational throughput of 6.46 TFLOP/s on four A100 GPUs, demonstrating promising scalability. This is up to a 7x speedup over the latest SuperLU_DIST with 3D communication, 94x over PanguLU, 16x over PasTiX, and 10x over our own coarse-grained dynamic scheduling implementation, while reaching up to 21% of the A100's theoretical peak performance.

11:35
Matrix Spectrum Quantile Approximation via Moments
PRESENTER: Mikhail Lepilov

ABSTRACT. Much attention has been paid to methods of quickly approximating the spectrum of large matrices, spurred by the possibility of detecting low numerical rank or large spectral gaps. In recent years, techniques utilizing Stochastic Lanczos Quadrature (SLQ) and related ideas to match certain moments of the empirical spectrum distribution have been devised and analyzed, and probabilistic bounds on the spectral density, for example using the Wasserstein-1 distance, have been developed. In this talk, we build on this approach, using the information obtained from SLQ to get more precise guarantees on the quantiles of the empirical spectral distribution. We showcase how to leverage this new quantile estimation framework to enable better estimates on the numerical rank and detection of spectral gaps.
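
A minimal SLQ sketch, assuming full reorthogonalization and a toy diagonal test matrix: each probe contributes Ritz values as quadrature nodes and squared first eigenvector components as weights, and pooling them gives an approximate spectral CDF from which quantiles can be read off.

```python
# Stochastic Lanczos quadrature for an approximate spectral CDF; quantiles
# follow by inverting the CDF. Parameters and test matrix are illustrative.
import numpy as np
from scipy.linalg import eigh_tridiagonal

def slq_cdf(A, m=30, n_probes=10, seed=4):
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    nodes, weights = [], []
    for _ in range(n_probes):
        V = np.zeros((n, m))
        alpha, beta = np.zeros(m), np.zeros(m - 1)
        v = rng.standard_normal(n)
        V[:, 0] = v / np.linalg.norm(v)
        for j in range(m):                             # Lanczos iteration
            w = A @ V[:, j]
            alpha[j] = V[:, j] @ w
            w -= V[:, :j + 1] @ (V[:, :j + 1].T @ w)   # full reorthogonalization
            if j < m - 1:
                beta[j] = np.linalg.norm(w)
                V[:, j + 1] = w / beta[j]
        theta, Y = eigh_tridiagonal(alpha, beta)
        nodes.append(theta)
        weights.append(Y[0, :] ** 2)                   # Gauss quadrature weights
    nodes = np.concatenate(nodes)
    weights = np.concatenate(weights) / n_probes       # total mass 1
    order = np.argsort(nodes)
    return nodes[order], np.cumsum(weights[order])     # estimated spectral CDF

A = np.diag(np.linspace(0.0, 1.0, 500))                # known toy spectrum
x, F = slq_cdf(A)
print(x[np.searchsorted(F, 0.5)])                      # estimated median, ~0.5
```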

12:00-14:40 Lunch Break
14:40-15:20 Coffee and Tea Break
15:20-17:00 Session 9: Multigrid for indefinite systems and multigrid reduction in time
15:20
Scalable multigrid solver for the Helmholtz equation: real-shifted coarse grid correction
PRESENTER: Rachel Yovel

ABSTRACT. We present a convergent and scalable multigrid solver for high-frequency Helmholtz equations. Standard multigrid does not converge for high-frequency Helmholtz problems, and a common cure is adding a complex shift and using the shifted operator as a preconditioner. Nevertheless, the complex shift prevents scalability. In this work we present a new method that achieves scalable convergence of a 3-level cycle without a complex shift. Our key idea is to modify the coarse grid correction by real-shifting the coarsest-grid Galerkin operator, to correct the numerical dispersion between the grids. We show that this real-shifted coarse grid correction leads to a scalable 3-level method for problems with 12 grid points per wavelength on the fine grid. For problems with 10 grid points per wavelength, we show that our method combined with a very modest complex shift outperforms the standard complex shifted Laplacian method by an order of magnitude. We demonstrate wavenumber-independent convergence for heterogeneous and real geophysical media in 2D and 3D.
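
For context, the standard complex shifted Laplacian preconditioner that this abstract compares against, in commonly used notation (assumed here, not taken from the talk):

```latex
% Complex shifted Laplacian (notation assumed): for the Helmholtz operator
% A = -\Delta - k^2, the standard approach preconditions with
\[
  M_{\beta} = -\Delta - (1 + \mathrm{i}\beta)\, k^2 , \qquad 0 < \beta \le 1 ,
\]
% which multigrid handles well, but whose outer Krylov iteration counts
% grow with the wavenumber k; this is the scalability barrier that the
% real-shifted coarse grid correction above is designed to avoid.
```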

15:45
Multigrid for FEEC using Mass-Lumping and Transforming Smoothers: Algorithms and Results

ABSTRACT. For several PDEs naturally posed in the de Rham complex, standard (structure-preserving) variational discretizations lead to linear systems that can be indefinite (e.g., in mixed or saddle-point form), so classical point smoothers may be ineffective for multigrid. This situation arises, for example, for the Hodge-Dirac operator, mixed Hodge-Laplacians, and Maxwell-type formulations.

We propose a multigrid preconditioning framework that combines mass-lumping and transforming smoothers. On the discretization side, we replace the consistent FEEC (finite element exterior calculus) mass matrices by explicitly invertible mass-lumped mass matrices. On the solver side, we apply transforming smoothers that map the operator to a block form with positive definite diagonal blocks, enabling Gauss-Seidel-type relaxation. Under mild $h$-uniform norm-equivalence assumptions, we prove that the consistent FEEC system, preconditioned by the mass-lumped one, has uniformly bounded condition numbers in the case of trivial topology. This motivates using the mass-lumped MG cycles as preconditioners for the consistent FEEC system. The uniform convergence of the resulting multigrid cycles remains an open theoretical question.

We instantiate the approach for the Hodge-Dirac problem, the mixed Hodge-Laplacian, and a saddle-point system arising from magnetostatics. Extensive numerical experiments in MFEM validate the robustness of this preconditioner, demonstrating efficient V- and W-cycles in 2D/3D, and mesh-width-independent preconditioned GMRES iteration counts for the FEEC problem on unstructured meshes.

16:10
Parallel-in-iteration optimization using multigrid reduction-in-time

ABSTRACT. Standard gradient-based iteration algorithms for optimization, such as gradient descent and its various proximal-based extensions to nonsmooth problems, are known to converge slowly for ill-conditioned problems, sometimes requiring many tens of thousands of iterations in practice. Since these iterations are computed sequentially, they may present a computational bottleneck in large-scale parallel simulations. In this work, we present a "parallel-in-iteration" framework that allows one to parallelize across these iterations using multiple processors with the objective of reducing the wall-clock time needed to solve the underlying optimization problem. Our methodology is based on re-purposing parallel time integration algorithms for time-dependent differential equations, motivated by the fact that optimization algorithms often have interpretations as discretizations of time-dependent differential equations (such as gradient flow). Specifically in this work, we use the parallel-in-time method of multigrid reduction-in-time (MGRIT), but note that our approach permits in principle the use of any other parallel-in-time method. We numerically demonstrate the efficacy of our approach on two different model problems, including a standard convex quadratic problem and the nonsmooth elastic obstacle problem in one and two spatial dimensions. For our model problems, we observe fast MGRIT convergence analogous to its prototypical performance on partial differential equations of diffusion type. Theoretically predicted parallel speedup results are also provided.
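
The identification the abstract relies on, in standard notation:

```latex
% Gradient descent as forward Euler on the gradient flow ODE:
\[
  \dot{x}(t) = -\nabla f\bigl(x(t)\bigr)
  \quad\Longrightarrow\quad
  x_{k+1} = x_k - \eta \,\nabla f(x_k) ,
\]
% so a parallel-in-time method such as MGRIT can be applied across the
% iteration index k exactly as it would be across time steps.
```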

16:35
Full layer-parallel training with PyMGRIT for forward and backward passes in neural networks
PRESENTER: Ryo Yoda

ABSTRACT. Layer-parallel training for deep neural networks introduces layer parallelism in addition to data parallelism and model parallelism by interpreting these networks as time-stepping systems and applying multigrid-in-time across network layers. Our previous work studied layer-parallel training through a Python-centric implementation based on the PyMGRIT library, focusing on the forward pass. In this work, we extend the framework to the backward pass, thereby moving toward a full layer-parallel training framework covering both forward and backward passes.

The main difficulty is that the backward pass follows the reverse layer direction, while PyMGRIT is formulated for forward-time execution. We resolve this issue by introducing rank reversal on communicators and local array reversal, allowing the backward pass to reuse the same multigrid-in-time components used in the forward pass. Numerical experiments show training behavior comparable to that of TorchBraid, a state-of-the-art layer-parallel library, in terms of loss and accuracy, while achieving reduced runtime in representative MNIST settings.
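
A minimal sketch of the rank-reversal idea, assuming mpi4py; this is illustrative and not the PyMGRIT implementation itself.

```python
# Rank reversal on a communicator plus local array reversal, so that a
# forward-time multigrid-in-time sweep marches through layers in reverse.
from mpi4py import MPI

comm = MPI.COMM_WORLD
size, rank = comm.Get_size(), comm.Get_rank()

# Same processes, opposite ordering: the key argument sets the new rank
# order, so rank 0 of `backward` owns the *last* layers of the network.
backward = comm.Split(color=0, key=size - 1 - rank)

# Each process also reverses its local chunk of per-layer data.
local_layers = list(range(rank * 4, (rank + 1) * 4))   # toy per-rank layers
local_layers.reverse()

print(f"world rank {rank} -> backward rank {backward.Get_rank()}")
```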