ArXiv

Don't Get Your Kroneckers in a Twist: Gaussian Processes on High-Dimensional Incomplete Grids

Authors
Mads Greisen Højlund, August Smart Lykke-Møller, Henry Moss...
Categories
cs.LG
arXiv
https://arxiv.org/abs/2605.08036v1
PDF
https://arxiv.org/pdf/2605.08036v1

Brief

CUTS-GPR targets the computational bottleneck of Gaussian process regression in high-dimensional problems by exploiting an additive kernel on an incomplete grid to reveal structure that yields an extremely fast kernel matrix–vector product. The method shows near-linear (or linear) scaling in N and low-order polynomial scaling in D, with benchmarks on billions of points and thousands of dimensions and a full GPR + hyperparameter run in hours for N=447,265, D=24, enabling Bayesian modeling of high-dimensional potential energy surfaces in computational chemistry.

Why it matters

CUTS-GPR introduces an extremely fast kernel matrix–vector product that attains near-linear or linear scaling with training size N and low-order polynomial scaling with dimensionality D by combining an additive kernel with an incomplete grid structure.

Key details

  • Authors demonstrate scalability with benchmarks involving billions of data points and thousands of dimensions; a full Gaussian process regression (including hyperparameter optimization) was completed in hours for N = 447,265 and D = 24, enabling Bayesian modeling of high-dimensional potential energy surfaces.
Source evidence

Abstract

We introduce CUTS-GPR, a new method for performing numerically exact Gaussian process regression (GPR) in high-dimensional settings. The key component of CUTS-GPR is an extremely fast kernel matrix-vector product, which exhibits near-linear or even linear scaling with the amount of training data, $N$, and low-order polynomial scaling with dimensionality, $D$. This is obtained by combining an additive kernel with an incomplete grid and exploiting the resulting structure of the kernel matrix. We demonstrate the scalability of the matrix-vector product by running benchmarks with billions of data points and thousands of dimensions. Full GPR calculations, including hyperparameter optimization, are completed in a matter of hours for $N = 447 265$ and $D = 24$. We demonstrate that our CUTS-GPR enables Bayesian modeling of high-dimensional potential energy surfaces - a longstanding challenge in computational chemistry.

Comment: 51 pages, 8 figures