ArXiv

Keeping Score: Efficiency Improvements in Neural Likelihood Surrogate Training via Score-Augmented Loss Functions

Authors
Alexander Shen, Mikael Kuusela
Categories
stat.ML, cs.LG
arXiv
https://arxiv.org/abs/2605.12118v1
PDF
https://arxiv.org/pdf/2605.12118v1

Brief

Shen and Kuusela (2026) introduce a score-augmented loss for neural likelihood surrogates in simulation-based inference (SBI). The loss augments the standard binary cross-entropy with the exact score ∇_θ log p(x | θ) and with adaptive weighting based on loss gradients. In case studies on network dynamics and spatial processes, the approach improves surrogate quality and, in some cases, matches the downstream inference performance of 10× more training data with less than a 1.1× increase in training time.
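
A short derivation may help show why exact scores can be attached to a classifier-based surrogate at all. It is a sketch under assumptions not stated in the abstract: the classifier is taken to separate balanced samples from the joint p(x | θ) p(θ) and from the product p(x) p(θ), and the surrogate logit f_φ, the weight λ, and the squared-error form of the penalty are illustrative notation rather than the authors' exact formulation.

```latex
% Bayes-optimal classifier for joint vs. product-of-marginals pairs (balanced classes)
d^*(x, \theta) = \frac{p(x \mid \theta)\, p(\theta)}{p(x \mid \theta)\, p(\theta) + p(x)\, p(\theta)}
\quad\Longrightarrow\quad
\operatorname{logit} d^*(x, \theta) = \log p(x \mid \theta) - \log p(x)

% p(x) does not depend on \theta, so the logit's \theta-gradient is the score
\nabla_\theta \operatorname{logit} d^*(x, \theta) = \nabla_\theta \log p(x \mid \theta)

% One possible score-augmented objective for a surrogate logit f_\phi
\mathcal{L}(\phi) = \mathcal{L}_{\mathrm{BCE}}(\phi)
  + \lambda\, \mathbb{E}\left[ \left\| \nabla_\theta f_\phi(x, \theta) - \nabla_\theta \log p(x \mid \theta) \right\|^2 \right]
```

Under these assumptions the score term is minimized by the same function that minimizes the cross-entropy, so it adds supervision per simulated pair without changing the target of the surrogate.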

Why it matters

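For stochastic process models, expensive likelihood evaluations often bottleneck parameter inference, and the quality of an SBI surrogate trades off directly against simulation cost. By loosening the black-box assumption of SBI and adding exact score information ∇_θ log p(x | θ), with adaptive gradient-based weighting, to the binary cross-entropy loss, the method exploits model structure to improve that tradeoff at far lower cost than generating more training data (Shen & Kuusela, 2026).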
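
A score-augmented loss of this kind can be sketched on a toy problem. The example below is a minimal illustration, not the authors' implementation. It assumes a hypothetical model x ~ N(θ, 1) with prior θ ~ N(0, 1), whose exact score is x − θ, and a hypothetical quadratic surrogate logit. It relies on the fact that a classifier separating joint pairs (x, θ) from pairs with θ redrawn independently has optimal logit log p(x | θ) − log p(x), whose θ-gradient is the score. The summary says only that the weighting is adaptive and based on loss gradients; the gradient-norm ratio in `adaptive_weight` is a stand-in heuristic chosen for illustration, and all function names are invented here.

```python
import math
import random

def score(x, theta):
    """Exact score d/dtheta log p(x | theta) for the toy model x ~ N(theta, 1)."""
    return x - theta

def features(x, theta):
    """Features of a hypothetical quadratic logit f(x, theta) = phi . features."""
    return [1.0, x * x, x * theta, theta * theta]

def dfeatures_dtheta(x, theta):
    """Derivative of each feature with respect to theta."""
    return [0.0, 0.0, x, 2.0 * theta]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def norm(v):
    return math.sqrt(dot(v, v))

def sigmoid(f):
    """Numerically stable logistic function."""
    if f >= 0.0:
        return 1.0 / (1.0 + math.exp(-f))
    e = math.exp(f)
    return e / (1.0 + e)

def losses_and_grads(phi, batch):
    """Return (bce, score_mse, grad_bce, grad_score), each averaged over batch.

    batch holds (x, theta, y) triples: y = 1 for a pair drawn from the joint,
    y = 0 for a pair whose theta was drawn independently of x.
    """
    n = len(batch)
    bce = score_mse = 0.0
    g_bce = [0.0] * len(phi)
    g_score = [0.0] * len(phi)
    for x, theta, y in batch:
        feats, dfeats = features(x, theta), dfeatures_dtheta(x, theta)
        f = dot(phi, feats)
        # BCE with logits, written as softplus(f) - y * f for stability.
        bce += (max(f, 0.0) + math.log1p(math.exp(-abs(f))) - y * f) / n
        # Score residual: theta-gradient of the logit minus the exact score.
        resid = dot(phi, dfeats) - score(x, theta)
        score_mse += resid * resid / n
        for k in range(len(phi)):
            g_bce[k] += (sigmoid(f) - y) * feats[k] / n
            g_score[k] += 2.0 * resid * dfeats[k] / n
    return bce, score_mse, g_bce, g_score

def adaptive_weight(g_bce, g_score, eps=1e-12):
    """Assumed heuristic: scale the score term to match the BCE gradient norm."""
    return norm(g_bce) / (norm(g_score) + eps)

def make_batch(n, rng):
    """Simulate n joint pairs and n independent pairs with theta ~ N(0, 1)."""
    batch = []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)
        x = rng.gauss(theta, 1.0)
        batch.append((x, theta, 1))                # joint pair
        batch.append((x, rng.gauss(0.0, 1.0), 0))  # theta redrawn independently
    return batch

rng = random.Random(0)
batch = make_batch(200, rng)
phi = [0.0, 0.0, 0.0, 0.0]  # untrained surrogate
bce, score_mse, g_bce, g_score = losses_and_grads(phi, batch)
lam = adaptive_weight(g_bce, g_score)
total_loss = bce + lam * score_mse
total_grad = [gb + lam * gs for gb, gs in zip(g_bce, g_score)]
```

`total_grad` can be passed to any gradient-based optimizer, recomputing `lam` at each step. In this toy model the exact log-ratio is itself quadratic, with coefficients (½ log 2, −¼, 1, −½), so the score term vanishes at the true parameters. The simulated pairs are reused for both terms, which is the sense in which score information comes without extra simulation in this sketch.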

Key details

  • On network-dynamics and spatial-process case studies, the method improves surrogate quality and, in some cases, yields downstream inference performance equivalent to a 10× increase in training data with less than a 1.1× increase in training time (under 10% extra).

Source evidence

Abstract

For stochastic process models, parameter inference is often severely bottlenecked by computationally expensive likelihood functions. Simulation-based inference (SBI) bypasses this restriction by constructing amortized surrogate likelihoods, but most SBI methods assume a black-box data generating process. While these surrogates are exact in the limit of infinite training data, practical scenarios force a strict tradeoff between model quality and simulation cost. In this work, we loosen the black-box assumption of SBI to improve this tradeoff for structured stochastic process models. Specifically, for neural network likelihood surrogates trained via probabilistic classification, we propose to augment the standard binary cross-entropy loss with exact score information $\nabla_\theta \log p(x \mid \theta)$ and adaptive weighting based on loss gradients. We evaluate our approach on case studies involving network dynamics and spatial processes, demonstrating that our method improves surrogate quality at a drastically lower computational cost than generating more training data. Notably, in some cases, our approach achieves downstream inference performance equivalent to a 10x increase in training data with less than a 1.1x increase in training time.

Comment: 9 pages of main text, 9 pages of appendices, 13 figures