arXiv

Consistency Regularised Gradient Flows for Inverse Problems

Authors
Alessio Spagnoletti, Tim Y. J. Wang, Marcelo Pereyra...
Categories
stat.ML, cs.CV, cs.LG
arXiv
https://arxiv.org/abs/2605.07907v1
PDF
https://arxiv.org/pdf/2605.07907v1

Brief

The paper tackles imaging inverse problems with Vision–Language Latent Diffusion Models (LDMs) by introducing a unified Euclidean–Wasserstein-2 gradient flow that jointly samples the posterior and optimizes the text prompt in latent space. Paired with few-step latent text-to-image models, the method needs only a small number of neural function evaluations (NFEs) and avoids backpropagation through the autoencoder. The authors report state-of-the-art results on canonical inverse problems at significantly reduced computational cost. (Summary based on the abstract only.)

Why it matters

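Existing LDM-based inverse solvers typically need many NFEs and backpropagation through large pretrained components, which is computationally expensive and can degrade reconstruction quality. The paper instead proposes a Euclidean–Wasserstein-2 gradient-flow framework that performs posterior sampling and prompt optimization jointly in latent space, through a single flow that aligns the prior and posterior with the observed data.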
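
For orientation, one standard way to write a joint Euclidean–Wasserstein-2 gradient flow (taken from the particle-based maximum marginal likelihood literature, not from the paper, whose exact objective may differ) treats the prompt parameters θ as a Euclidean variable and the latent distribution q as a point in Wasserstein-2 space, and descends a single free energy F. The symbols are assumptions for illustration: z is the latent, y the observation, and p_θ(y, z) the prompt-conditioned joint density.

```latex
\mathcal{F}(\theta, q) = \int q(z)\,\log\frac{q(z)}{p_\theta(y, z)}\,dz,
\qquad
\partial_t \theta_t = \int \nabla_\theta \log p_{\theta_t}(y, z)\, q_t(z)\,dz,
\qquad
\partial_t q_t = \nabla_z \cdot \Big( q_t \,\nabla_z \log \frac{q_t}{p_{\theta_t}(y, \cdot)} \Big).
```

Minimizing F over q for fixed θ gives the posterior p_θ(z | y) and leaves F = -log p_θ(y), so a stationary point pairs a posterior sample distribution with a θ that maximizes the marginal likelihood of the data.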

Key details

  • Runs this single flow with few-step latent text-to-image models, enabling low-NFE inference without backpropagation through the autoencoder; this addresses the high NFE counts and costly backpropagation of existing inverse solvers built on latent diffusion models (Rombach et al., 2022).
  • Authors Alessio Spagnoletti, Tim Y. J. Wang, Marcelo Pereyra, and O. Deniz Akyildiz (arXiv:2605.07907v1, 2026-05-08) report state-of-the-art reconstructions on several canonical imaging inverse problems with significantly reduced computational cost.
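
A particle discretization of a joint flow of this kind can be sketched on a toy problem. This is a minimal sketch, assuming a scalar linear-Gaussian model in place of an LDM: the latent z has prior N(θ, 1), where θ stands in for the prompt embedding, and y = a·z + noise with noise standard deviation σ. Latent particles take unadjusted Langevin steps (the Wasserstein-2 part) while θ takes a gradient step using the particle average (the Euclidean part). This is the generic particle gradient descent scheme, not the authors' algorithm, and `joint_flow` and all of its parameters are hypothetical.

```python
import math
import random


def joint_flow(y, a=1.0, sigma=1.0, n_particles=500, n_steps=2000,
               step=0.05, theta0=0.0, seed=0):
    """Toy Euclidean-Wasserstein-2 particle flow (hypothetical sketch).

    Assumed model: z ~ N(theta, 1), y | z ~ N(a * z, sigma^2).
    Returns theta averaged over the second half of the run, plus the
    final latent particles.
    """
    rng = random.Random(seed)
    theta = theta0
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    noise_scale = math.sqrt(2.0 * step)
    theta_sum, theta_count = 0.0, 0
    for k in range(n_steps):
        # Euclidean part: particle average of d/dtheta log p(y, z | theta) = z - theta.
        grad_theta = sum(z - theta for z in particles) / n_particles
        # Wasserstein-2 part: one Langevin step per particle using
        # d/dz log p(y, z | theta) = -(z - theta) + a * (y - a * z) / sigma^2.
        particles = [
            z + step * (-(z - theta) + a * (y - a * z) / sigma ** 2)
            + noise_scale * rng.gauss(0.0, 1.0)
            for z in particles
        ]
        # Both updates above use the pre-step theta and particles.
        theta += step * grad_theta
        if k >= n_steps // 2:
            theta_sum += theta
            theta_count += 1
    return theta_sum / theta_count, particles


theta_hat, zs = joint_flow(y=2.0)
mean_z = sum(zs) / len(zs)
var_z = sum((z - mean_z) ** 2 for z in zs) / len(zs)
print(theta_hat)  # should be close to y / a = 2.0
print(var_z)      # should be close to sigma^2 / (a^2 + sigma^2) = 0.5
```

For this toy model the marginal likelihood is maximized at θ = y / a, where the posterior of z has mean y / a and variance σ² / (a² + σ²). With a = σ = 1 the particle spread should therefore sit near 0.5, slightly above it because of the Euler discretization of the Langevin step.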

Source evidence

Abstract

Vision-Language Latent Diffusion Models (LDMs) (Rombach et al., 2022) provide powerful generative priors for inverse problems. However, existing LDM-based inverse solvers typically require a large number of neural function evaluations (NFEs) and backpropagation through large pretrained components, leading to substantial computational costs and, in some cases, degraded reconstruction quality. We propose a unified Euclidean-Wasserstein-2 gradient-flow framework that jointly performs posterior sampling and prompt optimization in the latent space through a single flow that aligns the prior and posterior with the observed data. Combined with few-step latent text-to-image models, this formulation enables low-NFE inference without backpropagation through autoencoders. Experiments across several canonical imaging inverse problems show that our method achieves state-of-the-art performance with significantly reduced computational cost.