ArXiv

Consistency Regularised Gradient Flows for Inverse Problems

2026-05-08 · 15:45 UTC

Authors: Alessio Spagnoletti, Tim Y. J. Wang, Marcelo Pereyra...
Categories: stat.ML, cs.CV, cs.LG
arXiv: https://arxiv.org/abs/2605.07907v1
PDF: https://arxiv.org/pdf/2605.07907v1

Brief

The paper tackles imaging inverse problems with Vision–Language Latent Diffusion Models (LDMs) by introducing a unified Euclidean–Wasserstein-2 gradient-flow framework that jointly samples the posterior and optimizes prompts in latent space. Paired with few-step latent text-to-image models, the method enables low-NFE inference (few neural function evaluations) and avoids backpropagation through autoencoders. The authors report state-of-the-art results on several canonical inverse problems at significantly reduced computational cost. This summary is based on the abstract only.

Why it matters

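Existing LDM-based inverse solvers typically need many neural function evaluations (NFEs) and backpropagation through large pretrained components, which drives up compute and can degrade reconstruction quality. The proposed Euclidean–Wasserstein-2 gradient-flow framework performs posterior sampling and prompt optimization jointly in latent space, through a single flow that aligns the generative prior and the posterior with the observed data.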
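
For orientation, a Euclidean–Wasserstein-2 gradient flow in its generic form couples an ordinary (Euclidean) gradient flow on a finite-dimensional parameter with a Wasserstein-2 gradient flow on a probability distribution. The sketch below assumes a free-energy objective and uses illustrative symbols: \theta for a prompt-like parameter, q for the distribution over latents z, and y for the observation. None of this notation comes from the paper, and the abstract does not state the objective the authors actually use.

```latex
% Generic form, assumed for illustration; not taken from the paper.
\begin{align}
F(\theta, q) &= \int q(z)\,\log\frac{q(z)}{p_\theta(z, y)}\,\mathrm{d}z, \\
\dot{\theta}_t &= -\nabla_\theta F(\theta_t, q_t)
  = \int \nabla_\theta \log p_{\theta_t}(z, y)\, q_t(z)\,\mathrm{d}z, \\
\partial_t q_t &= \nabla_z \cdot \Big( q_t\, \nabla_z \log\frac{q_t}{p_{\theta_t}(\cdot, y)} \Big).
\end{align}
```

The last equation is the Fokker–Planck equation of the Langevin diffusion dZ_t = \nabla_z \log p_{\theta_t}(Z_t, y)\,dt + \sqrt{2}\,dW_t, which is why flows of this kind are usually simulated with noisy particles.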

Key details

Combines this single flow with few-step latent text-to-image models to enable low-NFE inference without backpropagation through autoencoders, addressing the high NFE counts and heavy backpropagation costs of prior solvers built on LDMs (Rombach et al., 2022).
Authors Alessio Spagnoletti, Tim Y. J. Wang, Marcelo Pereyra, and O. Deniz Akyildiz (arXiv:2605.07907v1, 2026-05-08) report state-of-the-art reconstructions on several canonical imaging inverse problems with significantly reduced computational cost.
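
To make the idea of a joint flow concrete, here is a minimal toy sketch. It is not the authors' algorithm. It assumes a hypothetical linear-Gaussian model in which a vector `theta` stands in for the prompt, the latent is `z ~ N(theta, I)`, and the observation is `y = A z` plus Gaussian noise of scale `sigma`. A cloud of particles follows Langevin dynamics (a standard particle discretization of a Wasserstein-2 gradient flow), while `theta` takes plain Euclidean gradient steps on the particle-averaged log-density. The function name `joint_flow` and all of its parameters are illustrative, and no diffusion model or autoencoder is involved.

```python
import numpy as np


def joint_flow(y, A, sigma, n_particles=200, step=0.01, n_steps=3000, seed=0):
    """Toy discretized Euclidean-Wasserstein-2 flow (illustrative only).

    Assumed model: z ~ N(theta, I), y = A z + N(0, sigma^2 I).
    theta plays the role of the prompt; the particles approximate p(z | y, theta).
    """
    rng = np.random.default_rng(seed)
    d = A.shape[1]
    theta = np.zeros(d)                        # parameter, Euclidean gradient flow
    z = rng.standard_normal((n_particles, d))  # particles, Wasserstein-2 flow

    for _ in range(n_steps):
        # Gradient of log p_theta(z, y) with respect to each particle z_i:
        # A^T (y - A z_i) / sigma^2 - (z_i - theta)
        residual = y - z @ A.T
        grad_z = residual @ A / sigma**2 - (z - theta)

        # Gradient with respect to theta, averaged over the particles.
        grad_theta = (z - theta).mean(axis=0)

        # Langevin step for the particles, gradient-ascent step for theta.
        noise = rng.standard_normal(z.shape)
        z = z + step * grad_z + np.sqrt(2.0 * step) * noise
        theta = theta + step * grad_theta

    return theta, z


# Denoising example: A is the identity, so theta should approach y.
y = np.array([1.0, -2.0])
theta, particles = joint_flow(y, A=np.eye(2), sigma=0.5)
print("theta:", theta)
print("particle mean:", particles.mean(axis=0))
```

With `A` set to the identity the problem reduces to denoising, and both `theta` and the particle mean should drift toward `y`. In the setting the abstract describes, the hand-written gradients would presumably come from a few-step latent text-to-image model instead, but the abstract does not say how.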

Source evidence

Abstract

Vision-Language Latent Diffusion Models (LDMs) (Rombach et al., 2022) provide powerful generative priors for inverse problems. However, existing LDM-based inverse solvers typically require a large number of neural function evaluations (NFEs) and backpropagation through large pretrained components, leading to substantial computational costs and, in some cases, degraded reconstruction quality. We propose a unified Euclidean-Wasserstein-2 gradient-flow framework that jointly performs posterior sampling and prompt optimization in the latent space through a single flow that aligns the prior and posterior with the observed data. Combined with few-step latent text-to-image models, this formulation enables low-NFE inference without backpropagation through autoencoders. Experiments across several canonical imaging inverse problems show that our method achieves state-of-the-art performance with significantly reduced computational cost.