ArXiv

Price of Quality: Sufficient Conditions for Sparse Recovery using Mixed-Quality Data

2026-05-11 · 15:24 UTC ·Youssef Chaabouni, David Gamarnik ·1 min read

Authors: Youssef Chaabouni, David Gamarnik
Categories: stat.ML, cs.IT, cs.LG, math.ST
arXiv: https://arxiv.org/abs/2605.10713v1
PDF: https://arxiv.org/pdf/2605.10713v1

Brief

Sparse recovery from mixed-quality measurements (n1 high-quality, n2 low-quality) is studied via information-theoretic and algorithmic sample-size conditions. The paper proves a linear 'Price of Quality' trade-off: agnostic decoders cap one high-quality sample at no more than two low-quality ones, whereas informed decoders can exploit variance knowledge to make the price unbounded. LASSO matches homogeneous-noise thresholds, depending only on average noise. Published ICLR 2026 (arXiv 2026-05-11).

Why it matters

Information-theoretic: for mixed-quality sparse recovery the sufficient sample-size condition is a linear trade-off in (n1, n2) defining a 'Price of Quality' — in the agnostic decoder case one high-quality sample is never worth more than two low-quality samples for this sufficient condition.

Key details

Algorithmic: for the agnostic setting the LASSO recovery threshold matches the homogeneous-noise case and depends only on the average noise level (showing computational recovery is robust to heterogeneity), while an informed decoder (aware of per-sample variances) can make the price of quality arbitrarily large.

Source evidence

Abstract

We study sparse recovery when observations come from mixed-quality sources: a small collection of high-quality measurements with small noise variance and a larger collection of lower-quality measurements with higher variance. For this heterogeneous-noise setting, we establish sample-size conditions for information-theoretic and algorithmic recovery. On the information-theoretic side, we show that it is sufficient for $(n1, n2)$ to satisfy a linear trade-off defining the Price of Quality: the number of low-quality samples needed to replace one high-quality sample. In the agnostic setting, where the decoder is completely agnostic to the quality of the data, it is uniformly bounded, and in particular one high-quality sample is never worth more than two low-quality samples for this sufficient condition to hold. In the informed setting, where the decoder is informed of per-sample variances, the price of quality can grow arbitrarily large. On the algorithmic side, we analyze the LASSO in the agnostic setting and show that the recovery threshold matches the homogeneous-noise case and only depends on the average noise level, revealing a striking robustness of computational recovery to data heterogeneity. Together, these results give the first conditions for sparse recovery with mixed-quality data and expose a fundamental difference between how the information-theoretic and algorithmic thresholds adapt to changes in data quality.

Comment: Published as a conference paper at ICLR 2026