ArXiv

Black-box model classification under the discriminative factorization

Authors
Hayden Helm, Merrick Ohata, Carey Priebe
Categories
cs.LG, stat.ML
arXiv
https://arxiv.org/abs/2605.07878v1
PDF
https://arxiv.org/pdf/2605.07878v1

Brief

Discriminative factorization is proposed to quantify and distinguish high- versus low-quality query sets for black-box model-level classification. The framework yields a theoretical result: the probability of chance-level classification decays exponentially with query budget. On three auditing tasks the authors show estimated parameters track empirical decay and enable query selection that mirrors oracle ordering.

Why it matters

Introduces the 'discriminative factorization' to evaluate query-set quality for black-box model-level classification; under this framework the probability of chance-level classification decays exponentially with the query budget.

Key details

  • Empirical validation on three auditing tasks (Helm, Ohata, Priebe; arXiv 2026-05-08) shows estimated factorization parameters predict the observed performance-decay rate, and query sets chosen by the estimated discriminative field reproduce the empirical ordering of oracle query sets.
Source evidence

Abstract

Access to modern generative systems is often restricted to querying an API (the ``black-box" setting) and many properties of the system are unknown to the user at inference time. While recent work has shown that low-dimensional representations of models based on the relationship between their embedded responses to a set of queries are useful for inferring model-level properties, the quality of these representations is highly sensitive to the query set. We introduce the \emph{discriminative factorization} to distinguish between high- and low-quality query sets in the context of black-box model-level classification. Under this framework, the probability of chance-level classification decays exponentially in the query budget. On three auditing tasks, estimated factorization parameters predict the empirical performance decay rate. We conclude by showing that query sets selected using the estimated discriminative field reproduce the empirical ordering of oracle query sets.