ArXiv

Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge

2026-05-11 · 16:30 UTC · Wenbo Zhang, Lijinghua Zhang, Liner Xiang...

Authors: Wenbo Zhang, Lijinghua Zhang, Liner Xiang...
Categories: cs.AI, cs.CL, stat.ML
arXiv: https://arxiv.org/abs/2605.10805v1
PDF: https://arxiv.org/pdf/2605.10805v1

Brief

The paper studies the costs and benefits of reasoning models in LLM-as-a-Judge settings. In controlled comparisons, explicit reasoning substantially improves judgment accuracy on structured verification tasks (e.g., math and coding), but offers limited or even negative gains on simpler evaluations and costs significantly more compute. To address this, Wenbo Zhang et al. introduce RACER (Robust Adaptive Cost-Efficient Routing), which adaptively routes examples between reasoning and non-reasoning judges under a fixed budget, formulating the choice as a constrained distributionally robust optimization problem with a KL-divergence uncertainty set. RACER admits an efficient primal–dual algorithm, comes with theoretical guarantees (uniqueness of the optimal policy, linear convergence), and in the authors' experiments attains better accuracy–cost trade-offs under distribution shift. (Based on the paper abstract; accepted at ICML 2026.)
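
For readers unfamiliar with KL uncertainty sets, the standard duality result behind this kind of robust objective may help. It is a textbook identity for a bounded loss, not RACER's exact formulation, which the abstract does not give. The symbols P (nominal distribution), Q (shifted distribution), ρ (radius of the KL ball), ℓ (per-example loss), and λ (dual variable) are illustrative assumptions.

```latex
\sup_{Q \,:\, \mathrm{KL}(Q \,\|\, P) \le \rho} \mathbb{E}_{Q}\big[\ell(X)\big]
  \;=\;
  \inf_{\lambda > 0} \Big\{ \lambda \rho + \lambda \log \mathbb{E}_{P}\big[ e^{\ell(X)/\lambda} \big] \Big\}
```

The worst case over all distributions in the KL ball therefore reduces to a one-dimensional search over λ, which is one reason such problems are often amenable to efficient dual methods.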

Why it matters

Controlled comparisons by Wenbo Zhang et al. (accepted at ICML 2026, posted 2026-05-11) show that explicit reasoning substantially improves LLM-as-a-Judge accuracy on structured verification tasks (notably math and coding) but yields limited or even negative gains on simpler evaluations.

Key details

Reasoning-based judges incur significantly higher computational cost, which motivates using them selectively under a fixed budget rather than on every example.
The authors propose RACER (Robust Adaptive Cost-Efficient Routing), which formulates routing as a constrained distributionally robust optimization problem with a KL-divergence uncertainty set to account for distribution shift. RACER admits an efficient primal–dual algorithm, has provable guarantees including uniqueness of the optimal policy and linear convergence, and achieves superior accuracy–cost trade-offs under distribution shift in the authors' experiments.
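
The abstract does not spell out RACER's formulation or solver, so the following is only a minimal sketch of the two generic ingredients it names, not the authors' algorithm. `route_under_budget` applies a Lagrangian threshold rule to meet an average-cost budget, and `kl_worst_case_mean` evaluates a KL-robust lower bound on accuracy through the standard one-dimensional dual. All function names, inputs (`gains`, `costs`, `budget`, `rho`), and the toy numbers are hypothetical.

```python
import math


def route_under_budget(gains, costs, budget, iters=60):
    """Choose which examples go to the reasoning judge (illustrative only).

    gains[i]: predicted accuracy gain from reasoning on example i (assumed given).
    costs[i]: extra cost of the reasoning judge on example i (must be > 0).
    budget:   maximum average extra cost per example (must be >= 0).
    Returns a list of booleans, True meaning "use the reasoning judge".
    """
    n = len(gains)

    def policy(mu):
        # Lagrangian rule: reason only when the gain exceeds the priced cost.
        return [g - mu * c > 0 for g, c in zip(gains, costs)]

    def avg_cost(decisions):
        return sum(c for c, d in zip(costs, decisions) if d) / n

    if avg_cost(policy(0.0)) <= budget:
        return policy(0.0)  # budget is not binding

    # At mu = hi nothing is routed, so the budget is trivially satisfied.
    lo, hi = 0.0, 2.0 * max(g / c for g, c in zip(gains, costs))
    for _ in range(iters):  # bisection on the dual multiplier mu
        mid = (lo + hi) / 2
        if avg_cost(policy(mid)) > budget:
            lo = mid
        else:
            hi = mid
    return policy(hi)


def kl_worst_case_mean(values, rho, lam_min=1e-4, lam_max=1e4, iters=200):
    """Lower bound on the mean of `values` over every reweighting Q of the
    empirical distribution P with KL(Q || P) <= rho.

    Standard dual: inf_Q E_Q[v] = sup_{lam > 0} -lam*rho - lam*log E_P[exp(-v/lam)].
    Any lam > 0 gives a valid lower bound; we search for the tightest one.
    """
    n = len(values)

    def dual(lam):
        # Numerically stable log-mean-exp of -v/lam.
        z = [-v / lam for v in values]
        m = max(z)
        log_mean_exp = m + math.log(sum(math.exp(x - m) for x in z) / n)
        return -lam * rho - lam * log_mean_exp

    # The dual is concave in lam, hence unimodal in log(lam): ternary search.
    lo, hi = math.log(lam_min), math.log(lam_max)
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if dual(math.exp(a)) < dual(math.exp(b)):
            lo = a
        else:
            hi = b
    return dual(math.exp((lo + hi) / 2))


if __name__ == "__main__":
    # Toy inputs (made up): the last example has a large gain but a high cost.
    gains = [0.30, 0.25, 0.02, -0.05, 0.01, 0.20]
    costs = [1.0, 2.0, 1.0, 1.0, 1.0, 4.0]
    print(route_under_budget(gains, costs, budget=0.6))

    # Hypothetical per-example correctness of some routed judge system.
    correct = [1, 1, 1, 0, 1, 0, 1, 1]
    print("nominal accuracy:", sum(correct) / len(correct))
    print("KL-robust lower bound:", kl_worst_case_mean(correct, rho=0.1))
```

In the toy run, the threshold rule skips the last example even though its predicted gain is large, because its cost per unit of gain is too high for the budget. The robust bound falls below the nominal accuracy and shrinks further as `rho` grows, which is the sense in which a KL uncertainty set hedges against distribution shift.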

Source evidence

Abstract

Reasoning-capable large language models (LLMs) have recently been adopted as automated judges, but their benefits and costs in LLM-as-a-Judge settings remain unclear. Through controlled comparisons between reasoning and non-reasoning judges, we show that explicit reasoning substantially improves judgment accuracy on tasks requiring structured verification (e.g., math and coding), while offering limited or even negative gains on simpler evaluations and incurring significantly higher computational cost. These findings motivate that reasoning should be used selectively rather than universally, with awareness of possible distribution shift. We propose a Robust Adaptive Cost-Efficient Routing (RACER), which dynamically selects between reasoning and non-reasoning judges under a fixed budget by formulating routing as a constrained distributionally robust optimization problem. RACER explicitly accounts for distribution shift via a KL-divergence uncertainty set, admits an efficient primal–dual algorithm, and enjoys theoretical guarantees including uniqueness of the optimal policy and linear convergence. Extensive experiments show that RACER achieves superior accuracy–cost trade-offs under distribution shift.

Comment: Accepted at ICML 2026