arXiv

Task-Adaptive Embedding Refinement via Test-time LLM Guidance

Authors
Ariel Gera, Shir Ashury-Tahan, Gal Bloch...
Categories
cs.CL, cs.IR, cs.LG
arXiv
https://arxiv.org/abs/2605.12487v1
PDF
https://arxiv.org/pdf/2605.12487v1

Brief

Task-Adaptive Embedding Refinement via Test-time LLM Guidance presents a test-time method that refines a query's embedding using feedback from a generative LLM on a small subset of documents, adapting the embedding to ad-hoc zero-shot search and classification tasks. Experiments with state-of-the-art embedding models across diverse benchmarks report consistent gains (up to +25% relative), better ranking quality, and clearer class separation; the code is released on GitHub. This summary is based on the abstract only; the full text was not provided.
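The abstract does not state the exact update rule used to refine the query embedding. As an illustration only, the loop can be sketched with classic Rocchio relevance feedback, where an LLM supplies the relevance labels: retrieve the top-k documents, ask the LLM whether each one satisfies the query, then move the query vector toward approved documents and away from rejected ones. The function names, the `llm_judge` callable, and the weights `alpha`, `beta`, `gamma` are all hypothetical assumptions, not the authors' method or the API of the released repository.

```python
import math
from typing import Callable, Sequence

Vector = list[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def centroid(vectors: list[Vector], dim: int) -> Vector:
    """Mean of a list of vectors; the zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]


def refine_query(
    query_vec: Sequence[float],
    query_text: str,
    doc_vecs: list[Sequence[float]],
    doc_texts: list[str],
    llm_judge: Callable[[str, str], bool],  # hypothetical: True if doc satisfies query
    k: int = 5,
    alpha: float = 1.0,  # weight on the original query (assumed value)
    beta: float = 0.75,  # weight on LLM-approved docs (assumed value)
    gamma: float = 0.25,  # weight on LLM-rejected docs (assumed value)
) -> Vector:
    """Rocchio-style query refinement driven by LLM relevance labels (sketch)."""
    # 1. Retrieve the k documents most similar to the original query.
    top = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )[:k]
    # 2. Ask the (hypothetical) LLM judge to label each retrieved document once.
    labels = {i: llm_judge(query_text, doc_texts[i]) for i in top}
    pos = [normalize(doc_vecs[i]) for i in top if labels[i]]
    neg = [normalize(doc_vecs[i]) for i in top if not labels[i]]
    # 3. Move the query toward approved docs and away from rejected ones.
    dim = len(query_vec)
    q, p, n = normalize(query_vec), centroid(pos, dim), centroid(neg, dim)
    refined = [alpha * qi + beta * pi - gamma * ni for qi, pi, ni in zip(q, p, n)]
    return normalize(refined)


if __name__ == "__main__":
    # Toy 2-D corpus; a string check stands in for a real LLM call.
    docs = [[0.9, 0.1], [0.8, -0.2], [0.6, 0.8], [-1.0, 0.0]]
    texts = ["match A", "other B", "match C", "other D"]
    judge = lambda q, t: t.startswith("match")
    refined = refine_query([1.0, 0.0], "find matches", docs, texts, judge, k=3)
    for i, d in enumerate(docs):
        print(i, round(cosine([1.0, 0.0], d), 3), "->", round(cosine(refined, d), 3))
```

In the toy run, the similarity of the approved document `match C` rises and that of the rejected `other B` falls. Only the k retrieved documents are sent to the LLM, which is what keeps the cost far below running an LLM over the whole corpus.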

Why it matters

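Embedding models are cheap to apply at corpus scale but can miss the nuanced, task-specific constraints of an ad-hoc query, while generative LLMs capture such constraints but are costly to run over an entire corpus. The proposed test-time refinement consults the LLM only on a small set of documents and uses its feedback to update the query's embedding, improving ranking quality and producing clearer binary separation in the embedding space.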
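The abstract does not say how "clearer binary separation" is measured. One common way to quantify it, assumed here purely for illustration, is the ROC AUC between similarity scores of relevant and non-relevant documents (the probability that a random relevant document outscores a random non-relevant one), optionally alongside the gap between their mean scores. The function names and the score values below are hypothetical toy numbers, not results from the paper.

```python
from typing import Sequence


def separation_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Fraction of (relevant, non-relevant) pairs ordered correctly; ties count half.

    This is ROC AUC (the Mann-Whitney U statistic divided by the number of pairs):
    1.0 means perfect separation, 0.5 means no better than chance.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one relevant and one non-relevant score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def mean_gap(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Mean similarity of relevant docs minus mean similarity of non-relevant docs."""
    return sum(pos_scores) / len(pos_scores) - sum(neg_scores) / len(neg_scores)


if __name__ == "__main__":
    # Hypothetical cosine scores for the same documents before and after refinement.
    before = separation_auc([0.7, 0.5], [0.6, 0.4])
    after = separation_auc([0.8, 0.7], [0.3, 0.2])
    print(before, after)
```

With these toy scores, three of the four pairs are ordered correctly before refinement (AUC 0.75) and all four afterwards (AUC 1.0). AUC depends only on the ordering of scores, so it reflects separation even when absolute similarity values shift.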

Key details

  • Experiments with state-of-the-art text embedding models across diverse search and classification benchmarks show consistent gains across all models and datasets, with relative improvements of up to +25% on literature search, intent detection, key-point matching, and nuanced query-instruction following.
  • The method broadens the practical zero-shot use of embedding models, offering an alternative when costly LLM pipelines are not viable at corpus scale. The authors (Ariel Gera, Shir Ashury-Tahan, Gal Bloch et al.) released code at https://github.com/IBM/task-aware-embedding-refinement (arXiv:2605.12487, 12 May 2026).
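The reported gains are relative, not absolute. A minimal sketch of the distinction follows; the function names and metric values are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline: float, refined: float) -> float:
    """Relative change in a metric; 0.25 corresponds to +25%."""
    if baseline == 0:
        raise ValueError("relative improvement is undefined for a zero baseline")
    return (refined - baseline) / baseline


def absolute_improvement(baseline: float, refined: float) -> float:
    """Absolute change in a metric, in the metric's own units."""
    return refined - baseline


if __name__ == "__main__":
    # Two hypothetical baselines that both see a +25% relative gain.
    for base, new in [(0.40, 0.50), (0.20, 0.25)]:
        print(base, new, relative_improvement(base, new), absolute_improvement(base, new))
```

For example, a score rising from 0.40 to 0.50 and one rising from 0.20 to 0.25 are both +25% relative, yet the absolute gains are 0.10 and 0.05. The absolute effect of an "up to +25%" result therefore depends on the baseline score.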

Source evidence

Abstract

We explore the effectiveness of an LLM-guided query refinement paradigm for extending the usability of embedding models to challenging zero-shot search and classification tasks. Our approach refines the embedding representation of a user query using feedback from a generative LLM on a small set of documents, enabling embeddings to adapt in real time to the target task. We conduct extensive experiments with state-of-the-art text embedding models across a diverse set of challenging search and classification benchmarks. Empirical results indicate that LLM-guided query refinement yields consistent gains across all models and datasets, with relative improvements of up to +25% in literature search, intent detection, key-point matching, and nuanced query-instruction following. The refined queries improve ranking quality and induce clearer binary separation across the corpus, enabling the embedding space to better reflect the nuanced, task-specific constraints of each ad-hoc user query. Importantly, this expands the range of practical settings in which embedding models can be effectively deployed, making them a compelling alternative when costly LLM pipelines are not viable at corpus-scale. We release our experimental code for reproducibility, at https://github.com/IBM/task-aware-embedding-refinement.