
SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation

2026-05-08 · 17:32 UTC · Tianfei Ren, Zhipeng Yan, Yiming Zhao...

Categories: cs.CV, cs.AI
arXiv: https://arxiv.org/abs/2605.08043v1
PDF: https://arxiv.org/pdf/2605.08043v1

Brief

SCOPE proposes a specification-guided framework that tracks semantic commitments throughout generation. It decomposes a user's intent into a structured specification and conditionally calls retrieval, reasoning, and repair skills to fix violations of that specification, addressing the 'Conceptual Rift' the paper identifies. Evaluated on the new Gen-Arena benchmark with the Entity-Gated Intent Pass Rate (EGIP) metric, SCOPE achieves 0.60 EGIP, and it also performs strongly on WISE-V (0.907) and MindBench (0.61). This summary is based on the abstract only (full text not provided).

Why it matters

SCOPE (Structured Decomposition and Conditional Skill Orchestration) keeps semantic commitments persistent by recording them in an evolving structured specification, and it conditionally invokes retrieval, reasoning, and repair skills to resolve the 'Conceptual Rift' the paper identifies in realizing complex text-to-image intents.
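The abstract does not say how skills are triggered, so the loop below is only a minimal sketch of conditional skill orchestration under stated assumptions: a specification holds entities and constraints, a verifier checks each constraint against the current image, and a skill runs only for violated constraints before the image is regenerated. The names `Spec`, `Constraint`, `orchestrate`, the `ROUTES` table, and the toy generator, verifier, and skills are all hypothetical stand-ins, not SCOPE's actual components; a real system would plug in an image generator and a verifier here.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical routing from constraint kind to skill name. The paper's real
# triggering conditions are not described in the abstract.
ROUTES = {"knowledge": "retrieve", "logic": "reason"}


@dataclass
class Constraint:
    entity: str  # entity this constraint is attached to
    text: str    # natural-language requirement
    kind: str    # hypothetical tag used only for routing


@dataclass
class Spec:
    intent: str
    entities: list[str]
    constraints: list[Constraint]
    # Evolving state: skills write resolved information here.
    context: dict[str, str] = field(default_factory=dict)


Skill = Callable[[Spec, Constraint], None]


def orchestrate(spec: Spec, generate, verify, skills: dict[str, Skill], max_rounds: int = 3):
    """Generate, verify every constraint, and call a skill only for violated ones."""
    log: list[tuple[str, str]] = []
    image = generate(spec)
    for _ in range(max_rounds):
        violated = [c for c in spec.constraints if not verify(image, c)]
        if not violated:
            break  # all commitments satisfied, so no skill is invoked
        for c in violated:
            name = ROUTES.get(c.kind, "repair")  # fall back to repair
            skills[name](spec, c)
            log.append((name, c.entity))
        image = generate(spec)  # regenerate from the updated spec
    return image, log


# Toy stand-ins: the "image" is just the set of requirements the spec has resolved.
def toy_generate(spec: Spec) -> frozenset:
    return frozenset(spec.context)


def toy_verify(image: frozenset, c: Constraint) -> bool:
    return c.text in image


def make_skill(name: str) -> Skill:
    def skill(spec: Spec, c: Constraint) -> None:
        spec.context[c.text] = name  # record which skill resolved the requirement
    return skill


skills = {n: make_skill(n) for n in ("retrieve", "reason", "repair")}
spec = Spec(
    intent="a red fox sitting under the Eiffel Tower",
    entities=["fox", "Eiffel Tower"],
    constraints=[
        Constraint("fox", "fox is red", "attribute"),
        Constraint("Eiffel Tower", "tower matches the real landmark", "knowledge"),
    ],
)
image, log = orchestrate(spec, toy_generate, toy_verify, skills)
print(log)  # [('repair', 'fox'), ('retrieve', 'Eiffel Tower')]
```

The point the sketch tries to capture is that satisfied constraints cost nothing: once every check passes, the loop exits without calling any skill, and the specification's `context` keeps a record of what each skill contributed.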

Key details

The paper introduces Gen-Arena, a human-annotated benchmark with entity- and constraint-level specifications, together with a new metric, the Entity-Gated Intent Pass Rate (EGIP). SCOPE achieves 0.60 EGIP on Gen-Arena and strong results on WISE-V (0.907) and MindBench (0.61).
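The abstract gives the EGIP metric's name but not its definition. The sketch below illustrates one plausible reading of "entity-gated", which is an assumption and not taken from the paper: a prompt passes only if every specified entity appears in the image and, past that gate, every constraint is satisfied; EGIP is then the fraction of prompts that pass. The `Judgment` record, its fields, and the `passes` and `egip` functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    """Per-prompt annotations (hypothetical format, not Gen-Arena's schema)."""
    entities_present: dict[str, bool]  # entity -> does it appear in the image?
    constraints_met: dict[str, bool]   # constraint -> is it satisfied?


def passes(j: Judgment) -> bool:
    # Assumed entity gate: a missing entity fails the prompt outright,
    # regardless of how its constraints were judged.
    if not all(j.entities_present.values()):
        return False
    # Past the gate, the prompt passes only if every constraint holds.
    return all(j.constraints_met.values())


def egip(judgments: list[Judgment]) -> float:
    """Fraction of prompts that pass (binary per-prompt score, averaged)."""
    if not judgments:
        raise ValueError("egip requires at least one judgment")
    return sum(passes(j) for j in judgments) / len(judgments)


judgments = [
    Judgment({"fox": True, "tower": True}, {"fox is red": True}),
    Judgment({"fox": False, "tower": True}, {"fox is red": True}),  # gated out
    Judgment({"fox": True, "tower": True}, {"fox is red": False}),
]
print(f"{egip(judgments):.3f}")  # 0.333
```

Under this reading, a prompt whose image omits a required entity fails even when its listed constraints were judged satisfied, which would separate a gated pass rate from a simple average of constraint scores.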

Cleaned source text

Abstract