ArXiv

Solve the Loop: Attractor Models for Language and Reasoning

Authors
Jacob Fein-Ashley, Paria Rashidinejad
Categories
cs.LG, cs.AI, cs.CL, cs.NE
arXiv
https://arxiv.org/abs/2605.12466v1
PDF
https://arxiv.org/pdf/2605.12466v1

Brief

Attractor Models are a two-stage iterative-refinement architecture: a backbone proposes output embeddings, and an attractor module refines them by solving for a fixed point, with gradients obtained through implicit differentiation. This keeps training memory constant in effective depth and lets the number of iterations adapt to convergence. In language-model pretraining, the abstract reports perplexity improvements of up to 46.6% and downstream accuracy gains of up to 19.7%, with a 770M Attractor Model outperforming a 1.3B Transformer trained on twice as many tokens. On reasoning tasks, a 27M-parameter model trained on approximately 1,000 examples reaches 91.4% on Sudoku-Extreme and 93.1% on Maze-Hard. The paper also describes 'equilibrium internalization': fixed-point training places the model's initial output embedding near equilibrium, so the solver can be removed at inference with little degradation. This summary is based on the abstract only; the full text was not available.

Why it matters

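Looped Transformers improve language modeling and reasoning by iteratively refining latent representations, but recurrent architectures remain unstable to train, costly to optimize and deploy, and limited to small, fixed recurrence depths. Attractor Models address this with a two-stage design: a backbone proposes output embeddings and an attractor module refines them to a fixed point, with gradients obtained through implicit differentiation. Training memory therefore stays constant in effective depth, and convergence, not a preset loop count, determines how many iterations are run.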
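
As a minimal sketch of the fixed-point-plus-implicit-differentiation idea, the toy example below replaces the attractor module with a scalar map f(z, x, w) = tanh(w z + x). The map, the names `f`, `solve_fixed_point`, and `implicit_grad_w`, and the parameter values are assumptions made for illustration; the abstract does not specify the actual module or solver.

```python
import math

def f(z, x, w):
    """Toy refinement map (hypothetical stand-in for the attractor module)."""
    return math.tanh(w * z + x)

def solve_fixed_point(x, w, z0=0.0, tol=1e-12, max_iter=1000):
    """Iterate z <- f(z, x, w) until the update falls below tol.

    Only the current iterate is kept, so memory does not grow with
    the number of iterations. Returns (fixed point, iterations used).
    """
    z = z0
    for n in range(1, max_iter + 1):
        z_next = f(z, x, w)
        if abs(z_next - z) < tol:
            return z_next, n
        z = z_next
    return z, max_iter

def implicit_grad_w(z_star, x, w):
    """dz*/dw via the implicit function theorem at the fixed point.

    z* = f(z*, x, w)  implies  dz*/dw = (df/dw) / (1 - df/dz).
    """
    sech2 = 1.0 - math.tanh(w * z_star + x) ** 2  # derivative of tanh
    df_dz = sech2 * w
    df_dw = sech2 * z_star
    return df_dw / (1.0 - df_dz)

x, w = 0.3, 0.5  # illustrative values; |df/dz| <= 0.5, so f is a contraction
z_star, n_iters = solve_fixed_point(x, w)
grad = implicit_grad_w(z_star, x, w)

# Sanity check: central finite difference through the whole solver.
eps = 1e-6
fd = (solve_fixed_point(x, w + eps)[0] - solve_fixed_point(x, w - eps)[0]) / (2 * eps)
print(f"iterations={n_iters}, z*={z_star:.6f}, implicit={grad:.6f}, finite-diff={fd:.6f}")
```

The gradient is computed from the converged value alone, which is why memory need not grow with the number of solver iterations. The iteration count is set by the tolerance instead of being fixed in advance.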

Key details

  • In large-scale language-model pretraining, Attractor Models deliver a Pareto improvement over standard Transformers and stable looped models, improving perplexity by up to 46.6% and downstream accuracy by up to 19.7% while reducing training cost; a 770M Attractor Model outperformed a 1.3B Transformer trained on twice as many tokens. The paper was published on 2026-05-12.
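  • On reasoning benchmarks, a 27M-parameter Attractor Model trained on approximately 1,000 examples achieved 91.4% accuracy on Sudoku-Extreme and 93.1% on Maze-Hard; the authors report that frontier models such as Claude and GPT o3 fail completely on these tasks and that specialized recursive reasoners collapse at larger sizes.
  • The authors also observe 'equilibrium internalization': fixed-point training places the initial output embedding near equilibrium, so the solver can be removed at inference with little degradation.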
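
One way to picture the equilibrium internalization claim is to compare a proposal that starts far from the fixed point with one that starts near it. The sketch below is a toy illustration under assumed names (`refine`, `solve`) and an assumed scalar map; it is not the authors' model or evaluation protocol.

```python
import math

def refine(z, x, w=0.5):
    """Toy contraction standing in for the attractor module (hypothetical)."""
    return math.tanh(w * z + x)

def solve(z0, x, tol=1e-10, max_iter=500):
    """Run the refinement to convergence; return (fixed point, iterations used)."""
    z = z0
    for n in range(1, max_iter + 1):
        z_next = refine(z, x)
        if abs(z_next - z) < tol:
            return z_next, n
        z = z_next
    return z, max_iter

x = 0.3
z_star, _ = solve(0.0, x)

# Two hypothetical backbone proposals: one far from equilibrium, one near it.
proposals = {"far": -0.9, "near": z_star + 1e-3}

results = {}
for name, z0 in proposals.items():
    _, n = solve(z0, x)
    gap = abs(z0 - z_star)  # error incurred if the solver is skipped entirely
    results[name] = (gap, n)
    print(f"{name}: gap without solver = {gap:.4f}, iterations to converge = {n}")
```

In this toy setting, the near proposal loses very little if the solver is skipped and also converges in fewer iterations, which is the intuition behind removing the solver at inference when the initial output already sits near equilibrium.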

Source evidence

Abstract

Looped Transformers offer a promising alternative to purely feed-forward computation by iteratively refining latent representations, improving language modeling and reasoning. Yet recurrent architectures remain unstable to train, costly to optimize and deploy, and constrained to small, fixed recurrence depths. We introduce Attractor Models, in which a backbone module first proposes output embeddings, then an attractor module refines them by solving for the fixed point, with gradients obtained through implicit differentiation. Thus, training memory remains constant in effective depth, and iterations are chosen adaptively by convergence. Empirically, Attractor Models outperform existing models across two regimes: large-scale language-model pretraining and reasoning with tiny models. In language modeling, Attractor Models deliver a Pareto improvement over standard Transformers and stable looped models across sizes, improving perplexity by up to 46.6% and downstream accuracy by up to 19.7% while reducing training cost. Notably, a 770M Attractor Model outperforms a 1.3B Transformer trained on twice as many tokens. On challenging reasoning tasks, we show that our model with only 27M parameters and approximately 1000 examples achieves 91.4% accuracy on Sudoku-Extreme and 93.1% on Maze-Hard, scaling favorably where frontier models like Claude and GPT o3 fail completely and specialized recursive reasoners collapse at larger sizes. Lastly, we show that Attractor Models exhibit a novel phenomenon, which we call equilibrium internalization: fixed-point training places the model's initial output embedding near equilibrium, allowing the solver to be removed at inference time with little degradation. Together, these results suggest that Attractor Models make iterative refinement scalable by turning recurrence into a computation the model can learn to internalize.
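
The claim that training memory stays constant in effective depth follows from the standard implicit-function-theorem gradient for fixed-point layers. The derivation below is a sketch in assumed notation, since the abstract gives no formulas: $f_\theta$ denotes the attractor module, $x$ its conditioning input, $z^\star$ the fixed point, and $\mathcal{L}$ the training loss.

```latex
% Fixed-point condition satisfied by the refined embedding
z^\star = f_\theta(z^\star, x)

% Differentiate both sides with respect to \theta and solve for dz*/d\theta
\frac{\mathrm{d} z^\star}{\mathrm{d}\theta}
  = \left( I - \left.\frac{\partial f_\theta}{\partial z}\right|_{z^\star} \right)^{-1}
    \left.\frac{\partial f_\theta}{\partial \theta}\right|_{z^\star}

% Chain rule for the loss gradient
\frac{\mathrm{d}\mathcal{L}}{\mathrm{d}\theta}
  = \frac{\partial \mathcal{L}}{\partial z^\star}
    \left( I - \left.\frac{\partial f_\theta}{\partial z}\right|_{z^\star} \right)^{-1}
    \left.\frac{\partial f_\theta}{\partial \theta}\right|_{z^\star}
```

Every term is evaluated at $z^\star$ only, so no intermediate iterates need to be stored for backpropagation, provided $I - \partial f_\theta / \partial z$ is invertible at the fixed point.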