arXiv

Factual recall in linear associative memories: sharp asymptotics and mechanistic insights

Authors
Alessio Giorlandino, Sebastian Goldt, Antoine Maillard
Categories
stat.ML, cond-mat.dis-nn, cond-mat.stat-mech, cs.LG
arXiv
https://arxiv.org/abs/2605.10795v1
PDF
https://arxiv.org/pdf/2605.10795v1

Brief

The paper analyzes the limits of factual recall in linear associative memories by introducing a decoupled model and using statistical-physics tools to obtain sharp asymptotics. It provides numerical and analytical evidence that the decoupled model is equivalent to the original one (in storage capacity, weight spectra, and storage mechanism) and derives the capacity law p_c log p_c / d^2 = 1/2. It shows that the optimal solution raises correct scores just above the extreme-value threshold set by the competing outputs, extends the capacity computation to two-layer linear architectures, and offers a baseline for studying memory in neural networks.

Why it matters

Capacity formula: the decoupled model can store up to p_c associations, where p_c log p_c / d^2 = 1/2 (equivalently, p_c log p_c = d^2/2). This gives a sharp asymptotic relation between storage capacity and the embedding dimension d.
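A minimal sketch for evaluating this capacity law numerically, assuming log is the natural logarithm (the summary does not state the base). It solves p log p = d^2/2 by bisection; the helper name critical_load is hypothetical. An equivalent closed form is p_c = (d^2/2) / W(d^2/2), where W is the Lambert W function, because substituting p = e^u turns the equation into u e^u = d^2/2.

```python
import math


def critical_load(d: float, iters: int = 200) -> float:
    """Return p_c solving p * ln(p) = d**2 / 2 (natural log is an assumption)."""
    target = d * d / 2.0
    # f(p) = p * ln(p) - target is increasing for p > 1,
    # negative at p = 1 and non-negative at p = max(3, target).
    lo, hi = 1.0, max(3.0, target)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid * math.log(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for d in (64, 256, 1024, 4096):
        p_c = critical_load(d)
        # p_c / d^2 equals 1 / (2 ln p_c), so it shrinks slowly as d grows.
        print(f"d={d:5d}  p_c={p_c:12.1f}  p_c/d^2={p_c / d**2:.4f}")
```

Because p_c log p_c = d^2/2, the ratio p_c / d^2 equals 1 / (2 log p_c), so the number of storable associations grows slightly slower than d^2.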

Key details

  • Model equivalence: numerical and analytical evidence indicates that a decoupled model, in which each input has its own independent set of competing outputs, is equivalent to the original linear associative memory in storage capacity, spectra of the learned weights, and storage mechanism.
  • Mechanism and architecture: using statistical-physics methods, the authors show that the optimal solution raises each correct input–target score just above the extreme-value threshold set by the competing outputs, whereas a naive Hebbian rule boosts these alignments with broad fluctuations. They also extend the capacity computation to two-layer linear networks.
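The scores in the mechanism bullet can be made concrete with a small simulation. This is a sketch under stated assumptions, not the authors' setup: embeddings and targets are drawn uniformly on the unit sphere, the weights come from the naive Hebbian rule W = sum_mu y_mu x_mu^T, and an association counts as recalled when its correct score y_mu . W x_mu exceeds every competing score y_nu . W x_mu (no margin). The helper names are hypothetical, and the optimal solution analysed in the paper is not implemented; the sketch only reports the Hebbian baseline next to the leading-order extreme-value scale s * sqrt(2 ln(p - 1)) for the largest of p - 1 competitor scores.

```python
import numpy as np


def random_unit(n: int, d: int, rng: np.random.Generator) -> np.ndarray:
    """Draw n vectors uniformly on the unit sphere in R^d (assumed data model)."""
    g = rng.standard_normal((n, d))
    return g / np.linalg.norm(g, axis=1, keepdims=True)


def hebbian_scores(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Return S with S[mu, nu] = y_nu . (W x_mu) for Hebbian W = sum_mu y_mu x_mu^T."""
    W = Y.T @ X             # (d, d) sum of outer products y_mu x_mu^T
    return (X @ W.T) @ Y.T  # row mu is W x_mu scored against every target


def recall_report(d: int, p: int, seed: int = 0) -> dict:
    """Hebbian recall statistics for p >= 3 random associations in dimension d."""
    rng = np.random.default_rng(seed)
    X, Y = random_unit(p, d, rng), random_unit(p, d, rng)
    S = hebbian_scores(X, Y)
    correct = np.diag(S).copy()          # y_mu . W x_mu
    competitors = S.copy()
    np.fill_diagonal(competitors, -np.inf)
    max_comp = competitors.max(axis=1)   # largest competing score per input
    off_diag = S[~np.eye(p, dtype=bool)]
    # Leading-order extreme-value scale: the max of n i.i.d. N(0, s^2)
    # variables is roughly s * sqrt(2 ln n); here n = p - 1 competitors.
    evt_scale = off_diag.std() * np.sqrt(2.0 * np.log(p - 1))
    return {
        "accuracy": float(np.mean(correct > max_comp)),
        "mean_correct": float(correct.mean()),
        "mean_max_competitor": float(max_comp.mean()),
        "evt_scale": float(evt_scale),
    }


if __name__ == "__main__":
    for p in (50, 500, 2000):
        print(p, recall_report(d=100, p=p))
```

Raising p at fixed d shows the largest competitor scores approaching the correct scores until Hebbian recall starts to fail.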

Source evidence

Abstract

Large language models demonstrate remarkable ability in factual recall, yet the fundamental limits of storing and retrieving input–output associations with neural networks remain unclear. We study these limits in a minimal setting: a linear associative memory that maps $p$ input embeddings in $\mathbb{R}^d$ to their corresponding $d$-dimensional targets via a single layer, requiring each mapped input to be well separated from all other targets. Unlike in supervised classification, this strict separation induces $p$ constraints per association and produces strong correlations between constraints that make a direct characterisation of the storage capacity difficult. Here, we provide a precise characterisation of this capacity in the following way. We first introduce a decoupled model in which each input has its own independent set of competing outputs, and provide numerical and analytical evidence that this decoupled model is equivalent to the original model in terms of storage capacity, spectra of the learnt weights, and storage mechanism. Using tools from statistical physics, we show that the decoupled model can store up to $p_c \log p_c / d^2 = 1/2$ associations, and generalise the computation of $p_c$ to linear two-layer architectures. Our analysis also gives mechanistic insight into how the optimal solution improves over a naïve Hebbian learning rule: rather than boosting input-output alignments with broad fluctuations, the optimal solution raises the correct scores just above the extreme-value threshold set by the competing outputs. These findings give a sharp statistical-physics characterisation of factual storage in linear networks and provide a baseline for understanding the memory capacity of more realistic neural architectures.
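The difference between the original and decoupled constraint sets described in the abstract can be sketched in code. In the original model, input mu must beat the other p - 1 targets y_nu, so the same targets reappear in every input's constraints and the p(p - 1) constraints are correlated. For the decoupled model we assume, as a guess from the abstract's wording, that each input gets its own p - 1 fresh random unit vectors as competitors. All helper names are hypothetical, scores are assumed to be inner products, and the Hebbian matrix W = sum_mu y_mu x_mu^T is used only as a convenient weight matrix to evaluate, not as the learned solution.

```python
import numpy as np


def unit_rows(shape: tuple, rng: np.random.Generator) -> np.ndarray:
    """Random vectors on the unit sphere along the last axis (assumed data model)."""
    g = rng.standard_normal(shape)
    return g / np.linalg.norm(g, axis=-1, keepdims=True)


def margins_original(W: np.ndarray, X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """m[mu, k] = y_mu . W x_mu - y_nu . W x_mu over the p - 1 targets nu != mu."""
    p = X.shape[0]
    S = (X @ W.T) @ Y.T                  # S[mu, nu] = y_nu . W x_mu
    m = np.diag(S)[:, None] - S          # correct score minus each score
    return m[~np.eye(p, dtype=bool)].reshape(p, p - 1)


def margins_decoupled(W: np.ndarray, X: np.ndarray, Y: np.ndarray,
                      Z: np.ndarray) -> np.ndarray:
    """Same margins, but Z[mu] holds input mu's own p - 1 competitor vectors."""
    WX = X @ W.T                           # row mu is W x_mu
    correct = np.sum(WX * Y, axis=1)       # y_mu . W x_mu
    comp = np.einsum("mkd,md->mk", Z, WX)  # z_{mu,k} . W x_mu
    return correct[:, None] - comp


def compare(d: int = 200, p: int = 50, seed: int = 1):
    rng = np.random.default_rng(seed)
    X, Y = unit_rows((p, d), rng), unit_rows((p, d), rng)
    Z = unit_rows((p, p - 1, d), rng)      # independent competitors (decoupled model)
    W = Y.T @ X                            # Hebbian weights, only a test matrix here
    return margins_original(W, X, Y), margins_decoupled(W, X, Y, Z)


if __name__ == "__main__":
    m_orig, m_dec = compare()
    print("constraints:", m_orig.size, m_dec.size)  # both p * (p - 1)
    print("satisfied:", (m_orig > 0).mean(), (m_dec > 0).mean())
```

Both functions return one margin per constraint, so a weight matrix stores all p associations exactly when every entry is positive (or above a chosen margin).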