Twitter/X

@randall_balestr claims there are now early but promising provable measures…

Brief

@randall_balestr highlights a small but growing set of papers that reportedly establish provable links between pretraining configurations and downstream evaluation performance across computer vision (with and without noise), MAE, JEPA, and LLM/NLP settings. The post frames this as an early step toward a real theory of model behavior. It echoes Terence Tao's observation that the mechanics of LLM training are easy to understand, but there is still no predictive theory of why capabilities differ across tasks.

Why it matters

@randall_balestr claims there are now early but promising provable measures linking pretraining setup choices to downstream evaluation performance across multiple paradigms: computer vision (arXiv:2402.11337), computer vision with noise (arXiv:2505.12477), masked autoencoders/MAE (arXiv:2508.15404), JEPA (arXiv:2205.11508), and LLM/NLP (arXiv:2505.17169).

Key details

  • The post presents these results as the beginning of a theory of when pretraining objectives align with benchmark performance, as an alternative to purely empirical trial and error.
  • A quoted Terence Tao remark reinforces the point: the mechanics of training and running LLMs are not mathematically hard, but there is still no predictive theory for why models succeed on some tasks and fail on others, so the field remains heavily empirical.

Source evidence

title: @randall_balestr: We start having provable measures of alignment between pretraining setups and eval perfs:
- CV arxiv...
author: @randall_balestr
content_type: tweet
publication: Twitter/X
published: 2026-01-01T15:53:42+00:00
source_url: https://x.com/randall_balestr/status/2006755721862590623
word_count: 82

We start having provable measures of alignment between pretraining setups and eval perfs:
- CV arxiv.org/abs/2402.11337
- CV + noise: arxiv.org/abs/2505.12477
- MAE arxiv.org/abs/2508.15404
- JEPA arxiv.org/abs/2205.11508
- LLM/NLP: arxiv.org/abs/2505.17169
Very early but promising!

Haider. (@slow_developer)

Mathematician Terence Tao:

Training and running LLMs isn't mathematically difficult; any math undergrad could understand the basics

The mystery is that we have no theory to predict why models excel at certain tasks and fail at others

"we can only make empirical experiments"


— https://nitter.net/slow_developer/status/2006364731037139092#m