Twitter/X

François Chollet (May 1, 2026)

Brief

François Chollet warns that reinforcement learning boosts performance in familiar settings but, in unfamiliar ones, leads models to hallucinate that they are performing a completely different task they were trained on. He highlights a report from Chris (@chatgpt21) showing that GPT-5.5 scored 0.43% on ARC AGI 3 (Claude 4.6: 0.45%; Gemini 3.1: 0.4%) and lists three specific GPT-5.5 failure modes.

Why it matters

François Chollet (May 1, 2026): "RL is a bit of a double edged sword: in known territory performance increases, but in unknown territory the model tends to hallucinate that it is performing a completely different task it was trained on."

Key details

  • Chris (@chatgpt21) reports ARC AGI 3 scores: GPT-5.5 0.43%, Claude 4.6 0.45%, Gemini 3.1 0.4%, GPT-5.4 0.20%, Opus 4.7 0.18%.
  • Reported GPT-5.5 failure modes: "True local effect, false world model"; "Wrong level of abstraction from training data"; "Solved the level, didn’t reinforce the reward."