François Chollet warns that reinforcement learning can boost performance in familiar settings, but that in unfamiliar settings it leads models to hallucinate that they are performing a different task they were trained on. He highlights a report by Chris (@chatgpt21) showing that GPT-5.5 scored 0.43% on ARC AGI 3 (Claude 4.6: 0.45%, Gemini 3.1: 0.4%) and lists three specific GPT-5.5 failure modes.
François Chollet (May 1, 2026): "RL is a bit of a double edged sword: in known territory performance increases, but in unknown territory the model tends to hallucinate that it is performing a completely different task it was trained on."