Twitter/X

Brief

ParseBench is LlamaIndex's open-sourced document OCR benchmark (announced 2026-04-13) with ~2,000 human-verified enterprise pages and 167,000+ test rules across five dimensions: tables, charts, content faithfulness, semantic formatting, and visual grounding. Benchmarking 14 parsers shows compute scaling yields diminishing returns, charts and layout/visual grounding remain major weaknesses, and LlamaParse tops overall at 84.9%.

Why it matters

ParseBench (announced 2026-04-13 by Jerry Liu/LlamaIndex) is an open-sourced document OCR benchmark with ~2,000 human-verified enterprise pages and 167,000+ test rules across five dimensions: tables, charts, content faithfulness, semantic formatting, and visual grounding.

Key details

  • The benchmark evaluated 14 document parsers (frontier/OSS VLMs, specialized parsers, and LlamaParse); LlamaParse achieved the highest overall score at 84.9% and led in 4 of the 5 dimensions.
  • Key failure modes: more compute yields diminishing returns (Gemini, GPT-5-mini, and Haiku gained only 3–5 points when moving from minimal to high thinking, at ~4× the cost); chart results are polarized (most specialized parsers score below 6%); and VLMs are strong at visual understanding but weak at layout and visual grounding (GPT-5-mini and Haiku score below 10%).
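
To put the diminishing-returns figure in perspective, the arithmetic can be sketched as points gained per extra unit of baseline cost. This is only an illustration built from the numbers quoted above (a 3–5 point gain at ~4× cost); the function name and the choice to treat minimal-thinking cost as 1.0 are assumptions, not part of ParseBench.

```python
def points_per_extra_cost(gain_points: float, cost_multiple: float) -> float:
    """Benchmark points gained per additional unit of baseline cost.

    Hypothetical helper: baseline (minimal thinking) cost is taken as 1.0.
    """
    extra_cost = cost_multiple - 1.0
    if extra_cost <= 0:
        raise ValueError("cost_multiple must be greater than 1")
    return gain_points / extra_cost


COST_MULTIPLE = 4.0  # "~4x cost" as quoted in the summary

low = points_per_extra_cost(3.0, COST_MULTIPLE)   # lower end of the 3-5 point gain
high = points_per_extra_cost(5.0, COST_MULTIPLE)  # upper end of the 3-5 point gain

print(f"{low:.2f} to {high:.2f} points per extra unit of baseline cost")
```

Under these assumptions, tripling the additional spend buys roughly one to two benchmark points per baseline-cost unit, which is the sense in which the returns are described as diminishing.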