Twitter/X

Andrej Karpathy (post dated 2026-04-02) indexes source documents into a raw/…

2026-04-02 · 20:42 UTC · @karpathy · 3 min read

Brief

Karpathy describes an LLM-driven knowledge-base workflow. Source documents (articles, papers, repos, datasets, images) are indexed into a raw/ directory, and an LLM compiles them into a wiki of .md files with summaries, backlinks, categories and concept pages. He uses Obsidian as the front end, together with the Web Clipper and a hotkey that downloads images locally. His research wiki currently holds ~100 articles (~400K words); at that scale the LLM maintains the index files itself and answers complex queries across the whole corpus. Outputs are rendered as Markdown, Marp slides or matplotlib figures and are often filed back into the wiki. He also runs LLM health checks over the wiki, has built a small search engine that the LLM calls through a CLI, and is exploring synthetic-data generation and fine-tuning to bake the knowledge into model weights. He suggests the approach has room to become a dedicated product rather than a collection of scripts.
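
Backlinks in an Obsidian vault come from its [[wikilink]] syntax. The post does not say how the LLM tracks them, so the following is only a minimal sketch, assuming pages are held in memory as a name-to-text dict and that a link target matches a page name exactly. `extract_links` and `build_backlinks` are hypothetical helpers, and the dangling-link report is a simple structural check, not one of the LLM health checks he describes.

```python
import re
from collections import defaultdict

# Obsidian-style wikilinks: [[Page]], [[Page|alias]], [[Page#Heading]].
# The (?<!!) lookbehind skips embeds such as ![[figure.png]].
WIKILINK = re.compile(r"(?<!!)\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def extract_links(text: str) -> set[str]:
    """Page names that a note links to (heading and alias parts dropped)."""
    return {m.group(1).strip() for m in WIKILINK.finditer(text)}


def build_backlinks(pages: dict[str, str]) -> tuple[dict[str, set[str]], dict[str, set[str]]]:
    """Return (backlinks, dangling).

    backlinks[target] = pages that link to target.
    dangling[source] = link targets in source that match no page
    (exact-name matching is an assumption of this sketch).
    """
    backlinks: dict[str, set[str]] = defaultdict(set)
    dangling: dict[str, set[str]] = defaultdict(set)
    for source, text in pages.items():
        for target in extract_links(text):
            if target in pages:
                backlinks[target].add(source)
            else:
                dangling[source].add(target)
    return dict(backlinks), dict(dangling)
```

Keeping the backlink map derived from the link text, rather than stored separately, means it can be regenerated after every LLM edit without drifting out of sync.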

Why it matters

Andrej Karpathy (post dated 2026-04-02) indexes source documents into a raw/ directory and uses an LLM to incrementally "compile" a personal wiki composed of .md files that include summaries, backlinks, categories and concept articles.
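One way to make the "incremental" part concrete is a minimal sketch, assuming a compile pass only needs to revisit sources whose bytes changed since the last run. The post does not say how change detection is done; the JSON manifest, `plan_compile` and `save_manifest` are hypothetical, and the LLM call itself is left out.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's bytes, used as a change fingerprint."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def plan_compile(raw_dir: Path, manifest_path: Path) -> tuple[list[Path], list[str], dict[str, str]]:
    """Compare raw/ against the last saved manifest.

    Returns (changed_or_new_files, removed_keys, new_manifest).
    """
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    new: dict[str, str] = {}
    changed: list[Path] = []
    for path in sorted(p for p in raw_dir.rglob("*") if p.is_file()):
        key = path.relative_to(raw_dir).as_posix()
        new[key] = file_digest(path)
        if old.get(key) != new[key]:
            changed.append(path)  # new file, or content differs from last compile
    removed = sorted(set(old) - set(new))  # sources deleted since last compile
    return changed, removed, new


def save_manifest(manifest_path: Path, manifest: dict[str, str]) -> None:
    """Persist the manifest; a real setup would do this only after the LLM pass succeeds."""
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
```

Only the returned `changed` files would be handed to the LLM, and `removed` keys flag wiki pages whose source has disappeared.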

Key details

His workflow uses Obsidian as the IDE, together with the Obsidian Web Clipper (plus a hotkey that downloads images locally), so the LLM can reference local Markdown and images. Karpathy says he rarely edits the wiki by hand; the LLM writes and maintains most of the content.
Scale example: his research wiki contains ~100 articles (~400K words). At that scale the LLM maintains the index files itself, handles complex Q&A over the corpus, and produces outputs as Markdown, Marp slides or matplotlib images, which are then filed back into the wiki.
He runs LLM "health checks" that find inconsistencies and impute missing data, has built a small search engine that the LLM uses through a CLI, and is exploring synthetic-data generation and fine-tuning so the model could internalize the repo. He sees room for a dedicated product beyond scripts.
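The local-image hotkey mentioned above exists so the LLM sees files on disk rather than remote URLs. A minimal sketch of a check for clipped notes that still point at remote images, assuming they use standard Markdown image syntax `![alt](url "title")`; `remote_images` and `localize` are hypothetical names, and the download step itself is left out.

```python
import re
from urllib.parse import urlparse

# Standard Markdown image: ![alt](url) or ![alt](url "title").
# Obsidian embeds such as ![[file.png]] are not matched here.
IMAGE = re.compile(r'!\[[^\]]*\]\(\s*([^)\s]+)(?:\s+"[^"]*")?\s*\)')


def remote_images(markdown: str) -> list[str]:
    """Return http(s) image URLs in order of first appearance, without duplicates."""
    urls: list[str] = []
    for match in IMAGE.finditer(markdown):
        url = match.group(1)
        if urlparse(url).scheme in ("http", "https") and url not in urls:
            urls.append(url)
    return urls


def localize(markdown: str, mapping: dict[str, str]) -> str:
    """Rewrite image URLs to local paths once downloaded; ordinary links are untouched."""

    def swap(match: re.Match) -> str:
        url = match.group(1)
        if url not in mapping:
            return match.group(0)
        whole = match.group(0)
        start = match.start(1) - match.start(0)  # URL offsets within this match
        end = match.end(1) - match.start(0)
        return whole[:start] + mapping[url] + whole[end:]

    return IMAGE.sub(swap, markdown)
```

Matching only the image pattern, instead of a plain string replace, keeps a regular `[text](url)` link to the same address from being rewritten.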
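For a sense of scale, ~400K words over ~100 articles averages out to about 4,000 words per article. The post does not describe what the LLM-maintained index files look like; the sketch below assumes one line per page with a wikilink, a rough word count and the first non-heading line as a summary. `build_index` and this layout are hypothetical, and in his setup the LLM itself writes these files.

```python
import re

WORD = re.compile(r"\w+")


def summary_line(text: str, max_chars: int = 120) -> str:
    """First non-empty, non-heading line, truncated to max_chars."""
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            if len(line) <= max_chars:
                return line
            return line[: max_chars - 3].rstrip() + "..."
    return ""


def build_index(pages: dict[str, str]) -> str:
    """Render a hypothetical index.md: one wikilinked entry per page plus totals."""
    lines = ["# Index", ""]
    total = 0
    for name in sorted(pages, key=str.lower):
        words = len(WORD.findall(pages[name]))  # rough count; "dot-product" counts as 2
        total += words
        lines.append(f"- [[{name}]] ({words} words): {summary_line(pages[name])}")
    lines += ["", f"{len(pages)} articles, {total} words total."]
    return "\n".join(lines) + "\n"
```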
