Twitter/X


Brief

The AI is 5% of the work. @alex_prompter argues that the remaining 95% is the real product: observability (Langfuse, Braintrust, Helicone), evals, durable runtimes (Temporal, Inngest), guardrails, memory (Pinecone, pgvector, Turbopuffer), tools (MCP servers, E2B, Modal, Browserbase), auth/multi-tenancy, cost controls, human-in-the-loop, prompt versioning, orchestration, and model routing. A CTO quoted in the post calls observability + evals + durable runtime + guardrails the minimum viable production stack.

Why it matters

@alex_prompter (posted 2026-05-13) asserts that 'The AI is 5% of the work' and that the other 95% is where agent products break: observability, evals, durable runtime, guardrails, memory layer, tools, auth/multi-tenancy, cost controls, human-in-the-loop, prompt versioning, orchestration, and model routing (LiteLLM, Portkey, OpenRouter). In the post's framing, the LLM is the easy part and everything around it is the actual company.

Key details

  • A CTO with 10+ years shipping production is quoted: 'Observability + evals + durable runtime + guardrails is the minimum viable production stack.' Skipping those four, the CTO says, produces the 'works-in-demo → on-fire-in-prod' gap that is killing agent startups right now.
  • Operational takeaways: the post reports that most serious teams are moving toward minimal orchestration with explicit state machines over heavy frameworks; use model routing for fallback, prompt caching, and version pinning; cap spend to stop runaway agent loops; require human approval for actions such as spending more than $X or sending external email; and isolate tenants at the auth layer so an agent acting for one customer never sees another customer's data.
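
The cost-control and approval-gate ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything from the post: the `RunGuard` class, its field names, the dollar values, and the `approver` callback are all hypothetical, and a real system would persist spend and route approvals to a human queue.

```python
from dataclasses import dataclass
from typing import Callable


class BudgetExceeded(RuntimeError):
    """Raised when a charge would push the run past its spend cap."""


class ApprovalRequired(RuntimeError):
    """Raised when a gated action was not approved by a human."""


@dataclass
class RunGuard:
    # All names and thresholds here are hypothetical illustrations.
    max_spend_usd: float              # hard cap for one agent run
    approval_threshold_usd: float     # costlier actions need sign-off
    approver: Callable[[str], bool]   # returns True if a human approved
    spent_usd: float = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record spend, refusing any charge that would exceed the cap."""
        if self.spent_usd + cost_usd > self.max_spend_usd:
            raise BudgetExceeded(
                f"cap {self.max_spend_usd} USD would be exceeded"
            )
        self.spent_usd += cost_usd

    def authorize(
        self, action: str, cost_usd: float = 0.0, external: bool = False
    ) -> None:
        """Gate an action: external or expensive actions need approval."""
        needs_human = external or cost_usd > self.approval_threshold_usd
        if needs_human and not self.approver(action):
            raise ApprovalRequired(f"{action!r} was not approved")
        self.charge(cost_usd)
```

An agent loop would call `guard.authorize(...)` before each tool or model call, so a runaway loop stops at the cap instead of continuing to spend, and an action flagged `external=True` (such as sending email) blocks until the approver says yes.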

Source evidence

The AI is 5% of the work.

The 95% that breaks:
→ Observability (Langfuse, Braintrust, Helicone) - you can't debug what you can't see
→ Evals - regression suites for non-deterministic software. The new CI.
→ Durable runtime (Temporal, Inngest) - so a 10-minute agent run survives a server restart
→ Guardrails - prompt injection detection, PII redaction, output filtering
→ Memory layer - vector DBs (Pinecone, pgvector, Turbopuffer), retrieval, session state
→ Tools layer - MCP servers, sandboxed code execution (E2B, Modal), browser automation (Browserbase)
→ Auth + multi-tenancy - your agent calling Salesforce for customer A must NEVER see customer B's anything
→ Cost controls - agents in runaway loops burn $$ in minutes
→ Human-in-the-loop - approval gates for "spend more than $X" or "send external email"
→ Prompt versioning - prompts are code, treat them like code
→ Orchestration - plan-act-observe-repeat. Most serious teams are moving toward minimal orchestration + explicit state machines over heavy frameworks.
→ Model routing - LiteLLM, Portkey, OpenRouter for fallback, prompt caching, and version pinning so a vendor update doesn't silently change your product

A CTO with 10+ years shipping production gave me the honest version: "Observability + evals + durable runtime + guardrails is the minimum viable production stack.

Skip those four and you get the works-in-demo → on-fire-in-prod gap killing agent startups right now."

The LLM is the easy part.

Everything around it is the actual company.
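
Editorial sketch (not part of the post): the "minimal orchestration + explicit state machines" item can be illustrated with a plan-act-observe loop written as a plain state machine. The `State` enum, the `run_agent` function, and the `plan`/`act` callbacks are hypothetical names chosen for this example; in practice `plan` would wrap a model call and `act` a tool call.

```python
from enum import Enum, auto
from typing import Callable, Optional


class State(Enum):
    PLAN = auto()
    ACT = auto()
    OBSERVE = auto()
    DONE = auto()
    FAILED = auto()


def run_agent(
    plan: Callable[[list[str]], Optional[str]],  # next action, or None when finished
    act: Callable[[str], str],                   # executes an action, returns its result
    max_steps: int = 10,                         # step cap so a loop cannot run forever
) -> tuple[State, list[str]]:
    """Drive plan -> act -> observe transitions until DONE or FAILED."""
    state = State.PLAN
    history: list[str] = []
    action: Optional[str] = None
    result = ""
    steps = 0
    while state not in (State.DONE, State.FAILED):
        if state is State.PLAN:
            if steps >= max_steps:
                state = State.FAILED  # step budget exhausted
                continue
            action = plan(history)
            state = State.DONE if action is None else State.ACT
        elif state is State.ACT:
            result = act(action)
            steps += 1
            state = State.OBSERVE
        elif state is State.OBSERVE:
            # Record the outcome so the next planning step can see it.
            history.append(f"{action} -> {result}")
            state = State.PLAN
    return state, history
```

Because every transition is explicit, the current state and history could be checkpointed between steps, which is one way such a loop might be paired with a durable runtime; that pairing is an assumption here, not something the post spells out.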