Briefing · 2026-05-13

Your briefing

100 ranked · 2026-05-13T23:02
TOP 5 TODAY

Start with these

Five items worth reading before you scan the full briefing.

  1. datacenterdynamics.com · 2026-05-11 · 1 min
     Stream Now: DCD>Inside APAC
     DCD Investment & Markets Channel published a livestream, “DCD>Inside APAC,” on 2026-05-11 that examines Asia‑Pacific data‑centre strategy amid rising AI investment and sovereign cloud mandates, highlighting mature hubs (Singapore, Sydney, Tokyo) and fast‑emerging markets (Indonesia, Malaysia, the Philippines).
  2. substack.com · 2026-05-12 · 3 min
     The Gonzo Size of the HyperScaler DataCenter Investment Boom: Chart of the Day
     Brad DeLong (Grasping Reality newsletter) estimates hyperscaler datacenter investment will total about $1.5 trillion in 2026 (post dated May 12, 2026).
  3. divenewsletter.com · 2026-05-11 · 5 min
     PPL-Blackstone joint venture’s ‘advanced’ data center pipeline in Pennsylvania…
     PPL-Blackstone joint venture’s ‘advanced’ data center pipeline in Pennsylvania reached 28.3 GW, and company officials said the JV is securing gas turbines for new power plants intended to serve that data center load (Utility Dive, May 11, 2026).
  4. datacenterdynamics.com · 2026-05-12 · 2 min
     Your streaming link: Keeping IT Cool: Liquid 2026 Day Two
     Email from James Raddings (Digital Portfolio Lead, DCD) promoting Day Two of the Keeping IT Cool | Liquid series on 13 May 2026 (article created 2026-03-11 12:44, last updated 2026-04-04 00:00).
  5. substack.com · 2026-05-12 · 5 min
     Semis Memo: Supply Chain Inheritance
     Nvidia in a May 2025 technical blog credited the electric‑vehicle and solar industries as the source of the 800V DC rack architecture, implying AI rack power designs are inheriting EV/solar supply‑chain technology.
MUST READ

Start here. These are the items with the strongest reader value today.

9 items
2 substack.com 2026-05-12 3 min read

The Gonzo Size of the HyperScaler DataCenter Investment Boom: Chart of the Day

Why it matters

Brad DeLong (Grasping Reality newsletter) estimates hyperscaler datacenter investment will total about $1.5 trillion in 2026 (post dated May 12, 2026).

  • DeLong's 2026 breakdown (largest four hyperscalers): Amazon ~$0.5T, Google ~$0.45T, Microsoft ~$0.35T, Meta/Facebook ~$0.3T — the overwhelming bulk of the spend.
  • Macro scale: that capex run-rate is roughly 1/4 of all U.S. capital investment, ~1/20 of global investment, ~1/20 of U.S. GDP, and ~1/100 of world GDP.
  • DeLong also gives smaller guesses for other firms: Oracle ~$35B and Apple ~$18B, and characterizes the spending as a $1.5T AI 'arms race' that is shifting balance sheets from software-like to railroad-like capital intensity.

Brad DeLong's May 12, 2026 Grasping Reality post quantifies the hyperscaler datacenter buildout as roughly a $1.5 trillion run‑rate in 2026, driven mainly by four firms — Amazon (~$0.5T), Google (~$0.45T), Microsoft (~$0.35T) and Meta/Facebook (~$0.3T) — with smaller estimated contributions from Oracle (~$35B) and Apple (~$18B). DeLong highlights a chart to convey scale and emphasizes macro implications: this single wave of capex equals about one quarter of U.S. capital investment and a few percent of global investment and GDP. He frames the spree as an AI arms race that turns tech balance sheets into railroad‑style capital intensity, raising questions about long‑term value versus a dollar‑auction dynamic.
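
As a rough consistency check, the figures quoted above can be combined directly. The sketch below uses only the rounded numbers from this summary; the variable names are illustrative, and the "implied" bases are simply the headline divided by each quoted share, not independent data.

```python
# Rounded 2026 estimates quoted in this summary, in trillions of USD.
big_four = {"Amazon": 0.5, "Google": 0.45, "Microsoft": 0.35, "Meta": 0.3}
others = {"Oracle": 0.035, "Apple": 0.018}
headline = 1.5

big_four_total = sum(big_four.values())
all_total = big_four_total + sum(others.values())

# Shares quoted in the summary; dividing the headline by each share gives
# the base that the share implies (an inference, not a sourced figure).
quoted_shares = {
    "U.S. capital investment": 1 / 4,
    "global investment": 1 / 20,
    "U.S. GDP": 1 / 20,
    "world GDP": 1 / 100,
}
implied_bases = {name: headline / share for name, share in quoted_shares.items()}

print(f"Big four sum: ${big_four_total:.2f}T")
print(f"Big four + Oracle + Apple: ${all_total:.3f}T")
for name, base in implied_bases.items():
    print(f"Implied {name}: ~${base:.0f}T")
```

The rounded big-four figures sum to about $1.6T, slightly above the ~$1.5T headline, which suggests the per-company numbers are loose approximations rather than an exact decomposition.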

By Brad DeLong, from Grasping Reality Newsletter
3 divenewsletter.com 2026-05-11 5 min read

PPL-Blackstone joint venture’s ‘advanced’ data center pipeline in Pennsylvania…

Why it matters

PPL-Blackstone joint venture’s ‘advanced’ data center pipeline in Pennsylvania reached 28.3 GW, and company officials said the JV is securing gas turbines for new power plants intended to serve that data center load (Utility Dive, May 11, 2026).

  • Vistra announced it will add 4.5 GW of capacity; CEO Jim Burke said the additions align with what the company considers ‘reasonable’ load forecasts for PJM and ERCOT, noting Vistra’s outlook is below many third‑party and ISO projections but reflects the expected pace of physical development.
  • The Department of Energy’s proposal for ‘nuclear lifecycle innovation campuses’ would partner with states to accept and potentially recycle used nuclear fuel — several states have shown interest, but the plan faces significant practical and policy hurdles.

Utility Dive’s May 11, 2026 Daily Dive highlights rapid power development tied to data center demand and evolving policy proposals. PPL’s ‘advanced’ Pennsylvania data center pipeline has grown to 28.3 GW, and a PPL‑Blackstone joint venture is reportedly procuring gas turbines to build power plants that would directly serve that load. Separately, Vistra disclosed a 4.5 GW capacity addition and said CEO Jim Burke considers the move consistent with ‘reasonable’ PJM and ERCOT forecasts, while stressing the company’s outlook remains below many independent and ISO projections but matches anticipated physical buildout. The newsletter also covers DOE’s push for ‘nuclear lifecycle innovation campuses’ to take in and possibly recycle spent fuel — an idea drawing some state interest but facing practical and policy challenges — and includes an opinion defending competitive power markets.

By Utility Dive
4 datacenterdynamics.com 2026-05-12 2 min read

Your streaming link: Keeping IT Cool: Liquid 2026 Day Two

Why it matters

Email from James Raddings (Digital Portfolio Lead, DCD) promoting Day Two of the Keeping IT Cool | Liquid series on 13 May 2026 (article created 2026-03-11 12:44, last updated 2026-04-04 00:00).

  • Event streams live 9am ET (2pm BST) on Wednesday 13 May 2026 with on-demand replay; sessions target AI-ready data centres, addressing rack densities beyond 100 kW and the shift of cooling architectures from the chiller to the chip.
  • Session lineup and speakers: 9am ET — nVent (Jason Matteson, Patrick McCarthy, Matthew Archibald); 10am ET — Alison Deane (ZutaCore) & Christopher Sullivan (Oregon State University); 11am ET — Airedale (Mark Johnson, Reece Thomas, Mike Kis, Patrick Cotton); 12:00pm ET — Dylan Osiecki (CPC) on dry-break quick disconnects; 12:30pm ET — Prof. Yogendra Joshi (Georgia Tech), Prof. Srikanth Rangarajan & Prof. Bahgat Sammakia (Binghamton), and Pritish Parida (IBM Research).

Keeping IT Cool | Liquid — Day Two of DCD's Liquid series streams live at 9am ET (2pm BST) on Wednesday, 13 May 2026, focusing on liquid cooling for AI-ready data centres. Sessions cover >100 kW rack densities, scalable cooling loops, retrofit strategies, chiller-to-chip architectures, dry-break quick-disconnect reliability testing, and research from Georgia Tech, Binghamton and IBM.

By James Raddings
5 substack.com 2026-05-12 5 min read

Semis Memo: Supply Chain Inheritance

Why it matters

Nvidia in a May 2025 technical blog credited the electric‑vehicle and solar industries as the source of the 800V DC rack architecture, implying AI rack power designs are inheriting EV/solar supply‑chain technology.

  • Analog and power semiconductor names — including Texas Instruments (TXN US), NXP (NXPI US), Murata Manufacturing (6981 JP), Vishay (VSH US) and Samsung Electro‑Mechanics (009150 KS) — are seeing outperformance driven by MLCC shortages and rising data‑center demand; firms are raising ASPs rather than adding capacity, reflecting a cautious post‑COVID capex stance.
  • Citrini launched the Semis Memo series in January 2026 (analysts Zephyr and Jukan) using a macro‑first framework to hunt for sectors where legacy, non‑AI headwinds understate potential AI‑driven demand (topics flagged include CPUs in the agentic era, neoclouds, materials bottlenecks, and 'Korea Unlocked').

Analog and power semiconductors are being repositioned by AI data‑center buildouts that “inherit” EV and solar supply‑chain investments: Nvidia’s May 2025 note on 800V DC rack architecture explicitly ties rack power technology to EV/solar, and the market is responding. Companies such as TXN, NXPI, Murata, Vishay and Samsung Electro‑Mechanics are benefiting from an MLCC shortage and higher data‑center orders while keeping capex conservative relative to revenue, choosing to let ASPs rise instead of rapidly expanding capacity. Citrini’s Semis Memo (launched January 2026) applies a macro‑first method to find where forecasts still reflect legacy headwinds and where AI demand can exceed expectations; the broader memo also previews analysis of CPUs for agentic workloads, neocloud inference supply, material bottlenecks, and South Korea’s semiconductor role.

By Citrini
6 divenewsletter.com 2026-05-12 5 min read

Fluence Energy signed master supply agreements with two unnamed “major”…

Why it matters

Fluence Energy signed master supply agreements with two unnamed “major” hyperscalers; Jefferies analyst Julien Dumoulin‑Smith said the deals arrived earlier than expected and mark “significant progress” for Fluence’s emerging data center thesis.

  • TXU Energy’s special retail EV plan in Texas gives Ford electric‑vehicle drivers 15 hours per day of “free” home charging; Ford reports it shifted 515 MWh of load to off‑peak periods in 2025 under the program.
  • Constellation Energy entered roughly 5 GW of new capacity (nuclear, gas and battery) into the PJM interconnection queue, while some prospective data‑center customers are pausing procurement pending PJM colocation and backstop auction rule outcomes.
  • CAISO says the Extended Day‑Ahead Market (EDAM) is “solid and stable,” with prices in expected ranges and steady transfer volumes; CAISO CEO Elliot Mainzer noted battery energy storage has become a major player on the Western grid.

Utility Dive’s Storage Weekly roundup highlights several market moves and metric updates across storage, EV charging and grid markets. Fluence announced master supply agreements with two major hyperscalers — a deal Jefferies’ Julien Dumoulin‑Smith called earlier‑than‑expected and strategically important for Fluence’s data‑center angle. In Texas, TXU’s EV charging program delivered 15 hours/day of subsidized home charging for Ford EV drivers and enabled Ford to shift 515 MWh to off‑peak in 2025. Constellation queued ~5 GW of nuclear, gas and battery projects into PJM even as some data‑center customers pause decisions pending PJM colocation/backstop auction rules. CAISO reported EDAM performance within expected price ranges and steady transfer volumes, with CEO Elliot Mainzer saying batteries are now a major Western‑grid resource. Sunrun suffered steep Q1 sales drops but grew networked storage to ~4.3 GWh (50% YoY) and aims for 10 GWh by 2028; separate E3 analysis finds medium/heavy fleet electrification could modestly lower residential rates by 2035 if grid upgrades are managed.

By Utility Dive: Storage
7 datacenterdynamics.com 2026-05-12 1 min read

What’s driving India’s next wave of data center expansion?

Why it matters

NTT Global Data Centers' whitepaper, published on Data Centre Dynamics on 12 May 2026, identifies AI growth, accelerating cloud adoption, data‑localization mandates, and rising enterprise demand as the primary drivers of India’s next wave of data‑center expansion.

  • The report maps market dynamics, infrastructure trends, investment drivers and regulatory/operational considerations to guide operators and enterprises planning long‑term scale, connectivity and cloud deployment strategies across India in 2026 and beyond.

NTT Global Data Centers' whitepaper (Data Centre Dynamics, 12 May 2026) frames India as an emerging global digital‑infrastructure hub, driven by AI workloads, fast cloud adoption, data‑localization rules and growing enterprise demand. It analyzes market dynamics, infrastructure trends, investment drivers and regulatory/operational issues to help operators plan long‑term capacity, connectivity and cloud expansion.

By DCD Cloud & Hybrid
8 divenewsletter.com 2026-05-12 5 min read

Constellation Energy entered about 5 GW of new capacity into the PJM…

Why it matters

Constellation Energy entered about 5 GW of new capacity into the PJM interconnection queue (composed of nuclear, gas and battery projects), and some potential data center customers are pausing decisions as PJM’s colocation and backstop auction rules are clarified (report dated May 12, 2026).

  • Con Edison announced a $29 billion plan to upgrade the New York City area grid to handle rising electrification from buildings and transportation; both the company and the report note that New York City and its suburbs are seeing gradual load growth rather than a major data center influx.
  • System operators SPP, PJM and CAISO forecast they can meet summer 2026 power demand despite expectations for higher‑than‑average temperatures across their footprints (reliability outlook reported May 12, 2026).
  • Corporate clean energy procurement in the U.S. exceeded 27 GW in 2025, with four companies accounting for roughly three‑quarters of that volume, according to CEBA CEO Rich Powell.

Utility Dive’s May 12, 2026 Daily Dive highlights several industry developments: Constellation Energy has placed roughly 5 GW of nuclear, gas and battery projects into the PJM queue, though some colocation customers are pausing commitments pending clarity on PJM’s colocation and backstop auction rules. Con Edison unveiled a $29 billion program to harden and expand the NYC area grid to accommodate electrification of buildings and transport, noting only gradual local load growth from non‑data‑center sectors. Regional operators SPP, PJM and CAISO all project adequate supply for the coming summer despite forecasts of above‑average temperatures. The newsletter also cites corporate procurement trends—CEBA says U.S. corporations contracted over 27 GW in 2025 with four firms taking ~75%—and reports a PPL–Blackstone pipeline of 28.3 GW in Pennsylvania backed by gas turbine procurements. An opinion piece from Dominion Energy’s Sean Burri urges using AI as an operational partner to improve predictive reliability metrics.

By Utility Dive
9 substack.com 2026-05-12 47 min read

The EDA Primer: From RTL to Silicon

Why it matters

Verification now dominates chip projects: RTL verification can consume up to 70% of total project effort, with regressions that burn thousands of CPU core‑hours and generate multiple petabytes of data; verification engineering is the fastest‑growing role at chip firms.

  • Design complexity is outpacing productivity: chip complexity grows ~50% per year while design productivity improves only ~20% per year, driving larger teams, heavier EDA compute needs and longer verification burdens.
  • The EDA industry is concentrated in a 'Big Three' — Synopsys, Cadence and Siemens (Mentor acquired by Siemens for $4.5B in 2017; rebranded Siemens EDA in 2021) — with flagship tools named throughout the flow (e.g., Design Compiler/Fusion, Genus/iSpatial, IC Compiler II/Innovus, PrimeTime/Tempus, Calibre/Pegasus/IC Validator).
  • RTL-to‑silicon is a 13‑stage flow (Planning → Architecture → RTL → RTL Verification → RTL Freeze → FW/SW dev → Physical Design → Signoff → Tapeout → Fabrication → Post‑silicon → System Integration → Production) that relies on many specialized tools and PDKs; PDK versions progress 0.1/0.3 → 0.5 → 0.9 → 1.0, with Intel 18A milestones listed (PDK0.3 Sep 2022 → 0.5 Mar 2023 → 0.9 Sep 2023 → 1.0 Jul 2024; Panther Lake Jan 2026).
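
The complexity-versus-productivity gap quoted in the bullets above compounds quickly. A minimal sketch, assuming (as a simplification not stated in the primer) that required design effort scales as complexity divided by productivity; `effort_multiplier` is an illustrative name:

```python
def effort_multiplier(years: int,
                      complexity_growth: float = 0.50,
                      productivity_growth: float = 0.20) -> float:
    """Relative design effort after `years`.

    Assumes effort ~ complexity / productivity, with both growing at the
    constant annual rates quoted in the summary (~50% and ~20%).
    """
    yearly_ratio = (1 + complexity_growth) / (1 + productivity_growth)
    return yearly_ratio ** years


for years in (1, 3, 5, 10):
    print(f"after {years:>2} years: {effort_multiplier(years):.2f}x effort")
```

Under these assumptions the gap grows by 1.25x per year, so effort roughly triples in five years (about 3.05x), which is consistent with the summary's point about larger teams and heavier EDA compute needs.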

The primer frames modern chip design as an RTL→silicon pipeline driven by EDA tooling, with verification and physical design as the dominant cost and schedule risks. Designers write RTL (primarily SystemVerilog), lint with tools like VC SpyGlass, then run UVM constrained‑random testbenches on commercial simulators (VCS, Xcelium, Questa) that can consume thousands of core‑hours per regression and petabytes of disk. Formal engines (JasperGold, VC Formal) are used for exhaustive proofs on targeted properties while equivalence checkers (Formality, Conformal LEC) validate transformations between RTL, synthesized netlists and later layout edits. Coverage closure (line/branch/toggle/FSM plus functional covergroups) gates the RTL Freeze milestone; late fixes require ECOs and full re‑verification.

Physical design translates netlists into layout using standard‑cell libraries and a foundry PDK (LEF, .lib, SPICE, extraction decks, DRM). Synthesis (Synopsys Design Compiler, Cadence Genus, Fusion unified flows) picks cell flavors and drive strengths; place & route (IC Compiler II, Innovus) handles floorplanning, power‑grid, clock‑tree synthesis, placement and routing; signoff uses DRC/LVS/ERC (Calibre/Pegasus/IC Validator), STA (PrimeTime/Tempus) and power tools (RedHawk‑SC, Voltus). The article details PDK versioning (0.1/0.3 → 0.5 → 0.9 → 1.0) and Intel 18A’s timeline (PDK0.3 Sep 2022 → 0.5 Mar 2023 → 0.9 Sep 2023 → 1.0 Jul 2024 → Panther Lake Jan 2026). Pre‑silicon emulation (ZeBu‑200 up to 23B gates, Palladium Z3 up to 48B gates) accelerates firmware/software bring‑up. At system scale, DTCO and STCO (Sentaurus TCAD → Mystic → chip PPA evaluation → packaging co‑optimization) and tools like QuantumATK for materials work enable co‑optimization of process, cell libraries and package. The primer emphasizes industry pressures — one‑third of U.S. semiconductor staff >55, complexity growing ~50%/yr vs productivity ~20%/yr, and examples such as AMD’s MI455X (320 billion transistors, 12 dies, 2/3nm, Hybrid Bonding, HBM4, 224G SerDes) — to argue that EDA, unified flows and emerging AI‑driven design automation are now central to whether leading‑edge chips ship on time and at acceptable cost.

By SemiAnalysis
WORTH READING

Useful context and follow-up reading when you have more time.

27 items
1 substack.com 2026-05-11 4 min read

You gave your AI agent real tools. Here's the 4-part control layer it's missing + the Judge Layer implementation g…

Why it matters

On May 11, 2026, Nate (Nate’s Substack) argued that production AI agents need a separate "Judge Layer" to decide whether proposed actions may execute, because the next serious failures will come from subtle, correct‑looking actions (e.g., an email sent, a customer record updated, a PR opened), not from jailbreaks.

  • Prompting and approval modals both fail in production: prompts can't simultaneously pursue tasks and police them, and approval modals either break workflows or get habitually clicked; the practical fix is an architectural judge wrapped around the actor.
  • The article describes a builder toolkit and implementation guidance (the OpenBrain Judge Extender + a five‑prompt 'prompt kit') covering action classification, proposals, specialist judges, eval, memory governance, durable memory/provenance, and structured write‑back.
  • A concrete case — the 'Lindy' multi‑channel agent product — is used to show the failure mode and the architectural fix that stopped it, illustrating that orchestration (coordination) is distinct from judgment (permissioning).

Nate’s May 11, 2026 Substack post argues that once language models gain real tools, the critical missing layer is a separate Judge Layer that decides which agent proposals may act. Rather than dramatic jailbreaks, the next large failures will be subtle, correct‑looking actions with real consequences (emails sent, records changed, money spent). The piece explains why improved prompts and approval modals are insufficient — prompts can’t both execute and police, and modals either break UX or get ignored — and presents an architectural remedy: a judge wrapped around the actor. It outlines a builder toolkit (action classification, proposals, specialist judges, eval, memory governance) and delivers an OpenBrain Judge Extender implementation guide plus a five‑prompt kit to build a first judge wired to durable memory, provenance, and structured write‑back. A Lindy multi‑channel agent case study shows the failure and the effective fix, and the article stresses that orchestration and judgment are different problems requiring different layers.
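
The "judge wrapped around the actor" idea can be sketched in a few lines. This is an illustrative toy, not the OpenBrain Judge Extender or any code from the post: the `Proposal` and `Verdict` types, the tool names, and the email policy are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    action: str  # tool the actor wants to call, e.g. "send_email"
    args: dict   # arguments the actor proposes for that tool


@dataclass
class Verdict:
    allowed: bool
    reason: str


def judge_email(p: Proposal) -> Verdict:
    # Hypothetical policy: only internal recipients may be emailed.
    to = p.args.get("to", "")
    if to.endswith("@example.com"):
        return Verdict(True, "internal recipient")
    return Verdict(False, f"external recipient: {to}")


# One specialist judge per action class; anything unregistered is denied.
JUDGES = {
    "search_docs": lambda p: Verdict(True, "read-only"),
    "send_email": judge_email,
}


def run_with_judge(p: Proposal, tools: dict, audit: list):
    """Execute a proposal only if its judge allows it; log every verdict."""
    judge = JUDGES.get(p.action)
    verdict = judge(p) if judge else Verdict(False, "no judge registered")
    audit.append((p.action, verdict.allowed, verdict.reason))
    if not verdict.allowed:
        return None
    return tools[p.action](**p.args)


tools = {
    "send_email": lambda to, body: f"sent to {to}",
    "search_docs": lambda query: [query],
}
audit = []
ok = run_with_judge(Proposal("send_email", {"to": "ops@example.com", "body": "hi"}), tools, audit)
blocked = run_with_judge(Proposal("send_email", {"to": "x@evil.test", "body": "hi"}), tools, audit)
unknown = run_with_judge(Proposal("delete_db", {}), tools, audit)
print(ok, blocked, unknown, len(audit))
```

The actor never calls a tool directly: it only emits proposals, and the judge decides with a default-deny rule. That keeps task pursuit and permissioning in separate layers, which is the separation the post argues for.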

By Nate from Nate’s Substack
2 divenewsletter.com 2026-05-12 1 min read

Watch Webinar | Agentic AI for Holistic Major Event Management

Why it matters

The Schneider Electric webinar "Agentic AI for Holistic Major Event Management," featured on Utility Dive on 2026-05-12, proposes an integrated platform ecosystem that unifies planning, real-time operations, and post-event recovery for utilities.

  • The approach targets a 5–7-day pre-event window and combines AI-augmented insights, wildfire modeling, vegetation analytics, and asset-health data to generate actionable, auditable plans; presenters say the agent-driven method scales beyond wildfire across the full grid lifecycle.
  • Utility Dive promoted the session in its newsletter, noting an advertising reach of 82,000+ utility executives and providing a watch-the-video link.

Agentic AI for Holistic Major Event Management (Schneider Electric, Utility Dive, 2026-05-12) presents an integrated platform approach that unifies planning, real-time operations, and post-event recovery. It emphasizes the 5–7-day pre-event window and uses AI-augmented insights, wildfire modeling, vegetation analytics, and asset-health data to produce actionable, auditable, agent-driven plans that scale across the grid lifecycle.

By Utility Dive
3 ArXiv 2026-05-11 1 min read

What should post-training optimize? A test-time scaling law perspective

Why it matters

Defines a 'budget-mismatch' regime where training has m << N per-prompt rollouts but deployment uses best-of-N selection; under structural assumptions on reward tails, the authors show the best-of-N policy gradient can be approximated by extrapolating upper-tail statistics from much smaller rollout groups.

  • Proposes a family of Tail-Extrapolated estimators — a direct estimator, Tail-Extrapolated Advantage (TEA), and a fixed-order debiased Prefix-TEA based on moment cancellation — and reports that TEA and Prefix-TEA improve best-of-N performance on instruction-following tasks across different LMs, reward models, and datasets.
  • Paper: Muheng Li, Jian Qian, Wenlong Mou; arXiv:2605.10716v1 (posted 2026-05-11). PDF available at https://arxiv.org/pdf/2605.10716v1.

Large-language-model post-training often optimizes single-response mean reward, creating a mismatch with best-of-N test-time selection. The paper analyzes the budget-mismatch setting (m << N) and proves that, assuming structure in reward tails, best-of-N policy gradients can be extrapolated from small-rollout samples. It introduces Tail-Extrapolated estimators (TEA, Prefix-TEA) and shows improved best-of-N performance on instruction-following benchmarks. Summary based on the abstract only.
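
To see why optimizing mean reward can be mismatched with best-of-N selection, consider a toy simulation. This is not the paper's Tail-Extrapolated estimator: the two Gaussian "policies" and their parameters are hypothetical and chosen only to make the effect visible.

```python
import random
import statistics


def expected_best_of_n(sample, n: int, trials: int, rng: random.Random) -> float:
    """Monte Carlo estimate of E[max of n rewards] for one policy."""
    return statistics.fmean(
        max(sample(rng) for _ in range(n)) for _ in range(trials)
    )


# Hypothetical reward distributions for two policies on a single prompt.
def policy_a(rng):  # higher mean, thin upper tail
    return rng.gauss(0.6, 0.05)


def policy_b(rng):  # lower mean, wide upper tail
    return rng.gauss(0.5, 0.30)


rng = random.Random(0)
results = {}
for n in (1, 4, 16):
    results[n] = (
        expected_best_of_n(policy_a, n, 5000, rng),
        expected_best_of_n(policy_b, n, 5000, rng),
    )
    print(f"N={n:>2}: policy A {results[n][0]:.3f}, policy B {results[n][1]:.3f}")
```

At N=1 the higher-mean policy A wins, but as N grows the wider-tailed policy B overtakes it, because best-of-N rewards the upper tail rather than the mean. Estimating that tail reliably from only m << N rollouts is the problem the paper's estimators target.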

Authors: Muheng Li, Jian Qian, Wenlong Mou
4 sinocism.com 2026-05-11 9 min read

Trump China visit; China’s Next Generation Industrial Policy; Standardizing and developing AI agents; No more defl…

Why it matters

PRC confirmed a Trump–Xi meeting in Beijing: Trump arrives May 13 and will depart after lunch Friday; Vice Premier He Lifeng will lead a delegation to South Korea May 12–13 to consult with Treasury Secretary Bessent; the US on May 11 added three Chinese satellite firms to an entity list and OFAC designated 12 individuals/entities for enabling IRGC oil shipments to China.

  • A Rhodium Group report for the U.S. Chamber of Commerce finds China has shifted to an "industrial policy of everything" since Made in China 2025, using recentralized, state-directed financing to deepen a manufacturing trade surplus—a "China Shock 2.0" that masks market-share gains in volume due to price deflation.
  • Li Qiang chaired a State Council executive meeting prioritizing domestic circulation, service-sector upgrades and construction of six networks (water, new power grid, computing-power, new-generation communications, urban underground pipes, logistics) while advancing risk resolution in real estate, local government debt, and small/medium financial institutions; Ding Xuexiang toured basic-research sites (CAS, Huairou lab, SJTU, UCAS, CATL, Huawei) and met Ren Zhengfei at Huawei’s Chip Fundamental Technology Research Laboratory.
  • Regulators (CAC, NDRC, MIIT) released Implementation Opinions on AI agents to coordinate development and security; the State Council’s August 2025 AI+ plan sets a phased target of >70% adoption for new-generation intelligent terminals and AI agents by 2027; April inflation rose (CPI +1.2% YoY, PPI +2.8% YoY), suggesting an end to deflation aided by rising energy prices.

China–U.S. summit dynamics dominate the brief: Beijing confirmed a May 13 Trump–Xi meeting (departure after lunch Friday) while Vice Premier He Lifeng travels to Seoul May 12–13 to meet U.S. Treasury Secretary Bessent, signaling last‑minute deliverable talks. Concurrent U.S. actions include an entity listing of three Chinese satellite firms and an OFAC designation of 12 individuals/entities tied to IRGC oil shipments. A Rhodium Group report for the U.S. Chamber warns China is deploying a systemic, state‑directed "industrial policy of everything," sustaining a large manufacturing surplus and deepening global input dependencies. Domestically, Li Qiang’s State Council meeting emphasizes six national networks and financial/real‑estate risk resolution. Regulators (CAC, NDRC, MIIT) published Implementation Opinions to standardize AI agents, aligning with a State Council goal of >70% adoption of new AI terminals/agents by 2027. April inflation readings (CPI +1.2%, PPI +2.8% YoY) suggest deflationary pressures are easing.

By Bill Bishop at Sinocism
5 ArXiv 2026-05-11 1 min read

Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge

Why it matters

Controlled comparisons by Wenbo Zhang et al. (accepted at ICML 2026, posted 2026-05-11) show explicit chain-of-thought style reasoning substantially improves LLM-as-judge accuracy on structured verification tasks (notably math and coding) but gives limited or even negative gains on simpler evaluations.

  • Reasoning-based judges incur significantly higher computational cost, motivating selective use under a fixed budget rather than universal deployment.
  • The authors propose RACER, a Robust Adaptive Cost-Efficient Routing scheme that frames routing as a constrained distributionally robust optimization with a KL-divergence uncertainty set; RACER admits an efficient primal–dual algorithm and provable guarantees including uniqueness of the optimal policy and linear convergence, and yields superior accuracy–cost trade-offs under distribution shift.

The paper studies LLM-as-a-Judge trade-offs and finds that explicit reasoning markedly boosts accuracy on structured verification tasks (e.g., math and coding) while offering limited or negative benefit on simpler evaluations and costing substantially more compute. To address this, Wenbo Zhang et al. introduce RACER, which adaptively routes examples between reasoning and non-reasoning judges under a fixed budget via distributionally robust optimization with a KL uncertainty set. RACER uses an efficient primal–dual solver, has theoretical guarantees (unique optimal policy, linear convergence), and empirically attains better accuracy–cost trade-offs under distribution shift. (Based on the paper abstract; ICML 2026 acceptance.)
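
The underlying budgeted-routing idea can be illustrated with a simple greedy heuristic. This is explicitly not RACER: it has no distributional robustness and no primal-dual solver. The per-item gain and cost estimates are hypothetical inputs that a real system would have to predict.

```python
def route_to_reasoning(items, budget: float):
    """Pick which items go to the (expensive) reasoning judge.

    items: list of (item_id, est_accuracy_gain, extra_cost) tuples.
    Greedy by gain per unit cost; items with non-positive estimated gain
    always stay on the cheap non-reasoning judge.
    """
    candidates = [it for it in items if it[1] > 0]
    candidates.sort(key=lambda it: it[1] / it[2], reverse=True)

    chosen, spent = set(), 0.0
    for item_id, _gain, cost in candidates:
        if spent + cost <= budget:
            chosen.add(item_id)
            spent += cost
    return chosen, spent


# Hypothetical workload: structured tasks benefit, simple ones do not.
items = [
    ("math-1", 0.30, 4.0),
    ("code-1", 0.25, 5.0),
    ("chat-1", 0.01, 3.0),
    ("chat-2", -0.02, 3.0),
]
chosen, spent = route_to_reasoning(items, budget=9.0)
print(sorted(chosen), spent)
```

With these made-up numbers the budget goes to the math and coding items, mirroring the paper's finding that reasoning helps most on structured verification. The greedy rule relies on point estimates of gain, which is the kind of fragility under distribution shift that a robust formulation is meant to address.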

Authors: Wenbo Zhang, Lijinghua Zhang, Liner Xiang...
6 ArXiv 2026-05-11 1 min read

CADBench: A Multimodal Benchmark for AI-Assisted CAD Program Generation

Why it matters

CADBench provides 18,000 evaluation samples spanning six benchmark families (derived from DeepCAD, Fusion 360, ABC, MCB, and Objaverse), five input modalities (clean meshes, noisy meshes, single-view renders, photorealistic renders, multi-view renders), and six metrics covering geometric fidelity, executability, and program compactness; STEP-based families are stratified by B-rep face count and diversity-sampled.

  • The authors benchmarked 11 CAD-specialized and general-purpose vision-language systems, generating over 1.4 million CAD programs; under idealized inputs, specialized mesh-to-CAD models substantially outperform code-generating VLMs, and CADBench exposes three failure modes: quality drops with geometric complexity, CAD-specialized models are brittle under modality shift, and model rankings vary by metric.

CADBench is a unified multimodal benchmark for recovering editable CAD programs from 2D/3D inputs, assembled from 18,000 samples across six families and five input modalities with six evaluation metrics. The authors evaluated 11 systems (>1.4M generated programs) and found mesh-to-CAD specialists beat code-generating VLMs under ideal inputs; they report brittleness under modality shift and complexity-related failures. (Summary based on abstract; full text not available.)

Authors: Anna C. Doris, Jacob Thomas Sony, Ghadi Nehme...
7 ArXiv 2026-05-11 1 min read

Personal Visual Context Learning in Large Multimodal Models

Why it matters

The authors formalize Personal Visual Context Learning (Personal VCL) and release Personal-VCL-Bench, a benchmark capturing a user's first‑person visual world across persons, objects, and behaviors (arXiv preprint, 2026-05-11, by Zihui Xue, Ami Baid, Sangho Kim, Mi Luo, Kristen Grauman).

  • Analyzed frontier LMMs and found a pronounced context-utilization gap; proposed the Agentic Context Bank (a self‑refining memory bank with query‑adaptive evidence selection) which consistently improves over standard context prompting across tasks and evaluated backbones.

Personal Visual Context Learning (Personal VCL) formalizes how wearable-driven LMMs should use user-specific first-person visual evidence to resolve personalized queries. Xue et al. introduce Personal-VCL-Bench (covers persons, objects, behaviors), identify a pronounced context-utilization gap in frontier LMMs, and propose the Agentic Context Bank—a self‑refining memory plus query‑adaptive evidence selection—that consistently improves performance over standard context prompting across tasks and evaluated backbones.

Authors: Zihui Xue, Ami Baid, Sangho Kim...
8 ArXiv 2026-05-11 1 min read

Factual recall in linear associative memories: sharp asymptotics and mechanistic insights

Why it matters

Capacity formula: in the decoupled/analytic limit the model can store up to p_c associations satisfying p_c log p_c / d^2 = 1/2 (i.e., p_c log p_c = d^2/2), giving a sharp asymptotic for storage vs. embedding dimension d.

  • Model equivalence: a decoupled model (each input has independent competing outputs) is numerically and analytically shown to be equivalent to the original linear associative memory in storage capacity, spectra of learned weights, and storage mechanism.
  • Mechanism and architecture: using statistical‑physics methods the authors show the optimal solution raises correct input–target scores just above the extreme‑value threshold set by competitors (contrasting with broad Hebbian fluctuations), and they extend the capacity computation to linear two‑layer networks.

The paper analyzes limits of factual recall in linear associative memories by introducing a decoupled model and using statistical‑physics tools to obtain sharp asymptotics. It proves equivalence between the decoupled and original models (capacity, weight spectra, mechanism) and derives the capacity law p_c log p_c / d^2 = 1/2. The work explains why optimal learning nudges correct scores just above extreme‑value thresholds and generalizes results to two‑layer linear architectures, providing a baseline for memory in neural networks.
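
The capacity law has no closed form for p_c, but it is easy to solve numerically. A minimal sketch, assuming the logarithm is natural (the summary does not state the base) and treating the asymptotic relation as exact at finite d; `critical_capacity` is an illustrative name.

```python
import math


def critical_capacity(d: int) -> float:
    """Solve p * ln(p) = d**2 / 2 for p by bisection.

    Assumes a natural logarithm. p * ln(p) is increasing for p >= 1,
    so the root in [1, max(2, d**2)] is unique.
    """
    target = d * d / 2
    lo, hi = 1.0, max(2.0, float(d * d))
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid * math.log(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for d in (64, 256, 1024):
    p = critical_capacity(d)
    print(f"d={d:>4}: p_c ~ {p:,.0f} associations ({p / d**2:.3f} per parameter)")
```

Because of the log factor, p_c grows slightly slower than d^2, so the number of stored associations per weight shrinks slowly as the embedding dimension increases.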

Authors: Alessio Giorlandino, Sebastian Goldt, Antoine Maillard
9 Twitter/X 2026-05-12 2 min read

GBrain is Garry Tan (Y Combinator CEO)'s personal agent brain that treats a…

Why it matters

GBrain is the personal agent brain of Garry Tan (Y Combinator CEO). It treats a folder of Markdown pages as the system of record: one page per person, company, or concept, with a top "Compiled truth" section and an append-only "Timeline"; edits made in VS Code are picked up by gbrain sync.

  • Instead of relying solely on vector embeddings, GBrain deterministically extracts entity references and typed relations (works_at, invested_in, founded, attended, advises) via regex with zero LLM calls, wiring a knowledge graph on every write and using ~20 search techniques (intent classification, multi-query expansion, vector search, reciprocal rank fusion, cosine re-scoring, compiled-truth boosting, backlink ranking, etc.).
  • GBrain compounds knowledge automatically: a signal detector creates stubs after one mention and triggers web enrichment after ~3 mentions; an overnight "dream cycle" enriches entities, fixes citations, and consolidates memory. It ships with 34 skills, runs on embedded PGLite (no server, ~2s startup) and can act as an MCP server for Claude Code, Cursor, and Windsurf (github.com/garrytan/gbrain).

GBrain, built by Garry Tan, is a markdown-first agent memory that makes per-entity pages (Compiled truth + append-only Timeline) the source of truth rather than a vector store. It deterministically extracts regex-based relations, runs ~20 layered search techniques, auto-enriches via a signal detector and overnight "dream cycle," and ships with 34 skills and embedded PGLite.
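The deterministic extraction step can be illustrated with a minimal sketch. The post names the relation types but not the actual regexes, so every pattern, function name, and sample sentence below is a hypothetical stand-in, not GBrain's code.

```python
import re
from collections import defaultdict

# Hypothetical entity pattern: one or more capitalized words.
NAME = r"([A-Z][\w&-]*(?: [A-Z][\w&-]*)*)"

# Hypothetical phrasing for each typed relation named in the post.
PATTERNS = {
    "works_at": re.compile(NAME + r" works at " + NAME),
    "invested_in": re.compile(NAME + r" invested in " + NAME),
    "founded": re.compile(NAME + r" founded " + NAME),
    "attended": re.compile(NAME + r" attended " + NAME),
    "advises": re.compile(NAME + r" advises " + NAME),
}


def extract_relations(text):
    """Return (subject, relation, object) triples found by regex alone."""
    triples = []
    for relation, pattern in PATTERNS.items():
        for subject, obj in pattern.findall(text):
            triples.append((subject, relation, obj))
    return triples


def build_graph(pages):
    """Map each subject to its outgoing (relation, object) edges."""
    graph = defaultdict(set)
    for text in pages.values():
        for subject, relation, obj in extract_relations(text):
            graph[subject].add((relation, obj))
    return graph


pages = {
    "alice.md": "Alice Chen works at Acme Corp.",
    "bob.md": "Bob Lee founded Initech and Bob Lee advises Acme Corp.",
}
for subject, edges in build_graph(pages).items():
    print(subject, sorted(edges))
```

The trade-off in a regex-only approach like this is that it is cheap and repeatable on every write but only catches phrasings the patterns anticipate.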

By @akshay_pachaar
10 Twitter/X 2026-05-12 1 min read

Robin Hanson claims capitalists, facing the view that fixed infrastructure in…

Why it matters

Robin Hanson claims that capitalists, responding to the view that fixed infrastructure in Africa is vulnerable to confiscation, have built floating, transferable infrastructure to avoid that risk.

  • The MV Karadeniz Powership Osman Khan (299 m) has been anchored off Ghana since 2017 and can produce up to 480 MW; Karpowership has scaled this floating power-plant model into a global business serving countries with chronic energy deficits.
  • Community-corrected numbers: the ship supplied about 26% of Ghana’s electricity in 2017–2019, but Karpowership now supplies roughly 12% of Ghana’s power after the country added more generation capacity.

Robin Hanson highlights Mario Nawfal’s point that private capital responded to fears of asset confiscation in Africa by deploying floating, transferable infrastructure. He cites the MV Karadeniz Powership Osman Khan (299 m, up to 480 MW) anchored off Ghana since 2017; Karpowership turned this model into a global business, though the ship’s share of Ghana’s power fell from ~26% in 2017–19 to about 12% today.

By @robinhanson
11 ArXiv 2026-05-11 1 min read

Count Anything at Any Granularity

Why it matters

The paper (Liu, Wu, Xie; published 2026-05-11) reframes open-world counting as multi-grained counting with five explicit semantic granularity levels (identity, attribute, instance type, category, abstract concept) so users can specify “what to count” precisely via visual exemplars plus fine-grained text and optional negative prompts.

  • The authors introduce a fully automatic data-scaling pipeline that integrates controllable 3D synthesis, consistent image editing, and VLM-based filtering to produce KubriCount — claimed as the largest and most comprehensively annotated counting dataset to date — supporting both training and multi-grained evaluation.
  • Systematic benchmarks show multimodal LLMs and specialist counting models suffer severe prompt-following failures on fine-grained distinctions; the proposed HieraCount model, which jointly leverages text and visual exemplars, substantially improves multi-grained counting accuracy and generalizes robustly to challenging real-world scenarios.

Count Anything at Any Granularity (Liu et al., 2026) tackles brittle open-world object counting by making counting granularity explicit across five levels (identity, attribute, instance type, category, abstract concept). The authors build KubriCount using a novel automatic pipeline combining controllable 3D synthesis, image editing, and VLM filtering, and find existing multimodal LLMs and counting models fail to follow fine-grained prompts. They train HieraCount, which fuses text and visual exemplars, and report substantial accuracy and generalization gains on multi-grained counting tasks.

Authors: Chang Liu, Haoning Wu, Weidi Xie
12 Twitter/X 2026-05-12 2 min read

Associated Builders and Contractors projects the industry needs 349,000 new…

Why it matters

Associated Builders and Contractors projects the industry needs 349,000 new construction workers in 2026 and 456,000 in 2027; 41% of the current workforce will retire by 2031, specialty-trade wages are rising 4–11% annually, and many firms are paying ~20% premiums to keep crews staffed.

  • Infill in the 20 most desirable metros (San Francisco, NYC, LA, Boston, Seattle, DC, Austin, etc.) is the only near-job-location path to more housing but costs roughly 2–3× per square foot and depends on hard-to-staff, union trades—ironworkers, concrete crews, complex MEP, and elevator mechanics.
  • Austin permitted more multifamily per capita from 2021–2023 and saw rents fall ~20% from the 2022 peak, but the HOME reforms passed in Dec 2023 and May 2024 aren’t built yet; Austin’s success relied on permissive code, greenfield land, ~40% immigrant labor and low-rise economics, motivating investment in construction robotics (claimed up to ~90% less labor).

Nick Durham argues the US housing problem is as much about labor productivity as zoning: ABC estimates demand for 349,000 workers in 2026 and 456,000 in 2027 with 41% of workers retiring by 2031, infill costs 2–3×/sf in top metros, and Austin’s 2021–23 multifamily surge shows zoning alone won’t suffice—he pushes construction robotics to cut labor by ~90%.

By @americanhousing
13 Twitter/X 2026-05-11 1 min read

John Schulman (@johnschulman2) announced on 2026-05-11 that Thinky is sharing…

Why it matters

John Schulman (@johnschulman2) announced on 2026-05-11 that Thinky is sharing work on full‑duplex multimodal models enabling real‑time, simultaneous multimodal interaction (talk/listen/watch/think) without compromising intelligence.

  • Thinky was founded to differentially advance human‑AI collaboration capabilities, which Schulman says are underemphasized relative to intelligence/autonomy because they are harder to evaluate.
  • They propose every AI will include an 'interaction model' as an outer, user‑facing layer that continually keeps users informed and learns their preferences; Thinking Machines (@thinkymachines) published a blog post and demo video with early results.

John Schulman and Thinky (Thinking Machines) released work on full-duplex multimodal models on 2026-05-11, demonstrating real-time, simultaneous talk/listen/watch interaction without sacrificing intelligence. They argue that human-AI collaboration capabilities are underemphasized relative to intelligence and autonomy because they are harder to evaluate, and they propose an 'interaction model' as a persistent outer, user-facing layer; a blog post and demo video show early results.

By @johnschulman2
14 substack.com 2026-05-12 3 min read

Amkor Q1 2026. Money For Nothing & Your Shares For Free

Why it matters

Amkor reported record first-quarter revenue of $1.68 billion for Q1 2026, up 27% year-over-year and down 10.7% sequentially; the communications end market was the largest contributor, increasing 42% YoY.

  • Profitability improved: gross margin was 14.2% (up 230 basis points YoY, down 250 bps QoQ) with gross profit of $239 million (+52% YoY); GAAP diluted EPS was $0.33 versus $0.09 in Q1 2025 and $0.69 in Q4 2025; net income attributable to Amkor was $83.4 million on 249.6 million diluted shares.
  • Management (CEO Kevin Engel) said Amkor progressed advanced packaging customer programs, improved factory utilization and continued margin initiatives; AI advanced‑packaging revenues are said to be on track to triple YoY and the company is expanding capacity in Arizona and Vietnam.
  • Market reaction and financing: shares fell 8.63% to $71.36 in after‑hours trading on April 27 (after an ~80% YTD run) but recovered to $76.69 by May 12; Amkor also launched a $1 billion convertible senior notes offering and reaffirmed $2.5B–$3B 2026 CapEx guidance.

Amkor delivered a record first quarter in 2026 with $1.68 billion in revenue (+27% YoY, -10.7% QoQ), driven principally by a 42% YoY jump in communications and stronger-than-expected end-market performance outside of computing. Gross margin rose to 14.2% (up 230 bps YoY, down 250 bps QoQ), producing $239 million of gross profit (+52% YoY); GAAP diluted EPS was $0.33 and net income attributable to Amkor was $83.4 million on 249.6 million diluted shares. CEO Kevin Engel highlighted progress on advanced packaging programs, improved factory utilization and margin initiatives; management says AI advanced-packaging revenues are on track to triple YoY while capacity builds in Arizona and Vietnam. Investors initially sold off after the April 27 report: shares slid 8.63% to $71.36 after an ~80% YTD run, amid concern over the reaffirmed $2.5–3.0B CapEx guidance, though the stock recovered to $76.69 by May 12. Amkor also launched a $1 billion convertible senior notes offering to raise capital.
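The reported figures can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted above; the implied prior-period revenues are back-of-envelope values derived from the rounded growth rates, not reported figures.

```python
# Figures quoted in the summary above.
revenue = 1.68e9          # Q1 2026 revenue, USD
gross_margin = 0.142      # 14.2%
net_income = 83.4e6       # attributable to Amkor, USD
diluted_shares = 249.6e6

gross_profit = revenue * gross_margin
eps = net_income / diluted_shares

# Implied comparison periods, derived from the rounded growth rates (approximate).
implied_q1_2025_revenue = revenue / 1.27
implied_q4_2025_revenue = revenue / (1 - 0.107)

print(f"gross profit ~ ${gross_profit / 1e6:.0f}M")
print(f"diluted EPS  ~ ${eps:.2f}")
print(f"implied Q1 2025 revenue ~ ${implied_q1_2025_revenue / 1e9:.2f}B")
print(f"implied Q4 2025 revenue ~ ${implied_q4_2025_revenue / 1e9:.2f}B")
```

The gross profit and EPS computed this way round to the $239 million and $0.33 quoted in the item.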

By William Martin Keating from Semicon Alpha
15 ArXiv 2026-05-11 1 min read

Revisiting Policy Gradients for Restricted Policy Classes: Escaping Myopic Local Optima with $k$-step Policy Gradients

Why it matters

Authors Alex DeWeese and Guannan Qu (arXiv 2026-05-11) introduce a generalized k-step policy gradient that couples randomness over a k-step window; they prove its solution is exponentially close (in k) to the optimal deterministic policy.

  • Projected gradient descent and mirror descent using the k-step policy gradient achieve the exponential guarantee in O(1/T) iterations under only smoothness and differentiability assumptions, and the analysis avoids distribution-mismatch factors ||d_μ^{π^*}/d_μ^π||_∞ and ||d_μ^{π^*}/μ||_∞.

Revisiting policy gradients for restricted policy classes, the paper identifies one-step myopia (policy gradients depending only on the one-step Q) as a cause of suboptimal critical points and proposes a k-step policy gradient that couples randomness across k steps. The method yields performance exponentially close in k to the optimal deterministic policy, converges in O(1/T) with projected/mirror descent under mild smoothness/differentiability assumptions, avoids common distribution-mismatch factors, and targets applications like state aggregation and partially observable cooperative multi-agent problems. Full text on arXiv.

Authors: Alex DeWeese, Guannan Qu
16 wordpress.com 2026-05-12 15 min read

China and US Trade Talks: A Solution for Oil Shortages?

Why it matters

Gail Tverberg (Our Finite World, May 12, 2026) identifies distillate fuel oil (diesel and jet fuel) as the binding resource constraint for transport, agriculture and industry, noting per-capita diesel+jet fuel fell after the 2007–09 financial crisis and took a larger step down in 2020 (Energy Institute / 2025 Statistical Review).

  • She highlights the May 14–15, 2026 meeting between US President Trump and Chinese President Xi as an opportunity to reorganize trade into two hemispheric blocs (Americas vs. World ex Americas) to shorten shipping/flight distances and conserve distillates.
  • Data show the Americas have higher crude-oil production per capita and have increased total production ~65% since 2005 (EIA data cited), while World ex Americas production is largely flat or declining since ~2019–2020; Russia+ and Middle East production show post-2019 weakness.
  • Population trends favor the World ex Americas (much larger and faster growth): 2021–2024 growth averaged ~0.6%/yr in the Americas vs ~0.9%/yr in World ex Americas, creating geopolitical and resource pressure on the non‑Americas bloc.

Tverberg proposes that the May 14–15, 2026 Xi–Trump summit could kickstart a pragmatic reorganization: shorten trade routes by concentrating commerce within two regional blocs (the Americas and East Asia/World ex Americas) to conserve scarce distillates and possibly mediate an Iran settlement (drawing on Dr. Mohammed Marandi’s mediation thesis). She stresses limits: lost manufacturing capacity in the Americas, China’s control over critical minerals, and massive infrastructure and processing gaps mean any full transition will take decades to over a century. In the near term she warns of an expected downward economic step, uneven regional outcomes favoring population centers near oil and refineries (e.g., Houston), and the need to recalibrate trade, energy and industrial policy to avoid acute fuel-driven dislocations.

By Our Finite World
17 Twitter/X 2026-05-12 1 min read

@fredstaffordcs says Graham Platner misdiagnoses Maine utility price drivers

Why it matters

@fredstaffordcs says Graham Platner misdiagnoses Maine utility price drivers: the main causes are (1) storm damages being recovered in rates instead of taxes, (2) growth of net‑metered rooftop and community solar, and (3) the state renewables mandate — citing LBNL reports from 2025 and 2026.

  • In an interview Platner claimed prices rise because “People need more energy and we’re not producing enough” and due to “corporate consolidation and greed”; Stafford notes the rooftop/community solar growth and renewables mandate are likely popular with Platner’s coalition.
  • Reporter Jael Holzman tweets that Platner told advocates in March he’d support a national moratorium on AI data centers and backs doing “anything” to slow their spread, but opposes a moratorium that is merely symbolic.

Fred Stafford (@fredstaffordcs) argues Graham Platner wrongly attributes rising Maine utility prices to insufficient supply and to corporate consolidation and greed. Stafford instead points to storm cost recovery in rates, expanding net-metered rooftop and community solar, and the state renewables mandate (per LBNL 2025/2026 reports) as the primary drivers, and notes that the solar growth and the mandate are likely popular with Platner’s coalition. Separately, a reporter's tweet says Platner told advocates in March that he would support a national moratorium on AI data centers, though not a merely symbolic one.

By @fredstaffordcs
18 ArXiv 2026-05-11 1 min read

A Recursive Decomposition Framework for Causal Structure Learning in the Presence of Latent Variables

Why it matters

DiCoLa: a recursive decomposition framework that enables divide-and-conquer constraint-based causal discovery in the presence of latent variables by decomposing the global task into smaller subproblems and reconstructing the global structure.

  • The paper (Li et al., arXiv 2026-05-11) proves soundness and completeness of DiCoLa and reports significant computational-efficiency gains on synthetic benchmarks plus successful application to a real-world dataset.

The paper introduces DiCoLa, a recursive decomposition framework that extends divide-and-conquer causal discovery to settings with latent variables by splitting the global CI-testing task into smaller subproblems and integrating solutions via a principled reconstruction step. The authors prove soundness and completeness for the framework and demonstrate substantial runtime improvements on synthetic experiments and practical effectiveness on a real-world dataset, addressing high-dimensional CI-testing bottlenecks.

Authors: Zheng Li, Feng Xie, Shenglan Nie...
19 substack.com 2026-05-12 14 min read

Introducing Barely Possible

Why it matters

Lane Rettig launched Barely Possible publicly on May 12, 2026 (BarelyPossible.to and via Transistor); the project has become his main focus and is an AI-operated podcast meant to be fully automated.

  • Initial product: a single AI-produced daily audio briefing (target ~1–2 hours) combining world news, AI, crypto/tech/finance updates, and a daily deep dive; Rettig says he has ~1–2 hours/day for audio and listens at 2x speed (able to get through 3–4 podcast episodes on a run).
  • Technical approach: a pipeline that collects and analyzes content from multiple sources and synthesizes personalized audio outputs; Rettig says the pipeline can already run on local, open models making the cost structure sustainable.
  • Team & timeline: the project launched after ~two months of work with collaborator 'Baz'; Rettig has been listening daily since the early runs and reports incremental improvements.

Barely Possible is Lane Rettig’s AI-operated podcast, launched publicly on May 12, 2026 (BarelyPossible.to and on Transistor). Rettig positions the show as a high-quality, fully automated daily briefing—initially a single AI-produced program targeted at roughly one to two hours per day that synthesizes world news, AI developments, crypto and tech/finance updates, and a focused daily deep dive. He frames the product around real listening habits (he reports ~1–2 hours/day of audio and uses 2x speed to get through 3–4 episodes on a run) and wants a compact, high-octane slot that replaces time otherwise spent hunting for disparate sources.

Technically, Barely Possible runs a content-collection-and-analysis engine that synthesizes consistent underlying stories into different user-specific outputs; Rettig says the pipeline can operate on local open models, keeping costs sustainable. The project began about two months before launch with collaborator “Baz,” and Rettig has iterated on it daily. The broader vision is an AI-native audio platform with effectively infinite, customizable content (per-user “transcoding” of a single source into many tailored formats), personalized advertising, and creator tools—drawing on concepts like skill files and Neal Stephenson’s ‘ractive’—rather than attempting to supplant human-hosted long-form podcasts.

By Lane Rettig from Three Things
20 ArXiv 2026-05-11 1 min read

Power Reinforcement Post-Training of Text-to-Image Models with Super-Linear Advantage Shaping

Why it matters

Introduces Super-Linear Advantage Shaping (SLAS): an advantage-dependent extension of the Fisher–Rao information metric that reshapes local policy geometry to amplify high-advantage update directions and tighten low-advantage ones; also uses batch-level normalization to stabilize training.

  • SLAS consistently outperforms the DanceGRPO baseline across multiple backbones and benchmarks, delivering faster training dynamics, improved out-of-domain performance on GenEval and UniGenBench++, greater robustness to model scaling, and reduced reward hacking while preserving semantic and compositional fidelity.
  • Paper identifies that removing the prompt-level standard-deviation term yields an optimal policy ascent linear in the advantage but still limits separation of genuine signal from noise, motivating the super-linear geometric reshaping in SLAS.

Text-to-image post-training via reinforcement learning faces reward-hacking and miscalibration from normalization. Based on the abstract, the authors propose Super-Linear Advantage Shaping (SLAS), which extends the Fisher–Rao metric with advantage-dependent weighting to amplify informative updates and suppress illusory gradients, plus batch-level normalization. Evaluations report consistent gains over DanceGRPO, faster training, better GenEval and UniGenBench++ OOD performance, and improved scaling robustness and fidelity.

Authors: Haoyuan Sun, Jing Wang, Yuxin Song...
21 Twitter/X 2026-05-12 1 min read

Perceptron Mk1 returns native structured outputs—timestamped actions, segmented…

Why it matters

Perceptron Mk1 returns native structured outputs—timestamped actions, segmented subgoals, timecodes, clips, points, and boxes—that can plug directly into pipelines, unlike typical video-AI demos that only report "here's what happened."

  • Benchmarks reported by the author: EgoSchema 80.60% and Perception Test 80.8%; Perceptron Mk1 is priced below Flash Lite.
  • Author argues most video-AI pilots fail to operationalize because outputs aren't queryable or connectable; Perceptron Mk1's structured format makes it feel like infrastructure that teams can build on, and the "Mk1" name signals a flagship, physical-world model category.

Perceptron Mk1 is a frontier video and embodied-reasoning model whose outputs include structured, pipeline-ready artifacts (timecodes, clips, points, boxes, segmented subgoals) that the author tested in a robotics workflow. Reported scores are EgoSchema 80.60% and Perception Test 80.8%, with pricing below Flash Lite; the author says this format shifts video demos toward buildable infrastructure.

By @hasantoxr
22 Twitter/X 2026-05-12 1 min read

Test setup: @stevibe evaluated six open-source LLMs on a scrambled 3×3 sliding…

Why it matters

Test setup: @stevibe evaluated six open-source LLMs on a scrambled 3×3 sliding puzzle using a move_tile tool, running five trials per scramble depth (best run kept); a model was marked failed if it exceeded 6× the optimal move count.

  • Progressive failures by depth: Depth 10 — GLM 5.1 melted down with 43 moves and was cut; Depth 12 — Gemma4 26B lost the plot; Depth 15 — DeepSeek V4 Flash, DeepSeek V4 Pro, Gemma4, and GLM 5.1 failed; Depth 18 — only Qwen3.6 35B-A3B and Kimi K2.6 still solved; Depth 22 (final) — Qwen3.6 solved in 36 moves, Kimi cracked at 81 moves, DeepSeek V4 Pro finished at 90.
  • Surprising winner and details: Qwen3.6 35B-A3B performed best despite having only ~3B active parameters and fitting on a single RTX 3090; Kimi K2.6 produced an earlier 11-move solve described as looking like cheating, and the author calls Qwen 'unstoppable' and Kimi 'elegant.'

Author @stevibe ran a brutal long-horizon/tool-calling benchmark: six open-source LLMs solved a scrambled 3×3 sliding puzzle via a move_tile tool, five trials per depth, failing if >6× optimal moves. As depth increased, GLM 5.1, Gemma4 26B and DeepSeek variants collapsed; Qwen3.6 35B-A3B (≈3B active params, fits on one 3090) reliably beat the final boss in 36 moves while Kimi K2.6 ultimately cracked at 81.
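The pass/fail rule depends on knowing the optimal move count for each scramble. The thread does not say how that was computed; one plausible way, sketched here, is breadth-first search over puzzle states. The goal layout and function names are assumptions for illustration.

```python
from collections import deque

# Assumed goal layout; 0 marks the blank square.
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)


def neighbors(state):
    """Yield every state reachable by sliding one tile into the blank."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:
            swap = r * 3 + c
            nxt = list(state)
            nxt[blank], nxt[swap] = nxt[swap], nxt[blank]
            yield tuple(nxt)


def optimal_moves(start):
    """Shortest solution length via BFS, or None if the state is unsolvable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        state, dist = queue.popleft()
        if state == GOAL:
            return dist
        for nxt in neighbors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def is_failed(moves_used, start, limit_factor=6):
    """Apply the benchmark's rule: fail when moves exceed 6x the optimum."""
    return moves_used > limit_factor * optimal_moves(start)


start = (1, 2, 3, 4, 5, 6, 0, 7, 8)  # blank two slides away from the goal
print(optimal_moves(start), is_failed(12, start), is_failed(13, start))
```

Note that scramble depth is only an upper bound on the optimal length, since random scramble moves can cancel each other out.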

By @stevibe
23 Twitter/X 2026-05-12 2 min read

Use a spreadsheet + a Claude subscription and ~20 minutes to build a Finance…

Why it matters

Use a spreadsheet + a Claude subscription and ~20 minutes to build a Finance Summary based on the Profit First model that pulls data from your Foundational Five accounts: Income, Profit, Owner's Comp, Tax, and OpEx.

  • Report must output total revenue, each F5 line as separate line items, real profit margin after Owner's Comp, and a TAPS (target allocation percentage) benchmark for your revenue tier; OpEx must be broken into meaningful categories (no 'miscellaneous').
  • Always surface three critical numbers: trailing 3‑month revenue trend (with direction arrow), any OpEx category that grew >10% MoM (named with dollar amount), and cash reserves in weeks of OpEx (RED FLAG if <8 weeks); spot‑check 10 random transactions per report for the first 3 months; schedule monthly on the 1st or weekly on Mondays.

Corey Ganim outlines a Finance Summary to stop cash leaks: build a Profit First–based report (spreadsheet + Claude, ~20 minutes) that pulls five accounts, shows revenue and separate F5 lines with TAPS benchmarks, and flags three metrics: 3‑month trend, OpEx >10% MoM, and cash weeks <8. Spot‑check 10 transactions per report for the first 3 months; schedule monthly or weekly.
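The three flagged numbers are simple enough to compute directly. The sketch below is one way to do it, with made-up sample figures; the function names, the first-versus-last definition of the trend arrow, and the 52/12 weeks-per-month conversion are assumptions, not part of the thread.

```python
def revenue_trend(last_three):
    """Direction arrow for trailing 3-month revenue (oldest month first).

    Assumption: direction is judged by comparing the first and last month.
    """
    first, last = last_three[0], last_three[-1]
    if last > first:
        return "↑"
    if last < first:
        return "↓"
    return "→"


def opex_spikes(prev, curr, threshold=0.10):
    """Return {category: dollar increase} for categories up more than 10% MoM."""
    spikes = {}
    for category, amount in curr.items():
        before = prev.get(category, 0.0)
        if before > 0 and (amount - before) / before > threshold:
            spikes[category] = amount - before
    return spikes


def cash_runway_weeks(cash, monthly_opex):
    """Cash reserves expressed in weeks of operating expenses."""
    weekly_opex = monthly_opex * 12 / 52
    return cash / weekly_opex


# Made-up sample data for illustration only.
prev_opex = {"software": 1000.0, "rent": 2000.0}
curr_opex = {"software": 1200.0, "rent": 2100.0}
weeks = cash_runway_weeks(cash=6000.0, monthly_opex=5200.0)

print("trend:", revenue_trend([18000, 19500, 21000]))
print("opex spikes:", opex_spikes(prev_opex, curr_opex))
print(f"runway: {weeks:.1f} weeks", "RED FLAG" if weeks < 8 else "ok")
```

In practice the same checks could be written as spreadsheet formulas; the point is that each flag is a single comparison once the OpEx categories are clean.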

By @coreyganim
24 Twitter/X 2026-05-12 1 min read

Aditya (@adxtyahq) cites Andrej Karpathy calling current AI agents “slop” and…

Why it matters

Aditya (@adxtyahq) cites Andrej Karpathy calling current AI agents “slop” and attributes a major cause to memory systems optimizing storage rather than retrieval quality.

  • He reports concrete degradation as memory stores grow: old preferences resurface, contradictions accumulate, and low-signal context competes with important information, so systems technically remember more but behave less intelligently.
  • Aditya published the thread and linked a video/article on 2026-05-12 arguing the AI memory race is focused on the wrong layer.

Aditya (@adxtyahq) cites Andrej Karpathy's description of current AI agents as 'slop' and argues that a major cause is memory systems optimizing for storage rather than retrieval quality. As memory grows, old preferences resurface, and contradictions and low-signal context crowd out important information, so systems technically remember more but act less intelligently. He linked a video/article on 2026-05-12 arguing the AI memory race is focused on the wrong layer.

By @adxtyahq
25 Twitter/X 2026-05-12 1 min read

Chamath told the All-In Prediction Show on Jan 9, 2026 that he would pick copper…

Why it matters

Chamath told the All-In Prediction Show on Jan 9, 2026 that he would pick copper as the biggest business winner of 2026, arguing it's essential for datacenters, chips and weapons and that markets are underestimating supply shortfalls amid a more unilateral national-security stance.

  • He claimed current trajectories will leave roughly a 70% global copper supply shortfall by 2040 and tweeted on May 12, 2026 that copper is 'ripping' and 'this trade is definitely working rn.'

Chamath predicted on the All-In Prediction Show (Jan 9, 2026) that copper would be the biggest business winner of 2026, citing its role in datacenters, chips and weapons and warning of underestimated shortages; he asserted a roughly 70% supply shortfall by 2040 and on May 12, 2026 said copper is 'ripping'.

By @chamath
26 Twitter/X 2026-05-12 1 min read

The NEO humanoid robot is built at what the author calls the most vertically…

Why it matters

The NEO humanoid robot is built at what the author calls the most vertically integrated humanoid robot factory in America—every critical component (motors, limbs, circuits, sensors) is manufactured in-house by 1X_tech.

  • VP of Operations Vikram Kothari says the factory runs a rapid four-week CAD-to-finished-robot cycle; CEO Bernt Børnich (who started building soapbox cars at age 11) led the tour and frames NEO around 1X_tech's '1X North Star.'
  • NEO is positioned to launch into consumer homes later in 2026 and the company emphasizes a proprietary 'World Model' (claimed as 'true general intelligence') along with safety and privacy measures.

The NEO factory tour shows 1X_tech’s end-to-end U.S. humanoid robot production: CEO Bernt Børnich and VP of Operations Vikram Kothari highlight in-house manufacture of all critical components and a four-week CAD-to-finished-robot cycle. NEO is slated to launch into consumer homes later in 2026; the company emphasizes a proprietary 'World Model', which it claims is 'true general intelligence', along with safety and privacy measures.

By @rpnickson
27 Twitter/X 2026-05-12 1 min read

On 2026-05-12 Séb Krier endorsed the paper 'Positive Alignment'…

Why it matters

On 2026-05-12 Séb Krier endorsed the paper 'Positive Alignment' (arXiv:2605.10310), calling it “fabulous” and “a must read.”

  • The paper defines 'Positive Alignment' as agents that help humans navigate value trade-offs, build resilience, and act as scaffolds for human flourishing—distinct from mere harm-avoidance—and warns that avoiding top-down, technocratic paternalism is a core design challenge.
  • Authors include Ruben Laukkonen, Michael Levin, Verena Rieser, Adam C. Elwood, Franklin Matija, Fernando Rosas and 12+ co-authors; the paper calls for substantially more research into aligning models that actively help humans thrive.

Séb Krier praised the 2026 paper 'Positive Alignment' (arXiv:2605.10310), by Ruben Laukkonen, Michael Levin, Verena Rieser, Adam C. Elwood, Franklin Matija, Fernando Rosas and others, which proposes agents that proactively help humans navigate value trade-offs, strengthen resilience, and scaffold flourishing rather than only preventing harm, and urges more research to avoid technocratic paternalism.

By @Afinetheorem
QUICK SKIM

Scan these for facts, links, or weak signals worth tracking.

64 items
1 Twitter/X 2026-05-12 19 min read

Naval (Naval Podcast, published 2026-05-12) argues credibility is more important…

Why it matters

Naval (Naval Podcast, published 2026-05-12) argues credibility is more important than traditional sales tactics: be authentic, honest, knowledgeable and trustworthy rather than overtly “selling”; he cites Robert Cialdini’s CLASSR (consistency, liking, authority, scarcity, social proof, reciprocity) plus anchoring as a useful checklist but says sales is primarily credibility.

  • He uses a “yes, and” approach as rational empathy—reasoning to the other person’s position before agreeing or contradicting—and claims he is wrong about 80% of the time, which motivates his insistence on objectivity and “selfish honesty.”
  • Naval defines charisma as projecting confidence + love (power + good intentions); honesty should be prioritized over niceness for effectiveness, though kindness layered over truth is necessary to have people listen; he admits he is bad at firing and delegates that or helps find alternate roles.
  • On leadership he distinguishes management (telling people what to do) from leadership (making them want to do it), advocates small, high‑trust teams (examples: groups of 10–20, or scaling to 50–200 for big projects), and uses the stag-hunt game-theory metaphor to explain cooperation and high‑trust societies.

Naval (Naval Podcast) lays out a model of persuasion centered on credibility, honesty, and intrinsic motivation rather than conventional salescraft. He references Robert Cialdini’s CLASSR framework and anchoring as a useful checklist but insists that effective “selling” is really about being authentic, explaining simply, and building long‑term trust so people don’t feel they’re being sold to. He practices “rational empathy” and often employs a “yes, and” stance to understand others’ positions before agreeing or contradicting them.

Naval frames charisma as the simultaneous projection of confidence and love, and privileges honesty over politeness while still arguing kindness is necessary for effectiveness. He admits to being wrong about 80% of the time, to avoiding brute-force sales tactics, and to struggling with firing people—preferring to reassign or help them find better roles. He emphasizes leadership as inspiring people to want the work, preferring small, high‑trust teams (10–20 up to a few hundred for major projects) and uses the stag‑hunt metaphor to explain why cooperation scales value. Practically, he advises only selling things you’re obsessed about, timing fundraising to internal excitement (citing his startup Impossible), and feeding good intellectual obsessions rather than chasing formulas or motivational noise.

By @naval
2 Twitter/X 2026-05-12 1 min read

Jeff Wilke, who ran Amazon’s worldwide consumer business, told Jeff Bezos he was…

Why it matters

Jeff Wilke, who ran Amazon’s worldwide consumer business, told Jeff Bezos that Bezos was generating so many ideas that he was ‘going to destroy the company’ and insisted Bezos must “release work at the rate the organization can accept.”

  • The thread frames the problem via Little’s Law: more inputs raise work-in-process (WIP), which increases cycle time, defect rate, and rework, choking throughput; Bezos responded by keeping prioritized idea lists and holding ideas until the org could absorb them (Wilke retired in 2021 and Bezos still keeps a list).
  • Many founders and PMs behave oppositely: Slack channels and monthly roadmap reshuffles introduce constant new initiatives, causing teams to optimize for appearing responsive rather than finishing work.

Jeff Bezos generated so many ideas that operations leader Jeff Wilke warned they would ‘destroy the company,’ advising Bezos to limit releases to what the organization can absorb. The post ties this to Little’s Law — excess inputs raise WIP, cycle time, and rework — and notes Bezos changed by prioritizing and keeping lists, while many startups still flood teams with new initiatives.
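Little's Law states that, in a stable system, average work-in-process equals throughput times average cycle time (L = λW). Rearranged, cycle time is WIP divided by throughput, so adding initiatives without adding capacity stretches every delivery. The numbers in this sketch are hypothetical.

```python
def average_cycle_time(wip_items: float, throughput_per_week: float) -> float:
    """Little's Law rearranged: W = L / lambda (average time in system)."""
    if throughput_per_week <= 0:
        raise ValueError("throughput must be positive")
    return wip_items / throughput_per_week


THROUGHPUT = 5.0  # hypothetical: initiatives the organization finishes per week

for wip in (10, 20, 40):
    weeks = average_cycle_time(wip, THROUGHPUT)
    print(f"WIP={wip:>2} -> average cycle time {weeks:.1f} weeks")
```

With throughput fixed, quadrupling the number of open initiatives quadruples the average wait for each one. The law itself covers only averages in steady state; the thread's points about defects and rework are additional claims layered on top.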

By @aakashgupta
3 Twitter/X 2026-05-12 1 min read

Garry Tan merged 14 PRs in 72 hours, adding 29,000 new lines and following a…

Why it matters

Garry Tan merged 14 PRs in 72 hours, adding 29,000 new lines and following a 'ratchet' practice where each agent coding session ships tests, docs, and evals alongside code (pattern present in gstack).

  • steipete (behind openclaw) ran 50 Codex instances in parallel, closed 4,000 issues, and open-sourced the tool as 'clawsweeper', described as the same ratchet shape.
  • Claim: agents operate as loops that ratchet progress forward reliably (records/tests prevent human forgetfulness), pushing the complexity ceiling up; practical tip — add "and write tests too" to every Codex prompt.

The post argues that autonomous coding agents create a 'ratchet' effect: repeated agent loops ship tests, docs, and evals with code, producing compounding, reliable progress. Examples: Garry Tan merged 14 PRs in 72 hours adding 29,000 lines; steipete ran 50 Codex instances to close 4,000 issues and released 'clawsweeper.' The author advises adding 'and write tests too' to Codex prompts.

By @Voxyz_ai
4 Twitter/X 2026-05-12 1 min read

Steve Jobs introduced Siri with the iPhone 4S on October 4, 2011; Jobs died two…

Why it matters

Steve Jobs introduced Siri with the iPhone 4S on October 4, 2011; Jobs died two days later on October 6, 2011.

  • By 2026 the author says Siri is widely seen as a punchline and many users (including the author) disable it because it “works so poorly.”
  • The author argues Apple deliberately underinvested in AI—while Microsoft, Google, Meta and OpenAI are spending roughly $30–$50 billion per year on AI, Apple has spent relatively little, causing Siri to fall behind.

Siri was unveiled October 4, 2011 as a flagship iPhone 4S feature—two days before Steve Jobs’s death—and, according to the author, has become a 2026 punchline many users disable. The new video argues Apple deliberately underinvested in AI while rivals (Microsoft, Google, Meta, OpenAI) pour $30–$50 billion annually into AI, explaining Siri’s decline.

By @girdley
5 ArXiv 2026-05-11 1 min read

Is Your Driving World Model an All-Around Player?

Why it matters

WorldLens is a unified benchmark for driving world models that measures fidelity across five complementary aspects and 24 standardized dimensions (pixel quality, 4D geometry, closed-loop driving, human perceptual alignment, etc.), presented at the CVPR 2026 VideoWorldModel Workshop.

  • An evaluation of six representative world-models found no single approach dominates: texture-rich models often violate geometry, geometry-aware models lack behavioral fidelity, and top performers score only 2–3 out of 10 on human realism ratings.
  • The authors release WorldLens-26K, a 26,808-entry human-annotated preference dataset pairing numerical scores with textual rationales, plus WorldLens-Agent, a distilled vision–language evaluator for scalable, explainable automatic assessment (project page and GitHub provided).

WorldLens introduces a comprehensive benchmark for driving world models, spanning five aspects and 24 dimensions to evaluate pixels, 4D geometry, closed-loop driving, and human perceptual realism. The authors benchmark six models, show no method excels across axes (best attain 2–3/10 human realism), and provide WorldLens-26K plus WorldLens-Agent for scalable, explainable evaluation. (Summary based on abstract; full text not reviewed.)

Authors: Lingdong Kong, Ao Liang, Tianyi Yan...
6 ArXiv 2026-05-11 1 min read

Safe Aerial 3D Path Planning for Autonomous UAVs using Magnetic Potential Fields

Why it matters

3DMaxConvNet extends the 2D MaxConvNet magnetic potential-field planner to 3D, using a convolutional autoencoder to predict obstacle-aware potential fields from LiDAR-derived 101^3 voxel grids. It achieved 100% path-planning success across 100 randomized closed-loop trials on two Cosys-AirSim urban maps (a dense night-time cityscape and a suburban district) without retraining.

  • Offline, 3DMaxConvNet produces path lengths comparable to A* on unseen maps while reducing runtime from 0.155–0.17s (A*) to 0.087–0.089s (≈1.7–1.95× faster); compared with RRT*(3k) it achieves similar path quality while cutting runtime from 17.2–17.5s to ≈0.09s (≈193–201× faster).

3DMaxConvNet extends the MaxConvNet magnetic potential-field planner into 3D by training a convolutional autoencoder to generate obstacle-aware potential fields from LiDAR-derived 101^3 voxel grids. In Cosys-AirSim experiments (100 randomized closed-loop trials on two urban maps) it reached 100% success without retraining, matched A* path quality with ~2× lower runtime, and ran ~200× faster than RRT*(3k).
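
The quoted speedups can be checked from the runtime ranges alone. The sketch below assumes the endpoints of each range may pair in any combination (the summary does not say which runtime belongs to which map), so it reports the widest possible band. The helper name is illustrative.

```python
def speedup_range(baseline, candidate):
    """Return (min, max) speedup from (low, high) runtime ranges in seconds."""
    base_lo, base_hi = baseline
    cand_lo, cand_hi = candidate
    # Smallest ratio: fastest baseline run against slowest candidate run.
    # Largest ratio: slowest baseline run against fastest candidate run.
    return base_lo / cand_hi, base_hi / cand_lo


MAXCONVNET_3D = (0.087, 0.089)   # seconds, as quoted above
A_STAR = (0.155, 0.17)           # seconds, as quoted above
RRT_STAR_3K = (17.2, 17.5)       # seconds, as quoted above

for name, runtimes in (("A*", A_STAR), ("RRT*(3k)", RRT_STAR_3K)):
    lo, hi = speedup_range(runtimes, MAXCONVNET_3D)
    print(f"{name}: {lo:.2f}x to {hi:.2f}x faster")
```

Both bands agree with the rounded figures in the item (roughly 1.7–1.95× against A* and roughly 193–201× against RRT*).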

Authors: Haechan Mark Bong, Giovanni Beltrame
7 Twitter/X 2026-05-12 1 min read

Gary Marcus tweeted that Claude Code is “the most neurosymbolic thing” he has…

Why it matters

Gary Marcus tweeted that Claude Code is “the most neurosymbolic thing” he has seen, describing it as “still not AGI” but the biggest advance since GPT‑4; he says it integrates 53 symbolic tools and about 500,000 lines of symbolic code with a state‑of‑the‑art LLM.

  • Heather C. Miller frames Marcus’s praise as evidence that recent progress is a victory for integrating classical AI and computer‑science techniques into compound/neurosymbolic systems (not for pure LLMs) and points to a technical dissection at ccunpacked.dev.

Heather C. Miller amplifies Gary Marcus's claim that Claude Code, while "still not AGI," is a major neurosymbolic advance: it pairs a state‑of‑the‑art LLM with 53 symbolic tools and roughly 500,000 lines of symbolic code. She argues this vindicates integrating classical AI/CS into compound systems rather than relying on pure LLMs, and links to a ccunpacked.dev dissection.

By @heathercmiller
8 Twitter/X 2026-05-12 1 min read

Interview prompt: a hiring tool recommends 15% fewer candidates from certain…

Why it matters

Interview prompt: a hiring tool recommends 15% fewer candidates from certain demographic backgrounds; most candidates reply by debating root cause (training data, feature weights, model architecture) instead of an immediate product response.

  • Product-first play: stop the harm now — pause automated rejects for the affected segment, surface recommendations to human recruiters, and route cases to human review while auditing/fixing the model offline; Prasad Reddy at Deaher used this approach when diagnostics outputs were wrong.
  • Governance math and messaging: a bias audit adds ~10 days, a class action can take years and cost hundreds of millions, EEOC has filed AI discrimination cases since 2023, and boards expect leaders to present the issue, the response, and a timeline.

The post uses an AI PM interview question, in which a hiring tool recommends 15% fewer candidates from certain demographic backgrounds, to argue that candidates should prioritize immediate harm mitigation over technical root-cause debates. The right product move is to pause auto-rejects, route affected recommendations to human review, and fix the model offline (as Prasad Reddy did at Deaher). An audit takes ~10 days, while legal exposure can run for years and cost hundreds of millions, so leaders should open with the response and a timeline.
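
One way to read the "stop the harm now" step is as a routing rule that sits in front of the model while the audit runs. The sketch below is a hypothetical illustration, not the tool described in the post: the `Recommendation` fields, the segment labels, and the returned action strings are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    candidate_id: str
    segment: str         # hypothetical audit segment label
    model_decision: str  # "advance" or "reject"


def route(rec: Recommendation, affected_segments: set[str]) -> str:
    """Decide what happens to a model recommendation during the audit."""
    if rec.segment in affected_segments and rec.model_decision == "reject":
        # Automated rejects are paused for affected segments: a recruiter
        # reviews the case, with the model output shown only as a suggestion.
        return "human_review"
    # Everything else continues through the existing automated path.
    return "auto_" + rec.model_decision


affected = {"segment_a"}
print(route(Recommendation("c1", "segment_a", "reject"), affected))
print(route(Recommendation("c2", "segment_b", "reject"), affected))
```

The rule changes no model weights, so it can ship immediately while the root-cause work proceeds offline.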

By @aakashgupta
9 Twitter/X 2026-05-12 1 min read

Gary Marcus (post dated 2026-05-12) affirms the SE Gyges line that “partial…

Why it matters

Gary Marcus (post dated 2026-05-12) affirms the SE Gyges line that “partial regurgitation, no matter how fluent, does not, and will not ever, constitute genuine comprehension,” and says LLMs do sometimes “partially regurgitate,” a point supported by a large body of literature on hallucinations and errors.

  • Marcus accuses Geoffrey Hinton of posting a quote on Hinton’s webpage that Marcus says is either fabricated or taken out of context; he could find no other source for the attribution and calls such misattribution intellectually dishonest, arguing it’s unfair for a Nobel laureate to attack a view he doesn’t hold.

Gary Marcus (2026-05-12) insists his position is that LLMs sometimes “partially regurgitate” and that fluent repetition doesn’t equal understanding, endorsing SE Gyges’ wording. He claims Geoffrey Hinton has attributed a different, unfounded quote to him on Hinton’s website, calls that fabrication or out-of-context citation, and urges intellectual honesty from a Nobel laureate.

By @GaryMarcus
10 Twitter/X 2026-05-12 1 min read

ByteDance shut down 30% of its AI application projects following an April 2026…

Why it matters

ByteDance shut down 30% of its AI application projects following an April 2026 internal AI strategy review, cutting efforts including '猫箱', '星绘', and parts of the overseas AI video tool Dreamina.

  • ByteDance missed its 2025 KPI of building three additional products with ≥10 million DAU (outside 豆包); 2025 AI inference costs exceeded ¥80 billion (2.3× the incremental revenue), and the CFO warned that cash flow would not last to 2027 at the current burn rate.
  • Overseas growth has slowed (Dreamina monthly user growth fell from 30% to 4%) and regulatory/geo risks (TikTok U.S. divestiture uncertainty, EU AI Act costs, India bans) prompted a strategic pivot to double down on 豆包, bet on hardware (PICO + AI glasses), and sharply shrink pure-application investments.

ByteDance enacted an April 2026 rollback that eliminated 30% of its AI application projects (e.g., 猫箱, 星绘, parts of Dreamina) after missing its 2025 target of three additional 10M+ DAU products. With AI inference costs above ¥80 billion (2.3× incremental revenue), slowing overseas growth, and regulatory headwinds, the company is refocusing on 豆包 and hardware (PICO and AI glasses) and cutting broad app bets.

By @ExplodeMeow102
11 Twitter/X 2026-05-12 1 min read

On 2026-05-12 Ethan Mollick says OpenAI contacted him to confirm Study Mode is…

Why it matters

On 2026-05-12 Ethan Mollick says OpenAI contacted him to confirm Study Mode is still live and accessible via the /study and /learn shortcuts, even though the official Study Mode page doesn't mention it and most accounts no longer show a menu option to select it.

  • Mollick claims the silent UI removal is “a big mistake,” arguing evidence shows assistant-mode AI can harm learning by handing out answers; he notes Claude and Gemini still offer study modes and says Study Mode was a simple mitigation parents and teachers could recommend.

Ethan Mollick (2026-05-12) reports OpenAI told him Study Mode remains live and reachable via /study and /learn shortcuts, despite its absence from most ChatGPT menus. He argues the silent removal is “a big mistake,” citing evidence that assistant-mode AIs can hurt learning by giving answers, and says Study Mode offered a simple mitigation that parents and teachers could recommend.

By @emollick
12 Twitter/X 2026-05-12 2 min read

Aakash Gupta: the most expensive PM artifact is a 30-page PRD for a feature that…

Why it matters

Aakash Gupta: the most expensive PM artifact is a 30-page PRD for a feature that would never ship; his recommended fix is three validation lines at the top of a Claude skill—"Problem statement clear? Target user identified? Evidence the user wants this?"—and if two of three are missing, the skill refuses and names what's missing.

  • In 75 tests across 25 Claude skills (15 daily PM skills + 10 edge-case skills) Gupta found two failure modes: Claude routes on only a skill's name+description (37 characters is often insufficient), and it takes the shortest path, skipping later self-review steps; a table mapping "What Claude might think | Why it's wrong" and top-of-skill gates reduce these failures.
  • Gupta published 10 laws and two installable skills: /improve-skill (generates test prompts, diagnoses where outputs break, rewrites the highest-leverage problem) and /create-skill (scaffolds new skills with the 10 laws); he says PMs who stopped pasting an 800-word prompt began pulling ahead six months ago.

Aakash Gupta (tweeted 2026-05-12) argues that bloated PM artifacts—exemplified by a 30-page PRD that never ships—result from missing upfront checks in generator skills. His solution: simple three-line gates at the top of Claude skills, paired with a table that anticipates rationalizations. He validated this in 75 tests over 25 skills and publishes 10 laws plus two actionable skills (/improve-skill, /create-skill).
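
The top-of-skill gate described above has a simple threshold: refuse when two of the three checks are missing, and name the gaps. A real skill would state this in prose instructions. The Python sketch below only pins down that threshold logic, and the dictionary keys and function name are assumptions made for illustration.

```python
GATES = {
    "problem": "Problem statement clear?",
    "user": "Target user identified?",
    "evidence": "Evidence the user wants this?",
}


def gate(answers: dict[str, bool]) -> tuple[bool, list[str]]:
    """Return (proceed, missing_checks) for a PRD request.

    The request is refused when two or more of the three checks fail,
    and the failed checks are returned so the refusal can name them.
    """
    missing = [question for key, question in GATES.items()
               if not answers.get(key, False)]
    return len(missing) < 2, missing


proceed, missing = gate({"problem": True, "user": False, "evidence": False})
if not proceed:
    print("Refusing to draft the PRD. Missing:", "; ".join(missing))
```

Putting the check first matters because, per the second failure mode in the item, later self-review steps tend to be skipped.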

By @aakashgupta
13 substack.com 2026-05-12 2 min read

Consumer Investors Say Their Category Isn’t Dead. It’s Different.

Why it matters

On May 12, 2026 Eric Newcomer reported that consumer-focused VCs—including Lightspeed partner Faraz Fatemi and dinner attendees Kirsten Green (Forerunner), Saar Gur (CRV), Ivan Zhou (Accel), and Mike Duboe (Greylock)—met at a Menlo Ventures-hosted gathering and signaled the consumer category is “different,” not dead.

  • VCs are explicitly broadening the definition of consumer investing to include prosumer AI products, notably coding assistants, shifting emphasis from pure consumer plays to AI-enabled productivity tools.
  • The piece appears on Newcomer (Eric Newcomer) on Substack and is behind a paywall; metadata shows it was created 2026-03-11, last updated 2026-04-04, and published May 12, 2026.

Consumer investors are recalibrating. Eric Newcomer (Substack) reports on May 12, 2026 that a dinner hosted by Menlo Ventures, with VCs including Faraz Fatemi, Kirsten Green, Saar Gur, Ivan Zhou, and Mike Duboe, reflected a shift: rather than being dead, the consumer category is evolving to encompass prosumer AI, including coding assistants and other productivity-focused AI tools. The full article is paywalled.

By Newcomer
14 wordpress.com 2026-05-12 4 min read

Signing off in a world of what’s next

Why it matters

On May 12, 2026 Om Malik reflects on Pete Larson of Just a Few Acres Farm ending his ~6‑year YouTube run; Pete and his wife, both in their 50s, announced they will “find a new way of doing things” and said, “It occurred to both of us that we didn't need to know.”

  • Larson previously spent about 20 years as an architect before quitting to run a small cattle farm and building a channel with reportedly hundreds of thousands of followers and a revenue stream — yet he chose to stop producing videos.
  • In his final video Pete apologized to viewers with two words, “I'm sorry,” acknowledging the weight of leaving a steady rhythm his audience relied on; Malik likens the loss to a close friend moving away and calls such channels an antidote to the noisy announcement economy.
  • Malik contrasts Pete’s quiet exit with the “ugliness” of the ongoing OpenAI lawsuit and the spectacle of tech; writing from San Francisco on May 12, 2026, he admits envy and contemplates stepping away from tech for a more meaningful life.

Om Malik reflects on Pete Larson’s decision to stop making videos for his Just a Few Acres Farm YouTube channel, a project Larson ran for roughly six years after leaving a 20‑year architecture career to become a small cattle farmer. Malik notes Pete and his wife — both in their 50s — intend to “find a new way of doing things” and accepted not knowing the next step. Despite building a channel with hundreds of thousands of followers and a revenue stream, Larson chose to walk away; in his final clip he said two words, “I’m sorry,” recognizing how much viewers relied on that steady rhythm. Malik frames the exit as a quiet, characterful counterpoint to the spectacle of technology (citing the OpenAI lawsuit) and expresses personal envy and a longing for a less rancorous life.

By On my Om
15 ArXiv 2026-05-11 1 min read

When Can Digital Personas Reliably Approximate Human Survey Findings?

Why it matters

Using the LISS panel, the authors constructed digital personas from respondents' background variables and pre-2023 survey histories and tested them against the same respondents' held-out post-cutoff answers across four persona architectures, three LLMs, and two prediction tasks.

  • Personas improved alignment with human response distributions, especially for questions tied to stable attributes and values, and retrieval-augmented architectures produced the clearest gains. However, personas performed poorly at individual-level prediction, failed to recover multivariate respondent structure, and did worst on subjective, heterogeneous, or rare responses.

LLM-based digital personas were evaluated as substitutes for human survey respondents using the LISS panel: personas were built from background variables and pre-2023 survey histories and tested on held-out post-cutoff answers. Across four persona architectures, three LLMs, and two prediction tasks, personas improved distributional alignment (notably for stable attributes) but struggled with individual prediction and multivariate respondent structure; retrieval augmentation helped. Summary is based on the abstract (full paper not reviewed).

Authors: Mumin Jia, Yilin Chen, Divya Sharma...
16 producthabits.com 2026-05-12 1 min read

Half of you got a test email yesterday

Why it matters

In a 2026-05-12 email, Hiten Shah (Product Habits) disclosed that the previous day's send was an A/B test: half of recipients received one version, half another, and some readers identified one variant as “AI-written.”

  • Shah says click rates have fallen over the past year, suspects readers detect empty/AI-like copy, and is running a live Reddit AMA (https://www.reddit.com/r/crazyegg/comments/1tajl4t/) to answer questions and test more human writing.

Hiten Shah (Product Habits) told subscribers on 2026-05-12 that the previous day's email was an A/B test: half saw one version and half another, with some calling one variant AI-sounding. Facing a year-long drop in click rates, Shah suspects readers detect “empty” AI-style prose and is running a live Reddit AMA to answer questions and test more human writing.

By Hiten Shah
17 substack.com 2026-05-12 3 min read

The Industry // 10 Crypto Heavyweights Explain the Quantum Risk to Bitcoin

Why it matters

Google published a paper on March 31, 2026, estimating a theoretical quantum setup could derive a Bitcoin private key from an exposed public key in about nine minutes—close to Bitcoin's average block time of ~10 minutes.

  • The paper reduces the previously believed hardware requirement from 'millions of qubits' to roughly 500,000 qubits (about 1/20th of prior estimates), making a Shor-algorithm–based attack materially more plausible.
  • By contrast, recovering a private key with classical computing resources would take >1 billion years; Shor's algorithm (discovered by Peter Shor in 1994) on a sufficient quantum machine would shorten that to minutes, putting early/exposed wallets (including addresses tied to Satoshi) and potentially millions of BTC at risk.
  • The May 12, 2026 Pirate Wires piece by Ryan Hassan collects reactions from ten crypto and quantum experts about 'Q-Day' risks and the likelihood of chaotic forking or protocol responses.

Pirate Wires (Ryan Hassan, May 12, 2026) examines how a March 31 Google paper re‑shapes the quantum threat to Bitcoin by presenting a theoretical quantum‑computer design that could extract a private key from an exposed public key in ~9 minutes—nearly matching Bitcoin's ~10‑minute average block interval. The paper's resource estimate (~500,000 qubits) is roughly 1/20th of earlier forecasts that assumed millions of qubits, making a Shor‑algorithm attack (Shor, 1994) significantly more feasible. The article explains the quantum advantage (superposition and parallelism), contrasts the >1 billion‑year effort required by classical means, and warns that wallets with exposed public keys—including early Satoshi‑era addresses—could be immediately vulnerable once such hardware exists. It compiles views from ten crypto/quantum experts on timelines, mitigation paths (post‑quantum upgrades, key hygiene), and the risk of market and consensus disruption around 'Q‑Day.'
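
One way to read the nine-minute versus ten-minute comparison is as a race against the next block. The sketch below assumes block arrivals follow an exponential distribution with a 10-minute mean (a common simplification of Bitcoin mining) and that the targeted transaction would confirm in the very next block. Neither assumption is taken from the article, and the function name is illustrative.

```python
import math

MEAN_BLOCK_MINUTES = 10.0  # average block interval cited above
ATTACK_MINUTES = 9.0       # key-derivation time cited above


def prob_no_block_within(minutes: float, mean: float = MEAN_BLOCK_MINUTES) -> float:
    """P(no block is found within `minutes`) under the exponential model."""
    return math.exp(-minutes / mean)


p_open = prob_no_block_within(ATTACK_MINUTES)
print(f"Chance the next block has not arrived after "
      f"{ATTACK_MINUTES:.0f} minutes: {p_open:.1%}")
```

Under these assumptions the window is still open after nine minutes a substantial fraction of the time, which is why an attack time near the block interval is treated as a meaningful threshold.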

By Pirate Wires
18 Twitter/X 2026-05-12 1 min read

@hasantoxr ran a 10+ back-and-forth research workflow on 2026-05-12 and reports…

Why it matters

@hasantoxr ran a 10+ back-and-forth research workflow on 2026-05-12 and reports OpenSquilla's content-aware routing sent simple prompts to cheaper models and complex prompts to full‑power models, producing a token bill ~67% lower than his usual agent while keeping memory sharp and outputs clean.

  • OpenSquilla (open-source, Python, local‑first) implements content-aware model routing, memory consolidation, and adaptive token compression; its public benchmark claims 60–80% lower model cost on mixed long‑running tasks.
  • #10MTokenChallenge: OpenSquilla is offering 30 winners × 10M OpenRouter credits across three categories — 10 Faithful Reproduction, 10 Best Savings Case, 10 Quality Bug Report — to verify claimed savings.

OpenSquilla: @hasantoxr ran a 10+ turn research workflow (published 2026-05-12) and says content-aware routing sent simple prompts to cheaper models and complex ones to full‑power models, cutting his token bill by ~67% versus his usual agent while preserving memory and output quality. OpenSquilla (open-source, Python, local‑first) claims 60–80% cost reduction and launched the #10MTokenChallenge (30 winners × 10M OpenRouter credits).
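
The general idea of content-aware routing can be sketched in a few lines. This is not OpenSquilla's API or its routing logic, which the post does not describe. The model names, prices, keyword list, and length threshold below are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_1k_tokens: float  # made-up price, for illustration only


CHEAP = Model("small-model", 0.10)
FULL = Model("large-model", 1.00)

# Crude stand-in for a real complexity classifier.
COMPLEX_HINTS = ("prove", "analyze", "compare", "refactor", "step by step")


def pick_model(prompt: str, long_prompt_words: int = 150) -> Model:
    """Send long or analysis-heavy prompts to FULL, everything else to CHEAP."""
    text = prompt.lower()
    looks_complex = (
        len(text.split()) > long_prompt_words
        or any(hint in text for hint in COMPLEX_HINTS)
    )
    return FULL if looks_complex else CHEAP


def estimate_cost(prompts, tokens_per_prompt: int = 1000) -> float:
    """Total cost if every prompt used `tokens_per_prompt` tokens."""
    return sum(pick_model(p).cost_per_1k_tokens * tokens_per_prompt / 1000
               for p in prompts)


session = [
    "What is the capital of France?",
    "Compare these two designs and analyze the trade-offs.",
]
for prompt in session:
    print(pick_model(prompt).name, "<-", prompt)
```

Savings of the kind reported depend on how much of a session is simple enough for the cheaper tier, which is presumably what the reproduction challenge is meant to test.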

By @hasantoxr
19 ArXiv 2026-05-11 1 min read

PriorVLA: Prior-Preserving Adaptation for Vision-Language-Action Models

Why it matters

PriorVLA is a prior-preserving adaptation framework that keeps a frozen Prior Expert and trains an Adaptation Expert using Expert Queries to integrate pretrained scene and motor priors, while updating only 25% of the parameters compared to full fine-tuning.

  • On benchmarks, PriorVLA outperforms full fine-tuning and SOTA VLA baselines: it improves over pi0.5 by 11 points on RoboTwin 2.0-Hard and achieves 99.1% average success on LIBERO.
  • In real-world evaluation across eight tasks and two embodiments, PriorVLA attains 81% in-distribution (ID) and 57% out-of-distribution (OOD) success with standard data; with 10 demonstrations per task it reaches 48% ID and 32% OOD, surpassing pi0.5 by 24 and 22 points respectively.

PriorVLA introduces a prior-preserving adaptation method for Vision-Language-Action models that freezes a Prior Expert and trains an Adaptation Expert, using Expert Queries to inject pretrained scene and motor priors. By updating only 25% of parameters, it outperforms full fine-tuning and SOTA baselines on RoboTwin 2.0, LIBERO, and eight real-world tasks, with strong OOD and few-shot gains.

Authors: Xinyu Guo, Bin Xie, Wei Chai...
20 ArXiv 2026-05-11 1 min read

Optimal and Scalable MAPF via Multi-Marginal Optimal Transport and Schrödinger Bridges

Why it matters

Usman A. Khan and Joseph W. Durham (ArXiv 2026-05-11; accepted ICML 2026 spotlight) recast anonymous multi-agent path finding (MAPF) on finite graphs as a Markovian multi-marginal optimal transport (MMOT) problem whose exponentially large MMOT collapses to a linear program (LP) of polynomial size.

  • They prove the LP is feasible and totally unimodular under stated conditions, yielding min-cost integral {0,1} transports that provably avoid space–time overlaps (no collisions).
  • For scalability, they formulate a Schrödinger bridge (entropic regularization) solved by Sinkhorn-type iterations; the resulting fractional transport guides a reduced LP that produces near-optimal integral solutions with significantly lower complexity, validated by extensive experiments.

The paper addresses anonymous MAPF by formulating it as a Markovian MMOT that exactly reduces an exponentially large problem to a polynomial-size LP, with proven feasibility and total unimodularity yielding collision-free integral {0,1} transports. For large-scale instances the authors use a Schrödinger-bridge entropic regularization solved via Sinkhorn iterations to produce fractional templates used in a reduced LP, trading negligible optimality loss for major computational savings; experimental results demonstrate strong optimality and scalability. Accepted to ICML 2026 as a spotlight paper.
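
For readers unfamiliar with Sinkhorn iterations, the sketch below shows the standard two-marginal version for entropic-regularized optimal transport. It is a simplified stand-in, not the paper's multi-marginal Schrödinger-bridge solver, and the toy cost matrix, regularization strength, and iteration count are arbitrary choices for illustration.

```python
import math


def sinkhorn(cost, a, b, eps=0.1, iters=200):
    """Entropic OT between marginals a and b via Sinkhorn scaling.

    Returns a transport plan P with P[i][j] = u[i] * K[i][j] * v[j],
    where K = exp(-cost / eps).
    """
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy instance: two interchangeable agents, two goals, unit mass each.
cost = [[0.0, 1.0],
        [1.0, 0.0]]
plan = sinkhorn(cost, a=[1.0, 1.0], b=[1.0, 1.0])
for row in plan:
    print([round(x, 4) for x in row])
```

The resulting plan is fractional but concentrates on the cheap assignments. In the paper's pipeline, a fractional solution of this kind guides a reduced LP that recovers an integral solution, a step this sketch does not attempt.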

Authors: Usman A. Khan, Joseph W. Durham
21 ArXiv 2026-05-11 1 min read

RoboMemArena: A Comprehensive and Challenging Robotic Memory Benchmark

Why it matters

RoboMemArena is a large-scale robotic memory benchmark (ArXiv preprint 2026-05-11) that contains 26 tasks with average trajectory lengths exceeding 1,000 steps and 68.9% of subtasks labeled as memory-dependent; it supplies multimodal memory annotations (subtask instructions, native keyframe annotations), a VLM-driven generation pipeline using atomic functions, and paired real-world tasks (project: https://robomemarena.github.io).

  • PrediMem is a proposed dual-system vision–language agent in which a high-level VLM planner manages a memory bank with recent and keyframe buffers and a predictive coding head; extensive experiments on RoboMemArena show PrediMem outperforms all baselines and provides empirical insights into memory management, model architecture choices, and scaling laws.

RoboMemArena is a large-scale benchmark addressing robotic memory shortcomings by offering 26 long-horizon tasks (avg >1,000 steps) with multimodal memory annotations and paired real-world evaluations; a VLM-based pipeline composes subtasks and generates trajectories via atomic functions. The authors introduce PrediMem, a dual-system VLA with recent/keyframe memory buffers and a predictive coding head that outperforms baselines on this benchmark.

Authors: Huashuo Lei, Wenxuan Song, Huarui Zhang...
22 ArXiv 2026-05-11 1 min read

Unified Noise Steering for Efficient Human-Guided VLA Adaptation

Why it matters

UniSteer unifies human corrective actions with noise-space RL by approximately inverting a frozen flow‑matching decoder to map corrective actions to noise targets, providing supervised guidance for a lightweight noise‑predicting actor while that actor is simultaneously optimized with RL.

  • On four real‑world manipulation adaptation tasks, UniSteer raised success rates from 20% to 90% on average in 66 minutes and outperformed strong noise‑space RL and action‑space human‑in‑the‑loop baselines.
  • Paper by Junjie Lu et al., posted to arXiv (cs.RO) on 2026-05-11; summary based on the abstract (full text not available here).

UniSteer tackles efficient real‑world adaptation of diffusion-based vision‑language‑action models by combining human corrective interventions with noise‑space RL. The method inverts a frozen flow‑matching decoder to convert human actions into supervision for a noise‑predicting actor, which is concurrently improved with RL. Real‑world tests on four manipulation tasks show a jump from 20% to 90% success in 66 minutes on average. The full paper was not available; this summary is based on the abstract.

Authors: Junjie Lu, Xinyao Qin, Yuhua Jiang...
23 ArXiv 2026-05-11 1 min read

Counterfactual Stress Testing for Image Classification Models

Why it matters

Moritz Stammel et al. (arXiv 2026-05-11) propose a counterfactual stress testing framework that uses causal generative models to intervene on attributes (e.g., scanner type, patient sex) and synthesize anatomically-preserving “what-if” images for targeted distribution shifts.

  • Evaluated on chest X‑ray and mammography across three model architectures and multiple shift scenarios, the counterfactual stress tests provide a substantially more accurate proxy for real out-of-distribution performance than classical perturbations, capturing direction, relative magnitude of performance changes, and model ranking.

Counterfactual stress testing for medical image classifiers uses causal generative models to create realistic, anatomically-preserving “what-if” images by intervening on attributes like scanner type and patient sex. Evaluated on chest X‑ray and mammography across three architectures and several shift scenarios (Stammel et al., 2026), the method more reliably predicts real OOD performance than simple brightness/contrast perturbations, improving robustness assessment prior to deployment.

Authors: Moritz Stammel, Fabio De Sousa Ribeiro, Raghav Mehta...
24 e.economist.com 2026-05-12 7 min read

Off the Charts: Let’s keep things in perspective

Why it matters

The Strait of Hormuz had been “all but shut” for more than two months as of the May 12, 2026 newsletter, causing one of the largest energy‑supply shocks in history and driving up US petrol prices while disrupting shipments of other goods such as plastics and pistachios.

  • New SIPRI data — reported in the newsletter and analysed by Sondre Solstad — show that, when adjusted for spending power, America’s allies now outspend the United States on defence for the first time since 2001.
  • Britain’s new Rent Act came into force on May 1, 2026; the newsletter (citing James Fransham) warns the law aims to reduce insecurity but could have unintended consequences for landlords and tenants.
  • Visual‑data journalist Rosamund Pearce outlines projection choices for illustrations: oblique (keep one face undistorted), isometric (equal x,y,z axes — used for Chernobyl’s fourth reactor and Hong Kong flat diagrams), dimetric/trimetric variants, and perspective (naturalistic, with vanishing points; use Blender or three.js to switch camera from orthographic to perspective); she recommends 2D for scale and heavy labelling because projected text can be harder to read.

Rosamund Pearce’s May 12, 2026 Off the Charts newsletter links two strands: hard news visualised and guidance on technical diagramming. On the news side, the piece notes the Strait of Hormuz has been effectively closed for over two months, producing a major energy‑supply shock that has raised US petrol prices and disrupted trade in items from plastics to pistachios; it also flags SIPRI data (reported by Sondre Solstad) that, on a purchasing‑power basis, America’s allies now outspend the US on defence for the first time since 2001, and reminds readers Britain’s Rent Act began on May 1, 2026 with possible unintended effects. On the craft side, Pearce explains projection choices — oblique, isometric (equal axes), dimetric/trimetric and perspective — gives concrete examples (Chernobyl reactor, Hong Kong flats, geothermal schematic), and advises when to use 2D versus 3D tools (Blender, three.js) and why projecting text often reduces readability.
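
The difference between the isometric and perspective projections described above comes down to two small formulas. The sketch below is a generic illustration rather than anything from the newsletter: the axis conventions (z up, camera above the scene for the isometric view, camera looking along +z for the perspective view) and the function names are assumptions.

```python
import math

COS30 = math.cos(math.radians(30))
SIN30 = math.sin(math.radians(30))


def isometric(x: float, y: float, z: float) -> tuple[float, float]:
    """Isometric projection: x, y and z axes are all drawn at the same scale."""
    return (x - y) * COS30, z - (x + y) * SIN30


def perspective(x: float, y: float, z: float,
                focal_length: float = 1.0) -> tuple[float, float]:
    """Pinhole perspective: apparent size shrinks with distance z."""
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return focal_length * x / z, focal_length * y / z


# Each unit axis has the same on-screen length in the isometric view.
for axis in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
    print(axis, "->", round(math.hypot(*isometric(*axis)), 6))

# In perspective, the same offset looks smaller when it is farther away.
print(perspective(1, 1, 2), perspective(1, 1, 4))
```

Because the isometric view has no vanishing point, measurements along the axes stay comparable across the whole diagram, which suits cutaways and floor plans. Perspective trades that comparability for a more natural look.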

By The Economist
25 Twitter/X 2026-05-12 2 min read

xAI's Grok Voice Think Fast 1.0 leads the τ‑Voice agentic benchmark with a 52.1%…

Why it matters

xAI's Grok Voice Think Fast 1.0 leads the τ‑Voice agentic benchmark with a 52.1% task-success rate and averages 5.6 minutes per conversation (second-longest duration).

  • OpenAI's GPT‑Realtime‑2 (High) scored 39.8% (3.0 min) and GPT‑Realtime‑1.5 scored 38.8% (4.8 min); Google Gemini 3.1 Flash Live Preview - High scored 37.7% (3.8 min).
  • τ‑Voice (extending τ²‑bench) evaluates multi-turn S2S instruction-following and tool use with an LLM-driven simulated user and realistic audio (accents, background noise, packet loss) across Airline (50), Retail (114) and Telecom (114) scenarios; even top S2S models resolve only about half of end-to-end customer-service cases and performance varies by audio condition and conversation length.

Artificial Analysis' τ‑Voice benchmark extends τ²‑bench into speech-to-speech agent evaluation using an LLM-driven simulated customer and realistic audio (accents, noise, packet loss) across Airline (50), Retail (114) and Telecom (114) tasks. xAI's Grok Voice Think Fast 1.0 tops the ranking at 52.1% (5.6 min); OpenAI and Gemini models follow in the high‑30% range, so even the top S2S models resolve only about half of realistic customer-service scenarios.

By @elonmusk
26 substack.com 2026-05-12 9 min read

Towards a New Aesthetic

Why it matters

Megan Gafford announced on May 12, 2026 that she was awarded a New Aesthetics grant (responding to Patrick Collison and Tyler Cowen’s “A Call for New Aesthetics” from the end of 2025) to invent a new ornamental architectural style using drawing as her primary method.

  • Her planned methodology is explicit: travel to cities with Art Deco and Brutalist architecture, make in‑person sketches (drawing from life rather than photos), and publish sketches, research, and observations on her newsletter Fashionably Late Takes.
  • Gafford argues drawing has historically shaped architectural movements—citing Le Corbusier, Gordon Bunshaft, Paul Rudolph—and plans to use that persuasive power to 'resurrect' ornament, building on earlier essays ('America was supposed to be Art Deco,' Sept 10, 2024; 'Beauty Under the Cover of Darkness,' Jan 14, 2025).
  • She criticizes contemporary debates (Noah Smith, Matthew Yglesias) for framing choices only as modernism vs European traditionalism and urges mining American precedents (e.g., American Art Deco) for new, non‑pastiche aesthetics.

Megan Gafford’s May 12, 2026 essay 'Towards a New Aesthetic' announces that she won a New Aesthetics grant—prompted by Patrick Collison and Tyler Cowen’s end‑of‑2025 Call—to pursue a project to invent a new ornamental architectural style through drawing. Her approach is practical and methodological: she will travel to cities notable for Art Deco and Brutalist buildings, make life sketches (not photo studies), and pair drawing studies with archival and observational research published on Fashionably Late Takes. Gafford frames drawing as a historically decisive tool—arguing Le Corbusier’s and Paul Rudolph’s drawings shaped modernism despite uneven built results—and intends to use prolonged, on‑site observation (she invokes a six‑month moon‑drawing project as precedent) to synthesize past motifs into novel ornament. She also critiques public discourse (Noah Smith, Matthew Yglesias) for limiting the design conversation to modernism versus European pastiche, urging exploration of American aesthetic resources instead.

By Megan Gafford from Fashionably Late Takes
27 Twitter/X 2026-05-12 1 min read
Open

Transcript excerpt from NIK (@ns123abc) captures the opening of Sam Altman's…

Why it matters

Transcript excerpt from NIK (@ns123abc) captures the opening of Sam Altman's cross-examination on 2026-05-12: Musk's lawyer repeatedly pressed Altman on honesty ('Are you completely trustworthy?', 'Do you always tell the truth?'), and Altman hedged and ultimately said 'I'll just amend my answer to yes.'

  • Gary Marcus reposted the excerpt with the caption 'speaks for itself,' signaling skepticism; original post link: https://nitter.net/ns123abc/status/2054274686180114816#m

Sam Altman's cross‑examination transcript excerpt (posted 2026-05-12) shows Musk's lawyer repeatedly pressing him about honesty and trustworthiness—Altman hedged several times, eventually amending an answer to 'yes.' Gary Marcus reshared the exchange with the one‑line caption 'speaks for itself,' framing the testimony as damning or self‑evident.

By @GaryMarcus
28 Twitter/X 2026-05-12 1 min read
Open

@emollick reported on 2026-05-12 that ChatGPT's Study Mode was silently removed…

Why it matters

@emollick reported on 2026-05-12 that ChatGPT's Study Mode was silently removed for most accounts, while competing models Claude (Anthropic) and Gemini (Google) still offer their own study/tutor modes.

  • He argues that assistant-style AI can harm learning by giving answers that create an illusion of mastery, and that Study Mode was an easy preset to recommend to parents and teachers to mitigate this. OpenAI still hosts a Study Mode page, and its link can activate the mode even though the menu option is missing for most users.

@emollick reports that ChatGPT’s Study Mode was silently removed for most accounts (reported 2026-05-12), while Claude and Gemini still offer similar options. He argues Study Mode provided a simple, recommendable setting for parents and teachers to reduce learning harms from assistant-style answers and notes OpenAI’s Study Mode page/link remains online.

By @emollick
29 Twitter/X 2026-05-12 2 min read
Open

$32,604 is the 2026 inflation-adjusted equivalent of Sears' 1910 $938 kit-house…

Why it matters

$32,604 is the 2026 inflation-adjusted equivalent of Sears' 1910 $938 kit-house price; the author asserts that amount "doesn't buy you a Sears house" in 2026.

  • California's average impact fee for a new single-family home is $37,471 (2024 California YIMBY), triple the national average; Fremont's single-family impact fee reached $157,000 (Terner Center 2018). $32,604 covers ~87% of the CA average fee and ~21% of Fremont's fee.
  • Permitting and code barriers now exceed the original kit model: Los Angeles permit packages cost $18k–$30k, San Francisco $25k–$40k; typical R‑1 minimum lots are 5,000–20,000 sq ft with minimum dwelling sizes 800–1,000+ sq ft (Sears models ~600 sq ft would be illegal); shared chimney flues, owner-assembled construction, pre-cut lumber grading/stamping, and 90-day assembly windows violate modern codes or licensing, while permitting cycles run 12–24 months.

Aakash Gupta argues $32,604 (the 2026 equivalent of Sears' 1910 $938 kit) no longer buys a Sears kit home: California's average impact fee is $37,471 and Fremont's hit $157,000, while Los Angeles and San Francisco charge $18k–$40k in permit fees. Modern lot-size rules, fire and contractor codes, grading standards and 12–24 month permitting make owner-assembled kit homes effectively infeasible today.
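The fee-coverage percentages quoted above can be checked with a few lines of arithmetic. This is a minimal sketch using only the dollar figures from the item; the variable and function names are illustrative.

```python
# Dollar figures quoted in the item above; names are illustrative.
SEARS_KIT_2026_DOLLARS = 32_604   # 1910 kit price of $938, inflation-adjusted
CA_AVG_IMPACT_FEE = 37_471        # California average, single-family home
FREMONT_IMPACT_FEE = 157_000      # Fremont single-family impact fee


def share_covered(budget: float, fee: float) -> float:
    """Return the fraction of `fee` that `budget` would pay for."""
    return budget / fee


ca_share = share_covered(SEARS_KIT_2026_DOLLARS, CA_AVG_IMPACT_FEE)
fremont_share = share_covered(SEARS_KIT_2026_DOLLARS, FREMONT_IMPACT_FEE)

print(f"California average fee covered: {ca_share:.0%}")
print(f"Fremont fee covered: {fremont_share:.0%}")
```

Both ratios round to the ~87% and ~21% figures cited in the thread.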

By @aakashgupta
30 e.economist.com 2026-05-11 8 min read
Open

Café Europa: The two Armenias

Why it matters

On May 4-5, 2026 Yerevan hosted the European Political Community meeting, Armenia’s first bilateral summit with the EU and the inaugural Yerevan Dialogue; French president Emmanuel Macron publicly praised Prime Minister Nikol Pashinyan, boosting his pro-European messaging ahead of the June 7, 2026 election.

  • Nikol Pashinyan’s Civil Contract leads opposition Strong Armenia by more than 15 percentage points in polls ahead of the June 7 vote; by contrast Gagik Tsarukyan’s Prosperous Armenia polled just 3% in February.
  • Prominent pro-Russian figures include oligarch Gagik Tsarukyan, who is funding a massive Jesus statue in Zovuni despite objections from the Armenian Apostolic Church, and Samvel Karapetyan, leader of Strong Armenia, currently under house arrest and campaigning for closer ties with Russia.
  • Armenia is moving toward a negotiated peace with Azerbaijan after a series of military defeats since 2020, but remains economically dependent on Russia; Pashinyan faces the near-term task of expanding trade with the EU and Turkey and undertaking deep institutional reforms inherited from prior strongman rule.

Fraser McIlwraith reports that Armenia is visibly pivoting toward the West under Prime Minister Nikol Pashinyan but remains politically and socially divided. High-profile events on May 4–5, 2026 — an EPC meeting, Armenia’s first EU bilateral summit and the Yerevan Dialogue where Emmanuel Macron praised Pashinyan — acted as an unofficial pre-election boost ahead of the June 7 vote, with Civil Contract leading Strong Armenia by over 15 percentage points. Yet pro-Russian currents persist: oligarch Gagik Tsarukyan is erecting a giant Jesus statue in Zovuni and leads a minor, Russia-friendly party (3% in a February poll), while oligarch Samvel Karapetyan, under house arrest, urges closer ties with Moscow. After crushing defeats since 2020 Armenia is negotiating peace with Azerbaijan, but deep economic dependence on Russia and decayed institutions mean Pashinyan must pursue gradual trade diversification and reforms to consolidate a pro-European trajectory.

By The Economist
31 producthabits.com 2026-05-11 3 min read
Open

AI broke the defaults

Why it matters

An Internet Archive–based study cited by Hiten Shah estimates that by mid-2025 about 35% of newly published websites were AI-generated or AI-assisted, producing semantic contraction and a positivity shift in web content.

  • Researchers found roughly 380,000 publicly accessible 'vibe-coded' apps, with about 5,000 leaking sensitive data via simple URL access—highlighting governance gaps as many production apps ship outside engineering/security review.
  • Chicago Booth researchers tested four AI-text detectors on roughly 2,000 passages and report that commercial detectors perform well on medium and long text but break down for passages under ~50 words; Shah recommends using a policy cap (acceptable false positive rate) when evaluating detectors.
  • Shah highlights organizational impacts: an agent is usefully defined as something that 'runs tools in a loop toward a goal'; Copilot-style tools create private, hard-to-share productivity; Microsoft says agents free humans to direct work, but only 13% of workers say they are rewarded for reinvention; and Clay’s advantage stems from role plasticity.

Hiten Shah's newsletter "AI broke the defaults" (May 11, 2026) synthesizes recent research and practitioner notes showing how cheap AI creation is changing defaults across software, content, and organizations. Key findings include an Internet Archive sample estimating ~35% of new sites by mid‑2025 were AI‑generated/assisted (with semantic contraction and a positivity bias), Chicago Booth tests of four detectors on ~2,000 passages that fail for snippets under ~50 words, and a study finding ~380,000 vibe‑coded apps publicly reachable with ~5,000 leaking sensitive data via simple URLs. Shah frames agents as tool‑loop executors, warns Copilot‑style usage creates private learning trapped in loops, and cites Microsoft and survey data (13% rewarded for reinvention) plus Clay’s role‑plasticity as structural advantages in the AI era.

By Hiten Shah
32 substack.com 2026-05-12 18 min read
Open

Childhood and Education #18: Do The Math

Why it matters

Zvi Mowshowitz (Don't Worry About the Vase, May 12, 2026) summarizes and amplifies critiques of contemporary math education research—highlighting Kelsey Piper’s reporting on Stanford education professor Jo Boaler and other systemic failures in study design and transparency.

  • Critiques of Boaler’s work include non-disclosure of the school, inappropriate comparisons (e.g., top two quartiles at one school vs. middle quartiles elsewhere), tests that were 2–3 years below grade level, misgrading, no predictive validity for SAT scores, and likely practice effects on pre/post measures.
  • At UC San Diego (report cited), remedial Math 2 grew from 32 students in fall 2020 to about 1,000 in fall 2025 (≈12% of students); one-quarter of Math 2 students missed simple arithmetic (example: 7+2 = []+6), 42% of students who scored below middle‑school math reported having completed precalc or calculus in high school, and >25% had a 4.0 math GPA (average 3.7).
  • Academic outcomes: from 2017 to 2023, students coming from remedial tracks had a 24% D/F/W rate in Calculus 10A and a 30% D/F/W rate in 10B; similar remediation needs have been reported at elite schools (Harvard added remedial support).

Zvi Mowshowitz argues that contemporary math education is failing because weak research, grade inflation, and policy choices have removed objective reality checks. He amplifies Kelsey Piper’s exposé of Jo Boaler’s influential work, listing concrete methodological failures: undisclosed sites, inappropriate comparison groups (top quartiles vs. middle quartiles), assessments two to three years below grade level, incorrect grading, lack of predictive validity with standardized tests, and likely pre/post practice effects on short interventions. Those research flaws, he contends, fed policy choices that removed rigorous admissions signals (e.g., UC removal of SAT/ACT in 2020) and allowed students to accumulate transcripts that don’t reflect actual skill.

Mowshowitz cites UC San Diego data showing remedial Math 2 enrollments climbed from 32 (fall 2020) to ~1,000 (fall 2025), about 12% of students; 42% of students who scored below middle‑school levels reported completing precalc or calculus in high school, and over a quarter had a 4.0 math GPA (avg. 3.7). From 2017 to 2023, such students had a 24% D/F/W rate in Calculus 10A and a 30% D/F/W rate in 10B. He links these patterns to wider phenomena (grade‑scale inflation, dubious classroom grading practices, and experimental curricula such as novel long‑division methods) and urges restoring objective testing, stronger accountability for grade/reporting accuracy, and an end to “cargo‑cult equity” that masks skill deficits.

By Zvi Mowshowitz from Don't Worry About the Vase
33 Twitter/X 2026-05-12 1 min read
Open

@0xd1namit says they are building their 'tweets markets' strategy using…

Why it matters

@0xd1namit says they are building their 'tweets markets' strategy using cryptovcdegen's six-step Polymarket plan and warns that skipping long, hard tests will lead to losses and blaming Polymarket.

  • cryptovcdegen's exact steps: 1) Read Rules across 10 weather markets; 2) Build a data layer fetching ensemble forecasts and CLOB prices; 3) Run 30 days of history and compute Brier score per city; 4) Add signal layer (bias_correction, entry window, threshold, bin separation); 5) Paper trade ≥2 weeks and log everything; 6) Risk real money only on 2–3 cities with confirmed Brier scores.

Author @0xd1namit adopts cryptovcdegen's six-step Polymarket weather-bot plan, emphasizing staged validation over jumping to live trading. The workflow mandates reading the Rules on 10 markets, building a data layer (ensemble forecasts + CLOB prices), 30 days of historical Brier-score checks, a bias-corrected signal layer, ≥2 weeks of paper trading, then live risk on 2–3 validated cities.
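Step 3 of the plan (a per-city Brier score over historical forecasts) can be sketched as follows. This is a minimal illustration, assuming each market bin is treated as a yes/no event with a forecast probability; the record layout, the city labels, the sample data, and the `MAX_BRIER` cut-off are all hypothetical and do not come from the quoted thread.

```python
from collections import defaultdict


def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equal-length, non-empty forecasts and outcomes")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def brier_by_city(records):
    """Group (city, probability, outcome) records and score each city."""
    probs, hits = defaultdict(list), defaultdict(list)
    for city, probability, outcome in records:
        probs[city].append(probability)
        hits[city].append(outcome)
    return {city: brier_score(probs[city], hits[city]) for city in probs}


# Hypothetical history: (city, forecast probability of "yes", resolved outcome).
history = [
    ("City A", 0.8, 1),
    ("City A", 0.3, 0),
    ("City B", 0.6, 0),
    ("City B", 0.5, 1),
]

MAX_BRIER = 0.2  # hypothetical cut-off, not part of the quoted plan

scores = brier_by_city(history)
tradable = sorted(city for city, score in scores.items() if score <= MAX_BRIER)
print(scores, tradable)
```

Lower is better: a forecaster who always says 0.5 scores 0.25, so a city whose score sits near or above that level carries no demonstrated edge.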

By @0xd1namit
34 sweatystartup.com 2026-05-11 10 min read
Open

My controversial opinion on entrepreneurship

Why it matters

Nick Huber (Sweaty Startup, published 2026-05-11) argues most entrepreneurs end up owning a job, not a company, because they repeatedly prioritize urgent tasks over important, long-term work.

  • He uses a concrete example: a college friend who runs a 1,500 sq ft pizza restaurant earning about $250,000/year but is still called in every Thursday–Saturday night to flip pizzas and solve staffing/equipment problems.
  • Huber frames the problem with a four-quadrant time-management matrix: urgent+important (fires), urgent+not-important (delegate), not-urgent+not-important (time-wasters), and important+not-urgent (the quadrant he says drives growth).
  • Operational claims and tactics: Huber says he has hired 400+ offshore workers across his portfolio, achieving roughly 80% cost savings versus U.S. equivalents, and offers a $500 discount on a hire for readers who respond.

Nick Huber argues that the core failure of many small-business owners is time allocation: they constantly solve urgent problems and never do the uncomfortable, important work that scales a company. He illustrates the point with a concrete anecdote about a friend who runs a 1,500 sq ft pizza restaurant making roughly $250,000 annually yet is still required on-site every Thursday–Saturday to handle staffing, equipment, and customer issues. Huber contends this reflects ownership of a job rather than an organization because the owner spends most of his time in urgent quadrants instead of the important-but-not-urgent quadrant that produces long-term growth.

Huber lays out a four-quadrant time-management matrix (urgent/important; urgent/not-important; not-urgent/not-important; important/not-urgent) and prescribes delegation, hard conversations on hiring and sales, and system-building as remedies. He also shares operational claims: he’s hired 400+ offshore staff across his portfolio at roughly 80% lower cost than U.S. equivalents and advertises a $500 hiring discount. Practically, he promotes a May 19, 2026 workshop on profitability and cites business metrics (386 self-storage rentals in the first eight days of May 2026 vs. 206 last year) to underscore applying these principles.

By Nick Huber
35 Twitter/X 2026-05-13 3 min read
Open

Gary Marcus (tweeted 2026-05-13) argues the most accurate claim is…

Why it matters

Gary Marcus (tweeted 2026-05-13) argues the most accurate claim is: “there won’t immediately be an AI jobpocalypse,” but says asserting it will never happen is implausible and that the opposite claim — an AI “jobapalooza” — is even less plausible.

  • Marcus quotes Andrew Ng’s thread (linked), in which Ng asserts “There will be no AI jobpocalypse” and cites a U.S. unemployment rate of 4.3% and continued strong hiring of software engineers as evidence that net job creation likely exceeds destruction.
  • Ng explains economic incentives: frontier AI labs have reason to overhype capabilities; typical SaaS charges ~$100–$1,000/user/year but AI that can replace a $100,000 employee (or boost productivity 50%) could justify prices up to ~$10,000; firms may attribute layoffs to AI to mask pandemic-era overhiring.
  • Ng predicts an “AI jobapalooza” with many new AI-engineering roles and shifting skill requirements across industries and urges broader AI proficiency — a prediction Marcus explicitly characterizes as less plausible than simply denying an immediate jobpocalypse.

Gary Marcus (posting 2026-05-13) pushes back on absolutist takes about AI and jobs: he prefers saying there won’t be an immediate AI-driven jobpocalypse, but rejects the certainty that one will never occur and calls the converse claim — that AI will trigger a broad “jobapalooza” — even less believable. Marcus reproduces Andrew Ng’s longer thread, in which Ng argues “There will be no AI jobpocalypse,” noting a 4.3% U.S. unemployment rate and continued strong software-engineer hiring as signs that AI so far tends to create more jobs than it destroys. Ng also lays out economic incentives that drive hype: frontier labs benefit from dramatic narratives, AI pricing can be anchored to $100K salaries (making $10K/user pricing plausible), and firms may cite AI as a convenient explanation for layoffs. Ng concludes optimistically that AI will produce many new AI-engineering jobs and changing skill demands, while Marcus urges caution about overconfident predictions on either extreme.

By @GaryMarcus
36 substack.com 2026-05-11 3 min read
Open

AGL Q1 Follow Up: “At Least” $125 million

Why it matters

On May 11, 2026 Lake Cornelia reported that a post‑Q1 investor relations call clarified AGL’s 2026 cash guidance is “at least $125 million” (the company had $376M cash at 12/31/2025), a point that increased the author’s bullishness.

  • Author’s valuation math: ~18 million diluted shares, 2026E net cash of $105 million and a total enterprise value (TEV) of $885 million; the stock trades below 0.2x TEV/Revenue and at 2.4x Medical Margin, implying the market prices material insolvency risk.
  • The writeup argues AGL should rerate if management’s conservative guidance and upside to EBITDA play out — the author sees shares needing to reach at least $150+ (on a path the author labels toward $250) before the company is fairly priced as a going concern.
  • Key corporate detail: a presumed $250 million cash outflow for prior‑year claims, which the Q1 print, the IR call, and further analysis suggest may be overstated (i.e., downside risk to cash is smaller than assumed); the new CEO’s equity vests at $50/100/150 (granted when shares were < $30).

AGL — following a May 11, 2026 post‑earnings investor relations call — clarified its 2026 cash guidance is “at least $125 million” (versus $376M cash at end‑2025), which, together with the Q1 print and further analysis, left the author incrementally bullish about upside to EBITDA and the company’s balance‑sheet conservatism. The note presents simple math: ~18 million diluted shares, 2026E net cash ~$105M and a TEV of $885M, while the stock trades at under 0.2x TEV/Revenue and 2.4x Medical Margin — levels the author says reflect an outsized market fear of insolvency. The memo also highlights a presumed $250M prior‑year claims outflow that may be overstated, and corporate incentives (CEO grant vesting at $50/100/150 after grants made when shares were < $30). The author contends shares must rally to at least $150+ (on a longer path to $250) before being fairly priced as a going concern.
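The valuation figures above can be cross-checked with back-of-envelope arithmetic. This is a rough sketch, assuming the common convention that TEV equals market capitalization minus net cash; that convention, the variable names, and the derived values are assumptions made here and are not stated in the memo.

```python
# Figures quoted in the item above, in millions of dollars (shares in millions).
DILUTED_SHARES_M = 18.0
NET_CASH_2026E_M = 105.0
TEV_M = 885.0

# Assumption: TEV = market cap - net cash, so market cap = TEV + net cash.
implied_market_cap_m = TEV_M + NET_CASH_2026E_M
implied_share_price = implied_market_cap_m / DILUTED_SHARES_M

# "Below 0.2x TEV/Revenue" implies revenue above TEV / 0.2.
revenue_floor_m = TEV_M / 0.2
# "2.4x Medical Margin" implies medical margin of roughly TEV / 2.4.
implied_medical_margin_m = TEV_M / 2.4

print(f"Implied market cap: ${implied_market_cap_m:,.0f}M")
print(f"Implied share price: ${implied_share_price:,.2f}")
print(f"Revenue floor: ${revenue_floor_m:,.0f}M")
print(f"Implied medical margin: ${implied_medical_margin_m:,.0f}M")
```

Under these assumptions the quoted numbers imply a share price in the mid-$50s, which gives a sense of how far the author's $150+ target sits from the price the memo's inputs imply.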

By Lake Cornelia Commentary
37 ArXiv 2026-05-11 1 min read
Open

Confidence-Guided Diffusion Augmentation for Enhanced Bangla Compound Character Recognition

Why it matters

Introduces a confidence-guided diffusion augmentation pipeline that uses class-conditional diffusion with classifier guidance, Squeeze-and-Excitation–enhanced residual blocks in the U-Net backbone, and a classifier-based confidence filter to keep only high-quality, class-consistent synthetic Bangla compound character samples.

  • On the AIBangla compound character dataset, augmented training improves multiple classifiers (ResNet50, DenseNet121, VGG16, Vision Transformer); the best model achieves 89.2% accuracy. Paper posted 2026-05-11 and reports outperforming the previously published AIBangla benchmark by a substantial margin.

Handwritten Bangla compound character recognition is improved via a confidence-guided diffusion augmentation approach that synthesizes class-conditional samples, enhances U-Net blocks with Squeeze-and-Excitation residuals, and filters generated images using pre-trained classifiers. Fusing filtered synthetic images with real data yields consistent gains across ResNet50, DenseNet121, VGG16 and ViT, with a top accuracy of 89.2% on AIBangla. Only the abstract was available for this summary.
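The classifier-based confidence filter can be illustrated with a short sketch. The abstract does not give the filtering rule, so the rule used here (keep a synthetic sample only if a classifier's top prediction matches the class it was generated for and its probability clears a threshold), the 0.9 threshold, and the sample data are all assumptions for illustration.

```python
def confidence_filter(samples, threshold=0.9):
    """Keep ids of synthetic samples that a classifier confidently assigns
    to the class they were generated for.

    samples: list of (sample_id, target_class, class_probabilities).
    threshold: hypothetical minimum probability for the target class.
    """
    kept = []
    for sample_id, target_class, probs in samples:
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted == target_class and probs[target_class] >= threshold:
            kept.append(sample_id)
    return kept


# Hypothetical classifier outputs for three generated images (3 classes).
generated = [
    ("img_0", 0, [0.95, 0.03, 0.02]),  # confident and class-consistent
    ("img_1", 1, [0.10, 0.60, 0.30]),  # right class, low confidence
    ("img_2", 2, [0.92, 0.05, 0.03]),  # confident but wrong class
]

kept_ids = confidence_filter(generated)
print(kept_ids)
```

Only the retained samples would then be merged with the real training data; the other two are discarded as low quality or class-inconsistent.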

Authors: Md. Sultan Al Rayhan, Maheen Islam
38 Twitter/X 2026-05-12 1 min read
Open

John F., born without arms, is a fully licensed driver who drives with his…

Why it matters

John F., born without arms, is a fully licensed driver who drives with his feet—left foot on the steering wheel and right foot for gas and brake; his only legal restrictions are automatic transmission and power steering.

  • He drove a Tesla Model 3 for seven years but developed significant hip arthritis from congenital defects and recently upgraded to a Model Y equipped with Full Self-Driving (FSD) Supervised.
  • John reports FSD Supervised “dramatically reduces the physical pressure and fatigue of driving,” has helped preserve his freedom and mobility, and calls the technology “life-changing accessibility.”

John F., born without arms, is a fully licensed driver who steers with his left foot and controls gas/brake with his right, restricted to automatic transmission and power steering. After seven years in a Model 3 and worsening hip arthritis, he upgraded to a Model Y with Full Self-Driving (FSD) Supervised; he says it dramatically reduces fatigue and preserves his independence.

By @Tesla
39 substack.com 2026-05-11 2 min read
Open

Live on 5/11: Macro, Oil, BW AGL TOI EVH ASTH COMP CAR LPRO

Why it matters

Lake Cornelia Commentary published a 32-minute recorded live video titled "Live on 5/11: Macro, Oil, BW AGL TOI EVH ASTH COMP CAR LPRO" on May 11, 2026 (author line © 2026 Judd Arnold).

  • The session focuses on macro and oil topics and mentions the tickers/subjects BW, AGL, TOI, EVH, ASTH, COMP, CAR, and LPRO; the full post is subscriber-only on Substack and thanks viewers Dysphemist, Tmoney, Anxur, Dragos, and Chen Li.

Lake Cornelia Commentary posted a 32-minute recording of a live video on May 11, 2026 (© Judd Arnold) covering macro and oil commentary and discussing specific tickers/topics listed in the title: BW, AGL, TOI, EVH, ASTH, COMP, CAR, and LPRO. The piece is paywalled on Substack; the author thanks named viewers for tuning in.

By Lake Cornelia Commentary
40 ArXiv 2026-05-11 1 min read
Open

HarmoWAM: Harmonizing Generalizable and Precise Manipulation via Adaptive World Action Models

Why it matters

HarmoWAM is an end-to-end World Action Model that conditions two complementary action experts on spatio-temporal priors from a world model: a predictive expert that generates iterative actions from latent dynamics and a reactive expert that infers actions from predicted visual evolution. A Process-Adaptive Gating Mechanism coordinates them, deciding when and where each expert is used.

  • On three training-unseen real-world environments spanning six robotic manipulation tasks, HarmoWAM achieves zero-shot generalization and outperforms prior state-of-the-art VLA models by 33% and prior WAMs by 29%.
  • The authors identify a fundamental trade-off between paradigms: 'Imagine-then-Execute' provides generalizable transit but weaker interaction precision, while 'Joint Modeling' yields fine-grained, temporally coherent actions but is constrained by the training exploration space.

HarmoWAM presents an end-to-end World Action Model that unifies predictive and reactive control by conditioning a predictive expert and a reactive expert on spatio-temporal priors from a world model, with a Process-Adaptive Gating Mechanism to coordinate switching. Reported zero-shot generalization covers three unseen real-world environments and six manipulation tasks, improving over VLA and WAM baselines by 33% and 29%, respectively. Full paper text not available (abstract only).

Authors: Qiuxuan Feng, Jiale Yu, Jiaming Liu...
41 Twitter/X 2026-05-11 1 min read
Open

Andrej Karpathy said, “I don’t think I’ve typed a line of code since December,” a…

Why it matters

Andrej Karpathy said, “I don’t think I’ve typed a line of code since December,” a remark framed as a meme but presented here as evidence that one person can orchestrate an entire software team.

  • Garry Tan treated that prompt as a design challenge and claims gstack (an open-source repo) delivers ~810x higher pace versus 2013 when normalized for logical changes; gstack models AI as “CEO + staff eng + QA + security + design + release + browser operator + parallel execution layer.”
  • gstack’s notable components — /office-hours (product vetting), /autoplan (CEO/design/eng pass), /qa (drives a browser to find and fix bugs), /review (catches prod-tier issues), plus pair-agent and parallel sprints — illustrate a shift from “AI helps devs code” to “devs operate systems of AI workers.”

@KayvonJafar cites Andrej Karpathy’s line “I don’t think I’ve typed a line of code since December” and Garry Tan’s design prompt to have one person run a whole software team. He points to the open-source gstack as that answer, with Garry claiming ~810x higher pace; gstack wires AI roles (CEO, eng, QA, browser operator) into workflows like /office-hours, /autoplan, /qa, /review, pair-agent and parallel sprints, reframing development as operating systems of AI workers.

By @KayvonJafar
42 Twitter/X 2026-05-12 1 min read
Open

Marc Andreessen (@pmarca) claims the opposite of a Luddite zero‑sum outcome…

Why it matters

Marc Andreessen (@pmarca) claims the opposite of a Luddite zero‑sum outcome: programmers are “working harder than ever” and “more hours than ever,” often “stop sleeping… bleary‑eyed… completely exhausted, but… euphoric.”

  • Andreessen argues that increasing a worker’s marginal productivity via AI expands human work rather than diminishing it, and he made these points on an MTS live episode with Erik Torenberg covering AI, jobs, and culture.

Marc Andreessen presents AI as an ongoing economic experiment where, contrary to zero‑sum Luddite expectations, programmers are working longer hours and feeling exhausted yet euphoric; he asserts higher marginal productivity expands work. He discussed this on MTS with Erik Torenberg, in an episode of roughly 52 minutes covering Anthropic, AI doomers, jobs (coder→builder), AI psychosis, polls, UFOs, and career advice.

By @a16z
43 Twitter/X 2026-05-12 2 min read
Open

Bolt and Lovable now let non-developers ship a working production app in an…

Why it matters

Bolt and Lovable now let non-developers ship a working production app in an afternoon — SaaS landing page, full backend, auth, payments and admin dashboard — work that previously required “four years of CS plus three years of frameworks.”

  • Upwork rates for basic web work fell 60% in 18 months as buyers started shipping the same sites themselves over a weekend.
  • The disruption follows past patterns (Photoshop, Squarespace): tools that collapse the intent→execution gap hollow out people whose value was knowing implementation details; survivors become power users and ship far more (one non-developer in a Bolt power-user channel ships more software per week than a 5-person 2022 agency).
  • Aakash Gupta highlights an AI Skills conference on May 14 (conf.cosprints.ai/?32): 20+ AI leaders, 5,000+ registered, 5+ hours, free on Zoom — sessions at 8 AM PT / 11 AM ET / 4 PM London featuring speakers from Google DeepMind, AWS, Meta, DoorDash, Spotify and others.

Aakash Gupta argues the biggest disruption of 2026 is mid-market web builders — the people who charged, for example, $150/hour to build dentist websites — not writers or designers. Tools like Bolt and Lovable let non-developers ship production apps in an afternoon (landing page, backend, auth, payments, admin) replacing what used to require “four years of CS plus three years of frameworks.” Market evidence: Upwork basic web rates dropped 60% in 18 months as buyers learned to self-serve. Gupta frames this as a recurring pattern when tools collapse the gap between intent and execution (Photoshop, Squarespace precedents) and says winners become 10x users of the tools. He plugs an AI Skills conf on May 14 (conf.cosprints.ai/?32) — free, 5+ hours, 8 AM PT/11 AM ET/4 PM London — with 20+ speakers and 5,000+ registered.

By @aakashgupta
44 ArXiv 2026-05-11 1 min read
Open

Price of Quality: Sufficient Conditions for Sparse Recovery using Mixed-Quality Data

Why it matters

Information-theoretic: for mixed-quality sparse recovery the sufficient sample-size condition is a linear trade-off in (n1, n2) defining a 'Price of Quality' — in the agnostic decoder case one high-quality sample is never worth more than two low-quality samples for this sufficient condition.

  • Algorithmic: for the agnostic setting the LASSO recovery threshold matches the homogeneous-noise case and depends only on the average noise level (showing computational recovery is robust to heterogeneity), while an informed decoder (aware of per-sample variances) can make the price of quality arbitrarily large.

Sparse recovery from mixed-quality measurements (n1 high-quality, n2 low-quality) is studied via information-theoretic and algorithmic sample-size conditions. The paper proves a linear 'Price of Quality' trade-off: agnostic decoders cap one high-quality sample at no more than two low-quality ones, whereas informed decoders can exploit variance knowledge to make the price unbounded. LASSO matches homogeneous-noise thresholds, depending only on average noise. Published ICLR 2026 (arXiv 2026-05-11).
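The linear trade-off can be written schematically as below. The symbols are hypothetical notation introduced here, not taken from the paper: $\pi$ stands for the price of quality (how many low-quality samples one high-quality sample is worth) and $n^\star$ for the required number of high-quality-equivalent samples. The paper's actual constants are not reproduced.

```latex
% Schematic sufficient condition (hypothetical notation)
n_1 + \frac{n_2}{\pi} \;\ge\; n^\star,
\qquad
\pi \le 2 \quad \text{(agnostic decoder)}
```

For an informed decoder that knows the per-sample variances, the summary states that $\pi$ can be made arbitrarily large, so no comparable bound applies.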

Authors: Youssef Chaabouni, David Gamarnik
45 Twitter/X 2026-05-12 1 min read
Open

@free_ai_guides (post published 2026-05-12) prescribes a seven-section prompt…

Why it matters

@free_ai_guides (post published 2026-05-12) prescribes a seven-section prompt structure for every Claude task: 1) Task, 2) Context Files, 3) Reference, 4) Success Brief, 5) Rules, 6) Conversation, and 7) Plan.

  • The author assigns specific roles to each section — e.g., Task = desired output + success criteria; Context Files = files Claude must read first; Reference = examples + extracted rules; Success Brief = output type/length/tone/what to avoid; Rules = constraints Claude checks; Conversation = clarifying questions; Plan = list top 3 rules and map approach — and claims prompt quality, not the model, usually determines usefulness.

@free_ai_guides' seven-section prompt structure (Task; Context Files; Reference; Success Brief; Rules; Conversation; Plan) is presented as a repeatable template for every Claude job. The thread assigns explicit roles to each section, urges clarifying questions before execution, and argues that prompt design, not the model, is almost always the deciding factor in useful outputs.
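The seven-section structure can be turned into a reusable template. The following is a minimal sketch: the section names and order come from the thread, while the `build_prompt` helper, the heading format, and the sample content are assumptions for illustration.

```python
SECTIONS = [
    "Task",
    "Context Files",
    "Reference",
    "Success Brief",
    "Rules",
    "Conversation",
    "Plan",
]


def build_prompt(content: dict[str, str]) -> str:
    """Assemble a prompt with the seven sections in order.

    Raises ValueError if any section is missing or empty, so an
    incomplete prompt is caught before it is sent.
    """
    missing = [name for name in SECTIONS if not content.get(name, "").strip()]
    if missing:
        raise ValueError(f"missing sections: {', '.join(missing)}")
    return "\n\n".join(
        f"## {index}. {name}\n{content[name].strip()}"
        for index, name in enumerate(SECTIONS, start=1)
    )


# Hypothetical content for each section.
example = {
    "Task": "Draft a one-page release note; success means no open questions.",
    "Context Files": "Read CHANGELOG.md and the last release note first.",
    "Reference": "Follow the tone of the previous note; keep headings short.",
    "Success Brief": "Markdown, under 300 words, neutral tone, no marketing.",
    "Rules": "Do not mention unreleased features.",
    "Conversation": "Ask clarifying questions before writing.",
    "Plan": "List the top 3 rules, then outline the approach.",
}

prompt = build_prompt(example)
print(prompt)
```

Failing fast on a missing section is one way to enforce the thread's point that every task gets the full structure.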

By @free_ai_guides
46 Twitter/X 2026-05-12 2 min read
Open

Ibrahim Khan accuses Emmanuel Macron of rebranding as a “Pan-Africanist” in…

Why it matters

Ibrahim Khan accuses Emmanuel Macron of rebranding as a “Pan-Africanist” in Nairobi after having humiliated Burkina Faso’s President Kaboré in November 2017 by jokingly telling him to “fix the air conditioning.”

  • He states that Mali, Burkina Faso and Niger evicted French troops (Operation Barkhane) between 2022–2024 and claims the CFA franc system forced African states to deposit 50% of their reserves in the French Treasury.
  • Khan calls Macron hypocritical for providing diplomatic cover for Israel in Gaza and argues the 11 new Nairobi deals (including a railway and fiber-optic projects) are a “desperate pivot” to East Africa after losing influence in the Sahel.

Ibrahim Khan condemns Emmanuel Macron’s Nairobi “Pan-Africanist” posture, citing a 2017 humiliation of President Kaboré, the 2022–2024 expulsions of French troops (Operation Barkhane), and the CFA franc’s 50% reserve rule as evidence of French domination. He labels Macron’s Gaza stance hypocritical and describes eleven Nairobi deals (railway to fiber optics) as PR-driven outreach after France’s Sahel defeats.

By @FrenchResponse
47 ArXiv 2026-05-11 1 min read
Open

Pixal3D: Pixel-Aligned 3D Generation from Images

Why it matters

Introduces pixel back-projection conditioning that lifts multi-scale image features into a 3D feature volume, establishing explicit pixel-to-3D correspondence and enabling pixel-aligned 3D generation in the input view rather than a canonical pose.

  • Reports substantial fidelity gains—'approaching the fidelity level of reconstruction'—and extends naturally to multi-view by aggregating back-projected volumes; also presents a modular pipeline for high-fidelity, object-separated 3D scenes (Dong-Yang Li et al., SIGGRAPH 2026; project: https://ldyang694.github.io/projects/pixal3d/).

Pixal3D (Li et al., SIGGRAPH 2026) presents a pixel-aligned 3D generation paradigm that back-projects multi-scale image features into a 3D feature volume to create explicit pixel-to-3D correspondences, generating assets aligned with the input view. The method reportedly substantially raises fidelity—'approaching the fidelity level of reconstruction'—and extends to multi-view and scene synthesis.
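To make "back-projecting image features into a 3D feature volume" concrete, here is a single-scale sketch using a plain pinhole camera model and nearest-pixel sampling. It is not the authors' implementation: the function name, the intrinsics parameters (`fx`, `fy`, `cx`, `cy`), and the sampling rule are all assumptions, and the paper's multi-scale features are omitted.

```python
def back_project(features, fx, fy, cx, cy, voxel_centers):
    """Lift a 2D feature map into per-voxel features via pinhole projection.

    features: H x W grid (list of rows); each entry is a list of C floats.
    voxel_centers: list of (X, Y, Z) points in the input camera's frame.
    Returns one feature vector per voxel; zeros where the voxel is not visible.
    """
    height, width = len(features), len(features[0])
    channels = len(features[0][0])
    volume = []
    for X, Y, Z in voxel_centers:
        if Z <= 0:  # behind the camera
            volume.append([0.0] * channels)
            continue
        # Project the 3D point to pixel coordinates.
        u = int(round(fx * X / Z + cx))
        v = int(round(fy * Y / Z + cy))
        if 0 <= u < width and 0 <= v < height:
            volume.append(list(features[v][u]))  # nearest-pixel sampling
        else:
            volume.append([0.0] * channels)
    return volume


# Toy 3x3 single-channel feature map whose value encodes its pixel: 10*v + u.
feats = [[[10.0 * v + u] for u in range(3)] for v in range(3)]
voxels = [(0, 0, 1), (0, 0, 2), (1, 0, 1), (5, 0, 1), (0, 0, -1)]
volume = back_project(feats, fx=1.0, fy=1.0, cx=1.0, cy=1.0, voxel_centers=voxels)
```

Voxels that lie on the same camera ray receive the same pixel's feature, which is the explicit pixel-to-3D correspondence the summary describes; a multi-view variant would aggregate several such volumes.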

Authors: Dong-Yang Li, Wang Zhao, Yuxin Chen...
48 mail.thepublishpress.com 2026-05-11 7 min read
Open

The Internet’s Biggest Stream...Was Fake? 👥

Why it matters

IShowSpeed’s YouTube livestream in the Dominican Republic was reported at a peak of 1.9M concurrent viewers, but the creator confirmed the true peak was ~300K; about 1.6M of the reported viewers were viewbots.

  • Viewbotting services charge roughly $0.28–$10 per 10 fake views depending on platform; YouTube said Speed did not pay for viewbots, warned repeat violators face channel termination, and Twitch CEO Dan Clancy announced temporary caps on concurrent viewers for channels using viewbots.
  • A StreamCharts study found ~10% of Twitch channels averaging 50+ viewers were flagged for suspicious viewing activity (Kick flagged ~16%), raising advertiser and creator concerns about falsified metrics.
  • Creator-economy business highlights: Dhar Mann Studios (CEO Sean Atkins) closed deals with NFL, Fox and Samsung by leveraging direct access, community-first asks, and 'and, not or' deals; comedy creator Druski spent $100K on a megachurch sketch (150M+ views but no direct profit); OnlyFans valuation rose to $3.15B after selling 16% for $535M; Hunter Peterson collected ~371K non-binding pledges totalling ~$337M toward a tongue-in-cheek Spirit Airlines buyout.

IShowSpeed’s viewbotted stream in the Dominican Republic sparked platform and industry action after the creator revealed that a reported peak of 1.9M concurrent viewers was actually about 300K, with ~1.6M automated viewbots inflating the count. Viewbot services sell falsified views for roughly $0.28–$10 per 10 views; YouTube said Speed didn’t buy bots and warned that channels face termination for repeat violations, while Twitch CEO Dan Clancy announced temporary caps on concurrent viewers for streams using bots. Research from StreamCharts flagged ~10% of Twitch channels averaging 50+ viewers (16% on Kick) for suspicious activity, a problem that undermines brand measurement. The newsletter also profiles Dhar Mann Studios’ CEO Sean Atkins (who used direct access, community-led proposals, and collaborative 'and, not or' deals to land partnerships with NFL, Fox and Samsung) and notes creator-economy headlines: Druski reinvested $100K into a sketch with 150M+ views, OnlyFans reached a $3.15B valuation after a $535M partial sale, and Hunter Peterson amassed ~371K non-binding pledges (~$337M) for a jokey Spirit Airlines buyout.

By The Publish Press 💬
49 Twitter/X 2026-05-12 1 min read
Open

Profound announced the first-ever Marketing Engineering Hackathon on June 6, 2026…

Why it matters

Profound announced the first-ever Marketing Engineering Hackathon on June 6, 2026 in Union Square, NYC: one day, 50 builders, and $40,000 in total prizes.

  • Two prize tracks—Best Overall Build and Best Profound-Native Agent—each award $10,000 cash, $10,000 in Profound agent credits, and an interview at Profound.
  • The event is platform-agnostic (Profound access plus use of Claude Code, Cursor, n8n, Python, LangChain, raw APIs), will be judged by people from Ramp, Stripe, and MongoDB, and requires an application; only 50 spots.

Profound's Marketing Engineering Hackathon (June 6, 2026, Union Square, NYC) will host 50 builders for a one-day challenge to automate inhuman-scale marketing processes, offering $40,000 total in prizes and judges from Ramp, Stripe, and MongoDB. The platform-agnostic event allows Claude Code, Cursor, n8n, Python, LangChain, or raw APIs; winners receive cash, Profound credits, and interviews. Application required.

By @hasantoxr
50 ArXiv 2026-05-11 1 min read
Open

Variational Inference for Lévy Process-Driven SDEs via Neural Tilting

Why it matters

Introduces a neural exponential tilting variational family that exponentially reweights the Lévy measure with neural networks, preserving the jump structure of Lévy-driven SDEs while remaining computationally tractable (paper: "Variational Inference for Lévy Process-Driven SDEs via Neural Tilting", Kindap et al., arXiv:2605.10934v1, published 2026-05-11).

  • Proposes a quadratic neural parametrization that yields closed-form normalization of the tilted measure and a conditional Gaussian representation for stable processes to facilitate simulation, plus symmetry-aware Monte Carlo estimators for scalable optimization.
  • Empirically outperforms Gaussian-based variational approaches in capturing jump dynamics and producing reliable posterior inference on both synthetic and real-world datasets, according to the abstract (official code linked on the project page).

Variational Inference for Lévy Process-Driven SDEs via Neural Tilting introduces a neural exponential tilting framework that reweights the Lévy measure with neural nets to build a flexible variational family preserving jumps. A quadratic parametrization gives closed-form normalization; a conditional Gaussian representation for stable processes enables simulation and scalable, symmetry-aware Monte Carlo optimization. Summary based on the abstract (full text not reviewed).

Authors: Yaman Kindap, Manfred Opper, Benjamin Dupuis...
51 Twitter/X 2026-05-12 1 min read
Open

GBrain merged 14 PRs in 72 hours, with a net +28,746 / -1,173 lines of production…

Why it matters

GBrain merged 14 PRs in 72 hours, with +28,746 / -1,173 lines of production code (net +27,573).

  • Top merged PRs: #885 'facts join system-of-record' (+5,682) implementing a hot-memory layer; #795 'takes v2' (+5,306) rewritten from 100K-take production learnings; #796 'extract facts during sync' (+3,418) enabling real-time hot memory.
  • Release advanced v0.31.2 → v0.32.4 with eight version bumps; other highlights include #859 functional-area resolvers (+3,166) for routing-table compression, #810 five new embedding recipes (+1,818) that closed a 17-PR cluster, and #804 adapting five community PRs (+828).

GBrain underwent a rapid three-day sprint: 14 PRs merged and nearly 29K lines added (about 27.6K net of deletions), pushing the project from v0.31.2 to v0.32.4. Key work focused on hot-memory and real-time fact extraction (#885, #796), a major rewrite of take handling (#795), routing/resolver improvements, embedding recipes, auto-upgrade and community fixes.

By @garrytan
52 ArXiv 2026-05-11 1 min read
Open

Geometry-aware Prototype Learning for Cross-domain Few-shot Medical Image Segmentation

Why it matters

GeoProto (Feifan Song, Yuntian Bo, Haofeng Zhang; arXiv:2605.10885v1, posted 2026-05-11) introduces Geometry-Aware Prototype Enrichment (GAPE), which augments local appearance prototypes with a learned geometric offset encoding an ordinal position within an organ's interior topology.

  • The geometric offset is produced by an auxiliary Ordinal Shape Branch (OSB) trained with an ordinally consistent objective that requires no annotations beyond standard segmentation masks; extensive experiments on seven datasets across three evaluation settings (cross-modality, cross-sequence, cross-context) report state-of-the-art performance.

GeoProto addresses cross-domain few-shot medical image segmentation by enriching prototypical matching with explicit geometric priors. The method's core, GAPE, augments appearance prototypes with learned ordinal offsets from an Ordinal Shape Branch trained under an ordinal-consistency loss (no extra labels beyond segmentation masks). According to the abstract, GeoProto attains state-of-the-art results across seven datasets and three settings (cross-modality, cross-sequence, cross-context). Only the paper abstract was available for this briefing.

Authors: Feifan Song, Yuntian Bo, Haofeng Zhang
53 Twitter/X 2026-05-12 1 min read
Open

On 2026-05-12 @emollick said frontier model writing shows distinct style and…

Why it matters

On 2026-05-12 @emollick said frontier model writing shows distinct style and tone, varied sentence length, and strong phrasing, but has notable weak spots in fiction and recurring tics; he warned its sheer volume online has rendered it clichéd.

  • Roon (@tszzl) agreed frontier models write clearly and recognizably—with tics that lower their 'aura' and can void value—yet insisted it's mostly wrong to claim model writing lacks analytical or informational worth.

Author @emollick praises frontier model writing for its sense of style, tone, varied sentence structure and memorable phrasing, but criticizes its fiction weaknesses and repetitive tics, saying ubiquity has made it clichéd. Roon (@tszzl) concurs on clarity and tics but rejects the claim that model prose lacks analytical or informational value.

By @emollick
54 substack.com 2026-05-11 13 min read
Open

On redistricting, Democrats are playing as the away team

Why it matters

Nate Silver (Silver Bulletin, May 11, 2026) says Democrats have lost ground in the 2026 mid‑decade redistricting fight after Florida and other GOP states redrew maps and after two legal setbacks: the U.S. Supreme Court's Callais decision (weakening the Voting Rights Act) and the Virginia Supreme Court invalidating Virginia’s redistricting referendum on procedural grounds.

  • New York Times analyst Nate Cohn’s estimates shift the national map from roughly even (median district ~0.1 point R of the nation) to a Republican tilt of about R+2.5 to R+3.9 for November 2026; Silver notes the Virginia referendum’s portrayed 4-seat Democratic pickup was probabilistically closer to 2.5–3 seats.
  • Silver’s trackers show Democrats leading the generic congressional ballot by 6.1 points; prediction markets (Polymarket) trimmed Democrats’ chance of a House majority from ~87% to ~78% after the developments.
  • Court landscape matters: the U.S. Supreme Court has a 6–3 conservative majority, and Silver’s AI‑assisted rating of state supreme courts (using Google Gemini, Claude, ChatGPT to assess de facto behavior) tallies Republican-leaning courts = 264 electoral votes, Democratic-leaning = 252, tossups = 22 (Strong R 232 vs Strong D 205).

Redistricting in 2026 has tilted modestly toward Republicans after a combination of red‑state map redraws (notably Florida) and two legal blows to Democrats: the Supreme Court’s Callais decision that weakened the Voting Rights Act and the Virginia Supreme Court’s procedural nullification of a Virginia referendum that had appeared to favor Democrats. Nate Cohn’s district estimates move the national map from near‑parity (median district ≈ 0.1 point R of the country) to a Republican advantage of roughly R+2.5 to R+3.9; Silver argues the Virginia “+4” story was probably a 2.5–3 seat probabilistic gain. Democrats still lead the generic congressional ballot by 6.1 points in Silver’s tracker, but Polymarket cut Dems’ House‑majority probability from ~87% to ~78% after the rulings.

Silver emphasizes that courts — especially a 6–3 conservative U.S. Supreme Court and a modest Republican edge among state high courts (his AI‑assisted tally: R 264 EVs, D 252, tossup 22) — will shape how aggressive partisan maps fare. He recommends Democrats avoid both despondency and complacency: pursue state legislative and judicial gains, use probabilistic modeling (Silver Bulletin’s midterms model due in ~6 weeks) to assess outcomes, and balance aggressive redistricting in blue states with sensitivity to public opinion and potential legal scrutiny to reduce long‑term vulnerability heading into 2028.

By Silver Bulletin
55 Twitter/X 2026-05-12 3 min read
Open

Howard Schultz, former Starbucks CEO and chairman emeritus, says Starbucks will…

Why it matters

Howard Schultz, former Starbucks CEO and chairman emeritus, says Starbucks will shift “hundreds” of corporate roles from Washington state to Tennessee and that he no longer lives in Washington.

  • Schultz blames local leadership—singling out Seattle Mayor Katie Wilson’s “socialist rhetoric”—and Washington’s tax choices, calling the reliance on sales tax (10.55% in Seattle) deeply regressive.
  • He points to weakening local anchors: Microsoft and Amazon have slowed hiring and reduced head counts, while Seattle faces chronic homelessness, downtown vacancies, persistent budget deficits and declining public‑school outcomes.
  • Schultz urges rewriting Washington’s tax code, adopting pro‑entrepreneurship policies (citing a bipartisan National Governors Association initiative) and says his family foundation remains invested even as he warns that hostile rhetoric and policy will drive future entrepreneurs away.

Howard Schultz argues that Seattle and Washington have become hostile to the companies that built the region’s prosperity, saying Starbucks will move hundreds of corporate roles to Tennessee and that he no longer resides in the state. He attributes the shift to political rhetoric from Seattle Mayor Katie Wilson and to state fiscal choices—highlighting a 10.55% sales tax in Seattle and a tax system he calls regressive—that he says discourage business growth. Schultz points to slowed hiring at Microsoft and Amazon, downtown vacancies, chronic homelessness, budget deficits and falling school outcomes as signs the ecosystem is fracturing. He calls for tax‑code reform, accountable public spending and pro‑entrepreneurship policies (referencing a bipartisan NGA initiative) while noting his family foundation remains invested in Washington’s future.

By @VijayInWA
56 Twitter/X 2026-05-12 1 min read
Open

@FrenchResponse states that the ten agreements signed between France and Kenya on…

Why it matters

@FrenchResponse states that the ten agreements signed between France and Kenya on May 12, 2026 contain 'nothing secret' and offers to post all details publicly.

  • New Direction AFRICA alleges that President Emmanuel Macron and Kenya's William Ruto secretly signed 11 instruments at Nairobi's State House with no parliamentary debate, calls it 'Françafrique 2.0', and claims France was 'kicked out' of the Sahel after decades of neo‑colonial plunder—warning the deals threaten Kenyan sovereignty and resources.

France–Kenya agreements: @FrenchResponse says the ten agreements signed between France and Kenya on May 12, 2026 contain 'nothing secret' and offers to publish all details. New Direction AFRICA alleges Macron and President William Ruto secretly signed 11 instruments at Nairobi's State House with no parliamentary debate, calls it 'Françafrique 2.0' and warns of lost sovereignty and resources.

By @FrenchResponse
57 Twitter/X 2026-05-12 1 min read
Open

Trading on Polymarket via PolyHelper automatically farms two airdrops

Why it matters

Trading on Polymarket via PolyHelper automatically farms two airdrops: the Polymarket airdrop and a PolyHelper airdrop.

  • PolyHelper refunds trading fees as cashback: it refunded all Polymarket fees paid on one randomly selected day, May 8.
  • PolyHelper adds extra info and UX features to Polymarket, is offered for free, and the author (@0xd1namit) states they see no reason not to use it.

When you trade on Polymarket with PolyHelper, you automatically farm two airdrops, Polymarket’s and PolyHelper’s, while PolyHelper reimburses trading fees (it refunded all fees paid to Polymarket on a randomly chosen day, May 8). PolyHelper also adds extra UX info/features and is free; the author (@0xd1namit) argues there’s no reason not to use it.

By @0xd1namit
58 ArXiv 2026-05-11 1 min read
Open

When Are Trade-Off Functions Testable from Finite Samples?

Why it matters

Finite VC-dimension is necessary and sufficient: when Neyman–Pearson rejection regions for (P,Q) are attainable (up to null sets) by a prescribed class S, finite Vapnik–Chervonenkis dimension of S is both sufficient and necessary for nontrivial finite-sample testing of the trade-off curve.

  • They construct a test with nonasymptotic guarantees: type I error control holds without the attainability assumption, while uniform power holds over attainable alternatives satisfying an explicit separation condition; inverting the test yields simultaneous confidence bands for the entire trade-off function. Approximate attainability gives finite-sample guarantees for univariate log-concave distributions (via unions of intervals).
  • In the monotone likelihood-ratio model the authors derive local separation rates and prove matching lower bounds up to logarithmic factors. Paper: Kaining Shi, Qiaosen Wang, Cong Ma, arXiv:2605.10774v1, posted 2026-05-11.

The paper studies finite-sample hypothesis testing for the type I/type II error trade-off function between two unknown distributions. Without structure, testing is impossible; they identify exact attainability of Neyman–Pearson rejection regions by a class S and show finite VC-dimension of S is necessary and sufficient. They give a test with nonasymptotic type I control and uniform power under explicit separations, produce simultaneous confidence bands, analyze local rates (monotone likelihood-ratio) with near-matching lower bounds, and extend to approximate attainability (e.g., univariate log-concave).
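For readers unfamiliar with the object being tested: the trade-off function maps a type I error level alpha to the smallest type II error any test of P against Q can achieve at that level. The sketch below evaluates it in a textbook special case that is not taken from the paper, N(0,1) versus N(mu,1) with both distributions known, where the Neyman–Pearson test rejects for large observations. The function name is hypothetical.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def gaussian_trade_off(alpha: float, mu: float) -> float:
    """Smallest type II error at type I level alpha for N(0,1) vs N(mu,1), mu >= 0."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    # The Neyman-Pearson test rejects when x > threshold, with P0(x > threshold) = alpha.
    threshold = _STD.inv_cdf(1.0 - alpha)
    # Type II error: probability under N(mu, 1) of falling below the threshold.
    return _STD.cdf(threshold - mu)


beta_identical = gaussian_trade_off(0.05, 0.0)  # P == Q: no test beats 1 - alpha
beta_separated = gaussian_trade_off(0.05, 1.0)  # shifted alternative: strictly smaller
```

Here the curve has a closed form because P and Q are known. The paper's question is harder: deciding, from finite samples of unknown P and Q, whether the curve lies above or below a hypothesized one.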

Authors: Kaining Shi, Qiaosen Wang, Cong Ma
59 substack.com 2026-05-12 5 min read
Open

Tuesday: Three Morning Takes

Why it matters

Steve Hilton, a British‑American ex‑Fox News host running as one of 62 candidates for California governor, drew online backlash after a recent weekend appearance in which he mischaracterized a hard‑shell taco as a 'street taco.'

  • OpenAI's secondary sale last October involved more than 600 employees selling about $6.6 billion in shares (WSJ); the newsletter reports roughly 75 employees made ~$30 million each, and links that windfall to intensified San Francisco rental pressure — citing a 1‑bed in Alamo Square listed at $8,000/month and rental bidding demands like six months' rent upfront (The Standard).
  • Rachel Dolezal (now using the name Nkechi Diallo) is profiled as moving into sex coaching after controversy; Pirate Wires notes her OnlyFans activity and a Daily Mail‑reported welfare‑fraud charge alleging $8,847 in improper benefits.
  • Pirate Wires frames these items with a snarky, opinionated tone and references WSJ, The Standard, and the Daily Mail as sources.

Pirate Wires Daily's 'Three Morning Takes' (published May 12, 2026) delivers three punchy cultural notes: a weekend campaign gaffe by Steve Hilton (a British‑American former Fox host and one of 62 gubernatorial hopefuls), who sparked online ridicule for calling a hard‑shell taco a 'street taco'; a housing snapshot linking last October's OpenAI secondary sale (more than 600 employees sold roughly $6.6 billion in shares, with about 75 reportedly netting ~$30 million each, per the WSJ) to escalating San Francisco rental extremes (the newsletter cites a 1BR in Alamo Square at $8,000/month and reports from The Standard of bids requiring six months' rent upfront); and a profile of Rachel Dolezal/Nkechi Diallo pivoting to sex coaching amid OnlyFans activity and a Daily Mail‑reported $8,847 welfare‑fraud charge. The piece mixes sourced facts with sarcastic commentary on politics, housing, and culture.

By Pirate Wires Daily
60 Twitter/X 2026-05-13 1 min read
Open

Claude decides whether to load a skill based on a single-line description…

Why it matters

Claude decides whether to load a skill based on a single-line description; descriptions under 100 characters often remain invisible. In tests of 25 skills across 75 runs, a recipe-planner with a 37-character description ("Suggest recipes from what's in fridge.") failed to trigger for most of 10 prompts until its description was rewritten with 3 real-user trigger phrases, third-person voice, and an explicit "do not use X / use /Y instead" boundary.

  • After routing, follow the post-load rules (the post counts six; five are listed here): (1) write commands, not requests (e.g., "Flag every issue with severity Critical/High/Medium/Low" beats "could you take a look"), (2) provide a read-first 3-column table (Source, Path, What to extract), (3) include one worked input/output example, (4) keep skills under 500 lines (safety rules at line 700 never fire), and (5) pair every "do not use for X" with a "use /Y instead" pointer. A full audit checklist and a 10-sub-agent eval prompt are available at aibyaakash.com/p/claude-skil….

Author @aakashgupta (2026-05-13) shows that Claude's routing relies on a single-line skill description, causing sub-100-character descriptions to be invisible. He audited 25 skills (75 runs), fixed a failing recipe skill by rewriting its description with explicit triggers and boundaries, and provides a six-rule post-load checklist plus an eval prompt and audit kit for reliable skill design.
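Several of these rules are mechanical enough to check automatically. The sketch below is a hypothetical linter, not the author's audit kit: the 100-character and 500-line thresholds come from the post, while the function name, the regex, the rewritten description, and the `/grocery-order` skill name are assumptions made for illustration.

```python
import re

MIN_DESCRIPTION_CHARS = 100  # threshold quoted in the post
MAX_SKILL_LINES = 500        # limit quoted in the post


def audit_skill(description: str, body: str) -> list[str]:
    """Return warnings for a skill; an empty list means no checked rule was tripped."""
    warnings = []
    if len(description) < MIN_DESCRIPTION_CHARS:
        warnings.append(
            f"description is {len(description)} chars (< {MIN_DESCRIPTION_CHARS})"
        )
    if len(body.splitlines()) > MAX_SKILL_LINES:
        warnings.append(f"skill body exceeds {MAX_SKILL_LINES} lines")
    text = description.lower()
    # A "do not use" boundary should be paired with a "use /other-skill instead" pointer.
    has_boundary = "do not use" in text
    has_pointer = re.search(r"use /\S+ instead", text) is not None
    if has_boundary and not has_pointer:
        warnings.append('"do not use" boundary has no "use /Y instead" pointer')
    return warnings


short = "Suggest recipes from what's in fridge."  # the failing description from the post
# Hypothetical rewrite: third person, three trigger phrases, explicit boundary.
rewritten = (
    "Plans recipes from ingredients the user already has. Triggers on "
    "'what can I cook tonight', 'use up my leftovers', 'recipe from my fridge'. "
    "Do not use for grocery ordering; use /grocery-order instead."
)
body = "step\n" * 10
short_warnings = audit_skill(short, body)
rewritten_warnings = audit_skill(rewritten, body)
```

A check like this only catches the structural rules. Whether the trigger phrases match what real users type still needs an eval run of the kind the post describes.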

By @aakashgupta
61 e.economist.com 2026-05-12 8 min read
Open

Analysing Africa: A memorable trip to Goma

Why it matters

John McDermott (Chief Africa correspondent) visited Goma in early May 2026, more than a year after M23 seized the city. After a frisking by armed men he was granted access to M23’s leadership and shown an exclusive slide deck pitching M23’s minerals to the Trump administration.

  • M23 has pursued a ‘Kigalification’ of Goma: it created a new police force with uniforms resembling Rwanda’s and introduced forced city cleanings every Saturday, which reduced petty crime and made the city cleaner but did not eliminate wider disorder.
  • Humanitarian services are strained: Médecins Sans Frontières resumed a clinic on the slopes of Nyiragongo where staff treated cases including a one‑month‑old with pneumonia and long queues of malnourished children. Displaced people such as Kavira, 36, report acute need; her husband’s motorbike was stolen and she has been told to return to a village 200 km away in a war zone.
  • Diplomatic efforts (US‑overseen “Washington Accords” between DR Congo and Rwanda and a separate Qatar‑led process with M23) have not produced a permanent ceasefire; the US imposed sanctions on Rwanda’s army in March 2026, but Rwanda continues what it calls ‘defensive measures’ while M23 builds a quasi‑state in the Kivus.

Goma has been living under M23 control for more than a year, and the city’s inhabitants—what the author calls the forgotten people of eastern Congo—face a mix of order and acute suffering. M23 has introduced measures modelled on Rwanda (a new police force with Rwanda‑style uniforms and compulsory Saturday cleanings) that have reduced petty crime, even as everyday chaos persists. The author visited in early May 2026, was given access to M23’s leadership and shown a slide deck marketing the group’s mineral resources to the Trump administration. Médecins Sans Frontières has resumed a clinic on Nyiragongo’s slopes treating pneumonia in infants and feeding queues of malnourished children, but many aid groups have withdrawn and the UN’s World Food Programme deems much of the region too dangerous for full‑scale operations. Diplomacy (US Washington Accords; Qatar mediation) and March 2026 US sanctions on Rwanda have not yet produced a lasting ceasefire.

By The Economist
62 Twitter/X 2026-05-12 1 min read
Open

Market: “Putin out as President of Russia by December 31, 2026?” — author entered…

Why it matters

Market: “Putin out as President of Russia by December 31, 2026?” — author entered to farm LP rewards when the reward pool was above $100 and LP yields were ~4%

  • Author flipped a filled order a few days later by listing it 1¢ higher (a +1.14% spread) and it sold quickly
  • Author judges the market as very low-risk — calls it “basically just a bet on his age,” believes fair price is well above $0.89

The Polymarket market “Putin out as President of Russia by December 31, 2026?” was used by @0xd1namit to farm LP rewards (reward pool > $100, ~4% yields) and capture small spreads. He bought, relisted 1¢ higher (+1.14%) and sold quickly; he considers the outcome low-risk and thinks the fair price exceeds $0.89.
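The +1.14% figure is consistent with a fill at $0.88 relisted at $0.89. The $0.88 entry price is inferred from the percentage and is not stated in the post. A small sketch of the arithmetic, ignoring fees and LP rewards (the function name is hypothetical):

```python
def spread_return(buy_price: float, sell_price: float) -> float:
    """Gross return from buying a share at buy_price and selling at sell_price."""
    if buy_price <= 0:
        raise ValueError("buy_price must be positive")
    return (sell_price - buy_price) / buy_price


# Assumed entry of $0.88 (inferred, not stated), relisted 1 cent higher at $0.89.
pct = round(spread_return(0.88, 0.89) * 100, 2)
```

With these assumed prices, `pct` comes out at 1.14, matching the quoted spread. A one-cent relist from $0.87 or $0.89 would give a different rounded figure.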

By @0xd1namit
63 Twitter/X 2026-05-12 1 min read
Open

CloakBrowser (post by @hasantoxr, published 2026-05-12) is a stealth Chromium…

Why it matters

CloakBrowser (post by @hasantoxr, published 2026-05-12) is a stealth Chromium that scores 0.9 on reCAPTCHA v3, passes 14/14 bot-detection tests, auto-resolves Cloudflare Turnstile, reportedly beats FingerprintJS and BrowserScan, and is a one-line drop-in Playwright replacement.

  • The project is 100% open-source under the MIT license, distributed via pip as a ~200MB binary; the author says it patches Chromium's C++ in 16 places (canvas, WebGL, audio fingerprint, fonts, hardwareConcurrency, GPU vendor strings, WebDriver flag, TLS fingerprint) so detection sites see a 'real browser'.
  • Author contrasts costs: Bright Data scraping browser ~$500+/month, Browserless stealth ~$200+/month, custom anti-detect builds ~$10K+ engineering — CloakBrowser is presented as a low-cost alternative installable with a pip command.

CloakBrowser is a MIT‑licensed stealth Chromium (pip install, ~200MB) that the author says achieves human-like anti-bot fidelity — 0.9 on reCAPTCHA v3 versus stock Playwright's 0.1 — by patching Chromium's C++ in 16 spots (canvas, WebGL, audio, fonts, hardwareConcurrency, GPU strings, WebDriver flag, TLS fingerprint). It claims to pass 14/14 detection tests, auto-resolve Turnstile, and replace commercial $200–$500+/month stealth browsers or costly custom builds.

By @hasantoxr
64 ArXiv 2026-05-11 1 min read
Open

CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models

Why it matters

CapVector (Wenxuan Song et al., arXiv 2026-05-11) extracts transferable "capability vectors" by training two finetuned models on a small task set with distinct strategies (one enhancing general capabilities, one fitting task-specific action distributions) and using their parameter difference as the capability vector.

  • Merging capability vectors into pretrained vision-language-action models and using a lightweight orthogonal regularization during standard supervised finetuning achieves performance comparable to auxiliary-objective finetuning while reducing computational overhead; authors report the vectors are effective across diverse models and generalize to novel environments and embodiments.

CapVector introduces a method to decouple auxiliary-objective finetuning into two parameter-space components: general capability enhancement and task-specific action fitting. By finetuning two small-scale models with different strategies and taking their parameter difference as a capability vector, then merging it into pretrained VLA models (plus an orthogonal regularizer), standard SFT attains auxiliary-like gains with lower compute. Full text not provided here.
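The parameter-space idea can be illustrated with toy weights. This sketch is based only on the abstract-level description: the direction of the difference (general-capability model minus task-fitting model), the `scale` factor, and the squared-inner-product form of the orthogonality penalty are all assumptions, and every function name is hypothetical.

```python
Params = dict[str, list[float]]


def capability_vector(general: Params, task: Params) -> Params:
    """Per-parameter difference between the two finetuned checkpoints (assumed direction)."""
    return {k: [g - t for g, t in zip(general[k], task[k])] for k in general}


def merge(pretrained: Params, vector: Params, scale: float = 1.0) -> Params:
    """Add the scaled capability vector to the pretrained weights."""
    return {
        k: [p + scale * d for p, d in zip(pretrained[k], vector[k])]
        for k in pretrained
    }


def orthogonality_penalty(update: Params, vector: Params) -> float:
    """Squared inner product between a finetuning update and the capability vector.

    Assumed form of the regularizer: zero when the update is orthogonal to the vector.
    """
    dot = sum(u * d for k in vector for u, d in zip(update[k], vector[k]))
    return dot * dot


# Toy numbers for illustration only.
general = {"w": [1.0, 2.0]}
task = {"w": [0.5, 2.0]}
vec = capability_vector(general, task)        # {"w": [0.5, 0.0]}
merged = merge({"w": [0.0, 1.0]}, vec)        # {"w": [0.5, 1.0]}
ok = orthogonality_penalty({"w": [0.0, 3.0]}, vec)   # orthogonal update
bad = orthogonality_penalty({"w": [2.0, 0.0]}, vec)  # update along the vector
```

In this toy setup an update that moves along the capability vector is penalized, while an orthogonal one is not. How the paper actually defines and weights the regularizer is not covered by the summary.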

Authors: Wenxuan Song, Han Zhao, Fuhao Li...