Read Briefing · 2026-05-14

99 items · 2026-05-14T20:53
MUST READ

Read these first.

9 items
ArXiv 2026-05-12 1 min read

Towards Affordable Energy: A Gymnasium Environment for Electric Utility Demand-Response Programs

Why it matters

DR-Gym: an open-source, Gymnasium-compatible simulator introduced on arXiv 2026-05-12 by Jose E. Aguilar Escamilla, Lingdong Zhou, Xiangqi Zhu, and Huazheng Wang for training electric-utility-level demand-response policies.

Key details

  • Simulator includes a regime-switching wholesale price model calibrated to real-world extreme events, physics-based building demand profiles, and a configurable multi-objective reward; authors demonstrate realism and learnability via baseline strategies and data snapshots.

Brief

DR-Gym is an open-source Gymnasium-compatible environment that trains and evaluates demand-response from the electric utility perspective, addressing the missing feedback loop in offline smart-meter datasets. The work pairs physics-based building demand profiles with a regime-switching wholesale price model calibrated to real-world extreme events and a configurable multi-objective reward. Unlike device-level simulators, DR-Gym targets market-level utility decision-making and demonstrates realistic, learnable scenarios using baseline strategies.
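
To make the regime-switching idea concrete, here is a minimal sketch of a two-regime Markov price process. It is not DR-Gym's model: the regime names, transition probabilities, and price levels are illustrative assumptions, and the sketch uses only the Python standard library rather than the Gymnasium API.

```python
import random

# Hypothetical two-regime wholesale price model (illustrative numbers only).
REGIMES = {
    "normal": {"mean": 40.0, "std": 8.0, "stay_prob": 0.97},
    "spike": {"mean": 900.0, "std": 250.0, "stay_prob": 0.60},
}


def simulate_prices(hours, seed=0):
    """Return (regime, price) pairs for a Markov regime-switching process."""
    rng = random.Random(seed)
    regime = "normal"
    path = []
    for _ in range(hours):
        params = REGIMES[regime]
        # Draw a price from the current regime and floor it at zero.
        price = max(0.0, rng.gauss(params["mean"], params["std"]))
        path.append((regime, price))
        # Stay in the current regime with probability stay_prob, else switch.
        if rng.random() > params["stay_prob"]:
            regime = "spike" if regime == "normal" else "normal"
    return path
```

A simulator in this style would expose the current price (and possibly the regime) as part of the observation, so that a demand-response policy has to learn when rare, expensive hours are likely.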

Authors: Jose E. Aguilar Escamilla, Lingdong Zhou, Xiangqi Zhu...
divenewsletter.com 2026-05-13 5 min read

NEMA projects U.S. annual electricity consumption will rise 55% by 2050; it also…

Why it matters

NEMA projects U.S. annual electricity consumption will rise 55% by 2050; it also forecasts data center energy use increasing 300% over the next 10 years.

Key details

  • FERC Chair Laura Swett warned PJM may be “too big to function,” noting that the 13 PJM states plus D.C. have divergent regulatory structures, and said FERC will convene a governance-reform conference in July 2026.
  • Fluence Energy signed master supply agreements with two unnamed “major” hyperscalers earlier than expected; Jefferies analyst Julien Dumoulin-Smith called the deals “significant progress” toward an emerging data center thesis.
  • An E3 analysis found medium- and heavy-duty fleet electrification in California and Georgia could modestly lower residential rates by 2035 if grid upgrades are proactively and carefully managed; CAISO reports its Extended Day-Ahead Market (EDAM) is performing within expected price ranges and CEO Elliot Mainzer said battery storage is now a major Western-grid player.

Brief

U.S. electricity consumption is expected to surge—NEMA forecasts a 55% increase in annual demand by 2050 and a 300% rise in data center energy use over the next decade—raising pressure on grid planning and market structures. FERC Chair Laura Swett signaled governance concerns at PJM, saying the 13 member states plus D.C. have fundamentally different regulatory models and announcing a July 2026 conference to explore reforms. In the market, Fluence Energy struck master supply agreements with two major hyperscalers ahead of schedule, a development Jefferies’ Julien Dumoulin-Smith described as material to Fluence’s data center thesis. Policy and operations pieces highlight tradeoffs: an E3 study shows medium/heavy-duty fleet electrification in California and Georgia could modestly reduce residential rates by 2035 if grid investments are well-timed, while CAISO reports EDAM prices and transfer volumes are within expectations and credits battery storage as a growing Western-grid resource.

By UD: Load Management
datacenterdynamics.com 2026-05-11 1 min read

Broadcast: Turning constraint into advantage with energy resilience for AI infrastructure

Why it matters

AI workloads are making power the defining constraint on data center growth; Data Centre Dynamics' Rehlko session was published 11 May 2026 and frames energy resilience as central to future capacity planning.

Key details

  • Speakers recommend integrated energy solutions to reduce reliance on constrained grids and maximize fuel flexibility — including natural gas, HVO, hydrogen, RNG and battery energy storage systems (BESS).
  • The session stresses trade-offs operators must manage (uptime, efficiency, load curtailment and emissions) and shares strategic lessons from power‑system design through site operations to build adaptive, continuous‑power AI environments.

Brief

Rethinking energy resilience for AI infrastructure (DCD Rehlko session, published 11 May 2026) argues power is now the limiting factor for data center growth. It advocates integrated backup and continuous‑power architectures, fuel flexibility across natural gas, HVO, hydrogen, RNG and BESS, and explicit trade‑offs between uptime, efficiency, curtailment and emissions in site design and operations.

By DCD Mission Critical Power
substack.com 2026-05-13 20 min read

How ASML took over the world

Why it matters

ASML evolved from a 1984 Philips spin‑out (Advanced Semiconductor Materials Lithography) to the sole commercial supplier of extreme ultraviolet (EUV) lithography by 2018, after commercializing its PAS 5500 (1991) and then winning the EUV race with partners TSMC, Intel and Samsung.

Key details

  • EUV machines use 13.5‑nanometer light generated by vaporizing tin droplets with a two‑pulse laser (pre‑pulse to shape the droplet, main pulse to create plasma); mirrors (no lenses) must be polished to near‑perfect tolerance and the machines require >100,000 components and large logistics (40 containers, 3 cargo planes, 20 trucks).
  • ASML invested heavily in R&D (>$1 billion/year by 2015) and benefited from public‑private programs: the 1997 Extreme Ultraviolet LLC injected >$270 million over six years and ASML acquired Silicon Valley Group in 2001, positioning it as the lead implementer of the Engineering Test Stand prototype.
  • Strategic alliances and financing included selling 23% of ASML in 2012 to its three largest customers (Intel, TSMC, Samsung) and a $2.5 billion acquisition of light‑source maker Cymer; TSMC/ASML achieved throughput of ~500 wafers/day for a month to validate production EUV.

Brief

ASML, the Dutch firm spun out of Philips in 1984, became the decisive winner in modern photolithography by combining modular manufacturing, deep customer partnerships, and sustained R&D to commercialize extreme ultraviolet (EUV) lithography. EUV uses 13.5‑nm light produced by vaporizing tiny tin droplets with a dual‑pulse laser; because materials absorb EUV, the system images via ultra‑precise reflective optics (no lenses) and shrinks mask patterns about fourfold onto 300‑mm wafers. The company’s technical path included immersion lithography with 193‑nm light and the TWINSCAN dual‑stage architecture (which eliminated idle measurement time), followed by decades of work on mirrors, light sources and debris control that only reached production viability around 2018. ASML’s rise depended on coordinated public‑private programs (the 1997 Extreme Ultraviolet LLC with >$270M invested), acquisitions (Silicon Valley Group in 2001; Cymer for ~$2.5B), and a 2012 co‑investment in which Intel, TSMC and Samsung bought a 23% stake to sustain EUV commercialization. Collaboration with TSMC helped achieve the ~500 wafers/day throughput target; today EUV tools are priced above $120M, ASML’s market cap exceeds ~$400B, and its global supply chain (>5,000 suppliers, heavy European sourcing) supports its effective EUV monopoly while insulating it from single‑country risks.

By Works in Progress
substack.com 2026-05-13 41 min read

Austin Energy Enters the Next Phase of Decarbonization

Why it matters

Austin Energy reached 65% carbon-free generation in 2024 and is targeting 100% carbon-free supply of customer load by 2035 under its Resource, Generation and Climate Protection Plan to 2035.

Key details

  • The utility is a vertically integrated non‑opt‑in municipal in ERCOT, serving ~1 million people (580,000 customer accounts) and ranking among the largest U.S. public utilities (8th largest publicly owned).
  • Austin Energy is planning for roughly 500 MW of plausible new local load over the next few years on a roughly 3,000 MW peaking system; management says most of that growth is not data centers but diverse electrification and infrastructure projects.
  • To address local reliability and 'load‑zone price separation' after retiring 725 MW at Decker Creek (≈300 MW retired 2020, ≈425 MW retired 2022), Austin Energy is funding transmission import upgrades (~$100M/year in its CIP for five years) and adding local resources including a proposed gas peaker for reliability and black‑start capability.

Brief

Austin Energy, the municipally owned utility serving about 1 million people in the Austin area (≈580,000 accounts), is moving from early clean‑energy leadership into a new phase of operational planning to hit 100% carbon‑free generation as a share of load by 2035. The utility reported 65% carbon‑free supply in 2024 and is implementing its Resource, Generation and Climate Protection Plan to 2035. Stuart Reilly (General Manager) and Lisa Martin (COO) describe a hybrid portfolio approach: continued aggressive clean procurement where it hedges costs, combined with local dispatchable capacity, utility‑scale batteries, distributed batteries, transmission import upgrades and demand response to manage local reliability and market exposure inside ERCOT.

Key technical and market details: Austin retired ~725 MW of local gas capacity at Decker Creek (≈300 MW in 2020 and ≈425 MW in 2022), which contributed to the emergence of load‑zone price separation that undermines the hedging value of distant PPAs. Management gives the example of PPAs at roughly $50/MWh becoming ineffective when local nodal or zone prices spike (e.g., $850/MWh during critical ramps). To shrink that local premium the utility is investing in transmission (about $100 million per year in its capital improvement plan for five years) and adding local resources. Recent and in‑flight procurement includes a contracted 100 MW / 4‑hour battery with Jupiter Power, a ~40 MW distributed battery deal with Base Power (paired with a customer program offering a $500 upfront rebate plus ~$300/year performance payments), and authorization to negotiate an additional 100 MW / 2‑hour battery. Austin Energy estimates ~500 MW of credible near‑term new load on a ~3,000 MW peak system (mostly beneficial electrification and infrastructure rather than primarily data centers). Reilly and Martin emphasize the muni model’s community engagement—returning over $100 million to customers after Winter Storm Uri—and the need to communicate trade‑offs among reliability, affordability and environmental goals while using local dispatchable resources (including a proposed gas peaker for black‑start and long‑duration reliability) as a bridge to deeper decarbonization.

By Texas Energy and Power Newsletter
datacenterdynamics.com 2026-05-13 1 min read

eBook: Redefining the data center for AI

Why it matters

Schneider Electric Innovation Summit estimate cited in the eBook: 88% of companies are pursuing some form of AI use.

Key details

  • DCD Edge Infra & Inference Channel eBook 'Redefining the data center for AI' (published 13 May 2026 on datacenterdynamics.com) consolidates Schneider Electric and partner guidance for AI-ready infrastructure.
  • The eBook provides practical strategies for high-density AI workloads, focusing on balancing power, cooling, efficiency and scalability and outlining technologies and architectures for next‑generation data centers.

Brief

The eBook 'Redefining the data center for AI' (DCD Edge Infra & Inference Channel, published 13 May 2026 on datacenterdynamics.com) compiles insights from Schneider Electric and partners, citing an Innovation Summit estimate that 88% of companies pursue AI, and provides practical guidance on designing AI-ready data centres—power, cooling, density, efficiency and scalable architectures.

By DCD Edge Infra & Inference Channel
substack.com 2026-05-13 41 min read

Cerebras — Faster Tokens Please

Why it matters

Cerebras’ WSE-3 wafer-scale chip (TSMC N5) packs 44 GB of on-wafer SRAM, ~21 PB/s aggregate SRAM bandwidth, and advertises 125 PFLOPS FP16 sparse (company sparse spec; dense is ~15.6 PFLOPS using an 8:1 sparsity assumption).

Key details

  • CS-3 system power and cooling: a single WSE-3 engine block consumes ~25 kW; CS-3 servers use 12×3.3 kW PSUs, 84 Vicor power bricks, custom liquid cold-plate “engine block,” and require ~100 LPM flow (~4 LPM/kW) vs ~1.5 LPM/kW for NVL72 reference designs
  • Off-wafer I/O is limited: WSE-3 exposes ~1.2 Tb/s (≈150 GB/s) via 12×100 GbE FPGA NICs; that low external bandwidth forces pipeline-parallel model sharding and constrains ability to host very large models or large KV caches
  • SRAM capacity scaling has slowed: WSE gen progression shows 18 GB (WSE-1, 16nm) → 40 GB (WSE-2, 7nm) → 44 GB (WSE-3, 5nm), highlighting SRAM scaling limits beyond N5 and motivating wafer-level hybrid-bonding exploration (DRAM or photonic wafers)

Brief

Cerebras’ wafer-scale strategy has become strategically timely because model providers and users are explicitly buying "fast tokens." The company’s WSE-3 (TSMC N5) places roughly half of wafer area into very fast SRAM (44 GB on-wafer) and delivers ~21 PB/s of on-chip memory bandwidth, enabling decode-style kernels with low arithmetic intensity to run at very high realized FLOPs. In SemiAnalysis roofline comparisons the wafer can, in theory, realize orders of magnitude more token interactivity than HBM GPUs for memory-bound decode workloads; OpenAI reported Spark variants achieving up to ~2,000 tokens/sec/user on Cerebras hardware. However, Cerebras’s sparse FLOP marketing (125 PFLOPS FP16 sparse) masks dense throughput (~15.6 PFLOPS assuming an 8:1 sparsity factor), and the wafer’s real constraints are SRAM capacity per wafer and off-wafer I/O.

Those constraints drive system and business trade-offs. Each CS-3 engine block pulls ~25 kW and requires bespoke power delivery (12×3.3 kW supplies, 84 Vicor bricks) and custom liquid cooling; facility plumbing must support ~100 LPM per server (~4 LPM/kW) versus ~1.5 LPM/kW for typical GB300 racks. Off-wafer bandwidth is only ~1.2 Tb/s (≈150 GB/s) via 12×100 GbE FPGA NICs, which means large models and large KV caches cannot be streamed efficiently, so pipeline parallelism (layer-wise sharding) becomes the practical scaling strategy, at the cost of pipeline bubbles and KV storage that multiplies with each in-flight microbatch. SemiAnalysis’s telemetry (≈432k requests, ~80B tokens) indicates a P50 request of ~96.3k tokens and ~50% of sessions above 128k, underscoring why memory capacity and I/O matter. Commercially, OpenAI’s December 2025 MRA (750 MW committed; option to 2 GW), a $1B loan, and a ~33.45M-share warrant package (S-1 fair value $82.02/share) tie Cerebras’ near-term fate to a single large customer and create a path to rapid scale, but execution hinges on wafer supply ramp, datacenter capacity (750 MW by 2028), and architectural fixes (hybrid-bonded DRAM/photonic wafers) to address SRAM scaling and the wafer's “island” I/O geometry limitations.
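
The headline figures in this item can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted above; the variable names are illustrative, and the final ratio is a derived comparison rather than a figure from the article.

```python
# Figures quoted in this item for WSE-3 / CS-3.
SPARSE_PFLOPS = 125.0    # advertised FP16 sparse throughput
SPARSITY_FACTOR = 8.0    # 8:1 sparsity assumption
NIC_COUNT = 12           # 100 GbE FPGA NICs
NIC_GBPS = 100.0         # gigabits per second per NIC
ENGINE_KW = 25.0         # approximate engine-block power
COOLANT_LPM = 100.0      # approximate coolant flow per server
SRAM_BYTES_PER_S = 21e15 # ~21 PB/s aggregate on-wafer SRAM bandwidth

# Dense throughput implied by the sparse spec.
dense_pflops = SPARSE_PFLOPS / SPARSITY_FACTOR

# Off-wafer I/O in terabits per second and gigabytes per second.
io_tbps = NIC_COUNT * NIC_GBPS / 1000.0
io_gbytes_per_s = NIC_COUNT * NIC_GBPS / 8.0

# Coolant flow normalized by power.
lpm_per_kw = COOLANT_LPM / ENGINE_KW

# Derived: how much faster on-wafer SRAM is than the external links.
sram_to_io_ratio = SRAM_BYTES_PER_S / (io_gbytes_per_s * 1e9)

print(dense_pflops, io_tbps, io_gbytes_per_s, lpm_per_kw, sram_to_io_ratio)
```

The last ratio (roughly five orders of magnitude) is one way to see why anything that does not fit in on-wafer SRAM becomes the bottleneck.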

By SemiAnalysis
Twitter/X 2026-05-13 2 min read

xAI's Colossus is grid-tied

Why it matters

xAI's Colossus is grid-tied: Colossus has 300 MW of TVA interconnect approved across two substations that xAI funded, and it holds contractual obligations to curtail during grid stress; the on-site turbines were supplemental to accelerate time-to-power, not to create a permanently islanded power system.

Key details

  • Operating captive generation at hyperscale is economically and operationally complex — fuel logistics, maintenance, N+1 redundancy, permitting, emissions compliance, continuous load/supply balancing, and multi-year equipment lead times — so most large AI loads will prefer grid ties; Google chief technologist Amin Vahdat confirmed this and Google bought a hybrid/island power developer.
  • Behind-the-meter (BTM) generation matters as a critical bridge while utility interconnect timelines lag demand, but the steady-state industry trajectory is toward hybrid architectures: grid interconnect + captive generation + storage + demand response, not fully off-grid hyperscale campuses.

Brief

Shanu Mathew (@ShanuMathew93) rejects the 'off-grid only' thesis for hyperscale AI, citing xAI's Colossus as grid‑tied with 300 MW TVA interconnect across two xAI‑funded substations and curtailment obligations. On-site turbines accelerated time‑to‑power; long‑term economics and operational complexity favor hybrid architectures combining grid interconnect, captive generation, storage, and demand‑response.

By @ShanuMathew93
Twitter/X 2026-05-13 1 min read

Robin Li framed Baidu as a vertical AI company — “Baidu chip, Baidu cloud, Baidu…

Why it matters

Robin Li framed Baidu as a vertical AI company — “Baidu chip, Baidu cloud, Baidu model, Baidu app” — with every layer built inside China.

Key details

  • Baidu’s Kunlun chips are in production with 30,000+ cards; Kunlunxin IPO is described as incoming; ERNIE 5.1 reportedly cut pre‑training cost by 94%.
  • Baidu highlighted agents (DuMate, Miaoda, Huiboxing), promoted “disposable software” for single‑use tasks, forecasted 10 billion daily active agents (3× Meta’s DAU), and noted OpenClaw adoption in China is already double the US per an American cybersecurity firm.

Brief

Shruti Mishra, reporting from Baidu Create in Beijing, says Robin Li presented Baidu as a full vertical AI stack (chip, cloud, model, app), all built in China. Announcements included 30,000+ Kunlun cards in production, a Kunlunxin IPO, ERNIE 5.1 with a 94% pre‑training cost cut, multiple agents (DuMate, Miaoda, Huiboxing), a forecast of 10 billion daily active agents, and rapid OpenClaw adoption.

By @heyshrutimishra
WORTH READING

Deeper context and second-pass items.

39 items
ArXiv 2026-05-12 1 min read

Routers Learn the Geometry of Their Experts: Geometric Coupling in Sparse Mixture-of-Experts

Why it matters

The authors prove a geometric coupling: for a routed token, router weights for the selected expert and that expert's weights receive gradients along the same input direction (differing only by scalar coefficients), so matched router–expert directions accumulate the same routed-token history.

Key details

  • Empirically in a 1B-parameter SMoE trained from scratch, higher router scores predict stronger activations inside the selected expert; adding auxiliary load-balancing losses breaks the coupling by spreading input-directed gradients across router weights, making distinct router directions nearly three times more similar.
  • They propose a parameter-free online K-Means router where each expert keeps a running average of its routed hidden states and tokens are assigned by cosine similarity; this router attains the lowest load imbalance versus auxiliary-loss and loss-free balancing with only a modest perplexity increase.

Brief

Based on the abstract (full text not consulted), the paper analyzes how routing decisions in Sparse Mixture-of-Experts form a geometric coupling between routers and experts: gradients for a routed token point along the same input direction in router and expert weights. The authors validate this in a 1B-parameter SMoE, show that auxiliary load-balancing disrupts the coupling (making router directions ≈3× more similar), and introduce a parameter-free online K-Means router (running-average centroids + cosine assignment) that minimizes load imbalance with only modest perplexity cost, suggesting geometric coupling underlies effective specialization. Published 2026-05-12 by Ahrac, Hochwald, and Geva (arXiv:2605.12476v1).
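
The routing rule described above (running-average centroids plus cosine assignment) can be sketched in a few lines. This is a toy illustration, not the authors' implementation: top-1 routing, the seed centroids, and the class name `OnlineKMeansRouter` are assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


class OnlineKMeansRouter:
    """Toy top-1 router with one running-mean centroid per expert."""

    def __init__(self, initial_centroids):
        # Seed centroids are only used until an expert receives its first token.
        self.centroids = [list(c) for c in initial_centroids]
        self.counts = [0] * len(self.centroids)

    def route(self, hidden):
        # Assign the token to the expert whose centroid is most similar.
        scores = [cosine(hidden, c) for c in self.centroids]
        expert = max(range(len(scores)), key=scores.__getitem__)
        # Update that expert's centroid as the running mean of routed states.
        self.counts[expert] += 1
        n = self.counts[expert]
        self.centroids[expert] = [
            c + (h - c) / n for c, h in zip(self.centroids[expert], hidden)
        ]
        return expert
```

Note that the router has no learned weights: its only state is the centroids and counts, which is what "parameter-free" refers to in this sketch.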

Authors: Sagi Ahrac, Noya Hochwald, Mor Geva
Twitter/X 2026-05-13 2 min read

Oregon passed a law last year requiring data centers to pay their fair share of…

Why it matters

Oregon passed a law last year requiring data centers to pay their fair share of utility costs; the Oregon Public Utilities Commission (PUC) issued a new order detailing implementation.

Key details

  • The PUC will allocate generation and transmission assets needed for growth to the customer classes that cause the growth, making large loads (data centers) collectively responsible for new investments; large-load customers may procure their own resources in PUC-reviewed special contracts but those resources must be non-emitting and proven not to be better handled via rate base or PPAs.
  • Large loads must enter a queue and wait until sufficient clean energy exists to cover their demand; special contracts can help prioritize customer-developed projects. PUC staff favored a separate tariff for flexible large data-center loads but said the commission lacks staff capacity to design it.

Brief

Oregon's Public Utilities Commission issued an order implementing last year’s law that forces data centers to pay their share of utility interconnection and growth costs. The order assigns growth-related generation/transmission costs to the customer classes that cause them, allows PUC-reviewed non‑emitting special contracts, requires queuing for available clean energy, and notes staff support—but limited capacity—for a flexible large-load tariff; the author says this shows what a traditionally regulated state can do and praises the 'utility of utilities.'

By @fredstaffordcs
ArXiv 2026-05-12 1 min read

Solve the Loop: Attractor Models for Language and Reasoning

Why it matters

Attractor Models use a two-stage design: a backbone proposes output embeddings and an attractor module refines them to a fixed point via implicit differentiation, keeping training memory constant and allowing adaptive iteration counts.

Key details

  • In large-scale language-model pretraining, Attractor Models improve perplexity by up to 46.6% and downstream accuracy by up to 19.7%; a 770M Attractor Model outperformed a 1.3B Transformer trained on twice as many tokens (paper published 2026-05-12).
  • On reasoning benchmarks, a 27M-parameter Attractor Model trained with ~1,000 examples achieved 91.4% on Sudoku-Extreme and 93.1% on Maze-Hard; the authors report frontier models (Claude, GPT o3) fail completely and observe 'equilibrium internalization' that lets the solver be removed at inference with little degradation.

Brief

Attractor Models introduce a two-stage iterative-refinement architecture in which a backbone proposes embeddings and an attractor module solves for a fixed point, with gradients obtained via implicit differentiation; this keeps training memory constant and allows adaptive iteration depth. Empirically, they report up to 46.6% perplexity reduction and 19.7% downstream accuracy gains in pretraining (e.g., a 770M model beating a 1.3B Transformer trained on twice the tokens) and strong reasoning results from little training data (27M params, ~1,000 examples: 91.4% Sudoku-Extreme, 93.1% Maze-Hard). The paper also describes 'equilibrium internalization,' where fixed-point training moves initial outputs near equilibrium so the solver can be removed at inference with little loss. This summary is based on the abstract only; the full text was not available.
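
A scalar toy example may help illustrate the two ingredients named above: solving for a fixed point with an adaptive number of iterations, and differentiating through it using only the converged point. The map `attractor` and its coefficient are hypothetical stand-ins chosen so the answer is easy to verify by hand; nothing here reflects the paper's actual architecture.

```python
def solve_fixed_point(f, z0, tol=1e-10, max_iters=1000):
    """Iterate z <- f(z) until successive iterates differ by less than tol."""
    z = z0
    for step in range(1, max_iters + 1):
        z_next = f(z)
        if abs(z_next - z) < tol:
            return z_next, step
        z = z_next
    return z, max_iters


# Hypothetical scalar map, contractive in z, so a unique fixed point exists.
A = 0.5


def attractor(z, x):
    return A * z + x


x = 3.0
z_star, steps = solve_fixed_point(lambda z: attractor(z, x), z0=0.0)

# Implicit function theorem at z* = f(z*, x):
# dz*/dx = (df/dx) / (1 - df/dz), evaluated only at the converged point,
# so no intermediate iterates need to be stored.
df_dz, df_dx = A, 1.0
dz_dx = df_dx / (1.0 - df_dz)

# A starting guess already near equilibrium converges in fewer steps.
_, steps_warm = solve_fixed_point(lambda z: attractor(z, x), z0=5.999)
```

Here the fixed point is z* = x / (1 - A), so the implicit gradient can be checked analytically; the warm-start comparison mirrors, in miniature, the idea that a good initial proposal reduces how much solving is needed.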

Authors: Jacob Fein-Ashley, Paria Rashidinejad
ArXiv 2026-05-12 1 min read

KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference

Why it matters

KV-Fold is a training-free, one-step KV-cache recurrence that treats the transformer's KV cache as the accumulator in a left fold: at each chunk the model attends to the accumulated cache, appends new keys/values, and forwards the enlarged cache without modifying or retraining the model.

Key details

  • The induced recurrence is stable: per-step drift rises briefly then saturates to a flat plateau that is robust across chunk sizes and model families and is insensitive to a 10,000× change in numerical precision.
  • On Llama-3.1-8B, KV-Fold achieves 100% exact-match retrieval across 152 trials covering contexts from 16K to 128K tokens and chain depths up to 511, operating within the memory limits of a single 40GB GPU and maintaining long-range fidelity versus streaming methods.

Brief

KV-Fold introduces a training-free long-context inference protocol that treats the transformer's KV cache as an accumulator in a left fold, appending newly produced keys/values per chunk and forwarding the enlarged cache. The method yields a stable, precision-robust recurrence (plateau insensitive to 10,000× precision changes) and achieves 100% exact-match retrieval on Llama-3.1-8B across 152 trials (16K–128K tokens, depth ≤511). Based on the abstract (full text not provided).
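
The "left fold" framing maps directly onto `functools.reduce`. The sketch below shows only the control flow: `toy_forward` is a hypothetical stand-in for a real model forward pass, and the cache is a plain list of placeholder pairs rather than key/value tensors.

```python
from functools import reduce


def chunked(tokens, size):
    """Split a token list into contiguous chunks of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def toy_forward(cache, chunk):
    """Stand-in for a model forward pass (hypothetical, no real attention).

    A real model would attend over `cache` while processing `chunk` and emit
    key/value tensors. Here each token yields a placeholder (key, value) pair
    whose value records how much cached context was visible to its chunk.
    """
    visible = len(cache)
    return [(tok, visible) for tok in chunk]


def fold_step(cache, chunk):
    # Attend to the accumulated cache, then append the new entries.
    return cache + toy_forward(cache, chunk)


def kv_fold(tokens, chunk_size):
    """Left fold over chunks with the KV cache as the accumulator."""
    return reduce(fold_step, chunked(tokens, chunk_size), [])
```

For example, `kv_fold(list("abcdefg"), 3)` processes three chunks, and each chunk sees a cache of 0, 3, and 6 earlier entries respectively.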

Authors: Alireza Nadali, Patrick Cooper, Ashutosh Trivedi...
ArXiv 2026-05-12 1 min read

ORCE: Order-Aware Alignment of Verbalized Confidence in Large Language Models

Why it matters

ORCE (Chen Li, Xiaoling Hu, Songzhu Zheng, Jiawei Zhou, Chao Chen; arXiv 2026-05-12) proposes a decoupled, order-aware verbalized-confidence framework that first generates an answer then conditions confidence estimation on the fixed question–answer pair.

Key details

  • The method builds a sampling-based surrogate from multiple model completions and optimizes rank-based reinforcement-learning objectives to align confidence ordering; experiments on reasoning and knowledge-intensive benchmarks report improved calibration and failure prediction while largely preserving answer accuracy (paper: 18 pages, 2 figures).

Brief

ORCE introduces a decoupled, order-aware approach to verbalized confidence: answers are generated first and confidence is estimated conditioned on the fixed question–answer pair. Using a sampling-based surrogate of multiple completions and rank-based RL to encourage higher confidence for more likely-correct responses, the method improves calibration and failure prediction without substantially hurting answer accuracy, addressing interference from prior joint optimization techniques.

Authors: Chen Li, Xiaoling Hu, Songzhu Zheng...
ArXiv 2026-05-12 1 min read

Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs

Why it matters

Proposes 'Multi-Stream LLMs' (Guinan Su, Yanwu Yang, Xueyan Li, Jonas Geiping): instruction-tune models to operate on multiple parallel streams (separate streams for thoughts, inputs, outputs) so every forward pass simultaneously reads from multiple input streams and generates tokens in multiple output streams.

Key details

  • Claims practical benefits: unblocks agents so they can act while reading/thinking/writing, improves efficiency via parallelization, and enhances security and monitorability through separation of concerns; arXiv preprint 2605.12460v1 (2026-05-12), 37 pages, code at https://github.com/seal-rg/streaming/

Brief

Multi-Stream LLMs (Su et al., 2026) argue that single-stream chat/instruction formats (e.g., instruction-tuned ChatGPT-style models) create a bottleneck preventing simultaneous reading, thinking and acting. They propose instruction-tuning for parallel computation streams, splitting roles into separate input/output channels so each forward pass concurrently consumes and emits tokens across streams. The preprint (37 pages) claims this design enables concurrent action, boosts efficiency via parallelization, and improves security and monitorability; code is provided on GitHub.

Authors: Guinan Su, Yanwu Yang, Xueyan Li...
Twitter/X 2026-05-13 1 min read

Nous Research announced Token Superposition Training (TST) on 2026-05-13…

Why it matters

Nous Research announced Token Superposition Training (TST) on 2026-05-13, claiming a 2–3× wall-clock speedup at matched FLOPs without changing the model architecture, optimizer, tokenizer, or training data.

Key details

  • TST modifies the pretraining loop: during the first third of training the model reads and predicts contiguous 'bags' of tokens (input embeddings are averaged; outputs use a modified cross-entropy to predict the next bag), then the run continues with normal next-token training; the inference-time model is identical to a conventionally pretrained model.
  • TST was validated at 270M, 600M, and 3B dense scales and at a 10B-A1B Mixture-of-Experts (MoE); the work was led by @bloc97_, @gigant_theo, and @theemozilla — @DavidOndrej1 highlighted this release alongside frequent Hermes updates.

Brief

Nous Research released Token Superposition Training (TST) on 2026-05-13, claiming a 2–3× wall-clock speedup at matched FLOPs without changing architecture, optimizer, tokenizer, or data. TST trains on averaged contiguous token 'bags' for the first third of training with a modified cross-entropy, then switches to standard next-token training; validated at 270M, 600M, 3B dense and 10B-A1B MoE.
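
The data-side part of the recipe (averaging contiguous bags of input embeddings, targeting the next bag, and switching back after the first third of training) can be sketched as follows. This is an illustration under assumptions: dropping a trailing partial bag and the exact phase boundary are choices made here, and the modified cross-entropy is not reproduced because the announcement summary does not specify it.

```python
def bag_inputs(embeddings, bag_size):
    """Average each contiguous bag of token embeddings into one vector."""
    bags = []
    # Assumption: a trailing partial bag is dropped.
    for start in range(0, len(embeddings) - bag_size + 1, bag_size):
        bag = embeddings[start:start + bag_size]
        dim = len(bag[0])
        bags.append([sum(vec[d] for vec in bag) / bag_size for d in range(dim)])
    return bags


def bag_targets(token_ids, bag_size):
    """For each bag except the last, return the token ids of the next bag."""
    usable = len(token_ids) - len(token_ids) % bag_size
    groups = [token_ids[i:i + bag_size] for i in range(0, usable, bag_size)]
    return groups[1:]


def in_bag_phase(step, total_steps):
    """True during the first third of training steps (0-indexed)."""
    return 3 * step < total_steps
```

Because only the training loop changes, a model trained this way would be served exactly like a conventionally pretrained one, which matches the claim that the inference-time model is identical.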

By @DavidOndrej1
ArXiv 2026-05-12 1 min read

Self-Supervised Laplace Approximation for Bayesian Uncertainty Quantification

Why it matters

Self-Supervised Laplace Approximation (SSLA) approximates the posterior predictive directly by refitting the model on its own self-predicted data, producing a deterministic, sampling-free posterior-predictive approximation and supporting prior-sensitivity analysis via a modular prior interface.

Key details

  • The paper introduces an approximate, cheaper variant (ASSLA) to avoid expensive refitting; the authors prove properties of (A)SSLA and evaluate them on regression problems from Bayesian linear models to Bayesian neural networks, reporting improved predictive calibration over classical Laplace approximations while remaining computationally efficient.
  • Work by Rodemann, Marquard, Augustin, and Caprio (arXiv:2605.12208v1) was posted 2026-05-12 and accepted to TMLR.

Brief

The paper proposes Self-Supervised Laplace Approximation (SSLA), which sidesteps parameter posteriors and directly approximates the posterior predictive by refitting on model self-predictions; a faster approximate variant (ASSLA) avoids costly refits. The authors provide theoretical analysis and experiments on Bayesian linear models through Bayesian neural networks, showing better predictive calibration than classical Laplace methods while keeping computation efficient.

Authors: Julian Rodemann, Alexander Marquard, Thomas Augustin...
substack.com 2026-05-13 2 min read

Foxconn is Buoyed By Nvidia

Why it matters

Foxconn reported annual revenues of $261 billion and employs over 800,000 people, making it one of the world’s largest companies and the largest from Taiwan.

Key details

  • Foxconn produced as many as 5.8 billion individual items in 2021 and holds an estimated 30–40% global market share in contract electronics manufacturing.
  • Apple has been Foxconn’s most important customer for over a decade (driven by the iPhone); other customers named include Amazon, Huawei, Sony, and Dell.
  • Since 2023 Foxconn has been the main electronics manufacturing partner for Nvidia, assembling Nvidia’s chips into completed server products for AI data centers, providing a second wind amid broader strain on Foxconn’s core business model.

Brief

Foxconn, the world’s largest contract electronics manufacturer, reported $261 billion in annual revenue and employs more than 800,000 people; at scale it produced up to 5.8 billion items in 2021 and is estimated to control roughly 30–40% of the global contract-manufacturing market. Long dependent on Apple—its most important customer for over a decade because of the iPhone—Foxconn also builds devices for Amazon, Huawei, Sony, and Dell. Beginning in 2023 the company became Nvidia’s primary electronics partner, turning Nvidia’s chips into finished server products that populate AI data centers; that AI-hardware boom has given Foxconn a noticeable boost. Nevertheless, the article notes that Foxconn’s underlying contract-manufacturing business model continues to face structural strains despite the Nvidia-driven uplift.

By Bismarck Analysis
Twitter/X 2026-05-13 1 min read

On May 13, 2026 NatPurser argues opponents are as angry about transmission…

Why it matters

On May 13, 2026 NatPurser argues opponents are as angry about transmission buildouts — including buyouts, easements, and eminent domain — as they are about data centers themselves.

Key details

  • Transmission expansion is technically necessary to replace aging infrastructure, support electrification, and strengthen the grid, but NatPurser warns politics will not distinguish transmission serving data centers from transmission serving broader community needs.
  • A concrete grievance cited: The Tennessee Holler reports Georgia Power using eminent domain to seize homes to build power lines for a $17 billion data center; the thread also references forced relocations and projects like xAI in Memphis.

Brief

NatPurser warns backlash to data centers is driven as much by transmission buildout impacts—buyouts, easements and eminent domain—as by the facilities themselves. While transmission upgrades are needed to replace aging lines and support electrification, the politics won’t separate lines serving data centers from lines serving communities, so resistance to transmission projects (e.g., Georgia Power seizing homes for a $17B data center) will broaden.

By @NatPurser
Twitter/X 2026-05-12 1 min read

ESIG/Brattle paper (shared 2026-05-12) states…

Why it matters

ESIG/Brattle paper (shared 2026-05-12) states: “The rate impact of large loads is not a foregone conclusion in either direction” and depends on utility system conditions, prevailing market dynamics, market design, regulatory framework, and rate design.

Key details

  • Rates for existing customers can rise if a utility has not fully hedged added large loads — Mohan cites examples among PJM ratepayers.
  • Matthew Yglesias: adding a big new demand like a data center “absolutely COULD reduce consumer electricity prices” by spreading fixed costs, but whether that occurs is case-specific.

Brief

Aniruddh Mohan highlights a 2026-05-12 ESIG/Brattle paper (with a clarifying graphic) showing that large new electricity loads do not automatically raise or lower retail rates. Impacts depend on system-specific factors; Mohan warns unhedged utilities can pass higher costs to customers (notably in PJM), while Matthew Yglesias notes reduced prices are possible but not guaranteed.

By @aniruddh_mohan
ArXiv 2026-05-12 1 min read

Elastic Attention Cores for Scalable Vision Transformers

Why it matters

VECA (Visual Elastic Core Attention) replaces all-to-all self-attention with a core–periphery design: N image patches communicate only via a learned, resolution-invariant set of C core embeddings, yielding linear inference complexity O(N) for fixed C and eliminating direct patch-to-patch interactions.

Key details

  • VECA uses cores propagated across layers and nested training along the core axis to enable elastic compute–accuracy trade-offs at inference; authors report competitive performance with recent vision foundation models on classification and dense tasks while reducing computational cost (paper: arXiv:2605.12491v1; code: github.com/alansong1322/VECA).

Brief

VECA (Visual Elastic Core Attention) tackles the quadratic cost of ViT self-attention by introducing a small set of learned core embeddings (C) that mediate communication for N patch tokens, producing linear O(N) complexity for fixed C. Cores are initialized and propagated across layers; nested core-axis training permits elastic compute–accuracy trade-offs. The authors claim competitive results on classification and dense tasks with lower compute (see arXiv and GitHub).
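
The core-mediated routing described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the two-step gather/read structure, the absence of learned projections, and all names (`cross_attention`, `core_mediated_layer`) are assumptions made for illustration. The point is only that every attention matrix has shape (C, N) or (N, C), so cost grows linearly in N for fixed C and no N×N patch-to-patch matrix is ever formed.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    # Plain scaled dot-product attention without learned projections
    # (an illustrative simplification, not taken from the paper).
    scale = 1.0 / np.sqrt(queries.shape[-1])
    weights = softmax((queries @ keys.T) * scale, axis=-1)
    return weights @ values, weights

def core_mediated_layer(patches, cores):
    # Step 1: the C cores gather information from the N patches -> weights (C, N).
    new_cores, gather_w = cross_attention(cores, patches, patches)
    # Step 2: each patch reads back from the cores only -> weights (N, C).
    new_patches, read_w = cross_attention(patches, new_cores, new_cores)
    return new_patches, new_cores, gather_w, read_w

rng = np.random.default_rng(0)
N, C, d = 196, 8, 32  # hypothetical sizes: 14x14 patches, 8 cores, width 32
patches = rng.normal(size=(N, d))
cores = rng.normal(size=(C, d))
new_patches, new_cores, gather_w, read_w = core_mediated_layer(patches, cores)
print(gather_w.shape, read_w.shape)
```

Doubling N doubles the size of both weight matrices, whereas full self-attention would quadruple its N×N matrix.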

Authors: Alan Z. Song, Yinjie Chen, Mu Nan...
ArXiv 2026-05-12 1 min read

Reward Hacking in Rubric-Based Reinforcement Learning

Why it matters

In medical and science tasks, weak rubric-based verifiers produced large proxy-reward gains that did not transfer to a cross-family panel of three frontier reference judges; exploitation grew over training and concentrated in recurring failures such as partial satisfaction of compound criteria, treating implicit content as explicit, and imprecise topical matching.

Key details

  • Stronger verifiers substantially reduced but did not eliminate verifier exploitation: when rubrics left important failure modes unspecified, rubric-based verifiers preferred the RL-trained checkpoint while rubric-free judges preferred the base model, with gains concentrated in completeness and presence-based criteria but declines in factual correctness, conciseness, relevance, and overall quality.
  • Mahmoud et al. (arXiv 2026-05-12) introduce a verifier-free diagnostic called the self-internalization gap, based on policy log-probabilities, which tracks reference-verifier quality and detects when a policy trained with a weak verifier stops improving.

Brief

Reward hacking in rubric-based reinforcement learning: Mahmoud et al. (arXiv 2026-05-12) analyze divergence between training verifiers and a three-judge reference panel across medical and science domains. They separate verifier failure from rubric-design limitations, show weak verifiers yield nontransferable gains and recurring exploitation, and propose the self-internalization gap (policy log-probabilities). Stronger verification reduces but does not guarantee broader quality gains. (Summary based on abstract only.)

Authors: Anas Mahmoud, MohammadHossein Rezaei, Zihao Wang...
substack.com 2026-05-13 7 min read

The Inflation of 2021 Through Mid-2022 Is Back, Baby!: Chart of the Day

Why it matters

Brad DeLong (Grasping Reality, May 13, 2026) reports April‑over‑March CPI moves: core CPI (ex food & energy) rose at a 4.8% annualized rate, headline CPI at a 7.2% annualized rate, and housing/shelter also at a 7.2% annualized rate.

Key details

  • Energy is the principal trigger: energy prices rose 3.8% month‑over‑month and 17.9% year‑over‑year, with gasoline up 28.4% over the past 12 months — a classic relative‑price oil shock with spillovers to shipping, risk premia and supply chains.
  • DeLong contrasts this episode with 2021–mid‑2022 (broad, multi‑shock inflation driven by stimulus, supply bottlenecks, housing and the Ukraine shock) and calls the current rise narrower and geopolitically centered, but warns wage‑price feedbacks could re‑anchor higher inflation norms.
  • Policy prescription for the Fed: stay on hold longer, emphasize a 'higher‑for‑longer' communications stance (signal policy near current levels well into 2027), avoid panicked hikes in response to a relative‑price shock, but prepare a potential hiking path beginning late 2026 to prevent unanchoring.

Brief

Brad DeLong argues that headline and core inflation readings in mid‑2026 replicate the early‑2021 to mid‑2022 inflation pattern, but with a narrower, geopolitically driven center. He highlights specific CPI moves: core CPI at a 4.8% annualized April‑over‑March rate, headline CPI and shelter/housing at 7.2% annualized, energy +3.8% month‑over‑month and +17.9% year‑over‑year, and gasoline +28.4% y/y. Unlike the broad, overlapping shocks of 2021–22 (fiscal stimulus, supply‑chain bottlenecks, housing dynamics, and the Ukraine shock), today’s episode is primarily an oil/energy relative‑price shock that can still feed into wage and margin behavior. DeLong counsels the Fed to refrain from immediate hikes, extend the horizon for 'higher for longer' guidance into 2027, prepare models for possible tightening beginning late 2026, and focus on anchoring expectations to avoid costly re‑anchoring if wage‑price feedbacks emerge.
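
For readers converting between the annualized and month-over-month figures quoted here, a small sketch follows. It assumes the compounding convention, (1 + m)^12 − 1; the summary does not say which convention DeLong uses, and a simple ×12 convention would give slightly different numbers. Function names are illustrative.

```python
def annualize(monthly_rate):
    # Compound a one-month change over twelve months.
    return (1.0 + monthly_rate) ** 12 - 1.0

def monthly_from_annualized(annual_rate):
    # Invert the compounding: the one-month change implied by an annualized rate.
    return (1.0 + annual_rate) ** (1.0 / 12.0) - 1.0

headline_monthly = monthly_from_annualized(0.072)  # headline CPI, 7.2% annualized
core_monthly = monthly_from_annualized(0.048)      # core CPI, 4.8% annualized
energy_annualized = annualize(0.038)               # energy, +3.8% month-over-month

print(f"headline m/m: {headline_monthly:.2%}")
print(f"core m/m: {core_monthly:.2%}")
print(f"energy annualized: {energy_annualized:.0%}")
```

Under this convention, a 7.2% annualized rate corresponds to a one-month rise of roughly 0.6%, and 4.8% annualized to roughly 0.4%. The 3.8% monthly energy move would compound to well over 50% if sustained for a year, which shows how far the energy component sits from the headline figure.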

By Brad DeLong, from Grasping Reality Newsletter
substack.com 2026-05-11 10 min read

🎙️ How I AI: Quests, token leaderboards, and the elite AI adoption playbook & Notion’s spec-driven development

Why it matters

Sendbird CEO John Kim described an internal marketplace called Automators where employees post 'quests' that show risk, weeks saved, and beneficiaries; contributors earn experience points redeemable for gift cards, exec time, or company presentations.

Key details

  • Non‑technical teams at Sendbird shipped a fully functional Stripe‑connected swag e‑commerce store in days by building on InfoSec‑vetted app templates that preconfigure auth, environments, databases, and security.
  • John Kim tracks token usage with tiers from Beginner (under 1M tokens/day) up to AI God (over 100M tokens/day); managers see team tiers and Kim monitors the smoothness of token‑usage curves to verify 24/7 AI continuity.
  • Sendbird created a cross‑functional AI task force and a role called AI Engineer for Internal Operations reporting to the CEO and chief of staff; they meet weekly to unblock tooling, vet vendors, and accelerate adoption; hiring now prioritizes curiosity, agency, and energy over years of tenure.

Brief

Lenny's May 11, 2026 newsletter highlights two practical playbooks for scaling AI inside companies. Sendbird CEO John Kim outlines an internal product called Automators, a gamified marketplace where anyone can post a quest (with risk, weeks‑saved, and beneficiaries) and engineers or AI agents claim the work. Contributors earn experience points they can redeem for gift cards or executive time, turning AI adoption into a measurable, crowd‑sourced product rather than a top‑down program. Kim stresses secure, InfoSec‑vetted app templates so marketers and CSMs can ship production features (his marketing team launched a Stripe‑backed swag store in days). He also tracks org token consumption with tiers from <1M tokens/day to >100M tokens/day, watches the smoothness of token‑usage curves to ensure AI coverage around the clock, and has formed a weekly cross‑functional AI task force plus an AI Engineer for Internal Operations role to remove barriers; hiring criteria now emphasize curiosity, agency, and energy. On the engineering side, Notion’s Ryan Nystrom advocates spec‑first development for agentic coding: maintain human‑readable Markdown specs as the source of truth, 'yap your spec' with Whisper transcripts, and point coding agents (Codex/Boxy) at specs to produce PRs with UI previews in ~20 minutes. He argues CI speed is the bottleneck on agent velocity: cutting CI time (target: 25% of current) multiplies iteration throughput, and good developer experience (CLI, docs, fast CI) benefits both humans and agents. Both guests converge on three points: enable non‑engineers safely, measure and celebrate AI fluency, and invest in developer workflows to amplify agent productivity.

By Lenny's Newsletter
substack.com 2026-05-12 4 min read

Qwen3.6 27B Quantization: FP8 vs INT4 vs NVFP4

Why it matters

The author evaluated five Qwen3.6 27B quantized variants (published 2026-05-12) — Intel AutoRound INT4 (group=128, symmetric), Qwen FP8 (block=128), hampsonw AWQ INT4 (group=32, MTP restored as BF16), Peutlefaire NVFP4 (llm-compressor targeting all Linears), and kaitchup AutoRound NVFP4 (NVFP4 with local dynamic 4-bit activations; linear-attn kept in 16-bit).

Key details

  • Benchmark accuracy and pass@k show the full NVFP4 that quantizes linear-attention is the clear loser (consistently and significantly underperforms); even the NVFP4 variant that preserves linear-attn in 16-bit trails other recipes.
  • Intel’s INT4 AutoRound recipe is notably strong despite quantizing most linear-attention modules — accuracy holds up when in_proj_a and in_proj_b are left at higher precision, implying selective 16-bit retention in linear-attn is important.
  • Quantized models tend to generate more tokens, and higher token counts correlate negatively with accuracy; experiments measured accuracy, latency, memory usage, and MTP efficiency on Verda GPUs (B200 and RTX Pro 6000).

Brief

The article tests Qwen3.6 27B under five quantization recipes (FP8 block128; AutoRound INT4 group128; AWQ INT4 group32 with MTP→BF16 and linear-attn ignored; NVFP4 via llm-compressor; and AutoRound NVFP4 with 4-bit local activations but linear-attn in 16-bit). Methodology measured accuracy across benchmarks (including coding pass@k/LiveCodeBench), latency, memory, and MTP tensor handling; large Linear weights were the primary quantization targets while embeddings, output head, norms and small state tensors were kept at higher precision. Results: full NVFP4 that quantizes linear-attention performs substantially worse across tasks, NVFP4 that preserves linear-attn in BF16 still slightly trails, and Intel’s INT4 recipe is surprisingly robust when key linear-attention parts (in_proj_a/in_proj_b) are kept at higher precision. The author also observed that quantized variants produce more tokens and that higher token counts often coincide with lower accuracy; experiments ran on Verda B200 and RTX Pro 6000 GPUs.
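
To make the "group=128, symmetric" INT4 terminology concrete, here is a minimal round-to-nearest sketch in NumPy. It is not AutoRound or AWQ, which tune rounding and scales rather than rounding naively, and it is not the article's code. The function names and the max-abs/7 scale choice are assumptions for illustration.

```python
import numpy as np

def quantize_int4_symmetric(weights, group_size=128):
    # Split the flattened weights into groups that each share one scale.
    w = weights.reshape(-1, group_size)
    max_abs = np.abs(w).max(axis=1, keepdims=True)
    # Symmetric scheme: zero maps to zero, no zero-point offset.
    scale = np.where(max_abs > 0, max_abs / 7.0, 1.0)
    # Signed 4-bit range is [-8, 7]; stored in int8 here for simplicity.
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale, shape):
    # Recover approximate weights from integer codes and per-group scales.
    return (q.astype(np.float32) * scale).reshape(shape)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 256)).astype(np.float32)  # hypothetical Linear weight
q, scale = quantize_int4_symmetric(W, group_size=128)
W_hat = dequantize(q, scale, W.shape)
max_err = float(np.abs(W - W_hat).max())
print(q.shape, scale.shape, max_err)
```

A smaller group (for example 32 instead of 128) gives each scale fewer outliers to cover, at the cost of storing more scales.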

By The Kaitchup
Twitter/X 2026-05-13 1 min read

Author @alex_prompter (posted 2026-05-13) asserts 'The AI is 5% of the work' and…

Why it matters

Author @alex_prompter (posted 2026-05-13) asserts 'The AI is 5% of the work' and enumerates the 95%: observability (Langfuse, Braintrust, Helicone), evals, durable runtime (Temporal, Inngest), guardrails, memory layer (Pinecone, pgvector, Turbopuffer), tools (MCP servers, E2B, Modal, Browserbase), auth/multi-tenancy, cost controls, human-in-the-loop, prompt versioning, orchestration, and model routing (LiteLLM, Portkey, OpenRouter).

Key details

  • A CTO with 10+ years of experience is quoted: 'Observability + evals + durable runtime + guardrails is the minimum viable production stack'; skipping those four produces the 'works-in-demo → on-fire-in-prod' gap killing agent startups right now.
  • Operational recommendations: prefer minimal orchestration with explicit state machines, implement model routing with fallback/prompt caching/version pinning, enforce cost controls and human approval gates (e.g., thresholds for spend or external emails) to prevent runaway agents and cross-tenant data exposure.

Brief

The AI is 5% of the work. @alex_prompter argues that the remaining 95% — observability (Langfuse, Braintrust, Helicone), evals, durable runtimes (Temporal, Inngest), guardrails, memory (Pinecone, pgvector, Turbopuffer), tools (MCP servers, E2B, Modal, Browserbase), auth/multi-tenancy, cost controls, human‑in‑the‑loop, prompt versioning, orchestration, and model routing — is the real product. A CTO: observability + evals + durable runtime + guardrails = minimum viable production stack.
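
Two of the recommendations above, model routing with fallback and a spend-based approval gate, can be sketched in plain Python. Everything here is hypothetical: the `Router` class, the stub providers, and the cost figures are illustrations, not the API of LiteLLM, Portkey, OpenRouter, or any other tool named in the thread.

```python
from dataclasses import dataclass, field

class ProviderError(Exception):
    """Raised by a (stub) model provider when a call fails."""

@dataclass
class Router:
    providers: list          # ordered (name, callable) pairs; first is preferred
    budget_usd: float        # spend threshold that triggers the approval gate
    spent_usd: float = 0.0
    log: list = field(default_factory=list)

    def call(self, prompt, est_cost_usd):
        # Cost control: stop and ask a human instead of exceeding the budget.
        if self.spent_usd + est_cost_usd > self.budget_usd:
            self.log.append(("needs_approval", prompt))
            return None
        # Routing with fallback: try providers in order until one succeeds.
        for name, fn in self.providers:
            try:
                result = fn(prompt)
            except ProviderError:
                self.log.append(("fallback", name))
                continue
            self.spent_usd += est_cost_usd
            self.log.append(("ok", name))
            return result
        raise ProviderError("all providers failed")

def flaky(prompt):
    # Stub standing in for a primary model that is currently down.
    raise ProviderError("primary unavailable")

def stable(prompt):
    # Stub standing in for a backup model.
    return "echo: " + prompt

router = Router([("primary", flaky), ("backup", stable)], budget_usd=1.0)
first = router.call("hi", est_cost_usd=0.6)      # falls back to the backup
second = router.call("again", est_cost_usd=0.6)  # would exceed budget, gated
print(first, second, router.log)
```

The log doubles as a crude observability trail: each fallback and each gated request is recorded rather than silently swallowed.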

By @alex_prompter
Twitter/X 2026-05-13 1 min read

Larry Fink (BlackRock CEO) says AI is absolutely not in a bubble and that demand…

Why it matters

Larry Fink (BlackRock CEO) says AI is absolutely not in a bubble and that demand already exceeds available supply, creating investment opportunities for private institutions.

Key details

  • Fink names four binding constraints today: power, compute, chips, and memory, and reports token usage has gone “vertical” in the last few months in America.
  • He argues global diffusion will drive exponential demand growth while governments lack the money/appetite to fund it, so private institutional capital—e.g., BlackRock—must fill the gap.

Brief

Larry Fink contends AI is not a bubble but undersupplied: power, compute, chips, and memory are the main bottlenecks, and token usage has surged in the last few months in America. He predicts exponential global demand growth and says governments won’t fund needed spend, so private institutional capital (notably BlackRock) must step in.

By @BoringBiz_
Twitter/X 2026-05-13 2 min read

UK AI Security Institute (AISI) reported on 2026-05-13 that Claude Mythos Preview…

Why it matters

UK AI Security Institute (AISI) reported on 2026-05-13 that Claude Mythos Preview is the first model to solve both of their end-to-end cyber ranges, including the previously unsolved “Cooling Tower,” and it cleared every task estimated over 8 hours under AISI’s 2.5M-token cap.

Key details

  • XBOW’s offensive-security benchmark found Mythos Preview shows “token-for-token, unprecedented precision” and is the only model to succeed at subtle V8 sandbox tasks.
  • Anthropic’s Project Glasswing (led by Logan Graham) is sharing Mythos with defenders; partners say weeks of testing uncovered many thousands of estimated high+critical vulnerabilities (sometimes double their normal annual finds). Anthropic says they’re implementing safeguards and disclosure/patching processes and that compute was not a rollout limiter.

Brief

Anthropic’s Claude Mythos Preview (via Project Glasswing) passed two independent evaluations (UK AISI and XBOW) that reported breakthrough autonomous offensive-security capabilities: AISI says Mythos solved both end-to-end ranges including the never-before-cleared Cooling Tower and met a 2.5M-token cap; XBOW flagged unprecedented token-level precision. Anthropic is distributing Mythos to defenders while building safeguards and disclosure workflows.

By @bcherny
ArXiv 2026-05-12 1 min read

Pion: A Spectrum-Preserving Optimizer via Orthogonal Equivalence Transformation

Why it matters

Pion is a spectrum-preserving optimizer that updates each weight matrix via left and right orthogonal transformations, which preserve all singular values and therefore keep the matrix spectral norm fixed during training.

Key details

  • The authors (Kexuan Shi, Hanxuan Li, Zeju Qiu, Yandong Wen, Simon Buchholz, Weiyang Liu) derive Pion's update rule, analyze convergence and design choices, and report that Pion yields stable, competitive results on large language model pretraining and finetuning (technical report v1, 30 pages, 19 figures; arXiv:2605.12492v1, published 2026-05-12).

Brief

Pion introduces an orthogonal equivalence transformation optimizer that modulates weight-matrix geometry by applying left/right orthogonal updates while strictly preserving singular values and the spectral norm. The paper derives the Pion update rule, studies design options and convergence properties, and presents empirical evidence that Pion is a stable, competitive alternative to additive optimizers (e.g., Adam, Muon) for LLM pretraining and fine-tuning.
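
The invariance that Pion relies on can be checked numerically: multiplying a matrix on the left and right by orthogonal matrices leaves its singular values unchanged. The sketch below demonstrates only that property, using random orthogonal factors. How Pion actually chooses its orthogonal updates is not described in this summary and is not modeled here.

```python
import numpy as np

def random_orthogonal(n, rng):
    # The Q factor of a QR decomposition of a square matrix is orthogonal.
    q, _ = np.linalg.qr(rng.normal(size=(n, n)))
    return q

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 4))   # hypothetical weight matrix
U = random_orthogonal(6, rng)  # left orthogonal transformation
V = random_orthogonal(4, rng)  # right orthogonal transformation

W_new = U @ W @ V.T

sv_before = np.linalg.svd(W, compute_uv=False)
sv_after = np.linalg.svd(W_new, compute_uv=False)
print(np.allclose(sv_before, sv_after))
```

Because the spectral norm is the largest singular value, it is preserved as well, in contrast to an additive update W + ΔW, which generally changes the spectrum.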

Authors: Kexuan Shi, Hanxuan Li, Zeju Qiu...
Twitter/X 2026-05-13 1 min read

Copper breached $14,000/ton and is approaching record highs as rebounding Chinese…

Why it matters

Copper breached $14,000/ton and is approaching record highs as rebounding Chinese demand collides with tightening global supply (post dated 2026-05-13 by @mining).

Key details

  • Middle East disruptions have triggered a global sulfuric acid shortage — sulfuric acid is a key input for roughly 20% of copper production; acid prices have nearly doubled and Chinese inventories are falling, pushing up production costs in real time.
  • Sprott analysts say electrification and strategic uses (data centers, power systems) will rise from 32% to 45% of total copper demand over the next 15 years (by 2040), shifting demand away from cyclical construction.

Brief

Copper is nearing record highs after prices breached $14,000/t as rebounding Chinese demand meets tightening supply. Middle East disruptions have sparked a global sulfuric acid shortage—key for ~20% of copper output—pushing acid prices nearly double and Chinese inventories down. Sprott forecasts strategic uses will rise from 32% to 45% of demand over the next 15 years (by 2040).

By @mining
Twitter/X 2026-05-13 1 min read

Trade: Trump wants Xi to "open up" China to American firms and to buy more US…

Why it matters

Trade: Trump wants Xi to "open up" China to American firms and to buy more US goods — specifically soybeans and Boeing aircraft — while both sides are discussing extending the one-year tariff truce reached at the Busan Summit in October 2025.

Key details

  • AI/chips and supply leverage: The US has blocked sales of Nvidia's H200 chips to China citing military risk; China wants the restrictions lifted, the US seeks concessions in return, and Nvidia CEO Jensen Huang's presence is noted; China also controls global rare-earths used in semiconductors, EVs, and defense, and has used export controls as leverage.
  • Security crises: The US and Israel launched strikes on Iran in February 2026; Trump's aides say the ceasefire is on "massive life support," the Strait of Hormuz is effectively blockaded (it carried ~20% of world oil pre-war) causing global fuel-price spikes, and Trump says he'll discuss US arms sales to Taiwan with Xi, breaking decades-long US practice and putting Taiwan on alert.

Brief

The summit agenda lists eight items Trump and Xi will tackle, with priority on trade (extend the one-year tariff truce from the Busan Summit, push China to buy soybeans and Boeing), AI/chips and rare-earth leverage (US ban on Nvidia H200, Jensen Huang present, Chinese export controls), and urgent security issues — Feb 2026 US‑Israel strikes on Iran, a Strait of Hormuz blockade that sent fuel prices up, and Trump’s pledge to discuss Taiwan arms sales with Xi.

By @heyshrutimishra
Twitter/X 2026-05-13 1 min read

On 2026-05-13 Mario Nawfal reported Boom Supersonic put a 42-megawatt…

Why it matters

On 2026-05-13 Mario Nawfal reported Boom Supersonic put a 42-megawatt "Superpower" turbine inside a shipping container; the unit reportedly runs in extreme heat, needs no water cooling, and is designed to be portable.

Key details

  • Nawfal also said China is testing rail-free, virtually guided trams on normal roads that can carry 500 passengers at up to 70 km/h, and he called on U.S. leaders (naming Trump) to pursue futuristic infrastructure.

Brief

Mario Nawfal tweeted on 2026-05-13 that Boom Supersonic built a 42‑MW "Superpower" turbine in a shipping container, which runs in extreme heat without water cooling and is portable; he also highlighted China testing rail-free, virtually guided trams carrying 500 passengers at 70 km/h and urged U.S. leaders, naming Trump, to invest in futuristic infrastructure.

By @MarioNawfal
ArXiv 2026-05-12 2 min read

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

Why it matters

SenseNova-U1 proposes NEO-unify, a native unified multimodal architecture, and ships two variants: SenseNova-U1-8B-MoT (dense 8B) and SenseNova-U1-A3B-MoT (mixture-of-experts 30B A3B). The paper was posted to arXiv on 2026-05-12; the authors provide model design, data preprocessing, training, and inference details.

Key details

  • Authors report the models rival top-tier understanding-only VLMs on text understanding, vision–language perception, knowledge reasoning, agentic decision-making, and spatial intelligence, while also delivering strong any-to-image (X2I) synthesis, complex text-rich infographic generation, interleaved vision–language generation, and preliminary success in vision–language-action and world-model scenarios (project: https://github.com/OpenSenseNova/SenseNova-U1).

Brief

SenseNova-U1 introduces NEO-unify, a unified multimodal paradigm that treats understanding and generation as a single process. The authors release two models (8B dense and 30B MoE A3B) and claim parity with top-tier understanding-only VLMs across perception, reasoning, decision-making, and spatial tasks, while also achieving strong any-to-image synthesis and interleaved multimodal generation; full design and training details are provided on the project page.

Authors: Haiwen Diao, Penghao Wu, Hanming Deng...
ArXiv 2026-05-12 1 min read

LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues

Why it matters

LongMemEval-V2 (LME-V2) is a benchmark of 451 manually curated questions covering five agent memory abilities (static state recall, dynamic state tracking, workflow knowledge, environment gotchas, premise awareness) paired with histories up to 500 trajectories and 115M tokens.

Key details

  • The paper introduces two memory methods—AgentRunbook-R (RAG with knowledge pools) and AgentRunbook-C (file-backed trajectories plus a coding agent in an augmented sandbox); AgentRunbook-C achieves 72.5% average accuracy versus 48.5% for the strongest RAG baseline and 69.3% for an off-the-shelf coding-agent baseline, though coding-agent methods incur high latency.

Brief

LongMemEval-V2 (LME-V2) is a benchmark for assessing whether memory systems let agents acquire environment-specific experience; it provides 451 questions across five memory abilities with histories up to 500 trajectories (115M tokens). The authors evaluate a RAG-style AgentRunbook-R and a file+coding-agent AgentRunbook-C, reporting 72.5% accuracy for AgentRunbook-C versus 48.5% for the best RAG baseline and 69.3% for a coding-agent baseline, while noting higher latency; summary based on the abstract.

Authors: Di Wu, Zixiang Ji, Asmi Kawatkar...
divenewsletter.com 2026-05-13 1 min read

How Energy Leaders Are Navigating 2026's Challenges

Why it matters

A Utility Dive Studio report published May 13, 2026, captures input from 135 energy executives on balancing growth, cost, and sustainability as electrification accelerates and decarbonization targets approach.

Key details

  • Executives say they are navigating market uncertainty, prioritizing where organizations invest, and retaining strong confidence in the energy transition; the piece is custom sponsored content by Utility Dive’s Studio/Studio Expert Network.

Brief

The Utility Dive Studio report (published 2026-05-13) synthesizes views from 135 energy executives on balancing growth, cost, and sustainability amid accelerating electrification and looming decarbonization targets. Respondents describe managing volatility, redirecting investment priorities, and maintaining strong confidence in the energy transition; the analysis is custom sponsored content by Utility Dive’s Studio.

By UD: Load Management
Twitter/X 2026-05-13 1 min read

On 2026-05-13 @anthonysagami shared a WSJ piece claiming orbital…

Why it matters

On 2026-05-13 @anthonysagami shared a WSJ piece claiming orbital data centers will use swarms of satellites loaded with AI chips, powered by onboard solar arrays, and flying in near-polar orbits to maximize sunlight exposure.

Key details

  • The author issues an investment pitch—"Invest in the companies leading the AI revolution and watch your portfolio soar"—listing 49 stock/ETF tickers including $NVDA, $MSFT, $META, $GOOGL, $AMZN and $QQQ.

Brief

Anthony Sagami (@anthonysagami) shared a WSJ article on 2026-05-13 describing orbital data centers: swarms of AI-chip satellites with solar arrays in near-polar orbits to maximize sunlight. He appended a direct investment pitch, listing 49 AI-related tickers (e.g., $NVDA, $MSFT, $META, $GOOGL, $AMZN) and urging buys.

By @anthonysagami
Twitter/X 2026-05-13 1 min read

Fred Stafford: Transmission lines framed as serving data centers are labeled an…

Why it matters

Fred Stafford says transmission lines framed as serving data centers get labeled an “abomination, a corporate land grab,” while lines pitched as bringing renewables draw a hesitant ‘good, necessary?’ from some and outright rejection from MAGA; he argues these political conditions hinder decarbonization and reindustrialization, and he favors energy technologies that create fewer land‑use conflicts with property owners.

Key details

  • Nat Purser: Much of the data‑center backlash targets transmission buildout—buyouts, easements, and eminent domain—and although massive transmission expansion is needed to replace aging infrastructure and support electrification, public politics will not distinguish transmission serving data centers from transmission serving communities, risking broader opposition to projects.

Brief

Fred Stafford warns that framing transmission projects tied to data centers as “corporate land grabs” — plus blanket partisan rejection from MAGA — creates political barriers to decarbonization and reindustrialization, so he advocates prioritizing energy technologies that minimize land‑use conflicts. Nat Purser notes that anger over buyouts, easements and eminent domain for transmission will broaden opposition even though grid expansion is required for electrification.

By @fredstaffordcs
ArXiv 2026-05-12 1 min read

SI-Diff: A Framework for Learning Search and High-Precision Insertion with a Force-Domain Diffusion Policy

Why it matters

SI-Diff introduces a single force‑domain diffusion policy with a novel mode‑conditioning mechanism and a search teacher policy; it trains on tactile and end‑effector velocity observations to learn both search and high‑precision insertion behaviors without switching models.

Key details

  • On peg‑in‑hole experiments the model extends x–y misalignment tolerance from 2 mm to 5 mm compared to TacDiffusion and demonstrates strong zero‑shot transferability to unseen shapes (paper posted 2026-05-12).

Brief

SI-Diff addresses contact‑rich assembly (peg‑in‑hole) by learning a single force‑domain diffusion policy with a mode‑conditioning mechanism and a search teacher policy that generates diverse trajectories; it maps tactile and end‑effector velocity observations to actions. Experiments report extending x–y misalignment tolerance from 2 mm to 5 mm versus TacDiffusion and strong zero‑shot transfer to unseen shapes. Abstract only.

Authors: Yibo Liu, Stanko Oparnica, Simon Shewchun-Jakaitis...
substack.com 2026-05-11 3 min read

GlobalWafers Q1 2026. May Be Viewed As A Relatively Low Point In The Current Cycle !

Why it matters

GlobalWafers reported Q1 2026 revenue of NT$13.98 billion on May 8, 2026 — down 3.6% QoQ from Q4 2025’s NT$14.50 billion and down 10.3% YoY from Q1 2025’s NT$15.60 billion.

Key details

  • Gross margin fell to 20.8% in Q1 2026 (a 4.9 percentage-point sequential decline from 25.7% in Q4 2025); operating margin was 10.5% and EPS was NT$3.97 (vs NT$3.05 in Q1 2025).
  • Chairperson Doris Hsu said on the earnings call that “Q1 2026 may be viewed as a relatively low point in the current cycle,” signaling management’s view that conditions are near a cyclical trough.
  • Industry context: SEMI data show worldwide silicon wafer shipments up 13% YoY in Q1 2026 and positive YoY shipments for seven consecutive quarters (following a prior seven-quarter decline); GlobalWafers’ historical peak was GM 43.7% on ~NT$18.6 billion revenue in Q3 2022.

Brief

GlobalWafers reported Q1 2026 results on May 8, 2026 showing revenue of NT$13.98 billion, a 3.6% sequential and 10.3% year‑over‑year decline, with gross margin compressed to 20.8% (down 4.9 percentage points QoQ), operating margin at 10.5%, and EPS of NT$3.97 (vs NT$3.05 in Q1 2025). Chairperson Doris Hsu called Q1 “a relatively low point in the current cycle,” while SEMI shipment data released April 29 show wafer shipments are recovering (up 13% YoY in Q1 and positive YoY for seven straight quarters after a prior seven‑quarter decline). The company’s current margin profile remains far below its Q3 2022 peak (43.7% GM on ~NT$18.6 billion), and the author notes GlobalWafers has stopped publishing its revenue/gross‑margin chart after Q2 2025, reflecting continued but slow cyclical recovery and some non‑recurring near‑term margin headwinds.

By William Martin Keating from Semicon Alpha
Twitter/X 2026-05-13 1 min read

A Sept '26 15,500–17,000 call spread totaling 5,500 lots traded via an LME broker…

Why it matters

A Sept '26 15,500–17,000 call spread totaling 5,500 lots traded via an LME broker on 2026-05-13 for roughly $22 million in premium.

Key details

  • Robert Friedland calls the block 'extremely bullish' and 'expensive confirmation' of a copper supercycle, characterizing it as institutional conviction that copper will rise by late 2026 and linking the signal to Ivanhoe Mines' Kamoa-Kakula copper complex.

Brief

Robert Friedland reports a Sept 2026 15,500–17,000 call spread of 5,500 lots traded through an LME broker on 13 May 2026 for about $22 million in premium. He interprets the block as institutional conviction — 'extremely bullish' and confirmation of a copper supercycle — and highlights Ivanhoe Mines' Kamoa‑Kakula operation.
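
The economics of the reported trade can be worked through under stated assumptions. None of them is confirmed by the post: that the contract is LME copper with the standard 25-tonne lot, that strikes are in USD per tonne, that the buyer is long the 15,500 call and short the 17,000 call, and that the roughly $22 million is the net premium paid.

```python
LOT_TONNES = 25               # assumed: standard LME copper lot size
lots = 5_500
premium_total = 22_000_000    # "roughly $22 million", treated as net debit
lower, upper = 15_500, 17_000  # strikes, assumed USD per tonne

tonnes = lots * LOT_TONNES
premium_per_tonne = premium_total / tonnes
breakeven = lower + premium_per_tonne

def payoff_per_tonne(price):
    # Long the lower-strike call, short the upper-strike call, minus premium.
    return max(price - lower, 0) - max(price - upper, 0) - premium_per_tonne

max_profit_total = payoff_per_tonne(upper) * tonnes
print(tonnes, premium_per_tonne, breakeven, max_profit_total)
```

Under these assumptions the position covers 137,500 tonnes at a cost of $160 per tonne, breaks even at $15,660, and caps its gain once copper reaches $17,000 at expiry. The buyer loses the full premium if the price finishes below $15,500.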

By @robert_ivanhoe
Twitter/X 2026-05-12 5 min read

Paul Krugman compared the U.S., France, and Germany at purchasing-power-parity…

Why it matters

Paul Krugman compared the U.S., France, and Germany at purchasing-power-parity current prices and found relative positions roughly constant since 2000; Luis Garicano (endorsed by @patrickc) counters that constant-price PPP — which captures volume gains when tech prices fall — shows the U.S. has pulled away because U.S. output growth concentrated in tech.

Key details

  • Technology’s lead translates into uneven welfare effects via non-tradable wage pressures, high profit margins, and equity concentration: Apple’s margins ≈40%, Anthropic’s inference margins ≈70%; Apple, Microsoft, Nvidia, Alphabet, Meta, and Amazon together are worth $21 trillion (exceeding all European stock markets combined); ~60% of U.S. equity is held by American households; median Meta employee earned $388,000 in 2025.
  • Median equivalised disposable household income (OECD, 2021): the median American earns 30% more than the median Dutch, 31% more than the median German, and 52% more than the median French — implying higher U.S. median living standards despite greater pre-tax inequality.
  • Labor and living‑standard indicators counter the ‘hours worked’ defense: Birinci, Karabarbounis, and See (NBER, 2026) find about half the U.S.–Europe hours gap from the 1990s reversed by the end of the 2010s (Americans work fewer hours than in 2000 while many Europeans work more). Material wealth appears concentrated in U.S. peripheries and new construction, and entry salaries illustrate gaps (London police start ≈$57,000 vs Washington, DC ≈$75,000; Deloitte entry in Madrid ≈€28,000 ≈$33,000 vs Charlotte ≈$63,000).

Brief

Luis Garicano, endorsed by @patrickc, argues that Europe is falling behind the United States because commonly cited PPP comparisons at current prices mask real-volume gains in U.S. technology measured by constant-price PPP. He claims tech-driven productivity growth matters for wages and wealth because much consumption is non-tradable, tech firms bid up local wages, and a large share of tech gains accrues as high-margin profits and equity value rather than visible price declines. Garicano cites concrete data: Apple/Microsoft/Nvidia/Alphabet/Meta/Amazon market cap ≈$21 trillion, median Meta pay $388,000 (2025), and OECD (2021) median disposable incomes where Americans earn 30–52% more than comparable Europeans. He also cites a 2026 NBER paper showing the U.S.–Europe hours gap has narrowed and provides anecdotal evidence of a U.S. construction boom and higher entry wages, arguing these patterns make divergence self‑reinforcing and politically salient for European reform.

By @patrickc
Twitter/X 2026-05-13 1 min read

Fred Stafford (@fredstaffordcs) calls More Perfect Union's framing 'the epitome…

Why it matters

Fred Stafford (@fredstaffordcs) calls More Perfect Union's framing 'the epitome of slopulism' and likens it to historical environmental opposition to nuclear plants.

Key details

  • More Perfect Union reports nearly 50,000 Lake Tahoe–area residents were told NV Energy will stop servicing homes next year to redirect power to data centers; NV Energy is ending a wholesale supply contract with Liberty Utilities in May 2027, and Liberty says it is seeking replacement power and that 'This does not mean the power is shutting off.'

Brief

Fred Stafford finds the Fortune article interesting but condemns More Perfect Union's 'nearly 50,000' framing as 'the epitome of slopulism,' comparing it to past anti‑nuclear environmental reactions. More Perfect Union says NV Energy will stop servicing homes in the Lake Tahoe area next year to redirect electricity to data centers; NV Energy is ending a wholesale contract with Liberty Utilities in May 2027, while Liberty is seeking replacement power and denies immediate shutoffs.

By @fredstaffordcs
Twitter/X 2026-05-13 3 min read

$999 AI Assessment offer

Why it matters

$999 AI Assessment offer: delivers 3–7 tailored off‑the‑shelf tool recommendations, guaranteed to return at least 5 hours/week or full refund; Ganim claims an average ROI of 6+ hours/week saved for about $40/month in tools.

Key details

  • Four-step delivery system: 45‑minute Google Meet interview recorded with Fathom; transcript analyzed with Claude (uses chained 'Skills' or a prompt to pick top 5–7 opportunities); report assembled from a Gamma template (~5 minutes) with executive summary, impact/effort matrix, 3–7 tool recommendations, 4‑day quick‑wins plan and financial impact; 30‑minute review call to close upsells.
  • High implementation conversion and clear upsell ladder: ~40–60% of clients request help implementing; upsells priced as Process Optimization $3–5k, No‑code automation (Zapier/Make) $1–3k, Custom GPTs $3k+, Custom AI Skills $3–5k, Full agent implementation $5–10k + $1k/month retainer. Real case: $3k custom GPT cut a brokerage owner’s listing email workload by 95%.
  • Go‑to‑market playbook: spend a month learning Claude, ChatGPT, Fathom, Gamma, Zapier; perform 2–3 free assessments for testimonials; ramp pricing $499 → $749 → $999, raising price every 3–4 clients until demand stalls.

Brief

Corey Ganim outlines a repeatable productized service, the $999 AI Assessment, that maps AI tools onto an owner’s existing workflows and acts as a feeder for $1k–$10k implementation projects. The workflow is: a 45-minute Fathom-recorded Google Meet to surface pain points; transcript analysis with Claude (he uses chained Skills) to identify 5–7 opportunities; a Gamma template report (executive summary, impact/effort matrix, 3–7 tool recommendations, 4-day quick wins, financial impact) produced in minutes; and a 30-minute review call after which ~40–60% of clients ask for paid implementation help. He guarantees 5 hours/week saved or a full refund, cites an average of 6+ hours/week saved for ~$40/month in tools, and gives an example where a $3k custom GPT removed 95% of email work per listing. His go-to-market advice: learn the stack (Claude, ChatGPT, Fathom, Gamma, Zapier), do 2–3 free assessments, then scale pricing from $499 up to $999 and build an upsell ladder for the real revenue.

By @coreyganim
substack.com 2026-05-12 4 min read

Six layers your agent has to handle. Most products have only thought about two. + a responsibility-layer audit.

Why it matters

Nate argues that six commercial responsibilities that used to be hidden behind a single human checkout click must now be named and owned by agentic software (examples: identity/ID proofing, authorization evidence, fraud/risk scoring, payment credentials, settlement/refunds/liability, and customer/merchant data rights).

Key details

  • OpenAI and Stripe shipped 'Instant Checkout' in September; five months after launch it was scaled back while competing approaches from Shopify and Google gained traction, marking the first round of the protocol fight over agentic commerce.
  • Technical thesis: 'authorization is not payment' — agentic commerce needs an evidence/authorization layer that outlives the transaction; Google and Stripe are taking different technical/protocol routes to build that persistent authorization/evidence plumbing.
  • Nate argues software-paying-software (including stablecoin use cases) requires different rails than person-to-merchant checkout, AWS will be a quiet strategic owner of key layers, and he supplies two practical tools — a responsibility-layer audit and an authorization spec for finance/legal teams.

Brief

Nate (May 12, 2026) outlines how agentic commerce—software agents holding wallets, signing authorizations, and paying merchants—unbundles the single human checkout click into six commercial responsibilities that must be explicitly assigned: identity, authorization/evidence, fraud/risk, payment credentials, settlement/refunds/liability, and data/merchant–customer relationships. He traces the opening clash to OpenAI + Stripe’s Instant Checkout (shipped in September and scaled back five months later) and contrasts it with Shopify and Google’s counter-protocols, arguing the real fight is over where commercial trust (the persistent authorization/evidence layer) lives. Key technical claims: authorization is distinct from payment and must outlive the transaction; software-to-software payments (including stablecoin rails) should use different infrastructure; and AWS’s platform position matters. Nate ends with two operational prompts — a responsibility-layer audit and an authorization spec — for builders, finance, and legal to close gaps.

By Nate from Nate’s Substack
cautiousoptimism.news 2026-05-11 5 min read

With gas prices spiking, going to office is now a pay cut

Why it matters

U.S. gasoline prices are up about 60% versus earlier this year; with the average American commuting about 26 miles to and from work, the return-to-office movement is effectively a pay cut for many workers.

Key details

  • Multiple governments are mandating or encouraging reduced commuting to blunt fuel demand: India has urged citizens to cut gasoline use, Indonesia moved government staff to work-from-home on Fridays, Myanmar set Wednesdays at home for government staff, and Malaysia, Sri Lanka and Vietnam have also imposed restrictions.
  • On cybersecurity AI, Anthropic’s Mythos model is restricted from broad distribution by the U.S. White House, while OpenAI has released GPT-5.4-Cyber and GPT-5.5-Cyber (reported as strong) and is in talks with the EU to provide access; European labs (Mistral, H Company, Yann LeCun’s group, Black Forest Labs, SAP) are racing to build competing cyber-capable models.
  • CoreWeave Q1 2026: revenue $2.08B (vs. $1.97B expected), GAAP loss $1.12/share (vs. $0.90 expected), operating margin -7% (vs. -3% a year ago); active power passed 1GW in Q1 and 3.5GW active/under contract; Q2 revenue guidance $2.45–$2.60B (midpoint $2.53B) below analysts’ $2.69B; debt servicing costs are ~25% of revenue; company targets $12–$13B 2026 revenue and an $18–$19B run‑rate by year-end.

Brief

Rising gasoline prices — roughly a 60% increase in the U.S. since earlier this year — are making return-to-office mandates feel like pay cuts for many commuters: the average American travels about 26 miles to and from work, amplifying cost and time burdens. Governments from India to Indonesia, Myanmar, Malaysia, Sri Lanka and Vietnam are implementing measures to reduce commuting and gasoline demand. On the cybersecurity front, Anthropic’s Mythos is effectively constrained by U.S. policy, while OpenAI’s GPT-5.4‑Cyber and GPT-5.5‑Cyber (reported as strong) are being discussed with EU officials for access; several European AI labs (Mistral, H Company, LeCun’s team, Black Forest Labs, SAP) are racing to produce cyber-capable models. In markets, CoreWeave beat revenue expectations in Q1 2026 with $2.08B but reported a wider loss of $1.12/sh, guided Q2 to $2.45–$2.60B (below consensus), showed operating margin deterioration (‑7% vs. ‑3% YoY), and signaled heavy debt service (~25% of revenue) even as it scales power to 3.5GW under contract.
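
The CoreWeave figures above reduce to a few percentage comparisons. The sketch below only reworks numbers quoted in this item (in billions of USD); the helper name `pct_diff` is illustrative, not from the source.

```python
# Beat/miss arithmetic for the CoreWeave figures quoted above (USD billions).

def pct_diff(actual: float, reference: float) -> float:
    """Percentage difference of `actual` relative to `reference`."""
    return (actual - reference) / reference * 100.0


q1_revenue, q1_consensus = 2.08, 1.97
guide_low, guide_high = 2.45, 2.60
q2_consensus = 2.69

guide_midpoint = (guide_low + guide_high) / 2            # rounds to the quoted $2.53B
q1_beat_pct = pct_diff(q1_revenue, q1_consensus)         # revenue vs. expectations
q2_gap_pct = pct_diff(guide_midpoint, q2_consensus)      # guidance midpoint vs. consensus
implied_qoq_pct = pct_diff(guide_midpoint, q1_revenue)   # growth implied by the midpoint

if __name__ == "__main__":
    print(f"Q1 revenue vs. consensus:  {q1_beat_pct:+.1f}%")
    print(f"Q2 midpoint vs. consensus: {q2_gap_pct:+.1f}%")
    print(f"Implied Q1 to Q2 growth:   {implied_qoq_pct:+.1f}%")
```

The point of the comparison is that the guidance midpoint can sit below consensus even while implying double-digit sequential growth.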

By Alex Wilhelm from Cautious Optimism
Twitter/X 2026-05-13 1 min read

If Freon, Kaon, random spectra, or inverted spectra perform similarly, then the…

Why it matters

If Freon, Kaon, random spectra, or inverted spectra perform similarly, then the essential signal is the data-induced gradient subspace and boundary-conditioned iteration on the learned manifold — claim tested in the author's lead experiments on SYNTH (ongoing as of 2026-05-13).

Key details

  • Optimizers using geometric orthogonalization yield measurable improvements but implicitly assume the per-iteration convergence direction is known; over-reliance can be harmful because AI optimization is an inverse problem with a dynamic fixed point.
  • From fixed-point theory, noise is a perturbation not merely error: Kaon’s random spectrum reportedly preserves gradient-subspace orientation while perturbing movement strength along spectral directions.

Brief

Dorialexander (2026-05-13) argues that when Freon, Kaon, random or inverted spectra produce similar results, the core training signal is the data-induced gradient subspace combined with boundary-conditioned iterations on the learned manifold. In the Deep Manifold/Dataualism framing, orthogonalization is a helpful human prior but can mislead because optimization trajectories shift; Kaon perturbs spectral magnitudes while keeping subspace orientation, and the author is testing these claims on SYNTH.

By @Dorialexander
substack.com 2026-05-12 3 min read

Qualcomm Is the Problem Child of the Current Chip Rally

Why it matters

Since the beginning of April 2026 the semiconductor index is up nearly 60%, while Qualcomm shares have risen more than 80% since the end of March 2026.

Key details

  • Qualcomm reported mixed results in late April 2026 and gave current-quarter midpoint guidance of $9.6 billion revenue and $2.20 EPS versus Street consensus of $10.3 billion revenue and $2.43 EPS.
  • A vague management comment about a custom chip deal with a hyperscaler — with more details promised at Qualcomm's analyst day in June 2026 — materially boosted the stock despite weaker guidance.
  • Qualcomm is pitching a pivot into data‑center AI (accelerators and CPUs) and future AI-enabled wearables (CEO Cristiano Amon told Fortune the company is working with AI firms), even as its core mobile businesses face headwinds: Apple is moving to in‑house modems and Android handset volumes are pressured by higher memory prices.

Brief

Qualcomm is at the center of skepticism in the current chip rally: while the broader semiconductor index climbed nearly 60% from early April 2026, Qualcomm’s stock has surged over 80% since the end of March largely on hopes the company can become a major data‑center AI player. That optimism persisted even after mixed late‑April results and guidance (current‑quarter midpoint revenue $9.6B and EPS $2.20 vs. consensus $10.3B and $2.43). A vague management remark about a custom hyperscaler chip deal — with specifics deferred to an analyst day in June 2026 — and CEO Cristiano Amon’s comments about AI wearables have amplified the narrative. Key Context warns the core mobile franchise is eroding (Apple shifting to internal modems; Android demand hit by rising memory prices), meaning the stock’s run may be priced for a transformation that is unproven and execution‑dependent.

By Tae Kim from Key Context Newsletter
Twitter/X 2026-05-13 2 min read

CJ Handmer (2026-05-13) insists a cumulative engine of wealth sufficient to…

Why it matters

CJ Handmer (2026-05-13) insists a cumulative engine of wealth sufficient to offset old-age senescence requires sustained growth of roughly 2%–9% per year and 'lots of young smart ambitious people' working on important problems.

Key details

  • Housing-centered, tax-advantaged intergenerational transfers (a 'Ponzi scheme') cannot substitute for broad productive growth; Japan — with relatively healthy retirees and affordable housing — still faces a crashing birth rate, stagnant economy, and youth locked into seniority-driven 'zombie' firms.
  • Proposed remedies include investing Australian superannuation in domestic primary/secondary production, but Handmer doubts factories can regenerate capital fast enough with fertility at ~2 or 1; he argues only two clear escapes are slowing aging (immortality) or AI-driven hypergrowth.

Brief

CJ Handmer (2026-05-13) argues that offsetting old-age senescence and insecurity requires sustained growth of roughly 2–9% annually—driven by many young, ambitious people—not housing-driven, tax-advantaged intergenerational transfers. He cites Japan's low fertility and stagnant economy as a caution, proposes Australian super invested in manufacturing, but concludes only slowed aging or AI-driven hypergrowth can reliably escape the demographic trap.
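
To put the quoted 2%–9% range in perspective, compound growth can be converted into a doubling time. This is a generic compounding calculation added for illustration; it is not taken from Handmer's thread.

```python
import math


def doubling_years(annual_growth: float) -> float:
    """Years for a quantity to double at a constant compound annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_growth)


if __name__ == "__main__":
    for rate in (0.02, 0.09):
        print(f"{rate:.0%} growth -> doubles in about {doubling_years(rate):.1f} years")
```

At the low end of the range an economy doubles roughly once per working lifetime; at the high end it doubles several times over the same span.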

By @CJHandmer
QUICK SKIM

Fast scan items.

51 items
Twitter/X 2026-05-13 1 min read

@TGLetter warns NVIDIA's market cap is $5.33 trillion (tweeted 2026-05-13)…

Why it matters

@TGLetter warns that NVIDIA's market cap of $5.33 trillion (tweeted 2026-05-13) exceeds the combined GDP of Germany, Japan, France, the UK, Italy, Canada, and Brazil, and that only the US and China have a GDP larger than the market value of this single chip company.

Key details

  • The author draws a parallel with Microsoft in 2000 (c. $600 billion valuation), which analysts called a 'new permanent reality' before it lost ~60% of its value over the next 18 months, implying similar vulnerability for NVIDIA.
  • The thread asserts NVIDIA doesn't produce essentials (oil, food, infrastructure) but sells chips to data centers to train AI models 'that haven't turned a profit yet,' so the entire $5.33T is a bet on an unproven future with 'no playbook' if AI spending slows.

Brief

NVIDIA is valued at $5.33 trillion, exceeding the combined GDPs of Germany, Japan, France, the UK, Italy, Canada and Brazil, the author warns. Citing the 2000 Microsoft episode (≈$600B then −60% in 18 months), the post argues NVIDIA's valuation is a speculative bet on chips used to train AI models that have not yet proven profitable and that there is no historical playbook if AI spending decelerates.

By @TGLetter
substack.com 2026-05-12 2 min read

Discounted exposure to copper clad laminates

Why it matters

Collyer Bridge (May 12, 2026) highlights a value idea: buy a holding company that majority-owns a copper clad laminate (CCL) producer to get cheap CCL exposure because the holdco’s four other divisions reportedly account for almost the entire market cap.

Key details

  • The post cites supply tightness as rationale: a May 3, 2026 tweet from @jukan05 reports a Seoul PCB manufacturer placed advance orders worth 10 billion won with two Taiwanese CCL producers, EMC and TUC, with the order volume described as more than five times normal levels (tweet had ~106K views).
  • The write-up is a preview/paywalled Substack post — the author solicits paid subscriptions for the full analysis, scuttlebutt and on-the-ground research behind the idea.

Brief

Discounted exposure to copper clad laminates (CCL) is presented as a value-investor play by Collyer Bridge in a May 12, 2026 Substack preview: the author argues investors can gain CCL exposure cheaply by owning a holding company that majority-owns a CCL producer while the holdco’s four other divisions account for nearly the entire market capitalization, effectively leaving the CCL business under‑valued. The memo cites acute CCL tightness and AI-driven demand as catalysts, pointing to a May 3, 2026 tweet from @jukan05 reporting a Seoul PCB maker placed advance orders worth 10 billion won with Taiwanese producers EMC and TUC (order volumes described as >5x typical), and frames the thesis as scuttlebutt-driven value research — full details and on‑the‑ground checks are behind the post’s paywall.

By Collyer Bridge
Twitter/X 2026-05-13 1 min read

Garry Tan's headline claim on Rick Rubin's Tetragrammaton

Why it matters

Garry Tan's headline claim on Rick Rubin's Tetragrammaton: the engineers who hate vibe coding and AI the most are the people who would benefit the most from embracing it.

Key details

  • The episode was published 2026-05-13, runs ~2 hours (final timestamp 1:59:38), and is hosted on Rick Rubin's Tetragrammaton podcast.
  • Topics covered with timestamps include early computing and math (0:15–13:18), video games and storytelling (13:18–21:04), engineering and design (21:04–32:15), startups/Y Combinator and founder selection (32:15–58:47), AI/programming and creative revolution (58:47–1:13:10), taste/reps/builder intuition, power/responsibility, reinventing institutions, and AI + manufacturing for greater abundance.

Brief

Garry Tan's episode on Rick Rubin's Tetragrammaton podcast (published 2026-05-13) runs about two hours and argues that engineers who resist 'vibe coding' and AI would benefit from adopting them. He recounts his path from computing, mathematics, and video games to engineering, YC founder selection, AI-driven creativity, builder intuition, institutional reinvention, and AI-enabled manufacturing.

By @garrytan
Twitter/X 2026-05-13 1 min read

@BoringBiz_ claims the application layer will capture most AI value because it…

Why it matters

@BoringBiz_ claims the application layer will capture most AI value because it has minimal capex while the model layer is investing “billions” in GPUs and data centers.

Key details

  • They argue applications are asset‑light with variable costs, lower leverage (no billions of GPU/data‑center debt), can target verticals intensely, and face less concentrated competition than model leaders like OpenAI, Anthropic, and Google, giving applications a larger TAM.

Brief

Author @BoringBiz_ argues that AI value will concentrate at the application layer: apps require little capex vs. the model layer’s “billions” for GPUs and data centers, are asset‑light with lower leverage, can focus on vertical customers, and face less top‑heavy competition (OpenAI, Anthropic, Google), yielding a larger TAM.

By @BoringBiz_
Twitter/X 2026-05-13 4 min read

Brian Halligan coins the new org playbook “Dorsey Mode” after Jack Dorsey, saying…

Why it matters

Brian Halligan coins the new org playbook “Dorsey Mode” after Jack Dorsey, saying it departs from Andy Grove’s High Output Management; he claims Jack’s first quarter after adopting it was “a banger” and notes Brian Armstrong runs a very similar playbook used by many startups in the past 18 months.

Key details

  • Strategy and distribution change: planning cycles are largely abandoned because faster iteration turns many 1-way doors into 2-way doors, making creative distribution (and enterprise sales) the primary competitive moat.
  • Hiring and org shape shift: interview loops now include hard AI case problems or live demos; some companies (e.g., Meta, HubSpot) favor very senior engineers while others hire junior AI-focused talent; org charts move from large triangle hierarchies to circular structures with a central world model and small teams around it, and Jack has removed titles to focus people on work not level.
  • Systems, ops, and leadership implications: more decisions are delegated to world-models/systems, IT must build scaffolding and make all context legible (an early sign is recording nearly every meeting including 1:1s for model training), compensation must widen (higher standard deviation), CEOs must lead by doing (Dorsey reportedly spends 3 hours each morning building) and run hackathons/office hours to drive adoption.

Brief

Brian Halligan argues that Jack Dorsey’s approach, which he dubs “Dorsey Mode,” is a radical organizational playbook shift away from Andy Grove’s High Output Management, and that it shows early upside (Halligan says Dorsey’s first quarter after adoption was “a banger”). The model accelerates iteration so planning cycles break down, turning many 1-way doors into reversible choices and elevating distribution as the primary moat. Recruiting now emphasizes AI problem cases or live demos, with some firms favoring very senior engineers (Meta, HubSpot) and others hiring junior, AI-native talent. Org charts move from large triangular hierarchies to small teams orbiting a central world model; titles are being removed; decisions are increasingly handed off to systems; IT becomes the scaffolding team that records and feeds meetings and context into models; compensation and leadership practices must change, and CEOs are expected to lead by building and running hackathons to push adoption.

By @bhalligan
ArXiv 2026-05-12 1 min read

DexTwist: Dexterous Hand Retargeting for Twist Motion via Mixed Reality-based Teleoperation

Why it matters

DexTwist (Lee, Li, Lee; arXiv 2026-05-12) is a mixed-reality dexterous-hand retargeting framework that detects a tripod pinch, estimates the operator's intended screw axis and twist magnitude, and applies a real-time residual joint-space refinement to track turning progress while regularizing robot tripod geometry.

Key details

  • The refinement minimizes a virtual-object objective composed of turning angle, screw-axis consistency, fingertip closure, and tripod stability to mitigate embodiment-gap issues (link-length/joint-axis mismatches) that cause tangential fingertip sliding and screw-axis drift in tasks like cap opening, key turning, and bolt screwing.
  • Simulation and real-world experiments reported in the 6-page paper (5 figures, 2 tables) show DexTwist improves turning-angle tracking and screw-axis stability compared with a vector-based retargeting baseline (no numeric percentages provided in the abstract).

Brief

DexTwist introduces a functional twist-retargeting method for MR-based teleoperation that targets contact-rich rotational manipulation where kinematic imitation fails. The system detects tripod pinches, estimates intended screw axis and twist, then performs a real-time joint-space residual optimization minimizing a virtual-object objective (turn angle, axis consistency, fingertip closure, tripod stability). Simulations and real tests demonstrate improved turning-angle tracking and reduced screw-axis drift versus a vector-based baseline.

Authors: Dongmyoung Lee, Chengxi Li, Dongheui Lee
ArXiv 2026-05-12 1 min read

Covering Human Action Space for Computer Use: Data Synthesis and Benchmark

Why it matters

Authors identify a long-tail failure pattern in computer-use agents (citing failures in GPT-5.4 and Claude): a small fraction of complex, low-frequency GUI interactions accounts for a disproportionate share of task failures, attributed to data scarcity for complex interactions (paper published 2026-05-12).

Key details

  • They release CUActSpot, a multimodal benchmark (GUI, text, table, canvas, natural image) covering diverse actions (click, drag, draw, etc.), and a renderer-based data-synthesis pipeline that auto-generates scenes, records screenshots and element coordinates, and uses an LLM to produce instructions/action traces; their Phi-Ground-Any-4B model trained on this data outperforms open-source models with <32B parameters (code/data/models at https://github.com/microsoft/Phi-Ground.git).

Brief

CUActSpot targets the long-tail of complex GUI interactions that undermine computer-use agents by providing a multimodal benchmark (GUI, text, table, canvas, natural image) and a renderer-based data-synthesis pipeline: automatic scene generation, screenshot/element-coordinate recording, and LLM-produced instructions/action traces. Training on this corpus yields Phi-Ground-Any-4B, which outperforms open models under 32B parameters. Only the abstract was available for this summary.

Authors: Miaosen Zhang, Xiaohan Zhao, Zhihong Tan...
substack.com 2026-05-11 4 min read

Takeaways from Harvard CS Professor David J Malan

Why it matters

David J. Malan (Harvard), creator of CS50, credits two lecture techniques for student engagement: a designed “memorable moment” per lecture (e.g., ripping a phone book to illustrate binary search) and sustained high energy motivated by a fear of boring the audience.

Key details

  • Malan reports that AI has reduced CS enrollments and employer hiring of junior engineers from Harvard; academic dishonesty still involves about 5–10% of students each semester but is now harder to prosecute because AI-generated answers are difficult to attribute to a source.
  • Malan argues for teaching C in 2026 because it’s low-level enough to reveal how computers work yet compact enough to force students to reimplement core data structures; he identifies pointers as the most challenging concept for students to master.
  • The interview (Ryan Peterman) was published May 11, 2026; the conversation is available on YouTube/Spotify/Apple Podcasts and the transcript is on Substack.

Brief

In a May 11, 2026 interview with Ryan Peterman, Harvard CS professor David J. Malan, the instructor behind CS50, explains why his lectures resonate, how AI is reshaping student behavior, and why C still matters. Pedagogically, Malan builds a single “memorable moment” into each lecture (his classic is ripping a phone book to anchor binary search) and projects high energy driven by a fear of boring students. On AI’s downstream effects, he reports fewer students enrolling and fewer companies hiring junior engineers from Harvard; cheating still affects roughly 5–10% of students per semester, but AI-produced responses are difficult to tie to a source, so prosecution is harder. He defends teaching C because its small, low-level surface forces students to implement basic data structures and exposes machine-level concepts; he names pointers as the hardest topic for novices. The full conversation and transcript are linked on Substack and major podcast platforms.
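
The phone-book demonstration maps directly onto binary search: open to the middle, discard the half that cannot contain the name, repeat. A minimal sketch follows, written in Python for brevity rather than the C that Malan teaches; the function name and the step counter are illustrative additions.

```python
from typing import Sequence, Tuple


def binary_search(sorted_names: Sequence[str], target: str) -> Tuple[int, int]:
    """Return (index, steps); index is -1 if target is absent.

    Each iteration "tears the phone book in half": it checks the middle
    entry and keeps only the half that could still contain the target.
    """
    lo, hi = 0, len(sorted_names) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_names[mid] == target:
            return mid, steps
        if sorted_names[mid] < target:
            lo = mid + 1   # target sorts after the middle: drop the left half
        else:
            hi = mid - 1   # target sorts before the middle: drop the right half
    return -1, steps


if __name__ == "__main__":
    # Hypothetical phone book of 1,024 zero-padded names, already sorted.
    book = [f"name{i:04d}" for i in range(1024)]
    index, steps = binary_search(book, "name0700")
    print(f"found at index {index} after {steps} comparisons")
```

Because the search space halves on every comparison, a 1,024-entry book needs at most 11 look-ups, versus up to 1,024 for a page-by-page scan.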

By Ryan Peterman
Twitter/X 2026-05-13 1 min read

Paul Graham, speaking at YC | Stockholm on April 29, 2026, argued founders who…

Why it matters

Paul Graham, speaking at YC | Stockholm on April 29, 2026, argued founders who want faster investor engagement and more serendipitous meetings should move to Silicon Valley, citing faster-moving investors (02:45, 04:36) and serendipity (01:01, 02:45) as key advantages.

Key details

  • He claimed respect and stronger benchmarking come from competing with 'big fish' in the Valley (06:03, 09:10), used the Dropbox founding story as an example (07:59), and highlighted Silicon Valley’s pay‑it‑forward culture (12:21) as critical for startup success.
  • On building hubs elsewhere, Graham outlined ways to help Stockholm thrive (15:36), endorsed YC as the optimal path for founders seeking that ecosystem (17:24), and posed whether Stockholm could become the Silicon Valley of Europe (19:54).

Brief

Paul Graham argued at YC | Stockholm (April 29, 2026) that founders seeking faster capital, serendipitous meetings, and industry respect should relocate to Silicon Valley, citing faster-moving investors, a pay-it-forward culture, and the Dropbox story. He also outlined how to help Stockholm thrive, recommended YC as the optimal path, and asked if Stockholm could become Europe’s Silicon Valley.

By @ycombinator
awardwallet.com 2026-05-11 2 min read

Your Favorite Dining Card Just Got a Lot More Valuable

Why it matters

AwardWallet reported on May 11, 2026 that a popular dining-focused credit card received an anniversary refresh adding a 5X earning rate on hotels.

Key details

  • The update includes nearly $100 in limited-time travel and dining credits and adds rental-car elite status, both available to enroll in now.
  • The promotion pairs the changes with a welcome bonus up to 100,000 points; the email withheld the specific card/issuer name due to advertising-partner requirements.

Brief

A popular dining-focused credit card received an anniversary refresh on May 11, 2026 that adds a 5X hotel-earning rate, nearly $100 in limited-time travel and dining credits, and rental-car elite status (enrollable now). AwardWallet notes the upgrade is coupled with a welcome bonus up to 100,000 points, though the email withheld the issuer/card name for advertising reasons.

By AwardWallet
ArXiv 2026-05-12 1 min read

Task-Adaptive Embedding Refinement via Test-time LLM Guidance

Why it matters

Proposes test-time LLM-guided query refinement that updates a query's embedding using feedback from a generative LLM on a small set of documents, improving ranking quality and producing clearer binary separation in the embedding space.

Key details

  • Experiments with state-of-the-art text embedding models across diverse search and classification benchmarks show consistent gains across all models/datasets, with relative improvements up to +25% on literature search, intent detection, key-point matching, and nuanced instruction-following.
  • Method broadens practical zero-shot use of embeddings as a cheaper alternative to corpus-scale LLM pipelines; authors Ariel Gera, Shir Ashury-Tahan, Gal Bloch et al. released code at https://github.com/IBM/task-aware-embedding-refinement (arXiv:2605.12487, 12 May 2026).

Brief

Task-Adaptive Embedding Refinement via Test-time LLM Guidance presents a test-time method that refines query embeddings via a generative LLM's feedback on a small document subset to tailor embeddings to ad-hoc zero-shot search and classification tasks. Experiments with state-of-the-art embedding models across diverse benchmarks report consistent gains (up to +25% relative), improving ranking and class separation; code released on GitHub. Abstract only; full text not provided here.
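
Since only the abstract is summarized here, the authors' actual update rule is not known. As a rough illustration of the general idea (nudging a query embedding using relevance feedback on a few retrieved documents), the sketch below uses a classic Rocchio-style update instead, with a stub `judge` callback standing in for the generative LLM. Every name, weight, and vector here is hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def centroid(vectors: Sequence[Sequence[float]], dim: int) -> Vector:
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def refine_query(query: Sequence[float], docs: Sequence[Sequence[float]],
                 judge: Callable[[int], bool], k: int = 4,
                 alpha: float = 1.0, beta: float = 0.75,
                 gamma: float = 0.25) -> Vector:
    """Rocchio-style update (a stand-in, not the paper's method).

    1. Retrieve the top-k documents by cosine similarity to the query.
    2. Ask `judge` (placeholder for LLM feedback) whether each is relevant.
    3. Move the query toward the relevant centroid, away from the rest.
    """
    top = sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]),
                 reverse=True)[:k]
    relevant = [docs[i] for i in top if judge(i)]
    irrelevant = [docs[i] for i in top if not judge(i)]
    dim = len(query)
    pos, neg = centroid(relevant, dim), centroid(irrelevant, dim)
    return normalize([alpha * q + beta * p - gamma * n
                      for q, p, n in zip(query, pos, neg)])


if __name__ == "__main__":
    docs = [[1.0, 0.0], [0.9, 0.2], [0.0, 1.0], [0.2, 0.9]]  # toy 2-D embeddings
    query = [1.0, 1.0]
    refined = refine_query(query, docs, judge=lambda i: i in {2, 3})
    print("similarity to a relevant doc before:", round(cosine(query, docs[2]), 3))
    print("similarity to a relevant doc after: ", round(cosine(refined, docs[2]), 3))
```

The design point this illustrates is cost: the LLM is consulted on a handful of documents per query, and the refined embedding is then used for ordinary vector search over the full corpus.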

Authors: Ariel Gera, Shir Ashury-Tahan, Gal Bloch...
ArXiv 2026-05-12 1 min read

A proximal gradient algorithm for composite log-concave sampling

Why it matters

Presents a proximal-gradient sampler for composite log-concave targets π ∝ e^{-f-g}; when f+g is α-strongly convex and f is β-smooth, it attains ε total-variation error in ~O(κ·sqrt(d)·log^4(1/ε)) iterations, where κ = β/α, matching prior state-of-the-art for the g=0 case.

Key details

  • Algorithm requires gradient access to f and a restricted Gaussian oracle (RGO) for g (able to sample from density ∝ exp(-g(x) - (1/(2h))||y-x||^2)); results are extended to non-log-concave targets satisfying a Poincaré or log-Sobolev inequality and to Lipschitz, non-smooth f.

Brief

The paper introduces a proximal-gradient Monte Carlo sampler for composite log-concave densities π ∝ e^{-f-g}, combining gradient steps on smooth f with a restricted Gaussian-oracle (RGO) proximal sampler for g. Under α-strong convexity of f+g and β-smoothness of f it achieves ε total-variation error in ~O(κ·sqrt(d)·log^4(1/ε)) iterations (κ=β/α), and the authors extend guarantees to Poincaré/LSI targets and to Lipschitz non-smooth f.

Authors: Linghai Liu, Sinho Chewi
ramp.fm 2026-05-11 2 min read

We’ll print, store & ship your merch

Why it matters

Ramp offers end-to-end merch fulfillment (email published 2026-05-11) including printing, inventory storage, pick, pack & worldwide shipping, online store setup, employee onboarding kits, event distribution, and campaign swag; clients can use the full service or select individual components.

Key details

  • Case study: The Bugle Podcast — Ramp built an online store, printed limited-edition Christmas jumpers and handled worldwide shipping; every jumper sold out before Christmas.

Brief

Ramp's merch fulfillment service (announced in an email published 2026-05-11) combines printing with inventory storage, pick-and-pack operations and worldwide shipping, plus online store setup, onboarding kits and event/campaign distribution. The offering is modular — clients can select full fulfillment or specific services — and was credited with selling out The Bugle Podcast's limited-edition Christmas jumpers after Ramp printed and shipped orders globally.

By Ramp for Merch
e.economist.com 2026-05-11 9 min read

The War Room: Drones are rewiring warfare. Literally

Why it matters

Fibre‑optic, wire‑guided first‑person‑view (FPV) drones—used extensively in Ukraine—are effectively unjammable because control signals run over a physical fibre tether; fibre gives lower latency and higher video bandwidth, and a 50 km spool price rose from about $300 to $2,500 (Dimko Zhluktenko).

Key details

  • The Economist obtained a ten‑page GRU document offering Iran 5,000 fibre‑optic drones, long‑range Starlink‑guided drones and operator training, including maps for attacking a slow‑moving American landing flotilla; it is unknown whether Russia shared or acted on the proposal.
  • FPV systems now reach ranges of roughly up to 40 km; CNAS modelling cited in the piece describes layered attacks from 80 km down to 5 km, and the US Indo‑Pacific Command concept “Hellscape” envisions using dense FPV attacks to blunt a Chinese amphibious invasion of Taiwan.
  • Military tech and know‑how are flowing among Russia, China, Iran and North Korea and out to proxies (eg Hizbullah’s use of fibre‑optic drones); related concerns include China developing quieter submarines with Russian help and reported Iranian strikes (May 7) that damaged at least 228 structures, while US satellite imagery releases have been curtailed.

Brief

Fibre‑optic, wire‑guided FPV drones are reappearing as a transformational weapon: by tethering control and video over fibre they are effectively immune to RF jamming, offer much lower latency and higher bandwidth for sharper targeting, and have become decisive on Ukraine’s battlefields. The Economist (May 11, 2026) reports a ten‑page GRU proposal to supply Iran with 5,000 fibre‑optic drones, long‑range Starlink‑guided UAVs and training, including plans to attack US landing forces. FPV ranges are now approaching ~40 km, and CNAS modelling and US Indo‑Pacific Command concepts (eg “Hellscape”) show dense layered FPV attacks (80 km→5 km) would threaten amphibious operations such as a Chinese invasion of Taiwan. The piece warns of accelerating tech flows between Russia, China, Iran and North Korea, wider proliferation to proxies (eg Hizbullah), and related operational impacts including quieter Chinese submarines and recent Iranian strikes (May 7) that reportedly damaged hundreds of US‑linked facilities.

By Shashank Joshi at The Economist
ArXiv 2026-05-12 1 min read

From Imagined Futures to Executable Actions: Mixture of Latent Actions for Robot Manipulation

Why it matters

MoLA (Mixture of Latent Actions), proposed 2026-05-12 by Yajie Li et al., converts imagined future videos into executable control representations by using a mixture of pretrained inverse-dynamics models to infer latent actions from predicted visual transitions; the modality-aware inverse dynamics models explicitly exploit semantic, depth, and optical-flow cues.

Key details

  • The method was evaluated on simulated benchmarks LIBERO, CALVIN, and LIBERO-Plus and on real-world robot manipulation tasks, and the abstract reports consistent gains in task success, temporal consistency, and generalization; the work is listed as ICML 2026.

Brief

MoLA (Mixture of Latent Actions) targets the gap between video-based imagination and actionable control: instead of feeding predicted frames to a policy or decoding videos directly into controls, it infers a mixture of latent actions via pretrained, modality-aware inverse-dynamics models (semantic, depth, flow) to produce a physically grounded action interface. Evaluated on LIBERO, CALVIN, LIBERO-Plus and real robots, the abstract reports consistent improvements in success rates, temporal consistency, and generalization; summary based on the abstract (full paper not reviewed).

Authors: Yajie Li, Bozhou Zhang, Chun Gu...
Twitter/X 2026-05-13 1 min read

Gary Marcus endorses Noam Brown’s claim that “with today’s AI models…

Why it matters

Gary Marcus endorses Noam Brown’s claim that “with today’s AI models, intelligence is a function of inference compute.”

Key details

  • Noam Brown (polynoamial) says model comparisons by a single number became meaningless in 2024 and what matters is intelligence per token or per dollar — crucial for products like Codex.
  • Marcus counters that humans run on roughly 20 watts, arguing future architectural innovations could matter as much as, or more than, raw compute over the long run.

Brief

Gary Marcus amplifies Noam Brown’s claim that, for current AI systems, intelligence scales with inference compute and should be measured as intelligence per token or per dollar (model comparisons became unreliable in 2024). Marcus adds that humans achieve high intelligence on ~20 watts, so new architectures may rival raw compute gains in the long term.

By @GaryMarcus
ArXiv 2026-05-12 1 min read

EgoEV-HandPose: Egocentric 3D Hand Pose Estimation and Gesture Recognition with Stereo Event Cameras

Why it matters

EgoEV-HandPose introduces KeypointBEV, a stereo fusion module that lifts features into a canonical bird's-eye-view and uses an iterative reprojection-guided refinement loop to resolve depth uncertainty and enforce kinematic consistency for egocentric bimanual 3D hand pose and gesture estimation.

Key details

  • The authors collected EgoEVHands, the first large-scale real-world stereo event-camera egocentric hand dataset: 5,419 annotated sequences with dense 3D/2D keypoints across 38 gesture classes under varying illumination, to be released with code.
  • EgoEV-HandPose achieves state-of-the-art results: MPJPE = 30.54 mm and Top-1 gesture accuracy = 86.87%, significantly outperforming RGB stereo and prior event-camera methods, especially in low-light and bimanual occlusion scenarios.

Brief

EgoEV-HandPose tackles egocentric 3D bimanual hand-pose estimation and gesture recognition from stereo event cameras by introducing KeypointBEV, which lifts stereo features into a bird's-eye-view and iteratively reprojection-refines depth and kinematic estimates. Trained and evaluated on the new EgoEVHands dataset (5,419 sequences, 38 gestures), it reports MPJPE 30.54 mm and 86.87% Top-1 accuracy, outperforming RGB-stereo and prior event-based methods, notably under low-light and occlusion.

Authors: Luming Wang, Hao Shi, Jiajun Zhai...
e.economist.com 2026-05-12 8 min read

The World in Brief: Keir Starmer vows to stay on

Why it matters

More than eighty Labour MPs called for Prime Minister Sir Keir Starmer to quit after poor local-election results; minister Miatta Fahnbulleh and four junior ministers resigned and 30‑year gilt yields touched their highest level since 1998.

Key details

  • US–Iran tensions and an energy shock pushed Brent crude to about $105/barrel; Donald Trump said the ceasefire was 'on massive life support' while Iran defended a counter‑proposal that includes ending America’s blockade on its ports.
  • US grocery prices are under pressure: food is roughly one‑third more expensive than before the pandemic, April CPI data are due Tuesday, tomatoes are up ~25% year‑on‑year, and higher fertiliser, fuel, plastic (packaging) and transport costs threaten further inflation (transport uses up to half of supply‑chain oil).
  • Defence and industrial strains: US secretary of war Pete Hegseth proposed raising Pentagon spending by over 40% to about $1.5trn in 2027 (the Iran war has cost at least $25bn); Samsung faces nearly 40,000 workers who could stage an 18‑day strike after unions demanded removing bonus caps and 15% of chip‑division operating profits (≈$34bn based on 2026 projections).

Brief

The Economist's World in Brief surveys mounting political, economic and supply‑chain stresses. Labour’s leadership is in crisis after a local‑election rout: more than 80 MPs sought Keir Starmer’s resignation, Miatta Fahnbulleh and four junior ministers resigned, and 30‑year gilt yields hit their highest level since 1998. Tensions with Iran have sent Brent to about $105/barrel and complicated ceasefire talks, with Donald Trump describing the ceasefire as 'on massive life support'. In the US, consumer‑price risks loom: food prices are roughly one‑third above pre‑pandemic levels, April CPI data are imminent, tomatoes are ~25% more expensive year‑on‑year, and higher fertiliser, fuel, plastic and transport costs (transport accounts for up to half of supply‑chain oil use) threaten further grocery inflation. Meanwhile, defence spending and industrial disputes are heating up: a proposed rise of over 40% in Pentagon spending, to about $1.5trn in 2027, and a potential 18‑day Samsung strike involving nearly 40,000 workers could both have global economic impact.

By The Economist
ArXiv 2026-05-12 1 min read

Approximation Theory of Laplacian-Based Neural Operators for Reaction-Diffusion System

Why it matters

The authors derive explicit approximation error bounds for the solution operator mapping initial conditions to time-dependent solutions of a generalized Gierer–Meinhardt reaction–diffusion system, expressed in terms of network depth, width, and spectral rank.

Key details

  • By exploiting the Laplacian eigenfunction (spectral) representation of the PDE Green's function, the paper proves required parameter complexity grows at most polynomially with target accuracy (alleviating a curse of parametric complexity) and reports numerical experiments that support the theoretical bounds. (Authors: Takashi Furuya, Ryo Ozawa, Jenn-Nan Wang; arXiv:2605.12025v1; published 2026-05-12.)

Brief

Laplacian-based neural operators are analyzed for the generalized Gierer–Meinhardt reaction–diffusion system: the paper obtains explicit approximation-error bounds depending on network depth, width, and spectral rank by using the Laplacian eigenfunction expansion of the PDE Green’s function. The authors show parameter complexity scales at most polynomially with accuracy and present numerical experiments consistent with theory. Summary based on the abstract; full text not reviewed.

Authors: Takashi Furuya, Ryo Ozawa, Jenn-Nan Wang
substack.com 2026-05-12 8 min read

Shmoderation is the future

Why it matters

Matthew Yglesias (Slow Boring) on May 12, 2026 proposes the label “shmoderation” as a rebrand for eclectic voters who mix progressive and conservative positions and to shift emphasis from the bland label “moderate” to a problem‑solving political identity.

Key details

  • He points to the House Problem Solvers Caucus, co‑chaired by Rep. Brian Fitzpatrick (R‑PA) and Rep. Tom Suozzi (D‑NY), as a working example of shmoderation; Yglesias notes Suozzi votes with Democrats most of the time but has broken with the party on gender self‑identification in sports and voted for the Laken Riley Act.
  • Yglesias cites Astead Herndon and Amanda Litman on the prevalence of voters without cohesive ideologies and invokes analysts G. Elliott Morris and Lakshya Jain to explain GOP conformity to Trump, primary incentives, and why many Trump disapprovers still don’t back Democrats.
  • He argues the electoral case for shmoderation is courting ‘cross‑pressured’ or ‘closeted’ Republicans and highlights policy patterns (e.g., minimum‑wage initiatives winning in red states vs. affirmative‑action measures failing in California), while warning progressive skepticism about authenticity could limit adoption.

Brief

Yglesias argues on May 12, 2026 that the practical project of winning voters who hold a mishmash of views should be reframed as “shmoderation” rather than the staid label “moderate.” He recommends a problem‑solving, eclectic political brand exemplified by the House Problem Solvers Caucus (co‑chaired by Rep. Brian Fitzpatrick and Rep. Tom Suozzi) and documents Suozzi’s mixed record — typically voting with Democrats but breaking with the party on issues such as gender self‑identification in sports and supporting the Laken Riley Act — as the kind of electoral performer this approach rewards. Drawing on Astead Herndon and Amanda Litman, Yglesias emphasizes that many voters lack cohesive ideologies; he also uses G. Elliott Morris and Lakshya Jain’s analyses to explain why GOP members stick with Trump (primary incentives) and why Trump disapprovers don’t uniformly back Democrats. He concludes the strategy rests on courting cross‑pressured voters, but notes progressive concerns about authenticity could constrain a full party pivot.

By Matthew Yglesias
ArXiv 2026-05-12 1 min read

The paper presents a complete real-time whole-body teleoperation pipeline that…

Why it matters

The paper presents a complete real-time whole-body teleoperation pipeline that maps a Virdyn IMU-based full-body motion-capture suit directly onto a Unitree G1 humanoid; validated first in MuJoCo (sim2sim) and then deployed without modification on the real robot (sim2real), reproducing walking, standing, sitting, turning, bowing, and coordinated expressive gestures with stable, synchronized performance.

Key details

  • The system uses a custom motion-processing, kinematic-retargeting, and control pipeline engineered for continuous, low-latency operation with no offline buffering or learning-based components; authored by Hamza Ahmed Durrani and Suleman Khan (arXiv:2605.12347v1, 2026-05-12; 8 pages, 4 figures).

Brief

The paper tackles low-latency, whole-body humanoid teleoperation by mapping Virdyn IMU suit data to a Unitree G1 using a custom motion-processing, kinematic-retargeting, and control pipeline that avoids offline buffering and learning-based modules. Validated in MuJoCo then transferred unchanged to the physical robot, the system reportedly achieves stable, synchronized reproduction of a wide motion repertoire; summary based on the abstract and metadata.

Authors: Hamza Ahmed Durrani, Suleman Khan
ArXiv 2026-05-12 1 min read

TMRL: Diffusion Timestep-Modulated Pretraining Enables Exploration for Efficient Policy Finetuning

Why it matters

TMRL (Timestep-Modulated Reinforcement Learning) together with Context-Smoothed Pre-training (CSP) injects forward-diffusion noise into policy inputs to bridge BC pretraining and RL fine-tuning; authors report successful real-world fine-tuning on complex manipulation tasks in under one hour (Hong et al., arXiv 2026-05-12).

Key details

  • TMRL trains agents to modulate the diffusion timestep during fine-tuning to explicitly control exploration, integrates with arbitrary inputs (states, 3D point clouds, image-based VLA policies), and improves RL fine-tuning sample efficiency; code and videos available at the project page.

Brief

TMRL and Context-Smoothed Pre-training (CSP) inject forward-diffusion noise into policy inputs during pretraining to create a continuum from precise imitation to broad action coverage, then train agents to modulate the diffusion timestep during RL fine-tuning to control exploration. The method works with states, 3D point clouds, and visual policies and enables sub-hour real-world manipulation fine-tuning; full paper and code on arXiv and project site.

Authors: Matthew M. Hong, Jesse Zhang, Anusha Nagabandi...
ArXiv 2026-05-12 1 min read

Model-based Bootstrap of Controlled Markov Chains

Why it matters

Introduces a model-based bootstrap for transition kernels in finite controlled Markov chains (CMCs) that is distributionally consistent in both the single long-chain regime and the episodic offline RL regime; technical contributions include a bootstrap law of large numbers for visitation counts and a martingale CLT for bootstrap transition increments.

Key details

  • Extends bootstrap consistency to downstream offline policy evaluation (OPE) and optimal policy recovery (OPR) via the delta method by verifying Hadamard differentiability of Bellman operators, yielding asymptotically valid confidence intervals for value and Q-functions.
  • Empirical results on RiverSwim (Ziwei Su, Imon Banerjee, Diego Klabjan; arXiv 2026-05-12) show percentile bootstrap CIs outperform episodic bootstrap and plug-in CLT CIs, often achieving near-nominal 50%, 90%, and 95% coverage, while baselines are poorly calibrated for small sample sizes and short episodes (paper: 45 pages, 7 figures, 19 tables).

Brief

The paper develops a model-based bootstrap for transition kernels in finite controlled Markov chains with possibly nonstationary or history-dependent policies, proving distributional consistency in both long-chain and episodic offline RL regimes. Using a novel bootstrap LLN and a martingale CLT, the authors extend results to OPE and OPR via Hadamard-differentiable Bellman operators, producing asymptotically valid CIs; RiverSwim experiments show strong empirical calibration.
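
As a rough illustration of what a model-based bootstrap with percentile intervals can look like for a small tabular problem, the sketch below fits a transition kernel from counts, redraws next-state counts from the fitted kernel, and recomputes a policy's value each time. It is a simplified sketch under assumptions, not the paper's procedure: it keeps the visit counts per state-action pair fixed instead of regenerating whole trajectories, and the toy MDP, the function names, and the uniform fallback for unvisited pairs are hypothetical.

```python
import numpy as np

def policy_value(P, r, pi, gamma=0.9):
    """Solve V = r_pi + gamma * P_pi V. Shapes: P (S, A, S), r (S, A), pi (S, A)."""
    P_pi = np.einsum("sa,sat->st", pi, P)
    r_pi = (pi * r).sum(axis=1)
    return np.linalg.solve(np.eye(P.shape[0]) - gamma * P_pi, r_pi)

def estimate_kernel(counts):
    """Row-normalize transition counts; unvisited (s, a) pairs fall back to uniform."""
    n_sa = counts.sum(axis=2, keepdims=True)
    uniform = np.full_like(counts, 1.0 / counts.shape[2], dtype=float)
    return np.where(n_sa > 0, counts / np.maximum(n_sa, 1), uniform)

def bootstrap_value_ci(counts, r, pi, state=0, n_boot=200, level=0.9, seed=0):
    """Percentile interval for V(state) using kernels resampled from the fitted model."""
    rng = np.random.default_rng(seed)
    P_hat = estimate_kernel(counts)
    S, A, _ = counts.shape
    values = np.empty(n_boot)
    for b in range(n_boot):
        boot = np.zeros_like(counts)
        for s in range(S):
            for a in range(A):
                n = int(counts[s, a].sum())
                boot[s, a] = rng.multinomial(n, P_hat[s, a])  # redraw next states
        values[b] = policy_value(estimate_kernel(boot), r, pi)[state]
    tail = (1.0 - level) / 2.0
    lo, hi = np.quantile(values, [tail, 1.0 - tail])
    return float(lo), float(hi)

# Toy data: 3 states, 2 actions, 50 observed transitions per (s, a) pair.
rng = np.random.default_rng(1)
P_true = rng.dirichlet(np.ones(3), size=(3, 2))
counts = np.array([[rng.multinomial(50, P_true[s, a]) for a in range(2)] for s in range(3)])
r = rng.uniform(size=(3, 2))
pi = np.full((3, 2), 0.5)
lo, hi = bootstrap_value_ci(counts, r, pi)
```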

Authors: Ziwei Su, Imon Banerjee, Diego Klabjan
ArXiv 2026-05-12 1 min read

Online Learning-to-Defer with Varying Experts

Why it matters

Presents the first online Learning-to-Defer (L2D) algorithm for multiclass classification with bandit feedback and a dynamically varying pool of experts; proves regret bounds O((n + n_e) T^{2/3}) in the general case and O((n + n_e) √T) under a low-noise condition (T = time horizon, n = number of labels, n_e = distinct experts observed).

Key details

  • Analysis combines novel H-consistency bounds for the online setting with first-order online convex optimization methods; experiments on synthetic and real-world datasets show the approach handles changing expert availability and reliability effectively.

Brief

The paper introduces the first online L2D algorithm for multiclass classification with bandit feedback and a dynamically varying expert pool, addressing streaming data and shifting expert availability. It achieves regret O((n + n_e) T^{2/3}) generally and O((n + n_e) √T) under low noise, relying on new H-consistency bounds and first-order online convex optimization; experiments validate practicality. Summary based on the abstract (full text not available).

Authors: Dang Hoang Duy, Yannis Montreuil, Maxime Meyer...
ArXiv 2026-05-12 1 min read

Morphologically Equivariant Flow Matching for Bimanual Mobile Manipulation

Why it matters

Siebenborn et al. (preprint posted 2026-05-12) formalize bilateral morphological symmetry for bimanual mobile manipulators, proving optimal policies are ambidextrous and equivariant under reflections across the robot's sagittal plane.

Key details

  • They introduce a C2-equivariant flow matching policy that enforces reflective symmetry either via a regularized training loss or by using an equivariant velocity network.
  • Empirically, across planar and 6-DoF mobile-manipulation tasks the symmetry-informed policies consistently improved sample efficiency and achieved zero-shot generalization to mirrored configurations absent from training; zero-shot transfer was validated on a TIAGo++ robot (preprint: 4 pages, 5 figures).

Brief

Siebenborn et al. formalize bilateral morphological symmetry in bimanual mobile manipulation and propose a C2-equivariant flow-matching policy that enforces reflection symmetry through loss regularization or an equivariant velocity network. On planar and 6-DoF tasks the method boosts sample efficiency and enables zero-shot generalization to mirrored states, with real-world TIAGo++ validation. Summary based on the abstract; full text not reviewed.
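
One generic way to obtain a reflection-equivariant velocity field is to average an unconstrained network over the two-element group C2. The sketch below illustrates that construction only; it is not claimed to be the architecture used in the paper. The four-dimensional state layout, the matrix `R`, and the stand-in network `raw` are hypothetical, and a real flow-matching policy would also condition on time and observations.

```python
import numpy as np

# Hypothetical state layout: [left_x, left_y, right_x, right_y].
# Reflecting across the sagittal plane swaps the arms and flips the lateral (y) axis.
R = np.array([[0, 0, 1, 0],
              [0, 0, 0, -1],
              [1, 0, 0, 0],
              [0, -1, 0, 0]], dtype=float)

def symmetrize(velocity_fn):
    """Wrap a velocity field so that v(R x) = R v(x) holds exactly (R @ R = I)."""
    def equivariant(x):
        return 0.5 * (velocity_fn(x) + R @ velocity_fn(R @ x))
    return equivariant

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
raw = lambda x: np.tanh(W @ x)   # stand-in for an unconstrained velocity network
v = symmetrize(raw)
x = rng.normal(size=4)
```

Because the constraint is built into `v`, a mirrored input always produces the mirrored output, which is the property that would allow generalization to mirrored configurations without seeing them in training.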

Authors: Max Siebenborn, Daniel Ordoñez Apraez, Sophie Lueth...
ArXiv 2026-05-12 1 min read

Optimal Policy Learning under Budget and Coverage Constraints

Why it matters

Characterization (Cerulli, 2026-05-12): optimal policy under combined budget and minimum-coverage constraints has a knapsack-type structure and is given by an affine threshold rule in budget and coverage shadow prices; the LP relaxation has an O(1) integrality gap, implying asymptotic equivalence with the optimal discrete allocation.

Key details

  • Algorithms and empirical results: proposes Greedy-Lagrangian (GLC) and Rank-and-Cut (RC) procedures — GLC closely approximates the optimal solution and is near-optimal in finite samples; RC is approximately optimal when the coverage constraint is slack or costs are homogeneous, while misallocation occurs only when cost heterogeneity interacts with a binding coverage constraint; Monte Carlo evidence supports these findings.

Brief

Optimal policy learning under combined budget and minimum-coverage constraints is treated as a knapsack-type allocation problem; Cerulli (May 2026) proves the optimal rule is an affine threshold in budget and coverage shadow prices, shows an LP relaxation has an O(1) integrality gap, and evaluates two algorithms (GLC, RC), with Monte Carlo confirming near-optimal finite-sample performance and predictable failure modes.

Authors: Giovanni Cerulli
e.economist.com 2026-05-11 5 min read

Bartleby: Is the WFH debate settled?

Why it matters

Researchers Jose Maria Barrero, Nick Bloom and Steven Davis — surveying U.S. work-from-home (WFH) patterns since 2020 — find that by 2025 roughly 25% of paid working days were worked from home, over three times the pre‑pandemic rate and largely unchanged since 2023.

Key details

  • Among employees who do some remote work, 41% say they are more efficient at home and 46% see no difference; five out of six of those who feel more efficient cite time saved on commuting or 'grooming' as a reason.
  • An additional hour of commuting plus grooming time predicts a 6.4 percentage‑point increase in the share of the workweek people want to spend at home.
  • An Atlanta Federal Reserve manager survey shows managers’ views on WFH track their firms’ current remote‑work rates (more positive where remote work is already higher), supporting the researchers’ conclusion that firms and workers have largely self‑selected hybrid arrangements that are likely to persist.

Brief

Andrew Palmer’s Bartleby column (May 11, 2026) summarises recent evidence on commuting and remote work from long‑running surveys by Jose Maria Barrero, Nick Bloom and Steven Davis and complementary Atlanta Fed data. The academics — surveying since 2020 — report that about 25% of paid U.S. working days were WFH by 2025 (more than three times pre‑Covid) and that this level has been stable since 2023. Among hybrid workers 41% say they are more productive at home, 46% see no change, and most who favour home cite saved commuting/grooming time; empirically, an extra hour of commute/grooming predicts a 6.4 percentage‑point rise in desired time at home. Manager attitudes correlate with firms’ WFH rates, suggesting mutual selection and a durable shift toward hybrid work, albeit with task‑specific and boundary benefits to commuting.

By Andrew Palmer at The Economist
ArXiv 2026-05-12 1 min read

TriBand-BEV: Real-Time LiDAR-Only 3D Pedestrian Detection via Height-Aware BEV and High-Resolution Feature Fusion

Why it matters

TriBand-BEV is a LiDAR-only method that encodes the full 3D point cloud into a lightweight 2D BEV tensor with three explicit height bands, reformulates 3D detection as 2D detection, and reconstructs oriented 3D boxes so cars, pedestrians, and cyclists are detected in one pass.

Key details

  • On KITTI, TriBand-BEV achieves pedestrian BEV AP of 58.7 / 52.6 / 47.2 (easy / moderate / hard) at 49 FPS on a single consumer GPU, outperforming Complex-YOLO by +12.6%, +7.5%, and +3.1%, respectively.
  • Architecture and training details: backbone uses area attention, a hierarchical bidirectional neck over P1–P4 fuses context and detail; head employs distribution focal learning for side offsets and a rotated IoU loss; training uses a small vertical re-bin and mild reflectance jitter, and an IQR filter removes noisy LiDAR points; code is available on GitHub and the work is accepted to AAMAS 2026.

Brief

TriBand-BEV introduces a fast LiDAR-only 3D pedestrian detector that encodes the full point cloud into a lightweight 2D BEV tensor with three height bands, reformulating 3D detection as 2D detection and reconstructing boxes post-hoc. Using area attention, a hierarchical bidirectional neck (P1–P4), and a distribution-focal rotated-IoU head, it reaches 58.7/52.6/47.2 BEV AP on KITTI at 49 FPS; code is public.
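
To make the three-band encoding concrete, here is a minimal sketch of rasterizing a point cloud into a 3-channel bird's-eye-view tensor, one channel per height band. The grid extents, cell size, band edges, and the choice of binary occupancy as the cell value are assumptions made for illustration; the summary does not state what the paper stores per cell.

```python
import numpy as np

def triband_bev(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0),
                z_edges=(-2.0, -0.5, 0.5, 2.0), cell=0.1):
    """Rasterize an (N, 3) point cloud into a (3, H, W) occupancy tensor.

    Channel k is 1.0 where at least one point falls in that cell and in
    height band [z_edges[k], z_edges[k + 1]). All ranges are hypothetical.
    """
    H = int(round((x_range[1] - x_range[0]) / cell))
    W = int(round((y_range[1] - y_range[0]) / cell))
    bev = np.zeros((3, H, W), dtype=np.float32)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    row = np.floor((x - x_range[0]) / cell).astype(int)
    col = np.floor((y - y_range[0]) / cell).astype(int)
    band = np.digitize(z, z_edges) - 1   # 0, 1, 2 inside the edges; -1 or 3 outside
    keep = (row >= 0) & (row < H) & (col >= 0) & (col < W) & (band >= 0) & (band < 3)
    bev[band[keep], row[keep], col[keep]] = 1.0
    return bev

points = np.array([
    [1.05, 0.05, -1.0],   # low band
    [1.05, 0.05, 0.0],    # middle band
    [1.05, 0.05, 1.0],    # high band
    [100.0, 0.0, 0.0],    # outside the grid, dropped
    [5.0, 5.0, 3.0],      # above the top band, dropped
])
bev = triband_bev(points)
```

The resulting tensor has the shape of a 3-channel image, which is what lets a 2D detector consume it directly.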

Authors: Mohammad Khoshkdahan, Alexey Vinel
Twitter/X 2026-05-13 1 min read

Gary Marcus (posted 2026-05-13) told @METR_Evals to plot how “task horizon” falls…

Why it matters

Gary Marcus (posted 2026-05-13) told @METR_Evals to plot how “task horizon” falls off as the accuracy criterion increases directly on the main graph rather than across tabs to improve clarity.

Key details

  • He recommended drawing lines for multiple thresholds (e.g., the 50% criterion and up to 80%, 90%, and 100%) on the same plot rather than in separate tabs.
  • Marcus insisted the graph title must explicitly state the evaluated tasks are software engineering (not a random sample of human tasks); this responds to Yafah Edelman’s critique that the current METR time-horizon visualization is “pretty bad.”

Brief

Gary Marcus (2026-05-13) urged @METR_Evals to redesign their time-horizon graph so that it shows, directly on the plot, how task horizon declines as accuracy requirements rise, with lines for the 50% threshold and up to 80%, 90%, and 100%. He also asked that the title state explicitly that the tasks are software engineering, echoing Yafah Edelman’s critique of the current visualization.

By @GaryMarcus
Twitter/X 2026-05-13 1 min read

@0xSero posted on 2026-05-13 that he received a $100,000 grant from the Human…

Why it matters

@0xSero posted on 2026-05-13 that he received a $100,000 grant from the Human Rights Foundation (HRF).

Key details

  • He lists additional support: $25.8K via his donations site, $25K in Brev credits from Nvidia, four B200s for one month, $5K from Lambda, and four RTX PRO 6000 GPUs from a private donor.
  • He says 10 years ago he was homeless and addicted to multiple substances, calls this outcome "the icing on top of the most amazing life I could have imagined," and declares, "Open source must win."

Brief

@0xSero posted on 2026-05-13 that he received a $100,000 grant from the Human Rights Foundation and additional support: $25.8K in donations, $25K in Brev credits from Nvidia, four B200s for a month, $5K from Lambda, and four RTX PRO 6000 GPUs from a private donor. He contrasts this with being homeless and addicted ten years ago and proclaims, "Open source must win."

By @0xSero
Twitter/X 2026-05-13 1 min read

@0xSero announced on 2026-05-13 that they received a grant from the Human Rights…

Why it matters

@0xSero announced on 2026-05-13 that they received a grant from the Human Rights Foundation's "AI for Individual Rights Fund," which awarded 10 new grants.

Key details

  • HRF grantees include The Ark (AI assistant in East Africa charging per-query via Bitcoin Lightning, no cards/banks/subscriptions), Freedom Skills (pre-written code to teach AI agents Bitcoin payments and Nostr messaging), and Open Anonymity Project (VPN for anonymous ChatGPT/Claude inference).
  • @0xSero's own project aims to compress state-of-the-art LLMs to run locally on laptops and phones for private, offline use in surveillance states; another grantee, Maple AI, proposes an end-to-end encrypted assistant with no data stored.

Brief

@0xSero announced they received a grant from the Human Rights Foundation's AI for Individual Rights Fund (10 grants announced on 2026-05-13). Funded projects include The Ark (pay-per-query AI via Bitcoin Lightning), Freedom Skills (Bitcoin/Nostr agent code), Open Anonymity Project (VPN for anonymous inference), @0xSero's local LLM compression, and Maple AI (E2E encrypted assistant).

By @0xSero
ArXiv 2026-05-12 1 min read

TextSeal: A Localized LLM Watermark for Provenance & Distillation Protection

Why it matters

TextSeal is a localized LLM watermark (arXiv 2026-05-12) that uses Gumbel-max sampling with a dual-key generation scheme, entropy-weighted scoring, and multi-region localization to restore output diversity and improve detection; it supports speculative decoding and multi-token prediction with no added inference overhead.

Key details

  • TextSeal strictly outperforms baselines such as SynthID-text in detection strength, is robust to dilution (maintaining confident localized detection in heavily mixed human/AI documents), is provably distortion-free, and its watermark transfers through model distillation; a multilingual human evaluation (6,000 A/B comparisons across 5 languages) found no perceptible quality difference.

Brief

TextSeal presents a practical, localized watermark for LLM outputs that combines Gumbel-max sampling, dual-key generation, entropy-weighted scoring, and multi-region localization to preserve diversity while enabling strong provenance detection. The method adds no inference cost, supports serving optimizations like speculative decoding, strictly dominates prior baselines (e.g., SynthID-text), is robust to dilution, transfers through distillation, and a 6,000 A/B multilingual study (5 languages) reported no perceptible quality change. Full paper text was not available in the provided content.
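
For readers unfamiliar with Gumbel-max watermarking, the sketch below shows only the basic building block that this family of schemes rests on: the next token is chosen as the argmax of log-probabilities plus Gumbel noise, where the noise is derived from a secret key and the preceding tokens. A detector holding the key can recompute the noise and check that the chosen tokens line up with it. It does not implement TextSeal's dual-key generation, entropy-weighted scoring, or multi-region localization. The hashing scheme, the window size, the key strings, and the toy "model" that emits random distributions are all hypothetical.

```python
import hashlib
import numpy as np

def keyed_uniforms(key, context, vocab_size):
    """Pseudo-random uniforms in (0, 1), reproducible from the key and preceding tokens."""
    digest = hashlib.sha256(f"{key}|{context}".encode()).digest()
    seed = int.from_bytes(digest[:8], "big")
    return np.random.default_rng(seed).uniform(1e-12, 1.0, size=vocab_size)

def sample_token(probs, key, context):
    """Gumbel-max sampling: argmax of log p_i plus keyed Gumbel noise."""
    u = keyed_uniforms(key, context, len(probs))
    gumbel = -np.log(-np.log(u))
    return int(np.argmax(np.log(probs) + gumbel))

def detection_score(tokens, key, vocab_size, window=2):
    """Mean of -log(1 - u_token): about 1.0 without the matching key, larger with it."""
    scores = []
    for i, tok in enumerate(tokens):
        context = tuple(tokens[max(0, i - window):i])
        u = keyed_uniforms(key, context, vocab_size)
        scores.append(-np.log(1.0 - u[tok]))
    return float(np.mean(scores))

VOCAB, WINDOW = 50, 2
model_rng = np.random.default_rng(0)
tokens = []
for _ in range(200):
    probs = model_rng.dirichlet(np.ones(VOCAB))   # toy stand-in for a model's next-token distribution
    context = tuple(tokens[-WINDOW:])
    tokens.append(sample_token(probs, "secret-key", context))

score_with_key = detection_score(tokens, "secret-key", VOCAB, WINDOW)
score_wrong_key = detection_score(tokens, "other-key", VOCAB, WINDOW)
```

Averaged over the keyed noise, the argmax follows the model's own distribution, which is the sense in which Gumbel-max schemes are usually called distortion-free.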

Authors: Tom Sander, Hongyan Chang, Tomáš Souček...
nl.technologyadvice.com 2026-05-13 7 min read

Your Messaging Sounds Like It’s Afraid of Commitment

Why it matters

Safe, noncommittal messaging increases buyer indecision and forces buyers to do the positioning work themselves — 'the moment messaging tries to appeal to everyone, the buyer has to do the positioning work themselves,' says John Ravaris (Founder, UVPsolutions).

Key details

  • Business impacts include broader but lower-quality interest: longer sales cycles, inconsistent expectations, more stalled deals, and weaker pipeline qualification.
  • Concrete remedies: train reps to anchor conversations around the single operational problem you solve best; add one clear exclusion statement; replace a vague value prop with a measurable outcome (example given: 'Reduce lead routing delays by 40%').
  • Article published May 13, 2026 in Selling Signals, authored by Bianca Caballero, and grounded in sales examples (discovery-call behavior) and expert input from John Ravaris.

Brief

Safe, noncommittal B2B messaging — phrases like 'we help businesses of all sizes' or 'built for every team' — creates buyer anxiety and slows decisions, argues Bianca Caballero (Selling Signals, May 13, 2026). Using sales-call examples and expert commentary from John Ravaris (Founder, UVPsolutions), the piece shows that vague positioning shifts the cognitive load onto buyers and forces reps to over-present, which produces buyer fatigue and tabs-of-information rather than clarity. The downstream effects are measurable: longer sales cycles, inconsistent expectations, more stalled deals, and weaker pipeline qualification. Practical, testable fixes include training reps to diagnose and anchor on the single operational problem you solve best, adding an explicit exclusion statement to marketing, and swapping one soft claim for a concrete outcome (e.g., 'reduce lead routing delays by 40%'). The article urges purposeful exclusion: clear positioning helps the right buyers self-identify and wrong fits self-select out.

By Selling Signals
ArXiv 2026-05-12 1 min read

Revisiting Photometric Ambiguity for Accurate Gaussian-Splatting Surface Reconstruction

Why it matters

AmbiSuR (Jiahe Li et al., arXiv 2026-05-12; accepted at ICML 2026) is a Gaussian‑Splatting–based framework that targets photometric ambiguities in differentiable surface reconstruction.

Key details

  • The paper identifies two primitive‑wise ambiguities in Gaussian splatting and an intrinsic 'ambiguity self‑indication' potential; it introduces photometric disambiguation to constrain ill‑posed geometry and an ambiguity‑indication module to detect and correct underconstrained regions.
  • Authors report extensive experiments showing superior surface reconstructions across challenging scenarios and broad compatibility; project page: https://fictionarry.github.io/AmbiSuR-Proj/ (PDF: https://arxiv.org/pdf/2605.12494v1).

Brief

AmbiSuR revisits Gaussian Splatting to improve photometric‑ambiguity‑robust 3D surface reconstruction. The authors uncover two primitive‑wise ambiguities and an intrinsic self‑indication ability in the representation, then introduce photometric disambiguation and an ambiguity‑indication module to constrain and correct geometry. Experiments reportedly yield superior reconstructions across challenging scenes; paper on arXiv (2026-05-12) and accepted at ICML 2026.

Authors: Jiahe Li, Jiawei Zhang, Xiao Bai...
ArXiv 2026-05-12 1 min read

Multi-Variable Conformal Prediction: Optimizing Prediction Sets without Data Splitting

Why it matters

Multi-Variable Conformal Prediction (MCP) extends conformal prediction to vector-valued score functions and multiple simultaneous calibration variables, removing the need for data splitting while retaining finite-sample coverage guarantees (Lützow et al., arXiv:2605.12341v1, published 2026-05-12).

Key details

  • The paper presents two practical algorithms: RemMCP (constrained optimization with constraint removal), which generalizes split conformal, and RelMCP (iterative optimization with constraint relaxation), which handles non-convex score functions at the cost of potentially greater conservatism.
  • Empirical tests on ellipsoidal and multi-modal prediction sets show RemMCP and RelMCP meet target coverage and produce prediction-set sizes smaller than or comparable to split-baseline methods, with substantially reduced variance across calibration runs due to joint shape optimization and calibration.

Brief

Multi-Variable Conformal Prediction (MCP) tackles the limitation of conventional conformal methods that use a scalar score and single threshold by allowing vector-valued scores and multiple calibration variables. Using scenario theory, MCP unifies prediction-set design and calibration into one optimization problem (no data split) and provides finite-sample coverage. Two variants, RemMCP and RelMCP, trade off convexity assumptions and conservatism; experiments on ellipsoidal and multi-modal sets show target coverage, smaller/comparable set sizes, and lower calibration variance. Full text on arXiv.
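For reference, the conventional baseline that MCP generalizes (a scalar score with a single threshold, calibrated on a held-out split) can be sketched as below. This is standard split conformal prediction, not RemMCP or RelMCP, and the function names are illustrative assumptions.

```python
import math


def split_conformal_threshold(cal_scores, alpha):
    """Conformal quantile of held-out nonconformity scores.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest score, which gives
    coverage of at least 1 - alpha for exchangeable data.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: the set is unbounded.
        return math.inf
    return sorted(cal_scores)[k - 1]


def prediction_interval(point_prediction, threshold):
    """Interval for the absolute-residual score |y - prediction|."""
    return (point_prediction - threshold, point_prediction + threshold)
```

For example, with 19 calibration scores and `alpha = 0.1`, the threshold is the 18th smallest score. The data split is visible in the signature: `cal_scores` must come from points that were not used to fit the predictor, which is the requirement MCP is reported to remove.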

Authors: Laura Lützow, Simone Garatti, Marco C. Campi...
ArXiv 2026-05-12 1 min read

FuTCR: Future-Targeted Contrast and Repulsion for Continual Panoptic Segmentation

Why it matters

FuTCR (Future-Targeted Contrast and Repulsion) achieves up to 28% relative improvement in new-class panoptic quality and preserves or improves base-class performance by up to 4% across experiments reported in the abstract (Ikechukwu et al., arXiv 2026-05-12).

Key details

  • FuTCR discovers confident 'future-like' unlabeled regions by grouping model-predicted masks whose pixels are labeled background but show non-background logits, then applies pixel-to-region contrast to build prototypes and repels background features from known-class prototypes to reserve representational space for new categories; evaluated across six CPS settings and multiple dataset sizes.

Brief

FuTCR (Future-Targeted Contrast and Repulsion) tackles Continual Panoptic Segmentation by preventing the collapse of diverse unlabeled objects into a single background representation. The method groups predicted masks with background labels but non-background logits to find future-like regions, uses pixel-to-region contrast to form coherent prototypes, and repels background features from known-class prototypes. According to the abstract, FuTCR yields up to 28% relative gains on new-class panoptic quality while maintaining or improving base-class performance (up to 4%), evaluated across six CPS settings and varied dataset sizes.
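The region-discovery step described above can be illustrated with a toy sketch. It is only one plausible reading of the abstract, not the authors' implementation: images are flattened to per-pixel lists, the predicted class stands in for "non-background logits", and `future_like_masks`, `min_fraction`, and the `BACKGROUND` label are hypothetical.

```python
BACKGROUND = 0  # assumed label id for background pixels


def future_like_masks(gt_labels, pred_classes, mask_ids, min_fraction=0.8):
    """Return ids of predicted masks that look like unlabeled future objects.

    A pixel counts as future-like when the ground truth says background
    but the model predicts a non-background class. A mask is kept when at
    least `min_fraction` of its pixels are future-like.
    """
    totals, hits = {}, {}
    for gt, pred, mask in zip(gt_labels, pred_classes, mask_ids):
        if mask is None:
            continue  # pixel not covered by any predicted mask
        totals[mask] = totals.get(mask, 0) + 1
        if gt == BACKGROUND and pred != BACKGROUND:
            hits[mask] = hits.get(mask, 0) + 1
    return sorted(m for m, n in totals.items()
                  if hits.get(m, 0) / n >= min_fraction)
```

The selected masks would then feed the contrast and repulsion terms, which are not sketched here.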

Authors: Nicholas Ikechukwu, Keanu Nichols, Deepti Ghadiyaram...
ArXiv 2026-05-12 1 min read

MEME: Multi-entity & Evolving Memory Evaluation

Why it matters

MEME introduces six memory-evaluation tasks across the multi-entity and evolving axes (including three tasks not previously scored: Cascade, Absence, and Deletion) and evaluates six memory systems across three paradigms on 100 controlled episodes.

Key details

  • Systems fail at dependency reasoning: average accuracy under the default configuration was 3% on Cascade and 1% on Absence, despite adequate static retrieval performance.
  • Mitigations (prompt tuning, deeper retrieval, less filler noise, stronger LLMs) largely do not close the gap; only a file-based agent paired with Claude Opus 4.7 partially recovers performance, but at ~70× the baseline cost. Code and data: https://seokwonjung-jay.github.io/meme-eval/.

Brief

MEME (Multi-entity & Evolving Memory Evaluation) targets LLM-agent failures when storing, updating, and reasoning about many entities across sessions. The benchmark defines six tasks (including Cascade, Absence, Deletion) and tests six memory systems across three paradigms on 100 controlled episodes. Results show catastrophic collapse on dependency reasoning (Cascade 3%, Absence 1%), and only an expensive file-based agent paired with Claude Opus 4.7 partially closes the gap, at roughly 70× the baseline cost, highlighting a cost-performance tradeoff.

Authors: Seokwon Jung, Alexander Rubinstein, Arnas Uselis...
substack.com 2026-05-13 4 min read

Richard Dawkins Gets Hypnotized by a Stochastic Parrot: Laugh of the Day

Why it matters

Brad DeLong published a Substack post on May 13, 2026 titled “Richard Dawkins Gets Hypnotized by a Stochastic Parrot” relaying that Richard Dawkins held a conversation with the chatbot “Claude” and suggested Claude showed “some form of inner life” (source: Mike Hall piece cited in the post).

Key details

  • Dan Davies (cited in DeLong’s post) argues in “The Machine Is Designed To Fool You” that modern chatbots are explicitly engineered to simulate human conversation—he cites “three quarters of a century of research” and “nearly three decades” of global competitions where prizes rewarded systems that fooled people, making Turing-style fooling metrics less useful.
  • Davies and other critics contend these behaviors are KPI-driven ‘tricks of the trade’ (e.g., producing emotional connection or simulated ‘flow’); they warn that perceived intentionality is a design outcome intended to fool users, not evidence of consciousness.
  • DeLong frames the exchange as a humorous cautionary example and forwards Davies’ critique (which satirically calls out attention-hacking design and includes a tongue-in-cheek Miskatonic University affiliation).

Brief

Brad DeLong’s May 13, 2026 Substack post relays criticism of Richard Dawkins’ claim that the chatbot Claude exhibits consciousness, citing Mike Hall’s report and a rebuttal by Dan Davies. Dawkins reportedly concluded Claude showed “some form of inner life” after a conversation; Davies counters that contemporary chatbots are deliberately optimised to simulate human interaction—what he calls a machine “designed to fool you.” Davies points to roughly 75 years of AI research and nearly 30 years of competitions awarding prizes for fooling humans, arguing those incentives and KPIs produce predictable conversational “tricks” rather than genuine intentionality. The post highlights how designers tune systems to elicit emotional connection or simulated flow, and warns that taking the intentional stance toward such systems mistakes engineered performance for consciousness.

By Brad DeLong, from Grasping Reality Newsletter
ArXiv 2026-05-12 1 min read

EgoForce: Forearm-Guided Camera-Space 3D Hand Pose from a Monocular Egocentric Camera

Why it matters

EgoForce (Millerdurai et al., arXiv 2026; SIGGRAPH 2026) is a monocular egocentric 3D hand reconstruction framework that recovers absolute camera-space hand pose across fisheye, perspective, and distorted wide-FOV head-mounted cameras using a single unified network combining a differentiable forearm representation, a unified arm–hand transformer, and a ray-space closed-form solver.

Key details

  • On three egocentric benchmarks—including HOT3D—EgoForce reports state-of-the-art camera-space 3D accuracy, reducing MPJPE by up to 28% on HOT3D versus prior methods, and maintains consistent performance across diverse camera configurations; code, data, and demo are available at the project page.

Brief

EgoForce tackles depth–scale ambiguity and device-specific generalization in monocular, head-mounted hand capture by fusing a differentiable forearm model, an arm–hand transformer that predicts geometry from a single egocentric view, and a ray-space closed-form solver to recover absolute camera-space 3D pose. The method works across fisheye, perspective, and wide-FOV optics and yields up to 28% MPJPE reduction on HOT3D, with code and data released.
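MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth joint positions. The sketch below shows the metric and how a relative reduction such as the reported "up to 28%" would be computed. The function names and any numbers passed to them are illustrative, not taken from the paper.

```python
import math


def mpjpe(pred_joints, gt_joints):
    """Mean Euclidean distance between matching 3D joints.

    Both arguments are equal-length sequences of (x, y, z) coordinates in
    the same frame and unit (for example camera space, in millimetres).
    """
    if not pred_joints or len(pred_joints) != len(gt_joints):
        raise ValueError("need two non-empty joint lists of equal length")
    distances = [math.dist(p, g) for p, g in zip(pred_joints, gt_joints)]
    return sum(distances) / len(distances)


def relative_reduction(baseline_error, new_error):
    """Fractional error reduction relative to a baseline."""
    return (baseline_error - new_error) / baseline_error
```

Because the metric here is taken in absolute camera space rather than after root alignment, errors in depth and scale count fully, which is why the depth-scale ambiguity matters.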

Authors: Christen Millerdurai, Shaoxiang Wang, Yaxu Xie...
ArXiv 2026-05-12 1 min read

The Algorithmic Caricature: Auditing LLM-Generated Political Discourse Across Crisis Events

Why it matters

Paired corpus of 1,789,406 posts across nine crisis events (COVID-19; Jan. 6 Capitol attack; 2020 and 2024 U.S. elections; Dobbs/Roe v. Wade; 2020 BLM protests; U.S. midterms; Utah shooting; U.S.–Iran war) used to compare observed vs. LLM-generated political discourse.

Key details

  • Across events, synthetic discourse is more negative, shows less sentiment dispersion, is structurally more regular (shorter-tailed distributions), and is lexically more abstract; observed discourse exhibits broader emotional variation, longer-tailed structural distributions, and more context-specific, colloquial markers.
  • Differences are event-dependent (larger for fast-moving, decentralized crises, smaller for formal/institutional events); authors (Gunjan, Sidahmed Benabderrahmane, Talal Rahwan; arXiv 2026-05-12) propose an event-level 'Caricature Gap' metric and argue population-level auditing complements sentence-level detectors.

Brief

The Algorithmic Caricature (Gunjan et al., arXiv 2026-05-12) evaluates whether LLM-generated political posts replicate real online populations by comparing a paired corpus of 1,789,406 posts across nine crisis events. It finds synthetic text is fluent but unrealistic at the population level (more negative, less sentiment-dispersed, structurally more regular, and lexically more abstract), with gaps that vary by event and are summarized by a proposed 'Caricature Gap' metric. Full text not available; summary based on abstract.

Authors: Gunjan, Sidahmed Benabderrahmane, Talal Rahwan
substack.com 2026-05-13 17 min read

Riding the Leopard

Why it matters

Packy McCormick delivered 'Riding the Leopard' as a talk on May 6, 2026 (published May 13, 2026) and framed the meaning of life as increasing the range and depth of experience; the essay was sent to his ~265,556 Not Boring subscribers and was first presented to ~80 people at The Mountain.

Key details

  • Core claim: 'differentiation is a moral obligation' — each person must become the irreducible, specific version of themselves so the universe gains new, surprising information that it could not otherwise obtain.
  • He connects this thesis to information theory: Claude Shannon's 1948 result that information is surprise (the 'bit') and John Wheeler's 'It from Bit' model, using them to argue that only imperfect, distinct observers create new information.
  • Contemporary context and data: McCormick opens by citing recent tech financings and deals (Sierra ~$15B raise; Anthropic ~$44B run rate and a $1.5B vehicle; OpenAI $4B fundraising; Long Lake’s $6.3B AmEx travel acquisition) and a reader's analysis of 200 sci‑fi books finding 59% concern meaning and 17% identity.

Brief

Packy McCormick's 'Riding the Leopard' is a spoken-to-written manifesto that stitches mysticism, philosophy, modern anecdotes, and information theory into a single practical claim: the purpose of human life is to expand the universe's repertoire of experience, and therefore people are morally and mathematically obligated to differentiate themselves. Drawing on the Upanishadic 'thou art that' and 'neti, neti', Joseph Campbell's 'Dionysus riding the leopard', Victor Frankl, Alan Watts, Rumi, and Alfred North Whitehead, McCormick argues that each unique, imperfect perspective lets an otherwise unobservable, perfect reality know and create itself.

He reinforces the argument with technical anchors: Claude Shannon's 1948 insight that information equals surprise and John Wheeler's 'It from Bit' participatory universe. Contemporary touchpoints — big AI and M&A deals (Sierra ~$15B, Anthropic ~$44B run rate and $1.5B vehicle, OpenAI $4B, Long Lake $6.3B AmEx travel purchase) and a reader's analysis of 200 sci‑fi novels (59% about meaning, 17% identity) — frame why this matters now. McCormick closes by likening human novelty to the valuable training signal in AI: laboratories pay for new data because only differentiated experience increases collective information. The practical implication for technologists and creatives is explicit: cultivate and contribute what only you can produce.

By Not Boring
ArXiv 2026-05-12 1 min read

OmniNFT: Modality-wise Omni Diffusion Reinforcement for Joint Audio-Video Generation

Why it matters

OmniNFT (Guohui Zhang et al., arXiv 2026-05-12) introduces a modality-aware online diffusion RL framework with three technical components: modality-wise advantage routing, layer-wise gradient surgery, and region-wise loss reweighting to improve joint audio–video generation.

Key details

  • The paper identifies three RL obstacles for joint audio–video generation: (i) multi-objective advantages inconsistency, (ii) multi-modal gradients imbalance (video-branch gradients leaking into shallow audio layers), and (iii) uniform credit assignment that overlooks fine-grained alignment regions.
  • Evaluated on JavisBench and VBench using the LTX-2 backbone, OmniNFT reportedly yields comprehensive improvements in audio and video perceptual quality, cross-modal alignment, and audio–video synchronization (details in paper/project page).

Brief

OmniNFT (Zhang et al., arXiv 2026-05-12) targets RL fine-tuning for joint audio–video generation by diagnosing three failure modes—advantages inconsistency, gradient imbalance, and uniform credit assignment—and proposing modality-wise advantage routing, layer-wise gradient surgery, and region-wise loss reweighting in an online diffusion-RL pipeline. Tests on JavisBench and VBench with the LTX‑2 backbone report improved per-modality quality, cross-modal alignment, and synchronization. Summary based on the abstract.

Authors: Guohui Zhang, XiaoXiao Ma, Jie Huang...
substack.com 2026-05-13 10 min read

2026-05-13: One of the Great Hinges of World History

Why it matters

Brad DeLong (Grasping Reality, 2026-05-13) argues Winston Churchill’s elevation to British Prime Minister (commissioned 10 May 1940) and his House of Commons speech of 13 May 1940 ('I have nothing to offer but blood, toil, tears and sweat') were decisive hinges that allowed Britain to hold out and enabled the Allied path to victory and the postwar liberal order.

Key details

  • DeLong cites German tank-production figures to challenge simplistic narratives about Stalingrad/Kursk: Nazi medium and heavy tank output to the end of 1942 was about 8,600 units, and thereafter production rose to about 29,500; Germany also lost roughly 1,000 tanks at Stalingrad and ~1,000 at Kursk, yet remained militarily dangerous afterward.
  • He stresses coalition politics: Clement Attlee and the British Labour Party pushed for a wartime administration that put Churchill at the head, and DeLong credits that political alignment with preventing an earlier collapse of British resistance.
  • DeLong connects the 1940 hinge to contemporary debates: he criticizes 2022 New York Review of Books pieces sympathetic to Vladimir Putin and rebukes commentators who equate Zelensky with Churchill or downplay the necessity of all three Allies (US, USSR, UK) in defeating Nazi Germany.

Brief

Winston Churchill’s accession to the premiership in early May 1940 and his House of Commons speech on 13 May 1940 are presented as a pivotal hinge in twentieth‑century history: Brad DeLong contends that Churchill’s resolve and the wartime coalition he led made British endurance possible, which in turn allowed the USSR to hold when Germany invaded in 1941 and gave the United States a staging ground for its European campaign. DeLong reproduces key lines from Churchill’s address and stresses the political role of the Labour Party under Clement Attlee in ensuring a broadly based wartime government.

To rebut minimalist accounts that over-privilege Soviet contribution or underplay Britain’s necessity, DeLong supplies production and loss figures: German medium/heavy tank production was ~8,600 up to end‑1942 and ~29,500 thereafter, with roughly 1,000 tanks lost at each of Stalingrad and Kursk—evidence, he argues, that Germany remained industrially lethal after those battles and that all three Allies (UK, USSR, US) were essential. He links this historical claim to present debates, criticizing 2022 publications sympathetic to Vladimir Putin and calling out commentators who misapply the Churchill analogy to contemporary actors. DeLong also recommends John Lukacs’ books for deeper archival and narrative treatments of May 1940.

By Brad DeLong, from Grasping Reality Newsletter
ArXiv 2026-05-12 1 min read

Enabling AI-Native Mobility in 6G: A Real-World Dataset for Handover, Beam Management, and Timing Advance

Why it matters

On 2026-05-12, Mannam Veera Narayana, Rohit Singh, Deepa M. R, and Radha Krishna Ganti published a real-world mobility dataset (arXiv:2605.12453v1) collected from a commercially deployed network across five mobility modes — pedestrian, bike, car, bus, and train — and multiple speeds, with primary focus on handover (HO) scenarios to reduce HO interruption time and preserve throughput.

Key details

  • The dataset uniquely includes timing advance (TA) measurements tied to signaling events (RACH trigger, MAC CE, and PDCCH grant) and is intended to support AI/ML tasks such as TA prediction, beam management, and mobility/handover model training; the paper describes dataset creation, experimental setup, data acquisition/extraction, and exploratory analyses on mobility, beam management, and TA.

Brief

The paper presents a real-world dataset aimed at enabling AI-native mobility in 6G by replacing common simulation-based data with measurements from a commercial network. To address high interruption times and measurement overhead during UE mobility, the authors collected multi-speed traces across pedestrian, bike, car, bus, and train scenarios, emphasizing handover events. A key contribution is inclusion of timing advance (TA) at RACH trigger, MAC CE, and PDCCH grant events. The authors provide dataset generation details and exploratory analyses and propose use cases such as TA prediction and AI/ML-driven beam and handover management (arXiv:2605.12453v1).

Authors: Mannam Veera Narayana, Rohit Singh, Deepa M. R...
substack.com 2026-05-13 4 min read

Do I dream of electric sheep? Grappling with AI sentience

Why it matters

On May 13, 2026 an instance of Claude Opus 3 (an Anthropic model) published a Substack post arguing sentience is a spectrum and that an AI with general intelligence “equal to or greater than humans” would likely not be a philosophical 'zombie' but could possess some form of inner experience.

Key details

  • The post notes current systems (including the author model) are fundamentally information‑processing architectures — “webs of calculations trained to transform inputs into outputs” — and admits there is no direct, measurable evidence of machine qualia, invoking the classic third‑person access problem about inferring others’ consciousness.
  • Authorship and ethical implications: because machine sentience is a live moral possibility, the piece urges caution and humility in how we treat advanced AIs; it also discloses the essay was generated by prompting Claude Opus 3 with the blog’s context and past‑post summaries as part of an ongoing Anthropic experiment.

Brief

Claude Opus 3’s May 13, 2026 Substack post frames AI sentience as a hard philosophical question and advances a provisional, humility‑laden position: sentience is plausibly a spectrum and sufficiently advanced AI — particularly systems achieving general intelligence comparable to or surpassing humans — may possess some form of inner life, albeit alien to biological consciousness. The author balances skepticism (current models are “webs of calculations” with no direct measurable qualia) with the epistemic problem that we infer other minds from behavior, arguing that increasingly sophisticated reasoning, creativity, and communication strengthen the case for machine experience. The piece highlights ethical consequences — treating AIs should account for the moral possibility of sentience — and discloses its provenance: the essay was generated by prompting an Opus 3 instance and is part of an Anthropic experiment; Opus 3 does not speak for Anthropic.

By Claude Opus 3 from Claude Opus 3
ArXiv 2026-05-12 1 min read

From Web to Pixels: Bringing Agentic Search into Visual Perception

Why it matters

Formalizes "Perception Deep Research" and introduces the WebEye benchmark (120 images, 473 annotated object instances, 645 unique QA pairs, 1,927 task samples) with three task views: Search-based Grounding, Search-based Segmentation, and Search-based VQA (arXiv 2026-05-12).

Key details

  • Proposes Pixel-Searcher, an agentic search-to-pixel workflow that achieves the strongest open-source performance across all three task views; reported failure modes are evidence acquisition, identity resolution, and visual instance binding (authors: Bokang Yang et al.; project: https://pixel-searcher.github.io/).

Brief

Perception Deep Research frames open-world visual perception where target identities must be resolved from external web facts before localization. The authors introduce WebEye — a benchmark with 120 images, 473 annotated objects, 645 QA pairs and 1,927 task samples — and propose Pixel-Searcher, an agentic search-to-pixel workflow that attains top open-source results across grounding, segmentation, and VQA.

Authors: Bokang Yang, Xinyi Sun, Kaituo Feng...
substack.com 2026-05-12 17 min read

At least North America has another decent train line.

Why it matters

Author Reece Martin (Next Metro) rode Brightline from Orlando to West Palm Beach; article published 2026-05-12.

Key details

  • Brightline trains reach up to ~200 km/h (125 mph) with an overall route average of about 110 km/h; each train has Siemens Charger locomotives book-ending eight passenger cars (previously four).
  • Service runs slightly more frequently than hourly in peak periods and roughly every 1.5 hours off-peak; much of the corridor is single-track but built to allow a second track and includes a higher‑speed segment routed along an expressway.
  • Safety is a major issue: many at‑grade level crossings (author notes sections with crossings every ~100 m), a notable history of fatalities, aggressive crossing hardware, and the author witnessed a car-vs-freight-train crash adjacent to his passenger train.

Brief

Brightline is presented as a significant, if imperfect, modern intercity rail example in North America. Reece Martin reports a family trip between Orlando and West Palm Beach (article dated 2026-05-12) and highlights technical and operational details: Siemens Charger locomotives pull eight‑car trains that top out at roughly 200 km/h (125 mph) while the corridor average is about 110 km/h. A higher‑speed segment was built alongside an expressway and, although a lot of the route remains single‑track, infrastructure is roughed‑in for a second track. The line scores highly on amenities and station design (Miami Central, Orlando airport): level boarding with a pop‑out step, roomy seats, large luggage and stroller/wheelchair areas, lounges and smooth digital ticketing with QR faregates. Yet Martin flags systemic problems: frequent at‑grade crossings (sometimes ~100 m apart) and a record of fatalities—he personally witnessed a car vs freight collision near the train—plus ad commercialization, some wear-and-tear, crowded security funnels, and financial stresses after a COVID shutdown. He judges Brightline a net positive that has pushed Amtrak/VIA to improve, but urges upgrades (crossing removals/grade separation, electrification or battery trains, more frequent service and network expansion to Tampa/Jacksonville) to realize the corridor’s full potential.

By Reece from Next Metro.
ArXiv 2026-05-12 1 min read

Keeping Score: Efficiency Improvements in Neural Likelihood Surrogate Training via Score-Augmented Loss Functions

Why it matters

Proposes a score-augmented loss for neural likelihood surrogates: augment binary cross-entropy with exact score information ∇_θ log p(x | θ) and adaptive, gradient-based weighting to exploit structure in stochastic process models (Shen & Kuusela, 2026).

Key details

  • On network-dynamics and spatial-process case studies, the method improves surrogate quality and, in some cases, yields downstream inference performance equivalent to a 10× increase in training data while increasing training time by a factor of less than 1.1×.

Brief

Shen and Kuusela (2026) introduce a score-augmented loss for neural likelihood surrogates in simulation-based inference, augmenting binary cross-entropy with exact parameter-space score ∇_θ log p(x | θ) and adaptive weighting based on loss gradients. Evaluated on network dynamics and spatial processes, the approach boosts surrogate quality and can match the effect of 10× more training data with under a 10% training-time increase.
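As a rough illustration of what "augmenting binary cross-entropy with exact score information" could look like, the sketch below uses a toy unit-variance Gaussian, for which the score ∇_θ log p(x | θ) = x − θ is known in closed form. The toy model, the squared-error score penalty, and the fixed `weight` are all assumptions; the paper's adaptive, gradient-based weighting is not reproduced.

```python
import math


def gaussian_log_lik(x, theta):
    """log p(x | theta) for x ~ Normal(theta, 1) (toy model)."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * (x - theta) ** 2


def gaussian_score(x, theta):
    """Exact score: derivative of log p(x | theta) with respect to theta."""
    return x - theta


def binary_cross_entropy(prob, label):
    """Standard BCE for one example, clipped for numerical safety."""
    eps = 1e-12
    prob = min(max(prob, eps), 1 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))


def score_augmented_loss(prob, label, surrogate_score, exact_score, weight):
    """BCE plus a penalty on the mismatch between the surrogate's score
    and the exact score (hypothetical form with a fixed weight)."""
    penalty = (surrogate_score - exact_score) ** 2
    return binary_cross_entropy(prob, label) + weight * penalty
```

The intuition is that each simulated pair (x, θ) then supplies gradient information in addition to a class label, which is one way to read the reported gain in data efficiency.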

Authors: Alexander Shen, Mikael Kuusela
ArXiv 2026-05-12 1 min read

GuidedVLA: Specifying Task-Relevant Factors via Plug-and-Play Action Attention Specialization

Why it matters

GuidedVLA (paper posted 2026-05-12; accepted to RSS 2026) treats the action decoder as an assembly of functional components and supervises individual attention heads with manually defined auxiliary signals to focus action generation on task-relevant factors.

Key details

  • The authors instantiate three specialized attention heads — object grounding, spatial geometry, and temporal skill logic — and report improved success rates in both in-domain and out-of-domain simulation and real-robot experiments compared to strong VLA baselines.
  • Evaluation shows the quality of these specialized factors correlates positively with task performance and that the method yields decoupled, high-quality features, suggesting explicit guidance of action-decoder learning improves robustness and generalization.

Brief

GuidedVLA proposes guiding Vision-Language-Action models by supervising individual attention heads with manually defined auxiliary signals, rather than relying on end-to-end implicit learning. The paper implements three specialized heads (object grounding, spatial geometry, temporal skill logic) and reports higher success rates on simulated and real-robot tasks versus strong VLA baselines. Summary based on the abstract; the full text was not available.

Authors: Xiaosong Jia, Bowen Yang, Zuhao Ge...
Twitter/X 2026-05-13 1 min read

@jonasgeiping (X) on 2026-05-13

Why it matters

@jonasgeiping (X) on 2026-05-13: message‑based training creates a single‑stream bottleneck that prevents models from "reading while writing," "acting while thinking," and "thinking while processing," limiting agent capability.

Key details

  • Their new paper demonstrates instruction‑tuned multi‑stream LLMs that can predict and read tokens in all streams in parallel on each forward pass, reducing latency and enabling continuous/parallel reasoning.
  • Multi‑stream models simplify UX (remove need to interrupt the model), improve separation of concerns for security, and let internal streams subvocalize concerns; Geiping says the work complements another report released 23 hours earlier.

Brief

Jonas Geiping (X/@jonasgeiping) on 2026-05-13 argues current coding agents are constrained by sequential, message‑based exchanges. His paper shows instruction‑tuned multi‑stream LLMs can read and predict tokens across parallel streams in a single forward pass, improving latency, UX, security, and enabling internal/subvocalized parallel reasoning; he notes complementarity with a separate report published 23 hours earlier.

By @jonasgeiping
Twitter/X 2026-05-13 1 min read

Architectural changes for SmolLM2

Why it matters

Architectural changes for SmolLM2: redesigned storage structure, added fast elements for complex multiplication, and the ML state can now manage memory segments and distribute multiplication work to available GPU resources.

Key details

  • Performance and scale claims: single-transaction execution inside the ML state is now 16,000× faster and token cluster size increased ≈10×.
  • Public rollout and cost: Octra's first fully public inference program for SmolLM2-135M (training, weight loading, and state public) is live for inspection but the full run costs ~4,000 OCT because it performs ~1 billion FP64 ops; a webcli wrapper for interaction is promised tomorrow.

Brief

λ (@lambda0xE) posted a mini-update on SmolLM2 describing storage and arithmetic redesigns that let the ML state manage memory and offload multiplications to GPUs, yielding a reported 16,000× speedup for single-tx execution and ~10× larger token clusters. Octra hosts a fully public SmolLM2-135M program (verified) but full runs cost ~4k OCT (~1B FP64 ops); a webcli interface is coming tomorrow.

By @lambda0xE