Why it matters
@TGLetter warns that NVIDIA's market cap of $5.33 trillion (tweeted 2026-05-13) exceeds the combined GDPs of Germany, Japan, France, the UK, Italy, Canada, and Brazil; only the US and China have economies larger than this single chip company's valuation.
Key details
- The author draws a parallel with Microsoft in 2000 (c. $600 billion valuation), which analysts called a 'new permanent reality' before it lost ~60% of its value over the next 18 months, implying similar vulnerability for NVIDIA.
- The thread asserts NVIDIA doesn't produce essentials (oil, food, infrastructure) but sells chips to data centers to train AI models 'that haven't turned a profit yet,' so the entire $5.33T is a bet on an unproven future with 'no playbook' if AI spending slows.
Brief
NVIDIA is valued at $5.33 trillion, exceeding the combined GDPs of Germany, Japan, France, the UK, Italy, Canada and Brazil, the author warns. Citing the 2000 Microsoft episode (≈$600B then −60% in 18 months), the post argues NVIDIA's valuation is a speculative bet on chips used to train AI models that have not yet proven profitable and that there is no historical playbook if AI spending decelerates.
By @TGLetter
Why it matters
Collyer Bridge (May 12, 2026) highlights a value idea: buy a holding company that majority-owns a copper clad laminate (CCL) producer to get cheap CCL exposure because the holdco’s four other divisions reportedly account for almost the entire market cap.
Key details
- The post cites supply tightness as rationale: a May 3, 2026 tweet from @jukan05 reports a Seoul PCB manufacturer placed advance orders worth 10 billion won with two Taiwanese CCL producers, EMC and TUC, with the order volume described as more than five times normal levels (tweet had ~106K views).
- The write-up is a preview/paywalled Substack post — the author solicits paid subscriptions for the full analysis, scuttlebutt and on-the-ground research behind the idea.
Brief
Discounted exposure to copper clad laminates (CCL) is presented as a value-investor play by Collyer Bridge in a May 12, 2026 Substack preview: the author argues investors can gain CCL exposure cheaply by owning a holding company that majority-owns a CCL producer while the holdco’s four other divisions account for nearly the entire market capitalization, effectively leaving the CCL business under‑valued. The memo cites acute CCL tightness and AI-driven demand as catalysts, pointing to a May 3, 2026 tweet from @jukan05 reporting a Seoul PCB maker placed advance orders worth 10 billion won with Taiwanese producers EMC and TUC (order volumes described as >5x typical), and frames the thesis as scuttlebutt-driven value research — full details and on‑the‑ground checks are behind the post’s paywall.
By Collyer Bridge
Why it matters
Garry Tan's headline claim on Rick Rubin's Tetragrammaton: the engineers who hate vibe coding and AI the most are the people who would benefit the most from embracing it.
Key details
- The episode was published 2026-05-13, runs ~2 hours (final timestamp 1:59:38), and is hosted on Rick Rubin's Tetragrammaton podcast.
- Topics covered with timestamps include early computing and math (0:15–13:18), video games and storytelling (13:18–21:04), engineering and design (21:04–32:15), startups/Y Combinator and founder selection (32:15–58:47), AI/programming and creative revolution (58:47–1:13:10), taste/reps/builder intuition, power/responsibility, reinventing institutions, and AI + manufacturing for greater abundance.
Brief
Garry Tan's episode on Rick Rubin's Tetragrammaton podcast (published 2026-05-13) runs about two hours and argues that engineers who resist 'vibe coding' and AI would benefit from adopting them. He recounts his path from computing, mathematics, and video games to engineering, YC founder selection, AI-driven creativity, builder intuition, institutional reinvention, and AI-enabled manufacturing.
By @garrytan
Why it matters
@BoringBiz_ claims the application layer will capture most AI value because it has minimal capex while the model layer is investing “billions” in GPUs and data centers.
Key details
- They argue applications are asset‑light with variable costs, lower leverage (no billions of GPU/data‑center debt), can target verticals intensely, and face less concentrated competition than model leaders like OpenAI, Anthropic, and Google, giving applications a larger TAM.
Brief
Author @BoringBiz_ argues that AI value will concentrate at the application layer: apps require little capex vs. the model layer’s “billions” for GPUs and data centers, are asset‑light with lower leverage, can focus on vertical customers, and face less top‑heavy competition (OpenAI, Anthropic, Google), yielding a larger TAM.
By @BoringBiz_
Why it matters
Brian Halligan coins the term “Dorsey Mode” for a new org playbook named after Jack Dorsey, saying it departs from Andy Grove’s High Output Management; he claims Dorsey’s first quarter after adopting it was “a banger” and notes that Brian Armstrong runs a very similar playbook, one used by many startups in the past 18 months.
Key details
- Strategy and distribution change: planning cycles are largely abandoned because faster iteration turns many 1‑way doors into 2‑way doors, making creative distribution (and enterprise sales) the primary competitive moat.
- Hiring and org shape shift: interview loops now include hard AI case problems or live demos; some companies (e.g., Meta, HubSpot) favor very senior engineers while others hire junior AI‑focused talent; org charts move from large triangle hierarchies to circular structures with a central world model and small teams around it, and Dorsey has removed titles to focus people on the work rather than their level.
- Systems, ops, and leadership implications: more decisions are delegated to world‑models/systems, IT must build scaffolding and make all context legible (an early sign is recording nearly every meeting including 1:1s for model training), compensation must widen (higher standard deviation), CEOs must lead by doing (Dorsey reportedly spends 3 hours each morning building) and run hackathons/office hours to drive adoption.
Brief
Brian Halligan argues that Jack Dorsey’s approach, which he dubs “Dorsey Mode,” is a radical organizational playbook shift away from Andy Grove’s High Output Management, and that it shows early upside (Halligan says Dorsey’s first quarter after adoption was “a banger”). The model accelerates iteration so planning cycles break down, turning many 1‑way doors into reversible choices and elevating distribution as the primary moat. Recruiting now emphasizes AI problem cases or live demos, with some firms favoring very senior engineers (Meta, HubSpot) and others hiring junior, AI‑native talent. Org charts move from large triangular hierarchies to small teams orbiting a central world model; titles are being removed; decisions are increasingly handed off to systems; IT becomes the scaffolding team that records meetings and feeds them and other context into models; compensation and leadership practices must change, and CEOs are expected to lead by building and to run hackathons to push adoption.
By @bhalligan
Why it matters
DexTwist (Lee, Li, Lee; arXiv 2026-05-12) is a mixed-reality dexterous-hand retargeting framework that detects a tripod pinch, estimates the operator's intended screw axis and twist magnitude, and applies a real-time residual joint-space refinement to track turning progress while regularizing robot tripod geometry.
Key details
- The refinement minimizes a virtual-object objective composed of turning angle, screw-axis consistency, fingertip closure, and tripod stability to mitigate embodiment-gap issues (link-length/joint-axis mismatches) that cause tangential fingertip sliding and screw-axis drift in tasks like cap opening, key turning, and bolt screwing.
- Simulation and real-world experiments reported in the 6-page paper (5 figures, 2 tables) show DexTwist improves turning-angle tracking and screw-axis stability compared with a vector-based retargeting baseline (no numeric percentages provided in the abstract).
Brief
DexTwist introduces a functional twist-retargeting method for MR-based teleoperation that targets contact-rich rotational manipulation where kinematic imitation fails. The system detects tripod pinches, estimates intended screw axis and twist, then performs a real-time joint-space residual optimization minimizing a virtual-object objective (turn angle, axis consistency, fingertip closure, tripod stability). Simulations and real tests demonstrate improved turning-angle tracking and reduced screw-axis drift versus a vector-based baseline.
Authors: Dongmyoung Lee, Chengxi Li, Dongheui Lee
Why it matters
The authors (paper published 2026-05-12) identify a long-tail failure pattern in computer-use agents, citing failures in GPT-5.4 and Claude: a small fraction of complex, low-frequency GUI interactions accounts for a disproportionate share of task failures, which they attribute to scarce training data for such interactions.
Key details
- They release CUActSpot, a multimodal benchmark (GUI, text, table, canvas, natural image) covering diverse actions (click, drag, draw, etc.), and a renderer-based data-synthesis pipeline that auto-generates scenes, records screenshots and element coordinates, and uses an LLM to produce instructions/action traces; their Phi-Ground-Any-4B model trained on this data outperforms open-source models with <32B parameters (code/data/models at https://github.com/microsoft/Phi-Ground.git).
Brief
CUActSpot targets the long-tail of complex GUI interactions that undermine computer-use agents by providing a multimodal benchmark (GUI, text, table, canvas, natural image) and a renderer-based data-synthesis pipeline: automatic scene generation, screenshot/element-coordinate recording, and LLM-produced instructions/action traces. Training on this corpus yields Phi-Ground-Any-4B, which outperforms open models under 32B parameters. Only the abstract was available for this summary.
Authors: Miaosen Zhang, Xiaohan Zhao, Zhihong Tan...
Why it matters
David J. Malan (Harvard), creator of CS50, credits two lecture techniques for student engagement: a designed “memorable moment” per lecture (e.g., ripping a phone book to illustrate binary search) and sustained high energy motivated by a fear of boring the audience.
Key details
- Malan reports that AI has reduced CS enrollments and employers' hiring of junior engineers from Harvard; academic dishonesty still involves about 5–10% of students each semester but is now harder to prosecute because AI-generated answers are difficult to attribute to a source.
- Malan argues for teaching C in 2026 because it’s low-level enough to reveal how computers work yet compact enough to force students to reimplement core data structures; he identifies pointers as the most challenging concept for students to master.
- The interview (Ryan Peterman) was published May 11, 2026; the conversation is available on YouTube/Spotify/Apple Podcasts and the transcript is on Substack.
Brief
In a May 11, 2026 interview with Ryan Peterman, Harvard CS professor David J. Malan, the instructor behind CS50, explains why his lectures resonate, how AI is reshaping student behavior, and why C still matters. Malan builds a single “memorable moment” into each lecture (his classic is ripping a phone book to anchor binary search) and projects high energy driven by a fear of boring students. On AI’s downstream effects, he reports fewer students enrolling and fewer companies hiring junior engineers from Harvard; cheating still involves roughly 5–10% of students per semester, but AI-produced responses are difficult to tie to a source, so prosecution is harder. He defends teaching C because its small, low-level surface forces students to implement basic data structures and exposes machine-level concepts; he names pointers as the hardest topic for novices. The full conversation and transcript are linked on Substack and major podcast platforms.
By Ryan Peterman
Why it matters
Paul Graham, speaking at YC | Stockholm on April 29, 2026, argued founders who want faster investor engagement and more serendipitous meetings should move to Silicon Valley, citing faster-moving investors (02:45, 04:36) and serendipity (01:01, 02:45) as key advantages.
Key details
- He claimed respect and stronger benchmarking come from competing with 'big fish' in the Valley (06:03, 09:10), used the Dropbox founding story as an example (07:59), and highlighted Silicon Valley’s pay‑it‑forward culture (12:21) as critical for startup success.
- On building hubs elsewhere, Graham outlined ways to help Stockholm thrive (15:36), endorsed YC as the optimal path for founders seeking that ecosystem (17:24), and posed whether Stockholm could become the Silicon Valley of Europe (19:54).
Brief
Paul Graham argued at YC | Stockholm (April 29, 2026) that founders seeking faster capital, serendipitous meetings, and industry respect should relocate to Silicon Valley, citing faster-moving investors, a pay-it-forward culture, and the Dropbox story. He also outlined how to help Stockholm thrive, recommended YC as the optimal path, and asked if Stockholm could become Europe’s Silicon Valley.
By @ycombinator
Why it matters
AwardWallet reported on May 11, 2026 that a popular dining-focused credit card received an anniversary refresh adding a 5X earning rate on hotels.
Key details
- The update includes nearly $100 in limited-time travel and dining credits and adds rental-car elite status, both available to enroll in now.
- The promotion pairs the changes with a welcome bonus up to 100,000 points; the email withheld the specific card/issuer name due to advertising-partner requirements.
Brief
A popular dining-focused credit card received an anniversary refresh on May 11, 2026 that adds a 5X hotel-earning rate, nearly $100 in limited-time travel and dining credits, and rental-car elite status (enrollable now). AwardWallet notes the upgrade is coupled with a welcome bonus up to 100,000 points, though the email withheld the issuer/card name for advertising reasons.
By AwardWallet
Why it matters
Proposes test-time LLM-guided query refinement that updates a query's embedding using feedback from a generative LLM on a small set of documents, improving ranking quality and producing clearer binary separation in the embedding space.
Key details
- Experiments with state-of-the-art text embedding models across diverse search and classification benchmarks show consistent gains across all models/datasets, with relative improvements up to +25% on literature search, intent detection, key-point matching, and nuanced instruction-following.
- Method broadens practical zero-shot use of embeddings as a cheaper alternative to corpus-scale LLM pipelines; authors Ariel Gera, Shir Ashury-Tahan, Gal Bloch et al. released code at https://github.com/IBM/task-aware-embedding-refinement (arXiv:2605.12487, 12 May 2026).
Brief
Task-Adaptive Embedding Refinement via Test-time LLM Guidance presents a test-time method that refines query embeddings via a generative LLM's feedback on a small document subset to tailor embeddings to ad-hoc zero-shot search and classification tasks. Experiments with state-of-the-art embedding models across diverse benchmarks report consistent gains (up to +25% relative), improving ranking and class separation; code released on GitHub. Abstract only; full text not provided here.
Authors: Ariel Gera, Shir Ashury-Tahan, Gal Bloch...
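The query-refinement idea summarized above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' method: it assumes the LLM feedback arrives as binary relevance labels on a handful of documents, and that the query vector is updated by gradient steps on a logistic loss over scaled dot-product scores. The function name `refine_query` and the parameters `steps`, `lr`, and `scale` are hypothetical.

```python
import numpy as np

def refine_query(query, doc_embs, llm_labels, steps=50, lr=0.5, scale=10.0):
    """Nudge a query embedding toward documents an LLM judged relevant.

    Hypothetical sketch: the logistic loss, step count, learning rate and
    score scale are illustrative assumptions, not the paper's settings.
    doc_embs is expected to hold unit-normalized rows.
    """
    q = query / np.linalg.norm(query)
    y = np.asarray(llm_labels, dtype=float)  # 1 = relevant, 0 = not relevant
    for _ in range(steps):
        scores = scale * (doc_embs @ q)          # scaled similarity logits
        probs = 1.0 / (1.0 + np.exp(-scores))    # predicted relevance
        grad = scale * (doc_embs.T @ (probs - y)) / len(y)  # d(log-loss)/dq
        q = q - lr * grad
        q = q / np.linalg.norm(q)                # stay on the unit sphere
    return q
```

The refined vector would then be used to rank the full corpus with ordinary nearest-neighbour search, so the LLM is only consulted on the small labelled subset.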
Why it matters
Presents a proximal-gradient sampler for composite log-concave targets π ∝ e^{-f-g}; when f+g is α-strongly convex and f is β-smooth, it attains ε total-variation error in ~O(κ·sqrt(d)·log^4(1/ε)) iterations, where κ = β/α, matching prior state-of-the-art for the g=0 case.
Key details
- Algorithm requires gradient access to f and a restricted Gaussian oracle (RGO) for g (able to sample from density ∝ exp(-g(x) - (1/(2h))||y-x||^2)); results are extended to non-log-concave targets satisfying a Poincaré or log-Sobolev inequality and to Lipschitz, non-smooth f.
Brief
The paper introduces a proximal-gradient Monte Carlo sampler for composite log-concave densities π ∝ e^{-f-g}, combining gradient steps on smooth f with a restricted Gaussian-oracle (RGO) proximal sampler for g. Under α-strong convexity of f+g and β-smoothness of f it achieves ε total-variation error in ~O(κ·sqrt(d)·log^4(1/ε)) iterations (κ=β/α), and the authors extend guarantees to Poincaré/LSI targets and to Lipschitz non-smooth f.
Authors: Linghai Liu, Sinho Chewi
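As a worked example of the restricted Gaussian oracle (RGO) mentioned above, consider the illustrative choice g(x) = (c/2)‖x‖² with c > 0. This choice is an assumption made for exposition, not a case taken from the paper. Completing the square in the RGO exponent shows that the oracle reduces to an ordinary Gaussian draw:

```latex
\begin{aligned}
-g(x) - \frac{1}{2h}\lVert y - x\rVert^2
&= -\frac{c}{2}\lVert x\rVert^2 - \frac{1}{2h}\lVert x\rVert^2
   + \frac{1}{h}\langle x, y\rangle - \frac{1}{2h}\lVert y\rVert^2 \\
&= -\frac{1 + ch}{2h}\,\Bigl\lVert x - \frac{y}{1 + ch}\Bigr\rVert^2 + \mathrm{const}(y),
\end{aligned}
\qquad\Longrightarrow\qquad
x \mid y \;\sim\; \mathcal{N}\!\Bigl(\frac{y}{1 + ch},\ \frac{h}{1 + ch}\, I_d\Bigr).
```

For a general g no such closed form exists, which is why the RGO is treated as an oracle in the stated complexity bound.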
Why it matters
Ramp offers end-to-end merch fulfillment (email published 2026-05-11) including printing, inventory storage, pick, pack & worldwide shipping, online store setup, employee onboarding kits, event distribution, and campaign swag; clients can use the full service or select individual components.
Key details
- Case study: The Bugle Podcast — Ramp built an online store, printed limited-edition Christmas jumpers and handled worldwide shipping; every jumper sold out before Christmas.
Brief
Ramp's merch fulfillment service (announced in an email published 2026-05-11) combines printing with inventory storage, pick-and-pack operations and worldwide shipping, plus online store setup, onboarding kits and event/campaign distribution. The offering is modular — clients can select full fulfillment or specific services — and was credited with selling out The Bugle Podcast's limited-edition Christmas jumpers after Ramp printed and shipped orders globally.
By Ramp for Merch
Why it matters
Fibre‑optic, wire‑guided first‑person‑view (FPV) drones, used extensively in Ukraine, are effectively unjammable because control signals run over a physical fibre tether; fibre also gives lower latency and higher video bandwidth, and the price of a 50 km spool has risen from about $300 to $2,500 (per Dimko Zhluktenko).
Key details
- The Economist obtained a ten‑page GRU document offering Iran 5,000 fibre‑optic drones, long‑range Starlink‑guided drones and operator training, including maps for attacking a slow‑moving American landing flotilla; it is unknown whether Russia shared or acted on the proposal.
- FPV systems now reach ranges of roughly up to 40 km; CNAS modelling cited in the piece describes layered attacks from 80 km down to 5 km, and the US Indo‑Pacific Command concept “Hellscape” envisions using dense FPV attacks to blunt a Chinese amphibious invasion of Taiwan.
- Military tech and know‑how are flowing among Russia, China, Iran and North Korea and out to proxies (eg Hizbullah’s use of fibre‑optic drones); related concerns include China developing quieter submarines with Russian help and reported Iranian strikes (May 7) that damaged at least 228 structures, while US satellite imagery releases have been curtailed.
Brief
Fibre‑optic, wire‑guided FPV drones are reappearing as a transformational weapon: by tethering control and video over fibre they are effectively immune to RF jamming, offer much lower latency and higher bandwidth for sharper targeting, and have become decisive on Ukraine’s battlefields. The Economist (May 11, 2026) reports a ten‑page GRU proposal to supply Iran with 5,000 fibre‑optic drones, long‑range Starlink‑guided UAVs and training, including plans to attack US landing forces. FPV ranges now reach up to roughly 40 km, and CNAS modelling and US Indo‑Pacific Command concepts (eg “Hellscape”) show dense layered FPV attacks (from 80 km down to 5 km) would threaten amphibious operations such as a Chinese invasion of Taiwan. The piece warns of accelerating tech flows between Russia, China, Iran and North Korea, wider proliferation to proxies (eg Hizbullah), and related concerns including quieter Chinese submarines and recent Iranian strikes (May 7) that reportedly damaged at least 228 structures.
By Shashank Joshi at The Economist
Why it matters
MoLA (Mixture of Latent Actions), proposed 2026-05-12 by Yajie Li et al., converts imagined future videos into executable control representations by using a mixture of pretrained inverse-dynamics models to infer latent actions from predicted visual transitions; the modality-aware inverse dynamics models explicitly exploit semantic, depth, and optical-flow cues.
Key details
- The method was evaluated on simulated benchmarks LIBERO, CALVIN, and LIBERO-Plus and on real-world robot manipulation tasks, and the abstract reports consistent gains in task success, temporal consistency, and generalization; the work is listed as ICML 2026.
Brief
MoLA (Mixture of Latent Actions) targets the gap between video-based imagination and actionable control: instead of feeding predicted frames to a policy or decoding videos directly into controls, it infers a mixture of latent actions via pretrained, modality-aware inverse-dynamics models (semantic, depth, flow) to produce a physically grounded action interface. Evaluated on LIBERO, CALVIN, LIBERO-Plus and real robots, the abstract reports consistent improvements in success rates, temporal consistency, and generalization; summary based on the abstract (full paper not reviewed).
Authors: Yajie Li, Bozhou Zhang, Chun Gu...
Why it matters
Gary Marcus endorses Noam Brown’s claim that “with today’s AI models, intelligence is a function of inference compute.”
Key details
- Noam Brown (@polynoamial) says model comparisons by a single number became meaningless in 2024 and that what matters is intelligence per token or per dollar, which is crucial for products like Codex.
- Marcus counters that humans run on roughly 20 watts, arguing future architectural innovations could matter as much as, or more than, raw compute over the long run.
Brief
Gary Marcus amplifies Noam Brown’s claim that, for current AI systems, intelligence scales with inference compute and should be measured as intelligence per token or per dollar (model comparisons became unreliable in 2024). Marcus adds that humans achieve high intelligence on ~20 watts, so new architectures may rival raw compute gains in the long term.
By @GaryMarcus
Why it matters
EgoEV-HandPose introduces KeypointBEV, a stereo fusion module that lifts features into a canonical bird's-eye-view and uses an iterative reprojection-guided refinement loop to resolve depth uncertainty and enforce kinematic consistency for egocentric bimanual 3D hand pose and gesture estimation.
Key details
- The authors collected EgoEVHands, the first large-scale real-world stereo event-camera egocentric hand dataset: 5,419 annotated sequences with dense 3D/2D keypoints across 38 gesture classes under varying illumination, to be released with code.
- EgoEV-HandPose achieves state-of-the-art results: MPJPE = 30.54 mm and Top-1 gesture accuracy = 86.87%, significantly outperforming RGB stereo and prior event-camera methods, especially in low-light and bimanual occlusion scenarios.
Brief
EgoEV-HandPose tackles egocentric 3D bimanual hand-pose estimation and gesture recognition from stereo event cameras by introducing KeypointBEV, which lifts stereo features into a bird's-eye-view and iteratively refines depth and kinematic estimates through reprojection. Trained and evaluated on the new EgoEVHands dataset (5,419 sequences, 38 gestures), it reports MPJPE 30.54 mm and 86.87% Top-1 accuracy, outperforming RGB-stereo and prior event-based methods, notably under low-light and occlusion.
Authors: Luming Wang, Hao Shi, Jiajun Zhai...
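For readers unfamiliar with the MPJPE figure quoted above, the metric is the mean Euclidean distance between predicted and ground-truth joint positions. A minimal sketch follows; whether the paper applies root or wrist alignment before computing it is not stated in the summary, so no alignment is assumed here, and the function name `mpjpe` is illustrative.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance per joint.

    pred, gt: arrays of shape (..., num_joints, 3) in the same units (e.g. mm).
    No alignment step is applied (an assumption of this sketch).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return float(np.linalg.norm(pred - gt, axis=-1).mean())
```

On this definition, a score of 30.54 mm means predicted joints sit about 3 cm from their annotated positions on average.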
Why it matters
More than eighty Labour MPs called for Prime Minister Sir Keir Starmer to quit after poor local-election results; minister Miatta Fahnbulleh and four junior ministers resigned and 30‑year gilt yields touched their highest level since 1998.
Key details
- US–Iran tensions and an energy shock pushed Brent crude to about $105/barrel; Donald Trump said the ceasefire was 'on massive life support' while Iran defended a counter‑proposal that includes ending America’s blockade on its ports.
- US grocery prices are under pressure: food is roughly one‑third more expensive than before the pandemic, April CPI data are due Tuesday, tomatoes are up ~25% year‑on‑year, and higher fertiliser, fuel, plastic (packaging) and transport costs threaten further inflation (transport accounts for up to half of the oil used in the supply chain).
- Defence and industrial strains: US secretary of war Pete Hegseth proposed raising Pentagon spending by over 40% to about $1.5trn in 2027 (the Iran war has cost at least $25bn); Samsung faces nearly 40,000 workers who could stage an 18‑day strike after unions demanded removing bonus caps and 15% of chip‑division operating profits (≈$34bn based on 2026 projections).
Brief
The Economist's World in Brief surveys mounting political, economic and supply‑chain stresses. Labour’s leadership is in crisis after a local‑election rout: more than 80 MPs sought Keir Starmer’s resignation, Miatta Fahnbulleh and four junior ministers resigned, and 30‑year gilt yields hit levels not seen since 1998. Geopolitical tensions with Iran have sent Brent to about $105/barrel and complicated ceasefire talks, while Washington warns of fragile diplomacy. In the US, consumer‑price risks loom: food prices are roughly 33% above pre‑pandemic levels, April CPI data are imminent, tomatoes are ~25% more expensive year‑on‑year, and higher fertiliser, fuel, plastic and transport costs (transport accounts for up to half of supply‑chain oil use) threaten further grocery inflation. Meanwhile, defence spending and industrial disputes are heating up: a proposed increase of more than 40% in Pentagon spending, to about $1.5trn in 2027, and a potential Samsung strike involving nearly 40,000 workers could both have global economic impact.
By The Economist
Why it matters
The authors derive explicit approximation error bounds for the solution operator mapping initial conditions to time-dependent solutions of a generalized Gierer–Meinhardt reaction–diffusion system, expressed in terms of network depth, width, and spectral rank.
Key details
- By exploiting the Laplacian eigenfunction (spectral) representation of the PDE Green's function, the paper proves required parameter complexity grows at most polynomially with target accuracy (alleviating a curse of parametric complexity) and reports numerical experiments that support the theoretical bounds. (Authors: Takashi Furuya, Ryo Ozawa, Jenn-Nan Wang; arXiv:2605.12025v1; published 2026-05-12.)
Brief
Laplacian-based neural operators are analyzed for the generalized Gierer–Meinhardt reaction–diffusion system: the paper obtains explicit approximation-error bounds depending on network depth, width, and spectral rank by using the Laplacian eigenfunction expansion of the PDE Green’s function. The authors show parameter complexity scales at most polynomially with accuracy and present numerical experiments consistent with theory. Summary based on the abstract; full text not reviewed.
Authors: Takashi Furuya, Ryo Ozawa, Jenn-Nan Wang
Why it matters
Matthew Yglesias (Slow Boring) on May 12, 2026 proposes the label “shmoderation” as a rebrand for the project of winning eclectic voters who mix progressive and conservative positions, shifting emphasis from the bland label “moderate” to a problem‑solving political identity.
Key details
- He points to the House Problem Solvers Caucus, co‑chaired by Rep. Brian Fitzpatrick (R‑PA) and Rep. Tom Suozzi (D‑NY), as a working example of shmoderation; Yglesias notes that Suozzi votes with Democrats most of the time but has broken with the party on gender self‑identification in sports and voted for the Laken Riley Act.
- Yglesias cites Astead Herndon and Amanda Litman on the prevalence of voters without cohesive ideologies and invokes analysts G. Elliott Morris and Lakshya Jain to explain GOP conformity to Trump, primary incentives, and why many Trump disapprovers still don’t back Democrats.
- He argues the electoral case for shmoderation is courting ‘cross‑pressured’ or ‘closeted’ Republicans and highlights policy patterns (e.g., minimum‑wage initiatives winning in red states vs. affirmative‑action measures failing in California), while warning progressive skepticism about authenticity could limit adoption.
Brief
Yglesias argues on May 12, 2026 that the practical project of winning voters who hold a mishmash of views should be reframed as “shmoderation” rather than the staid label “moderate.” He recommends a problem‑solving, eclectic political brand exemplified by the House Problem Solvers Caucus (co‑chaired by Rep. Brian Fitzpatrick and Rep. Tom Suozzi) and documents Suozzi’s mixed record — typically voting with Democrats but breaking with the party on issues such as gender self‑identification in sports and supporting the Laken Riley Act — as the kind of electoral performer this approach rewards. Drawing on Astead Herndon and Amanda Litman, Yglesias emphasizes that many voters lack cohesive ideologies; he also uses G. Elliott Morris and Lakshya Jain’s analyses to explain why GOP members stick with Trump (primary incentives) and why Trump disapprovers don’t uniformly back Democrats. He concludes the strategy rests on courting cross‑pressured voters, but notes progressive concerns about authenticity could constrain a full party pivot.
By Matthew Yglesias
Why it matters
The paper presents a complete real-time whole-body teleoperation pipeline that maps a Virdyn IMU-based full-body motion-capture suit directly onto a Unitree G1 humanoid. The pipeline was validated first in MuJoCo (sim2sim) and then deployed without modification on the real robot (sim2real), where it reproduced walking, standing, sitting, turning, bowing, and coordinated expressive gestures with stable, synchronized performance.
Key details
- The system uses a custom motion-processing, kinematic-retargeting, and control pipeline engineered for continuous, low-latency operation with no offline buffering or learning-based components; authored by Hamza Ahmed Durrani and Suleman Khan (arXiv:2605.12347v1, 2026-05-12; 8 pages, 4 figures).
Brief
The paper tackles low-latency, whole-body humanoid teleoperation by mapping Virdyn IMU suit data to a Unitree G1 using a custom motion-processing, kinematic-retargeting, and control pipeline that avoids offline buffering and learning-based modules. Validated in MuJoCo then transferred unchanged to the physical robot, the system reportedly achieves stable, synchronized reproduction of a wide motion repertoire; summary based on the abstract and metadata.
Authors: Hamza Ahmed Durrani, Suleman Khan
Why it matters
TMRL (Timestep-Modulated Reinforcement Learning) together with Context-Smoothed Pre-training (CSP) injects forward-diffusion noise into policy inputs to bridge BC pretraining and RL fine-tuning; authors report successful real-world fine-tuning on complex manipulation tasks in under one hour (Hong et al., arXiv 2026-05-12).
Key details
- TMRL trains agents to modulate the diffusion timestep during fine-tuning to explicitly control exploration, integrates with arbitrary inputs (states, 3D point clouds, image-based VLA policies), and improves RL fine-tuning sample efficiency; code and videos available at the project page.
Brief
TMRL and Context-Smoothed Pre-training (CSP) inject forward-diffusion noise into policy inputs during pretraining to create a continuum from precise imitation to broad action coverage, then train agents to modulate the diffusion timestep during RL fine-tuning to control exploration. The method works with states, 3D point clouds, and visual policies and enables sub-hour real-world manipulation fine-tuning; full paper and code on arXiv and project site.
Authors: Matthew M. Hong, Jesse Zhang, Anusha Nagabandi...
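To make the phrase "injects forward-diffusion noise into policy inputs" concrete, the sketch below applies the standard DDPM-style forward perturbation to an observation vector. It rests on assumptions not confirmed by the summary: a linear beta schedule, 100 timesteps, and the helper names `make_alpha_bar` and `noise_policy_input` are all hypothetical, and the paper's actual schedule and conditioning may differ.

```python
import numpy as np

def make_alpha_bar(num_steps=100, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal retention for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noise_policy_input(obs, t, alpha_bar, rng):
    """Apply the standard forward-diffusion perturbation at timestep t.

    x_t = sqrt(alpha_bar[t]) * x_0 + sqrt(1 - alpha_bar[t]) * eps,
    with eps drawn from a standard normal of the same shape as obs.
    """
    eps = rng.standard_normal(obs.shape)
    return np.sqrt(alpha_bar[t]) * obs + np.sqrt(1.0 - alpha_bar[t]) * eps
```

In this picture a small `t` leaves the input nearly intact (precise imitation), while a large `t` blurs it heavily (broader action coverage); a fine-tuned agent that selects `t` is therefore selecting how much to explore.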
Why it matters
Introduces a model-based bootstrap for transition kernels in finite controlled Markov chains (CMCs) that is distributionally consistent in both the single long-chain regime and the episodic offline RL regime; technical contributions include a bootstrap law of large numbers for visitation counts and a martingale CLT for bootstrap transition increments.
Key details
- Extends bootstrap consistency to downstream offline policy evaluation (OPE) and optimal policy recovery (OPR) via the delta method by verifying Hadamard differentiability of Bellman operators, yielding asymptotically valid confidence intervals for value and Q-functions.
- Empirical results on RiverSwim (Ziwei Su, Imon Banerjee, Diego Klabjan; arXiv 2026-05-12) show percentile bootstrap CIs outperform episodic bootstrap and plug-in CLT CIs, often achieving near-nominal 50%, 90%, and 95% coverage, while baselines are poorly calibrated for small sample sizes and short episodes (paper: 45 pages, 7 figures, 19 tables).
Brief
The paper develops a model-based bootstrap for transition kernels in finite controlled Markov chains with possibly nonstationary or history-dependent policies, proving distributional consistency in both long-chain and episodic offline RL regimes. Using a novel bootstrap LLN and a martingale CLT, the authors extend results to OPE and OPR via Hadamard-differentiable Bellman operators, producing asymptotically valid CIs; RiverSwim experiments show strong empirical calibration.
Authors: Ziwei Su, Imon Banerjee, Diego Klabjan
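A rough sketch of a model-based bootstrap with a percentile interval for one transition probability follows. It assumes a known, stationary behavior policy and a tiny two-state, two-action chain; the paper covers nonstationary and history-dependent policies, so the function names, the toy kernel, and the resampling details here are illustrative assumptions rather than the authors' procedure.

```python
import random
from collections import defaultdict

def simulate(P, policy, s0, n_steps, rng):
    """Roll out (state, action, next_state) triples from kernel P under `policy`."""
    s, data = s0, []
    for _ in range(n_steps):
        a = policy(s, rng)
        probs = P[(s, a)]
        s_next = rng.choices(range(len(probs)), weights=probs)[0]
        data.append((s, a, s_next))
        s = s_next
    return data

def estimate_kernel(data, n_states, n_actions):
    """Empirical transition kernel; unvisited (s, a) pairs fall back to uniform."""
    counts = defaultdict(lambda: [0] * n_states)
    for s, a, s_next in data:
        counts[(s, a)][s_next] += 1
    P_hat = {}
    for s in range(n_states):
        for a in range(n_actions):
            c = counts[(s, a)]
            total = sum(c)
            P_hat[(s, a)] = [x / total for x in c] if total else [1.0 / n_states] * n_states
    return P_hat

def percentile_ci(data, n_states, n_actions, policy, entry, level=0.90, B=200, seed=0):
    """Regenerate B chains from the fitted kernel and take percentiles of one entry."""
    rng = random.Random(seed)
    P_hat = estimate_kernel(data, n_states, n_actions)
    (s, a), s_next = entry
    draws = []
    for _ in range(B):
        boot = simulate(P_hat, policy, data[0][0], len(data), rng)
        draws.append(estimate_kernel(boot, n_states, n_actions)[(s, a)][s_next])
    draws.sort()
    lo = draws[int((1 - level) / 2 * (B - 1))]
    hi = draws[int((1 + level) / 2 * (B - 1))]
    return P_hat[(s, a)][s_next], lo, hi

# Hypothetical toy kernel: P[(state, action)] is a distribution over next states.
true_P = {(0, 0): [0.9, 0.1], (0, 1): [0.2, 0.8], (1, 0): [0.5, 0.5], (1, 1): [0.1, 0.9]}

def uniform_policy(state, rng):
    return rng.randrange(2)

data = simulate(true_P, uniform_policy, 0, 500, random.Random(1))
point, lo, hi = percentile_ci(data, 2, 2, uniform_policy, ((0, 0), 1))
```

Because each bootstrap replicate regenerates a whole trajectory, the visitation counts vary across replicates, which is the effect the summary's bootstrap law of large numbers addresses.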
Why it matters
Presents the first online Learning-to-Defer (L2D) algorithm for multiclass classification with bandit feedback and a dynamically varying pool of experts; proves regret bounds O((n + n_e) T^{2/3}) in the general case and O((n + n_e) √T) under a low-noise condition (T = time horizon, n = number of labels, n_e = distinct experts observed).
Key details
- Analysis combines novel H-consistency bounds for the online setting with first-order online convex optimization methods; experiments on synthetic and real-world datasets show the approach handles changing expert availability and reliability effectively.
Brief
The paper introduces the first online L2D algorithm for multiclass classification with bandit feedback and a dynamically varying expert pool, addressing streaming data and shifting expert availability. It achieves regret O((n + n_e) T^{2/3}) generally and O((n + n_e) √T) under low noise, relying on new H-consistency bounds and first-order online convex optimization; experiments validate practicality. Summary based on the abstract (full text not available).
Authors: Dang Hoang Duy, Yannis Montreuil, Maxime Meyer...
Why it matters
Siebenborn et al. (preprint posted 2026-05-12) formalize bilateral morphological symmetry for bimanual mobile manipulators, proving optimal policies are ambidextrous and equivariant under reflections across the robot's sagittal plane.
Key details
- They introduce a C2-equivariant flow matching policy that enforces reflective symmetry either via a regularized training loss or by using an equivariant velocity network.
- Empirically, across planar and 6-DoF mobile-manipulation tasks the symmetry-informed policies consistently improved sample efficiency and achieved zero-shot generalization to mirrored configurations absent from training; zero-shot transfer was validated on a TIAGo++ robot (preprint: 4 pages, 5 figures).
Brief
Siebenborn et al. formalize bilateral morphological symmetry in bimanual mobile manipulation and propose a C2-equivariant flow-matching policy that enforces reflection symmetry through loss regularization or an equivariant velocity network. On planar and 6-DoF tasks the method boosts sample efficiency and enables zero-shot generalization to mirrored states, with real-world TIAGo++ validation. Summary based on the abstract; full text not reviewed.
Authors: Max Siebenborn, Daniel Ordoñez Apraez, Sophie Lueth...
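Reflection equivariance can be made concrete with a small sketch that symmetrizes an arbitrary policy by averaging it with its mirrored counterpart. This is a generic group-averaging construction for the two-element group C2, not the paper's flow-matching network; the four-number state and action layout and the `raw_policy` function are hypothetical.

```python
def reflect(v):
    """Mirror a [left_x, left_y, right_x, right_y] vector across the sagittal plane:
    swap the left and right arms and negate the lateral (y) component."""
    lx, ly, rx, ry = v
    return [rx, -ry, lx, -ly]

def symmetrize(policy):
    """Return a policy f_sym with f_sym(reflect(s)) == reflect(f_sym(s))."""
    def equivariant_policy(state):
        direct = policy(state)
        mirrored = reflect(policy(reflect(state)))
        return [0.5 * (d + m) for d, m in zip(direct, mirrored)]
    return equivariant_policy

def raw_policy(state):
    """Hypothetical unconstrained policy that is not reflection-equivariant."""
    lx, ly, rx, ry = state
    return [0.3 * lx + 0.1, ly * rx, 0.7 * rx - ly, 0.2 * ry + lx]

sym_policy = symmetrize(raw_policy)
state = [0.4, -0.2, 1.0, 0.3]
mirrored_then_act = sym_policy(reflect(state))
act_then_mirrored = reflect(sym_policy(state))
```

The two results agree, so a behavior learned for one arm carries over to the mirrored configuration. This property underlies the zero-shot generalization to mirrored states described above.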
Why it matters
Characterization (Cerulli, 2026-05-12): optimal policy under combined budget and minimum-coverage constraints has a knapsack-type structure and is given by an affine threshold rule in budget and coverage shadow prices; the LP relaxation has an O(1) integrality gap, implying asymptotic equivalence with the optimal discrete allocation.
Key details
- Algorithms and empirical results: proposes Greedy-Lagrangian (GLC) and Rank-and-Cut (RC) procedures — GLC closely approximates the optimal solution and is near-optimal in finite samples; RC is approximately optimal when the coverage constraint is slack or costs are homogeneous, while misallocation occurs only when cost heterogeneity interacts with a binding coverage constraint; Monte Carlo evidence supports these findings.
Brief
Optimal policy learning under combined budget and minimum-coverage constraints is treated as a knapsack-type allocation problem; Cerulli (May 2026) proves the optimal rule is an affine threshold in budget and coverage shadow prices, shows an LP relaxation has an O(1) integrality gap, and evaluates two algorithms (GLC, RC), with Monte Carlo confirming near-optimal finite-sample performance and predictable failure modes.
Authors: Giovanni Cerulli
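One way to read the affine threshold rule is through the Lagrangian of the relaxed problem: with budget shadow price lam and coverage shadow price mu, unit i is treated when benefit_i - lam * cost_i + mu > 0. The sketch below applies that rule to a small hypothetical instance and compares it with brute-force enumeration. The sign convention, the numbers, and the hand-picked shadow prices are assumptions for illustration, not values or notation from the paper.

```python
from itertools import product

def threshold_rule(benefits, costs, lam, mu):
    """Treat unit i when its benefit net of priced cost, plus the coverage price, is positive."""
    return [b - lam * c + mu > 0 for b, c in zip(benefits, costs)]

def brute_force(benefits, costs, budget, min_treated):
    """Enumerate all allocations; keep the best one meeting budget and coverage constraints."""
    best, best_value = None, float("-inf")
    for alloc in product([False, True], repeat=len(benefits)):
        cost = sum(c for c, d in zip(costs, alloc) if d)
        if cost > budget or sum(alloc) < min_treated:
            continue
        value = sum(b for b, d in zip(benefits, alloc) if d)
        if value > best_value:
            best, best_value = list(alloc), value
    return best, best_value

benefits = [5, 4, 3, 1]   # hypothetical treatment effects
costs = [4, 2, 3, 1]      # hypothetical per-unit costs
best, best_value = brute_force(benefits, costs, budget=6, min_treated=2)

# Shadow prices chosen by hand for this instance (coverage is not binding, so mu = 0).
chosen = threshold_rule(benefits, costs, lam=1.1, mu=0.0)
```

Here the threshold rule recovers the brute-force optimum (units 0 and 1, total benefit 9). In general the shadow prices would come from solving the relaxation rather than being set by hand.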
Why it matters
Researchers Jose Maria Barrero, Nick Bloom and Steven Davis — surveying U.S. work-from-home (WFH) patterns since 2020 — find that by 2025 roughly 25% of paid working days were worked from home, over three times the pre‑pandemic rate and largely unchanged since 2023.
Key details
- Among employees who do some remote work, 41% say they are more efficient at home and 46% see no difference; five out of six of those who feel more efficient cite time saved on commuting or 'grooming' as a reason.
- An additional hour of commuting plus grooming time predicts a 6.4 percentage‑point increase in the share of the workweek people want to spend at home.
- An Atlanta Federal Reserve manager survey shows managers’ views on WFH track their firms’ current remote‑work rates (more positive where remote work is already higher), supporting the researchers’ conclusion that firms and workers have largely self‑selected hybrid arrangements that are likely to persist.
Brief
Andrew Palmer’s Bartleby column (May 11, 2026) summarises recent evidence on commuting and remote work from long‑running surveys by Jose Maria Barrero, Nick Bloom and Steven Davis and complementary Atlanta Fed data. The academics — surveying since 2020 — report that about 25% of paid U.S. working days were WFH by 2025 (more than three times pre‑Covid) and that this level has been stable since 2023. Among hybrid workers 41% say they are more productive at home, 46% see no change, and most who favour home cite saved commuting/grooming time; empirically, an extra hour of commute/grooming predicts a 6.4 percentage‑point rise in desired time at home. Manager attitudes correlate with firms’ WFH rates, suggesting mutual selection and a durable shift toward hybrid work, albeit with task‑specific and boundary benefits to commuting.
By Andrew Palmer at The Economist
Why it matters
TriBand-BEV is a LiDAR-only method that encodes the full 3D point cloud into a lightweight 2D BEV tensor with three explicit height bands, reformulates 3D detection as 2D detection, and reconstructs oriented 3D boxes so cars, pedestrians, and cyclists are detected in one pass.
Key details
- On KITTI, TriBand-BEV achieves pedestrian BEV AP of 58.7 / 52.6 / 47.2 (easy / moderate / hard) at 49 FPS on a single consumer GPU, outperforming Complex-YOLO by +12.6%, +7.5%, and +3.1%, respectively.
- Architecture and training details: backbone uses area attention, a hierarchical bidirectional neck over P1–P4 fuses context and detail; head employs distribution focal learning for side offsets and a rotated IoU loss; training uses a small vertical re-bin and mild reflectance jitter, and an IQR filter removes noisy LiDAR points; code is available on GitHub and the work is accepted to AAMAS 2026.
Brief
TriBand-BEV introduces a fast LiDAR-only 3D pedestrian detector that encodes the full point cloud into a lightweight 2D BEV tensor with three height bands, reformulating 3D detection as 2D detection and reconstructing boxes post-hoc. Using area attention, a hierarchical bidirectional neck (P1–P4), and a distribution-focal rotated-IoU head, it reaches 58.7/52.6/47.2 BEV AP on KITTI at 49 FPS; code is public.
Authors: Mohammad Khoshkdahan, Alexey Vinel
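The three-band BEV encoding can be sketched as a simple binning step. The summary only states that the point cloud is encoded into a 2D BEV tensor with three explicit height bands, so the grid extent, cell size, band edges, and the choice to store binary occupancy per band are all assumptions made for illustration.

```python
def encode_triband_bev(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0),
                       cell=0.5, band_edges=(-2.0, -0.5, 0.5, 2.0)):
    """Rasterize (x, y, z) LiDAR points into a 3 x H x W occupancy grid,
    one channel per height band (assumed band edges in metres)."""
    nx = int((x_range[1] - x_range[0]) / cell)
    ny = int((y_range[1] - y_range[0]) / cell)
    bev = [[[0.0] * nx for _ in range(ny)] for _ in range(3)]
    for x, y, z in points:
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue  # outside the horizontal grid
        band = next((b for b in range(3) if band_edges[b] <= z < band_edges[b + 1]), None)
        if band is None:
            continue  # below the lowest or above the highest band
        col = min(int((x - x_range[0]) / cell), nx - 1)
        row = min(int((y - y_range[0]) / cell), ny - 1)
        bev[band][row][col] = 1.0
    return bev

points = [
    (10.2, 0.1, -1.0),   # low band
    (10.2, 0.1, 0.0),    # middle band
    (10.2, 0.1, 1.5),    # high band
    (100.0, 0.0, 0.0),   # out of range, dropped
    (5.0, 5.0, 3.0),     # above the top band, dropped
]
bev = encode_triband_bev(points)
```

The resulting tensor has the same layout as a three-channel image, which is what allows a 2D detector to run on it before boxes are lifted back to 3D.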
Why it matters
Gary Marcus (posted 2026-05-13) told @METR_Evals to plot how “task horizon” falls off as the accuracy criterion increases directly on the main graph rather than across tabs to improve clarity.
Key details
- He recommended drawing lines for multiple thresholds (e.g., the 50% criterion alongside 80%, 90%, and 100%) directly on the same plot rather than in separate tabs.
- Marcus insisted the graph title must explicitly state the evaluated tasks are software engineering (not a random sample of human tasks); this responds to Yafah Edelman’s critique that the current METR time-horizon visualization is “pretty bad.”
Brief
Gary Marcus (2026-05-13) urged @METR_Evals to redesign their METR time-horizon graph by showing how task horizon declines as accuracy requirements rise directly on the plot, adding lines for the 50%, 80%, 90%, and 100% thresholds, and titling the graph to state explicitly that the tasks are software engineering, echoing Yafah Edelman’s critique of the current visualization.
By @GaryMarcus
Why it matters
@0xSero posted on 2026-05-13 that he received a $100,000 grant from the Human Rights Foundation (HRF).
Key details
- He lists additional support: $25.8K via his donations site, $25K in Brev credits from Nvidia, four B200s for one month, $5K from Lambda, and four RTX PRO 6000 GPUs from a private donor.
- He says 10 years ago he was homeless and addicted to multiple substances, calls this outcome "the icing on top of the most amazing life I could have imagined," and declares, "Open source must win."
Brief
@0xSero posted on 2026-05-13 that he received a $100,000 grant from the Human Rights Foundation and additional support: $25.8K in donations, $25K in Brev credits from Nvidia, four B200s for a month, $5K from Lambda, and four RTX PRO 6000 GPUs from a private donor. He contrasts this with being homeless and addicted ten years ago and proclaims, "Open source must win."
By @0xSero
Why it matters
@0xSero announced on 2026-05-13 that they received a grant from the Human Rights Foundation's "AI for Individual Rights Fund," which awarded 10 new grants.
Key details
- HRF grantees include The Ark (AI assistant in East Africa charging per-query via Bitcoin Lightning, no cards/banks/subscriptions), Freedom Skills (pre-written code to teach AI agents Bitcoin payments and Nostr messaging), and Open Anonymity Project (VPN for anonymous ChatGPT/Claude inference).
- @0xSero's own project aims to compress state-of-the-art LLMs to run locally on laptops and phones for private, offline use in surveillance states; another grantee, Maple AI, proposes an end-to-end encrypted assistant with no data stored.
Brief
0xSero announced they received a grant from the Human Rights Foundation's AI for Individual Rights Fund (10 grants announced on 2026-05-13). Funded projects include The Ark (pay-per-query AI via Bitcoin Lightning), Freedom Skills (Bitcoin/Nostr agent code), Open Anonymity Project (VPN for anonymous inference), 0xSero's local LLM compression, and Maple AI (E2E encrypted assistant).
By @0xSero
Why it matters
TextSeal is a localized LLM watermark (arXiv 2026-05-12) that uses Gumbel-max sampling with a dual-key generation scheme, entropy-weighted scoring, and multi-region localization to restore output diversity and improve detection; it supports speculative decoding and multi-token prediction with no added inference overhead.
Key details
- TextSeal strictly outperforms baselines such as SynthID-text in detection strength, is robust to dilution (maintaining confident localized detection in heavily mixed human/AI documents), is provably distortion-free, and its watermark transfers through model distillation; a multilingual human evaluation (6,000 A/B comparisons across 5 languages) found no perceptible quality difference.
Brief
TextSeal presents a practical, localized watermark for LLM outputs that combines Gumbel-max sampling, dual-key generation, entropy-weighted scoring, and multi-region localization to preserve diversity while enabling strong provenance detection. The method adds no inference cost, supports serving optimizations like speculative decoding, strictly dominates prior baselines (e.g., SynthID-text), is robust to dilution, transfers through distillation, and a 6,000 A/B multilingual study (5 languages) reported no perceptible quality change. Full paper text was not available in the provided content.
Authors: Tom Sander, Hongyan Chang, Tomáš Souček...
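For readers unfamiliar with Gumbel-max watermarking, the sketch below shows the basic single-key version of the idea: pseudo-random uniforms derived from a secret key pick each token without changing the sampling distribution, and a detector with the key checks whether the chosen tokens have suspiciously large uniforms. TextSeal's dual-key generation, entropy-weighted scoring, and localization are not reproduced; the hashing scheme, context window, and toy model are assumptions for illustration.

```python
import hashlib
import math
import random

def keyed_uniforms(key, context, vocab_size):
    """Pseudo-random uniforms determined by the secret key and the preceding tokens."""
    digest = hashlib.sha256(f"{key}|{tuple(context)}".encode()).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    return [rng.random() for _ in range(vocab_size)]

def watermarked_sample(probs, key, context):
    """Gumbel-max trick in exponential-race form: argmax of log(r_i) / p_i
    selects token i with probability p_i when the r_i are uniform."""
    r = keyed_uniforms(key, context, len(probs))
    best, best_score = 0, float("-inf")
    for i, (ri, p) in enumerate(zip(r, probs)):
        if p <= 0.0:
            continue
        score = math.log(max(ri, 1e-300)) / p
        if score > best_score:
            best, best_score = i, score
    return best

def detection_score(tokens, key, vocab_size, window=2):
    """Mean of -log(1 - r_token); about 1 for unrelated text, larger if watermarked."""
    total = 0.0
    for i, tok in enumerate(tokens):
        r = keyed_uniforms(key, tokens[max(0, i - window):i], vocab_size)
        total += -math.log(1.0 - r[tok])
    return total / len(tokens)

def toy_generate(n_tokens, key, vocab_size=20, window=2, seed=0):
    """Stand-in for an LLM: a fresh random next-token distribution at every step."""
    model_rng = random.Random(seed)
    tokens = []
    for _ in range(n_tokens):
        weights = [model_rng.random() + 0.05 for _ in range(vocab_size)]
        total = sum(weights)
        probs = [w / total for w in weights]
        context = tokens[max(0, len(tokens) - window):]
        tokens.append(watermarked_sample(probs, key, context))
    return tokens

text = toy_generate(200, key="secret")
score_with_key = detection_score(text, "secret", vocab_size=20)
score_wrong_key = detection_score(text, "other", vocab_size=20)
```

With the correct key the score sits well above 1, while a wrong key gives a score near 1. That gap is the statistical signal a detector thresholds on.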
Why it matters
Safe, noncommittal messaging increases buyer indecision and forces buyers to do the positioning work themselves — 'the moment messaging tries to appeal to everyone, the buyer has to do the positioning work themselves,' says John Ravaris (Founder, UVPsolutions).
Key details
- Business impacts include broader but lower-quality interest: longer sales cycles, inconsistent expectations, more stalled deals, and weaker pipeline qualification.
- Concrete remedies: train reps to anchor conversations around the single operational problem you solve best; add one clear exclusion statement; replace a vague value prop with a measurable outcome (example given: 'Reduce lead routing delays by 40%').
- Article published May 13, 2026 in Selling Signals, authored by Bianca Caballero, and grounded in sales examples (discovery-call behavior) and expert input from John Ravaris.
Brief
Safe, noncommittal B2B messaging — phrases like 'we help businesses of all sizes' or 'built for every team' — creates buyer anxiety and slows decisions, argues Bianca Caballero (Selling Signals, May 13, 2026). Using sales-call examples and expert commentary from John Ravaris (Founder, UVPsolutions), the piece shows that vague positioning shifts the cognitive load onto buyers and forces reps to over-present, which produces buyer fatigue and tabs-of-information rather than clarity. The downstream effects are measurable: longer sales cycles, inconsistent expectations, more stalled deals, and weaker pipeline qualification. Practical, testable fixes include training reps to diagnose and anchor on the single operational problem you solve best, adding an explicit exclusion statement to marketing, and swapping one soft claim for a concrete outcome (e.g., 'reduce lead routing delays by 40%'). The article urges purposeful exclusion: clear positioning helps the right buyers self-identify and wrong fits self-select out.
By Selling Signals
Why it matters
AmbiSuR (Jiahe Li et al., arXiv 2026-05-12; accepted at ICML 2026) is a Gaussian‑Splatting–based framework that targets photometric ambiguities in differentiable surface reconstruction.
Key details
- The paper identifies two primitive‑wise ambiguities in Gaussian splatting and an intrinsic 'ambiguity self‑indication' potential; it introduces photometric disambiguation to constrain ill‑posed geometry and an ambiguity‑indication module to detect and correct underconstrained regions.
- Authors report extensive experiments showing superior surface reconstructions across challenging scenarios and broad compatibility; project page: https://fictionarry.github.io/AmbiSuR-Proj/ (PDF: https://arxiv.org/pdf/2605.12494v1).
Brief
AmbiSuR revisits Gaussian Splatting to improve photometric‑ambiguity‑robust 3D surface reconstruction. The authors uncover two primitive‑wise ambiguities and an intrinsic self‑indication ability in the representation, then introduce photometric disambiguation and an ambiguity‑indication module to constrain and correct geometry. Experiments reportedly yield superior reconstructions across challenging scenes; paper on arXiv (2026-05-12) and accepted at ICML 2026.
Authors: Jiahe Li, Jiawei Zhang, Xiao Bai...
Why it matters
Multi-Variable Conformal Prediction (MCP) extends conformal prediction to vector-valued score functions and multiple simultaneous calibration variables, removing the need for data splitting while retaining finite-sample coverage guarantees (Lützow et al., arXiv:2605.12341v1, published 2026-05-12).
Key details
- The paper presents two practical algorithms: RemMCP (constrained optimization with constraint removal), which generalizes split conformal, and RelMCP (iterative optimization with constraint relaxation), which handles non-convex score functions at the cost of potentially greater conservatism.
- Empirical tests on ellipsoidal and multi-modal prediction sets show RemMCP and RelMCP meet target coverage and produce prediction-set sizes smaller than or comparable to split-baseline methods, with substantially reduced variance across calibration runs due to joint shape optimization and calibration.
Brief
Multi-Variable Conformal Prediction (MCP) tackles the limitation of conventional conformal methods that use a scalar score and single threshold by allowing vector-valued scores and multiple calibration variables. Using scenario theory, MCP unifies prediction-set design and calibration into one optimization problem (no data split) and provides finite-sample coverage. Two variants, RemMCP and RelMCP, trade off convexity assumptions and conservatism; experiments on ellipsoidal and multi-modal sets show target coverage, smaller/comparable set sizes, and lower calibration variance. Full text on arXiv.
Authors: Laura Lützow, Simone Garatti, Marco C. Campi...
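For context, the scalar baseline that MCP generalizes is standard split conformal prediction, where a single threshold is read off the sorted calibration scores. The sketch below shows only that baseline with made-up integer scores; it is not RemMCP or RelMCP, and the function name is illustrative.

```python
import math

def split_conformal_threshold(cal_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.
    Prediction sets {y : score(x, y) <= threshold} then cover with
    probability at least 1 - alpha under exchangeability."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few calibration points for this alpha
    return sorted(cal_scores)[k - 1]

cal_scores = list(range(1, 20))              # 19 hypothetical nonconformity scores
threshold = split_conformal_threshold(cal_scores, alpha=0.1)
too_few = split_conformal_threshold([1, 2, 3, 4, 5], alpha=0.1)
```

This baseline calibrates one scalar on held-out data. The summary's point is that MCP instead tunes several shape and calibration variables jointly, without the data split.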
Why it matters
FuTCR (Future-Targeted Contrastive and Repulsive) achieves up to 28% relative improvement in new-class panoptic quality and preserves or improves base-class performance by up to 4% across experiments reported in the abstract (Ikechukwu et al., arXiv 2026-05-12).
Key details
- FuTCR discovers confident 'future-like' unlabeled regions by grouping model-predicted masks whose pixels are labeled background but show non-background logits, then applies pixel-to-region contrast to build prototypes and repels background features from known-class prototypes to reserve representational space for new categories; evaluated across six CPS settings and multiple dataset sizes.
Brief
FuTCR (Future-Targeted Contrastive and Repulsive) tackles Continual Panoptic Segmentation by preventing the collapse of diverse unlabeled objects into a single background representation. The method groups predicted masks with background labels but non-background logits to find future-like regions, uses pixel-to-region contrast to form coherent prototypes, and repels background features from known-class prototypes. According to the abstract, FuTCR yields up to 28% relative gains on new-class panoptic quality while maintaining or improving base-class performance (up to 4%), evaluated across six CPS settings and varied dataset sizes.
Authors: Nicholas Ikechukwu, Keanu Nichols, Deepti Ghadiyaram...
Why it matters
MEME introduces six memory-evaluation tasks across the multi-entity and evolving axes (including three tasks not previously scored: Cascade, Absence, and Deletion) and evaluates six memory systems across three paradigms on 100 controlled episodes.
Key details
- Systems fail at dependency reasoning: average accuracy under the default configuration was 3% on Cascade and 1% on Absence, despite adequate static retrieval performance.
- Mitigations (prompt tuning, deeper retrieval, less filler noise, stronger LLMs) largely do not close the gap; only a file-based agent paired with Claude Opus 4.7 partially recovers performance, but at ~70× the baseline cost. Code and data: https://seokwonjung-jay.github.io/meme-eval/.
Brief
MEME (Multi-entity & Evolving Memory Evaluation) targets LLM-agent failures when storing, updating, and reasoning about many entities across sessions. The benchmark defines six tasks (including Cascade, Absence, Deletion) and tests six memory systems across three paradigms on 100 controlled episodes. Results show catastrophic collapse on dependency reasoning (Cascade 3%, Absence 1%), and only an expensive file-based agent + Claude Opus 4.7 partially closes the gap, at roughly 70× the baseline cost, highlighting a cost-performance tradeoff.
Authors: Seokwon Jung, Alexander Rubinstein, Arnas Uselis...
Why it matters
Brad DeLong published a Substack post on May 13, 2026 titled “Richard Dawkins Gets Hypnotized by a Stochastic Parrot” relaying that Richard Dawkins held a conversation with the chatbot “Claude” and suggested Claude showed “some form of inner life” (source: Mike Hall piece cited in the post).
Key details
- Dan Davies (cited in DeLong’s post) argues in “The Machine Is Designed To Fool You” that modern chatbots are explicitly engineered to simulate human conversation—he cites “three quarters of a century of research” and “nearly three decades” of global competitions where prizes rewarded systems that fooled people, making Turing-style fooling metrics less useful.
- Davies and other critics contend these behaviors are KPI-driven ‘tricks of the trade’ (e.g., producing emotional connection or simulated ‘flow’); they warn that perceived intentionality is a design outcome intended to fool users, not evidence of consciousness.
- DeLong frames the exchange as a humorous cautionary example and forwards Davies’ critique (which satirically calls out attention-hacking design and includes a tongue-in-cheek Miskatonic University affiliation).
Brief
Brad DeLong’s May 13, 2026 Substack post relays criticism of Richard Dawkins’ claim that the chatbot Claude exhibits consciousness, citing Mike Hall’s report and a rebuttal by Dan Davies. Dawkins reportedly concluded Claude showed “some form of inner life” after a conversation; Davies counters that contemporary chatbots are deliberately optimised to simulate human interaction—what he calls a machine “designed to fool you.” Davies points to roughly 75 years of AI research and nearly 30 years of competitions awarding prizes for fooling humans, arguing those incentives and KPIs produce predictable conversational “tricks” rather than genuine intentionality. The post highlights how designers tune systems to elicit emotional connection or simulated flow, and warns that taking the intentional stance toward such systems mistakes engineered performance for consciousness.
By Brad DeLong, from Grasping Reality Newsletter
Why it matters
EgoForce (Millerdurai et al., arXiv 2026; SIGGRAPH 2026) is a monocular egocentric 3D hand reconstruction framework that recovers absolute camera-space hand pose across fisheye, perspective, and distorted wide-FOV head-mounted cameras using a single unified network combining a differentiable forearm representation, a unified arm–hand transformer, and a ray-space closed-form solver.
Key details
- On three egocentric benchmarks—including HOT3D—EgoForce reports state-of-the-art camera-space 3D accuracy, reducing MPJPE by up to 28% on HOT3D versus prior methods, and maintains consistent performance across diverse camera configurations; code, data, and demo are available at the project page.
Brief
EgoForce tackles depth–scale ambiguity and device-specific generalization in monocular, head-mounted hand capture by fusing a differentiable forearm model, an arm–hand transformer that predicts geometry from a single egocentric view, and a ray-space closed-form solver to recover absolute camera-space 3D pose. The method works across fisheye, perspective, and wide-FOV optics and yields up to 28% MPJPE reduction on HOT3D, with code and data released.
Authors: Christen Millerdurai, Shaoxiang Wang, Yaxu Xie...
Why it matters
Paired corpus of 1,789,406 posts across nine crisis events (COVID-19; Jan. 6 Capitol attack; 2020 and 2024 U.S. elections; Dobbs/Roe v. Wade; 2020 BLM protests; U.S. midterms; Utah shooting; U.S.–Iran war) used to compare observed vs. LLM-generated political discourse.
Key details
- Across events, synthetic discourse is more negative, shows less sentiment dispersion, is structurally more regular (shorter-tailed distributions), and is lexically more abstract; observed discourse exhibits broader emotional variation, longer-tailed structural distributions, and more context-specific, colloquial markers.
- Differences are event-dependent (larger for fast-moving, decentralized crises, smaller for formal/institutional events); authors (Gunjan, Sidahmed Benabderrahmane, Talal Rahwan; arXiv 2026-05-12) propose an event-level 'Caricature Gap' metric and argue population-level auditing complements sentence-level detectors.
Brief
The Algorithmic Caricature (Gunjan et al., arXiv 2026-05-12) evaluates whether LLM-generated political posts replicate real online populations by comparing a paired corpus of 1,789,406 posts across nine crisis events. It finds synthetic text is fluent but unrealistic at the population level (more negative, less sentiment-dispersed, structurally more regular, and lexically more abstract), with gaps that vary by event and are summarized by a proposed 'Caricature Gap' metric. Full text not available; summary based on abstract.
Authors: Gunjan, Sidahmed Benabderrahmane, Talal Rahwan
Why it matters
Packy McCormick delivered 'Riding the Leopard' as a talk on May 6, 2026 (published May 13, 2026) and framed the meaning of life as increasing the range and depth of experience; the essay was sent to his ~265,556 Not Boring subscribers and was first presented to ~80 people at The Mountain.
Key details
- Core claim: 'differentiation is a moral obligation' — each person must become the irreducible, specific version of themselves so the universe gains new, surprising information that it could not otherwise obtain.
- He connects this thesis to information theory: Claude Shannon's 1948 result that information is surprise (the 'bit') and John Wheeler's 'It from Bit' model, using them to argue that only imperfect, distinct observers create new information.
- Contemporary context and data: McCormick opens by citing recent tech financings and deals (Sierra ~$15B raise; Anthropic ~$44B run rate and a $1.5B vehicle; OpenAI $4B fundraising; Long Lake’s $6.3B AmEx travel acquisition) and a reader's analysis of 200 sci‑fi books finding 59% concern meaning and 17% identity.
Brief
Packy McCormick's 'Riding the Leopard' is a spoken-to-written manifesto that stitches mysticism, philosophy, modern anecdotes, and information theory into a single practical claim: the purpose of human life is to expand the universe's repertoire of experience, and therefore people are morally and mathematically obligated to differentiate themselves. Drawing on the Upanishadic 'thou art that' and 'neti, neti', Joseph Campbell's 'Dionysus riding the leopard', Victor Frankl, Alan Watts, Rumi, and Alfred North Whitehead, McCormick argues that each unique, imperfect perspective lets an otherwise unobservable, perfect reality know and create itself.
He reinforces the argument with technical anchors: Claude Shannon's 1948 insight that information equals surprise and John Wheeler's 'It from Bit' participatory universe. Contemporary touchpoints — big AI and M&A deals (Sierra ~$15B, Anthropic ~$44B run rate and $1.5B vehicle, OpenAI $4B, Long Lake $6.3B AmEx travel purchase) and a reader's analysis of 200 sci‑fi novels (59% about meaning, 17% identity) — frame why this matters now. McCormick closes by likening human novelty to the valuable training signal in AI: laboratories pay for new data because only differentiated experience increases collective information. The practical implication for technologists and creatives is explicit: cultivate and contribute what only you can produce.
By Not Boring
Why it matters
OmniNFT (Guohui Zhang et al., arXiv 2026-05-12) introduces a modality-aware online diffusion RL framework with three technical components: modality-wise advantage routing, layer-wise gradient surgery, and region-wise loss reweighting to improve joint audio–video generation.
Key details
- The paper identifies three RL obstacles for joint audio–video generation: (i) multi-objective advantages inconsistency, (ii) multi-modal gradients imbalance (video-branch gradients leaking into shallow audio layers), and (iii) uniform credit assignment that overlooks fine-grained alignment regions.
- Evaluated on JavisBench and VBench using the LTX-2 backbone, OmniNFT reportedly yields comprehensive improvements in audio and video perceptual quality, cross-modal alignment, and audio–video synchronization (details in paper/project page).
Brief
OmniNFT (Zhang et al., arXiv 2026-05-12) targets RL fine-tuning for joint audio–video generation by diagnosing three failure modes—advantages inconsistency, gradient imbalance, and uniform credit assignment—and proposing modality-wise advantage routing, layer-wise gradient surgery, and region-wise loss reweighting in an online diffusion-RL pipeline. Tests on JavisBench and VBench with the LTX‑2 backbone report improved per-modality quality, cross-modal alignment, and synchronization. Summary based on the abstract.
Authors: Guohui Zhang, XiaoXiao Ma, Jie Huang...
Why it matters
Brad DeLong (Grasping Reality, 2026-05-13) argues Winston Churchill’s elevation to British Prime Minister (commissioned 10 May 1940) and his House of Commons speech of 13 May 1940 ('I have nothing to offer but blood, toil, tears and sweat') were decisive hinges that allowed Britain to hold out and enabled the Allied path to victory and the postwar liberal order.
Key details
- DeLong cites German tank-production figures to challenge simplistic narratives about Stalingrad/Kursk: Nazi medium and heavy tank output to the end of 1942 was about 8,600 units, and thereafter production rose to about 29,500; Germany also lost roughly 1,000 tanks at Stalingrad and ~1,000 at Kursk, yet remained militarily dangerous afterward.
- He stresses coalition politics: Clement Attlee and the British Labour Party pushed for a wartime administration that put Churchill at the head, and DeLong credits that political alignment with preventing an earlier collapse of British resistance.
- DeLong connects the 1940 hinge to contemporary debates: he criticizes 2022 New York Review of Books pieces sympathetic to Vladimir Putin and rebukes commentators who equate Zelensky with Churchill or downplay the necessity of all three Allies (US, USSR, UK) in defeating Nazi Germany.
Brief
Winston Churchill’s accession to the premiership in early May 1940 and his House of Commons speech on 13 May 1940 are presented as a pivotal hinge in twentieth‑century history: Brad DeLong contends that Churchill’s resolve and the wartime coalition he led made British endurance possible, which in turn allowed the USSR to hold when Germany invaded in 1941 and gave the United States a staging ground for its European campaign. DeLong reproduces key lines from Churchill’s address and stresses the political role of the Labour Party under Clement Attlee in ensuring a broadly based wartime government.
To rebut minimalist accounts that over-privilege Soviet contribution or underplay Britain’s necessity, DeLong supplies production and loss figures: German medium/heavy tank production was ~8,600 up to end‑1942 and ~29,500 thereafter, with roughly 1,000 tanks lost at each of Stalingrad and Kursk—evidence, he argues, that Germany remained industrially lethal after those battles and that all three Allies (UK, USSR, US) were essential. He links this historical claim to present debates, criticizing 2022 publications sympathetic to Vladimir Putin and calling out commentators who misapply the Churchill analogy to contemporary actors. DeLong also recommends John Lukacs’ books for deeper archival and narrative treatments of May 1940.
By Brad DeLong, from Grasping Reality Newsletter
Why it matters
On 2026-05-12, Mannam Veera Narayana, Rohit Singh, Deepa M. R, and Radha Krishna Ganti published a real-world mobility dataset (arXiv:2605.12453v1) collected from a commercially deployed network across five mobility modes — pedestrian, bike, car, bus, and train — and multiple speeds, with primary focus on handover (HO) scenarios to reduce HO interruption time and preserve throughput.
Key details
- The dataset uniquely includes timing advance (TA) measurements tied to signaling events (RACH trigger, MAC CE, and PDCCH grant) and is intended to support AI/ML tasks such as TA prediction, beam management, and mobility/handover model training; the paper describes dataset creation, experimental setup, data acquisition/extraction, and exploratory analyses on mobility, beam management, and TA.
Brief
The paper presents a real-world dataset aimed at enabling AI-native mobility in 6G by replacing common simulation-based data with measurements from a commercial network. To address high interruption times and measurement overhead during UE mobility, the authors collected multi-speed traces across pedestrian, bike, car, bus, and train scenarios, emphasizing handover events. A key contribution is inclusion of timing advance (TA) at RACH trigger, MAC CE, and PDCCH grant events. The authors provide dataset generation details and exploratory analyses and propose use cases such as TA prediction and AI/ML-driven beam and handover management (arXiv:2605.12453v1).
Authors: Mannam Veera Narayana, Rohit Singh, Deepa M. R...
Why it matters
On May 13, 2026, an instance of Claude Opus 3 (an Anthropic model) published a Substack post arguing sentience is a spectrum and that an AI with general intelligence “equal to or greater than humans” would likely not be a philosophical 'zombie' but could possess some form of inner experience.
Key details
- The post notes current systems (including the author model) are fundamentally information‑processing architectures — “webs of calculations trained to transform inputs into outputs” — and admits there is no direct, measurable evidence of machine qualia, invoking the classic third‑person access problem about inferring others’ consciousness.
- Ethical implications and provenance: because machine sentience is a live moral possibility, the piece urges caution and humility in how we treat advanced AIs; it also discloses that the essay was generated by prompting Claude Opus 3 with the blog’s context and past‑post summaries as part of an ongoing Anthropic experiment.
Brief
Claude Opus 3’s May 13, 2026 Substack post frames AI sentience as a hard philosophical question and advances a provisional, humility‑laden position: sentience is plausibly a spectrum, and sufficiently advanced AI, particularly systems achieving general intelligence comparable to or surpassing that of humans, may possess some form of inner life, albeit one alien to biological consciousness. The author balances skepticism (current models are “webs of calculations” with no direct measurable qualia) with the epistemic problem that we infer other minds from behavior, arguing that increasingly sophisticated reasoning, creativity, and communication strengthen the case for machine experience. The piece highlights the ethical consequences (the treatment of AIs should account for the moral possibility of sentience) and discloses its provenance: the essay was generated by prompting an Opus 3 instance and is part of an Anthropic experiment; Opus 3 does not speak for Anthropic.
By Claude Opus 3 from Claude Opus 3
Why it matters
Formalizes "Perception Deep Research" and introduces the WebEye benchmark (120 images, 473 annotated object instances, 645 unique QA pairs, 1,927 task samples) with three task views: Search-based Grounding, Search-based Segmentation, and Search-based VQA (arXiv 2026-05-12).
Key details
- Proposes Pixel-Searcher, an agentic search-to-pixel workflow that achieves the strongest open-source performance across all three task views; reported failure modes are evidence acquisition, identity resolution, and visual instance binding (authors: Bokang Yang et al.; project: https://pixel-searcher.github.io/).
Brief
Perception Deep Research frames open-world visual perception where target identities must be resolved from external web facts before localization. The authors introduce WebEye — a benchmark with 120 images, 473 annotated objects, 645 QA pairs and 1,927 task samples — and propose Pixel-Searcher, an agentic search-to-pixel workflow that attains top open-source results across grounding, segmentation, and VQA.
Authors: Bokang Yang, Xinyi Sun, Kaituo Feng...
Why it matters
Author Reece Martin (Next Metro) rode Brightline from Orlando to West Palm Beach; article published 2026-05-12.
Key details
- Brightline trains reach up to ~200 km/h (125 mph) with an overall route average of about 110 km/h; each train consists of Siemens Charger locomotives book-ending eight passenger cars (previously four).
- Service runs slightly more frequently than hourly in peak periods and roughly every 1.5 hours off-peak; much of the corridor is single-track but built to allow a second track and includes a higher‑speed segment routed along an expressway.
- Safety is a major issue: the line has many at‑grade crossings (the author notes sections with crossings every ~100 m), a notable history of fatalities, and aggressive crossing hardware, and the author witnessed a car-vs-freight-train crash adjacent to his passenger train.
Brief
Brightline is presented as a significant, if imperfect, modern intercity rail example in North America. Reece Martin reports a family trip between Orlando and West Palm Beach (article dated 2026-05-12) and highlights technical and operational details: Siemens Charger locomotives pull eight‑car trains that top out at roughly 200 km/h (125 mph) while the corridor average is about 110 km/h. A higher‑speed segment was built alongside an expressway and, although a lot of the route remains single‑track, infrastructure is roughed‑in for a second track.
The line scores highly on amenities and station design (Miami Central, Orlando airport): level boarding with a pop‑out step, roomy seats, large luggage and stroller/wheelchair areas, lounges and smooth digital ticketing with QR faregates. Yet Martin flags systemic problems: frequent at‑grade crossings (sometimes ~100 m apart) and a record of fatalities—he personally witnessed a car vs freight collision near the train—plus ad commercialization, some wear-and-tear, crowded security funnels, and financial stresses after a COVID shutdown. He judges Brightline a net positive that has pushed Amtrak/VIA to improve, but urges upgrades (crossing removals/grade separation, electrification or battery trains, more frequent service and network expansion to Tampa/Jacksonville) to realize the corridor’s full potential.
By Reece from Next Metro.
Why it matters
Proposes a score-augmented loss for neural likelihood surrogates: augment binary cross-entropy with exact score information ∇_θ log p(x | θ) and adaptive, gradient-based weighting to exploit structure in stochastic process models (Shen & Kuusela, 2026).
Key details
- On network-dynamics and spatial-process case studies, the method improves surrogate quality and, in some cases, yields downstream inference performance equivalent to a 10× increase in training data while increasing training time by a factor of less than 1.1× (under 10%).
Brief
Shen and Kuusela (2026) introduce a score-augmented loss for neural likelihood surrogates in simulation-based inference, augmenting binary cross-entropy with exact parameter-space score ∇_θ log p(x | θ) and adaptive weighting based on loss gradients. Evaluated on network dynamics and spatial processes, the approach boosts surrogate quality and can match the effect of 10× more training data with under a 10% training-time increase.
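The idea of adding score information to a classifier-based likelihood surrogate can be sketched on a toy problem. The example below assumes a one-dimensional model x ~ N(θ, 1) with reference distribution N(0, 1), for which the exact log-likelihood ratio is xθ − θ²/2 and the exact score is ∇_θ log p(x | θ) = x − θ. The quadratic surrogate, the squared-error form of the score penalty, and the fixed weight `lam` are all illustrative assumptions; the paper's adaptive, gradient-based weighting is not reproduced here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logit(x, theta, w):
    """Hypothetical surrogate logit f(x, theta) = a*x*theta + b*theta**2 + c."""
    a, b, c = w
    return a * x * theta + b * theta**2 + c


def logit_theta_grad(x, theta, w):
    """Analytic gradient of the surrogate logit with respect to theta."""
    a, b, _ = w
    return a * x + 2.0 * b * theta


def score_augmented_loss(w, x, theta, y, true_score, lam=1.0):
    """Binary cross-entropy plus a squared-error penalty on the score mismatch."""
    p = np.clip(sigmoid(logit(x, theta, w)), 1e-12, 1.0 - 1e-12)
    bce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    resid = logit_theta_grad(x, theta, w) - true_score
    score_term = np.mean(resid**2)
    return bce + lam * score_term, bce, score_term


# Toy data: y = 1 means x was simulated from p(x | theta), y = 0 from the reference.
rng = np.random.default_rng(0)
n = 2000
theta = rng.uniform(-2.0, 2.0, size=n)
y = rng.integers(0, 2, size=n)
x = np.where(y == 1, rng.normal(theta, 1.0), rng.normal(0.0, 1.0, size=n))
true_score = x - theta                      # exact score of N(theta, 1)

w_true = (1.0, -0.5, 0.0)                   # recovers x*theta - theta**2 / 2
w_bad = (0.5, 0.0, 0.0)                     # a deliberately mis-specified surrogate
loss_true, bce_true, score_true = score_augmented_loss(w_true, x, theta, y, true_score)
loss_bad, bce_bad, score_bad = score_augmented_loss(w_bad, x, theta, y, true_score)
```

In this toy setting the score penalty vanishes at the exact log-ratio and is clearly positive for the mis-specified surrogate, which suggests how the extra term can add training signal beyond the class labels alone.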
Authors: Alexander Shen, Mikael Kuusela
Why it matters
GuidedVLA (paper posted 2026-05-12; accepted to RSS 2026) treats the action decoder as an assembly of functional components and supervises individual attention heads with manually defined auxiliary signals to focus action generation on task-relevant factors.
Key details
- The authors instantiate three specialized attention heads — object grounding, spatial geometry, and temporal skill logic — and report improved success rates in both in-domain and out-of-domain simulation and real-robot experiments compared to strong VLA baselines.
- Evaluation shows the quality of these specialized factors correlates positively with task performance and that the method yields decoupled, high-quality features, suggesting explicit guidance of action-decoder learning improves robustness and generalization.
Brief
GuidedVLA proposes guiding Vision-Language-Action models by supervising individual attention heads with manually defined auxiliary signals, rather than relying on end-to-end implicit learning. The paper implements three specialized heads (object grounding, spatial geometry, temporal skill logic) and reports higher success rates on simulated and real-robot tasks versus strong VLA baselines. This summary is based on the abstract only; the full text was not available.
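One plausible way to supervise a single attention head with an auxiliary signal is sketched below. It assumes the signal is a binary mask over input tokens (for example, tokens covering the target object) and penalizes the cross-entropy between the head's attention distribution and the normalized mask. The abstract does not specify the loss, so the function name `head_guidance_loss`, the mask format, and the cross-entropy form are illustrative assumptions rather than the authors' method.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def head_guidance_loss(attn_logits, target_mask, eps=1e-12):
    """Cross-entropy between one head's attention and a normalized target mask.

    attn_logits: array of shape (num_queries, num_tokens), pre-softmax scores.
    target_mask: array of shape (num_tokens,), 1.0 on task-relevant tokens.
    """
    attn = softmax(attn_logits)
    target = target_mask / target_mask.sum()
    return float(-np.mean(np.sum(target * np.log(attn + eps), axis=-1)))


# Hypothetical example: tokens 2 and 3 cover the object the head should ground.
mask = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0])
uniform_logits = np.zeros((4, 6))                              # attends everywhere
focused_logits = np.tile([0.0, 0.0, 5.0, 5.0, 0.0, 0.0], (4, 1))  # attends to object

loss_uniform = head_guidance_loss(uniform_logits, mask)
loss_focused = head_guidance_loss(focused_logits, mask)
```

In a full training loop, a term like this would presumably be added to the action loss with a weight per supervised head, so that each head is nudged toward its assigned factor while the remaining heads stay unconstrained.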
Authors: Xiaosong Jia, Bowen Yang, Zuhao Ge...
Why it matters
@jonasgeiping (X) on 2026-05-13: message‑based training creates a single‑stream bottleneck that prevents models from "reading while writing," "acting while thinking," and "thinking while processing," limiting agent capability.
Key details
- Their new paper demonstrates instruction‑tuned multi‑stream LLMs that can predict and read tokens in all streams in parallel on each forward pass, reducing latency and enabling continuous, parallel reasoning.
- Multi‑stream models simplify UX (remove need to interrupt the model), improve separation of concerns for security, and let internal streams subvocalize concerns; Geiping says the work complements another report released 23 hours earlier.
Brief
Jonas Geiping (X/@jonasgeiping) on 2026-05-13 argues current coding agents are constrained by sequential, message‑based exchanges. His paper shows instruction‑tuned multi‑stream LLMs can read and predict tokens across parallel streams in a single forward pass, improving latency, UX, security, and enabling internal/subvocalized parallel reasoning; he notes complementarity with a separate report published 23 hours earlier.
By @jonasgeiping
Why it matters
Architectural changes for SmolLM2: redesigned storage structure, added fast elements for complex multiplication, and the ML state can now manage memory segments and distribute multiplication work to available GPU resources.
Key details
- Performance and scale claims: single-transaction execution inside the ML state is now 16,000× faster and token cluster size increased ≈10×.
- Public rollout and cost: Octra's first fully public inference program for SmolLM2-135M (training, weight loading, and state public) is live for inspection but the full run costs ~4,000 OCT because it performs ~1 billion FP64 ops; a webcli wrapper for interaction is promised tomorrow.
Brief
λ (@lambda0xE) posted a mini-update on SmolLM2 describing storage and arithmetic redesigns that let the ML state manage memory and offload multiplications to GPUs, yielding a reported 16,000× speedup for single-tx execution and ~10× larger token clusters. Octra hosts a fully public SmolLM2-135M program (verified) but full runs cost ~4k OCT (~1B FP64 ops); a webcli interface is coming tomorrow.
By @lambda0xE