ArXiv

The Algorithmic Caricature: Auditing LLM-Generated Political Discourse Across Crisis Events

Authors
Gunjan, Sidahmed Benabderrahmane, Talal Rahwan
Categories
cs.CL, cs.AI, cs.CY
arXiv
https://arxiv.org/abs/2605.12452v1
PDF
https://arxiv.org/pdf/2605.12452v1

Brief

The Algorithmic Caricature (Gunjan et al., arXiv 2026-05-12) asks whether LLM-generated political posts behave like an observed online population, using a paired corpus of 1,789,406 posts across nine crisis events. It finds that synthetic text is fluent but unrealistic at the population level: more negative, less dispersed in sentiment, structurally more regular, and lexically more abstract than observed posts. The gaps vary by event and are summarized by a proposed 'Caricature Gap' measure. Full text not available; summary based on abstract.

Why it matters

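Sentence-level AI-text detectors rely on cues such as perplexity, burstiness, and token irregularities, which may weaken as generative systems improve. The paper instead audits synthetic political discourse at the population level, comparing observed and LLM-generated posts in a paired corpus of 1,789,406 posts across nine crisis events (COVID-19; Jan. 6 Capitol attack; 2020 and 2024 U.S. elections; Dobbs/Roe v. Wade; 2020 BLM protests; U.S. midterms; Utah shooting; U.S.–Iran war).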

Key details

  • Across events, synthetic discourse is more negative, shows less sentiment dispersion, is structurally more regular (shorter-tailed distributions), and is lexically more abstract; observed discourse exhibits broader emotional variation, longer-tailed structural distributions, and more context-specific, colloquial markers.
  • Differences are event-dependent: larger for fast-moving, decentralized crises and smaller for formal or institutionally mediated events. The authors propose an event-level 'Caricature Gap' measure and argue that population-level auditing complements sentence-level detectors.

Source evidence

Abstract

Large Language Models (LLMs) can generate fluent political text at scale, raising concerns about synthetic discourse during crises and social conflict. Existing AI-text detection often focuses on sentence-level cues such as perplexity, burstiness, or token irregularities, but these signals may weaken as generative systems improve. We instead adopt a Computational Social Science perspective and ask whether synthetic political discourse behaves like an observed online population. We construct a paired corpus of 1,789,406 posts across nine crisis events: COVID-19, the Jan. 6 Capitol attack, the 2020 and 2024 U.S. elections, Dobbs/Roe v. Wade, the 2020 BLM protests, U.S. midterms, the Utah shooting, and the U.S.-Iran war. For each event, we compare observed discourse from social platforms with synthetic discourse generated for the same context. We evaluate four dimensions: emotional intensity, structural regularity, lexical-ideological framing, and cross-event dependency, using mean gaps and dispersion evidence. Across events, synthetic discourse is fluent but population-level unrealistic. It is generally more negative and less dispersed in sentiment, structurally more regular, and lexically more abstract than observed discourse. Observed discourse instead shows broader emotional variation, longer-tailed structural distributions, and more context-specific, colloquial lexical markers. These differences are event-dependent: larger for fast-moving, decentralized crises and smaller for formal or institutionally mediated events. We summarize them with a simple event-level measure, the Caricature Gap. Our findings suggest that the main limitation of synthetic political discourse is not grammar or fluency, but reduced population realism. Population-level auditing complements traditional text-detection and provides a CSS framework for evaluating the social realism of generated discourse.
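
Illustrative sketch

The abstract mentions "mean gaps and dispersion evidence" but does not give formulas, and the definition of the Caricature Gap is not available from the abstract. The Python sketch below is therefore only one plausible reading, not the authors' method: it compares a single per-post feature (for example, a sentiment score) between observed and synthetic posts for one event. The names `EventGap` and `event_gap`, the use of population standard deviation as the dispersion measure, and the `combined` score are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class EventGap:
    mean_gap: float        # synthetic mean minus observed mean
    dispersion_gap: float  # synthetic std dev minus observed std dev
    combined: float        # hypothetical summary score, NOT the paper's formula


def event_gap(observed: list[float], synthetic: list[float]) -> EventGap:
    """Compare one per-post feature (e.g. sentiment) for a single event."""
    if len(observed) < 2 or len(synthetic) < 2:
        raise ValueError("need at least two posts on each side")
    mean_gap = mean(synthetic) - mean(observed)
    dispersion_gap = pstdev(synthetic) - pstdev(observed)
    # Assumption: add the absolute gaps to get one crude number per event.
    combined = abs(mean_gap) + abs(dispersion_gap)
    return EventGap(mean_gap, dispersion_gap, combined)


# Toy sentiment scores in [-1, 1]; made-up values, not data from the paper.
observed = [-0.9, -0.2, 0.1, 0.6, 0.8, -0.5]
synthetic = [-0.6, -0.5, -0.4, -0.5, -0.6, -0.4]
gap = event_gap(observed, synthetic)
print(gap)
```

With these toy inputs both gaps come out negative, which is the pattern the abstract describes for sentiment: synthetic posts are more negative on average and less dispersed than observed posts. Running the same function once per event would give an event-level comparison, though how the paper actually aggregates across its four dimensions is not stated in the abstract.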