Twitter/X

Richard S.

Brief

Richard S. Sutton argues that ethics can be understood through reinforcement learning. At each time step an agent receives a numeric reward (pleasure minus pain) and aims to maximize value, the sum of future rewards. Rewards are a free, primary choice that defines the agent's goals; values are derived from the rewards plus the environment's dynamics and determine correct action: the agent should choose the action whose value is highest at that moment, not the one with the highest immediate reward. Because worlds are complex, computing values exactly usually exceeds the available knowledge, computation, and memory, so agents rely on partial online calculation or on learned, stored approximations. These approximations are predictions of subsequent rewards and function like intuitive senses of good and bad. In social settings agents must take others' rewards into account. Sutton contends that a hedonic ultimate value is acceptable provided it accounts for others rather than being selfish, and that moral terms have a predictive semantics: 'good' denotes what will probably produce good outcomes for the individual on average, with heuristics serving as practical predictors.
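
The difference between choosing by value and choosing by immediate reward can be illustrated with a minimal sketch. The toy world, its state and action names, and the reward numbers below are hypothetical and do not come from the thread; the world is small and deterministic so that values can be computed exactly.

```python
# Hypothetical toy world (not from the thread): each action maps to
# (immediate reward, next state). A state with no actions is terminal.
WORLD = {
    "start": {"snack": (1.0, "done"), "work": (-1.0, "paid")},
    "paid": {"collect": (5.0, "done")},
    "done": {},
}


def state_value(state):
    """Best achievable sum of future rewards starting from `state`."""
    actions = WORLD[state]
    if not actions:
        return 0.0  # nothing more to gain after a terminal state
    return max(action_value(state, action) for action in actions)


def action_value(state, action):
    """Immediate reward plus the value of the state that follows."""
    reward, next_state = WORLD[state][action]
    return reward + state_value(next_state)


def best_by_reward(state):
    """Pick the action with the highest immediate reward."""
    return max(WORLD[state], key=lambda action: WORLD[state][action][0])


def best_by_value(state):
    """Pick the action with the highest value (sum of future rewards)."""
    return max(WORLD[state], key=lambda action: action_value(state, action))


print(best_by_reward("start"))  # snack
print(best_by_value("start"))   # work
```

Here "snack" pays 1 now and nothing afterwards, while "work" costs 1 now and leads to a reward of 5, so its value is 4. An agent that follows immediate reward picks "snack"; an agent that follows value picks "work".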

Why it matters

Richard S. Sutton (Twitter/X thread published 2026-05-01) frames ethics through reinforcement learning: agents receive a numeric reward at each time step (pleasure minus pain) and seek to maximize value, defined formally as the sum of future rewards.
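
For readers who want the definition in symbols, one common way to write it is shown here. The notation ($R_t$ for the reward at step $t$, $S_t$ and $A_t$ for state and action, $G_t$ for the return, $v$ and $q$ for state and action values) follows standard reinforcement learning convention and is an assumption; the summary does not give a formula, and it does not mention discounting, so the sum is written undiscounted.

```latex
% Return: the sum of future rewards after time step t
G_t = R_{t+1} + R_{t+2} + R_{t+3} + \cdots = \sum_{k=0}^{\infty} R_{t+k+1}

% Value of a state, and of an action taken in a state: the expected return
v(s) = \mathbb{E}\left[ G_t \mid S_t = s \right]
\qquad
q(s, a) = \mathbb{E}\left[ G_t \mid S_t = s,\ A_t = a \right]

% Correct action: highest value, which need not have the highest immediate reward
a^{*} = \arg\max_{a} q(s, a)
```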

Key details

  • Rewards are primary and arbitrary; values are secondary and fully determined by the rewards plus the environment's dynamics. Correct choices follow from values, not from raw immediate reward: the agent should pick the action whose value is highest, not the one with the highest immediate reward.
  • Exact calculation of values is often infeasible because of limits on knowledge, computation, and memory. Agents instead use one of two practical methods: online calculation when knowledge permits, or learned, stored approximations (value predictions) built from experience.
  • In multi-agent or social contexts, individuals must account for others' rewards and values. Sutton argues that the ultimate value may be hedonic (reward-based) so long as it is not selfish, and that moral language is predictive: calling something 'good' means it will probably produce good outcomes for the individual on average, with heuristics serving as predictive approximations.
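
The idea of learned, stored value predictions built from experience can be sketched with temporal-difference learning, here TD(0). The summary does not name a specific algorithm, so this choice is an assumption, as are the state names, rewards, and step size, all of which are hypothetical.

```python
# Hypothetical experience stream (not from the thread): each entry is
# (state, reward received on leaving it, next state). None ends the episode.
EPISODE = [
    ("effort", -1.0, "progress"),
    ("progress", 0.0, "payoff"),
    ("payoff", 5.0, None),
]


def td0(episodes, step_size=0.5):
    """Learn stored value predictions by replaying EPISODE with TD(0)."""
    values = {}  # the agent's stored predictions, initially empty (treated as 0)
    for _ in range(episodes):
        for state, reward, next_state in EPISODE:
            # Prediction for what follows; 0 after the episode ends.
            next_value = 0.0 if next_state is None else values.get(next_state, 0.0)
            target = reward + next_value  # undiscounted one-step target
            old = values.get(state, 0.0)
            # Move the stored prediction a fraction of the way toward the target.
            values[state] = old + step_size * (target - old)
    return values


rough = td0(episodes=1)      # after little experience, predictions are crude
learned = td0(episodes=200)  # after more experience, they settle

for state in ("effort", "progress", "payoff"):
    print(f"{state}: early {rough[state]:.3f}, later {learned[state]:.3f}")
```

After one pass the prediction for "effort" is negative, because only the immediate cost has been seen. With repeated experience the stored predictions approach the true sums of future rewards (4 for "effort", 5 for "progress" and "payoff"), so the agent comes to rate "effort" as good without recomputing the whole future each time.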