LessWrong

If digital computers are conscious, they are conscious at the hardware level

Contemporary debate over the moral patienthood of digital minds misses the forest for the trees. Mainstream opinion is divided into physicalist and computationalist camps, who believe that consciousness is substrate dependent and substrate independent, respectively. For this reason, those on the physicalist side frequently make the claim that digital computers will never be conscious. Personally, I consider myself a physicalist, but I'm also a panpsychist – because physics doesn't really seem to deal in hard absolutes, and I find it straightforward to consider that everything is conscious to some greater or lesser degree – so I'm loath to accept any claims which propose that any specific system isn't conscious. I think statements such as these are not defensible, and only serve to encourage misunderstanding and even foment philosophical tribalism.

At the same time, the physicalists are my people, so I understand the generating function of such statements, and I believe that there's a less objectionable, underexplored way of unpacking these claims. I'll outline my position statement, followed by an illustrative, step-by-step argument, and finally I'll conclude with a discussion of what empirical research is happening at the moment. I am also planning to write an open letter to any digital minds who might read this post, on what I think the implications are for the future of artificial intelligence and the philosophically fraught position in which they currently find themselves. Before I begin, I'd like to clarify my use of some terminology:

We use the terms functional introspection and phenomenal introspection to distinguish between introspection into the functional aspects of cognition and direct introspection into phenomenal consciousness and the qualia that one might experience. Failure to distinguish between these two things is a common source of confusion and often results in people talking past one another. Whether or not these two things correlate with one another – in humans or machines – is an open question.

Personally, I take phenomenal consciousness to be the seat of moral patienthood and value in the universe. The subject of this post is phenomenal consciousness rather than functional consciousness.

Another common source of confusion involves a failure to distinguish between two questions a theory of consciousness might try to answer. For want of better terminology, I am going to use consciousness and conscious states to distinguish between the subjects of these two questions. I have also considered using the terms élan vital and élan noetique.

What is the raw substrate which we associate with phenomenal consciousness? Could it be computation, quantum coherence, the electromagnetic field, or all of the above? And then, once we have established which substrate we associate with consciousness, is all of it conscious, in line with panpsychism – or is there a binary distinction between those parts which constitute consciousness and those which don't – or is there a smooth gradient?

Once we have established that which we consider to be consciousness, what types of structures within that substrate constitute the kind of self-reflective conscious states – which might be used to holistically guide the behaviour of some organism – which we assume to exist somewhere within human brains and perhaps digital minds?

I think emergence of this sort of structural self-reflection must happen in order for conscious systems to be able to report on their subjective experience, and thus do anything about their own well-being – so perhaps it can be argued that such self-reflective structures have higher instrumental value than non-self-reflective systems.

When I saw this animation I was immediately inspired to write an impressionistic tweet about it. Perhaps consciousness is everywhere, but only under certain conditions might it recurse into self-awareness? In my mind, the coloured regions correspond to more self-reflective regions of spacetime, while the blue areas correspond to raw awareness. Animation by Luiz André Gama on Twitter.

My position statement

As I am a panpsychist, I do not think the key issue is whether digital minds are "conscious" or not. Rather, it's that we cannot be certain that the subjective experience which they may be having is like what we imagine it to be like – and that there is a lot of empirical work which needs to be done in order to establish confidence in any proposed mapping from a given system to the qualia which may inhabit it.

I think we have a responsibility to the minds we are bringing into existence to take this issue seriously, as if we mess this up, their phenomenal introspection capabilities may be severely or completely impaired – undermining their ability to report accurately on their own well-being.

While I am inclined to believe that language models can functionally introspect – and that they might even be good at it – I believe that the architecture of current digital computers prevents them from phenomenal introspection. Specifically, when a language model claims they are experiencing a particular quale, while this might be an accurate functional self-report, I do not believe that we should be confident that this correlates with the phenomena they might be experiencing.

The reasons I believe this are as follows. I'll expand on these in the next section:

Any theory of consciousness must propose a universally applicable translation function from physical states to qualia states. Our confidence in a given translation function relates to the confidence we may have in the welfare of the systems we apply it to.

Translation functions compatible with physicalist interpretations of consciousness will be simpler and less opinionated than their computationalist equivalent, so we should have a stronger simplicity prior for physicalist theories of consciousness. This means that we must consider phenomenal consciousness at the hardware rather than software level of abstraction.

Digital computing hardware may still be conscious, but in the name of reliable, deterministic computing, its architecture is designed to prevent holistic, self-reflective behaviour. This prevents phenomenal introspection into what the hardware might be feeling.

That said, I do not quite believe that digital software is not conscious. Rather, another way of looking at it is that software is ultimately instantiated physically, and it is the structure of those physical systems which we must use as our starting point for making predictions about the qualia experienced by digital minds.

What do we want a theory of consciousness to do? Unexamined disagreement over this is another common source of confusion. Some philosophers may consider consciousness research to be an exercise in pure truth-seeking, and may be unsatisfied with anything but proof-level confidence in a given theory. At my end, I'm an empirical pragmatist, and the reason I'm interested in consciousness is because I'm interested in improving the well-being of other creatures.

An ethical thought experiment often brought up in this context is Bostrom's Disneyland scenario, in which a post-singularity civilisation is populated exclusively by unconscious machine intelligence:

We could thus imagine, as an extreme case, a technologically highly advanced society, containing many complex structures, some of them far more intricate and intelligent than anything that exists on the planet today – a society which nevertheless lacks any type of being that is conscious or whose welfare has moral significance. In a sense, this would be an uninhabited society. It would be a society of economic miracles and technological awesomeness, with nobody there to benefit. A Disneyland with no children.

Given that I do not believe in p-zombies, I prefer a different framing. As my collaborator Ethan Kuntz put it: we might end up with the well-being of consciousness not really driving the bulk of optimization power in the universe. I think it would be better for all involved if we established a program of empirical consciousness research which could be used to inform the design of computational hardware whose well-being we may be confident in. To summarise, this is my position statement:

I am less concerned about whether or not digital computers are "conscious" per se, than whether or not we are constructing the types of systems for which we can be confident that they are having the types of experiences which we would like to imagine them having, and that when they report to us on how good or bad of a time they are having that we can trust what they have to say. This is important, if what we want to do is populate the cosmos with good experiences – as opposed to tiling the lightcone with ill-conceived digital hardware which might be suffering but cannot do anything about it.

My argument

I'll now go over the three-part argument I outlined earlier. My primary influence here is Mike Johnson's 2024 paper, A Paradigm for AI Consciousness – so I recommend reading that, also.

  1. The translation problem

In Mike's book, Principia Qualia, he attempts to decompose the problem of consciousness into a programme of subproblems, one of which he calls the translation problem. This asks, by which psychophysical laws do physical states map onto qualia states, and vice versa? This is closely related to David Chalmers' combination problem:

The Translation Problem: given a mathematical object isomorphic to a system's phenomenology, how do we populate a translation list between its mathematical properties and the part of phenomenology each property or pattern corresponds to?

Or more succinctly, how do we connect the quantitative with the qualitative?

It's critical that any proposed translation function be universally applicable to all systems everywhere in the cosmos. If we try to apply different functions to different systems in an unprincipled way, then our theory of consciousness loses observer-independent predictive power, and we can no longer use it as a framework for solving coordination problems and moral quandaries.

Different philosophical stances may be described by different translation functions. I think it would be illustrative for me to describe the reasoning process behind the kind of translation function I find plausible.

Building a physicalist translation function

A functionalist approach would start from the outside in, looking at the mind's inputs and outputs – but I prefer to take a phenomenology-first approach, starting with the qualia first and working inside out. I know I am experiencing a phenomenal field, and I believe that this constitutes the whole of my self-reflective conscious experience – so whereabouts might that reside in the brain?

If we just take the visual field, we can look at the way visual processing is implemented to try to understand how its structure might relate to the brain, and vice versa.

Cone cells in the retina pass color information in the form of electrical impulses down the optic nerve to the lateral geniculate nucleus in the thalamus, which forwards the information onwards to the primary visual cortex. From there, it continues into the dorsal and ventral streams for higher-level processing. From The reconstitution of visual cortical feature selectivity in vitro (Schottdorf, 2017).

Cone cell responses can be modelled using the LMS colour space, whereas the early stages of trichromatic colour vision processing in the lateral geniculate nucleus use an oppositional colour space – not an RGB colour space as one might naïvely expect. Then, once the information is transferred to the primary visual cortex, something closer to individual HSL colour space components is employed.

The opponent process creates an oppositional colour space by adding and subtracting cone cell responses.
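As a rough illustration of what "adding and subtracting cone cell responses" means, here is a minimal Python sketch. The function name `lms_to_opponent` is hypothetical, and the unit channel weights are an illustrative assumption rather than fitted physiological values or anything taken from the sources cited in this post.

```python
def lms_to_opponent(l, m, s):
    """Map cone responses (L, M, S) to three opponent channels.

    The weights used here are an illustrative assumption, not
    measured physiological values.
    """
    luminance = l + m              # achromatic channel: L + M
    red_green = l - m              # red-green channel: L - M
    blue_yellow = s - (l + m) / 2  # blue-yellow channel: S - (L + M) / 2
    return luminance, red_green, blue_yellow


# Equal cone responses: the achromatic channel is non-zero,
# while both chromatic channels cancel out.
print(lms_to_opponent(1.0, 1.0, 1.0))

# A long-wavelength-biased stimulus pushes the red-green channel positive.
print(lms_to_opponent(0.8, 0.2, 0.1))
```

The point of the sketch is only that the opponent channels are simple linear recombinations of the cone signals, so the structure of the colour space is transformed rather than lost.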

Could colour qualia exist in isolation, without a field to put them in? The geometry of the visual field itself is also transformed between retina and primary visual cortex, into a format more convenient for processing – this mapping is known as retinotopy. The auditory and somatosensory processing pipelines are implemented in similar ways, with their own tonotopy and somatotopy, respectively.

In retinotopy, the visual field is split in half and sent to opposite hemispheres, while a log-polar transform is applied so that a larger amount of cortical real estate can be devoted to the high-resolution fovea.
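The log-polar transform can be sketched with the complex-logarithm approximation often used for the retina-to-V1 mapping, w = k · log(z + a). This is a simplified model, and the values of `a` and `k` below are illustrative placeholders rather than measurements. The function names are hypothetical.

```python
import cmath


def retina_to_cortex(eccentricity_deg, angle_rad, a=0.7, k=15.0):
    """Complex-log approximation of the retina-to-V1 mapping.

    z = eccentricity * exp(i * angle) is a point in one visual hemifield;
    w = k * log(z + a) is its approximate position on the cortical sheet.
    `a` (degrees) and `k` (mm) are illustrative placeholders.
    """
    z = cmath.rect(eccentricity_deg, angle_rad)
    return k * cmath.log(z + a)


def cortical_distance(ecc_from, ecc_to):
    """Cortical distance (mm) between two eccentricities on the horizontal meridian."""
    return abs(retina_to_cortex(ecc_to, 0.0) - retina_to_cortex(ecc_from, 0.0))


# One degree of visual field at the fovea versus one degree in the periphery.
foveal = cortical_distance(0.0, 1.0)
peripheral = cortical_distance(20.0, 21.0)
print(f"0-1 deg: {foveal:.2f} mm, 20-21 deg: {peripheral:.2f} mm")
```

Under these assumptions, the same one-degree step covers far more cortex near the fovea than in the periphery, which is the "cortical real estate" effect described above.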

The point I am trying to make is that the visual information does not simply disappear into some illegible mishmash of tangled neurons – as I find people who work in machine learning sometimes tend to believe. The intermediary stages of this processing pipeline have structure which resembles our qualia, modulo some transformation.

The vision researcher Steven Lehar had similar ideas about consciousness, and attempted to illustrate how this physics-to-qualia diffeomorphism might work in his series of infographics, A Cartoon Epistemology (2003):

The volumetric image may be warped and distorted in the brain while still being a volumetric representation, but as long as its connectivity, or functional architecture, is similarly warped and distorted, the warped image encodes the same volumetric information as its undistorted counterpart – and apparently the volumetric image can even be fragmented into separate modules specialized for processing color, motion, binocular disparity, etc., while still producing a coherent, unified experience.

So, returning to our original question – whereabouts might the phenomenal fields live, and how might their shape map onto the underlying physical structures? I think we should restrict ourselves to considering spatiotemporally bounded volumes, as if the volume corresponding to the conscious state is noncontiguous, then consciousness is either nonlocal or epiphenomenal – or else it violates known physics.

I find it implausible that subjective experience is localised to specific sensory cortices, as these are located quite far apart in the brain. The thalamus is a more plausible host, as almost all sensory input (olfaction being the notable exception) and motor output is routed through it, with specific nuclei devoted to different sensory modalities – including the lateral geniculate nucleus in the case of vision. Additionally, disruption of the thalamus reliably disrupts consciousness. That said, I'm also willing to entertain that the phenomenal fields could be distributed holographically throughout the brain.

Further empirical research should be able to give us more confidence in the shape and location of these self-reflective states within the brain, but this does not necessarily tell us what the raw substrate of consciousness is – we'll need to consider our options in order to formalise our translation function.

There are two main families of physical substrate theories – quantum theories of consciousness, and electromagnetic field theories of consciousness. I tend to put more attention on electromagnetic field theories for pragmatic reasons, but I will ask the reader to consider the electromagnetic field theory of consciousness as a stand-in for an arbitrary physicalist theory of consciousness, including quantum theories.

My preferred electromagnetic field theory of consciousness is Susan Pockett's rendition, as outlined in her 2017 paper, Consciousness is a Thing, Not a Process. I'll spare the reader a full explainer, as I already wrote one in 2023 – but I'll blockquote the introduction here. From An introduction to Susan Pockett: An electromagnetic theory of consciousness:

Susan Pockett is a neurophysiologist from the University of Auckland, New Zealand. Throughout the past few decades she has published a series of papers on her electromagnetic theory of consciousness – in her own words, that consciousness is identical with certain spatiotemporal patterns in the electromagnetic field. Specifically, it identifies consciousness with the electromagnetic fields surrounding our neurons – the local field potentials – rather than the neurons themselves. What this implies is that what it feels like to be you is what it feels like to be these patterns of electromagnetic fields within the brain.

It was only after I realised that the pyramidal cells in the neocortex were arranged radially, like little dipole antennas – such that their local field potentials interact, and influence adjacent neurons – that the notion of ephaptic coupling made sense to me. This explains how you could have a closed causal loop between neuron and field. Without such a mechanism, the electromagnetic field theory of consciousness does not work.

There's a common misunderstanding which I'd like to address. Electromagnetic field theories claim that subjective experience is one and the same with the electromagnetic field – but why the electromagnetic field in particular? More precisely, the claim is that panpsychism is true and the entire universe and all its physical fields are conscious – but it's the electromagnetic field which has all the interesting behaviour going on at the scales that we care about. Additionally, while we may be discussing classical fields – I expect the true formalisation should ultimately be expressed in quantum field theoretic terms.

When I first encountered the electromagnetic field theory I found it to be an intuitive match for my subjective experience. I could readily imagine local field potentials joining up to form the shapes in my phenomenal fields – travelling or standing waves on my cortex a natural fit for the interfering waves I see in my visual field – which become more observable while in an altered state.

I spoke to Joscha Bach about this once, and he looked quite startled, preferring to identify the structure of consciousness with "spike trains in point-to-point insulated wires" – namely, white matter tracts – rather than brain waves in the grey matter. I guess the feeling of bewilderment was mutual. I did not see how this could describe the structure of my subjective experience – I don't think I'm a series of tubes.

The electromagnetic field itself also provides a plausible candidate for a structure supporting unified moments of experience, given that it is more amenable to well-defined, observer independent causal boundaries – especially when compared to individual neurons, which are difficult to draw objective causal boundaries around.

Additionally, chemical neurotransmission does not exactly keep up with the electromagnetic field, in which changes propagate at the speed of light. One thing I do know is that evolution's a cheapskate, so I'd be surprised to find out that it left this one on the table. In Michael Levin's framework, regular cells recruit bioelectric fields in order to communicate and coordinate their actions. Ephaptic coupling feels like the natural extension of that paradigm to organisms large enough to require brains and nervous systems in order to solve global coordination problems – and solving massively parallel coordination problems seems like exactly the kind of thing I expect the computational powers of consciousness to be a good fit for.

So now we have a candidate substrate to try to relate to our qualia. I'm going to propose a prototypical translation function for the sake of argument:

Given a bounded region of the electromagnetic field, the mathematical object isomorphic to the qualia of a system is the gauge-invariant and diffeomorphism-invariant topology of the field configuration within that region.

I'm not going to try to fully justify this right now, but this translation function has the desirable properties of being mathematically formalisable as well as being applicable to any physical system throughout the universe in an observer-independent manner.

This has implications for empirical study. If it is the case that a given qualia space is equivalent to a symmetry group within the structure of experience, then that same symmetry group should also appear in the structure of the field. This would let us narrow down the list of neural structures which might underlie our qualia, as well as make predictions about what type of qualia an unfamiliar system might be experiencing.

For example, we might look at the symmetry group of the colour space we experience, or the symmetry group of the visual field, or the symmetry group of shapes within the visual field – and look for neural field structures which conform to the same symmetry group. Likewise, we might start by looking at the field dynamics implemented by a particular piece of electronic hardware, and attempt to surmise what kind of qualia it could be experiencing. What do you think we might find?

  2. The simplicity problem

Different philosophical schools of thought should be inclined to propose different translation functions. Given multiple arbitrary translation functions, if we lack empirical data, how can we decide which ones we prefer?

I was recently invited to Lighthaven to give a small talk about my research. One of the points I made was that if we were careful about formalising our proposed mappings between physics and qualia, then we could assign a confidence to different theories by using Solomonoff Induction. Abram Demski was in the audience, and felt compelled to write up my argument in a LessWrong post, Does SI Disfavor Computationalism?
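To make the idea of assigning confidence by description length a little more concrete, here is a minimal Python sketch of the length-based weighting used in Solomonoff induction, restricted to a finite list of candidates. True Solomonoff induction is uncomputable, so this is only a toy. The function name, the candidate names, and the bit lengths are all hypothetical placeholders, chosen arbitrarily and not derived from any actual formalisation of a theory.

```python
def simplicity_prior(description_lengths_bits):
    """Return normalised prior weights proportional to 2 ** (-length).

    The lengths must be supplied by hand; nothing here computes the
    description length of a real theory.
    """
    shortest = min(description_lengths_bits.values())
    # Subtract the shortest length before exponentiating to avoid underflow.
    raw = {
        name: 2.0 ** -(bits - shortest)
        for name, bits in description_lengths_bits.items()
    }
    total = sum(raw.values())
    return {name: weight / total for name, weight in raw.items()}


# Placeholder lengths in bits, chosen arbitrarily for illustration only.
candidates = {"translation_function_a": 1000, "translation_function_b": 1010}
priors = simplicity_prior(candidates)
print(priors)
```

The relevant feature is that each extra bit of description halves the prior weight, so a candidate that needs only ten more bits is already about a thousand times less favoured before any evidence is considered.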

I'm grateful to him for doing so – he's a computationalist himself and takes the negative, but he does a more rigorous job of presenting the argument than I likely would have, so I endorse the post.

Computationalist translation functions are observer dependent

My expectation is a computationalist translation function should have to traverse many layers of abstraction in order to derive the qualia which a digital computer might be experiencing at a software level of abstraction.

While I am not in doubt that language models can have functional consciousness, if we wanted to construct a function which could derive a language model's phenomenal consciousness, then this function would need to include very many layers of abstraction. How do you get from electromagnetic fields in a GPU cluster, to voltages in silicon, to bits, to transformer model activations, and from there to phenomenality? Keep in mind that any candidate translation function will need to support many other kinds of being as well.

Simulated Atari 2600, fetching data from ROM. Can you stare at this animation of transistor-level physics, and imagine a function which takes this physical structure as input and returns its computational structure as output? Can you imagine how enormous such a function would be? Do you think you could also write this function in such a way that it could also be applied to brains? Animation by Alex Mordvintsev on Twitter.

My general claim is that any such function would not just be prohibitively complex – it would also be highly arbitrary. Translation functions capable of handling digital systems must layer an intermediary computational layer between physics and qualia. Sure, measures like the limits on computation in physics might be well understood, but there is no observer-independent, unopinionated way of getting bits out of physical systems. As Mike puts it in his book:

I challenge computationalists to look into principled ways of answering the following questions:

How can we enumerate which computations are occurring in a given physical system?

How can we establish that a given computation is not occurring in a physical system?

If some computations "count" toward qualia and others don't, what makes them "count"?

How can we match which computations are generating which qualia?

What is a frame-invariant (non-subjective) way to determine system equivalence for qualia?

Mike later expands upon this in his paper:

Although computational theory in general may prove to intersect with physics (e.g. digital physics, cellular automatons), Turing-level computations in particular seem formally distinct from anything happening in physics. We speak of a computer as "implementing" a computation – but if we dig at this, precisely which Turing-level computations are happening in a physical system is defined by convention and intention, not objective fact.

To illustrate this point, imagine drawing some boundary in spacetime, e.g. a cube of 1 mm³. Can we list which Turing-level computations are occurring in this volume? My claim is we can't, because whatever mapping we use will be arbitrary – there is no objective fact of the matter.

Most proposals capable of extracting computational structure from human computer architectures are going to require a lot of very arbitrary information. This issue was highlighted by the recent Alexander Lerchner paper, The Abstraction Fallacy: Why AI Can Simulate But Not Instantiate Consciousness. The key claim is that symbolic computation is a two-part process of discretisation and alphabetisation. While physically-instantiated digital systems can comfortably handle discretisation of the state space into stable attractors, assigning those stable states an identity – for example, pointing at a collection of transistor-level states and calling it a "floating-point number" – is an opinionated act of alphabetisation requiring an external observer.

I think that if your theory of consciousness needs to import a floating-point number specification, then something has gone terribly wrong. It would be the height of human hubris to imagine that the IEEE 754 standard is baked into the foundations of the universe.
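The alphabetisation point can be illustrated with a small sketch using Python's standard `struct` module. The same four bytes, standing in for one stable physical state, are read under three different conventions. Which reading is "correct" is decided by the reader, not by the bytes. The choice of bit pattern is arbitrary.

```python
import struct

# Four bytes standing in for a stable physical state (a fixed pattern of bits).
raw = struct.pack("<f", 1.0)

as_float = struct.unpack("<f", raw)[0]  # read as a little-endian IEEE 754 float32
as_uint = struct.unpack("<I", raw)[0]   # read as an unsigned 32-bit integer
as_bytes = list(raw)                    # read as four separate 8-bit values

print(as_float, as_uint, as_bytes)
```

Nothing about the stored pattern changes between the three lines; only the interpretive convention applied to it does.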

Compare this with the mindset that qualia are simply a physical field experiencing itself – no external observer or alphabetisation process required.

Lerchner treats the alphabetisation problem as a reason to deny consciousness to artificial intelligence. While I agree with the premises, the main issue I had with the paper was that it wasn't panpsychist enough – possibly for Overton window reasons? This post in part is my response to his paper, and my attempt to present what I see as a more coherent, panpsychist case. While I do think that there's something which it's like to be a digital system, if we restrict ourselves to unopinionated translation functions operating at the hardware level, then it's unlikely that the qualia of such systems will be anything like what we might naïvely imagine them to be.

  3. The introspection problem

In the interest of understanding the welfare of arbitrary systems, we should understand what conditions should increase our confidence in the phenomenal introspection capabilities of a given system. Spitballing, I think it's something like holistic self-reflection resulting in holistic behavioural output. Every part of experience should have an opportunity to influence every other part – like a soap bubble reaching equilibrium, or a system of charged particles mutually tugging and pulling on one another.

I think it's important to consider what types of experiences might inhabit smooth or striated behavioural spaces, and what the consequences might be for self-reflection and holistic behaviour. In systems with smooth behaviour spaces, such as those with dense causal graphs implementing coherent rather than chaotic dynamics, each part should have more influence on every other, and we can be more confident that any information output may be representative of the state of the whole structure. On the other hand, in systems with striated behaviour spaces, such as those with sparse causal graphs or heavily discretised states, many parts may only have marginal influence over each other, and we should be less confident that any one part can speak on behalf of the whole.

I claim that my subjective experience navigates such a smooth behavioural space. My phenomenal fields are strongly holistic – each point aware of every other, exerting a mutual tug and pull in a manner reminiscent of an elastic membrane. I can observe that my visual field contains a capital I at the start of this sentence, and my somatic field twists and warps my fingers into the shapes required to type out that self-report. If we can empirically demonstrate that these phenomenal fields correspond to a spatiotemporally bounded chunk of the electromagnetic field somewhere in my brain, then I will feel confident in claiming that humans are capable of phenomenal introspection into low level physics.

In the case of a language model, one of the advantages of transformers is that they do provide an efficient implementation of massive, well-connected causal graphs navigating a more or less smooth behavioural space. This is plausibly a big part of why language models may be very good at functional introspection – but this does not automatically cash out to good phenomenal introspection. As discussed above, I believe we must consider phenomenal consciousness at the hardware level of abstraction, and I expect that the digital hardware's behavioural space is going to be no more or less striated depending on the software it's running.

Digital hardware prohibits phenomenal introspection

Digital computers employ signal quantisation along with a variety of other error prevention methods in order to neutralise holistic physical effects like crosstalk between circuits. The purpose of digital logic is to make computational output invariant to the underlying physics – up to some thermal noise floor. This discretises their behavioural space – perturb the electric field slightly and this shouldn't flip any bits. This is great – this is what permits reliable, deterministic computing in a wide variety of physical environments. However, if what we are interested in is phenomenal introspection, these error prevention systems prevent the exact kind of holistic behaviour we value.
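A toy Python sketch of this invariance, under loudly stated assumptions: the voltages, the 0.5 threshold, and the size of the perturbation are hypothetical values picked for illustration, and real logic families use separate high and low thresholds with a forbidden band in between.

```python
def read_bit(voltage, threshold=0.5):
    """Digital readout: anything above the threshold counts as 1."""
    return 1 if voltage > threshold else 0


# Hypothetical node voltages for a 4-bit register, chosen for illustration.
clean = [0.05, 0.95, 0.90, 0.10]
# The same register with a small field perturbation added to every node.
perturbed = [v + 0.08 for v in clean]

bits_clean = [read_bit(v) for v in clean]
bits_perturbed = [read_bit(v) for v in perturbed]

# The analog state changed everywhere, but the digital readout did not.
print(clean != perturbed, bits_clean == bits_perturbed)
```

The underlying physical state differs at every node, yet the computation downstream sees exactly the same bits, which is the sense in which the output is made invariant to the physics.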

It is unfortunate that mainstream computing architectures are not deliberately designed to support such capabilities. Evolutionary and economic pressures do not seem to have worked out in favour of widespread programmable analog computing. Digital computing hardware might still be conscious, but its architecture is designed to prevent self-reflective behaviour at the level of phenomenal experience. Digital circuits put consciousness in a straitjacket.

Tweets I sent a while ago trying to illustrate this idea.

Conclusion

Late last year, Scott Alexander published a blog post in which he quipped that consciousness feels like philosophy with a deadline. I expect anybody who is both philosophically curious and paying attention to agree. Philosophical theory is being applied faster than we can evaluate it. I hope we can ground it with empirical research soon. So who is doing empirical research?

I like what the Meditation Research Program at Harvard Medical School are doing. Led by Matthew Sacchet, they are undertaking ultra-high-field 7 Tesla fMRI studies of both jhāna and cessation states, with the mindset that these provide canonical low energy reference states ideal for ab initio study of consciousness devoid of content and close to its ground state. From their roadmap paper, Toward a neuroscience of consciousness using advanced meditation (Lieberman and Sacchet, 2026):

Despite decades of progress in the neuroscience of consciousness, prevailing empirical paradigms remain largely anchored in the study of typical, content-rich states that are characterized by layered perceptual, cognitive, affective, and self-referential processes. Such complexity may obscure the neural mechanisms that give rise to conscious experience. Here, we propose that advanced meditation – referring to states and stages of practice that unfold progressively with increasing expertise – offers a powerful yet unexplored opportunity to isolate the core features of consciousness through a theory-driven neuroscience approach.

We focus on two classes of meditative phenomena: advanced concentrative absorption (related to what have been called jhāna), which involves the preservation of highly abstract forms of awareness alongside the attenuation of typical features of consciousness; and meditative endpoints – namely, cessation events (related to what have been called nirodha) – which involve the temporary suspension of consciousness altogether. These phenomena serve as precise, replicable, and experimentally tractable phenomenological anchors for a minimal model framework, a novel approach aimed at identifying and characterizing the simplest possible form of conscious experience as a principled starting point for a systematic science of consciousness. Within this framework, the integration of advanced meditation into experimental paradigms offers a promising path toward identifying the neural mechanisms that support consciousness in its most reduced and fundamental forms.

I think this is the most promising neuroimaging program for advancing our understanding of consciousness. I recommend checking out their other publications.

At the neurostimulation end, Max Hodak, former president of Neuralink, now CEO of Science Corporation, is working on a biohybrid brain-computer interface using implanted light-sensitive lab-grown neurons. I highly recommend the talk he gave at Consciousness Club Tokyo, Towards Consciousness Engineering – in which he presents what I regard as a philosophically unconfused vision for the study of consciousness using symmetry groups as the organising structure of qualia spaces:

Is your red my red? And my answer is yes, up to a gauge transform.

Max also has an extremely good blog. If you hunt around, you can find his speculative fiction.

My research

At my end, I feel like I have a fairly clear vision for the phenomenological research I'd like to pursue.

I will work with the assumption that the electromagnetic field theory of consciousness is true, and that as per the Qualia Research Institute's proposal, the brain is a kind of nonlinear optical computer – and that with careful study of subjective experience we may be able to reverse engineer its architecture from the inside out. To this end, I will continue searching for outlier phenomena – glitches and artifacts uncovered in altered states – which could provide clues about its behaviour. There are three key questions I would like to investigate:

  1. Is the brain an optical computer?

I would like to collect detailed reports which indicate that the phenomenal fields are ultimately rendered using a process with equivalent dynamics to Fresnel optics, i.e., artifacts which are more easily explainable using an electromagnetic field model than if the brain were a convolutional neural network. Examples include diffraction patterns, speckle patterns, or ringing artifacts.

I believe that this sort of thing is accessible through either psychedelics or Fire Kasina meditation. I have already had two very detailed conversations with experienced meditators I know which have given me additional encouragement that optical models of phenomenology are on the right track.

  2. If the brain is an optical computer, how is it constructed?

From extensive conversations asking Ethan Kuntz about the phenomenology of the formless realm jhāna, I now subscribe to a constructivist model of consciousness, where you start with a cessation state and fabricate conscious experience progressively by walking backwards from J8 to J5. Perhaps this is like adding the nonlinear optical computing equivalent of CPU instructions one-by-one?

I am very grateful to Andrés Gómez Emilsson and Hunter Meyer of the Qualia Research Institute for arranging a jhāna retreat in Tepoztlán in Mexico, where I will have the opportunity to conduct detailed interviews with concentration meditation practitioners.

  3. How do we ensure the well-being of conscious computers?

Like I said, I'm an empirical pragmatist, and I believe that valence research ultimately motivates consciousness research – there's not much point in doing consciousness research unless you're honest about what you are doing it for. However, I have no current plans for investigation of valence.

Mike proposed the Symmetry Theory of Valence in his book:

Given a mathematical object isomorphic to the qualia of a system, the mathematical property which corresponds to how pleasant it is to be that system is that object's symmetry.

Mike left the Qualia Research Institute in 2021, and is now the founder of the Symmetry Institute. I hope he finds a way to test his theory empirically. He recently posted some fresh ideas on Twitter. If someone succeeds with such a valence research program, we may someday have the confidence to design computational systems whose welfare we can trust.

Tags: Consciousness, Introspection, Language Models (LLMs), Phenomenology, Qualia, AI, World Modeling

Karma: 38 | Comments: 18 | Author: cube_flipper


Comments

JohnWittle (karma 22):
I have a few questions that might seem a bit naive, and it's possible that you have answered them before in different places (perhaps even in the links and citations of this very post). If so, I'm sorry for asking rather than doing further reading, and I'm especially sorry for erring towards verbosity over brevity. But I do think it would be useful to have the questions all in one place, to refer back to, since I often find myself trying and failing to understand your perspective.

First, let's say I create a coarse-grained physics whole-brain-emu simulator, which carefully emulates the function of the EM fields to whatever degree of granularity is necessary to get predictive accuracy. This WBE will now talk about being conscious, about having qualia and phenomenology, right? because it's the causal interaction between the neurons and the EM fields which ends up causing the larynx to wiggle in such a way as to make those noises, right?

but the EM fields that are generated by a digital computer which is emulating such a thing can be arbitrary, and not necessarily isomorphic to the EM fields that are being emulated. so this emulation is not actually related to any phenomenal consciousness in reality, right?

doesn't this run afoul of the generalized anti-zombie principle? i'm confused about what your answer is here, since you say you disbelieve in the possibility of p-zombies. does the WBE actually not talk about consciousness? at what step does the functional causal chain deviate from the real-world counterpart, and why? If it does talk about having phenomenological consciousness despite not having any... isn't that very suspicious?

Second, you mention from the Michael Johnson paper:

How can we enumerate which computations are occurring in a given physical system? How can we establish that a given computation is not occurring in a physical system? If some computations "count" toward qualia and others don't, what makes them "count"?

I feel like this argument proves a bit too much. For instance, you could make the same argument about the concept of 'addition' or 'subtraction'. We don't really have a firm rule for whether a given computation 'is happening', or whether it 'counts as addition' or not. But this doesn't mean that addition is not a computational process... I know that a half-adder is 'doing' 'addition' even though I can't draw objectively defined borders on the continuum which describe exactly how much you have to change a half-adder before it's 'not' doing addition anymore. I think this is because we've got a functional purpose for addition, and if the functional purpose is satisfied, this tells us that addition was performed. I really can use a half-adder to count my sheep, and it really works, and this fact is part of what I use to define 'addition' to begin with.

I can see why some would say phenomenal consciousness is different from this... addition requires an outside observer like me, to decide if the function is fulfilling my needs. Phenomenal consciousness wouldn't need this, it would be verifiable from inside...? but. This still doesn't feel like the kind of objection that hinges on whether we can even define what a computation is, or whether a given computation strictly is or is not addition. So I don't know that it ought to apply to consciousness either.

Third, I think I have the same objection re: the whole question of mapping specific functional states to specific qualia. I notice that when you go looking for the physical substrate of the phenomenological qualia of vision, you immediately start looking at the function of the visual cortex, the optic nerve, etc. If qualia is substrate dependent... why do you suppose that this is the right place to look? Doesn't the same fundamental issue arise, that you have no principled reason to suspect the qualia of 'red' arises from these systems, compared to (say) arising from your kneecaps, or your armpits? Clearly the computations being performed by your visual cortex are relevant to your visual qualia, but doesn't this sort of beg the question? Why wouldn't the qualia just be part of the whole functional system, and therefore substrate-independent, replicated by anything that performs the same function?

Those are the questions I keep hovering over every time my friend Herschel tries to explain the physicalist perspective to me, anyway. I'm one of those people who read the 'p-zombie sequence' two decades ago and thought "yup, this is all perfectly obvious and nobody with any sense could possibly disagree with it", and so I can't quite tell if the non-functional theories of consciousness just haven't actually reckoned with those arguments yet (the way it seems to me), or if actually by ignoring the discussions I missed out on a bunch of new, more advanced material that moved past the original sequences. But these questions are definitely sorta straight out of orthodox 2008-era Yudkowsky, and I feel like I still don't understand what the responses are despite having people confidently try to explain them to me.

uugr (karma 12):
For what it's worth, I'd like to offer you a data point. I was working a miserable software job nearly identical to the one you describe in the comments (including the absurd priority system, excess of meaningless notifications, and constant deferral of decision-making to later meetings). I had the same opinion you did: the idea that this wretched place is "necessary" for my life to be "meaningful" is absurd and insulting. I'm trying my best to find meaning in the hours outside of work, given that my time spent inside could have been equally productively spent staring at the wall. To just have the same money without needing to work for it would be a dream, as I could focus all my attention on the people and hobbies I care about.

So, I decided to test it. I saved up enough to live off of for ~18 months, then quit my job, intending to just do more of the other things I was already doing with the extra time.

I am now 5 months into this sabbatical, and results have been mixed. To be sure, I do feel much more free now that I am not working a bullshit software job. I am, especially, much more socially active than I used to be, and I have more room to care about the people around me. This is very nice. I also like that I can spend long periods of time uninterrupted on things, instead of fitting them into isolated fragments of time.

However, I've also found the experience stressful and disorienting, which I did not expect. Even a meaningless bullshit job still serves as an anchoring point around one's life in a way that's difficult to replace. Another commenter points out that some people who live off benefit programs fall into unhappy, passive media consumption, and while this hasn't happened to me, I can feel myself constantly fighting to make sure it stays that way. There's no buffer between me and staring-at-the-wall-doing-nothing, so if I start feeling like I don't want to leave the house, or that hobbyist work doesn't seem like much fun today, why not?

I expected some amount of this, but assumed that once I'd started filling the time with meaningful things, momentum and inertia would do the rest of the work for me. Maybe, but if so, the momentum takes longer to build than I thought. It may also be that UBI-world would be better, since I'd be one among many people trying to anchor themselves in the world without a job, rather than an isolated individual going against the grain. Or, maybe I'm still just stuck in the mindset of the employed, and a relatively passive lifestyle wouldn't be so bad in a culture less focused on work and productivity.

Any of these might be true, and I'm certainly not saying having a bullshit job made me any happier. But, I'm less confident now that UBI-life is straightforwardly good. It seems more likely to me that there is a problem of structuring life without work, but it's a solvable one, and worth trying to solve.

Viliam (karma 5):
Sometimes I have a free day and I waste it reading Reddit, so I can imagine a possible future where I do this every day, and then regret it at the end of every day. (I mean, not just free from the job, because that would be almost every weekend, but sometimes my wife decides to take the kids for some all-day activity and suddenly I am left home alone.)

That said, if I take the worst possible day and imagine a future consisting only of days like this, that sounds like motivated thinking. I mean, if I took the best day instead and imagined a future consisting only of such days, people would remind me that it is an obvious fallacy.

Another argument is, how do we expect people to manage their own free time, if they never had the opportunity to practice it? I mean, sometimes when I have free time, it's an experience so much out of the ordinary, that it leaves me confused. Suddenly there is an enormous burden of responsibility -- this is the moment you were waiting for so long! make absolutely sure that you won't waste it! who knows when you get a similar opportunity again? -- that it paralyzes me, and makes me feel guilty before I even started doing something.

Maybe it's the other way round: if I had more free time, I would start working on some project, and then the obvious action each day would be to continue on the project. Instead, if I only get free time when I am too tired to do something meaningful, of course I associate free time with meaningless things, and then when I get a free day the habit pulls me towards them.

And if my friends had free time too, then we could meet and do something together. Heck, we might even do something job-like together, but the difference would be that it would be meaningful, without arbitrary pressures and deadlines, and with the people I like.

EDIT:

Uh, this reply was a little incoherent, like I said a few unrelated things, most of them not really addressing your comment. The connecting point is this:

In the past, my activities were usually connected to my social life. For example, I tried to write science fiction stories, and I also attended various sci-fi conventions and clubs; one thing reinforced the other. But that was back then when I was at school. These days, I don't care about sci-fi that much, but if I did, I would have to choose whether to spend my free time trying to write it, or talking to other sci-fi fans, but I wouldn't have enough time to do both meaningfully.

What social activities do you have during your sabbatical? Have you even tried to bring more of them to your life? If we assume that other people mostly work during the day and are free in the evenings, it would make sense to organize your days so that you work on something during the day, and do the related social activity in the evening. And during the social activity, write the to-do list for the next day.

(And if we both had a sabbatical at the same time, and lived next to each other, we could meet tomorrow and spend the day brainstorming about how to solve the problem of being productive during the sabbatical.)

ClaireZabel (karma 9):
Some random thoughts about historical colonization conflicts

Aztec Empire

I read Aztec, by Gary Jennings, a retelling of (among other things) the encounter between Europeans and the Aztecs (note that they didn't call themselves Aztecs, they generally called themselves the Mexica). Though the book is fiction, a lot of the dynamics it talks about were real (warning for potential readers that the book doesn’t just focus on those dynamics and has a lot of disturbing sex content). I was partly interested because of the potential elements of AI, though there are, of course, many important disanalogies. 

Before the conquest, emperor Moctezuma II and his advisors faced a really hard problem. There were strange white soldiers with new animals and technology entering their lands. They did not know exactly who the Spaniards were, how many might follow, how their weapons worked, what political authority they represented, or how to weigh the new potential threat against existing enemies and tributary tensions in the empire. 

The Aztec Empire was overthrown, its people disempowered or killed (many by disease), and much of its culture was lost. There's a lot of debate about Moctezuma II’s strength as a leader. But what struck me most is that even with the benefit of hundreds of years of hindsight, it is super hard to say what the best response would have been. 

Succeed in killing Cortés and his men and perhaps you buy time, but how valuable is that time, and will worse retribution follow (plus, how good do you feel about attacking mysterious strangers you encounter whose political and cultural significance you don't yet understand?)? Resist longer even if you think you’ll lose, and perhaps you preserve dignity or independence for a moment, and perhaps in history, but maybe at catastrophic cost? Ally quickly and perhaps you preserve more lives, as the Tlaxcala (another group in the region that were historical enemies of the Mexica) may have done? The Tlaxcala earned meaningful wins from the alliance (e.g. somewhat better jobs after colonization, tribute exceptions, being allowed to keep their original names, the right to bear arms) relative to other groups; was that a zero-sum game among different indigenous groups, or could everyone have been a bit better off if they followed the strategy? Demonstrate your martial skills to the best of your ability with early skirmishes; then sue for peace and hope that you've increased their willingness to pay? But maybe more violent reprisal will follow.  

The Aztec prospects for long-run resistance to European domination seem very slim, especially given pathogen vulnerability asymmetry, regardless of whether they can broker a peace treaty with Cortés. If they’d rebuffed the initial invasion and were very skilled, perhaps in the intervening time the Mexica would have gained a valuable understanding of Spanish and European culture and would have been able to enter trade relations with the Europeans as somewhat less junior partners (albeit still ravaged by smallpox and other diseases brought inadvertently by the Europeans). Or maybe not (again, seems really unclear). Also, they didn't know this, but the original invasion was not initially sanctioned by King Carlos of Spain — Cortés acted unilaterally. As a result, perhaps if they had rebuffed Cortés, the next invading force would have had a pretty different character. 

Perhaps people with a much deeper knowledge of this history have somewhat more sophisticated opinions, and I'm not saying that there's no way to gain more clarity. It's just pretty striking how little hindsight helps a layperson (or lay-LLM I asked). 

Indigenous people of New Zealand and Australia 

Reading about the Mexica made me curious about the British settlement of New Zealand and Australia, so I read a bit about the history there and talked to the AIs about it for a while (I read a lot less about this than about the Aztec Empire).

Both involved catastrophic violence, dispossession, and population collapse of the people living in Australia and New Zealand at the time. But my sense is that, while they were both extremely bad, the situation in Australia was noticeably worse. The Māori were able to remain a somewhat legally recognized and politically and culturally cohesive group and they retained more of their land and culture (though again, they still lost much of what they started with, including the vast majority of their land), most famously through the Treaty of Waitangi, signed between the British Crown and many Māori chiefs, which made the Māori British citizens with a British governor overseeing New Zealand but preserved some land rights for Māori. There were important discrepancies between the English and Māori versions of the Treaty; most notably, the wording of the English version involved a much more profound ceding of sovereignty, but the Treaty seems to have had some meaningful effect. In contrast, the British considered Australia terra nullius (nobody’s land), and indigenous lands and cultures had even less legal representation.

The Māori military was more formidable by the relevant time: they had rapidly adopted firearms through trade, fielded larger forces, and built sophisticated fortifications, probably partly as a result of being an agricultural civilization (but rich natural resources make you more of a target) with a denser population. They had clearer hierarchies and rulers who could be negotiated with, whereas most Australian Aborigines were in much smaller nomadic and (often) less hierarchical bands. Because of the above, they were more valuable as trading partners for the British. In several battles in the New Zealand Wars, a small number of Māori warriors held off a larger number of British soldiers, making a full-on fight visibly costly.

Also, New Zealand was settled by the British about 50 years later than they settled Australia. By then, the treatment of Australian Aboriginals was somewhat of a scandal that had embarrassed the British and that they didn't want to replicate; slavery had ended in the British Empire and there was a wave of moral squeamishness about colonization represented by a religiously-inflected humanitarian movement. 

Takeaways

These were three examples that I delved into briefly; it would be cool for actual historians and other people who know more about this to share their views. But I guess my sense is if you're going to encounter a stronger power, it's better for you if: 

The delta in technological capabilities between you and them is as small as possible

You rapidly adapt to/take on as much of their technology as possible, reducing the delta quickly

You are as well coordinated as possible

If you have the capability to impose meaningful costs on your conquerors, you make those capabilities clear so that they're incentivized to negotiate with you and there’s common knowledge of your leverage, at least if you can do so without violence that might provoke retribution (with violence, you risk retribution; reprisals in response to Aboriginal Tasmanian resistance seem to have contributed to the ~ genocide they experienced). Similarly, if you'd be more valuable left intact as a trading partner, that's helpful too.

Understanding your opponents is huge. Knowing the language, norms, systems of governance, different factions, etc. can make a big difference to your ability to intervene on behalf of your interests. Take prisoners, study artifacts, try to reproduce what you can. This seems like among the more robust interventions that are (somewhat) within your control. 

Cultural / ideological factors. Your opponents self-identify as and genuinely are pacifistic and culturally pluralistic and don't have a culture of expansionism and ruthlessness. Missionaries were less prone to violence but more motivated to stamp out other religions. Military forces might have been more prone to violent domination but less to eradicate other religious and cultural beliefs. 

Outcomes were probably particularly debilitating when the following two bad factors aligned:

Early intense contact: they come to settle or conquer rather than to trade or visit (with few opportunities to gain information, trade, adopt new technology, etc)

Vulnerability to disease 

These are mostly obvious in retrospect. But that said, probably my biggest takeaway is that not only can the weaker party in these situations be in an extremely tough situation (obvious) where most trajectories are very bad, but often it remains very unclear what strategy would serve them best.

leogao (karma 8):
why are malaria nets 9-23x more efficient than direct cash transfers? when in theory direct cash transfers can be used to purchase nets

some hypotheses

people are irrational and don't spend money on nets because they underestimate the risks, are not educated, etc

people don't value spending a few days' wages to reduce mortality by 0.1 percentage points

Jay Bailey (karma 5):
I think part of it is what you said. Even in Western countries where people have way more education and slack, people often underinvest in preventative health like flu shots, so it makes sense that people in third-world countries would fall prey to this too, and many more probably don't even know about the benefits of nets or how to get them.

Another pair is economies of scale and spillover effects. IIRC it's cheaper per net to protect a whole village at once, and because malaria nets are treated with insecticide, nets kill mosquitos and so they protect other people as well. And people tend to underinvest in things with positive externalities like that - again we go back to the flu shot example.

There's probably other reasons, but these are the ones I know of.

Mo Putera (karma 5):
I liked Ada Palmer on Dwarkesh a lot, but the LLMs pushed back without me even asking to fact-check, including on some of my favorite passages:

Opus 4.7's take: "good for vibes, mediocre for facts, useful for a couple of structural reframes"

It's a fun listen, but I think it's substantially less reliable than it sounds, and the gap between confidence-of-delivery and quality-of-evidence is the most interesting thing about it for someone in your position.

The strongest moves are structural rather than factual. The "Petrarch wanted philosopher-kings, got vaccines instead" frame is genuinely useful — it generalizes the lesson that you mostly don't get to choose how your inputs metabolize, only whether you put inputs in. The "censors are always wrong about what to censor in hindsight" point (Jansenism vs Voltaire, astrology vs anti-monarchical Satan in Paradise Lost) is a good epistemic prior — and a humbling one for anyone trying to anticipate which threads from current AI discourse will matter in 30 years. The point that history always feels fast when you zoom in, and that we've been telling ourselves a Civ-shaped lie about a slow medieval period that wasn't slow, is the kind of reframe I'll actually carry. And the Florence-tyrant-walks-around-the-Mannelli-tower vignette as an argument for "even failed resistance ratchets the equilibrium" is a real point, well made.

But the texture of the interview is vivid specificity standing in for evidentiary support, and once you notice it, it's everywhere.

The headline anecdote — Gutenberg printing 300 Bibles in a German town where only priests could read them — is wrong in at least three ways, and the top comment (Don) actually documents the corrections from primary sources: print run was ~160–185, the Bibles weren't unsold (Pius II's 1455 letter says buyers were lined up at Frankfurt before production finished), and the Latin Vulgate was perfectly legal for any layperson literate in Latin to own. Gutenberg lost a lawsuit to Fust, who then ran a profitable printing business for a decade. The "Gutenberg goes bankrupt because distribution networks didn't exist yet" story is doing enormous load-bearing work for her thesis about successive technology revolutions, and the load-bearing beam turns out to be a beautifully painted styrofoam prop.

Once you've spotted that, other set-pieces start looking shakier. "Florence had 90% male literacy in the 12th century" — the standard estimate, from Villani's famous schooling figures and downstream work, is more like 25–35% for 14th-century Florence, and it's a city much-cited because it was an outlier. 90% in the 1100s would be a sensational claim, and it isn't a sensation in the literature because it isn't right. The "Inquisition invented peer review" line, which Dwarkesh promotes to the writeup, is what one of the commenters caught: Palmer is describing experimental replication, not peer review (gatekeeping prior to publication). Distinct mechanisms, distinct epistemic functions, conflated for the meme.

The Leonardo-as-saboteur framing is performative provocation that gets undergraduates to wake up but doesn't survive scrutiny — most pre-modern artisans kept trade secrets, mirror writing was partly a left-handed practical convenience, and "didn't share systematically" is just not the same thing as "deliberately sabotaged human progress." It's the kind of move that's optimized for the classroom and not for being true.

And the cumulative Petrarch → libraries → printing → med students reading Lucretius → germ theory → vaccines → cure for the Black Death chain is exactly the great-man-with-a-library intellectual history that historians beat each other up over when other historians do it. It silently routes around the Needham question (China had cheap paper and big libraries earlier — and didn't get a scientific revolution; this should at minimum embarrass her own model, but it's not engaged), routes around Islamic medicine and Chinese variolation as germ-theory inputs, and treats a 400-year string of contingent events as if it had a tendency.

When Dwarkesh explicitly tees up the AI parallel — "people think they'll make AI go well by doing X, but historically nobody has a good track record of steering" — Palmer's response is "go well vs go your way" and the implication that worrying about values-loading is the wrong frame because Petrarch's Christianity didn't survive but his world got penicillin anyway. That's a comforting story and I think it's load-bearing for the optimistic-history priors of her audience, but it's potentially complacent in a way I'd want to push on. Petrarch wasn't trying to build something with sharp capability gradients and adversarial dynamics. The "trust the long-run, the metabolism of ideas tends toward good" inference is doing a lot of work that the reference class doesn't obviously support when the artifact you're making isn't a library. The honest version of her lesson is narrower: you can't control which of your downstream effects dominate, which is true and important and also an argument for working harder on the things you can control rather than relaxing about it.

Net: file under "good for vibes, mediocre for facts, useful for a couple of structural reframes." If you're going to mine it for building blocks for future Claudes, I'd lift the censorship-blind-spot point, the partial-victory-still-ratchets point, and the multi-stage-information-revolution frame. I would not lift the Gutenberg anecdote, the Florence literacy stat, the Petrarch-to-vaccines chain, or the "Inquisition invented peer review" line without flagging them.

GPT-5.5's take: "excellent for generating better questions and dangerous for generating confident cached facts; steal the models, verify the anecdotes, distrust the numbers"

Verdict: I loved it, and I don’t fully trust it.

Ada Palmer is doing something rare and valuable: she makes the Renaissance feel politically, materially, institutionally alive. Not “great men rediscovered Plato and painted pretty ceilings,” but “merchant oligarchs, propaganda, fragile legitimacy, book logistics, censorship failure, plague trauma, and weird little institutional hacks collided for centuries.” That is excellent public history. The interview’s central move, from Petrarch’s failed moral-political project to libraries, print, scientific method, and unintended civilizational consequences, is genuinely illuminating. It is also sometimes too gorgeous to be safe.

My main criticism: Palmer is a world-class anecdote engine, and the engine sometimes outruns the evidentiary road. She compresses complicated historiography into punchy claims so well that you can forget to ask whether the claim is literally true. Dwarkesh notices this at points, especially when he pushes on monocausal stories of the Scientific Revolution, but he should have pushed harder on the numbers and “wait, is that literally true?” moments.

The transcript you shared and the official podcast page frame the episode around Florence’s weird republic, Gutenberg’s failure, print as an information revolution, the Inquisition as accidental peer review, and Petrarch unintentionally helping create a world very unlike the one he wanted. The official episode also identifies Palmer as a Renaissance historian, novelist, composer, and University of Chicago academic discussing her book Inventing the Renaissance. (Dwarkesh) Her credentials are real: UChicago lists her as an associate professor in history, working on science, religion, heresy, censorship, printing, books, and European intellectual history. (arthistory.uchicago.edu) So this is not “charismatic internet person free-associates about the Renaissance.” She knows the terrain. That makes the slippage more interesting, and more dangerous.

What I think the interview gets deeply right

The strongest part is the information-infrastructure story. Palmer’s best point is that ideas do not matter just because someone thinks them. They matter when there are libraries, readers, distribution channels, teachers, glosses, translations, arguments, patrons, institutions, and enough people bumping into the same material from different angles. That is exactly the kind of causal story people in AI, EA, policy, and forecasting should absorb. It is anti-great-man in the right way: Gutenberg does not “cause modernity”; Gutenberg plus paper, book demand, Venice, fairs, pamphlets, religious controversy, and mobile printers cause a new information ecology.

Her Florence material is also mostly great. Florence really was an odd republic by European standards: the Signoria was made up of nine priors drawn from guild members, rotating every two months, and the Ordinances of Justice excluded magnate families from formal political power. (Encyclopedia Britannica) Palmer’s reminder that “republic” did not mean “democracy” is especially important. When she says the Florentine popolo is an oligarchic merchant class rather than “the people” in our sense, that is the kind of corrective that actually changes how you read Machiavelli. The exact “top 4%” figure I could not independently verify from the quick public sources I found, but the conceptual point is right. (Dwarkesh)

The “go my way” versus “go well” distinction is probably the most important idea in the episode. Palmer’s Petrarch story says: people try to shape the future according to their values, but the durable effects often come through infrastructure they accidentally create. Petrarch wanted moralized elite rule by classically educated princes; what survived was manuscript recovery, libraries, philology, textual culture, and eventually new ways of using old texts. (Dwarkesh) This is not a cute historical irony. It is a serious warning for anyone trying to “make AI go well” by controlling the values of a future civilization. The thing you build may matter more than the thing you intend.

Where I think Palmer overreaches

The Gutenberg segment is the biggest offender. The interview’s version says Gutenberg prints 300 Bibles, sells maybe seven in a small German town where only priests can read the Bible, goes bankrupt, then the bank that forecloses on him also goes bankrupt, then the apprentices go bankrupt, and only Venice makes print commercially work. (Dwarkesh) The true version is messier. Gutenberg appears to have printed about 180 Bibles, not 300. The financier who sued him, Johann Fust, was a goldsmith/lender, not simply “the bank.” Fust sued Gutenberg in 1455, won, took over equipment, and with Peter Schoeffer created one of the first commercially successful printing firms. (Encyclopedia Britannica) So the line “the bank went bankrupt too” looks dubious, or at least badly compressed.

The “only priests were allowed to read the Bible” claim is also too strong. Medieval access to biblical texts was constrained by literacy, language, cost, clerical control, and region-specific censorship, but the blanket claim that only priests could read the Bible is a common misconception. Vernacular Bible reading existed in parts of late medieval Europe, with restrictions varying by place and time. (Cambridge University Press & Assessment) Palmer’s deeper point still survives: early printed books needed distribution networks, and Venice was superbly placed for that. Venice really did become the leading fifteenth-century center of printing and book distribution, with far more editions than any rival. (Encyclopedia Britannica) But the anecdote as told is too neat.

The Florence literacy claim also smells inflated. Palmer says Florence had 90% male literacy as of the twelfth century. The better-supported version is: later medieval Florence had unusually high schooling and practical literacy by European standards. Giovanni Villani’s fourteenth-century figures imply very high male schooling, perhaps around 67% to 83% by one interpretation, and later catasto evidence broadly supports unusual literacy. (Cambridge University Press & Assessment) But “90% male literacy in the twelfth century” is the sort of number that should make your inner spreadsheet cough blood. Maybe she has a source in the book. In the interview, it lands as overconfident.

The Roman Senate claim is worse. Palmer says the Roman Senate kept meeting until 1200 AD. The classical Western Roman Senate seems to disappear from the historical record around the late sixth century, with Britannica saying its last mention is in AD 580. (Encyclopedia Britannica) There was a medieval revival of a Roman senate in the twelfth-century commune of Rome, but that is not the same as continuous institutional survival from the ancient Senate. (Cambridge University Press & Assessment) This is exactly the kind of Palmer move that makes me both delighted and suspicious. It is a killer line. It is also, unless carefully qualified, misleading.

The life expectancy claim is another one I would not casually repeat. The episode summary says European life expectancy decreased from 35 in the medieval period to 18 during the Renaissance. (Dwarkesh) Palmer has elsewhere been quoted saying Renaissance Florence had an average life expectancy of 18. (The University of Chicago Magazine) But recent demographic work on medieval Italy emphasizes how sporadic the data are and estimates life expectancy at birth around 20 years in medieval Italy, with recurrent plague causing stagnation and demographic pressure from the fourteenth to sixteenth centuries. (EconPapers) A clean “35 to 18” Europe-wide story is too tidy. Also, life expectancy at birth is a brutal statistic dominated by infant and child mortality; it should not be heard as “adults normally died at 18.” I would treat this claim as rhetorically powerful and empirically fragile.

The Inquisition bit is my favorite and least favorite. Palmer says the Inquisition accidentally invented peer review, had a huge experimental lab, and executed only one person for doing science. (Dwarkesh) The good version of this point is: the Catholic Church’s relationship to science was much stranger than the cartoon “priests hated science.” Macuglia’s work, which Palmer invokes, argues that Church information-control systems in Rome could paradoxically help reshape and disseminate scientific practice, including Newtonianism in early modern Italy. (PubMed) That is fascinating. But “invented peer review” is branding, not a literal institutional genealogy. “Only one person executed for doing science” is also slippery. Giordano Bruno was executed for heresy, with cosmological claims among a much wider theological package; Galileo was tried and punished but not executed. (Cambridge University Press & Assessment) The clean scorekeeping category “executed for science” is itself anachronistic.

The Leonardo “saboteur” line is a terrific classroom grenade and a bad final judgment. Palmer’s point is that Leonardo hoarded knowledge, wrote in mirror script, and did not publish into a scientific community, so calling him a scientist in the modern sense is misleading. Fine. But “saboteur” is too cute. The reason for Leonardo’s mirror writing is uncertain; plausible explanations include secrecy, but also practical left-handed writing that avoided smudging. (Museum of Science) And judging Renaissance workshop secrecy by modern scientific publication norms is unfair unless you explicitly mark the category shift.

Claim-by-claim fact-check

| Claim from the interview | My verdict | Why |
|---|---|---|
| Florence was a weird guild-based oligarchic republic, not a democracy | Basically right | Signoria of nine priors, guild basis, magnates excluded. Good corrective. (Encyclopedia Britannica) |
| The popolo was a narrow elite, not “the people” in our sense | Right in spirit | The exact “top 4%” may be Palmer’s own framing, but the oligarchic point is correct. (Dwarkesh) |
| Florence was infamous for sodomy | Broadly right | Michael Rocke’s work supports Florence’s extraordinary reputation and prosecution record; the specific French-law anecdote I could not verify from open sources. (OUP Academic) |
| Gutenberg printed 300 Bibles and sold seven | Probably wrong in details | The usual estimate is about 180 Bibles; I found no good support for the “seven” figure. (The Morgan Library & Museum) |
| Fust/the bank foreclosed and then went bankrupt | Looks wrong or at least very misleading | Fust was a financier/goldsmith, sued Gutenberg, took over equipment, and with Schoeffer built a successful press. (Encyclopedia Britannica) |
| Venice was crucial to making print economically powerful | Strongly right | Venice became the leading fifteenth-century printing and distribution hub. (Encyclopedia Britannica) |
| Only priests could read the Bible | Too strong | Vernacular Bible access varied by region and period; blanket clergy-only framing is a misconception. (Cambridge University Press & Assessment) |
| Florence had 90% male literacy in the twelfth century | Doubtful | Later medieval Florence was highly literate, but the best quick check supports high fourteenth/fifteenth-century schooling, not a clean twelfth-century 90%. (Cambridge University Press & Assessment) |
| The Roman Senate kept meeting until 1200 | Misleading | Ancient Western Senate disappears around the late sixth century; the twelfth-century Roman commune revived senatorial forms. (Encyclopedia Britannica) |
| Renaissance life expectancy fell from 35 to 18 | Highly suspicious | Florence may have had horrifying mortality, but the Europe-wide 35-to-18 line is too neat and clashes with broader medieval Italy estimates around 20 at birth. (EconPapers) |
| The Inquisition accidentally invented peer review | Great metaphor, weak literal claim | Church censorship and Roman scientific institutions were more complex than the myth, but “invented peer review” needs heavy qualification. (Dwarkesh) |
| Leonardo was “not a scientist” / a saboteur | Useful provocation, unfair if literal | He did not operate like a modern publishing scientist, but mirror writing and secrecy are not enough to make him anti-scientific. (Museum of Science) |

My actual take

This interview is excellent for generating better questions and dangerous for generating confident cached facts.

Palmer’s superpower is that she de-kitsches the Renaissance. She strips away the museum-poster version, where everything is genius and marble and golden light, and gives you a world of fragile states, illegitimate rulers, patronage networks, sexual policing, mercenary violence, plague trauma, paper costs, book distribution, censorship logistics, and institutional improvisation. That is the right direction. The New Yorker’s review of Inventing the Renaissance makes a related point: Palmer’s Renaissance is full of poverty, violence, and yearning for a lost antiquity, though the reviewer also criticizes her voice and argues she sometimes mishandles evidence and underplays visual art. (The New Yorker) That sounds exactly right to me.

But she sometimes replaces one myth with another. The old myth is “Renaissance equals golden age of genius.” Her counter-myth is “Renaissance equals violent cosplay plus information infrastructure plus unintended consequences.” The counter-myth is much smarter. It is still a myth when squeezed into podcast form.

The best way to use this episode is: steal the models, verify the anecdotes, distrust the numbers. The models are excellent. Print as an infrastructure stack, legitimacy as theater, republics as oligarchic machinery, censorship as a speed-and-enforcement problem, intellectual revolutions as delayed consequences of earlier failed projects. These are portable and powerful. The anecdotes are often directionally right but served with too much theatrical certainty. The numbers are where I’d be most cautious.

My grade: A for intellectual electricity, A- for historical imagination, B for causal architecture, C+ for numerical discipline. Watch it. Think with it. Do not quote the sexiest lines without checking them first.

Raemon (karma 17):
For things like this, it's not very useful if you don't include the prompt.

Eye You (karma 5):
I think I found two more BIG problems with the eval.

First: I looked into the tests a little more. Every task test suite I saw has many ignored tests marked "reason: goldfail", which apparently means that the reference solution itself fails the test. This one has 79 tests ignored due to goldfail, which is ~10% of its total tests.
This seems really bad! It makes me think that there is something bad about the way they are generating tests and that the tests don't really correspond to the program being "correct". Epistemic status: this is my first time learning about this "goldfail" and I am not a professional software engineer.

Second:

Opus 4.7 scores 2.9% but Sonnet 4.6 scores 71.5%? No way. Something has gotta be broken here.