Incompleteness and Intelligence

Shreshth Rajan, December 2025.

I've been reading Roger Penrose's Shadows of the Mind alongside David Deutsch's The Beginning of Infinity. They argue almost opposite things about the bounds of knowledge. Penrose uses Gödel's incompleteness theorem to claim human-level AI is impossible. Deutsch argues knowledge is structurally unbounded. They cannot both be right.

Last summer I tried to write an essay working out which of them was wrong. A Harvard philosophy professor read it and patiently told me my reasoning rested on dubious assumptions. He was right. I was conflating distinct problems. This is the narrower second attempt: what does Gödel actually say about AI?

The argument has two parts. The Lucas-Penrose claim that Gödel proves human-level AI impossible has been refuted technically by Solomon Feferman, philosophically by David Chalmers, and empirically by every working AI system over the last decade. But the intuition under it is partly correct. Properly aimed, it points at something live: the deepest limit on AI is not on what it can know about the world, but on what it can know about itself. That limit is what makes alignment structurally hard, and it has technical fingerprints in current research.

What Gödel Proved

The first incompleteness theorem says: any consistent, effectively axiomatized formal system rich enough to express elementary arithmetic contains true statements it cannot prove from within itself. The second says: such a system cannot prove its own consistency.

The conditions matter. The theorems apply only to formal systems with three properties. They have an effectively enumerable axiom set. Their inference rules are mechanically checkable. They are powerful enough to encode arithmetic. Anything failing one of these conditions is out of scope.
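Stated a little more formally, as a sketch: the symbols below ($T$, Robinson arithmetic $\mathsf{Q}$, Peano arithmetic $\mathsf{PA}$, $G_T$, $\mathrm{Con}(T)$) are standard textbook notation rather than anything defined in this essay. The first theorem is given in Rosser's strengthened form, which needs only consistency. The block assumes the amssymb package for `\nvdash` and the corner quotes.

```latex
% First incompleteness theorem (Goedel-Rosser form).
% Assumptions: T is consistent, its axioms are recursively enumerable,
% and T extends Robinson arithmetic Q.
\[
  \exists\, G_T \;:\quad T \nvdash G_T \quad\text{and}\quad T \nvdash \neg G_T .
\]
% Second incompleteness theorem.
% Assumptions: as above, with T extending Peano arithmetic PA,
% so that the standard provability predicate Prov_T is well behaved.
\[
  T \nvdash \mathrm{Con}(T), \qquad
  \mathrm{Con}(T) \;:\equiv\; \neg\,\mathrm{Prov}_T(\ulcorner 0 = 1 \urcorner).
\]
```

The "true but unprovable" reading comes from the fact that the undecided sentence can be chosen so that it holds in the standard natural numbers. Everything is indexed by $T$: change the system and the undecided sentence changes with it.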

What Gödel did not prove:

That human reasoning is bounded.
That there are truths no system can ever know.
That mathematics is "incomplete" in any metaphysical sense.

He showed something more specific. A particular kind of formal machinery, when applied to itself, runs into a structural ceiling. The ceiling is a feature of the formalism, not a feature of truth.

The Anti-AI Argument and Its Problems

The Lucas-Penrose argument has two forms. The original, in Lucas (1961), runs: any AI is a formal system, formal systems are subject to Gödel's theorem, so for every AI there is a true Gödel sentence it cannot prove. Humans can see that the sentence is true. Therefore humans are not formal systems. Therefore human-level AI is impossible.

Penrose's stronger version, in Shadows of the Mind (1994), is more careful. It does not require humans to recognize specific Gödel sentences. It requires only that mathematicians know whatever formal system they reason within is sound. Soundness is stronger than consistency: a sound system proves only true theorems. If mathematicians know F is sound, then by Gödel's theorem they can construct a true sentence F cannot prove, and the rest follows.

This is the version worth taking seriously. It still fails.

First, as Solomon Feferman pointed out (1996), Penrose equivocates between two notions of soundness. The version of Gödel's theorem he needs requires only Π₁ soundness, which for arithmetic is equivalent to consistency. The argument leans on the more demanding notion of global soundness, which is much harder to claim mathematicians know.

Second, as David Chalmers identified, the central premise is that we know we are sound. We don't. Mathematics has discovered formal systems mathematicians believed were sound and that turned out to be inconsistent. Frege's Grundgesetze. Naive set theory. Our access to our own soundness is no better than our access to the soundness of any system we might be.

Third, and most importantly for AI specifically, modern language models are not formal systems in Gödel's sense at all. A frontier model is roughly 10¹¹ to 10¹³ floating-point parameters fit by gradient descent against a token-prediction objective. There is no discrete axiom set. Its "inferences" are forward passes through a learned function approximator, not chains of deduction over logical formulas. The same input can produce different outputs depending on temperature, sampling, and even nondeterministic attention computation across hardware. Even if you wanted to construct a Gödel sentence for any current frontier model, you couldn't, because there is no specification of the model in the form Gödel's theorem requires. The argument doesn't fail by being refuted. It fails by not applying.
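The point about identical inputs producing different outputs can be made concrete with a minimal sketch. The four logits are made-up numbers, the function names are hypothetical, and nothing here is taken from a real model. It shows temperature sampling drawing different tokens from the same scores, and floating-point addition giving different results under different accumulation orders, which is one reason the same computation can differ across hardware.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical logits for a four-token vocabulary (illustrative only).
LOGITS = [2.0, 1.5, 0.3, -1.0]

# Same input, 200 independent draws at temperature 1.0.
rng = random.Random(0)
draws = [sample_token(LOGITS, 1.0, rng) for _ in range(200)]
distinct_outputs = set(draws)

# Floating-point addition is not associative, so the order in which
# hardware accumulates a sum can change the low-order bits of a result.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
```

Here `distinct_outputs` contains more than one token index even though `LOGITS` never changes, and `left_to_right` differs from `right_to_left` in its last bits.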

What Turing Showed

Turing's proof that the halting problem is undecidable shows that no general algorithm can decide, for every program-input pair, whether the program halts. The scope is "for every," not "for any specific case." We decide halting for specific programs constantly. (Termination checkers in proof assistants do it. Optimizing compilers do it for simple loops. Static analysis tools do it across industrial codebases.) We just cannot have a single procedure that handles every case.
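The gap between "every case" and "this case" can be sketched in a few lines. This is a toy model, not a real analysis tool: programs are Python generators with one `yield` per step, and `halts_within`, `collatz`, and `spin` are hypothetical names chosen for illustration. The checker can confirm halting for specific inputs, but when its budget runs out it can only report that it does not know.

```python
from itertools import islice


def collatz(n):
    """A toy program: one `yield` per step; it halts when n reaches 1."""
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        yield


def spin(_):
    """A toy program that never halts."""
    while True:
        yield


def halts_within(program, arg, budget):
    """Return True if program(arg) finishes within `budget` steps.

    Returns None (not False) when the budget runs out: running longer
    can confirm halting, but no finite budget can confirm non-halting.
    """
    steps = program(arg)
    # Consume at most `budget` steps of the program.
    for _ in islice(steps, budget):
        pass
    # If the program has finished, asking for another step gives the sentinel.
    sentinel = object()
    return True if next(steps, sentinel) is sentinel else None


finished = halts_within(collatz, 27, 1000)   # a specific case we can decide
too_short = halts_within(collatz, 27, 10)    # budget too small: unknown
unknown = halts_within(spin, 0, 1000)        # never halts: still unknown
```

What Turing rules out is a total function that returns True or False on every program-input pair. Feed such a function a program that asks about itself and then does the opposite, and the function must answer wrongly.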

The same shape applies to Gödel. No single truth lies beyond every possible system, and no single system reaches every truth. Different systems reach different truths. The relevant question for AI isn't "can it know everything," but "what can it not know." The answer is: things uncomputable by any process at all. That's a much narrower limit than the Lucas-Penrose argument needs.

The Interesting Limit

If Gödel-the-theorem doesn't bound AI in the way Lucas and Penrose imagined, what survives of the original intuition?

The deepest result in Gödel's work is the second theorem: a sufficiently rich consistent system cannot prove its own consistency from within. This is a limit on self-reference, not on computation. And it has a soft analog that does apply to AI, at a different layer than Lucas-Penrose assumes.

Every working frontier model has an imperfect self-model, and the technical literature is converging on this. Turpin et al. (2023) showed that chain-of-thought explanations often do not track the actual computation: a model's answer can be driven by one factor while its stated reasoning cites another, a post-hoc rationalization. Hubinger et al. (2024), in the sleeper agents work, demonstrated that models can carry backdoor behaviors that are invisible to self-reports and that survive standard safety training. Arditi et al. (2024) showed that refusal in open-weight chat models is mediated by a single direction in activation space, an internal structure the forward pass cannot itself introspect. None of this is a bug. It is what happens when systems with billions of parameters try to model themselves: the self-model lives inside the same forward pass that needs to be modeled, and there isn't room.
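To make "a single direction in activation space" concrete, here is a minimal sketch of two operations that this kind of analysis relies on: estimating a direction as a difference of mean activations, and projecting that direction out of a vector. The three-dimensional activations are made-up toy data and the function names are hypothetical. Real activations have thousands of dimensions, and this is not the authors' code.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def difference_of_means_direction(group_a, group_b):
    """Unit vector pointing from the mean of group_b to the mean of group_a."""
    mu_a, mu_b = mean_vector(group_a), mean_vector(group_b)
    return unit([a - b for a, b in zip(mu_a, mu_b)])


def ablate(activation, direction):
    """Remove the component of `activation` along the unit vector `direction`."""
    coeff = dot(activation, direction)
    return [x - coeff * d for x, d in zip(activation, direction)]


# Toy activations (illustrative only): two prompts the model refused,
# two it complied with.
REFUSED = [[2.0, 0.1, 0.5], [2.2, -0.1, 0.4]]
COMPLIED = [[0.1, 0.0, 0.5], [-0.1, 0.2, 0.6]]

direction = difference_of_means_direction(REFUSED, COMPLIED)
edited = ablate(REFUSED[0], direction)
```

After ablation, `edited` has no component along `direction`. The relevant point for self-modeling is that both operations are performed by an outside observer with access to the activations, not by the model reporting on itself.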

This matters because alignment is fundamentally a self-modeling problem. To verify "my behavior matches the values I have been given," a system has to model its own behavior, including how that behavior is produced. To model its own modeling. Fixed-point territory. The Gödelian flavor here isn't a strict theorem; it's the same structural argument applied to a different domain. Yudkowsky and Herreshoff (2013) and the embedded-agency work that followed formalized parts of this as the Löbian obstacle to self-trust. The practical fingerprints are everywhere in alignment research: reward hacking, specification gaming, mesa-optimization, deceptive alignment. All forms of a system "knowing" something its principal cannot verify, because no principal can fully model the system's own self-modeling.
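For reference, the result behind the phrase "Löbian obstacle" can be stated in two lines. This is a sketch in standard provability-logic notation, none of which is defined in this essay: $T$ is assumed to extend Peano arithmetic, and $\Box P$ abbreviates "$T$ proves $P$".

```latex
% Loeb's theorem: T can prove "if P is provable then P is true"
% only for sentences P that it already proves outright.
\[
  \text{if } T \vdash \Box P \rightarrow P \text{, then } T \vdash P .
\]
% Taking P to be a contradiction recovers the second incompleteness theorem:
% a theory that proves its own consistency proves everything.
\[
  T \vdash \neg\Box\bot \;\Longrightarrow\; T \vdash \bot .
\]
```

An agent that wants to trust a successor reasoning in the same system would like to adopt $\Box P \rightarrow P$ for every $P$. Löb's theorem says it can have that only for the $P$ it can already prove itself, which is exactly the case where trust adds nothing.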

The consequence is that alignment cannot be a purely internal property of the model. It requires an external check, by humans or by systems we trust to be outside the loop. This is not a fact about today's engineering, fixable by the next round of training. It is a fact about self-reference, and it will hold for any sufficiently capable AI.

What Actually Bounds AI

The capability ceiling sits in three places. Only the last has anything to do with Gödel, and only by analogy.

Data. Scaling laws (Kaplan et al., 2020; Hoffmann et al., 2022) predict that loss falls as a power law in parameters, data, and compute. They predict diminishing returns, not walls. But the data factor is already a near-term ceiling: Villalobos et al. (2024) estimate that training-grade web text will be exhausted between 2026 and 2032. After that, the marginal training token has to come from synthetic data, RL, multimodal sources, or interaction with the world. None are as cheap as scraping CommonCrawl.
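"Diminishing returns, not walls" can be seen directly in the parametric form fitted by Hoffmann et al. (2022), L(N, D) = E + A/N^α + B/D^β. The sketch below uses the constants reported for that fit, but treat them as illustrative rather than authoritative. The 70-billion-parameter model size and the token budgets are assumptions chosen for the example.

```python
# Parametric loss L(N, D) = E + A / N**alpha + B / D**beta,
# the functional form fitted in Hoffmann et al. (2022).
# Constants are the values reported for that fit; illustrative only.
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28


def predicted_loss(n_params, n_tokens):
    """Predicted loss for a model with n_params trained on n_tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


# Assumed model size, with training data growing tenfold at each step.
N_PARAMS = 70e9
token_budgets = [1e11, 1e12, 1e13, 1e14]

losses = [predicted_loss(N_PARAMS, d) for d in token_budgets]

# Improvement bought by each successive tenfold increase in data.
gains = [before - after for before, after in zip(losses, losses[1:])]
```

Each entry in `gains` is smaller than the one before it, and no entry in `losses` ever drops below the irreducible term `E`. The curve flattens but never hits a wall. The ceiling comes from running out of tokens to feed it.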

Compute. Some problems are computable in principle but require physical resources that don't exist: cracking modern cryptography, simulating exact quantum dynamics of large molecules, brute-force search over astronomical state spaces. These bounds are physical, not logical, and they're not where most current work runs into trouble anyway.
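A back-of-the-envelope calculation shows what "physical, not logical" means here. Every number below is an assumption for illustration: a 128-bit symmetric key, a hypothetical machine testing 10^18 keys per second, and an approximate age of the universe.

```python
KEY_BITS = 128                   # assumed key length
GUESSES_PER_SECOND = 10**18      # hypothetical machine, not a real benchmark
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10  # approximate

# Number of candidate keys an exhaustive search must cover in the worst case.
keyspace = 2**KEY_BITS

# Time to try every key at the assumed rate.
years_to_exhaust = keyspace / GUESSES_PER_SECOND / SECONDS_PER_YEAR

# The same figure expressed in multiples of the age of the universe.
universe_lifetimes = years_to_exhaust / AGE_OF_UNIVERSE_YEARS
```

Under these assumptions the search takes on the order of 10^13 years, several hundred times the age of the universe. The algorithm is trivial and entirely computable. The resources are what is missing.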

Self-knowledge. This is the Gödel-shaped limit, and it is where alignment lives. A system cannot certify its own soundness from inside its own perspective. The alignment problem is therefore not just engineering-hard. It is structurally hard in a way that does not yield to scaling, because no amount of capability lets a system fully validate itself.

Closing

The Lucas-Penrose argument was wrong about where the limit is. There is no Gödelian wall preventing machines from matching human capability on math, code, or science. The intuition that there is a structural limit on intelligence is correct, but it sits at a different layer than they thought.

The deepest limit is not on what AI can know about the world. It is on what AI can know about itself. That is also why alignment is hard, why interpretability is necessary, and why humans in the loop will not be a temporary feature of safe AI deployment. Not because machines cannot be smart enough. Because no system can validate itself from inside.