Arshavir Blackwell, PhD

Inside the Black Box: Cracking AI and Deep Learning


Beyond the Surface of AI Intelligence

This episode dives into why judging AI by behavior alone falls short of proving true intelligence. We explore how insights from mechanistic interpretability and cognitive science reveal what’s really happening inside AI models. Join us as we challenge the limits of behavioral tests and rethink what intelligence means for future AI.


Chapter 1

Introduction

Arshavir Blackwell, PhD

Welcome back to Inside the Black Box. I'm Arshavir Blackwell. My recent conversation with Claude about that Nature paper from Chen, Belkin, Bergen, and Danks entitled "Does AI already have human-level intelligence? The evidence is clear" caused me to shift my previous, somewhat smug, view on the hard problem of consciousness. But we'll get to that later. The paper argues that AGI is already here. They point to GPT-4.5 passing Turing tests 73% of the time (better than humans!), LLMs winning math Olympiad gold medals, proving theorems. The paper also clarifies what AGI is not: perfection, universality, superintelligence, and human similarity. A system need not be perfect, all-knowing, or made of neurons to qualify.

Chapter 2

Objections and Responses

Arshavir Blackwell, PhD

The paper addresses ten objections to LLMs having general intelligence. Here are four. First, they lack agency. But entities like the Oracle of Delphi (or the Guardian of Forever in Star Trek) only respond to queries, and we'd consider those intelligent. Second, no bodies. This is anthropocentric bias. A brain in a vat or a disembodied alien cloud would still be intelligent. Stephen Hawking was paralyzed, speaking through a computer. Motor capability can be separate from general intelligence.

Arshavir Blackwell, PhD

Third, they understand only words. Frontier models are now multi-modal, and language is a powerful tool for compressing knowledge about reality. LLMs can apply this knowledge to non-linguistic tasks like designing scientific experiments. Fourth, they lack world models. By the paper's definition, a world model only requires predicting what would happen if circumstances differed. A frontier LLM understands the difference between dropping a glass on a pillow versus a tile floor. Note also: just because the training objective is next-word prediction doesn't mean the mechanism that develops merely predicts the next word. The selection pressure that drove the evolution of the eye doesn't tell us much about the form of the eye. Selection pressure doesn't equal mechanism.

Chapter 3

Turing Test and Behavioral Limits

Arshavir Blackwell, PhD

Now for the core critique. In the 1950s, Alan Turing gave us the Turing Test: if a machine could carry on a conversation indistinguishable from a human's, we'd have to call it intelligent. That idea aligned with behaviorism in psychology. Watson and Skinner didn't care what was going on inside the mind; whether rats pressed levers or people answered questions, you just measured what you could see. But we've run into the limits of that logic.

Arshavir Blackwell, PhD

The philosopher Ned Block made the key point in 1981, and the Nature paper cites him and yet, puzzlingly, does not engage with his argument. Block's thought experiment: imagine a lookup table large enough to store the correct response to every possible conversational input. It passes the Turing test perfectly. Its behavioral output is indistinguishable from a genuinely intelligent agent. But nobody would call it intelligent. It has no internal process, no abstraction, no generalization. It's pure retrieval.
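To make Block's contrast concrete, here's a toy sketch of my own (nothing from the paper, and the arithmetic prompts are purely illustrative): two responders whose behavior is identical on the stored questions, one by pure retrieval, one by actually computing.

```python
# Toy illustration of Block's point: identical behavior, different mechanisms.

# Agent 1: pure retrieval. Every input maps to a canned output.
LOOKUP_TABLE = {
    "what is 2 + 3?": "5",
    "what is 4 + 9?": "13",
}

def blockhead(prompt: str) -> str:
    """Answer only by retrieving a pre-stored string; no internal computation."""
    return LOOKUP_TABLE[prompt.lower()]

# Agent 2: a (trivially) generative mechanism that parses and computes.
def calculator(prompt: str) -> str:
    """Answer by extracting the operands and actually performing the addition."""
    digits = [int(tok) for tok in prompt.replace("?", "").split() if tok.isdigit()]
    return str(sum(digits))

# Behaviorally indistinguishable on the stored inputs...
for q in LOOKUP_TABLE:
    assert blockhead(q) == calculator(q)

# ...but only the second generalizes to a query the table never stored.
print(calculator("what is 17 + 25?"))  # 42
# blockhead("what is 17 + 25?") raises KeyError: no retrieval, no answer.
```

Only the second responder generalizes beyond what was stored, and that is exactly the distinction behavior on the stored inputs cannot reveal.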

Arshavir Blackwell, PhD

The logical conclusion: a system can be behaviorally perfect and yet obviously not intelligent. Behavioral evidence alone cannot be sufficient to establish intelligence. Something about the internal process has to matter. Judging intelligence by output alone is like grading a math test by how quickly the answers are filled out, without checking if the work makes sense. The Nature authors cite Block and then proceed as though the problem doesn't exist. They keep stacking behavioral achievements as though enough of them eventually crosses some threshold. Block's point is that no amount of behavioral evidence crosses the threshold, because the threshold isn't behavioral.

Chapter 4

Historical Lessons

Arshavir Blackwell, PhD

There's a history of getting fooled by the surface. ELIZA, the 1960s chatbot, faked understanding by parroting back questions. Clever Hans, the horse who solved math problems, really just picked up on his trainer's cues. Both taught us that you can build a perfect illusion of intelligence with no real understanding underneath.

Arshavir Blackwell, PhD

The paper organizes behavioral evidence into a cascade: Turing-test level, expert level, and superhuman level. Current LLMs have reached the second tier and are approaching the third. The framework sounds rigorous. But the thresholds aren't derived from any theory of what general intelligence requires. They're an after-the-fact ranking of task difficulty that maps onto what LLMs can currently do.

Chapter 5

Evaluating Task Difficulty

Arshavir Blackwell, PhD

Winning at chess is hard for humans, but a chess engine doesn't need to understand chess in any deep sense. It searches a game tree and evaluates positions by brute force. Passing a medical licensing exam sounds impressive, but if the questions are similar enough to training data, pattern matching suffices. Meanwhile, a three-year-old can walk into a room, pick up a toy he's never touched, and figure out how it works. That's easy for humans. It might be the hardest thing to replicate. The cascade assumes harder tasks by human standards require more intelligence. But difficulty for humans and difficulty for LLMs don't track the same underlying abilities.
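Here's what "it searches a tree" amounts to, in a toy sketch of my own that uses the subtraction game "21" instead of chess so it fits in a few lines (none of this comes from the paper): optimal play falls out of exhaustive enumeration, with no understanding of the game encoded anywhere.

```python
# Minimal game-tree search (negamax) for the subtraction game "21":
# players alternately take 1-3 from a pile; whoever takes the last object wins.
from functools import lru_cache

@lru_cache(maxsize=None)
def negamax(pile: int) -> int:
    """Return +1 if the player to move wins with best play, -1 otherwise."""
    if pile == 0:
        return -1  # the previous player took the last object and won
    return max(-negamax(pile - take) for take in (1, 2, 3) if take <= pile)

def best_move(pile: int) -> int:
    """Pick the move whose resulting position is worst for the opponent."""
    moves = [t for t in (1, 2, 3) if t <= pile]
    return min(moves, key=lambda t: negamax(pile - t))

print(best_move(21))  # 1: leaves 20, a multiple of 4, which loses for the opponent
```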

Chapter 6

Mechanistic Interpretability

Arshavir Blackwell, PhD

This is where mechanistic interpretability becomes relevant. The Nature paper offers one evidential channel: behavioral output. But we can look inside the model. This parallels the history of neuroscience. For years, cognitive scientists relied on observing behavior (reaction times, verbal reports, lever presses) until tools like brain imaging changed things. That development meant we weren't just guessing what people were thinking; we could watch circuits activate as someone formed a memory or made a decision. Mechanistic interpretability in AI is like getting our first MRI machines for neural networks, giving us an additional information channel from which to draw hypotheses.

Arshavir Blackwell, PhD

When I identify features in a sparse autoencoder that activate for coherent concepts across diverse contexts, I've found something about the system's internal organization that goes beyond input-output behavior. In previous episodes, we talked about how circuit analysis unpacks things like the greater-than circuit or how attention heads in BERT track grammatical structure. When I map circuits that perform compositional operations, combining representations systematically rather than retrieving pre-stored associations, I'm producing evidence of a fundamentally different kind than "it got the right answer."
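For listeners who want the mechanics, here's a minimal sketch of the sparse-autoencoder setup I'm describing, with synthetic activations and made-up dimensions standing in for a real model's residual stream; the sizes and training loop are my own illustrative choices, not any particular published configuration.

```python
# A minimal sparse autoencoder of the kind used in interpretability work:
# it learns an overcomplete dictionary of features over model activations,
# with an L1 penalty pushing most feature activations toward zero.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 512, d_features: int = 4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)   # activation -> feature activations
        self.decoder = nn.Linear(d_features, d_model)   # feature activations -> reconstruction

    def forward(self, acts: torch.Tensor):
        features = torch.relu(self.encoder(acts))        # non-negative, encouraged to be sparse
        reconstruction = self.decoder(features)
        return reconstruction, features

sae = SparseAutoencoder()
optimizer = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coeff = 1e-3

acts = torch.randn(64, 512)                              # stand-in for residual-stream activations
for _ in range(100):
    recon, feats = sae(acts)
    loss = ((recon - acts) ** 2).mean() + l1_coeff * feats.abs().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The interesting work starts after training: inspecting which inputs most strongly fire each learned feature, and whether those inputs share a coherent, human-interpretable concept.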

Arshavir Blackwell, PhD

This evidence directly addresses Block's challenge. I'm no longer asking only what the system produces. I'm examining how it produces it. And the how matters, because Block demonstrated that the what alone is insufficient. Here's an example. The authors note that humans, like LLMs, confabulate. We have false memories, cognitive biases, perceptual illusions. Therefore, they state, hallucination shouldn't disqualify LLMs from general intelligence.

Arshavir Blackwell, PhD

This does not follow. What we understand about LLM hallucination points to mechanisms tied to the autoregressive generation process that don't map neatly onto human false memory or confabulation. We don't yet have a complete circuit-level account of how LLMs hallucinate, so claiming the mechanisms are fundamentally different would overstate what we know. But we know enough to say the comparison is superficial. This is precisely the kind of question mechanistic interpretability is positioned to resolve, and precisely the kind of question the paper declines to ask.

Chapter 7

Hard Problem and Subjectivity

Arshavir Blackwell, PhD

Can interpretability prove these systems are intelligent or conscious? No. The hard problem is still there. Even a complete map of every circuit, every feature, every computational pathway amounts to an objective description. Whether any objective description entails subjective understanding remains an open philosophical question. Philosopher Thomas Nagel noted in 1974 that even a complete objective description of a bat's neurology wouldn't tell you what it's like to be a bat. Subjective experience has an irreducibly first-person character that third-person description can't capture.

Arshavir Blackwell, PhD

But getting closer without fully resolving is where nearly all scientific progress lives. We can't resolve the measurement problem in quantum mechanics either, but we build precise theories around it. The hard problem may be a permanent boundary condition on inquiry rather than a puzzle awaiting solution. Interpretability works on the tractable parts of the problem.

Chapter 8

Cascade of Evidence

Arshavir Blackwell, PhD

What would count as evidence? Here's my rough cascade of features that would increase epistemic warrant for general intelligence, in order from strongest to weakest. First, biological neural substrate. Not required, but its presence provides massive priors from evolutionary and developmental biology. Second, spatial cohesiveness. A unified system rather than distributed retrieval. Are the computations happening in one place, or scattered across data centers?

Arshavir Blackwell, PhD

Third, online processing, learning, memory, recurrence. Can the system update its representations in real time? Does it remember and integrate new information? Fourth, world model. Does the system have internal representations that predict consequences? Not just glass breaks on tile, but novel scenarios like what happens if you drop a glass onto a peanut butter-covered trampoline? Fifth, agency. Can the system initiate action toward goals, not just respond to queries?

Arshavir Blackwell, PhD

Current LLMs have some of these, arguably world models, and lack others: no persistent memory across sessions, no agency beyond the conversation. The Nature paper treats AGI as a binary achieved through behavioral accumulation. This cascade treats it as converging evidence across independent channels, which is how we actually form warranted beliefs about intelligence in every other context.

Chapter 9

Closing Thoughts

Arshavir Blackwell, PhD

Which brings me back to that conversation with Claude I mentioned at the start. I said: I can't even prove another person is conscious, so how can I be expected to make any claims about LLMs? Claude pointed out that I was guilty of what I came to call solipsistic collapse. Saying we can't know the absolute answer doesn't mean we cannot get closer to it. To argue otherwise is black-or-white thinking, like saying because we can't perfectly measure something, a meter might as well be a kilometer.

Arshavir Blackwell, PhD

The underlying point is true: the hard problem is real, you can't prove other minds exist. But solipsistic collapse treats all cases as equally uncertain, when they're not. With other humans, you have convergent evidence across shared biology, neuroimaging, lesion studies, developmental trajectory. With LLMs, you have behavioral output and, increasingly, mechanistic interpretability. Going from a thousand lines of evidence to one isn't a minor quantitative shift. It's an implosion in epistemic warrant.

Arshavir Blackwell, PhD

Applied consistently, solipsistic collapse can't distinguish a human from a thermostat. You can't prove the thermostat isn't conscious either. That's the meter/kilometer analogy: no measurement is perfectly precise, so I can't distinguish a meter from a kilometer. The point about measurement uncertainty is true. The conclusion is absurd. Lack of certainty doesn't mean lack of discrimination. And so Claude changed my mind. I’m Arshavir Blackwell, and this is Inside the Black Box.