The Weird Geometry That Makes AI Think
Arshavir Blackwell, PhD
Welcome to Inside the Black Box. I’m Arshavir Blackwell, and today we continue our exploration of that persistent question: how do large language models actually think? Last time, we examined debugging and circuit-mapping. This episode goes one step further—into the strange geometry that underlies these models.
Arshavir Blackwell, PhD
Inside a transformer—systems like ChatGPT or Claude—information doesn’t live in the familiar three-dimensional world of our senses. It unfolds in vector spaces with thousands, even tens of thousands, of dimensions. Ordinary intuition fails here. It’s an alien landscape—one where human reasoning loses its footing, and yet where these models find theirs.
Arshavir Blackwell, PhD
In such spaces, two random directions are almost perfectly orthogonal—nearly at right angles. Each random vector has its own direction, with almost no overlap. Imagine a crowded stadium in which every person somehow has complete personal space.
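A quick numerical sketch makes the point. This is illustrative Python with NumPy; the dimensionality and number of directions are arbitrary choices, not values from any particular model:
```python
import numpy as np

rng = np.random.default_rng(0)
d = 10_000   # dimensionality, roughly residual-stream scale (illustrative)
n = 200      # number of random directions to compare

# Sample random unit vectors and measure how aligned random pairs are.
vecs = rng.standard_normal((n, d))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)

cosines = vecs @ vecs.T                      # pairwise cosine similarities
off_diag = cosines[~np.eye(n, dtype=bool)]   # drop the self-similarities (all 1.0)

print(f"mean |cosine|: {np.abs(off_diag).mean():.4f}")   # close to 0: nearly orthogonal
print(f"max  |cosine|: {np.abs(off_diag).max():.4f}")    # still small
```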
Arshavir Blackwell, PhD
But the vectors in a trained neural network aren’t random. They’re shaped by learning, allowing the model to represent similarity and meaning. Related concepts overlap just enough for the model to capture relationships. That overlap is how the model works.
Arshavir Blackwell, PhD
Almost all of the volume in these high-dimensional spaces lies near the surface, not the center. In five thousand dimensions, more than 99.999% of the volume of a ball is concentrated in a thin shell just inside its surface. The computations in language models happen out there, where mathematical intuition breaks down.
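The shell effect is easy to check directly. A minimal calculation, assuming a shell that occupies the outer 1% of the radius (the exact thickness is a free choice):
```python
# Fraction of a d-dimensional ball's volume that lies in its thin outer shell.
# With a shell of relative thickness eps, the inner ball keeps (1 - eps)**d of the
# volume, so the shell gets the rest. eps = 0.01 means the outer 1% of the radius.
def shell_fraction(d: int, eps: float = 0.01) -> float:
    return 1.0 - (1.0 - eps) ** d

for d in (3, 100, 5_000):
    print(f"d = {d:>5}: shell holds {shell_fraction(d):.10f} of the volume")
# d = 3    -> about 0.03, matching everyday intuition
# d = 100  -> about 0.63
# d = 5000 -> indistinguishable from 1.0: essentially all the volume is at the edge
```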
Arshavir Blackwell, PhD
Modern large language models have residual streams, the superhighways information travels on. These streams are thousands of dimensions wide; in the largest models, on the order of ten thousand. You might think that means one feature per dimension, neatly separated. But that’s far from true.
Arshavir Blackwell, PhD
Through superposition, models pack in far more features than they have dimensions. Features overlap, share space, and sometimes interfere, like an overbooked hotel where guests double up in rooms. The model constantly disentangles these signals as it thinks.
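Here is a toy sketch of that bargain, with made-up sizes: far more "feature" directions than dimensions, a sparse handful active at once, and a little interference as the price. Real models learn their feature directions; random ones stand in for them here:
```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 128, 1_000    # 128 dimensions hosting 1,000 hypothetical features

# Random near-orthogonal directions stand in for learned feature directions.
W = rng.standard_normal((m, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Activate a sparse handful of features, as realistic inputs tend to do.
active = rng.choice(m, size=5, replace=False)
x = W[active].sum(axis=0)            # the superposed representation lives in only d dims

# Read each feature back with a dot product against its own direction.
readout = W @ x
top5 = np.argsort(-readout)[:5]
print("active features:     ", sorted(active.tolist()))
print("top readout features:", sorted(top5.tolist()))  # usually recovers the active set

# The price of superposition is interference: inactive features get small nonzero readouts.
inactive_noise = np.abs(np.delete(readout, active)).mean()
print(f"mean interference on inactive features: {inactive_noise:.3f}")
```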
Arshavir Blackwell, PhD
And this is where geometry gets tricky. Both direction and magnitude matter. Cosine similarity tells us whether two vectors point the same way—often a clue to semantic similarity—while the length of a vector can encode confidence or importance. In high dimensions, distances themselves start to blur: random points all lie roughly the same distance apart. The model has to reason through this concentrated geometry.
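The distance-concentration claim can also be checked numerically. A small sketch with random points; the dimensions and point counts are arbitrary:
```python
import numpy as np

rng = np.random.default_rng(0)

def max_min_distance_ratio(d: int, n: int = 200) -> float:
    """Ratio of the largest to the smallest pairwise distance among n random points."""
    pts = rng.standard_normal((n, d))
    sq = (pts ** 2).sum(axis=1)
    dist_sq = sq[:, None] + sq[None, :] - 2.0 * pts @ pts.T
    dists = np.sqrt(np.maximum(dist_sq[np.triu_indices(n, k=1)], 0.0))
    return dists.max() / dists.min()

for d in (2, 100, 10_000):
    print(f"d = {d:>6}: max/min pairwise distance = {max_min_distance_ratio(d):.2f}")
# In 2 dimensions the nearest and farthest pairs differ enormously; by 10,000
# dimensions the ratio is close to 1, so "near" and "far" nearly lose their meaning.
```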
Arshavir Blackwell, PhD
So the model is juggling overlapping features, interference, direction, magnitude, and the odd behavior of distance in high-dimensional space. Understanding this geometry is key to reverse-engineering what’s happening inside.
Arshavir Blackwell, PhD
This isn’t just academic. It matters for AI safety and control. If we can’t interpret what happens inside these networks, we can’t reliably steer them. Adversarial attacks prove the point: tiny changes—imperceptible to humans—can send the model in completely different directions. Those vulnerabilities live in dimensions we can’t see.
Arshavir Blackwell, PhD
Alignment has geometric roots too. Recent research on steering vectors shows that we can influence model behavior by nudging activation space itself—making models more truthful or consistent. But to steer well, we first have to understand the terrain.
Arshavir Blackwell, PhD
Let’s make this concrete. Imagine the model processing “The cat sat on the…” When cat enters, it’s already a vector with thousands of coordinates. Its meaning isn’t stored in discrete slots like “animal = dimension 47.” The features—animal, pet, furry, domesticated—are spread across many dimensions in overlapping patterns. As that vector moves through the network, each layer transforms it.
Arshavir Blackwell, PhD
Attention mechanisms look at context—other words nearby—and mix information accordingly. Feed-forward layers reshape the vector itself, adjusting the weight of each feature. Every transformation rotates, stretches, or repositions patterns in this high-dimensional space. The model learned these transformations from training, but at inference time, it’s just geometry in motion.
Arshavir Blackwell, PhD
Now take the word sat. The model needs to know what earlier tokens matter. The representation of sat forms a query vector, asking a question. Each previous token carries a key vector, a possible answer. The model compares them by measuring how much they align. The key from cat scores high—not because of grammar rules, but because the model learned that subjects and verbs share this geometric relationship. It’s pattern matching in vector space.
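A stripped-down sketch of that query-key comparison, in illustrative Python. The activations and projection matrices below are random stand-ins, so the learned preference for "cat" will not actually appear; only the mechanics do:
```python
import numpy as np

rng = np.random.default_rng(0)
d_head = 64   # a typical per-head width; everything here is randomly initialized

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Stand-in residual-stream activations for each token (a real model computes these).
tokens = ["The", "cat", "sat"]
residual = {t: rng.standard_normal(d_head) for t in tokens}

# Hypothetical query/key projections; learned weight matrices in a real transformer.
W_Q = rng.standard_normal((d_head, d_head)) / np.sqrt(d_head)
W_K = rng.standard_normal((d_head, d_head)) / np.sqrt(d_head)

q = residual["sat"] @ W_Q                               # the question "sat" is asking
keys = np.stack([residual[t] @ W_K for t in tokens])    # each token's candidate answer

scores = keys @ q / np.sqrt(d_head)    # scaled dot products: how well each key aligns
weights = softmax(scores)              # normalized attention over the context
for t, w in zip(tokens, weights):
    print(f"attention from 'sat' to '{t}': {w:.2f}")
# With trained weights, 'cat' would win decisively; with random ones it is a coin flip.
```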
Arshavir Blackwell, PhD
Different layers specialize in different things. Early layers tend to capture syntax—who’s the subject, what’s the verb. Middle layers encode relationships: who’s doing what to whom. Later layers handle meaning and prediction. By the time we reach the final the, the representation has entered an attractor region—a zone where likely completions like “mat,” “couch,” or “rug” cluster together. The geometry does the work.
Arshavir Blackwell, PhD
This is where mechanistic interpretability—MI—comes in. MI is about reverse-engineering these circuits. Researchers trace which components activate for which features, mapping specific computations to specific mechanisms. Take induction heads—discovered by Anthropic’s interpretability team. These are small circuits that detect repeated patterns. If the model sees “A B … A,” the induction head learns to predict “B.” It’s a clear, mechanical behavior.
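The behavior is simple enough to caricature in a few lines. This toy function mimics what an induction head computes, though real induction heads realize it through two composing attention heads rather than an explicit scan:
```python
# A toy caricature of the *behavior* an induction head implements: when the current
# token has appeared before, predict whatever followed it last time.
def induction_predict(tokens):
    current = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):   # scan backwards for an earlier match
        if tokens[i] == current:
            return tokens[i + 1]               # copy the token that followed it
    return None

sequence = ["Harry", "Potter", "went", "back", "to", "Hogwarts", ".", "Harry"]
print(induction_predict(sequence))   # -> "Potter"
```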
Arshavir Blackwell, PhD
More recently, sparse autoencoders have helped us unpack these entangled representations. Instead of each neuron doing ten things at once, SAEs reveal directions that correspond to individual concepts. It’s like putting on glasses that let us see structure that was always there, just hidden.
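A minimal sparse-autoencoder sketch in PyTorch, with illustrative sizes and an untuned sparsity coefficient; real SAE training involves many details (normalization, dead-feature handling, resampling) not shown here:
```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE sketch: expand activations into many sparse features, then reconstruct.
    The sizes and L1 coefficient are illustrative, not tuned values from any paper."""
    def __init__(self, d_model: int = 768, d_features: int = 8 * 768, l1_coeff: float = 1e-3):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)
        self.l1_coeff = l1_coeff

    def forward(self, activations):
        features = torch.relu(self.encoder(activations))    # sparse codes, ideally one per concept
        reconstruction = self.decoder(features)
        mse = (reconstruction - activations).pow(2).mean()   # stay faithful to the activations
        sparsity = features.abs().mean()                     # push most features toward zero
        loss = mse + self.l1_coeff * sparsity
        return features, reconstruction, loss

# Usage: capture residual-stream activations from a model, then train the SAE on them.
sae = SparseAutoencoder()
batch = torch.randn(32, 768)     # random stand-in for captured activations
features, recon, loss = sae(batch)
loss.backward()
```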
Arshavir Blackwell, PhD
And this connects to what we call LLM-ology—studying models empirically, almost like cognitive psychology. MI shows us the wiring; LLM-ology shows the behavior. Together, they reveal how meaning moves through the network.
Arshavir Blackwell, PhD
When a model processes a sentence, its internal representation literally travels through high-dimensional space. Sentences with numbers, pronouns, or causal words often trace similar paths. The model has computational tendencies—habits of motion we’re only beginning to chart.
Arshavir Blackwell, PhD
We can locate some structure with sparse autoencoders, but much remains hidden—subtle signals like irony, humor, or moral tone may live in remote corners of this space. This is where cognitive science meets geometry, and where interpretability meets mystery.
Arshavir Blackwell, PhD
Taken together, MI reveals the static wiring, while LLM-ology follows the dynamics—the motion of meaning through space. Transformers don’t store rules; they sculpt and navigate geometry.
Arshavir Blackwell, PhD
And that geometry, as powerful as it is, brings fragility. Most points in these vast spaces are meaningless noise. The meaningful regions—the semantic manifolds—occupy only a thin sliver of the hypersphere. That’s the curse of dimensionality: the expressiveness that makes these models so capable also makes them precarious.
Arshavir Blackwell, PhD
Even visualization struggles here. When we project thousands of dimensions down to two or three—using PCA or t-SNE—the true relationships blur. The result is an aesthetic map, not a faithful one. Even with modern tools, our view of these models is still partial—a sketch of a landscape we can never fully see.
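One way to see how much a two-dimensional map can discard: project unstructured high-dimensional points with plain PCA and compare pairwise distances before and after. The data and sizes below are illustrative:
```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 1_000))    # 300 random points in a 1,000-dimensional space

# Plain PCA via SVD: project onto the top two principal directions.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X2 = Xc @ Vt[:2].T

def pairwise_distances(A):
    sq = (A ** 2).sum(axis=1)
    dist_sq = sq[:, None] + sq[None, :] - 2.0 * A @ A.T
    return np.sqrt(np.maximum(dist_sq[np.triu_indices(len(A), k=1)], 0.0))

corr = np.corrcoef(pairwise_distances(Xc), pairwise_distances(X2))[0, 1]
print(f"correlation between true and projected distances: {corr:.2f}")
# For unstructured data like this the correlation is near zero: the 2-D picture keeps
# almost none of the original geometry. Real activations have more structure, but the
# same caution applies to any drastic projection, PCA or t-SNE alike.
```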
Arshavir Blackwell, PhD
And yet, we keep mapping. Every new method—mechanistic interpretability, sparse autoencoders, the emerging science of LLM-ology—takes us a step closer to understanding how these systems think. I’m Arshavir Blackwell, and this has been Inside the Black Box.
