Arshavir Blackwell, PhD

Inside the Black Box: Cracking AI and Deep Learning

Technology · Education


Episodes (18)

In this episode of Inside the Black Box: Cracking AI and Deep Learning, Arshavir Blackwell, PhD, takes engineers and researchers inside the practical mechanics of LoRA, the low-rank adaptation method that makes it possible to fine-tune multi-billion-parameter language models on a single GPU.
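
For a concrete picture of what that looks like in practice, here is a minimal sketch of the core LoRA idea, assuming PyTorch; the class name, rank, and scaling values are illustrative choices, not code from the episode. The pretrained weight matrix stays frozen while two small matrices, A and B, learn a low-rank correction.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update (illustrative sketch)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the original weights are never updated
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(base.out_features, r))        # up-projection, starts at zero
        self.scale = alpha / r

    def forward(self, x):
        # y = W x + (alpha / r) * B A x, where only A and B are trained
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

Because only A and B receive gradients, the trainable parameter count drops from d_out x d_in to r x (d_in + d_out), which is what makes single-GPU fine-tuning feasible.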

This episode dives into why judging AI by behavior alone falls short of proving true intelligence. We explore how insights from mechanistic interpretability and cognitive science reveal what’s really happening inside AI models. Join us as we challenge the limits of behavioral tests and rethink what intelligence means for future AI.

Explore how BERT’s attention heads reveal an emergent understanding of language structure without explicit supervision. Discover the role of attention as a form of memory and what it means for the future of AI language models.

Dive into how we naturally explain neural networks with folk interpretability and why these simple stories fall short. Discover the journey toward mechanistic interpretability in AI and what that means for how we talk about and trust large language models.

Explore how sparse autoencoders and transcoders unveil the inner workings of GPT-2 by revealing functional features and computational circuits. Discover breakthrough methods that shift from observing raw network activations to mapping the model's actual computation, making AI behavior more interpretable than ever.
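
As a rough illustration of the technique discussed here, the sketch below shows, under assumed PyTorch conventions, what a sparse autoencoder over a model's activations looks like: an overcomplete dictionary of features fit to reconstruct activations under a sparsity penalty. The dimensions and loss coefficient are placeholder values, not the episode's code.

import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete autoencoder over model activations (illustrative sketch)."""
    def __init__(self, d_model: int = 768, d_dict: int = 8 * 768):
        super().__init__()
        self.enc = nn.Linear(d_model, d_dict)
        self.dec = nn.Linear(d_dict, d_model)

    def forward(self, acts):
        features = torch.relu(self.enc(acts))  # sparse feature activations, ideally interpretable
        recon = self.dec(features)             # reconstruction of the original activations
        return recon, features

def sae_loss(recon, acts, features, l1_coeff: float = 1e-3):
    # reconstruct faithfully while keeping only a few features active per input
    return ((recon - acts) ** 2).mean() + l1_coeff * features.abs().mean()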

Explore how attention heads uncover patterns through learned queries and keys, revealing emergent behaviors shaped by optimization. Dive into parallels with natural selection and psycholinguistics to understand how meaning arises not by design but through experience in both machines and brains.
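
To make "learned queries and keys" concrete, here is a minimal single-head attention sketch, assuming PyTorch; the dimensions are illustrative and causal masking is omitted for brevity. Nothing about the attention pattern is hand-designed: the projections W_q, W_k, and W_v are simply weights that optimization shapes until useful patterns emerge.

import math
import torch
import torch.nn as nn

class AttentionHead(nn.Module):
    """One attention head with learned query, key, and value projections (illustrative sketch)."""
    def __init__(self, d_model: int = 768, d_head: int = 64):
        super().__init__()
        self.W_q = nn.Linear(d_model, d_head, bias=False)
        self.W_k = nn.Linear(d_model, d_head, bias=False)
        self.W_v = nn.Linear(d_model, d_head, bias=False)

    def forward(self, x):  # x: (seq_len, d_model)
        q, k, v = self.W_q(x), self.W_k(x), self.W_v(x)
        scores = q @ k.T / math.sqrt(k.shape[-1])  # how strongly each position matches every other
        weights = torch.softmax(scores, dim=-1)    # the head's attention pattern, learned rather than designed
        return weights @ v                         # mix value vectors according to that pattern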

Explore how GPT-2 balances fleeting factual recall with generic responses through internal competition among candidate answers. Discover parallels with human cognition and how larger models navigate indirect recall to reveal hidden knowledge beneath suppression.

Dive into the world of neural circuits within large language models. In this episode, Arshavir Blackwell unpacks how transformer circuits, attention mechanisms, and high-dimensional geometry combine to create the magic—and limits—of modern AI language systems.

This episode dives into why advanced language models still generate hallucinations, how interpretability tools help us uncover their hidden workings, and what the seahorse emoji teaches us about model and human reasoning. Arshavir connects groundbreaking research, the practical stakes for business, and the statistical quirks that shape AI's version of 'truth.'

Explore how large language models build up meaning in ways strikingly similar to the layered grammar of Finnish. Arshavir Blackwell reveals why understanding Finnish morphology offers a powerful analogy for interpreting the compositional logic inside modern AI systems.

Dive into how and why large language models like ChatGPT mirror the human Mandela Effect, reproducing our collective false memories and misquotations. Arshavir Blackwell examines the science behind errors in models and minds, and explores how new techniques can counteract these uncanny AI confabulations.

How do millions of computations inside large language models add up to something like understanding? This episode explores the latest breakthroughs in mechanistic interpretability, showing how tools like representational geometry, circuit decomposition, and compression theory illuminate the missing middle between circuits and meaning. Join Arshavir Blackwell as he opens the black box and challenges what we really mean by 'understanding' in machines.

Embark on a step-by-step journey through the inner workings of transformer models like those powering ChatGPT. Arshavir Blackwell breaks down how context, attention, and high-dimensional geometry turn isolated tokens into fluent, meaningful language—revealing the mathematics of understanding inside the black box.

Today we explore whether mechanistic interpretability could hold the key to building leaner, more transparent—and perhaps even smarter—large language models. From knowledge distillation and pruning to low-rank adaptation, we examine cutting-edge strategies to make AI models both smaller and more explainable. Join Arshavir as he breaks down the surprising challenges of making models efficient without sacrificing understanding.

Explore how large language models use high-dimensional geometry to produce intelligent behavior. We peer into the mathematical wilderness inside transformers, revealing how intuition fails, and meaning emerges.

Can We Fix It?

Arshavir Blackwell takes you on a journey inside the black box of large language models, showing how cutting-edge methods help researchers identify, understand, and even fix the inner quirks of AI. Through concrete case studies, he demonstrates how interpretability is evolving from an arcane art to a collaborative science—while revealing the daunting puzzles that remain. This episode unpacks the step-by-step workflow and surprising realities of mechanistically mapping model cognition.

Delve into the mysterious world of neural circuits within large language models. We’ll dismantle the jargon, connect these abstract ideas to real examples, and discuss how circuits help bridge the gap between machine learning and human cognition.

Mechanistic interpretability and artificial psycholinguistics are transforming our understanding of large language models. In this episode, Arshavir Blackwell explores how probing neural circuits, behavioral tests, and new tools are unraveling the mysteries of AI reasoning.