<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:podcast="https://podcastindex.org/namespace/1.0" xmlns:jellypod="https://jellypod.ai/namespace/1.0" xmlns:psc="http://podlove.org/simple-chapters"><channel><title><![CDATA[Inside the Black Box: Cracking AI and Deep Learning]]></title><description><![CDATA[How do Large Language Models like ChatGPT work, anyway? (Powered by Jellypod)]]></description><link>https://arshavir.jellypod.com</link><generator>Powered by Jellypod (https://www.jellypod.com)</generator><lastBuildDate>Fri, 15 May 2026 00:54:10 GMT</lastBuildDate><atom:link href="https://arshavir.jellypod.com/rss" rel="self" type="application/rss+xml"/><pubDate>Fri, 31 Oct 2025 17:35:30 GMT</pubDate><copyright><![CDATA[Copyright 2026 Inside the Black Box: Cracking AI and Deep Learning]]></copyright><language><![CDATA[en]]></language><podcast:locked owner="feed+f393967d@podcasts.jellypod.com">yes</podcast:locked><podcast:guid>9330ccd8-4e44-44e7-bbb5-346de7829edc</podcast:guid><itunes:author>Jellypod</itunes:author><itunes:subtitle>How do Large Language Models like ChatGPT work, anyway? (Powered by Jellypod)</itunes:subtitle><itunes:summary>How do Large Language Models like ChatGPT work, anyway? (Powered by Jellypod)</itunes:summary><itunes:type>episodic</itunes:type><itunes:owner><itunes:name>Jellypod</itunes:name><itunes:email>feed+f393967d@podcasts.jellypod.com</itunes:email></itunes:owner><itunes:explicit>false</itunes:explicit><itunes:category text="Technology"/><itunes:category text="Education"/><itunes:image href="https://auth.jellypod.ai/storage/v1/object/public/CoverImages/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/image_1761932049514.jpg"/><item><title><![CDATA[When Fluent Answers Start Sounding True]]></title><description><![CDATA[This episode explores why smooth, coherent language can feel more credible than it is, and how processing fluency, familiarity, and authority cues shape what we believe. It also digs into why conversational AI is especially persuasive, from polished explanations to confident-sounding confabulations.]]></description><link>https://arshavir.jellypod.com/episodes/20c90d86-1e72-49cd-a201-6be6023bb204</link><guid isPermaLink="false">20c90d86-1e72-49cd-a201-6be6023bb204</guid><pubDate>Sat, 02 May 2026 17:41:57 GMT</pubDate><enclosure url="https://op3.dev/e,pg=9330ccd8-4e44-44e7-bbb5-346de7829edc/auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/20c90d86-1e72-49cd-a201-6be6023bb204/audio.mp3" length="0" type="audio/mpeg"/><podcast:generator uri="https://www.jellypod.com"></podcast:generator><podcast:episode>28</podcast:episode><podcast:transcript url="https://auth.jellypod.ai/storage/v1/object/public/Podcasts/20c90d86-1e72-49cd-a201-6be6023bb204/captions_1777743704.srt" type="application/x-subrip" language="en" rel="captions"></podcast:transcript><itunes:author>Jellypod</itunes:author><itunes:subtitle>This episode explores why smooth, coherent language can feel more credible than it is, and how processing fluency, familiarity, and authority cues shape what we believe. 
It also digs into why conversational AI is especially persuasive, from polished expla</itunes:subtitle><itunes:summary>This episode explores why smooth, coherent language can feel more credible than it is, and how processing fluency, familiarity, and authority cues shape what we believe. It also digs into why conversational AI is especially persuasive, from polished explanations to confident-sounding confabulations.</itunes:summary><itunes:explicit>false</itunes:explicit><itunes:duration>00:15:20</itunes:duration><itunes:image href="https://auth.jellypod.ai/storage/v1/object/public/CoverImages/org_01K8XHYJK76KCD363NADM5CKC2/resized_417F9093-BD99-4E97-8975-BD271E7A6A8A.jpg"/><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[Why Your Brain Believes the Model]]></title><description><![CDATA[The Heuristic Loop You Can't Break from Inside]]></description><link>https://arshavir.jellypod.com/episodes/002e12f5-14a0-43bd-a3e2-c079d12f0434</link><guid isPermaLink="false">002e12f5-14a0-43bd-a3e2-c079d12f0434</guid><pubDate>Mon, 27 Apr 2026 01:12:49 GMT</pubDate><enclosure url="https://op3.dev/e,pg=9330ccd8-4e44-44e7-bbb5-346de7829edc/auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/002e12f5-14a0-43bd-a3e2-c079d12f0434/audio.mp3" length="0" type="audio/mpeg"/><podcast:generator uri="https://www.jellypod.com"></podcast:generator><podcast:episode>27</podcast:episode><podcast:transcript url="https://auth.jellypod.ai/storage/v1/object/public/Podcasts/002e12f5-14a0-43bd-a3e2-c079d12f0434/captions_1777252356.srt" type="application/x-subrip" language="en" rel="captions"></podcast:transcript><itunes:author>Jellypod</itunes:author><itunes:subtitle>The Heuristic Loop You Can&apos;t Break from Inside</itunes:subtitle><itunes:summary>The Heuristic Loop You Can&apos;t Break from Inside</itunes:summary><itunes:explicit>false</itunes:explicit><itunes:duration>00:24:55</itunes:duration><itunes:image href="https://auth.jellypod.ai/storage/v1/object/public/CoverImages/org_01K8XHYJK76KCD363NADM5CKC2/resized_7D1CCBAA-2F3D-4101-B821-9BB15D107A21.jpg"/><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[When Polished Answers Feel Finished]]></title><description><![CDATA[This episode explores fluency-as-validity: the way polished AI responses can make us feel like the work of judgment is already done. It also looks at why large language models are so effective at creating the sensation of clarity, and why mechanistic interpretability may be a way to push back against that enchantment.]]></description><link>https://arshavir.jellypod.com/episodes/38b5a072-f1fc-4d41-b249-1e2166aea1d4</link><guid isPermaLink="false">38b5a072-f1fc-4d41-b249-1e2166aea1d4</guid><pubDate>Mon, 20 Apr 2026 18:11:03 GMT</pubDate><enclosure url="https://op3.dev/e,pg=9330ccd8-4e44-44e7-bbb5-346de7829edc/auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/38b5a072-f1fc-4d41-b249-1e2166aea1d4/audio.mp3" length="0" type="audio/mpeg"/><podcast:generator uri="https://www.jellypod.com"></podcast:generator><podcast:episode>26</podcast:episode><podcast:transcript url="https://auth.jellypod.ai/storage/v1/object/public/Podcasts/38b5a072-f1fc-4d41-b249-1e2166aea1d4/captions_1776708647.srt" type="application/x-subrip" language="en" rel="captions"></podcast:transcript><itunes:author>Jellypod</itunes:author><itunes:subtitle>This episode explores fluency-as-validity: the way polished AI responses can make us feel like the work of judgment is already done. 
It also looks at why large language models are so effective at creating the sensation of clarity, and why mechanistic inte</itunes:subtitle><itunes:summary>This episode explores fluency-as-validity: the way polished AI responses can make us feel like the work of judgment is already done. It also looks at why large language models are so effective at creating the sensation of clarity, and why mechanistic interpretability may be a way to push back against that enchantment.</itunes:summary><itunes:explicit>false</itunes:explicit><itunes:duration>00:27:53</itunes:duration><itunes:image href="https://auth.jellypod.ai/storage/v1/object/public/CoverImages/org_01K8XHYJK76KCD363NADM5CKC2/resized_69E64DD4-395A-43DF-B3D4-252DAE2E60B6.jpg"/><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[What Seneca Teaches Us that Marcus Couldn't]]></title><description><![CDATA[716 features fire on both Seneca and Marcus Aurelius but stay dark for ad copy. The model learned Stoic philosophy, not just an author's style. Plus: why 'inert' features aren't all the same thing.]]></description><link>https://arshavir.jellypod.com/episodes/bf3cf380-31ac-49a5-a779-9353db2eb50c</link><guid isPermaLink="false">bf3cf380-31ac-49a5-a779-9353db2eb50c</guid><pubDate>Sun, 12 Apr 2026 04:10:39 GMT</pubDate><enclosure url="https://op3.dev/e,pg=9330ccd8-4e44-44e7-bbb5-346de7829edc/auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/bf3cf380-31ac-49a5-a779-9353db2eb50c/audio.mp3" length="0" type="audio/mpeg"/><podcast:generator uri="https://www.jellypod.com"></podcast:generator><podcast:episode>25</podcast:episode><podcast:transcript url="https://auth.jellypod.ai/storage/v1/object/public/Podcasts/bf3cf380-31ac-49a5-a779-9353db2eb50c/captions_1775967026.srt" type="application/x-subrip" language="en" rel="captions"></podcast:transcript><itunes:author>Jellypod</itunes:author><itunes:subtitle>716 features fire on both Seneca and Marcus Aurelius but stay dark for ad copy. The model learned Stoic philosophy, not just an author&apos;s style. Plus: why &apos;inert&apos; features aren&apos;t all the same thing.</itunes:subtitle><itunes:summary>716 features fire on both Seneca and Marcus Aurelius but stay dark for ad copy. The model learned Stoic philosophy, not just an author&apos;s style. Plus: why &apos;inert&apos; features aren&apos;t all the same thing.</itunes:summary><itunes:explicit>false</itunes:explicit><itunes:duration>00:17:26</itunes:duration><itunes:image href="https://auth.jellypod.ai/storage/v1/object/public/CoverImages/org_01K8XHYJK76KCD363NADM5CKC2/resized_D1977719-5C78-444D-97EA-51B5C208062D.jpg"/><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[The Pattern Holds for Another Author]]></title><description><![CDATA[We trained a fresh LoRA on the letters of Seneca and ran the same analysis pipeline we used on Marcus Aurelius and advertising copy. Every structural finding replicated. The model organizes its adaptation into five clusters: one tight (features moving in lockstep) and four loose (features cooperating more independently). Seneca produced the cleanest clustering we've measured and the strongest workhorse cluster, a group of 141 features encoding philosophical argumentation with a causal effect more than three times stronger than anything in Marcus. 
Done in collaboration with John Holman.]]></description><link>https://arshavir.jellypod.com/episodes/3efa98e6-ab9e-4b00-afd1-0bc7a196012c</link><guid isPermaLink="false">3efa98e6-ab9e-4b00-afd1-0bc7a196012c</guid><pubDate>Sat, 04 Apr 2026 01:19:28 GMT</pubDate><enclosure url="https://op3.dev/e,pg=9330ccd8-4e44-44e7-bbb5-346de7829edc/auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/3efa98e6-ab9e-4b00-afd1-0bc7a196012c/audio.mp3" length="0" type="audio/mpeg"/><podcast:generator uri="https://www.jellypod.com"></podcast:generator><podcast:episode>24</podcast:episode><podcast:transcript url="https://auth.jellypod.ai/storage/v1/object/public/Podcasts/3efa98e6-ab9e-4b00-afd1-0bc7a196012c/captions_1775265555.srt" type="application/x-subrip" language="en" rel="captions"></podcast:transcript><itunes:author>Jellypod</itunes:author><itunes:subtitle>We trained a fresh LoRA on the letters of Seneca and ran the same analysis pipeline we used on Marcus Aurelius and advertising copy. Every structural finding replicated. The model organizes its adaptation into five clusters: one tight (features moving in </itunes:subtitle><itunes:summary>We trained a fresh LoRA on the letters of Seneca and ran the same analysis pipeline we used on Marcus Aurelius and advertising copy. Every structural finding replicated. The model organizes its adaptation into five clusters: one tight (features moving in lockstep) and four loose (features cooperating more independently). Seneca produced the cleanest clustering we&apos;ve measured and the strongest workhorse cluster, a group of 141 features encoding philosophical argumentation with a causal effect more than three times stronger than anything in Marcus. Done in collaboration with John Holman.</itunes:summary><itunes:explicit>false</itunes:explicit><itunes:duration>00:15:31</itunes:duration><itunes:image href="https://auth.jellypod.ai/storage/v1/object/public/CoverImages/org_01K8XHYJK76KCD363NADM5CKC2/resized_a.jpg"/><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[The Pattern Holds]]></title><description><![CDATA[We replicated our Marcus Aurelius findings at a new layer, then threw the whole method at 12 commercial ad copy styles trained into a single LoRA. The patterns held, and the new domain revealed something we couldn't have seen before: the model organizes its adaptations by register family, not by individual style.]]></description><link>https://arshavir.jellypod.com/episodes/7c312241-afe5-48fe-8186-3956d7004d8c</link><guid isPermaLink="false">7c312241-afe5-48fe-8186-3956d7004d8c</guid><pubDate>Mon, 30 Mar 2026 00:13:15 GMT</pubDate><enclosure url="https://op3.dev/e,pg=9330ccd8-4e44-44e7-bbb5-346de7829edc/auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/7c312241-afe5-48fe-8186-3956d7004d8c/audio.mp3" length="0" type="audio/mpeg"/><podcast:generator uri="https://www.jellypod.com"></podcast:generator><podcast:episode>23</podcast:episode><podcast:transcript url="https://auth.jellypod.ai/storage/v1/object/public/Podcasts/7c312241-afe5-48fe-8186-3956d7004d8c/captions_1774829582.srt" type="application/x-subrip" language="en" rel="captions"></podcast:transcript><itunes:author>Jellypod</itunes:author><itunes:subtitle>We replicated our Marcus Aurelius findings at a new layer, then threw the whole method at 12 commercial ad copy styles trained into a single LoRA. 
The patterns held, and the new domain revealed something we couldn&apos;t have seen before: the model organizes i</itunes:subtitle><itunes:summary>We replicated our Marcus Aurelius findings at a new layer, then threw the whole method at 12 commercial ad copy styles trained into a single LoRA. The patterns held, and the new domain revealed something we couldn&apos;t have seen before: the model organizes its adaptations by register family, not by individual style.</itunes:summary><itunes:explicit>false</itunes:explicit><itunes:duration>00:18:27</itunes:duration><itunes:image href="https://auth.jellypod.ai/storage/v1/object/public/CoverImages/org_01K8XHYJK76KCD363NADM5CKC2/resized_B0CD0813-0213-4E73-8559-26B17C999DE6.jpg"/><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[Cracking Open the Black Box]]></title><description><![CDATA[We opened the 65%. The features that resisted interpretation one at a time turned out to organize into five co-activation clusters with clear thematic identities and causal effects nearly ten times stronger than any individual feature. Second in a series with John Holman.]]></description><link>https://arshavir.jellypod.com/episodes/49d3d9a6-3fea-4475-bc3c-4aceb2a5e536</link><guid isPermaLink="false">49d3d9a6-3fea-4475-bc3c-4aceb2a5e536</guid><pubDate>Sun, 22 Mar 2026 22:26:18 GMT</pubDate><enclosure url="https://op3.dev/e,pg=9330ccd8-4e44-44e7-bbb5-346de7829edc/auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/49d3d9a6-3fea-4475-bc3c-4aceb2a5e536/audio.mp3" length="0" type="audio/mpeg"/><podcast:generator uri="https://www.jellypod.com"></podcast:generator><podcast:episode>22</podcast:episode><podcast:transcript url="https://auth.jellypod.ai/storage/v1/object/public/Podcasts/49d3d9a6-3fea-4475-bc3c-4aceb2a5e536/captions_1774218366.srt" type="application/x-subrip" language="en" rel="captions"></podcast:transcript><itunes:author>Jellypod</itunes:author><itunes:subtitle>We opened the 65%. The features that resisted interpretation one at a time turned out to organize into five co-activation clusters with clear thematic identities and causal effects nearly ten times stronger than any individual feature. Second in a series </itunes:subtitle><itunes:summary>We opened the 65%. The features that resisted interpretation one at a time turned out to organize into five co-activation clusters with clear thematic identities and causal effects nearly ten times stronger than any individual feature. Second in a series with John Holman.</itunes:summary><itunes:explicit>false</itunes:explicit><itunes:duration>00:11:21</itunes:duration><itunes:image href="https://auth.jellypod.ai/storage/v1/object/public/CoverImages/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/resized_84D666C5-289A-43E2-B2D9-268AD4836CBC.jpg"/><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[Inside a Fine-Tuned Language Model]]></title><description><![CDATA[A concise, single-segment episode of Inside the Black Box: Cracking AI and Deep Learning where Arshavir Blackwell explains, in one continuous narrative, what neural networks are, how their simple units combine into powerful systems, and how learning by backpropagation sculpts their behavior. 
This short episode is designed as an elegant, one-paragraph-style monologue that introduces listeners to neural nets without equations or jargon.]]></description><link>https://arshavir.jellypod.com/episodes/558f6836-eccc-4e92-80e6-771bc2942b26</link><guid isPermaLink="false">558f6836-eccc-4e92-80e6-771bc2942b26</guid><pubDate>Thu, 12 Mar 2026 05:06:57 GMT</pubDate><enclosure url="https://op3.dev/e,pg=9330ccd8-4e44-44e7-bbb5-346de7829edc/auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/558f6836-eccc-4e92-80e6-771bc2942b26/audio.mp3" length="0" type="audio/mpeg"/><podcast:generator uri="https://www.jellypod.com"></podcast:generator><podcast:episode>21</podcast:episode><podcast:transcript url="https://auth.jellypod.ai/storage/v1/object/public/Podcasts/558f6836-eccc-4e92-80e6-771bc2942b26/captions_1773292007.srt" type="application/x-subrip" language="en" rel="captions"></podcast:transcript><itunes:author>Jellypod</itunes:author><itunes:subtitle>A concise, single-segment episode of Inside the Black Box: Cracking AI and Deep Learning where Arshavir Blackwell explains, in one continuous narrative, what neural networks are, how their simple units combine into powerful systems, and how learning by ba</itunes:subtitle><itunes:summary>A concise, single-segment episode of Inside the Black Box: Cracking AI and Deep Learning where Arshavir Blackwell explains, in one continuous narrative, what neural networks are, how their simple units combine into powerful systems, and how learning by backpropagation sculpts their behavior. This short episode is designed as an elegant, one-paragraph-style monologue that introduces listeners to neural nets without equations or jargon.</itunes:summary><itunes:explicit>false</itunes:explicit><itunes:duration>00:18:50</itunes:duration><itunes:image href="https://auth.jellypod.ai/storage/v1/object/public/CoverImages/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/cover-art-1772895617464.jpeg"/><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[What Counts as Structure? From Harris and Elman to Today’s Neural Nets]]></title><description><![CDATA[This episode of Inside the Black Box: Cracking AI and Deep Learning tells the story of an unexpected convergence in the history of language and AI. In 1995, Peter Bensch noticed that Zellig Harris, a mid‑century structural linguist, and Jeff Elman, a pioneer of simple recurrent networks, had independently uncovered the same deep insight about language: structure lives in patterns of use. Arshavir Blackwell, PhD, guides listeners through Harris’s world of distributional linguistics and operator grammar—where you infer structure from where words can substitute for one another—and contrasts it with Elman’s tiny recurrent neural networks that learn to predict the next word. Along the way, we see how these very different traditions arrive at the same place: hidden geometric structure in how language is used. From there, the episode bridges to today’s large language models and mechanistic interpretability, asking a deceptively simple question: what counts as "structure" inside a model? 
We explore how patterns, clusters, and features relate to genuine internal organization, and why Harris and Elman’s convergence still shapes how we think about circuits, features, and the geometry of meaning in modern AI.]]></description><link>https://arshavir.jellypod.com/episodes/8f6e56c6-19fc-4d98-81fa-81a889a2945c</link><guid isPermaLink="false">8f6e56c6-19fc-4d98-81fa-81a889a2945c</guid><pubDate>Fri, 06 Mar 2026 17:20:04 GMT</pubDate><enclosure url="https://op3.dev/e,pg=9330ccd8-4e44-44e7-bbb5-346de7829edc/auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/8f6e56c6-19fc-4d98-81fa-81a889a2945c/audio.mp3" length="0" type="audio/mpeg"/><podcast:generator uri="https://www.jellypod.com"></podcast:generator><podcast:episode>20</podcast:episode><podcast:transcript url="https://auth.jellypod.ai/storage/v1/object/public/Podcasts/8f6e56c6-19fc-4d98-81fa-81a889a2945c/captions_1772817592.srt" type="application/x-subrip" language="en" rel="captions"></podcast:transcript><itunes:author>Jellypod</itunes:author><itunes:subtitle>This episode of Inside the Black Box: Cracking AI and Deep Learning tells the story of an unexpected convergence in the history of language and AI. In 1995, Peter Bensch noticed that Zellig Harris, a mid‑century structural linguist, and Jeff Elman, a pione</itunes:subtitle><itunes:summary>This episode of Inside the Black Box: Cracking AI and Deep Learning tells the story of an unexpected convergence in the history of language and AI. In 1995, Peter Bensch noticed that Zellig Harris, a mid‑century structural linguist, and Jeff Elman, a pioneer of simple recurrent networks, had independently uncovered the same deep insight about language: structure lives in patterns of use. Arshavir Blackwell, PhD, guides listeners through Harris’s world of distributional linguistics and operator grammar—where you infer structure from where words can substitute for one another—and contrasts it with Elman’s tiny recurrent neural networks that learn to predict the next word. Along the way, we see how these very different traditions arrive at the same place: hidden geometric structure in how language is used. From there, the episode bridges to today’s large language models and mechanistic interpretability, asking a deceptively simple question: what counts as &quot;structure&quot; inside a model? We explore how patterns, clusters, and features relate to genuine internal organization, and why Harris and Elman’s convergence still shapes how we think about circuits, features, and the geometry of meaning in modern AI.</itunes:summary><itunes:explicit>false</itunes:explicit><itunes:duration>00:13:34</itunes:duration><itunes:image href="https://auth.jellypod.ai/storage/v1/object/public/CoverImages/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/resized_F901E45A-1C5A-4D31-84B0-E7F4E68B9AA6.jpg"/><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[Building a House Without Blueprints: When Interpretability Tools Work — and When They Don’t]]></title><description><![CDATA[This episode of Inside the Black Box: Cracking AI and Deep Learning explores a new theoretical framework that unifies sparse autoencoders (SAEs), transcoders, and crosscoders — and what it tells us about when mechanistic interpretability actually works.

We start by demystifying these tools and how they use sparse features to uncover internal concepts and computations in large language models, from DNA detectors to deception circuits in Claude 3 Sonnet. Then we introduce the linear representation hypothesis and the geometry of concepts as directions in activation space, along with the challenge of superposition when thousands of concepts must fit into limited dimensions.

Finally, we dive into Tang et al.’s recovery theorems, the compressed sensing roots of their approach, and why these results matter for using SAEs as a reliable “microscope” on model internals, especially in the context of fine-tuning and LoRA experiments. Along the way, we confront the uncomfortable possibility that the linear picture may break down at frontier scales — and what that would mean for the future of interpretability as a safety strategy.]]></description><link>https://arshavir.jellypod.com/episodes/2647048c-ab9f-48ce-936c-d1069de7b443</link><guid isPermaLink="false">2647048c-ab9f-48ce-936c-d1069de7b443</guid><pubDate>Fri, 27 Feb 2026 01:14:22 GMT</pubDate><enclosure url="https://op3.dev/e,pg=9330ccd8-4e44-44e7-bbb5-346de7829edc/auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/2647048c-ab9f-48ce-936c-d1069de7b443/audio.mp3" length="0" type="audio/mpeg"/><podcast:generator uri="https://www.jellypod.com"></podcast:generator><podcast:episode>19</podcast:episode><podcast:transcript url="https://auth.jellypod.ai/storage/v1/object/public/Podcasts/2647048c-ab9f-48ce-936c-d1069de7b443/captions_1772154851.srt" type="application/x-subrip" language="en" rel="captions"></podcast:transcript><itunes:author>Jellypod</itunes:author><itunes:subtitle>This episode of Inside the Black Box: Cracking AI and Deep Learning explores a new theoretical framework that unifies sparse autoencoders (SAEs), transcoders, and crosscoders — and what it tells us about when mechanistic interpretability actually works.
</itunes:subtitle><itunes:summary>This episode of Inside the Black Box: Cracking AI and Deep Learning explores a new theoretical framework that unifies sparse autoencoders (SAEs), transcoders, and crosscoders — and what it tells us about when mechanistic interpretability actually works.

We start by demystifying these tools and how they use sparse features to uncover internal concepts and computations in large language models, from DNA detectors to deception circuits in Claude 3 Sonnet. Then we introduce the linear representation hypothesis and the geometry of concepts as directions in activation space, along with the challenge of superposition when thousands of concepts must fit into limited dimensions.

Finally, we dive into Tang et al.’s recovery theorems, the compressed sensing roots of their approach, and why these results matter for using SAEs as a reliable “microscope” on model internals, especially in the context of fine-tuning and LoRA experiments. Along the way, we confront the uncomfortable possibility that the linear picture may break down at frontier scales — and what that would mean for the future of interpretability as a safety strategy.</itunes:summary><itunes:explicit>false</itunes:explicit><itunes:duration>00:18:04</itunes:duration><itunes:image href="https://auth.jellypod.ai/storage/v1/object/public/CoverImages/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/cover-art-1772154733733.jpeg"/><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[I Told My LLM Not to Say "Empower"]]></title><description><![CDATA[In this episode of Inside the Black Box: Cracking AI and Deep Learning, Arshavir Blackwell, PhD, takes engineers and researchers inside the practical mechanics of LoRA, low‑rank adaptation methods that make it possible to fine‑tune multi‑billion‑parameter language models on a single GPU.]]></description><link>https://arshavir.jellypod.com/episodes/1a6e0162-1abe-4a56-9c98-c09c55b7200d</link><guid isPermaLink="false">1a6e0162-1abe-4a56-9c98-c09c55b7200d</guid><pubDate>Thu, 19 Feb 2026 21:57:46 GMT</pubDate><enclosure url="https://op3.dev/e,pg=9330ccd8-4e44-44e7-bbb5-346de7829edc/auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/1a6e0162-1abe-4a56-9c98-c09c55b7200d/audio.mp3" length="0" type="audio/mpeg"/><podcast:generator uri="https://www.jellypod.com"></podcast:generator><podcast:episode>18</podcast:episode><podcast:transcript url="https://auth.jellypod.ai/storage/v1/object/public/Podcasts/1a6e0162-1abe-4a56-9c98-c09c55b7200d/captions_1771538259.srt" type="application/x-subrip" language="en" rel="captions"></podcast:transcript><itunes:author>Jellypod</itunes:author><itunes:subtitle>In this episode of Inside the Black Box: Cracking AI and Deep Learning, Arshavir Blackwell, PhD, takes engineers and researchers inside the practical mechanics of LoRA, low‑rank adaptation methods that make it possible to fine‑tune multi‑billion‑parameter</itunes:subtitle><itunes:summary>In this episode of Inside the Black Box: Cracking AI and Deep Learning, Arshavir Blackwell, PhD, takes engineers and researchers inside the practical mechanics of LoRA, low‑rank adaptation methods that make it possible to fine‑tune multi‑billion‑parameter language models on a single GPU.</itunes:summary><itunes:explicit>false</itunes:explicit><itunes:duration>00:06:34</itunes:duration><itunes:image href="https://auth.jellypod.ai/storage/v1/object/public/CoverImages/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/resized_F5530319-503F-4377-B656-0C3CA1E52E48.jpg"/><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[Beyond the Surface of AI Intelligence]]></title><description><![CDATA[This episode dives into why judging AI by behavior alone falls short of proving true intelligence. We explore how insights from mechanistic interpretability and cognitive science reveal what’s really happening inside AI models. 
Join us as we challenge the limits of behavioral tests and rethink what intelligence means for future AI.]]></description><link>https://arshavir.jellypod.com/episodes/1321425c-360a-4ed8-bb85-0a690245aba3</link><guid isPermaLink="false">1321425c-360a-4ed8-bb85-0a690245aba3</guid><pubDate>Mon, 09 Feb 2026 14:00:03 GMT</pubDate><enclosure url="https://op3.dev/e,pg=9330ccd8-4e44-44e7-bbb5-346de7829edc/auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/1321425c-360a-4ed8-bb85-0a690245aba3/audio.mp3" length="0" type="audio/mpeg"/><podcast:generator uri="https://www.jellypod.com"></podcast:generator><podcast:episode>17</podcast:episode><podcast:transcript url="https://auth.jellypod.ai/storage/v1/object/public/Podcasts/1321425c-360a-4ed8-bb85-0a690245aba3/captions_1770482418.srt" type="application/x-subrip" language="en" rel="captions"></podcast:transcript><itunes:author>Jellypod</itunes:author><itunes:subtitle>This episode dives into why judging AI by behavior alone falls short of proving true intelligence. We explore how insights from mechanistic interpretability and cognitive science reveal what’s really happening inside AI models. Join us as we challenge the</itunes:subtitle><itunes:summary>This episode dives into why judging AI by behavior alone falls short of proving true intelligence. We explore how insights from mechanistic interpretability and cognitive science reveal what’s really happening inside AI models. Join us as we challenge the limits of behavioral tests and rethink what intelligence means for future AI.</itunes:summary><itunes:explicit>false</itunes:explicit><itunes:duration>00:14:43</itunes:duration><itunes:image href="https://auth.jellypod.ai/storage/v1/object/public/CoverImages/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/image_1761932049514.jpg"/><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[Unlocking BERT’s Hidden Grammar]]></title><description><![CDATA[Explore how BERT’s attention heads reveal an emergent understanding of language structure without explicit supervision. Discover the role of attention as a form of memory and what it means for the future of AI language models.]]></description><link>https://arshavir.jellypod.com/episodes/4d8ebd67-60c2-475f-a425-150ed825ec0e</link><guid isPermaLink="false">4d8ebd67-60c2-475f-a425-150ed825ec0e</guid><pubDate>Tue, 03 Feb 2026 17:44:29 GMT</pubDate><enclosure url="https://op3.dev/e,pg=9330ccd8-4e44-44e7-bbb5-346de7829edc/auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/4d8ebd67-60c2-475f-a425-150ed825ec0e/audio.mp3" length="0" type="audio/mpeg"/><podcast:generator uri="https://www.jellypod.com"></podcast:generator><podcast:episode>16</podcast:episode><podcast:transcript url="https://auth.jellypod.ai/storage/v1/object/public/Podcasts/4d8ebd67-60c2-475f-a425-150ed825ec0e/captions_1770140659.srt" type="application/x-subrip" language="en" rel="captions"></podcast:transcript><itunes:author>Jellypod</itunes:author><itunes:subtitle>Explore how BERT’s attention heads reveal an emergent understanding of language structure without explicit supervision. Discover the role of attention as a form of memory and what it means for the future of AI language models.</itunes:subtitle><itunes:summary>Explore how BERT’s attention heads reveal an emergent understanding of language structure without explicit supervision. 
Discover the role of attention as a form of memory and what it means for the future of AI language models.</itunes:summary><itunes:explicit>false</itunes:explicit><itunes:duration>00:09:06</itunes:duration><itunes:image href="https://auth.jellypod.ai/storage/v1/object/public/CoverImages/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/cover-art-1770140746186.jpeg"/><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[Cracking the Code of AI Interpretation]]></title><description><![CDATA[Dive into how we naturally explain neural networks with folk interpretability and why these simple stories fall short. Discover the journey toward mechanistic understandability in AI and what that means for how we talk about and trust large language models.]]></description><link>https://arshavir.jellypod.com/episodes/5a3647a6-81ef-433b-8389-4feabfbfa937</link><guid isPermaLink="false">5a3647a6-81ef-433b-8389-4feabfbfa937</guid><pubDate>Wed, 28 Jan 2026 18:55:51 GMT</pubDate><enclosure url="https://op3.dev/e,pg=9330ccd8-4e44-44e7-bbb5-346de7829edc/auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/5a3647a6-81ef-433b-8389-4feabfbfa937/audio.mp3" length="0" type="audio/mpeg"/><podcast:generator uri="https://www.jellypod.com"></podcast:generator><podcast:episode>15</podcast:episode><podcast:transcript url="https://auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/5a3647a6-81ef-433b-8389-4feabfbfa937/captions_1769626499.srt" type="application/x-subrip" language="en" rel="captions"></podcast:transcript><itunes:author>Jellypod</itunes:author><itunes:subtitle>Dive into how we naturally explain neural networks with folk interpretability and why these simple stories fall short. Discover the journey toward mechanistic understandability in AI and what that means for how we talk about and trust large language model</itunes:subtitle><itunes:summary>Dive into how we naturally explain neural networks with folk interpretability and why these simple stories fall short. Discover the journey toward mechanistic understandability in AI and what that means for how we talk about and trust large language models.</itunes:summary><itunes:explicit>false</itunes:explicit><itunes:duration>00:10:11</itunes:duration><itunes:image href="https://auth.jellypod.ai/storage/v1/object/public/CoverImages/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/resized_A629E3F5-ED5D-421C-A3CA-0636CB5E7911.jpg"/><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[Decoding GPT’s Hidden Circuits]]></title><description><![CDATA[Explore how sparse autoencoders and transcoders unveil the inner workings of GPT-2 by revealing functional features and computational circuits. 
Discover breakthrough methods that shift from observing raw network activations to mapping the model's actual computation, making AI behavior more interpretable than ever.]]></description><link>https://arshavir.jellypod.com/episodes/ee48b068-7fb5-4115-a86e-f23d1ad97a18</link><guid isPermaLink="false">ee48b068-7fb5-4115-a86e-f23d1ad97a18</guid><pubDate>Mon, 26 Jan 2026 00:00:51 GMT</pubDate><enclosure url="https://op3.dev/e,pg=9330ccd8-4e44-44e7-bbb5-346de7829edc/auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/ee48b068-7fb5-4115-a86e-f23d1ad97a18/audio.mp3" length="0" type="audio/mpeg"/><podcast:generator uri="https://www.jellypod.com"></podcast:generator><podcast:episode>14</podcast:episode><podcast:transcript url="https://auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/ee48b068-7fb5-4115-a86e-f23d1ad97a18/captions_1769385593.srt" type="application/x-subrip" language="en" rel="captions"></podcast:transcript><itunes:author>Jellypod</itunes:author><itunes:subtitle>Explore how sparse autoencoders and transcoders unveil the inner workings of GPT-2 by revealing functional features and computational circuits. Discover breakthrough methods that shift from observing raw network activations to mapping the model&apos;s actual c</itunes:subtitle><itunes:summary>Explore how sparse autoencoders and transcoders unveil the inner workings of GPT-2 by revealing functional features and computational circuits. Discover breakthrough methods that shift from observing raw network activations to mapping the model&apos;s actual computation, making AI behavior more interpretable than ever.</itunes:summary><itunes:explicit>false</itunes:explicit><itunes:duration>00:11:27</itunes:duration><itunes:image href="https://auth.jellypod.ai/storage/v1/object/public/CoverImages/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/resized_8BC983FC-F55E-4B77-BD8C-CB01F72FDBEB.jpg"/><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[Decoding Attention and Emergence in AI]]></title><description><![CDATA[Explore how attention heads uncover patterns through learned queries and keys, revealing emergent behaviors shaped by optimization. 
Dive into parallels with natural selection and psycholinguistics to understand how meaning arises not by design but through experience in both machines and brains.]]></description><link>https://arshavir.jellypod.com/episodes/6d50e174-a0b4-4cc5-acb1-b728fcdb7268</link><guid isPermaLink="false">6d50e174-a0b4-4cc5-acb1-b728fcdb7268</guid><pubDate>Wed, 14 Jan 2026 23:47:15 GMT</pubDate><enclosure url="https://op3.dev/e,pg=9330ccd8-4e44-44e7-bbb5-346de7829edc/auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/6d50e174-a0b4-4cc5-acb1-b728fcdb7268/audio.mp3" length="0" type="audio/mpeg"/><podcast:generator uri="https://www.jellypod.com"></podcast:generator><podcast:episode>13</podcast:episode><podcast:transcript url="https://auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/6d50e174-a0b4-4cc5-acb1-b728fcdb7268/captions_1768434383.srt" type="application/x-subrip" language="en" rel="captions"></podcast:transcript><itunes:author>Jellypod</itunes:author><itunes:subtitle>Explore how attention heads uncover patterns through learned queries and keys, revealing emergent behaviors shaped by optimization. Dive into parallels with natural selection and psycholinguistics to understand how meaning arises not by design but through</itunes:subtitle><itunes:summary>Explore how attention heads uncover patterns through learned queries and keys, revealing emergent behaviors shaped by optimization. Dive into parallels with natural selection and psycholinguistics to understand how meaning arises not by design but through experience in both machines and brains.</itunes:summary><itunes:explicit>false</itunes:explicit><itunes:duration>00:06:06</itunes:duration><itunes:image href="https://auth.jellypod.ai/storage/v1/object/public/CoverImages/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/cover-art-1768435567918.jpeg"/><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[When Knowledge Battles Noise in GPT Models]]></title><description><![CDATA[Explore how GPT-2 balances fleeting factual recall with generic responses through internal competition among candidate answers. Discover parallels with human cognition and how larger models navigate indirect recall to reveal hidden knowledge beneath suppression.]]></description><link>https://arshavir.jellypod.com/episodes/8fe6e7de-512e-48af-8b16-38844953fd32</link><guid isPermaLink="false">8fe6e7de-512e-48af-8b16-38844953fd32</guid><pubDate>Wed, 07 Jan 2026 00:13:17 GMT</pubDate><enclosure url="https://op3.dev/e,pg=9330ccd8-4e44-44e7-bbb5-346de7829edc/auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/8fe6e7de-512e-48af-8b16-38844953fd32/audio.mp3" length="0" type="audio/mpeg"/><podcast:generator uri="https://www.jellypod.com"></podcast:generator><podcast:episode>12</podcast:episode><podcast:transcript url="https://auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/8fe6e7de-512e-48af-8b16-38844953fd32/captions_1767744759.srt" type="application/x-subrip" language="en" rel="captions"></podcast:transcript><itunes:author>Jellypod</itunes:author><itunes:subtitle>Explore how GPT-2 balances fleeting factual recall with generic responses through internal competition among candidate answers. 
Discover parallels with human cognition and how larger models navigate indirect recall to reveal hidden knowledge beneath suppr</itunes:subtitle><itunes:summary>Explore how GPT-2 balances fleeting factual recall with generic responses through internal competition among candidate answers. Discover parallels with human cognition and how larger models navigate indirect recall to reveal hidden knowledge beneath suppression.</itunes:summary><itunes:explicit>false</itunes:explicit><itunes:duration>00:07:24</itunes:duration><itunes:image href="https://auth.jellypod.ai/storage/v1/object/public/CoverImages/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/resized_40462D08-F6FF-4189-B987-893F62BFD294.jpg"/><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[Inside Circuits: How Large Language Models Understand]]></title><description><![CDATA[Dive into the world of neural circuits within large language models. In this episode, Arshavir Blackwell unpacks how transformer circuits, attention mechanisms, and high-dimensional geometry combine to create the magic—and limits—of modern AI language systems.]]></description><link>https://arshavir.jellypod.com/episodes/61919a54-cdec-4328-bcd8-fc69616baae0</link><guid isPermaLink="false">61919a54-cdec-4328-bcd8-fc69616baae0</guid><pubDate>Thu, 01 Jan 2026 23:13:33 GMT</pubDate><enclosure url="https://op3.dev/e,pg=9330ccd8-4e44-44e7-bbb5-346de7829edc/auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/61919a54-cdec-4328-bcd8-fc69616baae0/audio.mp3" length="0" type="audio/mpeg"/><podcast:generator uri="https://www.jellypod.com"></podcast:generator><podcast:episode>11</podcast:episode><podcast:transcript url="https://auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/61919a54-cdec-4328-bcd8-fc69616baae0/captions_1767309159.srt" type="application/x-subrip" language="en" rel="captions"></podcast:transcript><itunes:author>Jellypod</itunes:author><itunes:subtitle>Dive into the world of neural circuits within large language models. In this episode, Arshavir Blackwell unpacks how transformer circuits, attention mechanisms, and high-dimensional geometry combine to create the magic—and limits—of modern AI language sys</itunes:subtitle><itunes:summary>Dive into the world of neural circuits within large language models. In this episode, Arshavir Blackwell unpacks how transformer circuits, attention mechanisms, and high-dimensional geometry combine to create the magic—and limits—of modern AI language systems.</itunes:summary><itunes:explicit>false</itunes:explicit><itunes:duration>00:07:41</itunes:duration><itunes:image href="https://auth.jellypod.ai/storage/v1/object/public/CoverImages/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/resized_image.jpg"/><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[Hallucinations, Interpretability, and the Seahorse Mirage]]></title><description><![CDATA[This episode dives into why advanced language models still generate hallucinations, how interpretability tools help us uncover their hidden workings, and what the seahorse emoji teaches us about model and human reasoning. 
Arshavir connects groundbreaking research, practical business importance, and the statistical quirks that shape AI's version of 'truth.']]></description><link>https://arshavir.jellypod.com/episodes/7cbe7f65-4454-4d61-9a92-19a5b356dd88</link><guid isPermaLink="false">7cbe7f65-4454-4d61-9a92-19a5b356dd88</guid><pubDate>Mon, 29 Dec 2025 20:38:45 GMT</pubDate><enclosure url="https://op3.dev/e,pg=9330ccd8-4e44-44e7-bbb5-346de7829edc/auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/7cbe7f65-4454-4d61-9a92-19a5b356dd88/audio.mp3" length="0" type="audio/mpeg"/><podcast:generator uri="https://www.jellypod.com"></podcast:generator><podcast:episode>10</podcast:episode><podcast:transcript url="https://auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/7cbe7f65-4454-4d61-9a92-19a5b356dd88/captions_1767040684.srt" type="application/x-subrip" language="en" rel="captions"></podcast:transcript><itunes:author>Jellypod</itunes:author><itunes:subtitle>This episode dives into why advanced language models still generate hallucinations, how interpretability tools help us uncover their hidden workings, and what the seahorse emoji teaches us about model and human reasoning. Arshavir connects groundbreaking </itunes:subtitle><itunes:summary>This episode dives into why advanced language models still generate hallucinations, how interpretability tools help us uncover their hidden workings, and what the seahorse emoji teaches us about model and human reasoning. Arshavir connects groundbreaking research, practical business importance, and the statistical quirks that shape AI&apos;s version of &apos;truth.&apos;</itunes:summary><itunes:explicit>false</itunes:explicit><itunes:duration>00:09:37</itunes:duration><itunes:image href="https://auth.jellypod.ai/storage/v1/object/public/CoverImages/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/resized_pic.jpg"/><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[How Transformers Stack Meaning Like Finnish Words]]></title><description><![CDATA[Explore how large language models build up meaning in ways strikingly similar to the layered grammar of Finnish. Arshavir Blackwell reveals why understanding Finnish morphology offers a powerful analogy for interpreting the compositional logic inside modern AI systems.]]></description><link>https://arshavir.jellypod.com/episodes/1a01316f-523f-411f-9ffd-35ddcee4f868</link><guid isPermaLink="false">1a01316f-523f-411f-9ffd-35ddcee4f868</guid><pubDate>Fri, 19 Dec 2025 23:10:46 GMT</pubDate><enclosure url="https://op3.dev/e,pg=9330ccd8-4e44-44e7-bbb5-346de7829edc/auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/1a01316f-523f-411f-9ffd-35ddcee4f868/audio.mp3" length="0" type="audio/mpeg"/><podcast:generator uri="https://www.jellypod.com"></podcast:generator><podcast:episode>9</podcast:episode><podcast:transcript url="https://auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/1a01316f-523f-411f-9ffd-35ddcee4f868/captions_1766185490.srt" type="application/x-subrip" language="en" rel="captions"></podcast:transcript><itunes:author>Jellypod</itunes:author><itunes:subtitle>Explore how large language models build up meaning in ways strikingly similar to the layered grammar of Finnish. 
Arshavir Blackwell reveals why understanding Finnish morphology offers a powerful analogy for interpreting the compositional logic inside mode</itunes:subtitle><itunes:summary>Explore how large language models build up meaning in ways strikingly similar to the layered grammar of Finnish. Arshavir Blackwell reveals why understanding Finnish morphology offers a powerful analogy for interpreting the compositional logic inside modern AI systems.</itunes:summary><itunes:explicit>false</itunes:explicit><itunes:duration>00:13:26</itunes:duration><itunes:image href="https://auth.jellypod.ai/storage/v1/object/public/CoverImages/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/resized_fin.jpg"/><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[The Mandela Effect in AI: Why Language Models Misremember]]></title><description><![CDATA[Dive into how and why large language models like ChatGPT mirror the human Mandela Effect, reproducing our collective false memories and misquotations. Arshavir Blackwell examines the science behind errors in models and minds, and explores how new techniques can counteract these uncanny AI confabulations.]]></description><link>https://arshavir.jellypod.com/episodes/7b1e371e-3e28-4ffe-bec5-d40ef1653eed</link><guid isPermaLink="false">7b1e371e-3e28-4ffe-bec5-d40ef1653eed</guid><pubDate>Sun, 14 Dec 2025 00:09:07 GMT</pubDate><enclosure url="https://op3.dev/e,pg=9330ccd8-4e44-44e7-bbb5-346de7829edc/auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/7b1e371e-3e28-4ffe-bec5-d40ef1653eed/audio.mp3" length="0" type="audio/mpeg"/><podcast:generator uri="https://www.jellypod.com"></podcast:generator><podcast:episode>8</podcast:episode><podcast:transcript url="https://auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/7b1e371e-3e28-4ffe-bec5-d40ef1653eed/captions_1765670896.srt" type="application/x-subrip" language="en" rel="captions"></podcast:transcript><itunes:author>Jellypod</itunes:author><itunes:subtitle>Dive into how and why large language models like ChatGPT mirror the human Mandela Effect, reproducing our collective false memories and misquotations. Arshavir Blackwell examines the science behind errors in models and minds, and explores how new techniqu</itunes:subtitle><itunes:summary>Dive into how and why large language models like ChatGPT mirror the human Mandela Effect, reproducing our collective false memories and misquotations. Arshavir Blackwell examines the science behind errors in models and minds, and explores how new techniques can counteract these uncanny AI confabulations.</itunes:summary><itunes:explicit>false</itunes:explicit><itunes:duration>00:11:24</itunes:duration><itunes:image href="https://auth.jellypod.ai/storage/v1/object/public/CoverImages/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/resized_Screenshot-2025-12-13-at-16-22-14.jpg"/><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[Bridging Circuits and Concepts in Large Language Models]]></title><description><![CDATA[How do millions of computations inside large language models add up to something like understanding? This episode explores the latest breakthroughs in mechanistic interpretability, showing how tools like representational geometry, circuit decomposition, and compression theory illuminate the missing middle between circuits and meaning. 
Join Arshavir Blackwell as he opens the black box and challenges what we really mean by 'understanding' in machines.]]></description><link>https://arshavir.jellypod.com/episodes/93ac77dc-6732-4bf1-890b-faaa93c4749e</link><guid isPermaLink="false">93ac77dc-6732-4bf1-890b-faaa93c4749e</guid><pubDate>Fri, 05 Dec 2025 19:17:25 GMT</pubDate><enclosure url="https://op3.dev/e,pg=9330ccd8-4e44-44e7-bbb5-346de7829edc/auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/93ac77dc-6732-4bf1-890b-faaa93c4749e/audio.mp3" length="0" type="audio/mpeg"/><podcast:generator uri="https://www.jellypod.com"></podcast:generator><podcast:episode>7</podcast:episode><podcast:transcript url="https://auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/93ac77dc-6732-4bf1-890b-faaa93c4749e/captions_1764962195.srt" type="application/x-subrip" language="en" rel="captions"></podcast:transcript><itunes:author>Jellypod</itunes:author><itunes:subtitle>How do millions of computations inside large language models add up to something like understanding? This episode explores the latest breakthroughs in mechanistic interpretability, showing how tools like representational geometry, circuit decomposition, a</itunes:subtitle><itunes:summary>How do millions of computations inside large language models add up to something like understanding? This episode explores the latest breakthroughs in mechanistic interpretability, showing how tools like representational geometry, circuit decomposition, and compression theory illuminate the missing middle between circuits and meaning. Join Arshavir Blackwell as he opens the black box and challenges what we really mean by &apos;understanding&apos; in machines.</itunes:summary><itunes:explicit>false</itunes:explicit><itunes:duration>00:14:51</itunes:duration><itunes:image href="https://auth.jellypod.ai/storage/v1/object/public/CoverImages/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/resized_image-2.jpg"/><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[How Transformers Turn Words Into Meaning]]></title><description><![CDATA[Embark on a step-by-step journey through the inner workings of transformer models like those powering ChatGPT. 
Arshavir Blackwell breaks down how context, attention, and high-dimensional geometry turn isolated tokens into fluent, meaningful language—revealing the mathematics of understanding inside the black box.]]></description><link>https://arshavir.jellypod.com/episodes/91e214c9-25a3-4eda-832f-703738a7c893</link><guid isPermaLink="false">91e214c9-25a3-4eda-832f-703738a7c893</guid><pubDate>Fri, 28 Nov 2025 17:37:33 GMT</pubDate><enclosure url="https://op3.dev/e,pg=9330ccd8-4e44-44e7-bbb5-346de7829edc/auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/91e214c9-25a3-4eda-832f-703738a7c893/audio.mp3" length="0" type="audio/mpeg"/><podcast:generator uri="https://www.jellypod.com"></podcast:generator><podcast:episode>6</podcast:episode><podcast:transcript url="https://auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/91e214c9-25a3-4eda-832f-703738a7c893/captions_1764351423.srt" type="application/x-subrip" language="en" rel="captions"></podcast:transcript><itunes:author>Jellypod</itunes:author><itunes:subtitle>Embark on a step-by-step journey through the inner workings of transformer models like those powering ChatGPT. Arshavir Blackwell breaks down how context, attention, and high-dimensional geometry turn isolated tokens into fluent, meaningful language—revea</itunes:subtitle><itunes:summary>Embark on a step-by-step journey through the inner workings of transformer models like those powering ChatGPT. Arshavir Blackwell breaks down how context, attention, and high-dimensional geometry turn isolated tokens into fluent, meaningful language—revealing the mathematics of understanding inside the black box.</itunes:summary><itunes:explicit>false</itunes:explicit><itunes:duration>00:06:54</itunes:duration><itunes:image href="https://auth.jellypod.ai/storage/v1/object/public/CoverImages/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/resized_image.jpg"/><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[Can Smaller Language Models Be Smarter?]]></title><description><![CDATA[Today we explore whether mechanistic interpretability could hold the key to building leaner, more transparent—and perhaps even smarter—large language models. From knowledge distillation and pruning to low-rank adaptation, we examine cutting-edge strategies to make AI models both smaller and more explainable. 
Join Arshavir as he breaks down the surprising challenges of making models efficient without sacrificing understanding.]]></description><link>https://arshavir.jellypod.com/episodes/876011f4-cea2-44ed-a573-db4db24bcacd</link><guid isPermaLink="false">876011f4-cea2-44ed-a573-db4db24bcacd</guid><pubDate>Wed, 19 Nov 2025 03:53:28 GMT</pubDate><enclosure url="https://op3.dev/e,pg=9330ccd8-4e44-44e7-bbb5-346de7829edc/auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/876011f4-cea2-44ed-a573-db4db24bcacd/audio.mp3" length="0" type="audio/mpeg"/><podcast:generator uri="https://www.jellypod.com"></podcast:generator><podcast:episode>5</podcast:episode><podcast:transcript url="https://auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/876011f4-cea2-44ed-a573-db4db24bcacd/captions_1763524359.srt" type="application/x-subrip" language="en" rel="captions"></podcast:transcript><itunes:author>Jellypod</itunes:author><itunes:subtitle>Today we explore whether mechanistic interpretability could hold the key to building leaner, more transparent—and perhaps even smarter—large language models. From knowledge distillation and pruning to low-rank adaptation, we examine cutting-edge strategie</itunes:subtitle><itunes:summary>Today we explore whether mechanistic interpretability could hold the key to building leaner, more transparent—and perhaps even smarter—large language models. From knowledge distillation and pruning to low-rank adaptation, we examine cutting-edge strategies to make AI models both smaller and more explainable. Join Arshavir as he breaks down the surprising challenges of making models efficient without sacrificing understanding.</itunes:summary><itunes:explicit>false</itunes:explicit><itunes:duration>00:06:11</itunes:duration><itunes:image href="https://auth.jellypod.ai/storage/v1/object/public/CoverImages/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/resized_one.jpg"/><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[The Weird Geometry That Makes AI Think]]></title><description><![CDATA[Explore how large language models use high-dimensional geometry to produce intelligent behavior. We peer into the mathematical wilderness inside transformers, revealing how intuition fails, and meaning emerges.]]></description><link>https://arshavir.jellypod.com/episodes/c7771dee-a476-4e8c-8018-1d6fffc6e18a</link><guid isPermaLink="false">c7771dee-a476-4e8c-8018-1d6fffc6e18a</guid><pubDate>Thu, 13 Nov 2025 09:30:01 GMT</pubDate><enclosure url="https://op3.dev/e,pg=9330ccd8-4e44-44e7-bbb5-346de7829edc/auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/audio.mp3" length="0" type="audio/mpeg"/><podcast:generator uri="https://www.jellypod.com"></podcast:generator><podcast:episode>4</podcast:episode><podcast:transcript url="https://auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/c7771dee-a476-4e8c-8018-1d6fffc6e18a/captions_1762892995.srt" type="application/x-subrip" language="en" rel="captions"></podcast:transcript><itunes:author>Jellypod</itunes:author><itunes:subtitle>Explore how large language models use high-dimensional geometry to produce intelligent behavior. 
We peer into the mathematical wilderness inside transformers, revealing how intuition fails and meaning emerges.</itunes:subtitle><itunes:summary>Explore how large language models use high-dimensional geometry to produce intelligent behavior. We peer into the mathematical wilderness inside transformers, revealing how intuition fails and meaning emerges.</itunes:summary><itunes:explicit>false</itunes:explicit><itunes:image href="https://auth.jellypod.ai/storage/v1/object/public/CoverImages/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/image_1762837557524.jpg"/><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[Can We Fix It?]]></title><description><![CDATA[Arshavir Blackwell takes you on a journey inside the black box of large language models, showing how cutting-edge methods help researchers identify, understand, and even fix the inner quirks of AI. Through concrete case studies, he demonstrates how interpretability is evolving from an arcane art to a collaborative science—while revealing the daunting puzzles that remain. This episode unpacks the step-by-step workflow and surprising realities of mechanistically mapping model cognition.]]></description><link>https://arshavir.jellypod.com/episodes/f5282a9f-749f-40d1-892c-96f9dfc05865</link><guid isPermaLink="false">f5282a9f-749f-40d1-892c-96f9dfc05865</guid><pubDate>Wed, 05 Nov 2025 22:06:00 GMT</pubDate><enclosure url="https://op3.dev/e,pg=9330ccd8-4e44-44e7-bbb5-346de7829edc/auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/f5282a9f-749f-40d1-892c-96f9dfc05865/fa1a1fe8.mp3?" length="0" type="audio/mpeg"/><podcast:generator uri="https://www.jellypod.com"></podcast:generator><podcast:episode>3</podcast:episode><podcast:transcript url="https://auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/f5282a9f-749f-40d1-892c-96f9dfc05865/captions_1762207784.srt" type="application/x-subrip" language="en" rel="captions"></podcast:transcript><itunes:author>Jellypod</itunes:author><itunes:subtitle>Arshavir Blackwell takes you on a journey inside the black box of large language models, showing how cutting-edge methods help researchers identify, understand, and even fix the inner quirks of AI. Through concrete case studies, he demonstrates how interp</itunes:subtitle><itunes:summary>Arshavir Blackwell takes you on a journey inside the black box of large language models, showing how cutting-edge methods help researchers identify, understand, and even fix the inner quirks of AI. Through concrete case studies, he demonstrates how interpretability is evolving from an arcane art to a collaborative science—while revealing the daunting puzzles that remain. This episode unpacks the step-by-step workflow and surprising realities of mechanistically mapping model cognition.</itunes:summary><itunes:explicit>false</itunes:explicit><itunes:image href="https://auth.jellypod.ai/storage/v1/object/public/CoverImages/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/image_1762837691925.jpg"/><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[Using Symbolic AI to Explain LLMs]]></title><description><![CDATA[Delve into the mysterious world of neural circuits within large language models. 
We’ll dismantle the jargon, connect these abstract ideas to real examples, and discuss how circuits help bridge the gap between machine learning and human cognition.]]></description><link>https://arshavir.jellypod.com/episodes/c58493e3-e202-4665-b005-656415c3a79e</link><guid isPermaLink="false">c58493e3-e202-4665-b005-656415c3a79e</guid><pubDate>Sun, 02 Nov 2025 22:02:58 GMT</pubDate><enclosure url="https://op3.dev/e,pg=9330ccd8-4e44-44e7-bbb5-346de7829edc/auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/c58493e3-e202-4665-b005-656415c3a79e/8ccbf42e.mp3?" length="0" type="audio/mpeg"/><podcast:generator uri="https://www.jellypod.com"></podcast:generator><podcast:episode>2</podcast:episode><podcast:transcript url="https://auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/c58493e3-e202-4665-b005-656415c3a79e/captions_1762120949.srt" type="application/x-subrip" language="en" rel="captions"></podcast:transcript><itunes:author>Jellypod</itunes:author><itunes:subtitle>Delve into the mysterious world of neural circuits within large language models. We’ll dismantle the jargon, connect these abstract ideas to real examples, and discuss how circuits help bridge the gap between machine learning and human cognition.</itunes:subtitle><itunes:summary>Delve into the mysterious world of neural circuits within large language models. We’ll dismantle the jargon, connect these abstract ideas to real examples, and discuss how circuits help bridge the gap between machine learning and human cognition.</itunes:summary><itunes:explicit>false</itunes:explicit><itunes:image href="https://auth.jellypod.ai/storage/v1/object/public/CoverImages/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/image_1762837782526.jpg"/><itunes:episodeType>full</itunes:episodeType></item><item><title><![CDATA[Peering Inside the Black Box]]></title><description><![CDATA[Mechanistic interpretability and artificial psycholinguistics are transforming our understanding of large language models. In this episode, Arshavir Blackwell explores how probing neural circuits, behavioral tests, and new tools are unraveling the mysteries of AI reasoning.]]></description><link>https://arshavir.jellypod.com/episodes/caa3a9a7-a2fb-4992-9c13-55ab87827c27</link><guid isPermaLink="false">caa3a9a7-a2fb-4992-9c13-55ab87827c27</guid><pubDate>Sun, 02 Nov 2025 17:15:23 GMT</pubDate><enclosure url="https://op3.dev/e,pg=9330ccd8-4e44-44e7-bbb5-346de7829edc/auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/caa3a9a7-a2fb-4992-9c13-55ab87827c27/35f935c2.mp3?" length="0" type="audio/mpeg"/><podcast:generator uri="https://www.jellypod.com"></podcast:generator><podcast:episode>1</podcast:episode><podcast:transcript url="https://auth.jellypod.ai/storage/v1/object/public/Podcasts/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/caa3a9a7-a2fb-4992-9c13-55ab87827c27/captions_1762103676.srt" type="application/x-subrip" language="en" rel="captions"></podcast:transcript><itunes:author>Jellypod</itunes:author><itunes:subtitle>Mechanistic interpretability and artificial psycholinguistics are transforming our understanding of large language models. 
In this episode, Arshavir Blackwell explores how probing neural circuits, behavioral tests, and new tools are unraveling the mysteri</itunes:subtitle><itunes:summary>Mechanistic interpretability and artificial psycholinguistics are transforming our understanding of large language models. In this episode, Arshavir Blackwell explores how probing neural circuits, behavioral tests, and new tools are unraveling the mysteries of AI reasoning.</itunes:summary><itunes:explicit>false</itunes:explicit><itunes:image href="https://auth.jellypod.ai/storage/v1/object/public/CoverImages/org_01K8XHYJK76KCD363NADM5CKC2/users/user_01K8XHWVRMZ2M2PFTJPW4582K9/image_1762837846087.jpg"/><itunes:episodeType>full</itunes:episodeType></item></channel></rss>