When Fluent Answers Start Sounding True
This episode explores why smooth, coherent language can feel more credible than it is, and how processing fluency, familiarity, and authority cues shape what we believe. It also digs into why conversational AI is especially persuasive, from polished explanations to confident-sounding confabulations.
Chapter 1
Imported Transcript
Arshavir Blackwell, PhD
I'm Arshavir Blackwell and this is Inside the Black Box. This is the third piece in the series. The first one documented what fluency-as-validity looks like across a spectrum from naive to sophisticated. The second explained why it works: a reinforcing loop of cognitive heuristics, each pre-conscious, each exploited by features that LLMs produce as a byproduct of their training objective. Processing fluency makes the output feel true. Coherence makes it feel like a real argument. Affect makes it feel good. Source credibility makes you stop checking. And your own confabulation makes you believe you checked anyway.
Arshavir Blackwell, PhD
This piece asks the obvious next question: now what?
Arshavir Blackwell, PhD
The intuitive answer is wrong, so let's get it out of the way.
Arshavir Blackwell, PhD
"Be more critical." "Fact-check the output." "Don't just accept what the model says." This is the advice everyone gives. It sounds reasonable. It misses the entire point of the previous piece.
Arshavir Blackwell, PhD
The heuristics that make fluent text feel true operate pre-consciously. They fire before the decision to check arrives. Telling someone to be more careful about processing fluency is like telling them not to see an optical illusion. You can know about the Müller-Lyer illusion, the one where two lines of equal length look different because of the arrows at their ends, and still see it. Knowing doesn't fix seeing. The perceptual system doesn't take instructions from the conscious mind.
Arshavir Blackwell, PhD
The same applies here. You can read the previous piece, understand every mechanism in it, agree with every point, and still feel the fluency pull the next time you interact with a model. The understanding is real. The immunity is not. Alter and Oppenheimer's finding isn't a fact you can learn and then be done with. It's a description of machinery you can't turn off.
Arshavir Blackwell, PhD
This matters because it changes what a useful response looks like. If the problem were ignorance, if people just didn't know about these biases, then education would fix it. Read about processing fluency, problem solved. But the problem isn't ignorance. The problem is architecture. The heuristics are features of the cognitive system, not bugs in the user. You can't patch them with information.
Arshavir Blackwell, PhD
So the question isn't how to think better. It's how to build processes that don't require you to.
Arshavir Blackwell, PhD
If fluency deactivates checking, the most direct intervention is to remove the fluency. This sounds too simple to work, but the research supports it. Alter, Oppenheimer, Epley, and Eyre showed that disfluent presentation (harder-to-read fonts, awkward formatting, stripped-down prose) activates System 2, Kahneman's term for the slow, effortful, analytical mode of cognition. The checking machinery turns on when the input is hard to process. It turns off when the input is easy. The switch isn't under conscious control. But the input is.
Arshavir Blackwell, PhD
Practical version: take the model's output and rewrite it in your own words. Not a polished rewrite. A rough, ugly, first-draft rewrite. Strip the formatting. Remove the headers. Kill the bullet points. What you're doing isn't improving the text. You're degrading it, deliberately, so that the disfluency reactivates the checking the fluency suppressed.
Arshavir Blackwell, PhD
If the claim survives the ugly version, it's probably real. If it only felt convincing when it was well-formatted and smoothly phrased, what you were responding to was the formatting, not the content. You're not becoming a better thinker. You're removing the signal that was making you a worse one.
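The mechanical half of that step, stripping the formatting, can be sketched in a few lines of Python. This is a minimal illustration, assuming the model's output is Markdown-style text; the function name `strip_formatting` is hypothetical, and the rough rewrite in your own words still has to be done by hand.

```python
import re

def strip_formatting(text: str) -> str:
    """Flatten Markdown-style model output into one plain run of text."""
    kept = []
    for line in text.splitlines():
        # Drop header markers such as "## ".
        line = re.sub(r"^\s*#{1,6}\s+", "", line)
        # Drop bullet markers ("- ", "* ", "+ ") and list numbers ("1. ", "2) ").
        line = re.sub(r"^\s*(?:[-*+]|\d+[.)])\s+", "", line)
        # Drop bold, italic, and inline-code marks.
        line = re.sub(r"\*\*|__|\*|`", "", line)
        line = line.strip()
        if line:  # skip blank lines so the result is one unbroken block
            kept.append(line)
    return " ".join(kept)
```

The result is deliberately worse to read than the original. That is the point: it is a starting place for the ugly rewrite, not a replacement for it.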
Arshavir Blackwell, PhD
The second piece described Rozenblit and Keil's illusion of explanatory depth: people think they understand complex systems much better than they actually do. The illusion collapses when you try to produce the explanation yourself. Before LLMs, that collapse was the natural corrective. You'd try to explain something, discover you couldn't, and update your self-assessment. LLMs removed the collapse by providing the explanation on demand. The feeling of understanding never gets tested.
Arshavir Blackwell, PhD
Put the test back.
Arshavir Blackwell, PhD
After reading the model's explanation of something, close it. Wait. An hour, a day, ideally longer. Then try to explain it to someone else without looking it up. Or try to use it to solve a problem the model didn't address. If the understanding transfers, if you can apply it in a new context, something real was learned. If it doesn't, if you find yourself reaching for the model's phrasing and coming up empty, what you had was a fluency memory, not comprehension.
Arshavir Blackwell, PhD
The gap between "I could restate what the model said" and "I could use what the model said" is the gap between cached fluency and actual understanding. The generation test finds it.
Arshavir Blackwell, PhD
Nobody will do this every time. That's fine. The point isn't to do it every time. The point is to do it for anything that matters: any claim you're about to act on, any explanation you're about to teach to someone else, any conclusion you're about to build further reasoning on. The high-stakes cases are the ones where the illusion does the most damage.
Arshavir Blackwell, PhD
Here's a strange move. Ask the model to argue against its own answer. You're using the same system that produced the fluent, convincing, heuristic-triggering output to produce a fluent, convincing, heuristic-triggering counterargument. The processing fluency problem isn't solved. Both arguments will feel true. But something important has changed: the coherence signal is broken.
Arshavir Blackwell, PhD
Coherence bias works because the narrative is unopposed. A single, internally consistent story fills the space. There's nothing for the checking mechanism to catch because there's no mismatch, no friction, no competing account. The brain's prediction machinery matches each sentence against the narrative so far, finds consistency, and concludes everything is fine.
Arshavir Blackwell, PhD
Give yourself a second narrative and the heuristic has to arbitrate rather than just accept. You now have two fluent, coherent, plausible accounts that can't both be right. The comfortable certainty of the first answer is disrupted. Not because you've become more skeptical, but because the environment now contains contradiction, and contradiction is exactly the kind of input that reactivates checking.
Arshavir Blackwell, PhD
Ask the model: what's the strongest argument against what you just said? What are you most uncertain about? What would someone who disagreed with you say, and why might they be right? The answers will also be fluent. But they'll be fluent in the opposite direction, and that's enough to prevent the first answer from settling into belief unchallenged.
Arshavir Blackwell, PhD
The limitation is real. You're still inside the fluency loop. Both sides sound good. But at least you're no longer getting a single coherent story and mistaking the absence of opposition for consensus.
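For anyone who wants this as a habit rather than an intention, the three questions can be kept as a reusable set of follow-up prompts. The sketch below is only an illustration: the names `FOLLOW_UPS` and `build_followups` are hypothetical, and it builds the prompt text without calling any model API.

```python
# The three adversarial questions, quoted from the discussion above.
FOLLOW_UPS = [
    "What's the strongest argument against what you just said?",
    "What are you most uncertain about?",
    "What would someone who disagreed with you say, and why might they be right?",
]

def build_followups(answer: str) -> list[str]:
    """Pair the model's earlier answer with each adversarial question."""
    quoted = answer.strip()
    return [f'Earlier you said: "{quoted}"\n\n{question}' for question in FOLLOW_UPS]
```

Each returned string can be pasted back into the conversation as the next turn.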
Arshavir Blackwell, PhD
The illusory truth effect (Hasher, Goldstein, and Toppino's 1977 finding that repetition makes statements feel true) works because the heuristic doesn't track where you heard something. It tracks processing ease. A claim that's been repeated feels familiar, and familiarity feels like truth. The heuristic doesn't tag the memory with "you heard this from Claude" or "you read this in a textbook" or "a colleague told you." It just registers: easy to process, therefore probably true.
Arshavir Blackwell, PhD
Deliberately tracking provenance is a manual override for a heuristic that doesn't have one built in. When you notice yourself treating something as established knowledge, ask where you first learned it. If the answer is "a model told me and I haven't verified it anywhere else," that's not a reason to disbelieve it. Models get things right all the time. It's a reason to know the claim hasn't been tested yet, that what you're treating as knowledge is really a single-source report from a system whose training objective is fluency, not accuracy.
Arshavir Blackwell, PhD
This is harder than it sounds, because the illusory truth effect actively works against it. After you've seen the same claim restated across three multi-turn conversations, it feels like something you've always known. The feeling of long familiarity is real. The long familiarity is not. You heard it last Tuesday, from the same system, three times.
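One way to make provenance tracking less dependent on memory is to write it down. The sketch below shows a tiny claim log, assuming you record where you first met a claim and which independent sources you checked afterward. The `Claim` class, its field names, and `needs_checking` are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Claim:
    text: str
    first_source: str   # where you first encountered the claim
    first_seen: date    # when you first encountered it
    from_model: bool    # True if the first source was a language model
    # Independent sources you checked afterward; repeats from the same model don't count.
    confirmations: list[str] = field(default_factory=list)

    def untested(self) -> bool:
        """A model-sourced claim with no independent confirmation yet."""
        return self.from_model and not self.confirmations

def needs_checking(log: list[Claim]) -> list[Claim]:
    """Return the claims that are still single-source reports from a model."""
    return [claim for claim in log if claim.untested()]
```

Flagging a claim as untested doesn't mean it's wrong. It means it hasn't been checked, which is exactly the distinction the heuristic erases.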
Arshavir Blackwell, PhD
If individual cognition can't solve this reliably, and the previous piece argued it can't, then the response has to be structural. Not better thinkers, but better workflows.
Arshavir Blackwell, PhD
The insight from mechanistic interpretability applies here by analogy. Interpretability works not because it makes researchers smarter but because it replaces the fluency signal with a different kind of evidence, one the heuristics can't exploit. The same principle applies at the organizational level: design processes that don't depend on any individual successfully overriding their own pre-conscious heuristics.
Arshavir Blackwell, PhD
Some of this is obvious in hindsight. Require human-written summaries before acting on model output. This forces the generation test. You can't summarize something in your own words without discovering whether you actually understand it.
Arshavir Blackwell, PhD
Separate the person who generates the model output from the person who evaluates it. This breaks the affect heuristic, because the evaluator doesn't have the positive experience of the interaction that produced the output.
Arshavir Blackwell, PhD
Red-team model recommendations before implementing them. This institutionalizes the adversarial prompting approach.
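Those three rules can be written as a simple gate that a recommendation has to pass before anyone acts on it. This is a rough sketch under assumed names: the `Recommendation` record and the `unmet_checks` function are hypothetical, and a real workflow would live in whatever review tooling a team already uses.

```python
from dataclasses import dataclass

@dataclass
class Recommendation:
    generated_by: str    # person who prompted the model
    evaluated_by: str    # person who reviewed the output ("" if nobody has yet)
    human_summary: str   # summary written by a person, not pasted from the model
    red_teamed: bool     # whether someone has argued against the recommendation

def unmet_checks(rec: Recommendation) -> list[str]:
    """List which of the three rules the recommendation has not yet satisfied."""
    problems = []
    if not rec.human_summary.strip():
        problems.append("missing human-written summary")
    if not rec.evaluated_by or rec.evaluated_by == rec.generated_by:
        problems.append("evaluator must differ from generator")
    if not rec.red_teamed:
        problems.append("not red-teamed")
    return problems
```

An empty list means the process was followed, not that the recommendation is right. The gate checks that friction was applied; it can't check the thinking itself.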
Arshavir Blackwell, PhD
The deeper version: treat model output as raw material, not finished product. Make that the cultural norm, not the exception. When a colleague presents a recommendation, the first question should be "what did you add to what the model gave you?" not "what did the model say?" The former assumes the human did work. The latter assumes the model did the thinking.
Arshavir Blackwell, PhD
This is the same move the second piece makes about mechanistic interpretability, applied to organizations. The answer isn't individual vigilance. It's structural design that routes around the cognitive vulnerability rather than relying on people to overcome it through willpower.
Arshavir Blackwell, PhD
None of this fixes the pre-conscious heuristics. You'll still feel the fluency pull. The disfluency tool helps but requires discipline nobody naturally has. The generation test works but nobody does it routinely. Adversarial prompting helps but both sides are still fluent. Provenance tracking helps but it's effortful. Institutional design helps but institutions resist friction.
Arshavir Blackwell, PhD
This piece isn't offering a cure. It's offering friction. Deliberate, targeted friction that reintroduces the checking the model's fluency suppressed. Every intervention on this list works by the same mechanism: it takes something the heuristics made feel easy and makes it feel hard again. Not because hard is better, but because hard is what turns the checking back on.
Arshavir Blackwell, PhD
The best defense isn't a better brain. It's a better process.
Arshavir Blackwell, PhD
The arc of this series is simple. The first piece showed what's happening. The second showed why it works. This piece shows what you can do about it. Not by becoming immune to the heuristics, but by designing around them.
Arshavir Blackwell, PhD
The heuristics won. You can't beat them. They're faster than you, older than you, and they operate below the threshold where your conscious mind could intervene even if it wanted to. But you can build systems and habits that don't depend on beating them. You can introduce friction where the model removed it. You can force yourself to produce where the model let you receive. You can track provenance where the heuristic erases it. You can design workflows that assume the vulnerability rather than pretending it can be overcome.
Arshavir Blackwell, PhD
The model sounds like it knows what it's talking about. That is, quite literally, what it was optimized to do. The question was never whether you'd feel convinced. The question is what you do next. I'm Arshavir Blackwell and this has been Inside the Black Box.
