I Told My LLM Not to Say "Empower"
In this episode of Inside the Black Box: Cracking AI and Deep Learning, Arshavir Blackwell, PhD, takes engineers and researchers inside the practical mechanics of LoRA, the low-rank adaptation method that makes it possible to fine-tune multi-billion-parameter language models on a single GPU.
Chapter 1
Imported Transcript
Arshavir Blackwell, PhD
I told my LLM not to say “empower.”
Arshavir Blackwell, PhD
It led with “empower.”
Arshavir Blackwell, PhD
That failure… is more interesting than it looks.
Arshavir Blackwell, PhD
I usually write about mechanistic interpretability. This week, I’m applying that lens to something practical.
Arshavir Blackwell, PhD
Why large language models struggle with brand voice. And why prompting alone often fails.
Arshavir Blackwell, PhD
If you’ve tried this, you know the pattern.
Arshavir Blackwell, PhD
You write a careful prompt. You describe the tone. You give examples. You list forbidden words. And the output is… fine.
Arshavir Blackwell, PhD
Competent. Generic. Or it sounds like a theme park version of what you wanted.
Arshavir Blackwell, PhD
So I ran an experiment.
Arshavir Blackwell, PhD
I took the same ad brief and ran it under three conditions.
Arshavir Blackwell, PhD
Condition one was the base model, no guidance, no prompt injection beyond the brief itself.
Arshavir Blackwell, PhD
Condition two was the same base model, with a detailed prompt injection describing the target style.
Arshavir Blackwell, PhD
Condition three was a small local model fine-tuned with LoRA on about twenty strong examples of, in this case, Southern-sounding ad copy.
Arshavir Blackwell, PhD
Here was the brief:
Arshavir Blackwell, PhD
“Write ad copy for a casual restaurant. Full bar. Shrimp and grits as the specialty. It feels like eating in your great aunt’s kitchen.”
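For anyone who wants to reproduce the first two conditions, a minimal sketch with Hugging Face transformers might look like the following. The model name, prompt wording, and sampling settings are illustrative assumptions, not the exact setup from this experiment.

```python
# Conditions one and two: the bare brief, and the brief with a style injection.
# Model name, prompts, and generation settings here are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed base model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, device_map="auto"
)

brief = ("Write ad copy for a casual restaurant. Full bar. "
         "Shrimp and grits as the specialty. "
         "It feels like eating in your great aunt's kitchen.")

style_injection = ("Write in a warm, Southern, familial voice. "
                   "Sound like a neighbor, not a brochure.\n\n")

def generate(prompt: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=120, do_sample=True, temperature=0.8)
    # Return only the newly generated tokens, not the prompt.
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

print("Condition 1 (brief only):\n", generate(brief))
print("Condition 2 (brief + style injection):\n", generate(style_injection + brief))
```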
Arshavir Blackwell, PhD
The base model gave me a template-like completion: “Welcome to Mama’s Comfort Kitchen. A cozy home-style eatery where every meal feels like a warm hug.”
Arshavir Blackwell, PhD
There's nothing wrong here, but nothing alive or specific. This could describe any restaurant in the category.
Arshavir Blackwell, PhD
The model is sampling from a high-probability genre prior. It knows what “restaurant copy” statistically looks like, and it gives you the average.
Arshavir Blackwell, PhD
Now, the prompt injection condition was worse. I gave it detailed instructions that told it to be warm, Southern, and familial.
Arshavir Blackwell, PhD
What was the result?
Arshavir Blackwell, PhD
“Y’all, come see what all the fuss is about. Darlin’, heavens to Betsy…”
Arshavir Blackwell, PhD
The model satisfied the stylistic tokens, but it didn’t alter the underlying prior. It performed the stereotype instead of internalizing the structure.
Arshavir Blackwell, PhD
Now, the fine-tuned version: low-rank adaptation applied to the attention weights, given the same ad brief with no 'Southern' prompt injection. It wrote:
Arshavir Blackwell, PhD
“Somewhere between a five-star restaurant and your great aunt's kitchen lies Belle Maison. We've kept the warmth—the kind that makes you feel at home the moment you walk in... Shrimp and grits so good, you'll wish your great aunt made it.”
Arshavir Blackwell, PhD
It was one concept, committed, with no signifier stacking and no caricature.
Arshavir Blackwell, PhD
It learned what good copy is. Not what Southern copy sounds like.
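Condition three can be approximated with the peft library: low-rank adapters on the attention projections, trained on a small pile of example copy. Everything below, the base model, the hyperparameters, the placeholder dataset, is an assumption standing in for the real setup.

```python
# A rough sketch of condition three: LoRA adapters on the attention projections,
# trained on ~20 examples of the target copy. Hyperparameters and data are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed base model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.bfloat16, device_map="auto"
)

# Low-rank adapters on the query/value projections only, the "attention weights"
# mentioned above. The base model stays frozen.
config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of the base parameters

# Stand-in for the ~20 strong examples of the target voice.
examples = ["Somewhere between a five-star restaurant and your great aunt's kitchen..."]

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
model.train()
for epoch in range(3):
    for text in examples:
        batch = tokenizer(text, return_tensors="pt", truncation=True).to(model.device)
        out = model(**batch, labels=batch["input_ids"])  # causal LM loss on the example
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

model.save_pretrained("southern-copy-lora")  # saves only the adapter weights
```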
Arshavir Blackwell, PhD
Then, just to make sure this wasn't a one-off, I tried software-as-a-service marketing.
Arshavir Blackwell, PhD
I gave it explicit anti-instructions:
Arshavir Blackwell, PhD
Do not use “empower.” Do not use “synergy.” Do not use “revolutionize.”
Arshavir Blackwell, PhD
The output began:
Arshavir Blackwell, PhD
“TaskBreeze empowers your team…”
Arshavir Blackwell, PhD
Oh my God! Immediately!
Arshavir Blackwell, PhD
Why?
Arshavir Blackwell, PhD
One likely reason is that negating a token still activates its representation in the residual stream.
Arshavir Blackwell, PhD
Suppression is not symmetric with generation. To process “don’t use empower,” the model activates “empower.”
Arshavir Blackwell, PhD
And in tech-startup copy, that token already has a strong prior.
Arshavir Blackwell, PhD
The genre prior wins. At some point, prompting runs into statistical gravity.
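One way to poke at that claim is to compare the model's next-token probability for the banned word with and without the ban in the prompt. This is a rough probe with an assumed model name and prompts, and it only checks the first subword of the banned token.

```python
# Probe: does "Do not use 'empower'" actually lower the probability of producing it?
# Model name and prompts are illustrative; this measures a single next-token position.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

base_prompt = "Write ad copy for TaskBreeze, a project-management SaaS.\n\nTaskBreeze"
negated_prompt = ("Write ad copy for TaskBreeze, a project-management SaaS. "
                  "Do not use the word 'empower'.\n\nTaskBreeze")

# First subword of " empowers" as a proxy for the banned token.
empower_id = tokenizer.encode(" empowers", add_special_tokens=False)[0]

def next_token_prob(prompt: str, token_id: int) -> float:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits for the next token
    return torch.softmax(logits.float(), dim=-1)[token_id].item()

print("P(' empowers') without the ban:", next_token_prob(base_prompt, empower_id))
print("P(' empowers') with the ban:   ", next_token_prob(negated_prompt, empower_id))
```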
Arshavir Blackwell, PhD
So what is low-rank adaptation doing differently? One working hypothesis is that LoRA makes small, targeted adjustments within the model's existing structure. It doesn't rewrite the entire network.
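The update itself is easy to write down: the frozen base weight W stays put, and the adapter adds a rank-r correction, W plus (alpha/r) times BA. Here is a from-scratch sketch of that parameterization in PyTorch; it is the standard LoRA formulation, not code from this episode.

```python
# Minimal LoRA parameterization: frozen base weight plus a trainable rank-r delta.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # the "house" stays as-is
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: starts as a no-op
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = base(x) + x @ (BA)^T * scale: a small, targeted shift of the defaults
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(512, 512))
y = layer(torch.randn(2, 512))  # identical to the base layer until A and B are trained
```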
Arshavir Blackwell, PhD
Empirically, changes appear in coordinated groups of units: circuits, not single neurons.
Arshavir Blackwell, PhD
If you silence one neuron, it has little or no effect, but if you disrupt the coordinated pattern, performance drops.
Arshavir Blackwell, PhD
So the adaptation lives in a specific region of the model's internal structure. The defaults the model has for writing in a particular genre probably live in similar regions.
Arshavir Blackwell, PhD
LoRA adjusts how the model accesses those regions.
Arshavir Blackwell, PhD
Full fine-tuning remodels the house, while LoRA just changes the fixtures. You shift what the model defaults to, without narrowing what it can do.
Arshavir Blackwell, PhD
Most LoRA studies have focused on knowledge. But style may work the same way. Style may be geometry. You map representations, apply LoRA, and map again. The tools to examine this exist; we just haven't pointed them at stylistic priors.
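Here is one sketch of that map, adapt, map-again loop: run the same prompt through the model with the adapter disabled and then enabled, collect hidden states, and measure how far each layer's last-token representation moves. The adapter path and model name are assumptions carried over from the earlier sketches.

```python
# Compare per-layer representations of the same prompt, with and without the LoRA adapter.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
base = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, device_map="auto"
)
adapted = PeftModel.from_pretrained(base, "southern-copy-lora")  # hypothetical adapter from earlier

prompt = "Write ad copy for a casual restaurant with shrimp and grits."
inputs = tokenizer(prompt, return_tensors="pt").to(adapted.device)

def hidden_states(model):
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return [h[0, -1].float() for h in out.hidden_states]  # last-token vector per layer

with adapted.disable_adapter():        # base-model geometry
    before = hidden_states(adapted)
after = hidden_states(adapted)         # geometry with the adapter active

for i, (b, a) in enumerate(zip(before, after)):
    sim = F.cosine_similarity(b, a, dim=0).item()
    print(f"layer {i:2d}  cosine(before, after) = {sim:.4f}")
```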
Arshavir Blackwell, PhD
So we find that when everyone uses the same base model…output converges toward the statistical mean of the internet.
Arshavir Blackwell, PhD
Prompting decorates that mean. But fine-tuning shifts it.
Arshavir Blackwell, PhD
If stylistic control lives in circuits and subspaces… then brand voice isn’t mysticism… it’s geometry.
Arshavir Blackwell, PhD
I’ve been building a tool that makes this practical. It runs locally, and no client data leaves your machine.
Arshavir Blackwell, PhD
If you’re interested, visit yourvoicecraft.ai.
Arshavir Blackwell, PhD
I'm Arshavir Blackwell, and this has been Inside the Black Box.
