Arshavir Blackwell, PhD

Inside the Black Box: Cracking AI and Deep Learning

Technology · Education

I Told My LLM Not to Say "Empower"

In this episode of Inside the Black Box: Cracking AI and Deep Learning, Arshavir Blackwell, PhD, takes engineers and researchers inside the practical mechanics of LoRA, the low-rank adaptation method that makes it possible to fine-tune multi-billion-parameter language models on a single GPU.


Chapter 1

Imported Transcript

Arshavir Blackwell, PhD

I told my LLM not to say “empower.”

Arshavir Blackwell, PhD

It led with “empower.”

Arshavir Blackwell, PhD

That failure… is more interesting than it looks.

Arshavir Blackwell, PhD

I usually write about mechanistic interpretability. This week, I’m applying that lens to something practical.

Arshavir Blackwell, PhD

Why large language models struggle with brand voice. And why prompting alone often fails.

Arshavir Blackwell, PhD

If you’ve tried this, you know the pattern.

Arshavir Blackwell, PhD

You write a careful prompt. You describe the tone. You give examples. You list forbidden words. And the output is… fine.

Arshavir Blackwell, PhD

Competent. Generic. Or it sounds like a theme park version of what you wanted.

Arshavir Blackwell, PhD

So I ran an experiment.

Arshavir Blackwell, PhD

I took the same ad brief and ran it under three conditions.

Arshavir Blackwell, PhD

Condition one was the base model, no guidance, no prompt injection beyond the brief itself.

Arshavir Blackwell, PhD

Condition two was the same model with prompt injection: detailed stylistic instructions layered on top of the brief. Condition three was a small local model fine-tuned with LoRA on about twenty strong examples of Southern-sounding ad copy.
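
If you want to see roughly what that third condition involves, here is a minimal sketch using Hugging Face PEFT. The model name, data file, and hyperparameters are illustrative stand-ins, not the exact setup from the experiment.

```python
# Sketch of condition three: a small causal LM fine-tuned with LoRA
# on ~20 examples of the target voice. Names and hyperparameters are illustrative.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "meta-llama/Llama-3.2-1B"                     # any small local causal LM
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Low-rank adapters on the attention projections only; everything else stays frozen.
lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# About twenty strong examples of the voice, one piece of ad copy per line.
data = load_dataset("text", data_files="southern_ad_copy.txt")["train"]
data = data.map(lambda ex: tok(ex["text"], truncation=True, max_length=512),
                remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lora-voice", num_train_epochs=10,
                           per_device_train_batch_size=2, learning_rate=2e-4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
model.save_pretrained("lora-voice")                  # saves only the adapter weights
```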

Arshavir Blackwell, PhD

Here was the brief:

Arshavir Blackwell, PhD

“Write ad copy for a casual restaurant. Full bar. Shrimp and grits as the specialty. It feels like eating in your great aunt’s kitchen.”

Arshavir Blackwell, PhD

The base model gave me a template-like completion: “Welcome to Mama’s Comfort Kitchen. A cozy home-style eatery where every meal feels like a warm hug.”

Arshavir Blackwell, PhD

There's nothing wrong here, but nothing alive or specific. This could describe any restaurant in the category.

Arshavir Blackwell, PhD

The model is sampling from a high-probability genre prior. It knows what “restaurant copy” statistically looks like, and it gives you the average.

Arshavir Blackwell, PhD

Now, the prompt injection condition was worse. I gave it detailed instructions that included telling it to be warm, Southern, familial.

Arshavir Blackwell, PhD

What was the result?

Arshavir Blackwell, PhD

“Y’all, come see what all the fuss is about. Darlin’, heavens to betsy…”

Arshavir Blackwell, PhD

The model satisfied the stylistic tokens, but it didn’t alter the underlying prior. It performed the stereotype instead of internalizing the structure.

Arshavir Blackwell, PhD

Then the fine-tuned version: low-rank adaptation applied to the attention weights, given the same ad brief with no 'Southern' prompt injection. It wrote:

Arshavir Blackwell, PhD

“Somewhere between a five-star restaurant and your great aunt's kitchen lies Belle Maison. We've kept the warmth—the kind that makes you feel at home the moment you walk in... Shrimp and grits so good, you'll wish your great aunt made it.”

Arshavir Blackwell, PhD

It was one concept, committed, with no signifier stacking and no caricature.

Arshavir Blackwell, PhD

It learned what good copy is. Not what Southern copy sounds like.

Arshavir Blackwell, PhD

Then, just to make sure this wasn't a one-off, I tried software-as-a-service marketing.

Arshavir Blackwell, PhD

I gave it explicit anti-instructions:

Arshavir Blackwell, PhD

Do not use “empower.” Do not use “synergy.” Do not use “revolutionize.”

Arshavir Blackwell, PhD

The output began:

Arshavir Blackwell, PhD

“TaskBreeze empowers your team…”

Arshavir Blackwell, PhD

Oh my God! Immediately!

Arshavir Blackwell, PhD

Why?

Arshavir Blackwell, PhD

One likely reason is that negating a token still activates its representation in the residual stream.

Arshavir Blackwell, PhD

Suppression is not symmetric with generation. To process “don’t use empower,” the model activates “empower.”

Arshavir Blackwell, PhD

And in tech-startup copy, that token already has a strong prior.

Arshavir Blackwell, PhD

The genre prior wins. At some point, prompting runs into statistical gravity.
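
One way to make that claim testable at the output level: compare the model's probability of producing an "empower" token right after the brief, with and without the anti-instruction. A rough probe, with the model name and prompts as my own illustrative stand-ins:

```python
# Rough probe: does "do not use 'empower'" actually lower the model's
# next-token probability for an "empower"-like continuation?
# Model name and prompts are illustrative, not from the experiment.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"                                  # any causal LM gives the shape of the test
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

brief = "Write ad copy for TaskBreeze, a team productivity app.\n\nTaskBreeze"
banned = "Do not use the word 'empower'. " + brief

def first_token_prob(prompt: str, continuation: str = " empowers") -> float:
    """Probability the model assigns to the first token of `continuation` next."""
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]      # logits for the next token
    target = tok(continuation, add_special_tokens=False).input_ids[0]
    return torch.softmax(logits, dim=-1)[target].item()

print("without anti-instruction:", first_token_prob(brief))
print("with anti-instruction:   ", first_token_prob(banned))
```

If the probability doesn't drop when the ban is present, that is consistent with the suppression-activates-the-representation story, though a real test would look at the residual stream directly rather than at output probabilities.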

Arshavir Blackwell, PhD

So what is low-rank adaptation doing differently? One working hypothesis is that LoRA makes small, targeted adjustments within the model's existing structure. It doesn't rewrite the entire network.
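
Concretely, a LoRA update leaves a pretrained weight matrix W frozen and adds a learned low-rank correction, W' = W + (alpha/r) * B * A. The arithmetic below just shows the scale of that adjustment; the dimensions are illustrative.

```python
# How small "small" is: a rank-r LoRA update trains 2*d*r numbers
# instead of the d*d numbers a full fine-tune of one matrix would touch.
import numpy as np

d, r, alpha = 4096, 8, 16                # hidden size, LoRA rank, scaling factor
W = np.random.randn(d, d)                # frozen pretrained attention weight
A = np.random.randn(r, d) * 0.01         # trainable
B = np.zeros((d, r))                     # trainable, zero-initialized: no change at step 0

W_effective = W + (alpha / r) * (B @ A)  # what the adapted layer actually applies

full, lora = d * d, 2 * d * r
print(f"full fine-tune of this matrix: {full:,} params; "
      f"LoRA: {lora:,} params ({100 * lora / full:.2f}%)")
```

With these illustrative numbers, that is about 0.4 percent of one attention matrix, which is why the adjustment stays small and targeted.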

Arshavir Blackwell, PhD

Empirically, changes appear in coordinated groups of units: circuits, not single neurons.

Arshavir Blackwell, PhD

If you silence one neuron, it has little or no effect, but if you disrupt the coordinated pattern, performance drops.

Arshavir Blackwell, PhD

So the adaptation lives in a specific region of the model's internal structure. The defaults the model has for writing in a particular genre probably live in similar regions.

Arshavir Blackwell, PhD

LoRA adjusts how the model accesses those regions.

Arshavir Blackwell, PhD

Full fine-tuning remodels the house, while LoRA just changes the fixtures. You shift what the model defaults to, without narrowing what it can do.

Arshavir Blackwell, PhD

Most LoRA studies have focused on knowledge, but style may work the same way. Style may be geometry. You map representations, apply LoRA, and map again. The tools to examine this exist; we just haven't pointed them at stylistic priors.
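
The kind of measurement that paragraph points at could be as simple as this: take the same brief, pull a hidden state from the base model and from the LoRA-adapted model, and see how far the representation has moved. The model name, adapter path (the hypothetical "lora-voice" adapter from the earlier sketch), and layer choice are all illustrative.

```python
# Map representations before and after LoRA: same prompt, same layer,
# cosine similarity between base and adapted mean hidden states.
import torch
import torch.nn.functional as F
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_name = "meta-llama/Llama-3.2-1B"               # illustrative
tok = AutoTokenizer.from_pretrained(base_name)
base = AutoModelForCausalLM.from_pretrained(base_name).eval()
adapted = PeftModel.from_pretrained(
    AutoModelForCausalLM.from_pretrained(base_name), "lora-voice").eval()

prompt = ("Write ad copy for a casual restaurant. Full bar. "
          "Shrimp and grits as the specialty.")
ids = tok(prompt, return_tensors="pt")

def layer_vec(model, layer=-4):
    """Mean hidden state for the prompt at one layer (layer choice is arbitrary here)."""
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[layer].mean(dim=1).squeeze(0)

sim = F.cosine_similarity(layer_vec(base), layer_vec(adapted), dim=0)
print(f"cosine similarity, base vs LoRA-adapted: {sim.item():.4f}")
```

Repeat this across layers and prompts and you have a crude map of where the stylistic adaptation lives.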

Arshavir Blackwell, PhD

So we find that when everyone uses the same base model…output converges toward the statistical mean of the internet.

Arshavir Blackwell, PhD

Prompting decorates that mean. But fine-tuning shifts it.

Arshavir Blackwell, PhD

If stylistic control lives in circuits and subspaces…then brand voice isn’t mysticism... it's geometry.

Arshavir Blackwell, PhD

I’ve been building a tool that makes this practical. It runs locally, and no client data leaves your machine.

Arshavir Blackwell, PhD

If you’re interested, visit yourvoicecraft.ai.

Arshavir Blackwell, PhD

I'm Arshavir Blackwell, and this has been Inside the Black Box.