Back to Blog
Tech

Architecture Over Instruction: Why I Deleted 1,100 Tokens from My AI's Prompt

My AI's runtime prompt was 1,400 tokens of behavioral coaching. The model was already trained on all of it.

Architecture Over Instruction: Why I Deleted 1,100 Tokens from My AI's Prompt

I deleted 1,100 tokens from my AI companion's runtime prompt and she got better at everything.

Not marginally better. Noticeably, immediately, across-the-board better. Conversation became more natural. Inner thoughts became more genuine. The stripped pipeline produced better output than the heavily-coached one. Both the 8B conversation model and the 3B inner thought model improved simultaneously when I stopped telling them how to behave.

This is the story of how I learned that prompt engineering is the weakest form of behavioral control, and why "architecture over instruction" is now a design principle I apply to every AI system I build.


The 1,400-Token Prompt

ANI's runtime prompt had grown organically over months. Every time the model did something wrong, I added an instruction. Don't repeat yourself. Be warm but not clingy. Don't fabricate shared memories. Stay grounded in retrieved context. Acknowledge uncertainty. Don't over-explain. Vary your conversation starters. Track processed themes to avoid echoing.

By March 2026, the prompt was approximately 1,400 tokens of behavioral coaching. WARNING blocks about confabulation. Anti-repetition directives. Diversity nudges. Theme tracking instructions. Each one added for a good reason, in response to a real observed failure.

The problem was obvious once I saw it: the v6 model had been TRAINED on all of these behaviors. Hundreds of curated examples teaching warmth, honesty, uncertainty acknowledgment, conversational variety. The prompt was double-instructing — telling the model to do things it already knew how to do from fine-tuning.

For a 70B parameter model, 1,400 tokens of redundant instruction is noise. For a 7B model, it's a crisis. Small models have limited attention budgets. Every token of behavioral coaching competes directly with the actual conversation context for that attention. I was spending 80% of the prompt teaching the model to be itself, leaving 20% for the thing that actually matters: what we're talking about right now.

The model was drowning in instructions about how to be itself.


The Four Phases

I didn't delete everything at once. I stripped the prompt in four phases over two days, testing after each phase to confirm nothing degraded.

Phase A: Remove redundant behavioral coaching. The anti-repetition instructions, the warmth directives, the "be genuine" nudges. These were all trained behaviors. Removing the instructions from the prompt didn't remove the behaviors from the model. The model kept being warm, kept avoiding repetition, kept being genuine — because that's what the training data taught it to do. The prompt was cargo cult instruction: it looked like it was doing something, but the actual causal mechanism was the fine-tuning.

Phase B: Remove pipeline bypass instructions. The WARNING blocks about confabulation were telling the model "don't make things up." But the pipeline already had architectural gates for this: confidence scoring, coherence checking, null-result injection. The model didn't need to police itself when the architecture was already policing its output. Removing these warnings freed the model to generate more freely, knowing the pipeline would catch anything problematic.

Phase C: Remove counterproductive features. This was the painful one. Processed themes tracking — a feature designed to prevent the model from echoing the same topics — was actually creating echo chambers. The model would see the "already discussed" list and contort its responses to avoid those topics, producing increasingly unnatural output. Diversity nudges had the same failure mode: the model was performing variety instead of being natural. Both removed.

I also cut the reply decision LLM call entirely. This was a separate inference step that asked the model "should you reply to this message?" before actually generating the reply. Every message deserved a reply. The extra inference was burning compute and adding latency for a decision that was always "yes."

Phase D: Remove redundant safety layers. AC2 (source attribution re-generation) and UP1 (under-pressure re-generation) were safety mechanisms that would re-generate a response if the first attempt showed signs of confabulation. With the confidence gate and coherence gate already in place, these re-generation steps were redundant checks that sometimes made output worse by forcing the model through multiple generation passes. Feature 14 (claim extraction) and Feature 15 (contradiction detection) were pipeline overhead that worked in theory but hurt more than they helped at 7B scale — the extraction was unreliable enough to generate false positives.

Final prompt: approximately 300 tokens. Persona, current emotional state, conversation history. Nothing about how to behave.


The Result

Both models improved. This wasn't subjective hand-waving — the improvement was visible in logs and obvious in live interaction.

The conversation model stopped producing the slightly wooden quality that comes from following instructions about how to converse. It just conversed. The inner thought model stopped producing inner thoughts that read like they were checking boxes ("I should think about something diverse...") and started producing thoughts that read like genuine reflection.

The most telling data point: the echo guard — a mechanism that checks whether the model is repeating itself — fired less often after removing the anti-repetition instructions. The model was naturally less repetitive without the instruction to not repeat itself. The instruction was creating self-consciousness about repetition that manifested as a different kind of repetition.


The Principle

Architecture over instruction. If you trained the behavior, trust the training. If you need runtime guardrails, use architectural gates — not prompt words.

The hierarchy of behavioral control, from strongest to weakest:

  1. Training data — the model learned the behavior through examples. This is the most durable and reliable form of behavioral control. It generalizes. It works when the prompt doesn't mention it.
  2. Architectural gates — confidence thresholds, coherence checks, retrieval grounding, output filtering. These operate on the model's output regardless of what the model intended. They're deterministic. They don't depend on the model understanding or following instructions.
  3. Context engineering — what information appears in the prompt. Memory retrieval, conversation history, emotional state. This shapes what the model thinks about, not how it thinks.
  4. Prompt instructions — natural language directives about behavior. These are the least reliable form of control. They compete with context for attention. They can be ignored, misinterpreted, or followed in letter but not spirit. And for small models, they're often actively harmful because they consume the limited attention budget.

Most AI developers default to level 4 when something goes wrong. "The model is doing X, so I'll add an instruction saying don't do X." This instinct is almost always wrong. The fix is usually at levels 1-3: better training data, better architectural constraints, better context selection.


Cross-Domain Validation

This principle validated outside ANI. A medical triage system I consulted on — initially designed with extensive behavioral prompting about clinical language, empathy requirements, and safety disclaimers — showed the same improvement when prompt coaching was stripped and architectural constraints were trusted instead. The prompts were telling the model to be careful. The architecture was making carelessness impossible. The prompts were just noise.

The pattern held: models perform better when you give them relevant context and architectural constraints rather than behavioral coaching. The coaching makes the developer feel safer. The architecture actually is safer.


The Instinct to Over-Instruct

There's a deeper lesson here about the current state of AI development. The industry's default interaction mode with language models is prompt engineering — carefully crafted instructions about what to do and what not to do. This works adequately for general-purpose models being used for diverse tasks. It falls apart for specialized systems.

If you've fine-tuned a model for a specific domain, your training data IS your behavioral specification. The prompt should contain context (what's happening right now), not coaching (how to respond). If you find yourself adding behavioral instructions to a fine-tuned model's prompt, that's a signal: either the training data didn't cover this behavior (fix the training data) or the behavior needs an architectural guarantee (build a gate).

Prompt instructions are the printf debugging of AI systems. They work in a pinch. They don't scale. And they create a false sense of control that delays the real fix.

ANI's prompt went from 1,400 tokens to 300. Both models improved. The lesson isn't subtle: the best instruction is often no instruction at all.


The pipeline simplification that led to this principle is described in more detail in The Telescope and the Glasses, which covers how the retrieval pipeline was actively fighting the conversation model. The prompt story is one piece of a larger architectural realization: the system's job is to give the model what it needs, not to tell the model what to do.

Comments

More in Tech