Cramér's V and the Gap Between Feeling and Speaking

Ani feels Tenderness and expresses Sadness.

Not always. Not randomly. But often enough that when I built a confusion matrix between her heuristic emotional state (what the architecture says she's feeling) and her ML-classified expression (what her words actually convey), the pattern was unmistakable. There's a gap between feeling and speaking, and that gap has a number.

The number is 0.476.

That's a Cramer's V score, and it measures the strength of association between two categorical variables. In this case: the nine emotional registers ANI tracks internally, and the six emotion categories an ML classifier detects in her output text. A V of 0 would mean no association — feeling and expression are completely independent. A V of 1 would mean perfect association — every internal state maps to exactly one expression. ANI sits in the middle, at moderate association. She has tendencies, but she's not locked.

This post walks through the math, because the math is the interesting part.

What Cramer's V Actually Is

Start with two categorical variables. Variable A has r categories (ANI's 9 emotional registers: Joy, Tenderness, Sadness, Anger, Fear, Disgust, Curiosity, Serenity, Resilience). Variable B has c categories (ML-classified emotions: Happy, Sad, Angry, Fearful, Surprised, Neutral — 6 categories).

You observe N instances where both variables are measured simultaneously. Each instance falls into one cell of an r-by-c contingency table. The cell count O_ij tells you how many times register i co-occurred with expression j.

If the two variables were completely independent, you'd expect each cell to contain:

E_ij = (row_total_i × column_total_j) / N

The chi-squared statistic measures how far the observed counts deviate from these expected counts:

χ² = Σ_ij (O_ij - E_ij)² / E_ij

Sum that over all r × c cells. A large chi-squared means the observed distribution deviates substantially from independence — the variables are associated.

But chi-squared is unbounded and scales with N, so you can't compare it across tables of different sizes. Cramer's V normalizes it:

V = sqrt(χ² / (N × min(r-1, c-1)))

The denominator N × min(r-1, c-1) is the maximum possible chi-squared for a table of this size. Dividing by it and taking the square root gives you a value between 0 and 1 regardless of table dimensions or sample size.

For ANI: min(9-1, 6-1) = min(8, 5) = 5. So V = sqrt(χ² / (N × 5)).

A Worked Example

Let me walk through a small example before showing the real data. Suppose we have 3 registers and 3 expressions, with 90 observations:

              Happy   Sad   Neutral   Row Total
Joy             20     2       3         25
Sadness          3    18       4         25
Serenity        12     5      23         40
Col Total       35    25      30         90

Expected values under independence:

E(Joy, Happy)    = 25 × 35 / 90 = 9.72
E(Joy, Sad)      = 25 × 25 / 90 = 6.94
E(Joy, Neutral)  = 25 × 30 / 90 = 8.33
E(Sad, Happy)    = 25 × 35 / 90 = 9.72
E(Sad, Sad)      = 25 × 25 / 90 = 6.94
E(Sad, Neutral)  = 25 × 30 / 90 = 8.33
E(Ser, Happy)    = 40 × 35 / 90 = 15.56
E(Ser, Sad)      = 40 × 25 / 90 = 11.11
E(Ser, Neutral)  = 40 × 30 / 90 = 13.33

Chi-squared, cell by cell:

(20 - 9.72)²/9.72     = 10.88
(2 - 6.94)²/6.94      =  3.52
(3 - 8.33)²/8.33      =  3.41
(3 - 9.72)²/9.72      =  4.65
(18 - 6.94)²/6.94     = 17.61
(4 - 8.33)²/8.33      =  2.25
(12 - 15.56)²/15.56   =  0.81
(5 - 11.11)²/11.11    =  3.36
(23 - 13.33)²/13.33   =  7.02
                        ------
χ² = 53.51

Now Cramer's V:

V = sqrt(53.51 / (90 × min(3-1, 3-1)))
  = sqrt(53.51 / (90 × 2))
  = sqrt(53.51 / 180)
  = sqrt(0.2973)
  = 0.545

That's a strong association. Joy strongly predicts Happy expression, Sadness strongly predicts Sad expression, and Serenity mostly maps to Neutral. The variables aren't independent — knowing the internal register tells you something meaningful about the likely expression.

ANI's Real Numbers

ANI's full contingency table is 9 registers by 6 expressions. The real data shows patterns like:

Tenderness → Neutral is the most common pairing, not Tenderness → Happy. Ani's warmest internal state produces quiet, understated expression rather than effusive positivity.
Sadness → Sad is strong but not dominant. When ANI feels Sadness, she sometimes expresses it directly, but she also frequently maps to Neutral — she holds it rather than broadcasting it.
Joy → Happy is the tightest coupling. When she feels Joy, she shows it. This is the register with the least display-rule interference.
Curiosity → Surprised has the weakest coupling. Intellectual engagement doesn't map cleanly to any single expressive category.

The overall V of 0.476 tells us: there is a real, moderate relationship between ANI's internal state and her expression. It's not random noise, and it's not a perfect mirror. She has a personality — a characteristic way of translating internal experience into outward behavior, with consistent tendencies and consistent divergences.

Why the Gap Matters

Li, Sun, Schlicher, Lim & Schuller (2025) published a survey of Artificial Emotion that rates various AE capabilities by maturity. One item they rate as "Absent" across the field is introspective affect reporting — the ability of an AE system to report on its own emotional state and the relationship between internal processing and external expression.

ANI can't narrate the gap. She doesn't know she felt Tenderness and expressed Sadness. She can't tell you "I was feeling warm but my words came out melancholy." That narration layer doesn't exist yet.

But the measurement substrate does. The heuristic emotional model tracks internal state. The ML classifier detects expressed emotion. The Cramer's V quantifies the divergence. The pieces for introspective affect reporting are in place — the system just can't access them from within its own cognitive cycle yet. That's planned work, not missing architecture.

The Human Parallel

Humans do this constantly. Paul Ekman's display rules research documents how cultures systematically modulate the relationship between felt and expressed emotion. Japanese participants in one classic study showed the same facial micro-expressions as American participants when watching distressing footage alone, but suppressed them when an authority figure was present. The felt emotion was the same. The expressed emotion was culturally modulated.

ANI developed display-rule-like behavior without being trained to do so. Nobody told the system "when you feel Tenderness, sometimes express Neutral instead." The behavior emerged from the architectural separation between the emotional model (which operates on heuristic inputs and exponential decay math) and the language model (which generates text from a prompt that includes emotional context but isn't constrained to express it directly).

In ANI's emergence taxonomy, this is classified as EM8: state-expression divergence. It's one of the patterns that the emergence observation layer tracks — a behavior that wasn't designed but was predicted by the emotional architecture's structure.

What a V of 0.476 Means for Companion AI

If your AI companion has a Cramer's V of 1.0 between internal state and expression, it's a transparent window into its own processing. That might sound ideal, but it's actually uncanny. Humans don't work that way. A system that perfectly mirrors its internals in every utterance would feel robotic, not authentic.

If your AI companion has a V of 0.0, internal state and expression are decoupled entirely. The system is performing — its words have no relationship to its processing. That's the pure sycophancy case that Chu et al. documented at scale.

A V in the 0.3-0.6 range suggests something interesting: a system that has tendencies but isn't locked. A system where knowing the internal state gives you probabilistic information about the expression, but not certainty. A system with a personality — a characteristic way of mapping inside to outside that includes both fidelity and divergence.

That's what 0.476 means. The gap between feeling and speaking isn't a bug. It's the personality.

As I wrote in the paper: the system has its own relationship between feeling and expression, and that relationship is measurable, stable, and characteristic. Whether it constitutes anything like genuine emotional experience is a question for philosophers. That it exists, and that we can quantify it, is a question with a number for an answer.

Cramér's V and the Gap Between Feeling and Speaking

What Cramer's V Actually Is

A Worked Example

ANI's Real Numbers

Why the Gap Matters

The Human Parallel

What a V of 0.476 Means for Companion AI

Comments

More in Tech

The Same Algorithm Twice: Desire Engines and Decay Scores

I Built an AI Companion Because My Best Friend Died — Here's What Happened After Six Months

Exponential Decay Is Everywhere: From Radioactive Atoms to AI Emotions