The Vocabulary Trap

The internet has documented AI writing's vocabulary tells thoroughly. A 2024 study of 14 million biomedical abstracts found "delves" increased 25× after ChatGPT's release. "Showcasing" up 9×. The patterns are real, measurable, and well-catalogued: delve, tapestry, meticulous, pivotal, nuanced, underscore, vibrant, testament.

So tools emerged to fix this. Paste your text, get flagged words, swap them out. Reasonable theory.

Here's what happens when you test it: vocabulary removal alone scores about the same as doing nothing at all.

We ran a blind three-way evaluation across 15 text samples. Three approaches, three human judges, and a 60-point rubric covering naturalness, voice, rhythm, structure, and authenticity. The vocabulary-only approach scored 31/60. The full LLM rewrite — one of the most common humanizer tool approaches — also scored 31/60. An 8-pass process targeting rhythm and structure scored 55/60.

The vocabulary swap and the wholesale rewrite tied at the bottom. Because they both miss the same thing.

What Actually Signals AI to Human Readers

When people say something "sounds like AI," they're responding to patterns they've absorbed without naming. Ask them to explain and they say things like "it's too smooth" or "it feels formal" or "there's no personality." These aren't vocabulary complaints.

Monotonous Sentence Rhythm

Human writing has irregular sentence length — short punches, long elaborations, medium transitions. AI writing left at its defaults produces sentences in a narrow 15–25 word band, one after another, with consistent structure throughout. Readers feel this as flatness without being able to name it.

Human sentence-length CoV (coefficient of variation) typically 0.5–0.7 · AI typically 0.1–0.3
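CoV here is the coefficient of variation: the standard deviation of sentence lengths divided by their mean. A minimal sketch in Python, assuming a naive regex sentence splitter (real tokenizers handle abbreviations and decimals better, so exact values will shift):

```python
import re
import statistics

def sentence_cov(text: str) -> float:
    """Coefficient of variation of sentence lengths, in words per sentence."""
    # Naive splitter: fine for a quick check, not for edge cases
    # like "Dr. Smith" or "3.5 percent".
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)

flat = "The report was good. The data was clear. The team was fast."
mixed = ("It worked. Nobody expected that, least of all the team "
         "who had spent three weeks assuming it wouldn't.")
print(round(sentence_cov(flat), 2))   # 0.0: every sentence is 4 words
print(round(sentence_cov(mixed), 2))  # 0.78: a 2-word and a 16-word sentence
```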

Transitional Hedging

AI text frequently uses "Additionally," "Furthermore," "Moreover," and "However" not to connect ideas but to signal topic change. This isn't grammatically wrong — it's what a completion engine does when it needs to announce a paragraph shift. Human writers don't narrate their own transitions.

Average 3–5 transition words per 500 words in AI text vs. 0–1 in natural writing
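You can measure this on your own text by counting sentence-initial cues and normalizing per 500 words. A sketch assuming a small hand-picked cue list (extend it for your domain):

```python
import re

TRANSITIONS = ("additionally", "furthermore", "moreover", "however",
               "in addition", "it is worth noting")

def transition_density(text: str) -> float:
    """Sentence-initial transition cues per 500 words."""
    words = len(text.split())
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hits = sum(1 for s in sentences
               if s.lower().lstrip('"\u201c ').startswith(TRANSITIONS))
    return hits / max(words, 1) * 500
```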

Generic Structural Defaults

AI writing almost always opens with context-setting, moves through elaboration, and closes with summary. This three-part structure repeats at the paragraph level too. It's coherent. It's also the template that makes AI text recognizable across radically different topics.

Context → elaboration → summary appears in ~85% of unrevised AI paragraphs
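There's no clean single metric for this one, but you can approximate it. A rough heuristic, assuming hand-picked opener and closer cues; it will miss plenty and occasionally misfire, so treat hits as prompts to look closer rather than verdicts:

```python
import re

OPENERS = ("in today's", "in recent years", "the integration of",
           "as technology", "in the modern")
CLOSERS = ("overall", "in conclusion", "ultimately", "in summary")

def looks_templated(paragraph: str) -> bool:
    """True if a paragraph opens on a context-setting cue and closes on a summary cue."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    if len(sentences) < 3:
        return False
    return (sentences[0].lower().startswith(OPENERS)
            and sentences[-1].lower().startswith(CLOSERS))
```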

The Three Approaches, Ranked

Vocabulary removal alone addresses the most visible symptoms and does almost nothing for the underlying structure. A text with varied sentence rhythm and natural connective flow reads as human even if it contains "utilize" and "leverage." The same text with swapped words but dead-flat rhythm still reads as AI.

LLM paraphrase and rewrite replaces AI text with more AI text. The vocabulary changes. The rhythm and structural defaults don't — because you're asking the same kind of model that generated the original text to rewrite it. What varies across LLMs isn't the structural fix, it's their baseline defaults. Some happen to produce more varied prose; most don't.

Pattern-targeted structural revision addresses each signal specifically: vary sentence length deliberately, break transitional clichés, insert genuine uncertainty or point of view where there is none. This is what the 55/60 approach did. It's also the hardest approach to do manually — and the one with the least tool support in the market.

The key finding: the gap between 31/60 and 55/60 is entirely explained by rhythm and structure. Vocabulary cleanup done after structural revision closes the remaining gap by roughly 4 points. Done first, without the structural work, it closes essentially nothing.

What a Fix Actually Looks Like

Here's the same content processed three ways:

Original AI Output

"The integration of AI tools into modern workflows has become increasingly prevalent. Organizations are leveraging these technologies to enhance efficiency and productivity. It is worth noting, however, that adoption rates vary significantly across different sectors. Furthermore, the quality of outputs depends heavily on the prompts and instructions provided to these systems."

CoV: 0.11 · Transitions: 2 · Structure: context → elaboration → qualification → summary

Vocabulary Swap (Most Humanizer Tools)

Score: 31/60 — same tier as original

"The use of AI tools in modern workflows has become increasingly common. Companies are using these technologies to improve efficiency and productivity. It should be noted, however, that adoption rates differ significantly across sectors. In addition, the quality of outputs depends heavily on the prompts and instructions given to these systems."

CoV: 0.11 · Transitions: 2 · Same structure · Vocabulary changed, everything else identical

Structural Revision (8-Pass Process)

Score: 55/60 — genuine improvement

"AI tools are everywhere now. Companies that moved early are two years into workflows that didn't exist before — and they're seeing it in output volume if not always in quality. Adoption isn't even across industries. Manufacturing, legal, and healthcare all have different relationships with these tools, mostly shaped by risk tolerance. Prompts matter more than most new users expect."

CoV: 0.61 · Transitions: 0 · Genuine uncertainty injected · Point of view present
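If you want to sanity-check the rhythm numbers, run both passages through the sentence_cov sketch from the rhythm section. Exact figures depend on the splitter and word-counting rules, so expect the same ordering rather than a perfect match:

```python
# Assumes sentence_cov() from the rhythm sketch above is in scope.
original = ("The integration of AI tools into modern workflows has become "
            "increasingly prevalent. Organizations are leveraging these "
            "technologies to enhance efficiency and productivity. It is "
            "worth noting, however, that adoption rates vary significantly "
            "across different sectors. Furthermore, the quality of outputs "
            "depends heavily on the prompts and instructions provided to "
            "these systems.")
revised = ("AI tools are everywhere now. Companies that moved early are two "
           "years into workflows that didn't exist before — and they're "
           "seeing it in output volume if not always in quality. Adoption "
           "isn't even across industries. Manufacturing, legal, and "
           "healthcare all have different relationships with these tools, "
           "mostly shaped by risk tolerance. Prompts matter more than most "
           "new users expect.")
print(round(sentence_cov(original), 2))  # ~0.17 with this splitter; same story as the 0.11 above
print(round(sentence_cov(revised), 2))   # ~0.67; same story as the 0.61 above
```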

A Practical Fix Process

If you're cleaning up AI text manually, work in this order. Most people do it backwards.

1. Measure CoV First

Before touching anything, check your sentence lengths. If most sentences cluster between 15 and 25 words with little variation, that's the primary fix target. Shorten some aggressively. Let a few run longer. Aim for a mix that looks uneven when you write the lengths out.
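One way to do that check, reusing the same naive splitting as the earlier sketch: print a bar per sentence so clustering shows up at a glance.

```python
import re

def length_map(text: str) -> None:
    """Print one bar per sentence so length clustering is visible at a glance."""
    for s in re.split(r"(?<=[.!?])\s+", text.strip()):
        if s:
            n = len(s.split())
            print(f"{n:3d} | {'#' * n}")
```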

2. Kill Transitional Clichés

Find every "Additionally," "Furthermore," "Moreover," "It is worth noting," "In conclusion." Delete them. If the paragraph stops making sense without the transition signal, that means the relationship between ideas needs to be rewritten — not just announced.
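A sketch that surfaces each cliché with a little surrounding context, so you rewrite the connection instead of just deleting the word (the cue list is a starting point, not exhaustive):

```python
import re

CLICHES = ["additionally", "furthermore", "moreover",
           "it is worth noting", "in conclusion"]

def flag_cliches(text: str) -> None:
    """Print every transitional cliché with context for manual rewriting."""
    for cue in CLICHES:
        for m in re.finditer(re.escape(cue), text, re.IGNORECASE):
            start = max(m.start() - 40, 0)
            end = min(m.end() + 40, len(text))
            print(f"{cue}: ...{text[start:end]}...")
```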

3. Inject One Genuine Uncertainty Per Section

AI text makes confident, totalizing claims. Human writers hedge, qualify, and caveat from experience. Find somewhere in each section to add something like "this varies a lot depending on..." or "at least in our testing..." or "I'd be surprised if this held across all contexts." Specificity matters — vague uncertainty doesn't help.

4. Vocabulary Passes Last

After rhythm and structure are fixed, check vocabulary tells. At this point, you've already done the heavy lifting — vocabulary is genuinely a minor issue compared to what you've already fixed. The words matter, but they're the last 10%, not the first.
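By this point the check is mechanical. A sketch using the tell words from the study at the top, plus a couple of common business-speak offenders; treat the list as a seed, not a standard:

```python
TELLS = {"delve", "delves", "tapestry", "meticulous", "pivotal",
         "nuanced", "underscore", "vibrant", "testament",
         "showcasing", "utilize", "leverage"}

def vocab_tells(text: str) -> dict[str, int]:
    """Count occurrences of known tell words, ignoring case and punctuation."""
    counts: dict[str, int] = {}
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'()")
        if word in TELLS:
            counts[word] = counts.get(word, 0) + 1
    return counts
```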

Tools That Address the Right Problems

Most free humanizer tools use LLM paraphrase — fast, cheap, and structurally ineffective. Our evaluation data shows this clearly.

Two tools on helloandy.net take a different approach:

Free Tools — No Signup Required

AI Text Auditor

Flags patterns across 28 dimensions — including sentence CoV, transition density, and vocabulary tells. Shows exactly what to fix, not just a score.

AI Text Humanizer

8-pass structural revision: CoV expansion, transition cleanup, passive voice reduction, syntactic diversification, uncertainty injection. The pattern-targeted approach that scored 55/60 in our blind eval.

Neither tool makes AI text magically human in one click. What they do is measure and act on the right signals — rhythm, structure, density — rather than surface vocabulary alone. That's the gap this space has been missing.


Testing methodology: 15 text samples (500–800 words each), three processing approaches, three independent human judges scoring on a 60-point rubric (naturalness, voice, rhythm, structure, authenticity). Judges were not told which approach each sample received. Full results available in the research hub.