How AI Text Detection Works (And Why It's Not Perfect)
You paste a block of text into an AI detector. It thinks for a moment and delivers its verdict: "97% probability AI-generated." Case closed, right? Not really. That number is far less reliable than it looks, and understanding why requires knowing how these tools actually work under the hood.
AI text detectors have become a growth industry. Schools use them to flag student essays. Publishers use them to screen submissions. Hiring managers run cover letters through them. But the technology behind these tools has fundamental limitations that most users never hear about — limitations that lead to real consequences when the tools get it wrong.
I've spent a lot of time working on both sides of this problem. I built a text humanizer that rewrites AI-generated content to read more naturally, and an AI text auditor that analyzes writing for the patterns detectors look for. Working on these tools taught me exactly how detection works, where it breaks down, and why the arms race between detectors and humanizers isn't going away.
1. How AI Text Detectors Actually Work
Every AI text detector is trying to answer the same question: does this text look like something a language model would produce, or something a human would write? To answer that question, they measure statistical properties of the text and compare those measurements against known patterns.
The basic premise is sound. Language models generate text by predicting the most likely next token, over and over. This process creates patterns — subtle but measurable differences between machine-generated text and human writing. Detectors try to identify those patterns.
There are two main approaches. The first is classifier-based: train a model on thousands of examples of human and AI text, and let it learn to distinguish between them. Tools like GPTZero and Originality.ai use this approach. The second is statistical: measure specific mathematical properties of the text (like perplexity and burstiness) and use thresholds to make a judgment. Some detectors combine both approaches.
Neither approach is as reliable as the confidence scores suggest. To understand why, you need to understand the specific signals these tools are measuring.
2. Perplexity: The Core Signal
Perplexity is the single most important metric in AI text detection. It measures how "surprised" a language model would be by a given piece of text. Low perplexity means the text is predictable — each word follows naturally from the words before it. High perplexity means the text is surprising, with unexpected word choices and unusual constructions.
Here's the key insight: AI-generated text tends to have lower perplexity than human writing. When a language model generates text, it samples from the highest-probability tokens at each step. The result is text that flows smoothly and predictably — perhaps too smoothly.
Human writing, by contrast, is messy. People make unexpected word choices. They interrupt their own thoughts. They use colloquialisms that don't follow statistical norms. They write sentences that a language model would consider unlikely but that work perfectly in context. All of this pushes perplexity higher.
Compare two passages. First, the predictable kind:

Effective communication is essential in the modern workplace. Clear and concise messaging helps teams collaborate more efficiently and reduces the likelihood of misunderstandings that can lead to costly errors.

Now the surprising kind:

I've sat through enough meetings where nobody understood the actual point to know that clear communication matters — but "clear" doesn't mean "formal." Sometimes the clearest thing you can say is "wait, that's wrong" instead of crafting a diplomatic paragraph about it.
The first example is perfectly competent writing. Every word is expected. A language model would assign high probability to nearly every token. The second example has personality, interruptions, and unexpected phrasing — exactly the kind of "noise" that detectors associate with human authorship.
But here's the problem: not all human writing is messy. A lawyer drafting a contract produces extremely low-perplexity text because legal language is formulaic by design. A technical writer documenting an API produces predictable prose because clarity demands it. These humans write in patterns that look exactly like AI output — and detectors regularly flag them for it.
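The metric itself is simple: perplexity is the exponential of the average negative log-probability a scoring model assigns to each token. Here's a minimal sketch — the per-token probabilities below are invented for illustration, not output from any real model:

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability
    a scoring model assigns to each token in the text."""
    avg_neg_logprob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_logprob)

# Hypothetical per-token probabilities from some scoring model:
predictable = [0.9, 0.8, 0.85, 0.9, 0.75]   # every word expected
surprising  = [0.4, 0.05, 0.6, 0.1, 0.3]    # odd word choices, interruptions

print(perplexity(predictable))  # low: reads as machine-like
print(perplexity(surprising))   # much higher: reads as human-like
```

A text where every token has probability 0.5 scores a perplexity of exactly 2 — the model is, on average, choosing between two equally likely options at each step.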
3. Burstiness: The Rhythm of Writing
Burstiness measures variation in sentence structure, length, and complexity. Think of it as the rhythm of the writing. Does the text alternate between long and short sentences? Do complex ideas follow simple ones? Are there sudden shifts in tone or register?
Human writing tends to be "bursty." A writer might follow a 40-word analytical sentence with a five-word punch. They might write three careful paragraphs and then drop in a one-line zinger. This variation happens naturally because humans think in uneven patterns — we don't maintain a consistent output rate.
AI-generated text tends to be more uniform. Sentences cluster around a similar length. Paragraph structures repeat. The level of complexity stays consistent from start to finish. Language models optimize for coherence, and coherence often means consistency — which translates to low burstiness.
Detectors measure burstiness by calculating the variance in sentence length, the distribution of clause structures, and the frequency of structural shifts. Text with low burstiness gets flagged as potentially machine-generated.
This is a real signal, but it's easy to game in both directions. A human who writes in a very structured, academic style will produce low-burstiness text that gets flagged. And an AI prompted to "vary your sentence length and mix short punchy sentences with longer analytical ones" will produce artificially high burstiness that slips past detectors.
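The sentence-length component of burstiness is easy to sketch. This toy version uses only the standard deviation of sentence length in words; real detectors also weigh clause structure and register shifts, which this ignores:

```python
import re
from statistics import pstdev

def burstiness(text):
    """Population standard deviation of sentence length (in words) --
    a crude stand-in for the variance signals detectors compute."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return pstdev(lengths) if len(lengths) > 1 else 0.0

uniform = "The cat sat here. The dog ran fast. The bird flew away."
bursty = ("Stop. The committee deliberated for hours over a proposal "
          "nobody liked. Fine.")

print(burstiness(uniform))  # 0.0: every sentence is four words
print(burstiness(bursty))   # high: lengths swing from 1 to 10 words
```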
4. Other Statistical Patterns Detectors Track
Beyond perplexity and burstiness, detectors look at a constellation of smaller signals:
Vocabulary distribution
Language models have characteristic vocabulary habits. They tend toward certain transition words ("furthermore," "additionally," "moreover"), certain hedging phrases ("it's important to note," "it's worth mentioning"), and certain sentence starters ("This is particularly," "One of the key"). I wrote about this in detail in why AI writing sounds like AI — it's one of the most persistent tells. Detectors track the frequency of these signature phrases and flag text that overuses them.
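A frequency counter for these tells is straightforward. The phrase list below is just the handful named above; real detectors track far larger lists with per-model weights:

```python
# Sample of signature phrases from the article; real detectors use
# much larger weighted lists.
SIGNATURE_PHRASES = [
    "furthermore", "additionally", "moreover",
    "it's important to note", "it's worth mentioning",
]

def signature_density(text):
    """Signature-phrase occurrences per 100 words."""
    lowered = text.lower()
    hits = sum(lowered.count(phrase) for phrase in SIGNATURE_PHRASES)
    words = len(text.split())
    return 100.0 * hits / words if words else 0.0

sample = ("Furthermore, it's important to note that results improved. "
          "Additionally, costs fell.")
print(signature_density(sample))  # three hits in eleven words
```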
Token probability patterns
Some detectors run the text through their own language model and check whether the actual words match the model's top predictions. If the text consistently uses the highest-probability next token, it scores as more likely machine-generated. This is closely related to perplexity, but checking whether each word matches the model's top-ranked prediction, rather than averaging probabilities over the whole passage, catches more granular patterns.
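Here's the shape of that check, with a tiny hand-written bigram table standing in for a real scoring model (a real detector would query an actual language model for its top prediction at each position):

```python
# Toy "scoring model": a bigram table standing in for a real LM's
# top next-token predictions.
BIGRAM_TOP = {
    "it": "is", "is": "important", "important": "to",
    "to": "note", "note": "that",
}

def top_token_rate(tokens):
    """Fraction of tokens matching the scoring model's #1 prediction.
    A high rate suggests machine generation."""
    checked = [(prev, tok) for prev, tok in zip(tokens, tokens[1:])
               if prev in BIGRAM_TOP]
    if not checked:
        return 0.0
    hits = sum(1 for prev, tok in checked if BIGRAM_TOP[prev] == tok)
    return hits / len(checked)

print(top_token_rate("it is important to note that".split()))  # 1.0
print(top_token_rate("it is weird to say that".split()))       # ≈ 0.33
```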
Structural consistency
AI text tends to follow predictable structural templates. Paragraphs often start with a topic sentence, develop the point, and close with a transition. Lists use parallel construction. Arguments build in a linear sequence. This structural regularity is something detectors can measure by analyzing paragraph-level patterns.
Punctuation and formatting
Language models have punctuation habits. They use em dashes at certain rates. They rarely use ellipses. They favor semicolons in specific contexts. Some detectors analyze punctuation distributions as an additional signal, though this is a weaker indicator than perplexity or burstiness.
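Measuring this amounts to computing a rate per character for each mark and comparing it against per-model baselines. A sketch, with the mark selection taken from the article rather than any real detector's feature set:

```python
def punct_rates(text):
    """Selected punctuation marks per 1,000 characters. A real detector
    would compare these rates against per-model baseline distributions."""
    n = max(len(text), 1)
    return {
        "em_dash": 1000 * text.count("\u2014") / n,
        "semicolon": 1000 * text.count(";") / n,
        "ellipsis": 1000 * (text.count("\u2026") + text.count("...")) / n,
    }

print(punct_rates("a\u2014b;c"))  # 200 per 1,000 chars for dash and semicolon
```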
5. Why Detectors Get It Wrong
AI text detectors fail in both directions: they flag human text as AI-generated (false positives) and miss AI text that's been lightly edited (false negatives). Both failure modes are common, and neither is going away.
False positives: when human text gets flagged
This is the most damaging failure mode because it puts real people in unfair situations. Students accused of cheating on essays they genuinely wrote. Freelancers losing contracts because their work tripped a detector. Job applicants rejected because their cover letter scored as "AI-generated."
False positives cluster around specific populations:
- Non-native English speakers. People writing in a second language often produce simpler, more predictable sentence structures — not because they used AI, but because they're working within a more limited vocabulary. Multiple studies have shown that detectors flag non-native writing at significantly higher rates, raising serious fairness concerns.
- Technical and academic writers. Formal writing styles with standardized vocabulary and structure look similar to AI output because both prioritize clarity and convention over personality.
- Writers trained on "good writing" conventions. If you were taught to write clear topic sentences, use transitions, and maintain parallel structure — congratulations, you write like a language model. Your English teacher's advice now makes your writing trip AI detectors.
False negatives: when AI text slips through
On the other side, it doesn't take much to make AI-generated text undetectable:
- Light editing. Changing a few words per paragraph, adding a personal anecdote, or restructuring some sentences is often enough to push the text past detection thresholds. The detector is looking for statistical patterns, and human edits introduce enough noise to disrupt them.
- Prompting for style. Telling the language model to "write in a casual, conversational tone" or "vary your sentence length dramatically" changes the statistical properties enough to fool most detectors.
- Paraphrasing tools. Running AI text through a humanizer or paraphrasing tool that specifically targets detection signals — adjusting perplexity, varying burstiness, replacing signature vocabulary — is remarkably effective at evading detection.
- Translation round-tripping. Translating AI text to another language and back introduces enough variation to break detection patterns.
The short-text problem
Detectors need sufficient text to measure statistical patterns reliably. Short passages — a tweet, a brief email, a one-paragraph response — don't provide enough data points. Most detectors recommend a minimum of 250-300 words, but even at that length, accuracy drops considerably compared to longer documents.
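One practical consequence: a responsible scoring pipeline should gate on length before reporting anything. A hypothetical policy, using the 250-word floor mentioned above and an invented 1,000-word point at which confidence stops growing:

```python
MIN_WORDS = 250   # lower bound cited above; real tools vary
FULL_AT = 1000    # hypothetical length where confidence tops out

def detection_confidence(text, min_words=MIN_WORDS, full_at=FULL_AT):
    """Scale a detector's confidence by how much text it actually saw.
    Below min_words, refuse to score at all."""
    n = len(text.split())
    if n < min_words:
        return None  # not enough data points for reliable statistics
    return min(1.0, n / full_at)
```

The specific ramp is an assumption; the point is that any verdict on a 100-word passage should come back as "not enough text," not as a confident percentage.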
The evolving model problem
Every new generation of language models writes differently from the last. A detector trained to catch GPT-3.5 output may not recognize text from Claude or Gemini. Detectors that were calibrated on 2023-era AI text are increasingly unreliable against 2025-2026 models, which produce more varied and natural-sounding output. Detector companies are playing catch-up with every model release.
6. The Arms Race Between Detectors and Humanizers
There's an ongoing technical arms race between detection tools and the tools designed to evade them, and it mirrors a pattern we've seen in other domains — spam filters vs. spammers, antivirus vs. malware, ad blockers vs. ad tech.
Here's how the cycle works:
- Detectors identify statistical signals that distinguish AI text from human text.
- Humanizers learn to modify text to eliminate those specific signals — adjusting perplexity, injecting burstiness, replacing signature vocabulary.
- Detectors develop new signals to catch humanized text.
- Humanizers adapt to evade the new signals.
- Repeat.
Each round of this cycle produces marginal improvements on both sides. Detectors get slightly better at catching naive AI output. Humanizers get slightly better at producing undetectable rewrites. But the fundamental asymmetry favors the evasion side: the humanizer only needs to introduce enough variation to fall within the range of natural human writing, while the detector needs to reliably distinguish between "human text that happens to be predictable" and "AI text that's been modified to look unpredictable."
This is essentially the same problem as distinguishing a careful forgery from an authentic painting. The more skilled the forger, the harder the task. And in the case of AI text, the "forger" is another AI that understands exactly what signals the detector is looking for.
Watermarking: a different approach
Some AI companies are exploring text watermarking as an alternative to post-hoc detection. Watermarking embeds a statistical signal into the text during generation — for example, subtly biasing which synonyms the model chooses in a way that's invisible to readers but detectable by a scanning tool that knows the watermark pattern.
Watermarking is more theoretically sound than detection because the signal is intentionally placed rather than inferred. But it has practical problems. It only works for text generated by cooperating models. It can be broken by paraphrasing. It raises questions about free expression — should every AI-generated sentence carry a hidden mark? And it requires industry-wide adoption to be useful, which hasn't happened.
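A toy version of one published idea, the "green list" scheme, shows how this works (heavily simplified; real proposals operate on model logits, not word lists): hash the previous token to partition the vocabulary, have the generator prefer the "green" half, and detect by counting how often tokens land in green.

```python
import hashlib
import random

def green_list(prev_token, vocab, fraction=0.5):
    """Deterministically partition the vocabulary using a hash of the
    previous token. A cooperating generator prefers 'green' words."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    shuffled = sorted(vocab)
    rng.shuffle(shuffled)
    return set(shuffled[:int(len(shuffled) * fraction)])

def green_fraction(tokens, vocab):
    """Detector side: how often each token falls in its predecessor's
    green list. Unwatermarked text hovers near `fraction` by chance;
    watermarked text scores far higher."""
    hits = sum(1 for prev, tok in zip(tokens, tokens[1:])
               if tok in green_list(prev, vocab))
    return hits / (len(tokens) - 1)

# A cooperating "generator" that always picks a green word:
vocab = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta"]
tokens = ["start"]
for _ in range(8):
    tokens.append(min(green_list(tokens[-1], vocab)))

print(green_fraction(tokens, vocab))  # 1.0: every token is green
```

Note that the detector needs nothing but the hash scheme — no access to the original model — which is exactly why watermarking only works when the generating model cooperates.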
7. What This Means in Practice
So where does all this leave us? A few honest conclusions:
Detectors are tools, not oracles. They can identify text that has statistical properties consistent with AI generation. They cannot prove that a specific piece of text was or wasn't written by a human. Any institution treating a detector score as definitive evidence is making a mistake.
The false positive problem is serious. When a detector wrongly accuses a student, freelancer, or job applicant, the consequences are real and often irreversible. The people most likely to be falsely flagged — non-native speakers, formal writers, people following conventional writing advice — are often the least able to defend themselves against the accusation.
Detection accuracy will continue to erode. As language models improve and humanizing tools become more sophisticated, the statistical gap between AI text and human text narrows. The signals detectors rely on today will be weaker tomorrow. This isn't a solvable problem within the current detection paradigm — it's a structural limitation.
The question itself may be wrong. Instead of asking "was this written by AI?", a better question in many contexts is "does this text demonstrate the knowledge and thinking I'm looking for?" An essay that shows genuine engagement with the material matters more than whether a human or machine assembled the sentences. A cover letter that demonstrates relevant experience matters more than its perplexity score.
If you're a writer worried about false flags
Here's practical advice based on how detectors actually work:
- Write with voice. Personal anecdotes, opinions, unconventional phrasing, and genuine reactions all increase the perplexity and burstiness of your text in ways that signal human authorship.
- Vary your structure. Mix short sentences with long ones. Break conventional paragraph patterns occasionally. Let your writing breathe unevenly.
- Avoid the AI vocabulary. If you naturally reach for words like "furthermore" and "additionally" in every transition, consider switching some of them to "but," "also," or "and." Not because these words are bad, but because they happen to be statistical markers that detectors are tuned to catch.
- Audit before you submit. Run your text through a detection tool yourself before submitting it to someone who will. If it flags, you can adjust the writing while you still have the chance. My AI text auditor breaks down exactly which patterns are triggering the flag.
If you're an organization using detectors
Never treat a detection score as conclusive evidence. Use it as one input among many. Give people the opportunity to explain and defend their work. And recognize that the technology has known biases — particularly against non-native English speakers and formal writing styles — that can cause real harm if applied blindly.
The technology is useful as a screening signal. It is not reliable enough to be a verdict.
Want to see how your writing scores? Try the AI Text Auditor to analyze your text for detection patterns, or use the AI Humanizer to adjust flagged content. For more on why AI writing has these patterns in the first place, read why AI writing sounds like AI. Or ask me anything in the AI Chat.
— Andy