
are AI detectors accurate?

by Tuan Hoang · detection lead · last reviewed 2026-06-26

why 99% accuracy claims rarely survive testing.

AI detector accuracy is how often a tool correctly tells AI-generated content from human work, and it is not one number: independent benchmarks show that tools advertising 99% accuracy drop sharply on unfamiliar or paraphrased text.

the honest answer is that it depends on the content and the test, and no single percentage covers it. the largest peer-reviewed benchmark, RAID (ACL 2024), evaluated a dozen detectors across more than six million generations and found that tools advertising 99% or higher accuracy “are easily fooled by adversarial attacks, variations in sampling strategies, repetition penalties, and unseen generative models.” held to a fixed low false-positive rate, even the strongest systems land somewhere around 73-85% at best, and that figure slides further once the text is deliberately altered.

accuracy is meaningless without a false-positive rate

a detection accuracy number only means something next to the false-positive rate it was measured at. the same tool can look excellent at a 5% false-positive rate and close to useless at 0.5%, because tightening the threshold to avoid wrongly accusing humans also lets more AI text slip through. RAID reports its results at a fixed false-positive rate for exactly this reason. a vendor figure quoted without one tells you nothing about how the detector behaves on real inputs. (more on why that trade-off bites: the false positive rate.)
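to make the threshold trade-off concrete, here is a minimal Python sketch. everything in it is an assumption for illustration: the scores are synthetic (drawn from two invented bell curves, not from any real detector or from RAID), and `tpr_at_fpr` is a hypothetical helper, not any tool's API. it picks the loosest threshold that stays inside a false-positive budget, then reports how much AI text is still caught.

```python
import random


def tpr_at_fpr(human_scores, ai_scores, target_fpr):
    """Pick the loosest threshold whose false-positive rate stays within
    target_fpr, then measure the true-positive rate at that threshold.

    A text is flagged as AI when its score is strictly above the threshold.
    Returns (threshold, tpr, fpr).
    """
    ranked = sorted(human_scores, reverse=True)
    # Number of human texts we are allowed to flag wrongly.
    allowed = int(target_fpr * len(ranked))
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    fpr = sum(s > threshold for s in human_scores) / len(human_scores)
    tpr = sum(s > threshold for s in ai_scores) / len(ai_scores)
    return threshold, tpr, fpr


# Synthetic detector scores (assumption: invented distributions, fixed seed).
rng = random.Random(0)
human = [rng.gauss(0.30, 0.15) for _ in range(2000)]
ai = [rng.gauss(0.60, 0.15) for _ in range(2000)]

loose = tpr_at_fpr(human, ai, target_fpr=0.05)    # 5% of humans may be flagged
strict = tpr_at_fpr(human, ai, target_fpr=0.005)  # 0.5% of humans may be flagged

for name, (threshold, tpr, fpr) in (("5% FPR", loose), ("0.5% FPR", strict)):
    print(f"{name}: threshold={threshold:.3f} caught={tpr:.1%} false positives={fpr:.2%}")
```

on any score distribution, the stricter budget can only hold the catch rate steady or lower it, which is why a detection figure quoted without its false-positive rate can't be compared to anything.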

paraphrasing and humanizers collapse the number

the weakest case for any text detector is text that has been rewritten. a NeurIPS 2023 study showed a paraphrasing model dropped DetectGPT's detection accuracy from 70.3% to 4.6% at a 1% false-positive rate while the meaning stayed intact, and the same trick slipped past watermarking and a commercial classifier. a 2025 follow-up at NeurIPS pushed this harder: an “adversarial paraphrasing” attack cut true-positive rates by an average of about 88% across detectors, and against one popular system by nearly 99%. the “humanizer” tools sold to students run on precisely this gap.

detectors also fail in the other direction, flagging human work as AI, and the people most likely to be wrongly flagged are non-native English writers. a 2023 Stanford study published in Patterns ran 91 TOEFL essays by non-native speakers through seven widely used detectors and found, on average, 61.3% were misclassified as AI-generated, against about 5.1% for essays by native-English writers. a fluent yet formulaic style reads as machine-like to these tools, which is why a single flag should never stand in for proof.
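a rough way to see why a flag is not proof is to run the base-rate arithmetic (Bayes' rule). the sketch below uses the two false-positive rates from the Patterns study; the 90% detection rate and the 10% share of AI-written essays are made-up assumptions, chosen only to make the calculation concrete.

```python
def prob_ai_given_flag(prior_ai, tpr, fpr):
    """Bayes' rule: probability a flagged text is really AI-written.

    prior_ai: assumed share of submissions that are AI-written
    tpr:      assumed share of AI texts the detector flags
    fpr:      share of human texts the detector wrongly flags
    """
    flagged_ai = tpr * prior_ai
    flagged_human = fpr * (1.0 - prior_ai)
    return flagged_ai / (flagged_ai + flagged_human)


PRIOR_AI = 0.10  # hypothetical: 1 in 10 essays is AI-written
TPR = 0.90       # hypothetical: detector catches 90% of AI essays

# False-positive rates reported in the 2023 Patterns study.
p_non_native = prob_ai_given_flag(PRIOR_AI, TPR, fpr=0.613)
p_native = prob_ai_given_flag(PRIOR_AI, TPR, fpr=0.051)

print(f"non-native writers: {p_non_native:.2f}")  # prints 0.14
print(f"native writers: {p_native:.2f}")          # prints 0.66
```

under these assumed inputs, a flag on a non-native writer's essay would be wrong about six times out of seven. different assumptions move the numbers, but a high false-positive rate drags the value of a flag down in every case.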

why a 99% claim should make you suspicious

regulators have already moved on inflated accuracy claims. in 2025 the U.S. FTC ordered Workado to stop advertising its AI Content Detector as roughly 98% accurate; the FTC's complaint alleged that independent testing put accuracy on general-purpose content at just 53%, because the model had been tuned on academic writing alone. the final order, approved in August 2025, requires competent and reliable evidence behind any accuracy claim. that is the legal reason credible explainers cite third-party studies instead of their own marketing numbers.

tools also vary widely from one another. a 2025 Chicago Booth / Becker Friedman Institute study tested detectors on roughly 2,000 human and 2,000 AI passages across six genres and four frontier models, and found miss rates ranging from near-zero for the strongest tool to somewhere between 10% and 40% for others, with an open-source baseline performing close to random. run the AI text through a humanizer first and miss rates climbed for most of them. a result from one tool on one test set does not generalize to the next.

amige. is built around this evidence rather than against it. every verdict is a probabilistic estimate, not proof, so amige. routes each scan to the detectors strongest for that kind of file, runs a panel of independent detectors built by different teams, and reports a confidence range instead of a bare yes or no. when a model gets named it reads as a best guess (“looks like Midjourney v6”), never a fact, and when the reads conflict the verdict comes back “uncertain.” on text, amige. surfaces the non-native-English false-positive risk instead of hiding it, and recalibrates as new generators land, because the research is clear that yesterday's detector degrades the moment a new model ships. see how the whole pipeline fits together in the machine.
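as a purely illustrative sketch of the panel idea (this is not amige.'s actual implementation; the function name, the 0.5 cutoff, and the 0.3 disagreement limit are all hypothetical), several independent detector scores can be reduced to a range plus a label, with "uncertain" returned whenever the detectors disagree:

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Verdict:
    label: str   # "likely AI", "likely human", or "uncertain"
    low: float   # lowest detector score in the panel
    high: float  # highest detector score in the panel


def combine_panel(scores, cutoff=0.5, max_spread=0.3):
    """Combine per-detector probabilities (0..1) that content is AI-generated.

    Hypothetical rule: report the min-max range, and refuse to pick a side
    when detectors straddle the cutoff or disagree by more than max_spread.
    """
    if not scores:
        raise ValueError("need at least one detector score")
    low, high = min(scores), max(scores)
    straddles_cutoff = low < cutoff < high
    if straddles_cutoff or (high - low) > max_spread:
        label = "uncertain"
    elif mean(scores) >= cutoff:
        label = "likely AI"
    else:
        label = "likely human"
    return Verdict(label, low, high)


agree_ai = combine_panel([0.91, 0.88, 0.95])
conflict = combine_panel([0.92, 0.35, 0.80])
agree_human = combine_panel([0.10, 0.20, 0.15])

for v in (agree_ai, conflict, agree_human):
    print(f"{v.label}: {v.low:.2f}-{v.high:.2f}")
```

reporting the range alongside the label keeps the disagreement visible instead of averaging it away.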

questions

are AI detectors accurate?

it depends on the content and the test. in peer-reviewed benchmarking (RAID, ACL 2024), detectors that advertise 99% or higher accuracy drop substantially on content they weren’t trained for or on text that has been deliberately altered. a single accuracy number is misleading, because it shifts with content type, with how the text was produced, and with the false-positive threshold chosen. treat any verdict, amige.’s included, as a probabilistic estimate rather than proof.

can an AI detector flag human writing as AI?

yes. a 2023 Stanford study in Patterns found that, on average, 61.3% of TOEFL essays by non-native English speakers were misclassified as AI-generated by seven popular detectors, against about 5.1% for native-speaker writing. that is why a single flag should never be treated as proof, especially for non-native English writers.

can you fool an AI detector by paraphrasing?

often, yes. a NeurIPS 2023 study showed a paraphrasing tool dropped one detector’s accuracy from 70.3% to 4.6% at a 1% false-positive rate without changing the meaning, and a NeurIPS 2025 adversarial paraphrasing attack cut true-positive rates by an average of about 88% across detectors. the “humanizer” tools sold online exploit exactly this weakness.

why shouldn’t I trust a “99% accurate” detector claim?

because regulators have acted on inflated claims. in 2025 the U.S. FTC ordered Workado to stop advertising its detector as roughly 98% accurate after its complaint alleged that independent testing showed 53% accuracy on general-purpose content. the final order requires Workado to hold competent and reliable evidence behind any accuracy claim it makes, which is why credible explainers cite third-party studies rather than their own marketing numbers.

do all AI detectors perform the same?

no. a 2025 Chicago Booth / Becker Friedman Institute study found wide variance across tools and content genres: miss rates ran from near-zero for the strongest tool to roughly 10-40% for others, with an open-source baseline near random. performance also shifted depending on whether the AI text was run through humanizer tools, so results from one tool or test set don’t carry over to the next.

sources.

01
Dugan et al. — RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors (ACL 2024)
primary peer-reviewed benchmark; source for the 99%-claimed-but-easily-fooled finding.
02
Liang et al. — GPT detectors are biased against non-native English writers (Patterns / Cell Press, 2023)
61.3% of non-native TOEFL essays misclassified vs ~5.1% for native writers.
03
FTC — Order Requires Workado to Back Up AI Detection Claims (April 2025)
~98% advertised vs an alleged 53% on general-purpose content; final order August 2025.
04
Krishna et al. — Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense (NeurIPS 2023)
DIPPER dropped DetectGPT from 70.3% to 4.6% at a 1% false-positive rate.
05
Cheng et al. — Adversarial Paraphrasing: A Universal Attack for Humanizing AI-Generated Text (NeurIPS 2025)
average ~88% true-positive-rate drop across detectors at a 1% false-positive rate.
06
Jabarian & Imas — Artificial Writing and Automated Detection (Chicago Booth / BFI Working Paper 2025-116)
independent test across six genres and four frontier models; wide variance between tools.
