how do you spot GPT-5 writing?

by Tuan Hoang · detection lead · last reviewed 2026-06-26

newer model, fainter tells.

developer: OpenAI
modality: text. accepts text and image input, returns text.
first release: Aug 7, 2025 (default in ChatGPT for all users)
architecture: a 'unified system': a fast model, a reasoning model, and a router
current version: GPT-5.5 / 5.5 Pro (Apr 23, 2026)
context window: 400K tokens (GPT-5 API); ~1M (GPT-5.5)
knowledge cutoff: Sept 30, 2024 (GPT-5); Dec 2025 (GPT-5.5)

yes, GPT-5 is real, and it's probably what wrote the text you're checking. OpenAI shipped it on August 7, 2025 as a “unified system” and made it the default in ChatGPT for every user, free tier included. it isn't one model: a fast model handles routine prompts, a deeper reasoning model handles the hard ones, and a real-time router decides which to use. since launch it has moved through a 5.x line, up to GPT-5.5 and 5.5 Pro in April 2026.

for anyone trying to tell whether a passage came from it, the load-bearing fact is this: GPT-5-era prose is statistically flatter and more human-like than older models' output, so the classic detector signals fade. the writing got better, which means the tells got fainter.

what GPT-5 writing looks like

no single word gives it away. the tells are statistical and structural, and on GPT-5 they're subtler than they were on GPT-3.5. four patterns show up most often, and every one of them is a tendency, not a fingerprint.

flat perplexity. perplexity measures how predictable each word is to a language model. GPT-5 tends to pick the high-probability next word, so its surprisal stays low and even. the result is competent, fluent, and uniform in a way that human drafts rarely sustain across a whole document.
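a minimal sketch of the arithmetic behind that signal, assuming you already have per-token probabilities from some scoring model. the function names and the probabilities are made up for illustration; this is not how any particular detector computes it.

```python
import math

def surprisals(token_probs):
    """Per-token surprisal in bits: -log2 p(token | context)."""
    return [-math.log2(p) for p in token_probs]

def perplexity(token_probs):
    """Perplexity is 2 raised to the mean surprisal in bits."""
    s = surprisals(token_probs)
    return 2 ** (sum(s) / len(s))

def surprisal_spread(token_probs):
    """Population standard deviation of surprisal: how even the text is."""
    s = surprisals(token_probs)
    mean = sum(s) / len(s)
    return math.sqrt(sum((x - mean) ** 2 for x in s) / len(s))

# Made-up probabilities, for illustration only.
flat_text = [0.5, 0.5, 0.5, 0.5]      # every token equally predictable
spiky_text = [0.9, 0.05, 0.8, 0.02]   # mostly predictable, a few surprises

print(perplexity(flat_text), surprisal_spread(flat_text))
print(perplexity(spiky_text), surprisal_spread(spiky_text))
```

the flat sequence has both lower perplexity and zero spread; the spiky one scores higher on both. "low and even" means both numbers stay small across the whole passage.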

tidy hierarchy. clean topic sentences, balanced paragraphs, lists where prose would do, and connective scaffolding (“however,” “overall,” “in short”). the structure is orderly to a fault. high burstiness, the mix of long and short sentences, fragments, and idiom that marks casual human writing, is comparatively rare unless the model is pushed for it.
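one rough way to put a number on burstiness is the spread of sentence lengths. the sketch below uses the coefficient of variation (standard deviation divided by mean), which is an assumption chosen for simplicity; real detectors may define burstiness differently, and the naive sentence splitter here is only for illustration.

```python
import re
import statistics

def sentence_lengths(text):
    """Split on ., !, ? and return the word count of each non-empty sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def burstiness(text):
    """Coefficient of variation of sentence length (stdev / mean).

    Returns 0.0 when there are fewer than two sentences.
    """
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)

uniform = "The model writes well. The prose stays even. The rhythm never shifts."
varied = ("No. It rambles on for a long while before it finally gets "
          "to the point. Then it stops.")

print(sentence_lengths(uniform), burstiness(uniform))
print(sentence_lengths(varied), burstiness(varied))
```

three four-word sentences score 0.0; the one-word, fourteen-word, three-word mix scores close to 1. like every tell in this section, a low score is a tendency, not a fingerprint.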

less effusive agreement. OpenAI tuned sycophancy down (its own figure: 14.5% to under 6%) and reported far fewer hallucinations than older models. so default GPT-5 hedges and pushes back more than GPT-4o did, and fabricates less. firmer, more careful framing is itself a soft signal.

visible reasoning structure. in reasoning mode, the deeper model often stages its logic in enumerated steps, unlike the looser way most people actually write.

why GPT-5 is harder to catch

GPT-5 text is harder to flag than GPT-3-era output, and that follows from how the models changed, not from opinion. perplexity (how predictable the text is to a language model) and burstiness (how much sentence length and complexity vary) are the two classic statistical signals. as models advanced, perplexity dropped and burstiness drifted toward human ranges, so single-signal detectors degrade. a detection vendor, Pangram, has a plain-language writeup of why those two signals fail on modern models.

the peer-reviewed picture is blunt. RAID (ACL 2024), a benchmark of more than 6 million generations across 11 models, 8 domains, and 11 adversarial attacks, found that detectors claiming 99% or higher accuracy are “easily fooled by adversarial attacks,” fail to generalize to generators they weren't trained on, and struggle to hold a safe false-positive rate. a 2026 benchmark, Detecting the Machine (Baidya et al.), reaches a similar conclusion: robustness depends on both the detection method and the specific model that wrote the text.
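to see why holding a safe false-positive rate is costly, here is a small sketch that picks a threshold from human-written scores and then checks how much AI text still clears it. every score below is invented for illustration; none of it comes from RAID or any real detector, and the helper names are hypothetical.

```python
def threshold_for_fpr(human_scores, target_fpr):
    """Pick a threshold so that the share of human texts scoring
    strictly above it is at most target_fpr."""
    ranked = sorted(human_scores, reverse=True)
    allowed = int(target_fpr * len(ranked))  # how many humans may be flagged
    if allowed >= len(ranked):
        return float("-inf")
    return ranked[allowed]

def rate_above(scores, threshold):
    """Fraction of scores strictly above the threshold."""
    return sum(s > threshold for s in scores) / len(scores)

# Invented detector scores (higher means "more AI-like").
human_scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.80, 0.95]
ai_scores = [0.50, 0.62, 0.70, 0.78, 0.85, 0.90, 0.97, 0.99]

threshold = threshold_for_fpr(human_scores, target_fpr=0.10)
print(threshold)                            # threshold chosen from human data
print(rate_above(human_scores, threshold))  # false-positive rate
print(rate_above(ai_scores, threshold))     # share of AI text still caught
```

with these made-up numbers, capping false positives at 10% puts the threshold at 0.80 and catches only half of the AI samples. the closer a generator's scores sit to the human range, the worse that trade gets.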

how amige. detects GPT-5

amige. treats a GPT-5 read the way the research says you must, as a probabilistic estimate and not proof. a trained model first reads the marks in the text and routes each scan to the classifiers strongest for it; that routing hint never enters the verdict. then a panel of independent detectors built by different teams scores the passage, their reads are fused and calibrated, and the confidence is capped so it never lands on a flat 0% or 100%.
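the fuse-then-cap idea can be sketched in a few lines. this is not amige.'s implementation: averaging in log-odds space, the 0.02 and 0.98 caps, and the function names are all assumptions picked for illustration, and the calibration step is left out entirely.

```python
import math

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))

def sigmoid(x):
    """Inverse of logit."""
    return 1 / (1 + math.exp(-x))

def fuse(detector_probs, floor=0.02, ceiling=0.98):
    """Average detector outputs in log-odds space, then cap the result.

    floor and ceiling are hypothetical values; they keep the fused
    confidence from ever reading as a flat 0% or 100%.
    """
    # Clip inputs first so a detector reporting exactly 0 or 1 cannot
    # produce an infinite log-odds value.
    clipped = [min(max(p, 1e-6), 1 - 1e-6) for p in detector_probs]
    mean_logit = sum(logit(p) for p in clipped) / len(clipped)
    return min(max(sigmoid(mean_logit), floor), ceiling)

print(fuse([0.9, 0.8, 0.7]))   # detectors broadly agree
print(fuse([1.0, 1.0, 1.0]))   # unanimous, still capped below 1
print(fuse([0.9, 0.1]))        # split panel lands near the middle
```

the cap is the point: even a unanimous panel returns 0.98 here, not certainty.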

attribution stays a best guess. amige. says a passage “looks like GPT-5-family output,” never that it “is” GPT-5. when the detectors disagree, the verdict sits in the uncertain middle band and amige. abstains rather than guessing, and it recalibrates as new generators ship. very short passages are too thin a signal for any statistical classifier to defend, so amige. won't accept scans below the minimum length the active text models need. more on the machinery in the machine.
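a sketch of that abstain-first logic, under loudly stated assumptions: the 50-word floor, the 0.35 to 0.65 uncertain band, and the 0.40 disagreement limit are hypothetical numbers, not amige.'s real settings, and the `Verdict` type is invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_WORDS = 50                  # hypothetical length floor
UNCERTAIN_BAND = (0.35, 0.65)   # hypothetical middle band
MAX_DISAGREEMENT = 0.40         # hypothetical max spread between detectors

@dataclass
class Verdict:
    label: str
    confidence: Optional[float]

def read_passage(text: str, detector_probs: List[float]) -> Verdict:
    """Refuse short text, abstain on disagreement, otherwise hedge."""
    if len(text.split()) < MIN_WORDS:
        return Verdict("rejected: too short", None)
    mean = sum(detector_probs) / len(detector_probs)
    spread = max(detector_probs) - min(detector_probs)
    low, high = UNCERTAIN_BAND
    if spread > MAX_DISAGREEMENT or low <= mean <= high:
        return Verdict("uncertain", mean)
    if mean > high:
        return Verdict("looks like GPT-5-family output", mean)
    return Verdict("looks human-written", mean)

long_text = "word " * 60
print(read_passage("far too short", [0.9, 0.9]))
print(read_passage(long_text, [0.9, 0.2]))        # detectors disagree
print(read_passage(long_text, [0.9, 0.85, 0.95]))
```

note that the labels never say "is": the strongest outcome is still "looks like."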

the false-positive risk to know about

the caveat that matters most: statistical detectors over-flag two kinds of writing that have nothing to do with AI. the first is short passages, where there isn't enough signal to commit. the second is text by non-native English speakers, whose prose tends to carry the same low burstiness the detectors key on. edited or “humanized” text and a single paraphraser pass also knock accuracy down, sometimes sharply.

so if you're a teacher, an editor, or anyone making a consequential call on someone's writing, don't use any detector's output, amige.'s included, as the sole basis for an accusation. read the verdict as one estimate among your other evidence, never as proof.
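a worked example shows why a flag is an estimate and not proof. the rates below are invented for the arithmetic (a detector that catches 90% of AI text and wrongly flags 5% of human text); they are not measurements of amige. or any other tool.

```python
def posterior_ai(prior_ai, true_positive_rate, false_positive_rate):
    """Bayes' rule: probability a flagged passage is actually AI-written."""
    flagged_ai = prior_ai * true_positive_rate
    flagged_human = (1 - prior_ai) * false_positive_rate
    return flagged_ai / (flagged_ai + flagged_human)

# Illustrative numbers only.
mostly_human_class = posterior_ai(prior_ai=0.10,
                                  true_positive_rate=0.90,
                                  false_positive_rate=0.05)
even_split = posterior_ai(prior_ai=0.50,
                          true_positive_rate=0.90,
                          false_positive_rate=0.05)

print(round(mostly_human_class, 3))  # about 0.667
print(round(even_split, 3))          # about 0.947
```

if only one submission in ten is AI-written, a flag from this hypothetical detector is right about two times in three. the other third are people who wrote their own work, which is why the verdict has to sit alongside other evidence.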

version history

Apr 23, 2026
GPT-5.5 / 5.5 Pro. ships a roughly 1M-token API context and a December 2025 cutoff. OpenAI's first model with about a million-token window.
reported Mar 2026
GPT-5.4. incremental update. date via a secondary timeline aggregator, not a primary OpenAI post.
reported Feb 2026
GPT-5.3-Codex. coding-focused variant. date via a secondary timeline aggregator.
Dec 11, 2025
GPT-5.2. December refresh of the 5.x line.
Nov 2025
GPT-5.1. first iteration on the GPT-5 system.
Aug 7, 2025
GPT-5. the unified system. replaced GPT-4o, GPT-4.1, GPT-4.5, o3 and o4-mini as the ChatGPT default.

questions

has OpenAI actually released GPT-5?

yes. GPT-5 launched on August 7, 2025 as a 'unified system' and became the default model in ChatGPT for all users, free tier included. OpenAI has since shipped a 5.x line: GPT-5.1, 5.2, 5.3-Codex, 5.4, and GPT-5.5 / 5.5 Pro (April 23, 2026).

what's different about GPT-5 versus GPT-4o?

GPT-5 isn't a single model but a routed system. a fast model handles most queries, a deeper reasoning model handles the hard ones, and a router picks between them automatically. OpenAI reports it hallucinates far less and is less sycophantic (its own figure: sycophancy down from about 14.5% to under 6%), so its writing tends to hedge and push back more than GPT-4o's.

can you detect text written by GPT-5?

you can estimate it, not prove it. GPT-5-era text is statistically flatter (low perplexity, more human-like burstiness) than older models' output, which makes detection harder. amige. returns a probabilistic verdict and frames attribution as 'looks like GPT-5-family output,' never a definitive label.

why aren't AI text detectors 100% accurate on GPT-5?

independent research (RAID, ACL 2024) found detectors advertising 99%-plus accuracy are easily fooled by adversarial edits and fail to generalize to models they weren't trained on. newer, more capable generators are specifically harder to flag, which is why honest tools report a confidence, not a certainty.

does GPT-5 detection produce false positives?

it can. statistical detectors are known to over-flag short passages and writing by non-native English speakers, whose prose carries the same flatness the detectors key on. treat any verdict as an estimate and weigh it alongside context, not as proof on its own.

sources.

01
OpenAI launches GPT-5 for all ChatGPT users — CNBC
Primary-press confirmation of the Aug 7, 2025 release and free-tier availability.
02
GPT-5 — Wikipedia
Unified-system / router architecture, the models it replaced, and the version timeline.
03
GPT-5 model reference — OpenAI API docs
Specs: 400K context window, 128K max output, text + image in / text out, Sept 30 2024 cutoff.
04
OpenAI launches GPT-5 as a unified system — The Decoder
Source for OpenAI's own self-reported hallucination and sycophancy figures.
05
RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors — ACL 2024
Peer-reviewed. The '99%+ accuracy' debunk plus adversarial and generalization failures.
06
Detecting the Machine: A Comprehensive Benchmark of AI-Generated Text Detectors — Baidya et al., arXiv (Mar 2026)
2026 benchmark across architectures and adversarial 'humanization' techniques.
07
Why Perplexity and Burstiness Fail to Detect AI — Pangram Labs (vendor)
Vendor commentary. Plain-language explainer of the two classic signals and their limits.
