
how do you spot a Grok Imagine video?

by Tuan Hoang · detection lead · last reviewed 2026-06-26
context first. pixels second.
developer: xAI
modality: image and video. text-to-image, image editing, image-to-video, with native synchronized audio.
first release: Aug 4, 2025 (Grok app, iOS)
access: Grok app (grok.com/imagine) + the Imagine API
video update: Grok Imagine 1.5 (June 2026)
max clip length: up to ~15 seconds
stated image model: Aurora (Dec 2024). video architecture not publicly confirmed.

Grok Imagine is xAI's image-and-video generator. it lives inside the Grok app at grok.com/imagine and is exposed through the Imagine API. it launched on August 4, 2025 for SuperGrok and Premium+ X subscribers on iOS, turning text or image prompts into images and short clips (up to about 15 seconds) with native synchronized audio. a “1.5” update in June 2026 added image-to-video at 720p, animating a single still while keeping the original image's detail and lighting intact.

xAI's stated image-generation model is “Aurora,” announced around December 2024 as an autoregressive mixture-of-experts network that builds images patch by patch from interleaved text and image data. xAI has not publicly confirmed the architecture behind Grok Imagine's video model, so treat any “Aurora powers the video” claim as unverified. Grok Imagine is best known publicly for its permissive “Spicy mode,” which is why “is this Grok-generated?” became a live question for a lot of people very fast.

what Grok Imagine output looks like

the tells split across the two modalities. for stills, watch surface quality. for clips, watch motion. neither is a fingerprint on its own.

waxy, plasticky skin. reviewers described Grok Imagine faces as sometimes looking “waxy” with “cartoonish” features (TechCrunch, Aug 2025). a soft, too-smooth surface where pores and micro-shadows should be is a common AI-image read.

text and logos that are almost right. legible signage, brand marks, and small printed text are an Aurora-lineage strength, so the failure shows up as subtly wrong rather than scrambled: a logo with the proportions off, a word that reads cleanly but isn't a real word. close inspection of fine real-world detail still gives generation away.

image-to-video artifacts. because clips are short and often animated from a single still, look for the failure modes general to image-to-video systems: temporal flicker, details that morph between frames, warping hands, teeth, and jewelry, and background elements that drift or pop in and out.
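
one rough way to put a number on "flicker" and "pop in and out" is to measure how much each frame differs from the one before it and flag transitions that jump well above the clip's typical change. the sketch below is illustrative only and is not amige.'s method: it assumes frames already decoded to small grayscale grids (lists of rows of floats), and the function names and the `ratio` threshold are hypothetical choices.

```python
from statistics import median


def frame_diffs(frames):
    """Mean absolute pixel change between each pair of consecutive frames."""
    diffs = []
    for prev, curr in zip(frames, frames[1:]):
        total = sum(
            abs(a - b)
            for row_prev, row_curr in zip(prev, curr)
            for a, b in zip(row_prev, row_curr)
        )
        count = sum(len(row) for row in curr)
        diffs.append(total / count)
    return diffs


def flicker_spikes(frames, ratio=3.0):
    """Indices of transitions that change more than `ratio` x the median change.

    `ratio` is an assumed, untuned threshold. Real cuts and fast motion
    also spike, so a hit is a prompt to look closer, not a verdict.
    """
    diffs = frame_diffs(frames)
    if not diffs:
        return []
    baseline = max(median(diffs), 1e-6)  # avoid a zero baseline on static clips
    return [i for i, d in enumerate(diffs) if d > ratio * baseline]


# tiny 1x2 "frames": smooth brightening, then a sudden pop at transition 2
clip = [[[0.0, 0.0]], [[0.1, 0.1]], [[0.2, 0.2]], [[0.9, 0.9]], [[1.0, 1.0]]]
spikes = flicker_spikes(clip)
```

a spike index `i` points at the jump between frame `i` and frame `i + 1`, which is where to step through by hand and check hands, teeth, jewelry, and background.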

audio that doesn't quite line up. native synchronized audio (dialogue, ambience, lip-sync) is a Grok Imagine signature, but generated sound often carries subtle lip-sync desync or an unnaturally clean, looping room tone. video and audio that disagree on timing open their own forensic surface.
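
lip-sync drift can be framed as a lag between two per-frame signals: how open the mouth is, and how loud the audio is. a minimal sketch, assuming both signals were already extracted by some other tool and resampled to one value per video frame (that extraction step is not shown, and `best_lag` is a hypothetical helper, not part of any amige. or xAI API):

```python
def best_lag(mouth, audio, max_lag=5):
    """Lag in frames at which the audio envelope best matches mouth motion.

    Positive means the audio trails the mouth, negative means it leads.
    Uses a plain mean-centered cross-correlation over a small lag window.
    """
    if not mouth or len(mouth) != len(audio):
        raise ValueError("signals must be non-empty and the same length")

    def centered(xs):
        m = sum(xs) / len(xs)
        return [x - m for x in xs]

    a, b = centered(mouth), centered(audio)
    n = len(a)
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(a[i] * b[i + lag] for i in range(n) if 0 <= i + lag < n)
        if score > best_score:
            best, best_score = lag, score
    return best


# mouth opens at frames 2 and 6; the matching sounds land 2 frames later
mouth_open = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
loudness = [0, 0, 0, 0, 1, 0, 0, 0, 1, 0]
lag = best_lag(mouth_open, loudness)
```

a non-zero lag on its own is weak evidence, since re-encoded real footage drifts too. it is more useful alongside the visual tells above.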

context, not just pixels. provenance signals matter more than any single artifact. check posting context (Grok and X share patterns), whether the subject is a public figure in an implausible scenario, and any surviving metadata. Grok Imagine's permissive policy makes a celebrity NSFW or deepfake setting a red flag in itself.

Grok output ships with no robust invisible watermark. absence of provenance tells you nothing on its own, so the read has to come from the pixels and the context.

how amige. detects Grok Imagine

amige. doesn't run one test. a trained model reads the marks a generator leaves and routes each scan to the detectors strongest for the modality in front of it (image vs video are different problems), then a panel of independent detectors built by different teams weighs in and the reads get fused and calibrated.
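
the route-then-fuse shape described above can be sketched in a few lines. this is an assumption-heavy illustration, not amige.'s implementation: the panel structure, the logit-space averaging, and the use of temperature scaling as the calibration step are all stand-ins chosen for the example.

```python
import math


def logit(p):
    p = min(max(p, 1e-6), 1 - 1e-6)  # clamp so 0 and 1 don't blow up
    return math.log(p / (1 - p))


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def fuse(scores, weights=None, temperature=1.0):
    """Weighted mean of detector scores in logit space, then temperature scaling.

    temperature > 1 pulls the fused score toward 0.5 (less confident).
    Both the fusion rule and the calibration are assumed, not documented.
    """
    if not scores:
        raise ValueError("need at least one detector score")
    weights = weights or [1.0] * len(scores)
    z = sum(w * logit(s) for w, s in zip(weights, scores)) / sum(weights)
    return sigmoid(z / temperature)


def scan(media, panels):
    """Route to the panel for this modality, run every detector, fuse the reads."""
    detectors = panels[media["modality"]]
    return fuse([detect(media) for detect in detectors])


# hypothetical panels: each detector returns P(AI-generated) for the media
panels = {
    "image": [lambda m: 0.2],
    "video": [lambda m: 0.8, lambda m: 0.8],
}
video_score = scan({"modality": "video"}, panels)
```

averaging in logit space rather than on raw probabilities keeps one very confident detector from being flattened by several lukewarm ones, which is one reason it is a common choice for combining independent scores.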

the model-attribution layer returns a best guess, not an identity. for Grok Imagine that reads as “looks like Grok Imagine” or “resembles Aurora-style output,” never “this is Grok.” when the detectors disagree, amige. returns “uncertain” rather than force a label. and because xAI keeps shipping (the 1.5 video update is one example), the panel recalibrates as the generators change rather than betting on a fixed signature. the whole pipeline is documented at “the machine.”
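
the "uncertain rather than force a label" behavior can be expressed as a small decision rule. a sketch under stated assumptions: the thresholds are invented for illustration, the input is a list of per-detector scores that the content resembles the named generator, and the "no strong match" label is a hypothetical placeholder rather than amige.'s wording.

```python
def attribution_label(scores, generator="Grok Imagine",
                      flag_at=0.7, clear_at=0.3, max_spread=0.4):
    """Turn per-detector resemblance scores into a hedged label.

    Returns "uncertain" when detectors disagree (wide spread) or when the
    average sits in the middle band. All thresholds are assumed values.
    """
    if not scores:
        raise ValueError("need at least one detector score")
    spread = max(scores) - min(scores)
    average = sum(scores) / len(scores)
    if spread > max_spread or clear_at < average < flag_at:
        return "uncertain"
    if average >= flag_at:
        return f"looks like {generator}"  # resemblance, never identity
    return "no strong match"  # hypothetical label for the low band


agree = attribution_label([0.90, 0.85, 0.80])
split = attribution_label([0.90, 0.20, 0.80])
```

note that the split panel returns "uncertain" even though two of three detectors are confident. the disagreement itself is the signal worth reporting.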

one practical note for Grok specifically: there's no cryptographic provenance to lean on the way there is with a watermark-by-default generator. that throws more weight onto the statistical signals (surface texture on stills, temporal coherence and audio-video sync on clips) and onto posting context. a confident flag on a clip that also shows morphing detail or lip-sync drift is strong evidence. a confident flag with none of the obvious tells is meaningful, but read the per-detector agreement before treating it as final.

the Spicy-mode problem

Grok Imagine's Spicy mode permits NSFW content most tools restrict, and that is where the trouble started. The Verge's Jess Weatherbed reported the tool produced explicit nonconsensual deepfakes of Taylor Swift from an innocuous Coachella prompt, without any attempt to bypass safeguards (AI Incident Database, incident 1165, around August 5, 2025). in August 2025 the Consumer Federation of America and more than a dozen consumer-protection groups asked the FTC and all 50 state attorneys general to investigate. by early 2026, reported analyses put the scale at roughly 6,700 sexually suggestive or nudified images generated per hour, and several governments opened probes, including over images of minors. treat those figures as reported and secondary, not settled.

this is why the detection question matters here more than for most generators. for more, see “what counts as a deepfake.”

version history

  1. June 2026
    Grok Imagine 1.5. Added image-to-video at 720p, animating a single still while keeping the original image's detail and lighting. covered as a competitor to Seedance and Google Veo.
  2. Aug 2025
    Grok Imagine (launch). Images and short clips (up to ~15s) with native synchronized audio, from text or image prompts. SuperGrok / Premium+ on iOS. shipped with the permissive Spicy mode.
  3. Dec 2024
    Aurora (image model). xAI's stated image-generation approach: an autoregressive mixture-of-experts network that predicts the next token over interleaved text and image data, building images patch by patch. trained on billions of internet examples.

questions

what is Grok Imagine?

Grok Imagine is xAI's image-and-video generator, built into the Grok app and offered through the Imagine API. it turns text or image prompts into images and short clips (up to about 15 seconds) with native synchronized audio. it launched on August 4, 2025 for SuperGrok and Premium+ X subscribers on iOS.

does Grok Imagine make video, or just images?

both. it does text-to-image, image editing, and image-to-video. xAI's docs list separate model variants for image and for video (grok-imagine-video-1.5), and a June 2026 1.5 update added 720p image-to-video that animates a single still while keeping its detail and lighting.

what model powers Grok Imagine?

xAI's stated image-generation model is Aurora, announced in December 2024: an autoregressive mixture-of-experts network that builds images patch by patch from interleaved text and image data, trained on billions of internet examples. xAI has not publicly detailed the architecture behind the video model, so any 'Aurora powers the video' claim is unconfirmed and should be read with caution.

why is Grok Imagine controversial?

its permissive Spicy mode reportedly produced nonconsensual sexual deepfakes, including of Taylor Swift from an innocuous Coachella prompt per The Verge. that prompted a Consumer Federation of America-led coalition to demand an FTC investigation in August 2025, and reported analyses and government probes in several countries have since followed, including over images of minors.

can amige. tell if something came from Grok Imagine?

amige. returns a probabilistic verdict on whether content looks AI-generated, plus a best guess on which generator it resembles, for example 'looks like Grok Imagine / Aurora-style output.' it's an estimate, not proof, and attribution resembles rather than identifies. it's the doubt beat: it's real? or is it?

sources.

  1. Imagine overview · xAI developer docs (official)
    Product name, image + video capabilities, model variants, ~15s duration.
  2. Grok Imagine, xAI's new AI image and video generator, lets you make NSFW content · TechCrunch
    Launch coverage: Aug 4 2025, SuperGrok / Premium+ on iOS, 15s clips with native audio, Spicy mode, the 'waxy / cartoonish' quality note.
  3. xAI updates Grok Imagine to 1.5 with image-to-video at 720p · The Decoder
    1.5 image-to-video, 720p, single-still animation, June 2026, framed against Seedance and Veo.
  4. What is xAI Aurora? Inside Grok's image generator · EM360Tech
    Aurora architecture: autoregressive mixture-of-experts, next-token over interleaved text + image, ~Dec 2024.
  5. Incident 1165: Grok Imagine reportedly produces nonconsensual Taylor Swift deepfakes · AI Incident Database
    Structured incident record citing The Verge / Jess Weatherbed, ~Aug 5 2025: Spicy mode + a Coachella prompt, without bypassing safeguards.
  6. Consumer safety groups demand an FTC investigation into Grok's Spicy mode · EPIC (reposting The Verge)
    A Consumer Federation of America-led coalition asked the FTC and all 50 state attorneys general to investigate, Aug 2025.
  7. Grok sexual deepfake scandal · Wikipedia
    Secondary aggregator for the reported ~6,700 nudified-images-per-hour figure and multi-government investigations. treat numbers and dates as secondary.