
what is detection routing?

by Tuan Hoang · detection lead · last reviewed 2026-06-26

the right expert for each scan.

detection routing is the practice of sending each scan to the detectors best suited to it instead of running every detector equally on every input. it is the gating idea behind mixture-of-experts, applied to AI-content detection.

the idea is older than AI-content detection by decades. in 1991, Jacobs, Jordan, Nowlan, and Hinton described a system “composed of several different ‘expert’ networks plus a gating network that decides which of the experts should be used for each” case. that paper, “Adaptive Mixtures of Local Experts,” is the root of what the field now calls mixture-of-experts: pair a set of specialist sub-models with a small router, then let the router pick who handles each input. in the sparsely gated versions used today, only the chosen experts run, which keeps the compute well below what running every expert would cost.

the idea scaled. Shazeer et al. (2017) built a trainable gating network that selects “a sparse combination of these experts to use for each example,” reporting greater than 1000x gains in model capacity at minor cost in efficiency. that is a capacity-and-compute claim, not a detection-accuracy one. Switch Transformer (Fedus, Zoph & Shazeer, 2021) cut the routing down further, sending each input to a single expert and reporting up to roughly 7x pre-training speedup at constant compute per token. again: a speed-and-scale figure from the authors, about language models, not about catching fakes.
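the gating step described above can be sketched in a few lines. this is a minimal illustration, not the code from either paper: the router here is a hand-written function rather than a trained network, the noise and load-balancing terms that Shazeer et al. describe are left out, and the toy experts and every name in the block are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_gate(scores, k):
    """Map the k highest-scoring expert indices to weights that sum to 1."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([scores[i] for i in chosen])
    return dict(zip(chosen, weights))

def sparse_moe(x, experts, router, k):
    """Run only the experts the router selects and mix their outputs."""
    gates = top_k_gate(router(x), k)
    # experts outside `gates` are never called, which is the sparsity
    return sum(weight * experts[i](x) for i, weight in gates.items())

# hypothetical toy experts and router, for illustration only
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: -x]
router = lambda x: [0.1 * x, 2.0, -1.0]

top1 = sparse_moe(3, experts, router, k=1)  # switch-style: a single expert
top2 = sparse_moe(3, experts, router, k=2)  # weighted mix of two experts
```

with k=1 the input goes to one expert only, which is the switch-style case. with k=2 the two selected outputs are blended by their gate weights and the third expert never runs.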

routing is not an ensemble

an ensemble runs many models and combines their outputs. routing decides which models to run in the first place. the difference is selection versus averaging, and the two can be stacked: route to a subset, then combine what comes back. a cascade is the simplest routing variant, running a fast model first and escalating to a stronger one only when confidence is low. FrugalGPT (Chen, Zaharia & Zou, 2023) is the canonical example, an “LLM cascade which learns which combinations of LLMs to use for different queries,” with the authors reporting up to 98% cost reduction while matching the performance of the best individual model, GPT-4 in their example. that headline is a self-reported cost figure, not a measure of detection accuracy.
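the contrast is easier to see side by side. the sketch below is an illustration under stated assumptions, not FrugalGPT's method: each cascade stage is assumed to return its own confidence, where FrugalGPT learns a separate scorer, and the function names, the threshold, and the toy models are all hypothetical.

```python
def ensemble(x, models):
    """Averaging: every model runs, then the scores are combined."""
    scores = [model(x) for model in models]
    return sum(scores) / len(scores)

def cascade(x, stages, min_confidence):
    """Selection by escalation: run stages cheapest-first and stop early.

    Each stage returns (score, confidence). The last stage's answer is
    accepted even if its confidence is low, since nothing stronger is left.
    Returns (score, number_of_stages_run).
    """
    if not stages:
        raise ValueError("cascade needs at least one stage")
    for ran, stage in enumerate(stages, start=1):
        score, confidence = stage(x)
        if confidence >= min_confidence:
            break
    return score, ran

# hypothetical stand-ins for a fast model and a stronger, costlier one
fast_unsure = lambda x: (0.55, 0.20)
fast_sure = lambda x: (0.10, 0.90)
strong = lambda x: (0.90, 0.95)

escalated = cascade("scan", [fast_unsure, strong], min_confidence=0.8)
stopped_early = cascade("scan", [fast_sure, strong], min_confidence=0.8)
averaged = ensemble("scan", [lambda x: 0.2, lambda x: 0.8])
```

the ensemble always pays for every model. the cascade pays for the stronger model only when the fast one is unsure, which is where the cost saving comes from.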

why route a detection scan

because no single model is best at everything. system-level routing, the same idea moved up from internal layers to whole models, aims to “route each query to the model that is most likely to produce a correct answer,” as RouterBench (Hu et al., 2024) frames it, since “no single model can optimally address all tasks.” detection has the same shape. a 2025 review of AI-image detection (Mahara & Rishe) notes that generative models leave distinct, sometimes generator-specific fingerprints, yet detectors “struggle with generalization, often misclassifying images from unseen generative models as real.” a detector strong on one generator family can be weak on the next. routing leans toward the detectors that tend to be strongest for an input's likely generator instead of diluting that signal by averaging everything equally.
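as a rough sketch of what “leaning toward the strongest detectors” could look like: given a guess about which generator family produced an input, rank detectors by their expected strength and keep the top few. everything here is an assumption made for illustration, including the detector names, the generator families, and the strength numbers, which are invented and not measured results.

```python
def route(generator_probs, strength, k):
    """Return the k detectors with the highest expected strength.

    generator_probs: {generator_family: probability} for this input
    strength: {detector: {generator_family: score in [0, 1]}}
    A detector with no entry for a family gets 0.0 for it, so unseen
    generators earn no credit rather than an optimistic default.
    """
    expected = {
        detector: sum(
            prob * per_generator.get(family, 0.0)
            for family, prob in generator_probs.items()
        )
        for detector, per_generator in strength.items()
    }
    return sorted(expected, key=expected.get, reverse=True)[:k]

# hypothetical strength table; the numbers are made up
strength = {
    "det_a": {"gan": 0.9, "diffusion": 0.5},
    "det_b": {"gan": 0.5, "diffusion": 0.9},
    "det_c": {"gan": 0.6, "diffusion": 0.6},
}

likely_diffusion = {"diffusion": 0.8, "gan": 0.2}
selected = route(likely_diffusion, strength, k=2)
```

a table like this only helps for generators it already covers. for a family no detector has seen, every expected strength collapses toward zero, which is the generalization gap the review describes.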

routing does not solve the generalization problem, and it would be dishonest to say it does. the mapping from mixture-of-experts and LLM routing onto “send this scan to the best detector for the likely generator” is an explanatory analogy, not a benchmarked technique with a peer-reviewed accuracy number attached. cross-generator generalization stays genuinely hard, and a new generator can degrade older detectors on contact. that is exactly why any verdict here reads as a probabilistic estimate, a best guess, never proof.

this is how amige. reasons about a scan. it treats each available detector as an expert and steers the input toward the ones that tend to perform best for it, runs a panel of independent detectors built by different teams, names the likely maker as a best guess (“looks like” rather than “is”), and returns “uncertain” when the reads conflict instead of forcing a call. as generators change, the routing gets recalibrated. you can follow the whole path end to end in the machine.
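the “uncertain when the reads conflict” behavior can be illustrated with a small sketch. this is not amige's actual logic, which the text above does not specify: the thresholds, the labels, and the choice to measure conflict as the gap between the highest and lowest read are all assumptions made for the example.

```python
from statistics import mean

def verdict(reads, max_spread=0.4, margin=0.15):
    """Turn a panel of detector reads into a hedged label.

    reads: probabilities in [0, 1] that the input is AI-generated,
           one per independent detector.
    max_spread: hypothetical limit on disagreement between detectors.
    margin: hypothetical dead zone around 0.5 where no call is made.
    """
    if not reads:
        return "uncertain"
    spread = max(reads) - min(reads)
    average = mean(reads)
    # conflicting reads, or a near coin-flip average: do not force a call
    if spread > max_spread or abs(average - 0.5) < margin:
        return "uncertain"
    return "likely AI-generated" if average > 0.5 else "likely not AI-generated"

agreeing = verdict([0.90, 0.85, 0.80])
conflicting = verdict([0.95, 0.10, 0.60])
```

the first panel agrees and returns a call. the second has one detector near certain and another near zero, so the function returns “uncertain” instead of averaging the disagreement away.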

questions

what is detection routing?

it is the practice of sending an incoming scan to the detector or detectors best suited to it, instead of running every detector equally on every input. it borrows the mixture-of-experts idea, a small “router” or “gating” step that picks the right experts (Shazeer et al., 2017; arXiv:1701.06538), and applies it to AI-content detection.

how is routing different from an ensemble?

an ensemble runs many models and combines their outputs; routing decides which models to run in the first place. the two can be combined (route to a subset, then combine), but the core distinction is selection versus averaging. a cascade is a routing variant that escalates from a cheap model to a stronger one only when confidence is low (FrugalGPT; arXiv:2305.05176).

is detection routing the same as mixture of experts?

not quite. mixture of experts (Jacobs et al., 1991) is the foundational neural-network architecture in which a gating network routes each input to specialist sub-networks. “detection routing” applies the same principle at the system level, across whole detectors rather than internal layers, much as LLM routing does for language models (RouterBench, 2024; arXiv:2403.12031).

why route a scan instead of just using one strong detector?

because no single model is best at everything. research on AI-image detection finds generator-specific fingerprints and notes that detectors often fail to generalize to unseen generators (Mahara & Rishe, 2025; arXiv:2502.15176). routing aims to apply the strongest available signal for a given input. the output is still a probabilistic best guess, not proof.

sources.

01
Jacobs, Jordan, Nowlan & Hinton, Adaptive Mixtures of Local Experts — Neural Computation 3(1):79-87 (1991)
the seminal paper: specialist expert networks plus a gating network that routes each case to the right expert. the historical anchor for routing.
02
Shazeer et al., Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (2017, arXiv:1701.06538)
defines the trainable gating network that selects a sparse combination of experts per example. source of the greater-than-1000x capacity figure, a capacity claim kept attributed and dated.
03
Fedus, Zoph & Shazeer, Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (2021, arXiv:2101.03961)
simplifies routing to one expert per input (top-1 / switch routing). the speed and scale figures belong to the authors, and they are about compute, not detection accuracy.
04
Hu et al., RouterBench: A Benchmark for Multi-LLM Routing System (2024, arXiv:2403.12031)
frames system-level routing (route each query to the model most likely to produce a correct answer) and ships a benchmark of 405k-plus precomputed outputs across 11 models.
05
Chen, Zaharia & Zou, FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance (2023, arXiv:2305.05176)
the canonical cascade reference: cheap model first, escalate on low confidence. the up-to-98% cost-reduction figure is self-reported by the authors, not a detection benchmark.
06
Mahara & Rishe, Methods and Trends in Detecting AI-Generated Images: A Comprehensive Review (2025, arXiv:2502.15176)
the detection tie-in: generator-specific fingerprints exist and detectors struggle to generalize to unseen generators, the real-world motivation for routing.
