AI Engines Cite Different Sources · 8-Engine GEO Study (2026) | Qwairy

Part 2 - the control: engines agree with themselves about 4x more

Here is the objection that could sink the whole study, and the test that answers it. AI engines are non-deterministic: ask twice, get two different answers.

If a single engine disagrees with itself as much as it disagrees with a rival, then Part 1 is just noise and means nothing.

So we measured it.

Because every prompt runs repeatedly, we can take the same engine on the same question across two separate runs and compute the identical third-party domain overlap - one engine versus itself. That is the baseline the cross-engine numbers have to be read against.
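As a minimal sketch of that baseline, assuming overlap is measured as Jaccard (shared domains divided by total distinct domains, one of the measures named under Methodology): the domain sets below are invented for illustration and are not study data, and the variable names are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Shared domains divided by total distinct domains; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical third-party domains cited for ONE question (illustrative only).
engine_a_run1 = {"reddit.com", "wikipedia.org", "techradar.com", "forbes.com"}
engine_a_run2 = {"reddit.com", "wikipedia.org", "reuters.com", "arxiv.org"}
engine_b_run1 = {"youtube.com", "reddit.com", "instagram.com", "linkedin.com"}

# Baseline: the same engine against its own re-run of the same question.
self_overlap = jaccard(engine_a_run1, engine_a_run2)

# Comparison: the same engine against a rival on the same question.
cross_overlap = jaccard(engine_a_run1, engine_b_run1)

# How many times more an engine agrees with itself than with the rival.
ratio = self_overlap / cross_overlap

print(f"self={self_overlap:.2f} cross={cross_overlap:.2f} ratio={ratio:.1f}x")
```

In the study these per-question values would be aggregated over many prompts before any ratio is reported; a single question like this one is only meant to show the shape of the comparison.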

[Figure: grouped bars per engine. Each engine's overlap with its own re-runs (purple) towers over its overlap with other engines (light), roughly 4x higher across the board.]

Two honest readings sit side by side here. First, engines really are noisy.

Self-consistency is around 41% for Perplexity and sits near 28% for ChatGPT - meaning ChatGPT re-cites only about a quarter to a third of the same domains when you ask it the same thing a few days later.

The story is not "engines are deterministic and they disagree." Engines are genuinely unstable, and a single answer should never be read as the truth.

Second, and decisively, self-consistency is far above cross-engine overlap. Every engine re-cites its own sources several times more often than it shares sources with a rival.

Engine	Agrees with itself (same week)	Agrees with other engines	Ratio
Claude	~57%	~11%	~5.4x
AI Overviews	~35%	~7%	~4.9x
Perplexity	~41%	~9%

Note: Claude's cross-engine figures rest on a smaller prompt sample than the other engines and should be read as indicative.

This is the number that defuses the objection. Across the eight engines, an engine agrees with itself about four times more than it agrees with any other engine - and never less than 2.8x.

The cross-engine gap is not the engines being randomly noisy; the noise floor is measured, and the divergence sits four times above it. (We compare like with like: the self-baseline uses re-runs a few days apart, matching the timing of the cross-engine comparison.

Self-consistency decays as runs drift further apart in time, which is exactly why we anchor on the same short window for both.)
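One way to picture the time-gap handling is the sketch below. It pairs consecutive runs of one engine on one prompt and averages their overlap per gap bucket. The bucket edges (7, 14, 30 days), the function names and the sample runs are all assumptions made for illustration; the study does not state which buckets it uses.

```python
from collections import defaultdict
from datetime import date

def jaccard(a: set, b: set) -> float:
    """Shared domains divided by total distinct domains; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def self_consistency_by_gap(runs, bucket_edges=(7, 14, 30)):
    """runs: list of (run_date, set_of_domains) for ONE engine and ONE prompt.

    Pairs each run with the next one in time and averages the Jaccard
    overlap inside each time-gap bucket. Bucket edges are hypothetical.
    """
    ordered = sorted(runs, key=lambda r: r[0])
    buckets = defaultdict(list)
    for (d1, s1), (d2, s2) in zip(ordered, ordered[1:]):
        gap = (d2 - d1).days
        # First edge that the gap fits under, else the open-ended last bucket.
        label = next((f"<={e}d" for e in bucket_edges if gap <= e),
                     f">{bucket_edges[-1]}d")
        buckets[label].append(jaccard(s1, s2))
    return {label: sum(v) / len(v) for label, v in buckets.items()}

# Invented runs for illustration only.
runs = [
    (date(2026, 4, 1), {"a.com", "b.com", "c.com"}),
    (date(2026, 4, 4), {"a.com", "b.com", "d.com"}),   # 3 days after the first run
    (date(2026, 5, 10), {"a.com", "e.com", "f.com"}),  # 36 days after the second run
]
by_gap = self_consistency_by_gap(runs)
print(by_gap)
```

Comparing only the short-gap bucket against cross-engine pairs captured in a similar window is what keeps the baseline and the comparison like with like.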

Part 3 - Google is not one surface

The single starkest result hides inside Google itself. AI Overviews (the snippet at the top of a results page) and AI Mode (the conversational search experience) are the same company, often the same query - and they agree with each other less than one time in five.

[Stat card: Google's AI Overviews and AI Mode cite the same source only 19% of the time.]

19% is the highest overlap anywhere in this study - and it is two Google products.

Every cross-vendor pair is lower. So a guide that says "optimize for Google's AI" is already ambiguous about which Google it means: a brand can be well-cited in AI Overviews and nearly absent from AI Mode, because the two surfaces are pulling from substantially different source sets.

Treating "Google AI" as one optimization target is a category error before you even get to ChatGPT.

Part 4 - why: different source diets

The overlap is low because the engines are not reading the same kind of web. Look at what each one cites most across the shared prompt set.

[Figure: two ranked lists. Google AI's most-cited sources are dominated by social platforms; ChatGPT's are dominated by Reddit plus trade and reference press.]

Google's AI surfaces lean on the social web - YouTube, Reddit, Instagram, Facebook, TikTok, LinkedIn. ChatGPT leans on Reddit plus earned editorial and reference - TechRadar, Wikipedia, arXiv, Forbes, Reuters - and cites YouTube a fraction as often as Google does.

Reddit is the one heavy overlap (both lean on it hard), which is precisely why the residual overlap isn't zero. But strip the giant platforms out entirely and the picture barely moves: the divergence is in the long tail of sources too, not just the headline platforms.

This is the mechanism behind every number above.

If your off-site presence is concentrated where one engine looks and thin where another looks, you will win one and lose the other - and no amount of on-page SEO changes which sources an engine reaches for.

What this means for your GEO strategy

There is no "optimize once for all of AI." The domains that earn Google AI citations overlap with ChatGPT's by about 7%. Building for one engine moves the needle on that engine, not on the others. Plan per engine, not for "AI" as a monolith.

Track Google as two surfaces. AI Overviews and AI Mode share 19% of their sources. Blend them into one "Google AI" score and you will miss real movement on each. They are two channels.

Match your off-site work to each engine's diet. Google's surfaces reward a strong social and community footprint (YouTube, Reddit, Instagram); ChatGPT rewards Reddit plus earned press and reference coverage. The source families differ, so the earned-media plan differs.

Measure citation share per engine, not blended. A single "AI sources" report averages away the fact that you are winning one engine and losing another. Break it out by engine, and by Google surface.

Never trust a single check. Engines re-cite only a quarter to a half of their own sources run to run. One answer is one draw from a noisy distribution. Read citation rates over many runs, or you are reacting to noise.
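To make "read citation rates over many runs, per engine" concrete, here is a small sketch. It assumes each run is stored as a set of cited domains; the engine labels, domain names and function name are hypothetical and the data is invented.

```python
from collections import Counter

def citation_rates(runs):
    """runs: list of sets of cited domains, one set per run.

    Returns domain -> share of runs in which that domain was cited.
    """
    if not runs:
        return {}
    # Each run is a set, so a domain counts at most once per run.
    counts = Counter(domain for run in runs for domain in run)
    return {domain: n / len(runs) for domain, n in counts.items()}

# Invented repeated runs of the same prompt, kept separate per engine.
runs_by_engine = {
    "engine_a": [
        {"brandreview.example", "reddit.com"},
        {"reddit.com"},
        {"reddit.com", "wikipedia.org"},
        {"brandreview.example"},
    ],
    "engine_b": [
        {"youtube.com"},
        {"youtube.com", "reddit.com"},
    ],
}

# One rate table per engine, never blended into a single "AI" figure.
rates = {engine: citation_rates(runs) for engine, runs in runs_by_engine.items()}

for engine, table in rates.items():
    print(engine, sorted(table.items(), key=lambda kv: -kv[1]))
```

A domain cited in one run out of four reads very differently from one cited in every run, which is the distinction a single check cannot show.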

Common mistakes

"Google says AI optimization is just SEO, so one playbook covers all of AI"

Reality: the domains Google AI cites overlap with ChatGPT's by about 7% on the same question. SEO fundamentals may help you get crawled everywhere, but which sources each engine reaches for differs sharply. One source-building playbook does not transfer across engines.

"Get cited in one AI engine and you're cited across all of them"

Reality: cross-engine citation overlap runs 4-19%, and the top source matches 4-13% of the time. Presence in one engine is weak evidence of presence in another.

"Google AI is one surface, optimize for it once"

Reality: AI Overviews and AI Mode agree with each other only 19% of the time - the highest agreement in the study, and it is two Google products. Optimize for each surface.

"Engines disagree because AI is random - it doesn't mean anything"

Reality: each engine agrees with its own re-runs about 4x more than with any other engine. The randomness is real and measured, and the cross-engine divergence sits four times above it. The gap is structural.

About this study

Scope: eight AI surfaces - Google AI Overviews, Google AI Mode, ChatGPT, Perplexity, Claude, Gemini, Microsoft Copilot, xAI Grok - measured on production client brands over a 90-day window (March 13 to June 11, 2026; audit and test brands excluded). Every prompt is run repeatedly per engine, which is what enables the self-consistency control.

Methodology: quantitative analysis of the source citations in completed AI answers. For each (question, engine) we take the engine's most recent answer as its representative citation set, normalize every cited host to its registrable domain (eTLD+1) via the public suffix list, exclude the monitored brand's own domain(s), and compute set overlap against other engines: Jaccard (shared / total distinct domains), asymmetric containment, and top-1 (position-1 domain) agreement. The cohort is questions answered with citations by ChatGPT, Perplexity and at least one Google AI surface in the window. Self-consistency uses prompts with two or more answers per engine: consecutive answers are paired and the same third-party overlap is computed, bucketed by the time gap between runs.
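The pipeline described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the study's implementation: the tiny `MULTI_LABEL_SUFFIXES` set is a stand-in for the full Public Suffix List, the function names are hypothetical, the URLs are invented, and citations are assumed to arrive as full URLs (with scheme) in citation order.

```python
from urllib.parse import urlsplit

# Stand-in for the Public Suffix List (assumption: a real pipeline loads the full list).
MULTI_LABEL_SUFFIXES = {"co.uk", "com.au", "co.jp"}

def registrable_domain(url: str) -> str:
    """Reduce a URL to its registrable domain (eTLD+1) using the stand-in suffix set."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])

def third_party_domains(urls, brand_domains):
    """Ordered, de-duplicated registrable domains with the brand's own domains removed."""
    ordered = []
    for url in urls:
        domain = registrable_domain(url)
        if domain and domain not in brand_domains and domain not in ordered:
            ordered.append(domain)
    return ordered

def overlap_metrics(a, b):
    """a, b: ordered domain lists for engines A and B on the same question."""
    set_a, set_b = set(a), set(b)
    shared, union = set_a & set_b, set_a | set_b
    return {
        "jaccard": len(shared) / len(union) if union else 0.0,
        # Asymmetric: of A's domains, the share also cited by B.
        "containment_a_in_b": len(shared) / len(set_a) if set_a else 0.0,
        # Do both engines put the same domain in position 1?
        "top1_match": bool(a and b and a[0] == b[0]),
    }

# Invented citations for one question.
brand = {"brand.example"}
engine_a = third_party_domains(
    ["https://www.reddit.com/r/x", "https://en.wikipedia.org/wiki/Y",
     "https://shop.brand.example/p", "https://old.reddit.com/r/z"], brand)
engine_b = third_party_domains(
    ["https://m.youtube.com/watch?v=1", "https://reddit.com/r/x",
     "https://news.bbc.co.uk/story"], brand)

metrics = overlap_metrics(engine_a, engine_b)
print(engine_a, engine_b, metrics)
```

Note that containment is directional: swapping the two engines generally gives a different value, which is why the results table reports it as a range per engine pair.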

Limitations

Non-determinism is measured, not eliminated. Self-consistency is well below 100% (about 25-57% depending on engine). We quantify the noise floor and show the cross-engine gap sits ~4x above it; we do not claim engines are stable.
Registrable-domain grain. We count a source as shared if the engines cite the same registrable domain, which merges subdomains. At the exact-URL level overlap is even lower (~3%), so this choice is conservative - it makes the engines look more alike than they are.
Citation cardinality varies by engine. Some engines cite many more domains per answer than others. Raw Jaccard penalizes uneven set sizes; we re-checked with a size-normalized variant (cutting both sets to the same top-k) and the ranking holds.
Timing. Cross-engine answers are not always captured at the same instant; restricting to answer pairs within two weeks of each other does not change the result.
Monitored-brand population. These are brands actively tracked by their owners, not a neutral sample. The comparison controls for this because all engines see the same brands and prompts.
Snapshot. This is a 90-day window. AI surfaces evolve quickly; a repeat next quarter may show different numbers.
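The size-normalized check mentioned under citation cardinality can be illustrated with a short sketch. It assumes each engine's domains are available as a list ordered by citation position and that both lists are cut to the same k; how k is chosen is not stated in the study, so k=3 here is arbitrary and the domain lists are invented.

```python
def jaccard(a: set, b: set) -> float:
    """Shared domains divided by total distinct domains; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def topk_jaccard(a: list, b: list, k: int) -> float:
    """Jaccard computed on the first k domains of each ordered list."""
    return jaccard(set(a[:k]), set(b[:k]))

# Invented example: one engine cites many domains, the other only a few.
long_list = ["reddit.com", "youtube.com", "wikipedia.org", "forbes.com",
             "reuters.com", "linkedin.com", "tiktok.com", "facebook.com"]
short_list = ["reddit.com", "wikipedia.org", "arxiv.org"]

raw = jaccard(set(long_list), set(short_list))        # dragged down by the size mismatch
normalized = topk_jaccard(long_list, short_list, k=3)  # both lists cut to the top 3

print(f"raw={raw:.2f} top-3={normalized:.2f}")
```

The raw score is low partly because the longer list inflates the union; cutting both lists to the same length removes that effect, so any remaining gap reflects genuinely different sources.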

Transparency notes

All figures are relative measures - rates, ratios, per-answer overlap - never raw volume counts.
The monitored brand's own domain is excluded from every overlap figure, so the numbers reflect third-party citations only and are not inflated by the trivial self-link every engine includes.
We report each engine's own run-to-run self-consistency as the baseline, so the cross-engine overlap is interpreted against the measured noise floor rather than against an assumption of determinism.

Sources and references

Google Search Central, Optimizing your website for generative AI features on Google Search (May 2026) and the announcement post A new resource for optimizing for generative AI in Google Search (May 15, 2026)
Related research: The Two Blind Spots in AI Visibility, The ChatGPT Linking Shift

Engine pair (third-party domains)	Overlap (Jaccard)	Of A's domains, share also cited by B	Same #1 source
Google AI vs ChatGPT	~7%	13-17%	~6%
Google AI vs Perplexity	~11%	19-23%	~6%
ChatGPT vs Perplexity	~7%

Same Question, Different Web: AI Engines Barely Cite the Same Sources (June 2026)
