Data Methodology

How we collect, process, and analyze AI search citation data. Transparency is core to our research — here is exactly how the Qwairy Data Hub works.

1. Data Collection

We send structured prompts to 7 AI search providers: ChatGPT (OpenAI), Perplexity, Gemini (Google), AI Overview (Google), AI Mode (Google), Copilot (Microsoft), and Grok (xAI).

Prompts are designed to reflect real user queries across multiple industries, intent types (informational, comparative, transactional), and geographies. Each prompt is localized to the target country and language.

Every monitoring cycle processes thousands of prompts across all providers and countries. Responses are captured in full, including inline citations, source links, and recommendation context.
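The shape of a monitoring prompt can be sketched as a simple record. This is an illustrative data structure, not Qwairy's actual schema; all field names are assumptions.

```python
from dataclasses import dataclass

# The seven providers listed above, as short identifiers (illustrative).
PROVIDERS = ("chatgpt", "perplexity", "gemini", "ai_overview",
             "ai_mode", "copilot", "grok")

@dataclass(frozen=True)
class Prompt:
    """One localized monitoring prompt (hypothetical schema)."""
    text: str          # query text, localized to the target market
    intent: str        # "informational", "comparative", or "transactional"
    country: str       # geographic context of the prompt
    language: str      # language the prompt is localized into
    providers: tuple   # AI engines the prompt is sent to

p = Prompt(
    text="Quel est le meilleur CRM pour les PME ?",
    intent="comparative",
    country="FR",
    language="fr",
    providers=PROVIDERS,
)
```

Each cycle fans such prompts out across every provider and captures the full response for citation parsing.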

2. Citation Extraction

Each AI response is parsed to extract source citations. We identify:

  • Source domains — which websites are cited (e.g., reddit.com, wikipedia.org)
  • Citation position — where the source appears in the response (1st, 2nd, 3rd, etc.)
  • Citation type — how the source is referenced (direct citation, recommendation, comparison, etc.)
  • Source classification — domain type (institutional, media, blog, forum, social, educational)

Citations are normalized to root domains and deduplicated within each response. Position data is recorded to measure source prominence, not just presence.
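The normalize-and-deduplicate step can be sketched as follows. The two-label root-domain heuristic is a simplification (a production parser would consult the Public Suffix List), and the function names are illustrative.

```python
from urllib.parse import urlparse

def root_domain(url: str) -> str:
    """Reduce a cited URL to its root domain (naive two-label heuristic)."""
    host = urlparse(url).netloc.lower().removeprefix("www.")
    parts = host.split(".")
    return ".".join(parts[-2:]) if len(parts) > 2 else host

def extract_citations(urls: list[str]) -> list[tuple[str, int]]:
    """Deduplicate citations within one response, keeping the first
    (most prominent) position seen for each root domain."""
    seen: dict[str, int] = {}
    for pos, url in enumerate(urls, start=1):
        seen.setdefault(root_domain(url), pos)
    return sorted(seen.items(), key=lambda kv: kv[1])

response_links = [
    "https://www.reddit.com/r/seo/some-thread",
    "https://en.wikipedia.org/wiki/Some_article",
    "https://reddit.com/r/marketing/other-thread",  # duplicate root domain
]
# → [("reddit.com", 1), ("wikipedia.org", 2)]
```

Keeping the earliest position per domain is what lets the pipeline measure prominence rather than mere presence.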

3. Geographic & Provider Segmentation

All data is segmented by provider (which AI engine gave the response) and country (the geographic context of the prompt). This allows us to answer questions like:

  • Which sources does ChatGPT cite most in France vs. the US?
  • Does Perplexity favor different sources than Gemini for the same query?
  • How does source diversity vary across AI providers?

The Data Hub currently covers 15+ countries with localized prompts in their respective languages.
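Segmentation amounts to grouping citation rows by (provider, country) before aggregating. A minimal sketch with toy data (the rows and counts are invented for illustration):

```python
from collections import Counter, defaultdict

# Toy citation rows: (provider, country, cited_domain).
rows = [
    ("chatgpt", "FR", "lemonde.fr"),
    ("chatgpt", "FR", "wikipedia.org"),
    ("chatgpt", "US", "reddit.com"),
    ("chatgpt", "US", "reddit.com"),
    ("perplexity", "US", "wikipedia.org"),
]

by_segment: dict[tuple[str, str], Counter] = defaultdict(Counter)
for provider, country, domain in rows:
    by_segment[(provider, country)][domain] += 1

# Top-cited source for ChatGPT in the US:
top = by_segment[("chatgpt", "US")].most_common(1)
# → [("reddit.com", 2)]
```

The same grouping, run over the full corpus, answers the France-vs-US and provider-vs-provider questions above.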

4. Metrics & Scoring

Key metrics computed from the raw citation data:

  • Mention Rate — percentage of responses that cite a given source
  • Average Position — mean citation rank across all appearances (lower = more prominent)
  • Position Score — normalized 0-100 score derived from average position for easier comparison
  • Provider Coverage — how many of the 7 providers cite a source
  • Source Diversity Index — how evenly citations are distributed across sources
  • Trigger Rates — how often AI features (shopping, local results, images) are activated

For the GEO Index brand rankings, we additionally compute a composite GEO Score weighted across mention rate (35%), average position (25%), provider coverage (20%), sentiment (10%), and prompt type breadth (10%).
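The composite can be sketched directly from the published weights. The weights below come from the text; everything else is an assumption — in particular, the actual Position Score normalization is not published, so the `position_score` mapping here is a placeholder, and all inputs are assumed pre-scaled to 0-100.

```python
def mention_rate(cited_responses: int, total_responses: int) -> float:
    """Percentage of responses that cite the source."""
    return 100.0 * cited_responses / total_responses

def avg_position(positions: list[int]) -> float:
    """Mean citation rank across all appearances (lower = more prominent)."""
    return sum(positions) / len(positions)

def position_score(avg_pos: float) -> float:
    """Assumed normalization: rank 1 maps to 100, decaying with rank.
    The real formula is not part of the public methodology."""
    return 100.0 / avg_pos

def geo_score(mention, position, coverage, sentiment, breadth):
    """Composite GEO Score using the published weights;
    all five inputs assumed on a 0-100 scale."""
    return (0.35 * mention      # mention rate
            + 0.25 * position   # average position (normalized)
            + 0.20 * coverage   # provider coverage
            + 0.10 * sentiment  # sentiment
            + 0.10 * breadth)   # prompt type breadth

# e.g. geo_score(60, 80, 70, 70, 50) → 67.0
```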

5. Update Frequency & Freshness

Data is refreshed on a weekly cadence. Each update cycle:

  • Runs all prompts across all providers and countries
  • Extracts and classifies new citations
  • Recomputes all aggregate metrics (top sources, trends, trigger rates, etc.)
  • Generates period-specific snapshots (last week, last month, last year, all time)

Historical data is retained to power trend analysis and period-over-period comparisons.
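A period-over-period comparison over retained weekly snapshots can be sketched like this; the dates, sources, and mention rates are invented toy data, and the snapshot layout is an assumption.

```python
from datetime import date, timedelta

# Hypothetical weekly snapshots: cycle date → mention rate (%) per source.
snapshots = {
    date(2024, 5, 6):  {"reddit.com": 41.0, "wikipedia.org": 38.5},
    date(2024, 5, 13): {"reddit.com": 43.5, "wikipedia.org": 37.0},
}

def week_over_week(snaps, day, source):
    """Percentage-point change in mention rate vs. the prior weekly cycle."""
    prev = snaps[day - timedelta(weeks=1)]
    return snaps[day][source] - prev[source]

delta = week_over_week(snapshots, date(2024, 5, 13), "reddit.com")
# → 2.5 (percentage points)
```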

6. Limitations & Transparency

We believe in being upfront about what our data can and cannot tell you:

  • Sample-based — our data represents a sample of AI responses, not the complete universe of queries
  • AI responses change — the same prompt can produce different citations at different times. We capture snapshots, not continuous streams
  • Prompt design matters — citation patterns depend on how prompts are worded. We use diverse, realistic prompts but cannot cover every possible query
  • Provider APIs evolve — AI providers frequently update their models and citation behaviors. Historical comparisons should account for model changes
  • No personal data — all prompts are generic research queries. We do not use personalized or logged-in sessions

Apply This Data to Your Brand

The Data Hub shows aggregate trends. Qwairy lets you track your own brand's AI visibility with the same rigorous methodology.