With the growing use of generative AI tools like ChatGPT, understanding where the information provided by these models comes from has become essential for brands and SEO/GEO professionals. When a user gets an answer from ChatGPT, is it generated out of thin air? Is it based on press articles, Reddit posts, or Wikipedia entries? Or does it pull information directly from live web searches? In this article, we'll take a closer look at the main sources used by ChatGPT, how to identify them manually, and how to save time by using the Qwairy tool to track your presence across LLMs.

Reminder: ChatGPT and SearchGPT Rely on Different Sources

ChatGPT is OpenAI's conversational assistant, built on its GPT family of large language models (LLMs). The main model generations are:

GPT-3
GPT-3.5
GPT-4

On the other hand, SearchGPT is a hybrid LLM.

Hybrid LLMs access the web in real time using techniques like RAG (Retrieval-Augmented Generation): they first retrieve relevant pages, then generate an answer grounded in that content. Thanks to OpenAI's partnership with Microsoft, SearchGPT leverages Bing search results. In April 2025, OpenAI announced that SearchGPT was handling nearly 1 billion searches per week.
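To make the retrieve-then-generate idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how SearchGPT works internally: the `Page` class, the `retrieve` and `build_prompt` functions, and the example.com URLs are all hypothetical, and a simple keyword-overlap score stands in for a real search engine such as Bing. No model is called; the sketch stops at the prompt that would be sent to the LLM.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str   # hypothetical URL, for illustration only
    text: str  # page content the retriever can match against


def tokenize(text: str) -> set[str]:
    # Lowercase, split on whitespace, and strip basic punctuation.
    return {word.strip(".,?!:;").lower() for word in text.split()} - {""}


def retrieve(query: str, pages: list[Page], k: int = 2) -> list[Page]:
    # Toy stand-in for a search engine: rank pages by shared query words.
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(p.text)), p) for p in pages]
    scored = [(score, p) for score, p in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:k]]


def build_prompt(query: str, sources: list[Page]) -> str:
    # Retrieved passages go into the prompt so the answer can be grounded
    # in them and traced back to numbered sources.
    context = "\n".join(
        f"[{i}] {p.url}: {p.text}" for i, p in enumerate(sources, start=1)
    )
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"


pages = [
    Page("https://example.com/pricing", "Acme running shoes cost 120 dollars."),
    Page("https://example.com/history", "Acme was founded in 1990."),
    Page("https://example.com/recipes", "A simple pasta recipe."),
]
query = "How much do Acme running shoes cost?"
sources = retrieve(query, pages)
prompt = build_prompt(query, sources)
print(prompt)
```

The point for brands is visible in `build_prompt`: only pages that the retrieval step surfaces can end up cited in the answer, which is why search visibility matters for hybrid LLMs.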

1.1. ChatGPT's training corpus

Every AI model is trained on a specific dataset, commonly referred to as a training corpus. The standard version of ChatGPT generates its answers from this corpus rather than from live web searches.

How to Identify the Sources Behind ChatGPT?

1.2. Partnerships and licensed data

1.3. Real-time web access

1.4. Other specific sources

How can I manually identify the sources used by ChatGPT?

2.1. Using ChatGPT in a private browser window

2.2. Analyze SearchGPT results

2.4. Observation via Bing

3. Limits of manual search

How to automate source analysis with Qwairy

5. Conclusion

FAQ

What are the main sources used by ChatGPT?

How can I manually identify ChatGPT's sources for a specific answer?

What is the difference between ChatGPT and SearchGPT sources?

Which media outlets have partnerships with OpenAI?

What are the limitations of manually identifying ChatGPT sources?
