AI Search Is Its Own System. Here's How AEO Measures It Properly.
In this article, you will learn why AI search runs on different signals than Google, the three published studies that establish AEO as its own measurable surface, and how GenPicked tracks the citation pool that traditional rankings do not capture.
AEO is the discipline AI search needed
AI search is its own system. The citation pool, the consistency profile, and the ranking signals are structurally different from organic Google results, and that is why AEO exists as a distinct discipline. Three independent studies in the last twelve months establish the finding empirically: only 12 percent of AI citations overlap with Google's top 10 (Ahrefs, Guan 2025), a single engine returns just 9.2 percent URL consistency within one day across 10,000 queries (SE Ranking 2025), and LLM-based recommenders show less popularity bias than the traditional ranking systems SEO was built to influence (Amazon Science 2024).
For an agency owner, this is good news. The AI search surface is real, buyer behavior is shifting toward it, and there is now a defensible measurement discipline that captures what SEO instruments cannot. AEO is that discipline. GenPicked is built to track this surface as a first-class problem: pairwise comparison across engines, repeated sampling, and category-level prompts that report citation patterns SEO dashboards never see.
The three studies below walk through exactly what AEO measures and why those signals matter for the next client report.
What the three studies actually found
Ahrefs / Guan 2025: 12 percent overlap with Google's top 10
In September 2025, Xibeijia Guan published a study at Ahrefs analyzing 15,000 long-tail queries across Google, Bing, ChatGPT, Gemini, Copilot, and Perplexity. The methodology was simple: ask the same question to a search engine and to an AI assistant, then compare which URLs each one points to.
The result: only 12 percent of links cited by ChatGPT, Gemini, and Copilot appear anywhere in Google's top 10 results for the same prompt. The remaining 88 percent come from pages that do not rank in the top 10, and 80 percent of the total come from pages that do not rank anywhere in Google for the original query at all.
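For readers who want to run the same comparison on their own data, here is a minimal sketch of the overlap calculation. The queries, URLs, and data structures are hypothetical, and Guan's actual pipeline (URL normalization, deduplication, per-engine handling) is more involved than this.

```python
# Minimal sketch of the Ahrefs-style overlap check: for each query,
# what share of AI-cited URLs also appears in Google's top 10 for
# the same query? Data below is invented for illustration.

def overlap_share(ai_citations: dict[str, set[str]],
                  google_top10: dict[str, set[str]]) -> float:
    """Fraction of AI-cited URLs that also appear in Google's top 10."""
    cited = overlapping = 0
    for query, urls in ai_citations.items():
        top10 = google_top10.get(query, set())
        cited += len(urls)
        overlapping += len(urls & top10)
    return overlapping / cited if cited else 0.0

# Hypothetical two-query example:
ai = {"best crm": {"a.com/x", "b.com/y"}, "crm pricing": {"c.com/z"}}
g  = {"best crm": {"a.com/x", "d.com/w"}, "crm pricing": {"e.com/v"}}
print(f"{overlap_share(ai, g):.0%}")  # -> 33%
```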
Perplexity was a partial exception. Roughly one in three of its citations points to pages that rank in Google's top 10, suggesting Perplexity's retrieval architecture is more aligned with traditional search than ChatGPT's or Gemini's. The other three engines diverge sharply.
The implication is direct. If your brand's Google ranking is the whole strategy, your brand is competing for the 12 percent slice of AI citations that overlaps with traditional rankings. The other 88 percent is decided by signals that Google ranking does not control.
SE Ranking 2025: 9.2 percent URL consistency within the same engine
A month earlier, SE Ranking ran a different test. Fixed set of 10,000 keywords. Three independent test runs on the same day from US locations. Just Google's own AI Mode, not cross-platform comparison. Question: how often does the same AI engine return the same URLs for the same query within a single day?
The result was the most extreme inconsistency number in the literature. Average URL overlap across the three runs: 9.2 percent. Twenty-one percent of keywords returned zero overlapping URLs across the three tests. Only 0.1 percent of keywords had 100 percent URL match. The same engine, same query, same day, almost entirely different citation sets.
A separate finding from the same study: AI Overviews and AI Mode (both Google products) show 10.7 percent URL overlap and 16 percent domain overlap with each other. Google's own AI features cite different sources from each other even within Google's ecosystem.
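You can run a version of this consistency check on your own scans. The sketch below uses mean pairwise Jaccard overlap, which is one reasonable way to operationalize "URL overlap"; SE Ranking's exact metric may differ, and the run data here is invented.

```python
# Sketch of an SE Ranking-style consistency check: run the same query
# several times, then compute the average pairwise overlap of the cited
# URLs, and again at the domain level.
from itertools import combinations
from urllib.parse import urlparse

def pairwise_overlap(runs: list[set[str]]) -> float:
    """Mean Jaccard overlap across all pairs of runs for one query."""
    pairs = list(combinations(runs, 2))
    scores = [len(a & b) / len(a | b) if a | b else 1.0 for a, b in pairs]
    return sum(scores) / len(scores)

def to_domains(urls: set[str]) -> set[str]:
    return {urlparse(u).netloc for u in urls}

# Three hypothetical same-day runs of one query:
runs = [{"https://a.com/p1", "https://b.com/p2"},
        {"https://a.com/p9", "https://c.com/p3"},
        {"https://d.com/p4", "https://b.com/p2"}]
print(f"URL overlap:    {pairwise_overlap(runs):.1%}")    # 11.1%
print(f"Domain overlap: {pairwise_overlap([to_domains(r) for r in runs]):.1%}")  # 22.2%
```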
The implication: even if you could solve "rank in AI search" as a problem, the target is moving. A page that gets cited at 9 AM may not be cited at 11 AM. A page that appears in AI Mode may not appear in AI Overviews. Stability is not a default property of AI citation sets.
Amazon Science 2024: LLM recommenders have less popularity bias
The third study points in a more nuanced direction. In 2024, Amazon Science researchers published a paper on large language models as recommender systems. The conclusion: LLM-based recommenders exhibit less popularity bias than traditional recommender systems, even without explicit mitigation.
This complicates the narrative. The earlier two findings could be read as "AI search is unreliable" or "AI search is broken." The Amazon Science finding suggests a more careful read. AI systems are different from traditional search, but the difference is not uniformly worse. In some respects (notably the over-representation of high-popularity items), LLMs may be more equitable than the ranking systems they are sometimes assumed to replace.
The right framing is divergence, not deterioration. AI search is a different system. Some of its differences favor smaller brands; some of its differences create new measurement problems; almost all of its differences mean SEO instincts do not transfer cleanly.
What this means for the SEO-to-AEO transition
For about three years, the dominant agency pitch in this category has been "we will help you transition from SEO to AEO." The pitch is intuitive. SEO is the legacy practice. AEO is the new layer. The agency that did SEO well will do AEO well by adding new tactics on top of existing capability.
The divergence data calls this framing into question. If 88 percent of AI citations come from URLs that do not rank in Google's top 10, the SEO-to-AEO transition is not adding a layer. It is opening a separate game whose rules and signals are not yet fully mapped.
This does not mean SEO work is wasted. It does mean three specific reframings.
First, the relationship between SEO investment and AEO outcome is weaker than the marketing implies. Improving a page's Google ranking from position 15 to position 5 is a meaningful SEO win that may or may not move AI citation likelihood. The two systems are coupled at the margin (12 percent overlap is not zero), but the coupling is not strong enough to predict outcomes.
Second, the pages that win AI citations are often not the pages an SEO professional would predict. Earned-media coverage, niche publications, forum discussions, and other content types that rank lower in traditional Google often outperform brand-owned content in AI citation pools. The University of Toronto's 2025 analysis found 82 to 89 percent of AI citations come from earned media, which is consistent with the divergence finding.
Third, multi-engine measurement is not optional. If AI Mode and AI Overviews disagree with each other within Google's own ecosystem, then ChatGPT-only measurement, Perplexity-only measurement, or any single-engine measurement is sampling one branch of a divergent tree. The cross-engine picture is the real measurement.
We addressed the single-engine critique more fully in our piece on the four strongest AEO critiques and where they fall short.
What actually predicts AI citations
If Google ranking does not predict AI citations, what does? The honest answer is that the field does not have a complete model yet, but several signals are emerging.
Earned media coverage in trusted publications. The University of Toronto finding that 82 to 89 percent of AI citations come from earned media indicates that media relations work disproportionately drives AI visibility. Press mentions in Search Engine Land, Wired, TechCrunch, vertical trade publications, and similar outlets are weighted heavily by AI engines.
Structured-data depth on owned pages. Pages with thorough Schema.org markup (FAQPage, HowTo, DefinedTerm, Article) get cited more frequently than pages with thin or absent schema. This is not because the schema directly predicts citation; it is because the schema makes the content machine-parseable for retrieval-augmented systems.
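As an illustration of the markup depth in question, here is a minimal FAQPage JSON-LD block generated in Python. The question and answer text are placeholders, not a recommended template; a real page would embed the output in the document head.

```python
# Illustrative only: a minimal FAQPage JSON-LD block of the kind that
# makes an answer machine-parseable for retrieval-augmented systems.
import json

faq_jsonld = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [{
        "@type": "Question",
        "name": "What is AEO?",
        "acceptedAnswer": {
            "@type": "Answer",
            "text": "AEO (answer engine optimization) measures and improves "
                    "brand visibility in AI search engines.",
        },
    }],
}

# Embed the output in the page head as:
# <script type="application/ld+json"> ... </script>
print(json.dumps(faq_jsonld, indent=2))
```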
Content shape that matches the typical answer template. Pages that include explicit Q-and-A formatting, definition blocks, comparison tables, and step-by-step instructions get cited more often than long-form essays that bury the answer. AI engines extract retrievable chunks; they reward content that is already chunked.
Topical authority depth rather than keyword targeting. A site that covers a topic across multiple pages with bidirectional internal linking gets cited more than a single thin page targeting the same keyword. This is consistent with how RAG systems retrieve: they pull from clusters of evidence, not single best-matched pages.
Citation by other AI engines. A page cited by ChatGPT is more likely to be cited by Claude. The engines do not cross-reference each other directly, but they share underlying training and retrieval patterns that correlate. Multi-engine visibility is partially self-reinforcing.
None of these signals are deterministic. The category does not yet have the equivalent of Google's documented ranking factors. What it has is a growing set of empirical correlations between page properties and citation likelihood. Agencies that test against these signals systematically can move client outcomes; agencies that assume SEO ranking is the input will continue to be surprised by where citations land.
What this means for measurement
The divergence finding has a sharper implication for measurement than for optimization. The optimization question is "how do I get my brand cited more." The measurement question is "is the number I am reporting to my client actually capturing the brand's AI visibility, or am I capturing one engine's snapshot at one moment in time?"
If AI Mode and AI Overviews show 10.7 percent URL overlap with each other, then a single-engine scan is reporting a slice of the brand's actual visibility. If a same-day, same-query, same-engine scan shows 9.2 percent URL consistency, then a single-scan measurement is reporting noise that the next scan will contradict.
The methodology answer to both problems is the same one we have written about in our methodology transparency article: multi-engine coverage with documented weighting, sample size large enough to average out single-scan variance, and prompt-template policy that does not introduce its own bias. The divergence finding makes the methodology question more urgent, not less.
For the agency owner, the practical takeaway is that a Share of Model number derived from a single engine on a single scan is not a measurement. It is a draw from a distribution. The defensible report shows the distribution, the engine breakdown, and the methodology that produced both. We covered Share of Model measurement in detail in the Share of Model article.
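To make "a draw from a distribution" concrete, here is a small sketch that treats each scan as a Bernoulli observation (brand mentioned or not) and reports a per-engine mention rate with a Wilson confidence interval. The scan counts are hypothetical, and this is not GenPicked's scoring code; it only shows why many observations beat one snapshot.

```python
# Per-engine mention rate with a 95% Wilson score interval.
# More scans -> narrower interval -> a number worth reporting.
from math import sqrt

def wilson_interval(mentions: int, scans: int, z: float = 1.96):
    """95% Wilson score interval for a mention rate."""
    if scans == 0:
        return (0.0, 0.0)
    p = mentions / scans
    denom = 1 + z**2 / scans
    center = (p + z**2 / (2 * scans)) / denom
    margin = z * sqrt(p * (1 - p) / scans + z**2 / (4 * scans**2)) / denom
    return (center - margin, center + margin)

# Hypothetical per-engine scan results: (mentions, total scans)
scans = {"chatgpt": (34, 100), "gemini": (21, 100), "perplexity": (45, 100)}
for engine, (m, n) in scans.items():
    lo, hi = wilson_interval(m, n)
    print(f"{engine:10s} {m/n:5.0%}  95% CI [{lo:.0%}, {hi:.0%}]")
```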
What the divergence does NOT mean
Some readings of the divergence data overstate the case. We want to name them and reject them.
The data does not mean SEO is dead. SEO remains the foundational discipline for everything an AI engine eventually parses. Pages that are crawlable, fast, structurally sound, and authoritative still benefit AI citation likelihood even when the relationship is not 1-to-1 with rank.
The data does not mean AI search results are random. The 9.2 percent same-day URL consistency is the floor, not the ceiling. Brand mention rates (the actual unit Share of Model measures) are substantially more stable than URL rankings, especially in tight categories with few legitimate options. The distinction between brand-level mentions and URL-level rankings matters.
The data does not mean AI engines are unmeasurable. It means they are measurable using different methods than traditional rank tracking. Frequency-based metrics over many observations are defensible. Single-snapshot URL rankings are not.
The data does not mean agencies should stop selling AI visibility services. It means the honest version of the service is "we will help you understand and improve your visibility across a system whose mechanics are still being mapped, using methodology that controls for the known instability," not "we will rank you in AI search the same way we ranked you in Google."
How to talk about this with clients
Here are three talking points agencies can use when a sophisticated client asks why their Google rank does not match their AI visibility.
"AI search and Google search overlap on roughly twelve percent of citations, per a 2025 Ahrefs analysis of fifteen thousand queries. The remaining citations come from sources that do not rank in Google's top 10 for the same prompt."
"Within Google's own AI features, same-day runs of the same query produce nine percent URL overlap on average. Stability is not a default property of AI citations, which is why our measurement uses many observations across multiple engines rather than single-scan snapshots."
"Improving Google rank is still useful for the slice of AI citations that mirror Google. For the rest, we work on earned media, structured-data depth, and content shape that AI engines reward. Both pieces of work are part of the program."
That is a defensible posture. It does not over-promise. It cites real research. It explains where the legacy SEO work continues to matter and where new work is required.
Frequently asked questions
Does this mean SEO does not help with AI citations at all?
No. Twelve percent overlap is not zero overlap. The pages that win in Google still benefit from being crawlable, structurally sound, and authoritative. The argument is that SEO is necessary but not sufficient. AI citation work requires additional inputs (earned media, schema depth, content chunk shape) that pure SEO does not address.
Why is Perplexity different from ChatGPT and Gemini?
Perplexity's architecture uses retrieval-augmented generation that explicitly references a web index closer to traditional search. ChatGPT and Gemini use different retrieval approaches whose details are less public. The architecture difference shows up as roughly three times higher citation overlap with Google for Perplexity.
Will the divergence get smaller or larger over time?
The honest answer is unknown. Some forces push toward convergence (AI engines incorporating more SEO-style signals as they mature). Some push toward divergence (AI engines becoming better at retrieving long-tail content that does not rank for the original query). The category will not look the same in two years; the direction of drift is not predictable today.
If 80 percent of AI citations come from pages that do not rank in Google, should I stop investing in Google rank entirely?
No, for three reasons. First, the 12 percent that does overlap with Google rank is concentrated on high-intent commercial queries where rank still drives outcome. Second, Google rank is a downstream proxy for the content quality and authority signals that also influence AI citation. Third, Google still drives the majority of traffic in most categories; AI traffic is growing but small in absolute terms.
How do I measure AI visibility given the consistency problem?
Use a measurement methodology that runs many observations across multiple engines, reports composite scores with confidence intervals, and discloses prompt-template policy. We covered the specific methodology choices required for defensible measurement in our methodology transparency article.
Is there a tool that handles AI search divergence correctly?
The right question is whether the tool publishes its methodology choices. A tool that handles divergence correctly will document multi-engine coverage, sample size per scan, citation extraction rules, and engine weighting. We documented these choices for GenPicked in the methodology article above; other vendors have similar information available on request.
Related reading
- Why most AEO tools won't show you their engine weights
- Share of Model: the AEO metric everyone wants, and why almost nobody measures it defensibly
- The AEO critics have a point. Here is where they are right, and where they are wrong
- Profound vs GenPicked: Enterprise AEO vs Agency-First AEO
See the divergence on your own brand
The fastest way to see AI search divergence in your client's data is to run the same query against multiple engines and observe what cites their brand and what cites their competitors. Run a free GenPicked AEO audit to see multi-engine citation data with the methodology disclosed.
Start your 14-day free trial of GenPicked Growth →
Dr. William L. Banks III is Founder of GenPicked. Research citations in this article (Ahrefs/Guan 2025, SE Ranking 2025, Amazon Science 2024, University of Toronto 2025) are documented in the underlying research wiki. Specific citations available on request.