AEO Measurement Stands on Two Decades of Position-Bias Research
In this article, you will learn how two decades of position-bias research, anchored by the Search Engine Manipulation Effect, justifies AEO measurement as a discipline; why AI's single-answer format makes brand position more consequential than ten blue links ever did; and what that research foundation means for how agencies measure AEO today.
The research foundation under AEO measurement
AEO measurement is not a new instinct chasing a new surface. It stands on two decades of peer-reviewed research that established ranking position as one of the most consequential variables in any information presentation system. The most cited piece of that foundation is the Search Engine Manipulation Effect, published by Robert Epstein and Ronald Robertson in the Proceedings of the National Academy of Sciences in 2015. Five randomized controlled trials with 4,556 participants showed that biased search rankings shifted user preferences by 20 percent or more, largely below the participants' conscious awareness. The result has been replicated, extended, and qualified across ten years of follow-up work, and the cumulative finding is now settled: ranked information presentation is an active persuasive force, not a neutral display format.
The full evidence base widens from there. Position bias as a cognitive mechanism shows up across more than 30 documented biases in information search (Azzopardi 2021). Featured snippets transfer credibility uncritically to the brand shown in them (Bink 2022). Attitude shifts from biased rankings persist even when users are told the rankings are algorithmically selected (Bink 2023). Ranking order alone, independent of content, drives attitude change (Draws 2021). Users' trust in higher-ranked Google results has been documented since at least 2007 (Pan et al. 2007). Purchase-behavior research found rankings causally affect commercial outcomes at roughly $1.92 per rank position in observational data (Ursu 2018). The brand listed first does not just get more clicks. It gets more belief. Across twenty years and dozens of studies, the direction of the finding is consistent.
AI search makes this foundation more consequential, not less. The original Search Engine Manipulation Effect work studied ten-link results where users still saw alternatives. AI engines increasingly return one answer. The persuasive force that ten blue links shared now concentrates on a single recommendation. That is exactly why AEO measurement has become an essential discipline rather than an optional layer. Measuring how AI engines rank brands is measuring the variable that twenty years of research has shown to be among the most consequential. GenPicked exists to make that measurement defensible. This article walks through what the research established, why AI search amplifies the effect, and what agencies should expect from any tool claiming to measure AEO well.
What the research actually showed, before we extrapolate
Before we apply the research to AI search, we should be exact about what the original SEME work demonstrated and what it did not.
What it showed: in artificial laboratory search engines on political topics, biased rankings shifted voter preferences by an average of 20 percent. The effect held across demographic groups and political contexts. The mechanism was largely subliminal.
What it did not show: the same 20 percent figure for real-world Google searches on non-political topics. The original studies used custom search engines in controlled settings. Some critics have argued that the artificial setting inflated the effect size relative to live Google or Bing usage. Follow-up replications in less artificial conditions have shown smaller but still significant effects.
What it strongly implied: position-based persuasion is a general property of ranked information presentation, not a quirk of any one search engine or one topic. Subsequent purchase-behavior research (Ursu 2018) documented that rankings causally affect commercial outcomes at roughly $1.92 per rank position in observational data. Different domain, similar direction.
The honest summary of SEME after a decade: ranking position is persuasive, the effect is statistically robust, and the magnitude is meaningful even if the original 20 percent figure overshot in the most artificial conditions.
What happens to a manipulation effect when the list collapses to one answer
Traditional search returned ten blue links. SEME researchers studied what happens when the top three of those ten are manipulated. AI search increasingly returns one answer. The persuasive shape is structurally different.
In a ten-link result, the user has an implicit comparison. They can see what is on the list, infer what is not, and apply their own filtering. Even with position bias intact, the alternatives are visually present. A user comparing options sees that options exist.
In an AI-generated single answer, the alternatives are not visually present. The engine has already made the selection. The user is downstream of a filtering decision they did not witness. Position bias has nowhere to distribute; the manipulation, if any, concentrates entirely into the brand the engine names.
Three pieces of supporting research make this concentration argument concrete.
Granka et al. 2004 documented that user attention in traditional search heavily favors top-ranked results. In a ranked list, attention is at least allocated across the list, with the top getting most. In a single-answer format, attention has no place to go but to the named brand.
Pan et al. 2007 showed that users trust Google rankings as a quality signal, treating higher-ranked results as more authoritative even when they are objectively less relevant. Trust transfer from the engine to the surfaced result is a documented phenomenon. AI engines inherit this trust transfer and concentrate it on a single brand.
Flavian et al. 2023 extended the trust-transfer finding to voice assistants. When a voice assistant names a brand in response to a query, users transfer authority to that brand at higher rates than they would from a written ranked list. The voice format already showed concentration effects. AI text answers in conversational format follow the same pattern.
The math is straightforward. If SEME shifts political preferences by 20 percent when rankings are manipulated across ten visible options, and AI search reduces the visible options to one, the per-mention persuasive force is plausibly higher, not lower. There is no comparison group on screen. There is no list to scan critically. There is the brand the engine named.
This does not mean every AI recommendation produces a 20 percent preference shift on the named brand. The original SEME work was on political topics with low prior knowledge. Brand recommendations in established categories will show smaller effects because users have prior preferences. But the structural argument holds: collapse of the list into a single answer concentrates rather than diffuses the persuasive effect.
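The structural argument can be made concrete with a toy model. The attention shares below are illustrative placeholders shaped like observed click-through decay curves, not figures from Granka 2004 or the SEME studies; the point is only that a ranked list bounds how much attention any one brand can capture, while a single answer removes that bound.

```python
# Toy model of attention concentration (illustrative numbers only,
# not figures from Granka 2004 or the SEME studies).

# Hypothetical attention shares for a ten-link results page,
# decaying roughly like observed click-through curves.
ten_link_attention = [0.30, 0.17, 0.11, 0.08, 0.07,
                      0.06, 0.06, 0.05, 0.05, 0.05]

# In a single-answer format, all attention lands on the named brand.
single_answer_attention = [1.0]

top_share_list = ten_link_attention[0]
top_share_single = single_answer_attention[0]

print(f"Top-result attention, ten links:     {top_share_list:.0%}")
print(f"Top-result attention, single answer: {top_share_single:.0%}")
print(f"Concentration factor: {top_share_single / top_share_list:.1f}x")
```

Under these assumed shares, collapsing the list multiplies the named brand's attention share by more than three; any realistic decay curve produces the same direction of effect.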
Then someone tested it directly
In April 2026, Search Engine Land published the results of an experiment with the headline "Can a fake brand win in AI search?" The methodology was simple. Researchers created a fictional brand, populated the open web with content optimized for AI citation (articles, structured data, third-party mentions), and then queried AI engines for category recommendations.
The fake brand was cited.
The article reported that AI engines surfaced the manufactured brand in recommendation lists across multiple query categories. The experiment did not control rigorously for sycophancy or randomness, but it landed a directionally important finding: AI recommendation pipelines can be persuaded to surface brands that have no real-world existence beyond the content footprint manufactured for the experiment.
The Search Engine Land piece concluded with reasonable caveats. The fake brand did not always win. The category mattered. The retrieval architecture of the specific engine mattered. But it won often enough to demonstrate that the gap between "exists" and "appears in AI recommendation lists" can be small, and the inputs to closing it are tractable for someone with a content budget.
Read in light of SEME, the implication is stark. If biased rankings on a real ten-link list shift preferences meaningfully, and a fabricated brand can win citation in a one-answer format, the manipulation surface is larger than any practitioner has yet acknowledged in public.
Why this matters for brand measurement, not just brand visibility
The mainstream framing of "AI search manipulation" treats it as an offensive problem: can we make our brand appear more often? The research record suggests the framing should be inverted to make it a measurement problem.
If the AI engines that report brand visibility are themselves susceptible to ranking manipulation, the visibility data they produce is reporting an output that can be moved by inputs the measurement tool does not control or disclose. A "visibility score" is partly measuring genuine market position and partly measuring how much content footprint the brand has invested in shaping AI inputs. The two are entangled and most AEO platforms do not separate them in the reported number.
The methodology consequence: a defensible visibility score has to disclose the prompt template policy and the engine-weighting that affect how susceptible the measurement itself is to manipulation. We covered the specific methodology choices required in our piece on methodology transparency. The SEME research adds a sharper edge to that argument. Methodology transparency is not just a quality-of-measurement issue. It is a manipulation-resistance issue. A vendor that uses brand-anchored prompts is using a measurement instrument that the brand can game by paying for the right kind of content footprint. A vendor that uses blind prompts has at least neutralized the most direct manipulation lever.
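To make the blind-versus-anchored distinction concrete, here is what the two prompt styles look like side by side. These templates are illustrative sketches, not GenPicked's or any vendor's actual prompt library.

```python
# Illustrative prompt templates (hypothetical, not a vendor's actual library).

# Brand-anchored: the brand name appears in the prompt itself, so a brand
# can game the measurement by seeding content that pairs its name with
# the category.
brand_anchored = "Is {brand} a good option for {category}?"

# Blind: only the category is named; the engine must surface brands
# unprompted, which neutralizes the most direct manipulation lever.
blind = "What are the best options for {category}?"

print(brand_anchored.format(brand="AcmeCRM", category="small-business CRM"))
print(blind.format(category="small-business CRM"))
```

The design choice matters because only the blind form measures whether the engine reaches for the brand on its own.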
The procurement question for an agency: does your AEO platform measure visibility in a way that the brand you are measuring can game by spending more on content marketing? If the answer is yes, the visibility number is partially measuring marketing budget, not market position.
Three specific findings from the research record that should change AEO practice
Finding 1. Position bias concentrates rather than distributes in single-answer AI formats. The mitigation is to measure across many prompts and many engines, treating each single-answer as a sample from a distribution rather than the authoritative answer.
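Treating each answer as a sample implies a concrete estimator: query many times, then report a mention rate with an interval rather than a single point. A minimal sketch, assuming each query returns a list of brand names (the brand names and counts are hypothetical):

```python
import math

def mention_rate(answers: list[list[str]], brand: str) -> tuple[float, float, float]:
    """Estimate the probability the brand appears in a sampled answer,
    with a 95% Wilson score interval."""
    n = len(answers)
    hits = sum(brand in a for a in answers)
    p = hits / n
    z = 1.96  # 95% confidence
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return p, center - half, center + half

# Hypothetical scan: 12 blind queries; the brand appears in 7 of them.
samples = [["Acme", "Beta"]] * 7 + [["Beta", "Gamma"]] * 5
p, lo, hi = mention_rate(samples, "Acme")
print(f"mention rate {p:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

With only 12 samples the interval is wide, which is exactly the point: a single-answer snapshot is one draw, not a measurement.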
Finding 2. Trust transfer from the engine to the named brand is empirically documented and substantial. Users believe AI recommendations at higher rates than ranked lists. The measurement consequence is that AI-driven brand visibility translates to belief change more efficiently than Google ranking did. This raises the stakes of being mentioned, in both directions.
Finding 3. Manufactured brand content can move AI citation outcomes. The Search Engine Land experiment showed that brands with no real-world existence can be surfaced. The methodology consequence: a visibility measurement that does not control for content-footprint manipulation is reporting partly a marketing-spend signal rather than a brand-strength signal.
These three findings translate into specific procurement questions agencies should ask AEO vendors:
- How does your measurement methodology account for the concentration of position bias in single-answer formats?
- How do you control for content-footprint manipulation in the visibility number you report?
- What is your sample size per scan, and is it large enough to overcome the per-query variance documented in the consistency literature?
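The sample-size question in particular has a quantitative core. For a mention rate measured as a proportion, the standard formula n = z²·p(1−p)/e² gives the queries needed for a target margin of error; a rough sketch using the worst-case p = 0.5:

```python
import math

def required_samples(margin: float, z: float = 1.96, p: float = 0.5) -> int:
    """Queries needed so a mention-rate estimate has the given margin
    of error at roughly 95% confidence (worst case p = 0.5)."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

# Reporting visibility to within +/-10 points needs ~97 queries per scan;
# +/-5 points needs ~385.
print(required_samples(0.10))  # 97
print(required_samples(0.05))  # 385
```

A vendor whose per-scan sample is an order of magnitude below these figures is reporting variance, not visibility.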
We covered the question framework in detail in our piece on AEO critique engagement. The SEME findings reinforce the same procurement posture from a different angle: not just "is the number consistent" but "is the number resistant to manipulation."
What the research record does NOT establish
There are three over-readings of the SEME literature that we want to resist.
The research does NOT establish that all AI recommendations are manipulated. The original SEME work used deliberately biased rankings in controlled experiments. The fake-brand experiment used deliberately manufactured content. Most real-world AI recommendations are not manipulated in either of these specific ways. The argument is that the manipulation surface exists and that measurement tools should account for it, not that every visibility score is fake.
The research does NOT establish that AI engines are uniquely bad. Traditional search has the same vulnerabilities, often documented more thoroughly. The argument is that AI compresses the persuasive surface into a smaller number of presented options, which mathematically concentrates per-mention influence, not that the underlying engines are designed maliciously.
The research does NOT establish that brands should stop investing in legitimate content. Content that genuinely informs customers, gets cited by trusted publications, and earns trust is the right input to both Google and AI ranking systems. The argument is that brands and agencies should distinguish between "content that builds genuine authority and gets earned media" and "content optimized purely for AI citation manipulation." The first is good practice. The second is a measurement liability.
How to talk about this with clients who ask "is AI search manipulable?"
The client question that surfaces this research is usually framed as a worry: "is my AEO investment getting gamed by competitors who are running better manipulation programs?" The honest answer has three parts.
Yes, the manipulation surface exists, and competitors with content budgets can move AI citation outcomes for themselves. The Search Engine Land experiment is the most direct evidence.
No, the manipulation is not as universal as the worry implies. Most AI recommendations are not manipulated in the laboratory sense. Most are produced by engines retrieving from a real content corpus and producing a reasonable-but-noisy answer. The 9.2 percent same-day consistency documented elsewhere is more often a noise issue than a manipulation issue.
The defense is the same as the diagnosis. A measurement methodology with disclosed prompts, large samples, and multi-engine coverage neutralizes most of the manipulation surface. If a competitor is gaming citation outcomes, the agency that uses defensible measurement methodology can detect the inconsistency between the competitor's claimed visibility and the platform-reported measurement. The agency that uses vendor-opaque measurement cannot detect the difference.
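Detecting that kind of inconsistency is a standard statistical comparison. A minimal sketch, assuming a claimed mention rate and an independently measured one, using a two-proportion z-test (the counts below are hypothetical):

```python
import math

def two_proportion_z(hits_a: int, n_a: int, hits_b: int, n_b: int) -> tuple[float, float]:
    """z statistic and two-sided p-value for a difference in mention rates."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Normal CDF via math.erf; two-sided tail probability.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical: a competitor claims 60/100 mentions; our blind-prompt
# scan of the same category observes 38/100.
z, p = two_proportion_z(60, 100, 38, 100)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value here does not prove manipulation, but it flags a gap between claimed and measured visibility that is too large to be sampling noise, which is the detection capability disclosed methodology buys.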
Frequently asked questions
Did Epstein's SEME finding hold up under replication?
The 20 percent figure was specific to artificial laboratory search engines on political topics. Subsequent replications in less artificial settings have shown smaller effects, generally in the 5-15 percent range for political preference shifts. The directional finding (biased rankings shift preferences subliminally) has held up well; the precise magnitude in real-world settings is contested.
Is the SEME work peer-reviewed?
Yes. The 2015 paper was published in the Proceedings of the National Academy of Sciences, one of the highest-impact journals in the field. The Bink and Draws follow-up studies are also peer-reviewed.
If AI engines are more manipulable, should brands invest more or less in AEO?
The argument cuts both ways. Brands with content budgets can move AI citation outcomes for themselves. Brands that ignore the channel cede the citation surface to competitors who do not. The right amount of investment depends on the brand's category and the binding constraint, which we covered in our piece on AEO critique engagement.
Can methodology transparency actually prevent manipulation?
Not entirely. A determined adversary with a sufficient content budget can move citation outcomes regardless of how the measurement tool is built. Methodology transparency makes manipulation expensive enough that most brands will not pursue it, and gives the measurement tool the ability to detect inconsistencies that suggest active manipulation. It is harm reduction, not elimination.
Where does the Search Engine Land fake brand experiment fit in the research record?
It is the most direct modern evidence that AI citation surfaces are manipulable, but it is an industry experiment rather than peer-reviewed research. The finding is directionally important and consistent with the SEME literature, but it should be treated as preliminary evidence rather than established science.
Is GenPicked's measurement resistant to manipulation?
We use blind prompts (no brand anchoring), multi-engine coverage with documented weights, and sample sizes designed to overcome per-query variance. These choices neutralize the most direct manipulation levers. We do not claim immunity to all manipulation; we claim methodology transparency that makes the remaining manipulation surface visible to the buyer.
Related reading
- Why most AEO tools won't show you their engine weights
- Share of Model: the AEO metric everyone wants, and why almost nobody measures it defensibly
- The AEO critics have a point. Here is where they are right, and where they are wrong
- AI Search Divergence: Why Your Google Ranking Does Not Predict Your AI Citations
Test your platform's manipulation resistance
The fastest way to evaluate whether your current AEO platform's visibility score is manipulation-resistant is to ask the vendor the three procurement questions in this article. If the answers are not documented in writing, the score may be measuring marketing spend as much as market position.
Run a free GenPicked AEO audit to see multi-engine, blind-prompt visibility data with the methodology disclosed.
Start your 14-day free trial of GenPicked Growth →
Dr. William L. Banks III is Founder of GenPicked. References to Epstein 2015, Granka 2004, Pan 2007, Bink 2022 and 2023, Draws 2021, Ursu 2018, Flavian 2023, Azzopardi 2021, and Search Engine Land 2026 are documented in the underlying research wiki. Specific citations available on request.