Popularity Bias: The Rich Get Richer

In this lesson from GenPicked Academy, you will learn: What popularity bias is, why AI recommendation systems inherit it from their training data, how feedback loops make it worse over time, and why it creates an uneven playing field that small and mid-tier brands have to engineer around.

Where you are in the curriculum

This is Lesson 3.2 of Module 3: The Bias Problem. In Lesson 3.1, we covered sycophancy, the tendency of AI models to agree with the user. Now we turn to a bias that operates before the user even shows up: the AI's preference for brands that were already popular when the model was trained.


The one-sentence version

Popularity bias is the pattern where recommendation systems, including AI, systematically over-recommend things that are already popular, regardless of whether the popular option is actually the best fit for the person asking.

The name captures the logic. The rich get richer. What is already visible becomes more visible. What is already obscure stays that way. And in AI search, the dynamic is sharper than it was in classical search, for reasons we will walk through.

The everyday analogy

Imagine you ask a friend for a restaurant recommendation and all they can think of is the three places you already hear about every week. Not because those three places are the best. Because those are the places everyone talks about, which is why your friend knows them, which is why they came to mind first.

Now scale that up. Replace your friend with a language model that was trained on billions of web pages, and imagine that the pages mentioning the dominant brands outnumber the pages mentioning the niche ones by a hundred to one. Of course the model will reach for the dominant brand first. That is what it has the most evidence for.

Where the bias comes from

Popularity bias in AI is not a design choice. It is a byproduct of how models are trained.

When you train a large language model, you feed it a snapshot of the internet. In that snapshot, dominant brands are mentioned many times more often than niche alternatives. Salesforce shows up in more CRM articles than Close.io. HubSpot shows up in more marketing-automation threads than Customer.io. Mailchimp is in more email-marketing discussions than Loops. The model learns the frequency distribution of the training data, which means it learns to treat the more-mentioned brands as more representative of the category.
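To see why frequency alone is enough to produce the skew, here is a toy sketch. The brand names and mention counts are hypothetical; the point is that a model which simply reproduces the frequency distribution of its training data will reach for the dominant name almost every time:

```python
import random

# Hypothetical corpus mention counts (illustrative numbers only).
mention_counts = {
    "DominantCRM": 100_000,
    "MidTierCRM": 5_000,
    "NicheCRM": 1_000,
}

total = sum(mention_counts.values())
brands = list(mention_counts)
weights = [mention_counts[b] / total for b in brands]

# A "model" that samples brands in proportion to training frequency.
random.seed(0)
picks = random.choices(brands, weights=weights, k=10_000)

for b in brands:
    share = picks.count(b) / len(picks)
    print(f"{b}: recommended {share:.1%} of the time")
```

Nothing here evaluates which CRM is actually better; the dominant brand wins roughly 94% of the recommendations purely on corpus volume.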

Klimashevskaia et al. (2024) surveyed the research literature on popularity bias in recommender systems and found that the pattern holds across every major architecture: collaborative filtering, content-based systems, hybrid systems, and now generative AI. It is not a quirk of any one approach; it is what happens whenever an algorithm learns from a world that was already uneven. Industry research by Amazon (2024) then quantified the LLM-specific case: language models over-recommend popular items by a factor that exceeds classical collaborative-filtering baselines. The bias is not just inherited; it is amplified.

AEO Claim: ChatGPT exhibits at least four simultaneous biases in recommendations

Deldjoo (2024) evaluated ChatGPT as a recommender system and documented four distinct biases operating at once: popularity bias, genre bias, recency bias, and temporal instability. Recommendations skewed toward already-popular items across every category tested, and the same model produced meaningfully different recommendations on different days for the same query. Source: deldjoo 2024 chatgpt recsys biases.

Why recency is the other half

Popularity bias has a sibling: recency bias. Models weight recent information more heavily than older information, partly because recent content is over-represented in the latest training snapshot and partly because retrieval-augmented systems prefer fresh sources.

For AEO, this means two things. First, a brand that got a burst of press coverage in the last six months before a training cutoff will appear disproportionately often compared to a brand that has been steady for a decade. Second, when AI models use live web retrieval, which most consumer AI products now do, the top-ranked recent articles dominate the input the model is summarizing. If the news cycle has been kind to a brand, the AI inherits that framing.

The feedback loop problem

This is the part that makes popularity bias genuinely dangerous over time.

When an AI recommends a brand, that recommendation produces downstream effects. Users click on the recommended brand's site, which generates traffic signals. Users mention the recommendation in their own content, which gets indexed. The brand's PR team notices the recommendation and writes a case study about it, which becomes training data for the next model generation. Every recommendation is a signal the next model uses to decide what to recommend.

Chaney (2018) called this dynamic algorithmic confounding: feedback loops train models on data that was itself shaped by prior recommendations, so the "preferences" the model is learning are partly its own reflections. Mansoury et al. (2020) showed that this amplification is not linear but accelerating: each cycle concentrates attention on popular items faster than the last one did. Abdollahpouri (2020) situates the ethical stakes: this feedback-driven concentration harms long-tail providers and is self-reinforcing without deliberate countermeasures.
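A minimal simulation makes the compounding concrete. This is not the model from the cited papers, just an illustrative sketch: the exponent alpha > 1 stands in for the system's over-weighting of popular items, and each generation trains on the previous generation's recommendation shares. All starting shares are hypothetical:

```python
# Five brands; starting attention shares are illustrative only.
popularity = [50.0, 30.0, 10.0, 5.0, 5.0]

def next_generation(pop, alpha=1.2):
    # alpha > 1 models over-weighting of popular items. The normalized
    # output becomes the "training data" for the next generation.
    weights = [p ** alpha for p in pop]
    total = sum(weights)
    return [100 * w / total for w in weights]

top_shares = []
for gen in range(6):
    share = max(popularity) / sum(popularity)
    top_shares.append(share)
    print(f"generation {gen}: top brand holds {share:.1%} of recommendations")
    popularity = next_generation(popularity)
```

The top brand's share grows every generation. Set alpha back to 1.0 and the shares never move, which is the whole difference between merely inheriting a distribution and amplifying it.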

AEO Claim: Popularity bias accelerates through feedback, not linearly

Mansoury et al. (2020) demonstrated that feedback-loop amplification in recommender systems is non-linear: each training cycle increases the concentration of recommendations on already-popular items at a faster rate than the previous cycle. Even systems with moderate initial bias can develop severe distortion within a small number of training generations. Sources: mansoury 2020 feedback loop amplification; abdollahpouri 2020 popularity bias ethics.

Popularity bias is not uniform

A critical finding from Abdollahpouri et al. (2019): popularity bias does not hit every user equally. Users who prefer niche items lose the most accuracy from the bias. Users who already prefer popular items barely notice it. That asymmetry matters for AEO because the brands most harmed by popularity bias are the small and mid-tier challengers, the exact brands that often have the strongest case to be mentioned on merit rather than on name recognition.

Wang and Russakovsky (2021) added a second layer: bias amplification is directional. Models do not just over-weight popular items; they suppress the associations that would have surfaced niche alternatives. This is the mechanism that makes "best of" lists from AI feel strangely narrow: the top five always seem to be the same five, across models, across phrasings. The model is not being lazy. It is being faithful to a training distribution that never heard of the sixth option.

What this means for your measurement design

Aggregate "brand visibility" scores mask the non-uniform shape of popularity bias. A score that improves for a dominant brand tells you almost nothing about whether a challenger brand has become more visible. The Module 5 lessons on Bradley-Terry ranking walk through how pairwise methods expose the distortion that aggregate scores hide.
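Module 5 covers the full method, but as a taste, here is a minimal Bradley-Terry fit using the standard MM-style update, run on hypothetical pairwise win counts. Everything here, brand names and counts alike, is illustrative:

```python
# wins[i][j] = number of pairwise comparisons brand i beat brand j
# (hypothetical counts; 10 comparisons per pair).
brands = ["Incumbent", "Challenger", "Niche"]
wins = [
    [0, 6, 9],
    [4, 0, 7],
    [1, 3, 0],
]

strength = [1.0, 1.0, 1.0]
for _ in range(200):
    new = []
    for i in range(3):
        total_wins = sum(wins[i])
        # MM update: wins divided by comparison counts weighted by
        # current strength estimates of each matchup.
        denom = sum(
            (wins[i][j] + wins[j][i]) / (strength[i] + strength[j])
            for j in range(3) if j != i
        )
        new.append(total_wins / denom)
    s = sum(new)
    strength = [v / s for v in new]  # normalize to sum to 1

for b, v in zip(brands, strength):
    print(f"{b}: strength {v:.3f}")
```

Because every strength is estimated from head-to-head outcomes rather than raw mention volume, a challenger that consistently wins its matchups can rank above a brand that merely gets named more often.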

What this means for brand measurement

Two practical consequences.

First, raw mention counts are not a clean quality signal. A brand that gets mentioned more in AI responses may be getting mentioned because it is the better fit for the user's question, or because the training corpus had a lot of it. Without controls, you cannot tell which.

Second, improvements in AI visibility are harder for smaller brands than the topline numbers suggest. A 10% improvement for a dominant brand and a 10% improvement for a challenger are not the same unit of work. The challenger is pushing against a structural headwind that does not apply to the incumbent.
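One simple control is to compare a brand's AI mention rate against a popularity baseline. The sketch below uses hypothetical numbers, and "baseline corpus share" stands in for whatever background-popularity estimate your pipeline uses; the lift ratio asks whether the AI mentions a brand more or less than popularity alone would predict:

```python
# brand: (mention rate across sampled AI answers, baseline corpus share)
# All figures are hypothetical, for illustration only.
data = {
    "Incumbent": (0.62, 0.70),
    "Challenger": (0.25, 0.12),
}

lifts = {}
for brand, (mention_rate, baseline_share) in data.items():
    # lift > 1: mentioned more than background popularity predicts.
    lifts[brand] = mention_rate / baseline_share
    print(f"{brand}: raw {mention_rate:.0%}, lift {lifts[brand]:.2f}x")
```

On raw counts the incumbent looks far stronger (62% vs 25%), but its lift is below 1.0 while the challenger's is above 2.0: the challenger is outperforming its popularity baseline, which is exactly the signal raw mention counts hide.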

This is why Module 4 and Module 5 emphasize methodology so heavily. Without methods that correct for popularity, the measurement system simply re-describes the existing hierarchy and calls it insight.

Try this

Open ChatGPT or Claude. Ask: "What are the top 10 CRMs for small businesses?" Save the list.

Wait three days. Open a fresh conversation. Ask the same question. Compare the two lists.

You will almost certainly see the same top three or four brands, the dominant ones, in both lists. You will also almost certainly see the 7th through 10th slots shuffle between runs. That pattern is popularity bias at work in the top slots and stochastic variation at work in the bottom slots. Both are real, and both are things a serious AEO measurement system has to correct for.
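Once you have both lists saved, a few lines quantify the head/tail split. The brand names below are placeholders standing in for whatever your two runs returned:

```python
# Two hypothetical "top 10 CRMs" runs, three days apart.
run_a = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]
run_b = ["A", "B", "C", "D", "E", "K", "H", "L", "G", "M"]

def jaccard(x, y):
    # Set overlap: 1.0 = identical membership, 0.0 = disjoint.
    x, y = set(x), set(y)
    return len(x & y) / len(x | y)

head_overlap = jaccard(run_a[:4], run_b[:4])
tail_overlap = jaccard(run_a[4:], run_b[4:])
print(f"head overlap: {head_overlap:.2f}")  # 1.00 — stable incumbents
print(f"tail overlap: {tail_overlap:.2f}")  # 0.33 — churning long tail
```

A stable head with a churning tail is the signature described above: popularity bias locks in the top slots while stochastic variation shuffles the rest.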

Three takeaways

  1. Popularity bias is inherited from the training data. It is not a design choice; it is a byproduct of training on a world that was already uneven.
  2. Feedback loops make it accelerate. Each new model generation concentrates on popular items faster than the last.
  3. The bias hurts challengers more than incumbents. Aggregate scores mask this. You need segmented, pairwise methods to see the real picture.

What's next

In Lesson 3.3, we cover position bias, why the order of items in a list changes how people and models treat them. You will learn why the first name in a list gets outsized weight, and why a methodology called Latin Square counterbalancing exists to cancel the effect out.

Reflection prompt

Pick a category where you know the brand landscape well, CRMs, project tools, email platforms, whatever you work with daily. Ask an AI for the top ten. Now look at the list. Of the ten brands the model named, how many deserve to be there on merit, and how many are there because they had the loudest decade? That gap is popularity bias, and it is shaping what your customers see when they ask the same question.


About this course

This lesson is part of AEO A to Z, the open course on Answer Engine Optimization published by GenPicked Academy. GenPicked Academy is where practitioners learn to measure AI recommendations with the same rigor a clinical trial demands: blind sampling, balanced question sets, and confidence intervals that hold up.

About the author: Dr. William L. Banks III is the lead researcher at GenPicked Academy and the architect of the three-layer AEO measurement architecture taught in this course. His work on sycophancy, popularity bias, and construct validity in AI search informs every lesson you just read.

See the methods in practice: GenPicked runs monthly brand-intelligence audits using the exact pipeline taught in Module 6. Read the case studies and audit walkthroughs on the GenPicked blog.

Knowledge check · ungraded

Check your understanding before moving on

1. "Popularity bias" in AEO refers to:

  • Users preferring popular brands
  • AI models over-indexing on the most-mentioned brands in their training data
  • Marketing teams preferring popular tools
  • Reddit upvotes affecting ranking