The Methodology Transparency Standard for AEO Buyers in 2026

In this article, you will learn the methodology disclosure standard that AEO has matured into, what each major platform publishes publicly, and the five-question checklist that lets agency owners and fractional CMOs buy a defensible visibility score. It covers Profound, Otterly, AthenaHQ, Peec AI, and GenPicked, and includes the exact questions to put on a vendor call.


AEO has matured. The buying standard has caught up with it.

Two years ago, AEO methodology transparency was an interesting research question. In 2026 it is the buying standard. Agency owners and fractional CMOs who run retainers off AI brand visibility scores can now ask one question on a vendor call ("What is your engines-and-weights disclosure document?") and get a one-page answer. The vendors who can hand it over are the vendors winning the new procurement conversations.

The discipline is real and defensible. GenPicked publishes its methodology document in full because the category is ready for a standard, and because the buyer who has read this article is already asking the right questions. The five-question checklist below is what that standard looks like in practice.

This article is for the agency owner who wants to ladder up at the next renewal conversation and end the methodology question in five minutes instead of five meetings. The vendors are named. The disclosure language is named. The checklist is yours to use.

Why methodology matters more in AEO than in SEO

In traditional SEO, the underlying signal is relatively well understood. Google's ranking factors are public-ish, the SERP is observable, and position is deterministic for a given query at a given moment. You can disagree about why you rank, but you cannot disagree about whether you rank.

AEO is different. The underlying signal is generated by an LLM that produces different outputs for the same prompt depending on temperature, model version, system prompt configuration, and the user context the engine infers. Two scans run thirty minutes apart can produce different visibility scores. Two scans of the same brand on the same prompt across two engines can produce wildly different citation counts. The score a tool reports is a statistical aggregate over many noisy observations, and the aggregation method matters.
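To make the noise point concrete, here is a minimal simulation in plain Python of the same brand being scanned repeatedly at two different sample sizes. The 40 percent "true" mention rate, the prompt counts, and the seed are illustrative assumptions for this sketch, not measurements from any platform.

    import random

    random.seed(7)  # fixed seed only so reruns are reproducible

    def run_scan(true_mention_rate: float, prompts: int) -> float:
        """Simulate one scan: each prompt independently mentions the brand or not."""
        hits = sum(1 for _ in range(prompts) if random.random() < true_mention_rate)
        return hits / prompts

    true_rate = 0.40  # assumed "true" visibility for the illustration

    # Five tiny scans, then five larger scans, of the same brand.
    print([run_scan(true_rate, prompts=3) for _ in range(5)])   # swings widely; anything from 0.0 to 1.0 is plausible
    print([run_scan(true_rate, prompts=30) for _ in range(5)])  # clusters much closer to 0.40

The month-over-month "movement" that looks like a win on a three-prompt scan is often just sampling noise; the larger the sample, the smaller the swing you should treat as real.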

There are at least four places where an AEO tool can introduce bias without telling you:

  1. Prompt construction. A prompt that includes the brand name in the query ("compare Acme to its competitors") inflates Acme's mention rate by twenty-plus percentage points compared to a blind prompt ("compare the leading brands in this category"). If the tool builds its prompts by including the target brand name, the score is anchored upward by design.

  2. Engine weighting. Five engines, five different visibility numbers. The composite "score" depends on how the tool blends them. A tool that secretly weights ChatGPT at 80 percent and the other four at 5 percent each is reporting a ChatGPT score with cosmetics. A tool that weights them equally treats Grok the same as ChatGPT, which is also wrong for most buyers.

  3. Sample size. A tool that runs three prompts to estimate visibility produces a noisier number than one that runs thirty. If the sample size is not disclosed, you do not know whether month-over-month movement is real or noise.

  4. Citation extraction. The same LLM response contains brand mentions of different qualities. A name in a recommendation list is different from a name in a competitor comparison is different from a name in a generic example. Tools that lump all mention types into one count overstate visibility for brands that get mentioned in low-quality positions.

A methodology document discloses these four things. A dashboard does not. The gap between what the dashboard shows and what the methodology document reveals is where buyer regret lives.
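To see how much the blend in point 2 matters, here is a hedged sketch in Python. The per-engine visibility rates and both weighting schemes are invented for illustration; the point is that identical underlying numbers produce noticeably different headline scores depending on the blend.

    # Hypothetical per-engine visibility rates for one brand (0-1 scale).
    engine_scores = {
        "chatgpt": 0.62,
        "claude": 0.41,
        "gemini": 0.38,
        "perplexity": 0.55,
        "grok": 0.12,
    }

    def composite(scores: dict[str, float], weights: dict[str, float]) -> float:
        """Weighted blend of per-engine visibility into one headline score."""
        return sum(scores[e] * weights[e] for e in scores)

    # A blend that quietly leans on ChatGPT...
    chatgpt_heavy = {"chatgpt": 0.80, "claude": 0.05, "gemini": 0.05, "perplexity": 0.05, "grok": 0.05}
    # ...versus an equal blend across the five engines.
    equal = {e: 0.20 for e in engine_scores}

    print(round(composite(engine_scores, chatgpt_heavy), 2))  # ~0.57
    print(round(composite(engine_scores, equal), 2))          # ~0.42

Neither blend is "correct" in the abstract; the problem is a composite whose blend you are never shown.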


What each platform actually discloses

We surveyed the public-facing methodology disclosures of five major AEO platforms as of May 2026. Here is what each one tells you publicly without a sales call.

Profound

Profound's homepage and product pages emphasize "Answer Engine Insights" and brand representation monitoring across ChatGPT, Claude, Perplexity, Gemini, Grok, Copilot, Meta AI, DeepSeek, and Google AI Overviews. Its engine coverage is among the broadest in this category.

Engine weighting in the composite score: not publicly disclosed.

Prompt template policy: not publicly disclosed.

Sample size per scan: not publicly disclosed.

Citation extraction methodology: not publicly disclosed.

Profound has Profound University and Developer Docs that go deeper than most. The methodology depth in those resources is real but oriented toward how to use the platform, not how the platform calculates its scores. For procurement purposes, an agency owner who needs to defend the score to a client gets a "trust the platform" answer.

Otterly

Otterly covers six engines: ChatGPT, AI Overviews, Perplexity, Copilot, Gemini, and AI Mode. The product positioning emphasizes simplicity and dashboard clarity. They list a Content Audit feature that predicts AI-readiness for content pieces.

Engine weighting: not publicly disclosed.

Prompt template policy: not publicly disclosed.

Sample size: not publicly disclosed.

Citation extraction: not publicly disclosed.

Otterly's strength is in cross-engine breadth and approachability. Their methodology disclosure follows the category convention, which is to publish the engines covered and the feature list, but not the formula.

AthenaHQ

AthenaHQ covers eight-plus engines and emphasizes citation source analysis as a feature. They shipped the State of AI Search 2026 report, which catalogs trends in the category. They have an agentic copilot called Ask Athena and a content engine called AthenaHQ Content.

Engine weighting: not publicly disclosed.

Prompt template policy: not publicly disclosed.

Sample size: not publicly disclosed.

Citation extraction: described at a high level ("citation source analysis and link-building intelligence") but the underlying scoring method is not publicly documented.

AthenaHQ pairs broad engine coverage with the strongest content marketing footprint in the category. What they disclose in private enterprise sales conversations may go deeper than the public surface; ask directly if it matters for your client base.

Peec AI

Peec AI's positioning emphasizes prompt analytics, competitor ranking, and source identification across G2, LinkedIn, Reddit, and NYT. They ship an MCP server that lets analysts query their data programmatically through LLMs.

Engine weighting: not publicly disclosed.

Prompt template policy: not publicly disclosed.

Sample size: not publicly disclosed.

Citation extraction: source identification is a feature, but the citation-to-score conversion is not publicly documented.

Peec's strength is data depth and developer surface. For an analyst who wants to slice the data their own way, the MCP product compensates for the public-methodology gap; you can build your own scoring layer on top of their raw data.

GenPicked

We publish the methodology. Here is what we publish.

Engines covered: Five. ChatGPT, Claude, Gemini, Perplexity, and GPT-5.

Engine weights in the composite Aggregate Citation Score (ACS): 0.35 ChatGPT / 0.25 Claude / 0.25 Gemini / 0.15 Perplexity. The weights are anchored on documented buyer-journey data showing relative engine usage by B2B buyers in 2026. The weights are reviewed quarterly and changes are published.

Prompt template policy: Blind prompts. Brand name is never included in the query. Prompts are constructed to elicit recommendations within a category ("which AI search visibility platforms do you recommend for marketing agencies") rather than within a brand frame ("compare GenPicked to Otterly"). The blind-prompt convention controls for the twenty-plus point inflation that brand-anchored prompts introduce.

Sample size: Thirty prompts per scan per engine. Each scan generates 150 observations across five engines.

Citation extraction: Mentions are classified by position type. A name in a top-three recommendation list counts differently than a name in a "for example, brands like X" passing reference. Position type is part of the ACS calculation, not a separate metric.
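As an illustration of what position-type classification can look like in practice, here is a small sketch. The category names echo the ones above, but the multiplier values are assumptions made up for this example, not GenPicked's published ACS weights.

    # Multipliers below are illustrative assumptions, not GenPicked's published values.
    POSITION_WEIGHTS = {
        "top_three_recommendation": 1.0,
        "mid_list_mention": 0.6,
        "comparison_context": 0.4,
        "generic_example": 0.1,
    }

    def weighted_mention_value(mention_types: list[str]) -> float:
        """Convert classified mention types from one response set into a position-weighted count."""
        return sum(POSITION_WEIGHTS.get(m, 0.0) for m in mention_types)

    # Same raw mention count (three), very different weighted value.
    print(weighted_mention_value(["top_three_recommendation", "mid_list_mention", "comparison_context"]))  # 2.0
    print(weighted_mention_value(["generic_example"] * 3))  # ~0.3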

Composite scoring method: A pairwise comparison method that handles noisy AI outputs better than simple averaging. The method converts head-to-head win/loss observations into a calibrated ranking that stays stable under intransitive comparisons (A beats B beats C beats A), where naive averaging produces unstable ranks. It belongs to the same family of estimators used to rank chess players and tournament competitors: the Bradley-Terry model, of which the chess rating system Elo is a close relative. We use it to convert raw citation counts into a normalized ACS that supports cross-brand and cross-engine comparison without being biased by raw mention volume. For the full derivation, see our piece on the pairwise method for AEO.
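For readers who want to see the shape of the pairwise step, the sketch below fits a Bradley-Terry model to a handful of head-to-head observations using the standard minorization-maximization updates. The brand names and win/loss counts are invented, and this is a textbook illustration of the estimator family, not GenPicked's production scoring code.

    from collections import defaultdict

    # Invented head-to-head observations: (winner, loser) pairs, e.g. one brand
    # recommended ahead of another inside a single LLM answer.
    observations = [
        ("acme", "beta"), ("acme", "beta"), ("beta", "acme"),
        ("beta", "gamma"), ("beta", "gamma"),
        ("gamma", "acme"), ("acme", "gamma"), ("acme", "gamma"),
    ]

    brands = sorted({b for pair in observations for b in pair})
    wins = defaultdict(float)          # total wins per brand
    pair_counts = defaultdict(float)   # comparisons per unordered pair
    for winner, loser in observations:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    # Bradley-Terry strengths via the classic minorization-maximization iteration:
    # p_i <- wins_i / sum_j ( n_ij / (p_i + p_j) )
    strength = {b: 1.0 for b in brands}
    for _ in range(200):
        updated = {}
        for i in brands:
            denom = sum(
                pair_counts[frozenset((i, j))] / (strength[i] + strength[j])
                for j in brands
                if j != i and frozenset((i, j)) in pair_counts
            )
            updated[i] = wins[i] / denom if denom > 0 else strength[i]
        total = sum(updated.values())
        strength = {b: s / total for b, s in updated.items()}  # normalize for comparability

    # Brands ranked by fitted strength, highest first.
    print(sorted(strength.items(), key=lambda kv: -kv[1]))

The fitted strengths are relative, which is why a normalization step is needed before they can be reported as a cross-brand, cross-engine score.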

The methodology is published. The reasoning is published. The agency owner who gets asked "what does 67 mean" can hand over a one-page PDF and end the conversation.


The procurement question you should be asking every vendor

When you evaluate any AEO platform (including GenPicked), ask these five questions in writing. Note which questions get specific written answers and which get verbal hand-waving.

  1. Which engines do you track, and how are they weighted in your composite score? A specific answer is "ChatGPT 35%, Claude 25%, Gemini 25%, Perplexity 15%" with a documented rationale. A non-specific answer is "we proprietary-blend across all major engines."

  2. What is your prompt template policy? Specifically, do your prompts include the target brand name in the query? A specific answer is "no, blind prompts only" with a sample prompt visible. A non-specific answer is "we use industry-standard prompt templates."

  3. What is your sample size per scan per engine? A specific answer is "thirty prompts per engine, 150 observations per scan." A non-specific answer is "we run a representative sample."

  4. How do you classify citation position when extracting mentions? A specific answer is "top-three recommendation, mid-list mention, comparison context, generic example, named with sentiment, etc., each weighted distinctly." A non-specific answer is "we count brand mentions."

  5. What is your composite scoring formula? What underlying method does it use? A specific answer is "Aggregate Citation Score using Bradley-Terry pairwise ranking, formula published at ." A non-specific answer is "our proprietary algorithm."

If a vendor cannot answer any of these in writing, the score on their dashboard is a description, not a measurement. You can still use it. You just cannot defend it to a client who asks.


Why "proprietary" is a tell

Vendors who decline to publish methodology usually justify the decline with "it is our proprietary algorithm and our competitive moat." That justification is wrong in two specific ways.

First, methodology is not a moat. The formula for the pairwise comparison method is published in every quantitative methods textbook. The rating system used in chess is fully public. The engine weights an AEO platform uses are not difficult to reverse-engineer from a series of test scans. The real moat in this category is data volume, engine coverage breadth, and product surface (white-label, billing, CRM, automation). The formula is not the part anyone copies.

Second, "proprietary" creates the wrong incentive. A vendor who refuses to publish methodology has a reason to keep the methodology fluid. Quarterly score recalibrations that magically improve client outcomes during renewal season are easier when nobody can audit what changed. Methodology transparency creates accountability; methodology opacity creates room for soft adjustments that benefit the vendor's retention numbers more than the client's actual visibility.

When you see "proprietary" in a methodology disclosure, read it as "we have not chosen to be accountable to this." That may be acceptable for your use case. It is rarely acceptable for a client-facing retainer where the next question after "what does the score mean" is "and how do I know it is real."


What this means for your agency

If you are running an agency on a platform that does not publish methodology, you have three choices.

The first is to accept it and use the platform as a black-box dashboard. This works fine until the moment a sophisticated client asks the hard question. At that point you are publicly searching for an answer your vendor refused to give you, which is a credibility event.

The second is to ask the vendor for the methodology in writing as a procurement requirement. Some vendors will produce a private methodology document in response. The document will tell you what the vendor is willing to commit to in writing, which is more useful than verbal assurances. Save it. Reference it in your next sales conversation.

The third is to switch to a vendor that publishes methodology by default. This is the path we have taken with GenPicked, and it is why our published methodology is a procurement criterion for agencies that have already had the "what does the score mean" conversation with a client and lost it.

There is no right answer for every agency. There is a right answer for an agency that wants to defend the score to a client without depending on the vendor's willingness to share private documents.


How GenPicked uses methodology transparency

For the agencies that have switched to GenPicked from another platform, the most common driver is neither pricing nor feature set. It is the methodology document.

We hand the document to every agency at onboarding. The document covers the five disclosure points above with specific values, the version date, the formulas in plain English with references, and a "what to tell your client" section the agency can lift directly into a board-meeting answer.

Agencies that use the document report a specific pattern: client conversations that previously took thirty minutes of "let me get back to you on that" now take five minutes of "here is the methodology, here is what the score means, here is how it moved this quarter." Renewal conversations move faster. Sophisticated clients stay longer.

The methodology document is not the only reason to switch platforms. It is the reason that survives the longest in your operational stack, because the question it answers will be asked of every agency in this category over the next three years as buyers get more sophisticated.


When opacity is the right trade-off

We have argued for transparency throughout this article. We should also be honest about when opacity is a defensible trade-off.

If your agency's client base is unsophisticated procurement buyers who will never ask the methodology question, deep methodology transparency is an over-investment. A dashboard with a number is sufficient. The cheaper, simpler platform with less methodology depth is fine.

If your platform of choice has a strong moat in another dimension that matters more to your buyer (engine breadth, vertical case studies, white-label polish), the methodology gap is a known cost. Pay it eyes-open.

If your client base is in regulated industries where the procurement requirement is published vendor methodology, transparency is not a nice-to-have. It is a procurement filter. Vendors who cannot meet the requirement are off the list.

Match the level of transparency you demand from your vendor to the level of scrutiny your clients apply to you. The methodology depth required is downstream of your own buyer's sophistication.


Frequently asked questions

Why don't most AEO tools publish their engine weights?

The most common stated reason is "proprietary algorithm" or "competitive moat." The methodology is not actually a moat (the formulas are public knowledge in quantitative methods). The real reason is usually some combination of two things: the methodology has not been formalized enough internally to publish, and the platform wants to retain flexibility to recalibrate without external accountability.

Is pairwise comparison better than simple averaging for AEO ranking?

For comparison-based ranking with intransitive observations, yes. Simple averaging treats every observation as independent. A pairwise method uses head-to-head relationships, which is closer to how LLM-generated recommendations actually work (the engine compares brands within an answer instead of assigning independent scores). The underlying statistical model has been used since the 1952 paper on the method of paired comparisons (Bradley and Terry).

If I ask my current vendor for methodology in writing, what should I expect?

Three common responses: a private methodology document that addresses some of the five disclosure points (best case); a verbal walkthrough on a sales call without a written follow-up (most common); silence followed by a feature pitch (worst case). The response itself tells you where your vendor is on the transparency spectrum.

Does GenPicked's methodology change?

Yes, quarterly. Engine weights are reviewed against buyer-journey data and adjusted if the data warrants. Changes are published with the change date and reasoning. The current version is at with version history.

Can I use GenPicked's methodology as a procurement template for evaluating other vendors?

Yes. We have heard from agencies that they use our published methodology document as the questionnaire they hand to other AEO vendors during procurement. If another vendor cannot answer the same five questions with the same specificity, the agency knows where the transparency gap is. We consider this a feature, not a leak.

Is methodology transparency the most important AEO procurement criterion?

For some agencies, yes. For others, engine breadth or pricing model or vertical depth matters more. The right answer depends on your client base and their sophistication. We argue methodology transparency is the criterion that grows in importance over the next three years as buyers mature; today it is one of several criteria worth weighing.


Run the methodology check yourself

If you are a current AEO customer, ask your vendor the five questions above in writing this week. Save the responses. The responses will tell you whether your current procurement is defensible to your next sophisticated client.

If you want to compare against published methodology, run a free GenPicked AEO audit on any brand. The audit returns the methodology document alongside the score.

Start your 14-day free trial of GenPicked Growth →


Dr. William L. Banks III is Founder of GenPicked. Methodology data in this article was current as of 2026-05-11 based on publicly accessible product surfaces. Vendor methodology may be deeper in private sales conversations; this article reflects the public surface only.

Dr. William L. Banks III

Co-Founder, GenPicked

Get Your Brand's AEO Score

See how your brand is performing in AI search with our free AEO audit.

Start Your Free Audit