How We Took Magento AI Visibility from 20% to 86% (Case Study)

A live Magento 2 store went from a 20% to an 86% AEO score — and the first jump took under an hour. Here’s the exact sequence of changes, the full audit history, what we deliberately left untouched, and the one thing that quietly pulled the score back down.

TL;DR

A default Magento 2 install scored 20% (“Critical”) for AI visibility — generative search engines were effectively locked out.
With free, open-source modules it reached 86% (“Excellent”), tracked across 33 audits. The first two fixes alone — done in well under an hour — moved it from 20% to 37%.
We expected structured data to do the heavy lifting. The first big jump actually came from a one-line robots.txt change.
The score later eased back to 79% (“Good”) — nothing broke; the files simply went stale. Freshness turned out to matter as much as correctness.

20% · Start · Critical

86% · Peak · Excellent

79% · After drift · Good

Methodology

Before the story, the setup — so the numbers are reproducible rather than anecdotal.

Store: Magento Open Source 2.4.x
Environment: Live demo store
Period: 7–27 June 2026 (calendar)
Active work: Hours, not days
Audits: 33 (29 in rolling 30 days)
Tool: angeo:aeo:audit (open-source CLI)
Signals scored: 15+, individually weighted

A word on timing, since it’s easy to misread. The audits span three weeks of calendar time, but that’s not how long the work took — most days nothing was touched. The hands-on effort was a handful of short sessions: the first two fixes took well under an hour and the full optimization a few hours spread across those sessions. That matches the rule of thumb we quote elsewhere — a typical mid-size store reaches a strong score in roughly 90 minutes of focused work. The calendar span here exists because we also wanted to observe what happens when a store is then left alone (see “the plot twist” below).

The brand-visibility checks rely on querying live AI models directly. As a registered member of the Anthropic Claude Partner Network, we test recall and citation against Claude, ChatGPT, Gemini and Perplexity as part of our day-to-day module work — so these measurements come from hands-on practice with the models, not secondhand reporting.

What we deliberately did not change

To isolate the effect of answer-engine optimization, everything outside it was held constant. We did not touch:

Theme
Hosting / server
Page-speed work
Product copy
URL structure
Classic SEO settings

Whatever moved the score moved because of the AEO layer alone — not faster pages or rewritten content.

What a 20% score actually means

“20%” sounds abstract until you translate it into how generative search treats the store:

→ ChatGPT may never discover the catalog in the first place.

→ Perplexity can crawl the pages but can’t reliably parse the products.

→ AI shopping agents have almost no structured data to trust, so they default to a competitor that does.

A store can rank #1 on Google and still land here. Classic SEO and answer-engine readiness are different disciplines: one optimizes for a ranked list of links, the other for being selected and quoted inside a single generated answer.

The starting point: 20%, nine failures

The first audit was blunt:

AEO Score: 20% — Critical
✓ Pass: 1   ⚠ Warn: 5   ✗ Fail: 9

The default Magento 2 install scores 20% (“Critical”) in the angeo:aeo:audit CLI — the literal out-of-the-box state, not a worst case staged for contrast.

Check	Status	What generative search sees
robots.txt — AI bot access	✗ FAIL	No explicit rules for AI indexers
llms.txt — content map	✗ FAIL	404 — no map of the store
llms.jsonl — machine catalog	✗ FAIL	404 — no structured catalog
sitemap.xml	✗ FAIL	Not in standard locations
Product JSON-LD	✗ FAIL	No product schema on product pages
Merchant policies	✗ FAIL	No schema to attach policies to
Organization schema	✗ FAIL	No brand entity on the homepage
UCP profile	✗ FAIL	404 — no agentic-commerce profile
JSON-LD quality	✗ FAIL	No WebSite, Product or BreadcrumbList

The lone pass was canonical/hreflang consistency, which Magento handles natively. Everything an LLM needs to find, understand and trust the store was absent.

The climb: 20% to 86%

Because every change was re-audited, the trajectory is real telemetry, not a tidy reconstruction.

The AEO score-trend dashboard: a staircase, not a slope. Each sharp step is a high-weight check flipping to PASS as the store climbs from 20% to 86%.

Date	Score	P / W / F	What changed
Jun 7, 18:32	20% — Critical	1 / 5 / 9	Baseline
Jun 7, ~19:00	28%	2 / 5 / 8	`robots.txt` — AI bots allowed
Jun 7, ~19:03	37%	3 / 5 / 7	`llms.txt` + `llms.jsonl`
Jun 12	51%	4 / 7 / 4	Product + Organization JSON-LD
Jun 12–13	60→77%	—	UCP, merchant policies, schema breadth
Jun 13, 23:48	83% — Good	9 / 6 / 0	Last failure cleared
Jun 14+	86% — Excellent	11 / 5 / 0	Full core stack live
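The staircase shape is what you would expect from weighted scoring: one heavy check flipping to PASS moves the total more than several light ones. The sketch below illustrates that idea only. The credit values (PASS 1.0, WARN 0.5, FAIL 0.0), the example weights and the formula are assumptions made for illustration; the article does not publish how angeo:aeo:audit actually combines its checks.

```python
# Hypothetical credit per status; the real audit's values are not stated in this article.
STATUS_CREDIT = {"PASS": 1.0, "WARN": 0.5, "FAIL": 0.0}


def weighted_score(checks):
    """checks: list of (weight, status) pairs. Returns a percentage from 0 to 100."""
    total_weight = sum(weight for weight, _ in checks)
    if total_weight == 0:
        raise ValueError("at least one check must carry weight")
    earned = sum(weight * STATUS_CREDIT[status] for weight, status in checks)
    return round(100 * earned / total_weight, 1)


if __name__ == "__main__":
    # Illustrative weights only: two heavy checks and one light one.
    before = [(1.0, "FAIL"), (1.0, "WARN"), (0.5, "PASS")]
    after = [(1.0, "PASS"), (1.0, "WARN"), (0.5, "PASS")]
    print(weighted_score(before), weighted_score(after))
```

Under these assumed numbers, clearing a single weight-1.0 failure doubles the score, which is the kind of step the trend table shows.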

The surprise: the cheapest fix moved the most

Going in, we assumed Product schema would dominate the score. It didn’t, at least not per unit of effort. The single highest-leverage change was robots.txt: a one-line edit worth eight points (20% to 28%).

After a one-line robots.txt change the score jumps from 20% to 28% — the robots.txt check flips to PASS while everything downstream still fails.

Magento’s default file was written for Google years ago and names none of the modern AI indexers. Until it explicitly allows them, OpenAI’s OAI-SearchBot, GPTBot, PerplexityBot, ClaudeBot and Google-Extended never reliably crawl the store — so nothing else you do downstream can even be seen.

We expected structured data to make the biggest difference. The first major jump came from a one-line crawler-access fix. Visibility starts with permission to be crawled — everything else is downstream of that.
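As a rough illustration of the crawler-access fix, the sketch below builds robots.txt rules for the user agents named above and checks them with Python's standard urllib.robotparser. The helper names and the example.com URLs are placeholders, not the store's real configuration, and the exact rules the audit expects may differ.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in this article; extend the list as vendors add more.
AI_USER_AGENTS = ["OAI-SearchBot", "GPTBot", "PerplexityBot", "ClaudeBot", "Google-Extended"]


def build_ai_rules(sitemap_url: str) -> str:
    """Return robots.txt text that explicitly allows each AI user agent."""
    lines = []
    for agent in AI_USER_AGENTS:
        lines += [f"User-agent: {agent}", "Allow: /", ""]
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


def allowed_agents(robots_txt: str, url: str) -> dict:
    """Parse robots.txt text and report whether each AI agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_USER_AGENTS}


if __name__ == "__main__":
    # example.com is a placeholder, not the store from this study.
    text = build_ai_rules("https://example.com/sitemap.xml")
    print(text)
    print(allowed_agents(text, "https://example.com/some-product.html"))
```

Running the same check against a store's existing robots.txt is a quick way to spot a blanket Disallow rule that also shuts out these agents.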

The content map: llms.txt + llms.jsonl

The next step generated two files in one command:

bin/magento angeo:llms:generate

Generating llms.txt and llms.jsonl in one command takes the store to 37% — the content-map check now passes with 3 sections and 220 links.

llms.txt is a small Markdown file at the site root that tells an LLM what the store is and where its key pages live. It’s an open proposal (llmstxt.org) that crawlers such as Perplexity have publicly supported. The cleanest way to think about it: where robots.txt tells crawlers which paths they may and may not fetch, llms.txt helps reasoning models understand what the site actually contains. Its sibling, llms.jsonl, is a line-delimited catalog in which each line is one self-contained product record, far easier for a model to ingest than scraped, rendered HTML. The run produced a valid llms.txt (3 sections, 220 links) and 221 catalog records.
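To make the two formats concrete, here is a minimal sketch of what a generator might emit. It follows the general shape of the llmstxt.org proposal (a title, a short summary, then sections of links) and plain JSON Lines for the catalog. The function names and the product fields (sku, name, url, price) are hypothetical and are not taken from the angeo module's actual output.

```python
import json


def render_llms_txt(store_name, summary, sections):
    """Render an llms.txt body: H1 title, blockquote summary, H2 link sections."""
    out = [f"# {store_name}", "", f"> {summary}", ""]
    for title, links in sections.items():
        out.append(f"## {title}")
        out.append("")
        for label, url in links:
            out.append(f"- [{label}]({url})")
        out.append("")
    return "\n".join(out)


def render_llms_jsonl(products):
    """Render one compact JSON object per line, one product per line."""
    rows = (json.dumps(p, ensure_ascii=False, separators=(",", ":")) for p in products)
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    # Placeholder data; field names are assumptions for illustration.
    print(render_llms_txt(
        "Demo Store",
        "A sample Magento catalog.",
        {"Categories": [("Bags", "https://example.com/bags.html")]},
    ))
    print(render_llms_jsonl([
        {"sku": "A1", "name": "Tote", "url": "https://example.com/tote.html", "price": "49.00"},
    ]))
```

Because every line of the JSONL file parses on its own, a consumer can stream the catalog without loading or rendering any HTML.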

Structured data carries most of the weight

The heaviest checks (weight 1.0) are Product JSON-LD and the AI product feed. Valid Product schema with AggregateRating and BreadcrumbList, plus Organization schema for brand-entity disambiguation, is what carried the store from “Good” into “Excellent.” By June 14 it held 86% with zero failures.
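For readers who have not written JSON-LD by hand, the sketch below assembles the three schema.org types mentioned above (Product with AggregateRating, BreadcrumbList and Organization) as Python dictionaries and wraps them in a script tag. The function names and sample values are illustrative; a production module would pull these from the catalog and may emit more properties than shown here.

```python
import json


def product_jsonld(name, sku, url, price, currency, rating, review_count):
    """Build a schema.org Product with one Offer and an AggregateRating."""
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "sku": sku,
        "url": url,
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,
            "availability": "https://schema.org/InStock",
        },
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating,
            "reviewCount": review_count,
        },
    }


def breadcrumb_jsonld(trail):
    """Build a BreadcrumbList from (name, url) pairs, ordered from the root down."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": i, "name": name, "item": url}
            for i, (name, url) in enumerate(trail, start=1)
        ],
    }


def organization_jsonld(name, url):
    """Build a minimal Organization entity for the homepage."""
    return {"@context": "https://schema.org", "@type": "Organization", "name": name, "url": url}


def as_script_tag(data):
    """Serialize to a JSON-LD script tag, escaping '<' so the tag cannot be closed early."""
    body = json.dumps(data).replace("<", "\\u003c")
    return f'<script type="application/ld+json">{body}</script>'


if __name__ == "__main__":
    # Placeholder product, not taken from the demo store.
    product = product_jsonld("Tote", "A1", "https://example.com/tote.html", 49.0, "USD", 4.6, 12)
    print(as_script_tag(product))
```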

The plot twist: back down to 79%

Most case studies stop at the peak. Here’s what happened next. On June 27:

AEO Score: 79% — Good
✓ Pass: 9   ⚠ Warn: 7   ✗ Fail: 0

No code changed, yet the score eased to 79%: stale llms files (12 days old) and an old sitemap <lastmod> (233 days) push three checks from PASS to WARN.

Still zero failures, but seven points below the peak — with no code change. Three checks had slipped from PASS to WARN for one shared reason: staleness.

Check	Status	Why
llms.txt	⚠ WARN	12 days old — regenerate via cron
llms.jsonl	⚠ WARN	12 days old — same fix
sitemap.xml	⚠ WARN	Newest `<lastmod>` 233 days old — looks inactive

To a model revisiting the site, a months-old sitemap and a two-week-old content map read as a store that may no longer be trading, so it gets quietly down-weighted against rivals whose data looks live. The fix is mundane: run Magento cron so these files regenerate on a schedule. The lesson is less so: an AEO setup is something you maintain, not something you finish.
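A freshness check of this kind is simple to approximate. The sketch below reads the newest lastmod value from a sitemap and compares file ages against a limit. The thresholds (7 days for the llms files, 30 days for the sitemap) are assumptions chosen so that the ages reported above would warn; the article does not state the cut-offs the audit really uses. It also assumes lastmod values begin with a full YYYY-MM-DD date.

```python
import xml.etree.ElementTree as ET
from datetime import date, timedelta

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def newest_lastmod(sitemap_xml):
    """Return the most recent <lastmod> date in a sitemap, or None if there is none."""
    root = ET.fromstring(sitemap_xml)
    dates = []
    for node in root.findall(".//sm:lastmod", SITEMAP_NS):
        text = (node.text or "").strip()
        if text:
            # Keep only the date part of values such as 2025-11-06T10:00:00+00:00.
            dates.append(date.fromisoformat(text[:10]))
    return max(dates) if dates else None


def freshness_status(last_updated, today, max_age_days):
    """PASS if updated within max_age_days, WARN if older, FAIL if no date is known."""
    if last_updated is None:
        return "FAIL"
    age_days = (today - last_updated).days
    return "PASS" if age_days <= max_age_days else "WARN"


if __name__ == "__main__":
    today = date(2026, 6, 27)
    # Ages reported in the June 27 audit; the limits of 7 and 30 days are assumptions.
    print(freshness_status(today - timedelta(days=12), today, max_age_days=7))
    print(freshness_status(today - timedelta(days=233), today, max_age_days=30))
```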

What “Excellent” unlocked

The score isn’t vanity — the same audit measures real recall inside AI models, and on this store that check passed:

100% · Mentioned

67% · Recommended

100% · URL cited

These rates were measured across 3 representative test prompts sent to the model and scored for whether the store was mentioned, recommended, and cited by URL. It’s a small sample — a directional signal, not a statistical claim — but the direction is unambiguous: a store that was invisible now appears in every test prompt and gets cited each time (overall 87/100, grade B). That’s the line between existing and not existing inside an AI-generated answer.
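With only three prompts, each percentage maps to a simple count: 100% is three out of three and 67% is two out of three. The sketch below shows that arithmetic. The result structure and key names are hypothetical, and which prompt lacked a recommendation is not stated in the study, so the sample rows are illustrative. The overall 87/100 figure is not reproduced here because its formula is not given.

```python
def recall_rates(results):
    """results: list of dicts with boolean keys mentioned, recommended, url_cited."""
    total = len(results)
    if total == 0:
        raise ValueError("need at least one prompt result")
    keys = ("mentioned", "recommended", "url_cited")
    return {key: round(100 * sum(r[key] for r in results) / total) for key in keys}


if __name__ == "__main__":
    # Illustrative rows matching the reported rates for 3 prompts.
    sample = [
        {"mentioned": True, "recommended": True, "url_cited": True},
        {"mentioned": True, "recommended": True, "url_cited": True},
        {"mentioned": True, "recommended": False, "url_cited": True},
    ]
    print(recall_rates(sample))
```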

The three biggest lessons

If you remember nothing else

Visibility starts with crawlability. If AI indexers can’t enter, no amount of schema matters. Fix robots.txt first.
Structured data carries most of the score. Product, Organization and policy JSON-LD are where the weight lives.
Freshness rivals correctness. Stale files decay your score on their own. Cron is not optional.

The replication recipe

Fix robots.txt — allow AI indexers, declare the sitemap. (Best score-per-effort.)
Generate the content map — llms.txt + llms.jsonl.
Enable & verify the sitemap — Marketing → SEO & Search → Site Map.
Add Product JSON-LD — required fields plus AggregateRating and BreadcrumbList.
Add Organization + policy schema — hasMerchantReturnPolicy and shippingDetails have been required by Google and ChatGPT Shopping since Jan 2026.
Publish the UCP profile — /.well-known/ucp for agentic commerce.
Run cron — and keep it running. Everything above decays without it.
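After working through the recipe, a quick smoke check can confirm that the public files are actually reachable. The sketch below requests each path and records whether it returns HTTP 200. The path list is an assumption: the article places llms.txt at the site root and the UCP profile at /.well-known/ucp, while the llms.jsonl and sitemap.xml locations are guessed to be at the root as well. It complements the audit command and does not replace it.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

# Assumed public paths; adjust to wherever your store serves these files.
ENDPOINTS = ["/robots.txt", "/llms.txt", "/llms.jsonl", "/sitemap.xml", "/.well-known/ucp"]


def http_status(url):
    """Return the HTTP status code for url, or 0 if the request could not be made."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.status
    except HTTPError as err:
        return err.code
    except URLError:
        return 0


def smoke_check(base_url, fetch=http_status):
    """Map each endpoint path to True when it answers with HTTP 200."""
    base = base_url.rstrip("/")
    return {path: fetch(base + path) == 200 for path in ENDPOINTS}


if __name__ == "__main__":
    # Stub fetcher so the example runs offline: pretend only the UCP profile is missing.
    def fake_fetch(url):
        return 404 if url.endswith("/.well-known/ucp") else 200

    for path, ok in smoke_check("https://example.com/", fetch=fake_fetch).items():
        print(f"{path}: {'ok' if ok else 'missing'}")
```

Calling smoke_check with only a base URL uses the real HTTP fetcher; the fetch parameter exists so the check can be exercised without network access.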

Limitations

Read honestly, this study has boundaries worth stating:

• It measures technical AI readiness, not traffic or revenue. A higher score improves discoverability; it does not guarantee a model will recommend you.

• The audit score is not a published ranking factor of any AI system — it’s a proxy for the signals those systems are known to read.

• Results come from a single demo store. For rough context, based on our internal sample of audits, untouched Magento installs tend to cluster near 20–25%, and partially-optimized stores commonly sit in the 45–70% range before deliberate AEO work — but these are observations, not a controlled benchmark.

• AI-search standards move fast — re-audit periodically rather than trusting a one-time result.

What’s next for this store

The roadmap from here is about durability, not one-off gains: automatic llms regeneration on catalog changes, product-feed freshness checks, Merchant Center synchronization, and completing ChatGPT Shopping registration. We’ll re-audit and update this study as those land.

FAQ

What is an AEO score for a Magento store?

An AEO (Answer Engine Optimization) score measures how visible a Magento store is to AI answer engines such as ChatGPT, Gemini, Perplexity and Claude. It checks signals like AI-indexer access in robots.txt, the presence of llms.txt and llms.jsonl, Product and Organization JSON-LD, a fresh sitemap, and a UCP profile, then returns a single weighted percentage from Critical to Excellent.

Why does a default Magento 2 store score around 20%?

Magento’s defaults predate the AI-search era. AI indexers aren’t explicitly allowed in robots.txt, there’s no llms.txt or llms.jsonl content map, product pages emit microdata rather than the JSON-LD that models prefer, and there’s no agentic-commerce profile. It’s the out-of-the-box state, not a bug.

How long does it take to improve a Magento AEO score?

Less time than the calendar suggests. The first two fixes — robots.txt and generating llms.txt/llms.jsonl — take minutes and produce an immediate jump (20% to 37% in this study). A typical mid-size store reaches a strong score in roughly 90 minutes of focused work; the full set of fixes here took a few hours spread across a few sessions. The three-week span in our data is calendar time, not effort — we left the store idle between sessions on purpose to see how the score behaves over time.

Why did the score drop from 86% to 79%?

Stale data, not broken code. The llms files were 12 days old and the sitemap’s newest lastmod date was 233 days old, so three checks dropped from PASS to WARN. Models read stale signals as a possibly-inactive store. Running cron to regenerate the files on a schedule prevents the decay.

Does a higher AEO score guarantee AI recommendations?

No. The score measures technical readiness — whether AI systems can discover, parse and trust the store. That improves the odds of being mentioned and cited, but recommendation also depends on price, relevance and competition. Readiness is necessary, not sufficient.

Where does your store stand?

Run the same open-source audit used throughout this study and get back:

your overall AEO score, Critical → Excellent
every failing and at-risk check
a prioritized list of fixes
AI-visibility recommendations specific to your catalog

bin/magento angeo:aeo:audit

Compare your store against this case study →

All figures come from audits run against the demo store demo.angeo.dev between 7 and 27 June 2026 using the open-source angeo:aeo:audit CLI. Scoring weights reflect the framework as of June 2026; AI-search standards evolve quickly.