Your sitemap is configured. Your Core Web Vitals score is green. Your product catalog is perfectly structured. And yet when a user asks ChatGPT for products you sell, your store doesn’t appear.
Most of the time, the reason is a single file: robots.txt.
Specifically, a robots.txt written for Google in 2019 and never updated for the AI crawlers that now determine your visibility in ChatGPT, Gemini, Claude, and Perplexity. In 2026, there are ten distinct AI bots across five platforms (OpenAI, Anthropic, Perplexity, Google, and Apple). Most Magento installations have explicit rules for zero of them.
Why robots.txt Is AEO Signal #1
The angeo/module-aeo-audit checks robots.txt first and marks it Critical because it is a gate. Every other AEO signal — llms.txt, Product schema, AI product feed — is irrelevant if the AI crawler cannot reach your store in the first place.
OpenAI states this without ambiguity:
“Sites that are opted out of OAI-SearchBot will not be shown in ChatGPT search answers.”
Not “may not appear.” Will not appear. If OAI-SearchBot is blocked — by an explicit Disallow or caught in a wildcard rule — your store is excluded from ChatGPT search answers regardless of everything else you do.
The Three Types of AI Bots — Why the Difference Matters
Before listing every bot, you need to understand what each one actually does. AI crawlers fall into three distinct categories with very different purposes — and conflating them causes the most common robots.txt misconfiguration.
| Type | What it does | Examples | Ecommerce recommendation |
|---|---|---|---|
| Search & indexing bots | Builds the live index used when users ask AI questions. Cites sources, links back to your store. Directly drives product discovery. | OAI-SearchBot, Claude-SearchBot, PerplexityBot, Google-Extended | Always allow |
| User-initiated fetchers | Fetches your page when a specific user asks AI to visit a URL directly. May cite your product page in the response. | ChatGPT-User, Claude-User, Perplexity-User | Always allow |
| Training crawlers | Collects content to train future AI models. No attribution, no direct traffic back to your store. | GPTBot, ClaudeBot, Applebot-Extended | Your choice |
The most common mistake: blocking GPTBot (training) while believing it removes you from ChatGPT search results. It does not. GPTBot and OAI-SearchBot are entirely separate bots with separate purposes. Blocking training crawlers has zero effect on AI search visibility — but blocking search crawlers makes you invisible immediately.
Every AI Bot That Matters for Magento in 2026
| Bot | Platform | Type | Impact if blocked |
|---|---|---|---|
| OAI-SearchBot | ChatGPT / OpenAI | Search index | Invisible in all ChatGPT search answers and product recommendations. Most critical. |
| GPTBot | ChatGPT / OpenAI | Training | Content excluded from future GPT training data. Does not affect current search visibility. |
| ChatGPT-User | ChatGPT / OpenAI | User-initiated | ChatGPT cannot fetch your pages when a user requests them directly. |
| Claude-SearchBot | Claude / Anthropic | Search index | Invisible in Claude’s real-time web search answers. |
| ClaudeBot | Claude / Anthropic | Training | Content excluded from future Claude training data. |
| Claude-User | Claude / Anthropic | User-initiated | Claude cannot fetch your pages when a user requests them directly. |
| PerplexityBot | Perplexity | Search index | Invisible in Perplexity answers and product recommendations. |
| Perplexity-User | Perplexity | User-initiated | Perplexity cannot fetch your pages for direct user requests. |
| Google-Extended | Gemini / Google | Search index + training | Not cited in Gemini AI Overviews or Google Shopping AI features. |
| Applebot-Extended | Apple Intelligence | Training | Content excluded from Apple Intelligence training data. |
| anthropic-ai | Anthropic | Deprecated | Legacy name for ClaudeBot. Keep rules for backwards compatibility. |
Anthropic expanded to three bots in early 2026. Sites that only reference ClaudeBot in robots.txt are now missing Claude-SearchBot (live search) and Claude-User (user-initiated fetching). If your robots.txt was last updated before 2026, this almost certainly applies to your store.
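A quick way to see which of these names your current file mentions at all is to scan it for User-agent lines. The sketch below is a minimal POSIX shell helper, assuming the bot list from the table above. The function name `missing_ai_bots` is illustrative, and it only reports whether a group exists for each bot, not what that group allows.

```shell
#!/bin/sh
# Bot names copied from the table in this article; adjust as platforms change.
AI_BOTS="OAI-SearchBot GPTBot ChatGPT-User Claude-SearchBot ClaudeBot Claude-User PerplexityBot Perplexity-User Google-Extended Applebot-Extended"

# Reads a robots.txt from stdin and prints every bot in AI_BOTS
# that has no User-agent line of its own.
missing_ai_bots() {
    robots=$(cat)    # read the whole file once
    for bot in $AI_BOTS; do
        # Strip CRs and trailing comments, then match the whole line.
        # -i: user-agent names are compared case-insensitively.
        if ! printf '%s\n' "$robots" | tr -d '\r' |
             sed 's/[[:space:]]*#.*$//' |
             grep -qix "[[:space:]]*user-agent:[[:space:]]*$bot[[:space:]]*"; then
            printf '%s\n' "$bot"
        fi
    done
}
```

Typical use would be `curl -s https://yourstore.com/robots.txt | missing_ai_bots`. An empty result means every bot in the list has its own group.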
The Default Magento robots.txt Problem
Magento’s default robots.txt starts with a wildcard block:
User-agent: *
Disallow: /index.php/
Disallow: /*?
Disallow: /checkout/
Disallow: /app/
...
This wildcard group is the baseline for every bot that has no User-agent group of its own. If your deployment script, hosting provider, or a staging migration has added Disallow: / to it, AI bots without explicit rules are caught in it silently, with no error logged anywhere.
Check this right now. Open https://yourstore.com/robots.txt in a browser and look for Disallow: / on its own line. If it exists without an explicit Allow: / for each AI bot listed above it — every one of those bots is blocked. This affects the majority of Magento stores checked by the AEO audit module.
Where Magento Stores robots.txt — Two Scenarios
Before editing, identify which method your store uses to serve the file. Editing the wrong one has no effect.
Scenario A: Magento Admin (most common)
Magento can serve robots.txt dynamically from the database. Check whether this is active:
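# Check if Magento manages robots.txt (custom instructions are stored at this path;
# design/search_engine_robots/default_robots only sets the default robots meta tag)
bin/magento config:show design/search_engine_robots/custom_instructions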
If it returns a value, Magento owns the file. Edit it via: Content → Design → Configuration → [Store view] → Edit → Search Engine Robots → Edit custom instruction of robots.txt.
Scenario B: Static file in pub/
If Magento Admin changes don’t show up at yourstore.com/robots.txt, a physical file is taking precedence. Check for it:
# Check if a static file exists and is being served
ls -la /var/www/html/pub/robots.txt
curl -I https://yourstore.com/robots.txt
# If no X-Magento headers appear, the file is served statically
Edit pub/robots.txt directly, or remove it to let Magento’s Admin configuration take over.
Multi-store installations: Each store view can have its own robots.txt in Magento Admin. If you run multiple stores on different domains or subdomains, configure each one separately under Content → Design → Configuration → [select store view]. Do not assume one configuration covers all stores.
The Complete robots.txt Configuration for Magento 2
This is the recommended configuration for Magento ecommerce stores in 2026. It explicitly allows all AI search and indexing bots, gives you a clear choice on training crawlers, and keeps Magento-specific paths protected.
# ============================================================
# AI SEARCH & INDEXING BOTS: Allow (critical for visibility)
# These build the index ChatGPT, Claude, Gemini, and Perplexity
# use to answer product discovery questions.
# Blocking these makes your store invisible in AI search answers.
# ============================================================
User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

# ============================================================
# USER-INITIATED FETCHERS: Allow
# Fetch your pages when a user directly asks AI about a URL.
# ============================================================
User-agent: ChatGPT-User
Allow: /

User-agent: Claude-User
Allow: /

User-agent: Perplexity-User
Allow: /

# ============================================================
# TRAINING CRAWLERS: your choice
# These collect content for model training. No direct traffic
# back, no attribution. Blocking them does NOT affect search
# visibility. Change Allow to Disallow if you prefer to opt out.
# ============================================================
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: Applebot-Extended
Allow: /

# ============================================================
# TRADITIONAL SEARCH ENGINES
# ============================================================
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

# ============================================================
# ALL OTHER BOTS: Standard Magento rules
# AI bots above are explicitly allowed before this wildcard.
# ============================================================
User-agent: *
Allow: /
# Magento paths: block from all crawlers
Disallow: /admin/
Disallow: /adminhtml/
Disallow: /api/
Disallow: /rest/
Disallow: /graphql
Disallow: /cron.php
Disallow: /var/
Disallow: /lib/
Disallow: /dev/
Disallow: /index.php/
Disallow: /*?SID=
Disallow: /*?___store=
Disallow: /checkout/
Disallow: /customer/
Disallow: /wishlist/
Disallow: /review/

# ============================================================
# SITEMAPS: helps all crawlers discover your pages
# ============================================================
Sitemap: https://yourstore.com/sitemap.xml
Sitemap: https://yourstore.com/llms.txt
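If you would rather generate the AI bot groups than maintain them by hand, the same blocks can be produced from two lists. This is a minimal sketch, assuming the bot names used in the configuration above. The function name `ai_bot_groups` and its single policy argument are hypothetical and not part of Magento or any module.

```shell
#!/bin/sh
# Prints one User-agent group per AI bot.
# $1 = policy for training crawlers: Allow (default) or Disallow.
# Search bots and user-initiated fetchers are always allowed.
ai_bot_groups() {
    policy=${1:-Allow}
    for bot in OAI-SearchBot Claude-SearchBot PerplexityBot Google-Extended \
               ChatGPT-User Claude-User Perplexity-User; do
        printf 'User-agent: %s\nAllow: /\n\n' "$bot"
    done
    # Training crawlers follow the chosen policy.
    for bot in GPTBot ClaudeBot anthropic-ai Applebot-Extended; do
        printf 'User-agent: %s\n%s: /\n\n' "$bot" "$policy"
    done
}
```

Running `ai_bot_groups Disallow` opts out of training while leaving search and user-initiated access open. The output still needs the search engine, wildcard, and Sitemap sections appended.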
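Keep the AI bot groups above the wildcard. Under RFC 9309, a crawler obeys the group whose User-agent line names it and falls back to User-agent: * only when no such group exists, so a compliant parser finds its group wherever it sits in the file. Not every parser or audit tool is that careful, and listing the AI bot entries before the wildcard block removes any ambiguity. One consequence matters more than order: a bot with its own group ignores the wildcard group entirely, so the Disallow lines under User-agent: * do not apply to the AI bots above unless you repeat them in each bot's group.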
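To see which rules a standards-following crawler would apply, it helps to resolve the group for one bot at a time. The sketch below follows the group selection described in RFC 9309 (the group naming the crawler wins, `*` is the fallback), simplified to exact, case-insensitive name matching. `rules_for_bot` is a hypothetical helper name, it does not evaluate path matching, and individual crawlers may differ in detail.

```shell
#!/bin/sh
# $1 = crawler name, robots.txt on stdin.
# Prints the allow/disallow lines from the group(s) naming that crawler,
# or from the "*" group(s) when no group names it.
rules_for_bot() {
    tr -d '\r' | awk -v bot="$1" '
    BEGIN { bot = tolower(bot) }
    {
        sub(/#.*/, "")                        # drop comments
        colon = index($0, ":")
        if (colon == 0) next                  # blank or malformed line
        key = tolower(substr($0, 1, colon - 1))
        val = substr($0, colon + 1)
        gsub(/[ \t]+/, "", key)
        gsub(/^[ \t]+|[ \t]+$/, "", val)
        if (key == "user-agent") {
            # A user-agent line that follows a rule starts a new group.
            if (!in_agents) { is_bot = 0; is_star = 0 }
            in_agents = 1
            if (tolower(val) == bot) { is_bot = 1; found = 1 }
            if (val == "*") is_star = 1
        } else if (key == "allow" || key == "disallow") {
            in_agents = 0
            if (is_bot)  specific = specific key ": " val "\n"
            if (is_star) fallback = fallback key ": " val "\n"
        }
    }
    END {
        if (found) printf "%s", specific
        else       printf "%s", fallback
    }'
}
```

For example, `curl -s https://yourstore.com/robots.txt | rules_for_bot OAI-SearchBot` prints only the rules that group selection assigns to OAI-SearchBot, which makes it easy to spot a bot that is silently falling through to the wildcard.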
How to Update robots.txt in Magento Admin
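1. In Magento Admin, go to Content → Design → Configuration, choose the store view, and click Edit.
2. Open Search Engine Robots and paste the configuration above into "Edit custom instruction of robots.txt".
3. Replace yourstore.com with your actual domain in the Sitemap lines.
4. Save the configuration, then flush the cache with bin/magento cache:flush.
5. Open https://yourstore.com/robots.txt in a browser and confirm OAI-SearchBot, Claude-SearchBot, and PerplexityBot all have Allow: /. If the file hasn’t changed, Scenario B above applies: a static file is overriding the Admin configuration.
How to Update robots.txt via SSH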
If Magento Admin is not managing the file, or if you prefer a direct file edit:
# SSH into your server
ssh user@yourserver.com
# Navigate to Magento pub directory
cd /var/www/html/pub
# Backup existing file
cp robots.txt robots.txt.backup.$(date +%Y%m%d)
# Edit the file
nano robots.txt
# Paste the configuration, save with Ctrl+O → Enter → Ctrl+X
# Verify it's live
curl https://yourstore.com/robots.txt | grep -E "OAI-SearchBot|Claude-SearchBot|PerplexityBot"
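If you update the file from a deploy script rather than an editor, the backup and replace steps above can be wrapped so that an empty upload never overwrites a working file. This is a sketch only: `install_robots` and its two arguments are illustrative names, and the paths in the usage line are assumptions about your server layout.

```shell
#!/bin/sh
# $1 = new robots.txt, $2 = target path (for example pub/robots.txt).
# Refuses empty input, keeps a dated backup, and swaps the file in one move.
install_robots() {
    new=$1
    target=$2
    [ -s "$new" ] || { echo "refusing: $new is missing or empty" >&2; return 1; }
    if [ -f "$target" ]; then
        # Nothing to do when the content is already identical.
        cmp -s "$new" "$target" && { echo "unchanged"; return 0; }
        cp -p "$target" "$target.backup.$(date +%Y%m%d)"
    fi
    # Copy next to the target first so the final rename is a single step.
    cp "$new" "$target.tmp" && mv "$target.tmp" "$target" && echo "updated"
}
```

Usage might look like `install_robots ./robots.new /var/www/html/pub/robots.txt`, followed by the curl check above.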
Four Mistakes That Block AI Bots in Magento
Mistake 1 — Disallow: / left on from a staging environment
Staging environments use Disallow: / to prevent Google indexing. This is frequently copied to production during deployments and never removed.
Where it comes from:
- Magento Admin: Content → Design → Configuration → [Store view] → Search Engine Robots (older releases: Stores → Configuration → General → Design → Search Engine Robots)
- Static file: Manually edited pub/robots.txt from a staging copy
- Deployment scripts: CI/CD pipelines that sync the full staging filesystem to production
- Hosting provider defaults: Managed hosts (Hypernode, Nexcess, Cloudways) sometimes apply restrictive defaults on new environments
# Quick check — if this returns output, you have a problem
curl -s https://yourstore.com/robots.txt | grep -n "^Disallow: /$"
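The grep above shows that a blanket Disallow exists, but not which bots it belongs to. A small awk pass can report the owning group as well. This is a minimal sketch with a hypothetical function name, `blocked_groups`. It treats consecutive User-agent lines as one group and ignores path patterns other than a bare `/`.

```shell
#!/bin/sh
# robots.txt on stdin. For every bare "Disallow: /", prints
# "<line number>: <user agents of the group it sits in>".
blocked_groups() {
    tr -d '\r' | awk '
    {
        sub(/#.*/, "")                        # drop comments
        colon = index($0, ":")
        if (colon == 0) next                  # blank or malformed line
        key = tolower(substr($0, 1, colon - 1))
        val = substr($0, colon + 1)
        gsub(/[ \t]+/, "", key)
        gsub(/^[ \t]+|[ \t]+$/, "", val)
        if (key == "user-agent") {
            if (!in_agents) agents = ""       # first agent of a new group
            in_agents = 1
            agents = (agents == "" ? val : agents ", " val)
        } else if (key == "allow" || key == "disallow") {
            in_agents = 0
            if (key == "disallow" && val == "/") print NR ": " agents
        }
    }'
}
```

Run it as `curl -s https://yourstore.com/robots.txt | blocked_groups`. A result such as `6: *` means the wildcard group blocks everything that has no group of its own.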
Mistake 2 — Wildcard block placed before AI bot rules
# RISKY: depends on every parser selecting the most specific group
User-agent: *
Disallow: /

User-agent: OAI-SearchBot
Allow: /
# RFC 9309 crawlers still apply this group; top-to-bottom parsers may not

# SAFER: explicit AI rules before the wildcard
User-agent: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /checkout/
Crawlers that follow RFC 9309 select the group that names them, wherever it sits, so the Allow: / for OAI-SearchBot in the first layout still applies to them. A parser that reads top to bottom and stops at the first match would never evaluate it. Putting the explicit AI rules first gives both kinds the same answer.
Mistake 3 — Using outdated Anthropic bot names
# Deprecated: these names no longer reflect Anthropic's bot infrastructure
User-agent: Claude-Web # retired 2024
User-agent: Anthropic-AI # retired 2024

# Current names (2026)
User-agent: ClaudeBot # training
User-agent: Claude-SearchBot # live search index
User-agent: Claude-User # user-initiated fetching
Stores that added Anthropic rules in 2024 and haven’t updated since are missing Claude-SearchBot entirely — the bot responsible for Claude’s real-time web search answers.
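Spotting the old names in a long file is easy to automate. The one-liner below is a sketch, assuming the two deprecated names listed above are the ones you care about. `deprecated_agents` is an illustrative function name.

```shell
#!/bin/sh
# robots.txt on stdin. Prints "<line number>:<line>" for every
# User-agent line that uses a deprecated Anthropic name.
# Exit status is non-zero when nothing is found (standard grep behaviour).
deprecated_agents() {
    tr -d '\r' |
    grep -inE '^[[:space:]]*user-agent:[[:space:]]*(claude-web|anthropic-ai)[[:space:]]*(#.*)?$'
}
```

For example: `curl -s https://yourstore.com/robots.txt | deprecated_agents`. Any output is a prompt to add the current names next to the old ones.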
Mistake 4 — robots.txt served as a cached or static file
Some infrastructure configurations bypass Magento completely when serving robots.txt:
- Nginx static rule: serves pub/robots.txt before the request reaches Magento
- CDN cache: Cloudflare or Fastly cache the old file with a long TTL
- Varnish: cached response served without hitting the application
Diagnosis:
# If Magento headers are absent, the file is served statically
curl -I https://yourstore.com/robots.txt
# Look for: X-Magento-Cache-Control, X-Magento-Tags, or similar
Fix for Nginx: ensure this location block is present in your server config:
location = /robots.txt {
try_files $uri $uri/ /index.php$is_args$args;
}
Fix for Cloudflare: purge the cache for /robots.txt via the Cloudflare dashboard, or set a Cache Rule to bypass caching for that path.
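Reading response headers by eye gets tedious across several domains. The helper below condenses a header dump into two lines: whether any X-Magento header is present, and which cache-related headers came back. It is a sketch under assumptions: the header names checked (`age`, `cf-cache-status`, `x-cache`) are common but not exhaustive, some setups strip X-Magento headers, and `robots_serving_hints` is a hypothetical name.

```shell
#!/bin/sh
# HTTP response headers on stdin (for example from curl -sI).
# Prints whether X-Magento-* headers are present and lists
# common cache headers with their values.
robots_serving_hints() {
    tr -d '\r' | awk '
    BEGIN { magento = "no" }
    {
        colon = index($0, ":")
        if (colon == 0) next                  # status line or blank line
        name = tolower(substr($0, 1, colon - 1))
        val = substr($0, colon + 1)
        sub(/^[ \t]+/, "", val)
        if (name ~ /^x-magento-/) magento = "yes"
        if (name == "age" || name == "cf-cache-status" || name == "x-cache")
            cache = cache name "=" val " "
    }
    END {
        sub(/ $/, "", cache)
        print "magento-headers: " magento
        print "cache-headers: " (cache == "" ? "none" : cache)
    }'
}
```

Usage: `curl -sI https://yourstore.com/robots.txt | robots_serving_hints`. A cache HIT with a large age value suggests purging before concluding that an edit did not take effect.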
How to Verify the Fix
Option 1 — Manual curl check
# Check all critical AI search bots have Allow: /
curl -s https://yourstore.com/robots.txt | grep -A1 -E \
  "OAI-SearchBot|Claude-SearchBot|PerplexityBot|Google-Extended"

# Check server logs for AI crawler activity (confirms bots are visiting)
grep -Ei "OAI-SearchBot|Claude-SearchBot|PerplexityBot|Google-Extended" \
  /var/log/nginx/access.log | tail -20
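A per-bot count is often more useful than the last twenty log lines. The sketch below tallies hits by crawler name from an access log on stdin. It assumes the user agent string appears somewhere on each log line, as in the default Nginx combined format, and `ai_bot_hits` is an illustrative name. Google-Extended and Applebot-Extended are left out because their vendors document them as robots.txt control tokens rather than separate crawler user agents, so they are not expected to show up in logs.

```shell
#!/bin/sh
# Access log on stdin. Prints "<count> <bot>" for each AI crawler seen,
# in the order of the list below.
ai_bot_hits() {
    awk -v bots="OAI-SearchBot ChatGPT-User GPTBot Claude-SearchBot Claude-User ClaudeBot PerplexityBot Perplexity-User" '
    BEGIN { n = split(bots, name, " ") }
    {
        line = tolower($0)
        # Count each log line once, for the first bot name it contains.
        for (i = 1; i <= n; i++)
            if (index(line, tolower(name[i]))) { hits[name[i]]++; break }
    }
    END {
        for (i = 1; i <= n; i++)
            if (hits[name[i]]) print hits[name[i]], name[i]
    }'
}
```

Usage: `ai_bot_hits < /var/log/nginx/access.log`. A bot that is allowed in robots.txt but never appears here may be blocked elsewhere, for example by a firewall or bot-protection rule.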
Option 2 — AEO audit module (checks all bots + order validation)
# Install once, run any time
composer require angeo/module-aeo-audit
bin/magento setup:upgrade
bin/magento angeo:aeo:audit
✓ PASS robots.txt — AI Bot Access
OAI-SearchBot ✓ Claude-SearchBot ✓ ChatGPT-User ✓
ClaudeBot ✓ Claude-User ✓
PerplexityBot ✓ Google-Extended ✓
The audit checks not just whether each bot is listed, but whether the order of rules is correct — validating that AI bot entries appear before any wildcard Disallow directives.
Checking multiple stores
# Run per store view for multi-store installations
bin/magento angeo:aeo:audit --store=de
bin/magento angeo:aeo:audit --store=fr
bin/magento angeo:aeo:audit --store=en
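With more than a handful of store views, a loop keeps the audit from being skipped for one of them. This sketch wraps the command shown above. It assumes the audit command exits non-zero when a check fails, which is not confirmed here, and `audit_stores` and the `MAGENTO_BIN` override are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# $@ = store view codes. Runs the audit for each one and
# reports which stores returned a non-zero exit status.
# MAGENTO_BIN can point at a different bin/magento path.
audit_stores() {
    failed=""
    for store in "$@"; do
        echo "== $store =="
        "${MAGENTO_BIN:-bin/magento}" angeo:aeo:audit --store="$store" ||
            failed="$failed $store"
    done
    [ -z "$failed" ] || { echo "failed:$failed" >&2; return 1; }
}
```

Usage: `audit_stores de fr en` from the Magento root directory.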
After robots.txt — What’s Next
robots.txt is the access layer — it gets bots into your store. Once AI crawlers can reach your pages, the signals that determine whether you actually appear in AI product answers are:
- llms.txt — a curated, machine-readable index of your store structure and top products, formatted for LLMs. Install with angeo/module-llms-txt.
- Product JSON-LD schema — structured markup that tells AI exactly what your products cost, what they are, and where to buy them. Checked by angeo/module-aeo-audit.
- AI product feed — structured CSV/JSON feed required for ChatGPT Shopping eligibility. Generate with angeo/module-openai-product-feed.
- FAQPage schema — increases citation probability in conversational AI answers for category and how-to queries.
See all eight AEO signals and your complete score in one command:
bin/magento angeo:aeo:audit