Tutorial · 8 min read

Did Your AEO Changes Actually Work? A Practical Verification Guide

AEO is not set-and-forget. Every change you make to improve AI visibility needs a baseline, a control, and a window to evaluate. Without these three things, you are guessing.


Bryan

You updated your schema markup, rewrote your FAQ page with structured Q&A blocks, and published three new category guides. Three weeks later, your AI visibility score went up. Did your changes cause that? Without a proper verification framework, you cannot know. This guide gives you that framework.

1. The core problem: AI engines change for many reasons

AI engine outputs are not static. Models get updated. New content gets indexed and weighted. Seasonal patterns affect what sources appear in responses. Changes in user behavior influence what the engine treats as authoritative. A competitor publishes new content that displaces yours. All of these forces affect your brand's AI visibility — completely independently of anything you did.

This means that a visibility improvement three weeks after you made a change does not mean your change caused the improvement. And a visibility decline does not mean your change made things worse. Without a structured experiment design, every AEO action is a hypothesis with no test — and the feedback loop that should be teaching you what works is producing noise.

The solution is to treat every AEO action as an experiment. Experiments require three things before they can produce useful results: a baseline, a control group, and a patient evaluation window.

2. The three requirements for verifiable AEO measurement

01

A stable baseline

Before you make any change, run your target prompt set multiple times over at least one week and record the results. You need to know the pre-change visibility state for the specific prompts that your change is intended to affect. A single pre-change measurement is not a baseline — it is a data point. Multiple measurements across different days give you a range that accounts for normal result variance.
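To make this concrete, here is a minimal baseline-collection sketch. The `query_ai_engine` function, the prompt strings, and the file name are all hypothetical placeholders for your own tooling; what matters is the shape: the same prompts, on a schedule, with timestamped results.

```python
# A sketch of a daily baseline run. `query_ai_engine` is a hypothetical
# stand-in for however you query an engine and detect a brand mention;
# swap in your own tooling.
import datetime
import json
import random  # placeholder only, used to fake engine responses below

AFFECTED_PROMPTS = [
    "best invoicing software for freelancers",  # illustrative prompts
    "how does acme invoicing pricing work",
]

def query_ai_engine(prompt: str) -> bool:
    """Hypothetical: True if the engine's answer mentions your brand."""
    return random.random() < 0.4  # fake result for the sketch

def record_baseline_run(prompts: list[str], path: str = "baseline.jsonl") -> None:
    """Run the prompt set once and append timestamped results."""
    ts = datetime.datetime.now(datetime.timezone.utc).isoformat()
    with open(path, "a") as f:
        for prompt in prompts:
            row = {"ts": ts, "prompt": prompt, "mentioned": query_ai_engine(prompt)}
            f.write(json.dumps(row) + "\n")

# Schedule this daily for at least a week before you ship the change.
record_baseline_run(AFFECTED_PROMPTS)
```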

02

Control prompts

Control prompts are queries that your change should NOT affect. If you restructured your pricing page with FAQ schema, the control prompts are queries about topics that page does not cover — unrelated category questions, competitor queries, topic areas outside your change scope. If your visibility improves on control prompts at the same rate as affected prompts, the improvement is likely from an external force (model update, seasonal shift) rather than your change.
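One way to make the control comparison concrete is a simple difference-in-lifts calculation: measure the pre/post change in mention rate for both groups and subtract. The sketch below uses invented placeholder data; it illustrates the arithmetic, not a statistics package.

```python
# Placeholder pre/post mention outcomes pooled across runs and prompts.
def mention_rate(outcomes: list[bool]) -> float:
    """Fraction of runs in which the brand was mentioned."""
    return sum(outcomes) / len(outcomes)

affected_pre  = [True, False, False, True, False, False]  # illustrative data
affected_post = [True, True, False, True, True, False]
control_pre   = [False, True, False, False, True, False]
control_post  = [False, True, False, True, True, False]

affected_lift = mention_rate(affected_post) - mention_rate(affected_pre)
control_lift  = mention_rate(control_post) - mention_rate(control_pre)

# If controls moved about as much as affected prompts, suspect an external
# force (model update, seasonal shift) rather than your change.
attributable = affected_lift - control_lift
print(f"affected {affected_lift:+.2f} | control {control_lift:+.2f} | "
      f"attributable {attributable:+.2f}")
```

Treat the result as directional rather than statistically conclusive; with small prompt sets the noise floor is high, which is exactly why the baseline needs multiple runs.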

03

Patience: an appropriate evaluation window

AI engines do not respond to content changes instantly. Web crawlers need to re-index your changes. Model updates need to incorporate newly indexed content. Source weighting needs to shift. Different types of changes propagate at different speeds, and the most important discipline is patience: check too early and you will draw false conclusions from incomplete data.

3. How fast do different changes propagate?

Not all AEO changes move at the same speed. Understanding the rough propagation pattern for different change types helps you set realistic evaluation windows and avoid premature conclusions.

| Change Type | Propagation Pattern | Recommended Check-In |
| --- | --- | --- |
| Schema / structured data (FAQPage, Organization) | Fastest — picked up at next crawl cycle | Check after 2–3 weeks |
| Page restructuring (headers, Q&A blocks, tables) | Moderate — depends on crawl frequency and re-indexing | Check after 3–5 weeks |
| New page creation (FAQ page, comparison page) | Moderate-to-slow — new pages need crawling + authority signals | Check after 4–6 weeks |
| Off-site entity normalization (Wikidata, LinkedIn) | Slow — knowledge graph updates have variable propagation | Check after 6–10 weeks |
| Third-party coverage (reviews, editorial, Reddit) | Slow — requires external indexing and weighting shift | Check after 6–12 weeks |

These are approximate ranges, not guarantees. Engine-specific factors, your domain's existing crawl frequency, and whether a model update happens to coincide with your check-in window all affect actual propagation time. The table is a planning guide, not a precise timeline.
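If you want the table as a planning tool rather than a reference, it encodes naturally as a lookup. A small sketch follows; the week ranges come straight from the table above, while the shorthand change-type keys are invented for the example.

```python
# The table above, encoded as a lookup. Week ranges are taken from the
# table; adjust them to your domain's observed crawl frequency.
from datetime import date, timedelta

CHECK_IN_WEEKS = {
    "schema": (2, 3),
    "page_restructure": (3, 5),
    "new_page": (4, 6),
    "entity_normalization": (6, 10),
    "third_party_coverage": (6, 12),
}

def check_in_window(change_type: str, change_date: date) -> tuple[date, date]:
    """Earliest and latest recommended check-in dates for a change."""
    lo, hi = CHECK_IN_WEEKS[change_type]
    return change_date + timedelta(weeks=lo), change_date + timedelta(weeks=hi)

earliest, latest = check_in_window("schema", date(2025, 3, 1))
print(f"Check results between {earliest} and {latest}")
```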

4. A template for a verifiable AEO experiment

Write this down before you make any change. Keep it in a shared doc so you can refer back to it when you check results.

AEO Experiment Record

Change made: [Specific description of the change — what page, what was changed, what was added]
Date of change: [Date]
Expected mechanism: [Why do you expect this to affect visibility? What signal does this change send to AI engines?]
Affected prompts: [List of 3–5 specific prompts that should respond to this change]
Control prompts: [List of 3–5 prompts that should NOT respond to this change]
Metric to measure: [Mention rate? Citation rate? Position? Sentiment?]
Baseline (pre-change): [Record baseline values for both affected and control prompts]
Check-in date: [Set based on change type from the table above]
Success criterion: [What specific improvement would you consider a confirmed success?]
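If your team prefers structured records to free-form docs, the template translates directly into a small data structure. Below is a sketch of one possible shape, with invented field names and a fully hypothetical example experiment filled in.

```python
# The experiment record as a dataclass. Field names and the example values
# (the "acme" pricing page) are invented for illustration.
from dataclasses import dataclass
from datetime import date

@dataclass
class AEOExperiment:
    change_made: str
    date_of_change: date
    expected_mechanism: str
    affected_prompts: list[str]
    control_prompts: list[str]
    metric: str                  # e.g. "mention rate"
    baseline: dict[str, float]   # pre-change value per prompt group
    check_in_date: date
    success_criterion: str

experiment = AEOExperiment(
    change_made="Added FAQPage schema to /pricing",
    date_of_change=date(2025, 3, 1),
    expected_mechanism="Structured Q&A is easier for engines to extract and cite",
    affected_prompts=["how much does acme cost", "acme pricing tiers"],
    control_prompts=["best crm for small teams", "acme vs competitor reviews"],
    metric="mention rate",
    baseline={"affected": 0.20, "control": 0.35},
    check_in_date=date(2025, 3, 22),  # ~3 weeks later, per the schema row
    success_criterion="Affected mention rate >= 0.35 with flat controls",
)
```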

5. Common mistakes that invalidate AEO verification

Single-shot measurement

Checking once is not a measurement; it is a single sample. AI engines have natural run-to-run variance in their outputs. Check your affected prompts at least 3 times over your evaluation window and average the results.
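Averaging is trivial, but doing it explicitly keeps one lucky run from masquerading as a trend. A minimal sketch with placeholder numbers:

```python
# Mean and spread over repeated check-in runs (placeholder values).
from statistics import mean, stdev

runs = [0.30, 0.45, 0.35]  # mention rate measured on three separate days

print(f"mean mention rate: {mean(runs):.2f}")   # the number to compare
print(f"run-to-run spread: {stdev(runs):.2f}")  # high spread = noisy signal
```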

Checking too soon

If you make a schema change and check results three days later, you are almost certainly looking at noise. Changes need time to propagate through crawl cycles and model updates. Patience is part of the methodology.

Changing multiple things at once

If you restructure your FAQ page, add schema markup, and publish two new guides in the same week, you cannot attribute a subsequent visibility change to any specific action. Sequence your changes if you want clean signal about what works.

No control group

Without control prompts, any change in your visibility score — up or down — could be a model update. Control prompts are the mechanism for separating your changes from background noise.

Wrong prompts for the change type

If you restructured your pricing page and then check visibility on category discovery queries, you are measuring the wrong thing. Match your measurement prompts to the query intent that your specific change is designed to affect.

Measure What Changes

Track the impact of every AEO action in Citany

Citany's experiment tracking records your baseline, your change date, and your results — so you can see exactly which actions moved your AI visibility and which ones did not.
