On March 16, I typed “best tools to monitor brand visibility in AI search” into ChatGPT. Profound showed up. Otterly showed up. Brand24 showed up. Citany — the tool I'd been building to solve exactly this problem — didn't appear once.
That was Day 2 of a 90-day experiment I'm running publicly: can a brand go from zero AI mentions to consistently cited across multiple engines? The answer, I'm finding, depends almost entirely on whether you're actually measuring the right thing. Most brands aren't.
Here's what I've learned about monitoring brand mentions in ChatGPT — from doing it manually before building the automated version.
First: ChatGPT traffic doesn't show up in your analytics. At all.
When someone asks ChatGPT about your category and clicks through to your site, Google Analytics records it as direct traffic. Not a referral from chatgpt.com (formerly chat.openai.com). Direct. It looks identical to someone who typed your URL from memory.
ChatGPT doesn't pass referrer headers the way search engines do. So your analytics dashboard isn't broken — it's just blind to this channel. The consequence: AI-influenced visits are already happening to brands in your space, and nobody can see the source in their dashboards.
The number that broke my brain when I first saw it
Google search results and ChatGPT citations overlap by only 8%. Ranking #1 on Google tells you almost nothing about whether ChatGPT mentions you. They're two separate lists, curated by different signals. This is why brands with solid SEO are often completely invisible in AI search.
What “monitoring” actually means here
Most people hear “monitor brand mentions in ChatGPT” and think: search for my brand name, see if it comes up. That's not wrong, but it only covers a tiny slice of what matters.
The real signal is what ChatGPT says when your potential customers ask category questions — without your brand name in the prompt. “What are the best tools for X?” “How do I solve Y?” “Compare A vs B.” These are the queries where buyers are forming opinions, and where your brand either shows up or doesn't.
Monitoring means running those prompts consistently — same questions, same cadence, every week — and tracking whether you appear, where you rank in the response, how you're framed, and which sources get cited. One check tells you almost nothing. Eight weeks of weekly data tells you whether you're gaining or losing ground.
How to do it manually (this is what we did for the first two weeks)
Before building any automated tooling, I ran this by hand. It's tedious but it works, and doing it manually for a few weeks gives you intuitions you won't get any other way. Here's exactly how:
Build a prompt list around buyer intent, not your brand name
Write 15–20 questions that real buyers in your category would actually ask. Category discovery ("best tools for X"), comparison ("X vs Y"), problem-aware ("how do I fix X?"). Don't include your brand name in the prompts — those only measure what ChatGPT thinks of you once someone already knows you exist. The high-value signal is whether you show up when your brand has to earn its place.
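If you'd rather keep the prompt list in code than in a doc, here's a minimal sketch of how it could look. The prompt wording, the IDs, and the helper names are all made up for illustration; the only real rules it encodes are the three intent types and the "no brand name in the prompt" check.

```python
from dataclasses import dataclass

# Hypothetical brand and prompts, for illustration only.
BRAND = "Citany"


@dataclass(frozen=True)
class Prompt:
    prompt_id: str  # stable ID, so the wording can stay locked week to week
    intent: str     # "discovery", "comparison", or "problem"
    text: str


PROMPTS = [
    Prompt("d01", "discovery",
           "What are the best tools to monitor brand visibility in AI search?"),
    Prompt("c01", "comparison",
           "Profound vs Otterly: which is better for tracking AI mentions?"),
    Prompt("p01", "problem",
           "How do I find out whether ChatGPT recommends my product?"),
]


def branded_prompts(prompts, brand):
    """Return the prompts that contain the brand name and should be reworded."""
    return [p for p in prompts if brand.lower() in p.text.lower()]


def intent_counts(prompts):
    """Count prompts per intent type, to check the list is balanced."""
    counts = {}
    for p in prompts:
        counts[p.intent] = counts.get(p.intent, 0) + 1
    return counts
```

Running branded_prompts(PROMPTS, BRAND) before each weekly session is a cheap way to catch a vanity prompt that slipped into the list.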
New session for every single prompt — no exceptions
Always start a fresh conversation for each prompt. ChatGPT uses context from earlier in a thread, so running 10 prompts in one session gives you contaminated results. New conversation every time. Use the same model every week (e.g. GPT-4o) and note it in your sheet. Private browsing helps avoid personalization, though how much it matters is debated.
Record five things per run
For each prompt, capture: (1) did your brand appear, (2) what position — 1st brand mentioned, 2nd, etc., (3) which competitors appeared and where, (4) any URLs cited as sources, (5) the exact framing — positive, neutral, or qualified with a caveat. Paste the full response too. Phrasing changes over time and it's useful to track.
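The five fields map cleanly onto one spreadsheet row per run. Here's one possible layout as a sketch, not the exact sheet we used: the column names, the "name:position" encoding for competitors, and the semicolon-separated URL field are my assumptions.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields
from typing import Optional


@dataclass
class RunRecord:
    run_date: str                   # ISO date of the run (placeholder values below)
    engine: str                     # e.g. "chatgpt"
    model: str                      # e.g. "gpt-4o"
    prompt_id: str                  # ID of the locked prompt wording
    brand_mentioned: bool           # (1) did your brand appear
    brand_position: Optional[int]   # (2) 1 = first brand named, None = absent
    competitors: str                # (3) assumed encoding: "Profound:1;Otterly:2"
    cited_urls: str                 # (4) semicolon-separated source URLs
    framing: str                    # (5) "positive", "neutral", or "caveat"
    full_response: str              # the pasted response text


def to_csv(records):
    """Serialise run records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(RunRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buf.getvalue()


# Hypothetical example row.
example = RunRecord(
    run_date="2025-01-06", engine="chatgpt", model="gpt-4o", prompt_id="d01",
    brand_mentioned=False, brand_position=None,
    competitors="Profound:1;Otterly:2", cited_urls="",
    framing="neutral", full_response="(pasted response text)",
)
print(to_csv([example]))
```

Keeping the prompt ID rather than the prompt text in each row makes it harder to change the wording by accident.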
Same prompts, every week, without changing the wording
The whole value of this is the time series. Change a prompt mid-run and you break the comparison. Run the same 15–20 prompts every week. Watch for: mention rate going up, moving from 4th to 2nd position on category prompts, cited source URLs starting to include your content.
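To make the time-series point concrete, here's a small sketch that turns per-run rows into a weekly mention rate and flags weeks whose prompt set doesn't match the first week's. The tuple layout (week label, prompt ID, mentioned flag) and the function names are assumptions for illustration.

```python
from collections import defaultdict


def weekly_mention_rate(rows):
    """rows: iterable of (week, prompt_id, brand_mentioned) tuples.

    Returns {week: share of runs that week in which the brand appeared}.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for week, _prompt_id, mentioned in rows:
        totals[week] += 1
        hits[week] += int(mentioned)
    return {week: hits[week] / totals[week] for week in sorted(totals)}


def comparable_weeks(rows):
    """Weeks that ran exactly the same prompt IDs as the first week.

    Weeks with added, dropped, or renamed prompts are excluded, since they
    are not a like-for-like comparison.
    """
    prompts_by_week = defaultdict(set)
    for week, prompt_id, _mentioned in rows:
        prompts_by_week[week].add(prompt_id)
    if not prompts_by_week:
        return []
    weeks = sorted(prompts_by_week)
    baseline = prompts_by_week[weeks[0]]
    return [w for w in weeks if prompts_by_week[w] == baseline]
```

The week labels only need to sort chronologically as strings, so something like "2025-W01" works.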
Pay attention to what sources ChatGPT cites for competitors
When a competitor shows up ahead of you, look at what ChatGPT cites — G2 review pages, comparison articles, their own documentation, Reddit threads. These are the actual evidence sources. If a competitor ranks first because ChatGPT is pulling from a well-structured comparison page they have and you don't, that's an actionable gap. Building that page is more effective than trying to "optimize for ChatGPT" directly.
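Once you've pasted cited URLs into your sheet for a few weeks, a quick tally by domain shows which evidence sources keep coming up. This is a sketch using only the Python standard library; the URLs in the usage example are placeholders, not real citations.

```python
from collections import Counter
from urllib.parse import urlparse


def cited_domains(urls):
    """Count how often each domain appears among cited source URLs."""
    counts = Counter()
    for url in urls:
        host = urlparse(url).hostname or ""
        if host.startswith("www."):
            host = host[4:]  # treat www.g2.com and g2.com as the same source
        if host:
            counts[host] += 1
    return counts


# Placeholder URLs for illustration.
urls = [
    "https://www.g2.com/products/example/reviews",
    "https://g2.com/compare/example-a-vs-example-b",
    "https://www.reddit.com/r/example/comments/abc",
    "https://docs.example.com/guide",
]
for domain, count in cited_domains(urls).most_common():
    print(domain, count)
```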
Where manual monitoring stops working
We hit the wall around week three. Here's what broke:
- ChatGPT is one of eight engines. Perplexity, Gemini, DeepSeek, Claude, Grok — and for any brand with Asian market exposure, Kimi and Doubao — all have different response patterns and pull from different source ecosystems. Manually monitoring ChatGPT while the others go unchecked means you're missing most of the picture. We actually found Citany appearing in DeepSeek before it appeared in ChatGPT.
- Responses aren't deterministic. Run the same prompt twice and you may get different answers. A single run per prompt doesn't tell you whether an appearance is consistent or occasional. You need multiple runs — or a lot of weekly data — to estimate a stable mention rate.
- It takes 3–4 hours a week. A thorough session across 20 prompts is 45–60 minutes per engine. At weekly cadence across four engines, that's most of a morning just for data collection, before you've done anything with it.
- Spreadsheets break the moment you skip a week. Miss one week or change a prompt wording and your time series is broken. You lose the ability to see month-over-month trends reliably. This happened to us on Day 12 and set back the analysis by two weeks.
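On the non-determinism point: one way to see how little a single run tells you is to put an interval around the mention rate. The sketch below uses the Wilson score interval, which is a standard choice for proportions from small samples; it is my suggestion, not something our spreadsheet did, and it assumes the runs are independent.

```python
import math


def wilson_interval(hits, runs, z=1.96):
    """Approximate 95% Wilson score interval for a mention rate.

    hits: runs in which the brand appeared; runs: total runs of the prompt.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    p = hits / runs
    denom = 1 + z**2 / runs
    centre = (p + z**2 / (2 * runs)) / denom
    half_width = z * math.sqrt(p * (1 - p) / runs + z**2 / (4 * runs**2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# One appearance in one run vs. six appearances in ten runs.
print(wilson_interval(1, 1))
print(wilson_interval(6, 10))
```

A single run where you appear is consistent with a true mention rate anywhere from roughly one in five to always, which is why the interval only becomes useful after repeated runs.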
What automated monitoring actually does differently
Automated monitoring isn't magic — it's the same prompts, run the same way, except the wording is locked, the cadence is consistent, and the data accumulates without you having to run each session manually. What you get that you can't easily get manually:
- Weekly mention rate per engine, per prompt cluster, without the time overhead
- Competitor mention rate in the same run: share of voice, not just your absolute number
- Source URL tracking: which pages get cited when your brand or competitors appear
- Cross-engine comparison: same prompt across ChatGPT, Perplexity, DeepSeek, and others in one workflow
- Alerts when something changes: mention rate drops, a competitor gains ground, a new caveat appears
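Two of these are simple enough to sketch. Share of voice is your mentions divided by everyone's mentions in the same run, and a basic alert is a week-over-week comparison against a threshold. The function names, the counts, and the 0.15 threshold below are illustrative assumptions, not how any particular tool computes them.

```python
def share_of_voice(mention_counts):
    """mention_counts: {brand: number of responses that mentioned it}.

    Returns each brand's share of all brand mentions in the run.
    """
    total = sum(mention_counts.values())
    if total == 0:
        return {brand: 0.0 for brand in mention_counts}
    return {brand: count / total for brand, count in mention_counts.items()}


def rate_drop_alerts(previous, current, threshold=0.15):
    """Engines whose mention rate fell by at least `threshold` (absolute).

    previous / current: {engine: mention rate between 0 and 1}.
    The default threshold is an arbitrary starting point, not a recommendation.
    """
    alerts = []
    for engine, prev_rate in previous.items():
        cur_rate = current.get(engine)
        if cur_rate is not None and prev_rate - cur_rate >= threshold:
            alerts.append(engine)
    return sorted(alerts)


# Hypothetical counts and rates.
print(share_of_voice({"Profound": 6, "Otterly": 3, "Citany": 1}))
print(rate_drop_alerts({"chatgpt": 0.40, "perplexity": 0.30},
                       {"chatgpt": 0.20, "perplexity": 0.25}))
```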
One thing worth being clear about
Prompts should never be engineered to force a mention. “Why is [Your Brand] the best tool for X?” is not a monitoring prompt — it's a vanity check. Use neutral buyer intent: “What are the best tools for monitoring brand visibility in AI search?” The goal is measuring your natural position, not designing prompts that make you look good.
What to actually do with the data
Monitoring without action is expensive observation. The data points to specific gaps with specific fixes. Here are the four patterns that come up most often:
You're not mentioned in category discovery prompts at all
Look at what sources ChatGPT cites for the brands that do appear. G2 profiles, comparison pages, structured FAQ content — find the asset type you're missing and build it. Being invisible here usually traces to a gap in third-party evidence, not a gap in your product.
You're mentioned but ranked 4th or 5th
Position in AI answers correlates with volume and quality of third-party mentions from authoritative sources. Brands cited first usually have more review coverage, more inbound links from trusted sites, clearer positioning. Closing a rank gap means increasing your citation surface — not tweaking your homepage copy.
You're mentioned but with a qualification or caveat
"[Your Brand] is good for X but not great for Y" — that framing came from somewhere specific. It's usually traceable to a particular review, forum thread, or article. Find the source, address the underlying perception. The AI is reporting what it's seen written about you.
Competitors get cited by URL, you don't
URL citations mean deep source trust. The cited page is usually structured, authoritative, and directly addresses the query. Find what page type the competitor has that you lack — a comparison guide, detailed FAQ, category landing page — then build the equivalent.
Questions I get asked a lot
Does ChatGPT update as new content is published?
Yes, but slowly. The browsing-enabled version can pull from live sources, but ChatGPT's core training data has a cutoff and updates periodically. Treat responses as a mix of recent web content and older training data. Changes you make to your content may take weeks to register — don't expect immediate feedback loops.
Is monitoring ChatGPT enough, or do I need other engines too?
ChatGPT is the largest, but Perplexity is heavily used for research queries and cites sources explicitly. Gemini is integrated with Google search. DeepSeek has significant global traction. For any brand with Asian market exposure, Kimi and Doubao have completely different source ecosystems. My rough guess is that monitoring only ChatGPT gives you maybe 30% of the picture.
How many prompts do I actually need?
Start with 15–20. Cover three types: category discovery ("best tools for X"), comparison ("X vs Y"), and problem-aware ("how do I solve X?"). That covers the main query patterns for most B2B and B2C categories. Expand once you have a baseline — don't try to track 100 prompts before you know what matters.
Can I do this without paying for any tools?
Yes. The manual method above requires only a ChatGPT account and a spreadsheet. The tradeoff is time and consistency: expect roughly 45–60 minutes per engine each week, so about 2–3 hours if you're tracking three engines. Automated tools make sense once you've confirmed AI monitoring matters for your category and you need weekly data without the overhead.
What's the difference between a mention and a citation?
A mention means ChatGPT named your brand in its response. A citation means it included a URL to a specific page as a source. Citations are harder to earn and more valuable — they mean ChatGPT trusts a specific piece of your content enough to surface it as evidence. Most brands get mentioned before they get cited. Being cited requires page-level authority, not just brand awareness.
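If you're labelling saved responses in bulk, the distinction can be roughly automated: a citation is a URL on your domain, a mention is your name anywhere in the text. This is a sketch under assumptions. The domain citany.example is a placeholder, and simple name matching will miss misspellings or abbreviations.

```python
import re
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")


def classify(response_text, brand, brand_domain):
    """Label a response as "citation", "mention", or "absent" for one brand."""
    for url in URL_PATTERN.findall(response_text):
        # Drop sentence punctuation that the pattern may have swallowed.
        host = (urlparse(url.rstrip(".,;")).hostname or "").lower()
        if host == brand_domain or host.endswith("." + brand_domain):
            return "citation"
    name = re.compile(rf"\b{re.escape(brand)}\b", re.IGNORECASE)
    if name.search(response_text):
        return "mention"
    return "absent"


# Hypothetical responses; citany.example is a placeholder domain.
print(classify("Popular options include Profound, Otterly and Citany.",
               "Citany", "citany.example"))
print(classify("See https://www.citany.example/guide for details.",
               "Citany", "citany.example"))
```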
Want to see what ChatGPT actually says about your brand right now?
Submit your brand and up to three competitors. We run a 3-engine baseline using real category and comparison prompts, then send you a structured report within 2 business days — free, no trial, no card required.