Visibility metrics that actually work
Ranking by 'mention count' is naive. A paragraph that says your name three times in passing is worse than one that names you once as the recommendation. We compute prominence based on first-mention position normalized to answer length, plus a sentiment-weighted score for the surrounding sentence.
Pull these as raw numbers per provider per query. Roll them up however you like: by day, by competitor cohort, by intent category. The API gives you the atoms; you compose the metrics that matter to your customers.
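For example, a roll-up of mention rate by provider and intent category might look like the sketch below. Only `mentions[]` is an extracted atom from the response; the `provider` and `intent` labels stand in for metadata you attach to each prompt yourself, and the row shape is illustrative rather than the documented schema.

```typescript
// One normalized answer plus the metadata you attach when you queue the prompt.
// Only mentions[] comes from the API; the provider/intent labels are your own.
interface AnswerRow {
  provider: string;               // e.g. "openai", "anthropic"
  intent: string;                 // your own tag: "comparison", "how-to", ...
  mentions: { brand: string }[];  // extracted mention atoms
}

// Mention rate per (provider, intent) bucket: share of answers naming the brand.
function mentionRateByIntent(rows: AnswerRow[], brand: string) {
  const buckets = new Map<string, { hits: number; total: number }>();
  for (const row of rows) {
    const key = `${row.provider}:${row.intent}`;
    const bucket = buckets.get(key) ?? { hits: 0, total: 0 };
    bucket.total += 1;
    if (row.mentions.some((m) => m.brand === brand)) bucket.hits += 1;
    buckets.set(key, bucket);
  }
  return [...buckets].map(([key, b]) => ({ key, mentionRate: b.hits / b.total }));
}
```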
We also return a per-provider `visibility_score` (0-100) as a starting point: a weighted blend of presence, prominence, and sentiment, computed with documented weights. Most tool builders override the weights to match their customers' priorities; the raw inputs are still in the response, so swapping in your own scoring is a single function on your side, not a re-query.
Track visibility over time without re-querying
We store every raw answer for 90 days by default. That means you can add a new competitor to your `track_brands` list and re-run extraction over historical data without making a single new LLM call. Backfill a competitor's share-of-voice for the last quarter in a few seconds.
Need indefinite retention for trend reports, longitudinal analyses, and the occasional 'when did Claude start citing us?' debugging session? Email [email protected]. We'll set you up with an extended-retention arrangement.
The retention store is the longitudinal asset. Tools that started measuring AI visibility in early 2026 will have year-over-year baselines while most agencies are still figuring out what to track. The API exposes `GET /v1/ask/:id` for replay and `GET /v1/monitors/:id/runs` for time-series pulls. Both return the original normalized response, so the chart you draw today will draw the same way next year.
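A minimal backfill sketch against that runs endpoint, assuming the new competitor is already in `track_brands` so re-extraction has populated its mentions in the stored runs. The endpoint path is the one above; the `since` parameter, the response field names, and the base URL are illustrative, so check the API reference for the exact schema.

```typescript
// Pull a quarter of stored runs and compute daily share-of-voice for a
// newly tracked competitor -- no new LLM calls, just the retention store.
// NOTE: base URL, `since` parameter, and response shape are illustrative.
const API = "https://api.example.com"; // placeholder base URL
const headers = { Authorization: `Bearer ${process.env.API_KEY}` };

async function backfillShareOfVoice(monitorId: string, brand: string) {
  const res = await fetch(`${API}/v1/monitors/${monitorId}/runs?since=2025-10-01`, { headers });
  const { runs } = (await res.json()) as {
    runs: { started_at: string; answers: { mentions: { brand: string }[] }[] }[];
  };

  const byDay = new Map<string, { withBrand: number; total: number }>();
  for (const run of runs) {
    const day = run.started_at.slice(0, 10); // YYYY-MM-DD
    const bucket = byDay.get(day) ?? { withBrand: 0, total: 0 };
    for (const answer of run.answers) {
      bucket.total += 1;
      if (answer.mentions.some((m) => m.brand === brand)) bucket.withBrand += 1;
    }
    byDay.set(day, bucket);
  }
  // Share-of-voice here = fraction of answers that mention the brand at all.
  return [...byDay].map(([day, b]) => ({ day, sov: b.withBrand / b.total }));
}
```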
Citation rate is its own KPI
Even when an AI doesn't name you, it might cite a page on your site as a source. That's a different funnel than mentions: clicks come from citations, brand awareness comes from mentions. We split them out so you can optimize for both.
Practically: a B2B brand might rank well on mention-rate (the model recommends them) but poorly on citation-rate (the model never links to their docs). The remediation is different: better brand positioning lifts mention-rate; better technical content lifts citation-rate. Splitting the metrics lets your customers see which lever to pull.
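Splitting the two rates over a batch of normalized answers is a few lines. The `mentions[]` array follows the schema described below; the `citations[]` field name and its `url` property are illustrative, so check the response reference for the exact shape.

```typescript
interface NormalizedAnswer {
  mentions: { brand: string }[];
  citations: { url: string }[]; // citation field name is illustrative
}

// Mention-rate: share of answers that name the brand.
// Citation-rate: share of answers that cite a page on the brand's domain.
function mentionVsCitationRate(answers: NormalizedAnswer[], brand: string, domain: string) {
  if (answers.length === 0) return { mentionRate: 0, citationRate: 0 };
  const mentioned = answers.filter((a) => a.mentions.some((m) => m.brand === brand)).length;
  const cited = answers.filter((a) => a.citations.some((c) => c.url.includes(domain))).length;
  return {
    mentionRate: mentioned / answers.length,
    citationRate: cited / answers.length,
  };
}
```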
How visibility scoring works under the hood
For each `mentions[]` entry, we compute three numbers: `position_norm` (the character offset of the first mention divided by the total answer length, expressed as 0-1 where 0 is start and 1 is end), `sentence_index` (which sentence contains the mention, 1-indexed), and `sentiment` (categorical: `positive`, `neutral`, `negative`). The sentiment classifier runs on the sentence containing the mention, not the full answer, so a brand that's praised in the intro but mocked in the conclusion gets per-mention sentiment instead of an average.
The default `visibility_score` formula is `0.4 × presence + 0.4 × (1 - position_norm) + 0.2 × sentiment_weight`, where `sentiment_weight` is +1 / 0 / -1 mapped onto a 0-1 range. That's a deliberate starting point. Most GEO tools override it to weight prominence higher (because first-mention rate is what their customers actually care about) or to weight sentiment higher (because B2B prompts often mention competitors neutrally and what matters is positive framing). Both flavors are a one-line change since we return the raw inputs.
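If you want to reproduce the default score client-side, a sketch (field names as above; the +1 / 0 / -1 sentiment values map to 1 / 0.5 / 0 on the 0-1 range):

```typescript
type Sentiment = "positive" | "neutral" | "negative";

interface MentionEntry {
  position_norm: number;  // 0 = answer start, 1 = answer end
  sentence_index: number; // 1-indexed sentence containing the mention
  sentiment: Sentiment;   // classified on the mention's sentence
}

// Default blend: 0.4 * presence + 0.4 * (1 - position_norm) + 0.2 * sentiment_weight,
// scaled to 0-100. sentiment_weight maps +1 / 0 / -1 onto 1 / 0.5 / 0.
function defaultVisibilityScore(mentions: MentionEntry[]): number {
  if (mentions.length === 0) return 0; // no mention: presence term is zero
  const presence = 1;
  const first = mentions.reduce((a, b) => (a.position_norm < b.position_norm ? a : b));
  const sentimentWeight = { positive: 1, neutral: 0.5, negative: 0 }[first.sentiment];
  return 100 * (0.4 * presence + 0.4 * (1 - first.position_norm) + 0.2 * sentimentWeight);
}
```

The prominence-heavy and sentiment-heavy flavors are this same function with different constants.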
For confidence intervals on aggregate metrics ('is our mention rate 22% across this prompt set, or 28%?'), the methodology page documents how we compute Wilson 95% confidence intervals over the rolling sample. If you're shipping numbers to a customer's executive deck, the CI bounds are the difference between 'data' and 'a number to argue about'.
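For reference, the textbook Wilson interval for a proportion looks like the sketch below; the methodology page remains the source of truth for exactly how we apply it to the rolling sample.

```typescript
// 95% Wilson score interval for a proportion (e.g. mentioned answers / total answers).
// z = 1.96 for the 95% level; hits and total come from your own aggregation.
function wilson95(hits: number, total: number): { low: number; high: number } {
  if (total === 0) return { low: 0, high: 0 };
  const z = 1.96;
  const p = hits / total;
  const denom = 1 + (z * z) / total;
  const center = (p + (z * z) / (2 * total)) / denom;
  const margin =
    (z * Math.sqrt((p * (1 - p)) / total + (z * z) / (4 * total * total))) / denom;
  return { low: Math.max(0, center - margin), high: Math.min(1, center + margin) };
}

// 22 mentions in 100 answers -> roughly [0.15, 0.31]
console.log(wilson95(22, 100));
```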
Latency profile for visibility calls: cached `mode: quick` returns p50 ~140 ms / p99 ~600 ms; uncached `mode: quick` p50 ~3.8 s / p99 ~7.4 s. The classifier inference step adds ~30-60 ms per provider, mostly amortized in the cache. For dashboard refreshes that need to land in under a second, design for cache hits. Repeat queries against your prompt list will mostly cache after the first run.
When a visibility API fits (and when it doesn't)
Use it when you're shipping a tool that compares brand presence over time across providers: GEO dashboards, AEO reports, agency client portals, internal CMO dashboards. The combination of normalized atoms (presence, prominence, sentiment, citation rate), 90-day raw-answer retention, and re-extraction without re-querying is the pattern that makes longitudinal reporting viable without exploding your LLM bill.
Use it specifically when your customers will ask 'what was our mention rate last month?' and 'how did that change after our content launch?' Both questions need a longitudinal store; both are expensive to back-build if you didn't capture data at the time. Starting on the API now compounds.
Skip it for one-shot reports where you only need a snapshot. A consultant running a single 'how does AI describe our category?' analysis can buy a one-time report from a third party for less than the cost of wiring up a monitoring loop. Skip it also if your reporting needs to match exactly what users see on chatgpt.com. Our API returns API-mode data; it diverges from the live UI on roughly 80-96% of prompts. Pair with `mode: perplexity_live` ($0.25) on the prompts where parity actually matters, or use the delta-report tool to characterize the gap.