Methodology

How we measure what real users see

We sell numbers. Numbers without methodology are vibes. This page documents how each mode runs, the sample sizes we use, the confidence intervals we compute, and the per-platform reliability we promise. If you're integrating MentionsAPI into a tool you sell to other people, this is the page you point your customers at.

The shippable modes

What each /v1/check mode actually does

Eight modes ship today, plus one deprecated alias. Two more are on the Q3 2026 roadmap. We list all of them honestly.

| Mode | What it does | Status | Cost | Latency p95 |
| --- | --- | --- | --- | --- |
| `quick` | Official LLM APIs (ChatGPT, Claude, Gemini, Perplexity Sonar) in parallel. Returns brand mention + rank + the citations each API surfaces. Headline product. | Shippable | $0.02 | <5s |
| `perplexity_live` | Live UI scrape of perplexity.ai via our dedicated browser-based scraping infrastructure. Returns the answer real users see, plus 5–10 inline citations and 3–5 'related queries' fan_out items. | Shippable | $0.25 | <60s |
| `chatgpt_live` | Live UI scrape of chatgpt.com. Returns answer text, source citations, fan_out sub-queries (the queries ChatGPT issued during web search), and brand entities. | Shippable | $0.10 | <30s |
| `gemini_live` | Live UI scrape of gemini.google.com. Returns markdown answer with inline citations and items. | Shippable | $0.10 | <30s |
| `ai_overview` | Google's AI Overviews block extracted from the standard SERP. Returns the AI summary, every cited reference URL, and the citation graph. | Shippable | $0.05 | <5s |
| `ai_mode` | Google's dedicated AI Mode chat-style search. Returns markdown answer + citations + items + tables + shopping cards. | Shippable | $0.10 | <8s |
| `bing_copilot` | Bing Copilot's AI overview block from the Bing SERP. Returns Copilot summary + every reference + images. | Shippable | $0.05 | <5s |
| `all_live` | Composed bundle: fans out perplexity_live + chatgpt_live + gemini_live + ai_overview + ai_mode + bing_copilot in parallel. The full ground-truth picture in a single call. Partial-success refund applied automatically. | Shippable | $0.50 | <60s |
| `realistic` | DEPRECATED alias for perplexity_live. Same single-Perplexity scrape, same price. Kept for back-compat with early-access integrations. | Alias | $0.25 | <60s |
| `deep` | ROADMAP. Multi-run variance + Wilson CI95 + API-vs-UI delta. Returns 501 today. | Q3 2026 | — | — |
| `change_track` | ROADMAP. Scheduled brand-rank watches with diff. Returns 501 today; use /v1/watch for current change-tracking via webhooks. | Q3 2026 | — | — |

Two infrastructure layers sit underneath every mode. `quick` calls the official APIs of OpenAI (ChatGPT), Anthropic (Claude), Google (Gemini), and Perplexity in parallel — fast, cheap, normalized to a unified response shape. The `*_live` modes, `ai_overview`, `ai_mode`, and `bing_copilot` use our dedicated browser-based scraping infrastructure that captures the real user-facing rendering — including fan-out queries, citation graphs, and brand entities that the official APIs don't expose. Single API surface. Single wallet. Single MCP integration.
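To make the parallel fan-out concrete, here is a minimal sketch of the pattern, not MentionsAPI's actual implementation: the provider call functions and the result shape are hypothetical, but the key point holds, one slow or failed provider never blocks the others, and failed slots stay visible for partial-success accounting.

```javascript
// Hypothetical quick-mode fan-out. `providers` maps a provider name to an
// async function that returns { text, citations }.
async function quickCheck(query, providers) {
  const names = Object.keys(providers);
  // allSettled: one rejected provider does not fail the whole call
  const settled = await Promise.allSettled(
    names.map((name) => providers[name](query))
  );
  return names.map((name, i) => {
    const r = settled[i];
    return r.status === 'fulfilled'
      ? { provider: name, ok: true, text: r.value.text, citations: r.value.citations || [] }
      : { provider: name, ok: false, error: String(r.reason) }; // eligible for partial refund
  });
}
```

A composed bundle like `all_live` can be built the same way, with each scrape surface plugged in as one provider function.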

Statistical methodology

Confidence intervals or it didn't happen

When a deep-mode run says the mention rate is 60%, here's what we actually mean by that number.

Wilson 95% confidence interval on mention rate

Mention rate = (runs that mentioned the brand) / (total successful runs). We report Wilson CI95 alongside the point estimate. Wilson is the right interval for binary proportions on small samples — 5 runs is enough to compute it, but you should treat the bounds as wide until you push past 20 runs.
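The Wilson interval is straightforward to compute. A minimal sketch, assuming k mentions out of n successful runs (illustrative code, not the production implementation):

```javascript
// Wilson 95% confidence interval for a binary proportion k/n.
function wilsonCI95(k, n) {
  const z = 1.96; // normal quantile for 95%
  const p = k / n;
  const denom = 1 + (z * z) / n;
  const center = (p + (z * z) / (2 * n)) / denom;
  const half = (z / denom) * Math.sqrt((p * (1 - p)) / n + (z * z) / (4 * n * n));
  return { low: Math.max(0, center - half), high: Math.min(1, center + half) };
}
```

For k = 3 mentions in n = 5 runs this gives roughly [0.23, 0.88], an interval wide enough to justify the 20-run caveat above.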

Bootstrap CI on rank distributions

Rank position is integer-valued and skewed; we resample with replacement (default 1000 iterations) to estimate the rank distribution. Bootstrap output is non-deterministic at the 4th decimal — we use the deterministic Wilson interval in production and surface bootstrap only on the deep-mode response.
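A percentile-bootstrap sketch of the idea, with the iteration count and injectable RNG as illustrative choices rather than the production code:

```javascript
// Resample observed ranks with replacement, collect the mean of each
// resample, and take the 2.5th/97.5th percentiles as a 95% interval.
function bootstrapMeanRank(ranks, iterations = 1000, random = Math.random) {
  const means = [];
  for (let i = 0; i < iterations; i++) {
    let sum = 0;
    for (let j = 0; j < ranks.length; j++) {
      sum += ranks[Math.floor(random() * ranks.length)]; // sample with replacement
    }
    means.push(sum / ranks.length);
  }
  means.sort((a, b) => a - b);
  return {
    low: means[Math.floor(0.025 * iterations)],
    high: means[Math.floor(0.975 * iterations)],
  };
}
```

Because the resampling is random, two runs of this function differ slightly, which is exactly the non-determinism that keeps bootstrap out of the production response path.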

Majority vote for `mentioned: bool`

In deep mode, `providers[name].mentioned` is true when ≥ 50% of runs mentioned the brand. The threshold is a defensible default (matches the natural intuition of 'usually shows up'); flagged for product decision in plan §17.8 #8.

Standard deviation of rank across runs

Reported as `variance.rank_stddev`. Excludes runs where the brand wasn't mentioned (rank null). High stddev means a flaky ranking — a tool builder that surfaces this number to their customer is doing the customer a favor.
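The two aggregation rules above, the ≥50% majority vote and the null-excluding stddev, can be sketched together. This assumes per-run results shaped like `{ mentioned: bool, rank: number|null }`; the shape and function name are illustrative:

```javascript
// Aggregate deep-mode runs: majority-vote `mentioned`, population stddev
// of rank over mentioned runs only (rank null is excluded).
function aggregateRuns(runs) {
  const mentionedCount = runs.filter((r) => r.mentioned).length;
  const ranks = runs.map((r) => r.rank).filter((r) => r !== null);
  const mean = ranks.reduce((a, b) => a + b, 0) / ranks.length;
  const variance = ranks.reduce((a, r) => a + (r - mean) ** 2, 0) / ranks.length;
  return {
    mentioned: mentionedCount / runs.length >= 0.5, // majority vote
    rank_stddev: ranks.length ? Math.sqrt(variance) : null, // no ranks -> null
  };
}
```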

Brand detection

Token match + Levenshtein-≤2

When a customer asks 'did Notion get mentioned?' we don't ask an LLM. We match deterministically.

Step 1 — case-insensitive substring match

`brand.toLowerCase()` against `text.toLowerCase()`. Catches every exact mention including nested forms ('Notion AI', 'Notion's').

Step 2 — Levenshtein distance ≤ 2 on candidate words

For each whitespace-separated word in the response within ±25% of the brand length, compute edit distance. ≤2 edits is enough to forgive typos like 'Pereplexity' for 'Perplexity' or 'Notiom' for 'Notion', while the length filter keeps unrelated short words from matching as false positives.

Step 3 — context window

A 240-character window around the first match (120 before, 120 after), extracted from the original text, so the context preserves the model's actual phrasing.
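The three steps combine into a short deterministic function. The window size and ±25% length filter come from the text above; the function names and return shape are illustrative, not the actual implementation:

```javascript
// Classic dynamic-programming edit distance.
function levenshtein(a, b) {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [i]);
  for (let j = 1; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                    // deletion
        dp[i][j - 1] + 1,                                    // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)   // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

function detectBrand(text, brand) {
  const lower = text.toLowerCase();
  const needle = brand.toLowerCase();
  // Step 1: case-insensitive substring match
  let idx = lower.indexOf(needle);
  // Step 2: Levenshtein <= 2 on words within +/-25% of the brand length
  if (idx === -1) {
    for (const word of text.split(/\s+/)) {
      if (Math.abs(word.length - brand.length) <= brand.length * 0.25 &&
          levenshtein(word.toLowerCase(), needle) <= 2) {
        idx = lower.indexOf(word.toLowerCase());
        break;
      }
    }
  }
  if (idx === -1) return { mentioned: false };
  // Step 3: 240-char context window (120 before, 120 after)
  return { mentioned: true, context: text.slice(Math.max(0, idx - 120), idx + 120) };
}
```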

Why no LLM

Plan §6.4 mandates deterministic primary brand detection. LLM-based extraction adds 500ms median latency, ~1¢ per call, and varies run-to-run. The deterministic path is sub-millisecond, free, and reproducible.

Public SLA

Per-platform reliability targets

What we promise. Live numbers at /v1/health update from a 7-day rolling window of usage_events.

| Surface | Target success rate | Target p95 latency | Notes |
| --- | --- | --- | --- |
| `quick` (4 LLM APIs) | 99% | 5s | Official OpenAI, Anthropic, Google, Perplexity APIs in parallel. Partial-success refund applied automatically when any provider fails. |
| `perplexity_live` (UI) | 95% | 8s | Anonymous Perplexity UI scrape. Native fan-out via related_queries. Failures trigger 100% refund. |
| `chatgpt_live` (UI) | 95% | 30s | ChatGPT.com UI scrape. Captures fan_out_queries (sub-queries ChatGPT issued during web search), source citations, brand entities. |
| `gemini_live` (UI) | 95% | 30s | Gemini.google.com UI scrape. Markdown answer, inline citations, items. |
| `ai_overview` (Google AIO) | 99% | 5s | Google AI Overviews block extracted from standard SERP. Citation references + AI summary. |
| `ai_mode` (Google AI Mode) | 95% | 8s | Google's dedicated AI Mode chat surface. Markdown + citations + items + tables + shopping. |
| `bing_copilot` (Bing AI) | 99% | 5s | Bing Copilot AI overview from Bing SERP. Summary + references + images. |
| Claude UI scrape | — | — | ROADMAP. Claude session expiry under scrape patterns is unsolved. Today: use mode:quick for the official Anthropic API. |

SLA credit: if the monthly success rate drops more than 10 percentage points below target, customers get pro-rated credit on their next top-up for the affected calls. Wallet model — credits land directly in the balance; no invoicing, no contract amendments.
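As arithmetic, the trigger looks like this. The 10-percentage-point threshold comes from the policy above; the exact pro-rating formula is an assumption for illustration only:

```javascript
// Hypothetical SLA-credit calculation. Credit applies only when the monthly
// success rate falls more than 10 percentage points below the target.
function slaCredit(successRate, target, affectedSpend) {
  const threshold = target - 0.10;
  if (successRate >= threshold) return 0; // within tolerance: no credit
  // ASSUMPTION: credit the shortfall fraction of spend on affected calls
  return affectedSpend * (threshold - successRate) / threshold;
}
```

For example, a surface with a 95% target owes nothing at 86% measured success, and starts accruing credit below 85%.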

Caching invariants

Three cache tiers, decoupled TTLs

So you can reason about how 'fresh' a response actually is.

| Tier | Storage | TTL by mode (quick / perplexity_live / discover-compare) |
| --- | --- | --- |
| L1 | Cloudflare Cache API (per colo) | 5m / never / never |
| L2 | Cloudflare KV (global) | 1h / never / never |
| L3 | Postgres `cache_entries` | 24h / never / never |

Cache key digest: sha256(query + brand + mode + region + day). The UTC-day component naturally rolls every entry over at 00:00 UTC even if the L3 row would otherwise survive longer. Deep mode caches the aggregated result but never the constituent runs — variance is only valid on live runs.

What we don't claim

Honest limitations

Where the data product stops. We list it instead of pretending it doesn't matter.

UI scraping reliability varies by surface

Our internal §22 test confirmed the ChatGPT, Claude, and Gemini UIs only invoke web search on certain queries — and even when they do, the citations they render are inconsistent. Perplexity triggers search most reliably, which makes perplexity_live our strongest ground-truth surface; Claude UI scraping remains roadmap entirely. mode:quick still queries all 4 official APIs in parallel.

Citations are surfaced as-returned

When Perplexity gives us a list of source URLs we forward them verbatim. We do NOT validate URL reachability. In a spot-check of 5 sample URLs from a recent Perplexity response, 4 returned 200 and 1 returned 404 (a stale Zoho article). If your downstream system needs guaranteed-live links, validate post-hoc.

Sentiment classification needs the LLM path

The deterministic brand detector reports `sentiment: 'neutral'`. If you need positive/negative classification, pass an empty `track_brands` to /v1/check or call /v1/extract_brands directly with the response text.

Scrapes can be rate-limited under load

Perplexity tightens its bot detection unpredictably. The /v1/health endpoint shows live numbers from a 7-day rolling window; partial failures trigger an automatic proportional refund (you only pay for what worked).
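The proportional refund reduces to simple arithmetic: charge the bundle price times the fraction of sub-calls that succeeded. A sketch with illustrative field names (not the billing code itself):

```javascript
// Settle a composed call like all_live: pay only for succeeded sub-scrapes.
function settleCharge(bundlePrice, results) {
  const ok = results.filter((r) => r.ok).length;
  const charged = bundlePrice * (ok / results.length);
  return { charged, refunded: bundlePrice - charged };
}
```

So an all_live call at $0.50 where 5 of 6 sub-scrapes succeed charges about $0.42 and refunds the rest to the wallet.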

Numbers with error bars.

$1 free signup credit covers ~50 quick-mode calls or 4 perplexity_live calls — enough to evaluate before you top up.