Why a wrapper instead of the OpenAI SDK?
The OpenAI SDK is excellent for building chat products. It is not built for brand monitoring. You get unstructured text, no citation parsing, and no caching layer. Every brand-tracking workflow ends up duplicating the same three or four utility files. Worse, when your boss asks 'what does Claude say?', you start the whole project over.
MentionsAPI flips the model: a single endpoint, a single response schema, four LLMs swappable via a `providers` array. Add Gemini to your dashboard with a one-line code change.
There's also an auth flattening that's easy to underestimate. You authenticate once with your MentionsAPI bearer token; we hold the OpenAI key, the Anthropic key, the Gemini key, and the Perplexity key on the server side and rotate them quietly. Your `.env` doesn't grow every time you add a provider, and your security review surface stays exactly one credential wide.
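Here's what that looks like as a minimal request sketch. The endpoint path, env var name, and prompt are illustrative placeholders; the `providers` array, `track_brands` list, and single bearer token are the real surface.

```ts
// Illustrative sketch: the endpoint URL and env var name are placeholders.
// One credential, four models; swapping providers never touches your keys.
const res = await fetch("https://api.mentionsapi.example/v1/ask", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.MENTIONSAPI_KEY}`, // your only secret
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    prompt: "What are the best CRM tools for startups?",
    providers: ["openai", "anthropic", "gemini", "perplexity"], // the one-line change
    track_brands: ["HubSpot", "Salesforce"],
  }),
});
```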
What you get out of the box
Pass a prompt and a list of brands you care about. The response includes the raw answer text, an array of mentions (each with position, sentiment, and surrounding context), and a list of cited URLs with their domains resolved. We handle the LLM rate limits, retries, and caching. Your code becomes a single fetch call.
Because we share a 24-hour cache across customers running similar prompts, your costs drop by an order of magnitude versus calling OpenAI directly for repeat queries, and you can override with `cache_bypass: true` whenever you need a fresh result.
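That response, as a TypeScript sketch: `text`, `mentions`, and `citations` are the documented fields, while the nested property names are assumptions based on the description above.

```ts
// Sketch of the documented response shape: text, mentions[], citations[].
// Nested field names are assumptions inferred from the prose above.
interface MentionsResponse {
  text: string;              // the raw LLM answer, unmodified
  mentions: Array<{
    brand: string;           // which track_brands entry matched
    position: number;        // character offset in `text`
    sentiment: "positive" | "neutral" | "negative";
    context: string;         // the surrounding sentence
  }>;
  citations: Array<{
    url: string;
    domain: string;          // resolved server-side
  }>;
}

// Need a guaranteed-fresh answer? Add `cache_bypass: true` to the request body.
```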
Brand matching uses a Levenshtein-≤2 fuzzy comparator on top of case-insensitive normalization, so 'OpenAI' and 'Open AI' (or 'GitHub' and 'Github') don't quietly mismatch. We also track sentence-level position, not just first-mention offset, so a brand that gets named in the closing sentence after a competitor lead doesn't get a misleading 'first mention' score in your dashboard.
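The core of that comparator, sketched below; the production matcher layers per-brand aliases on top of this idea.

```ts
// Case-insensitive normalization plus Levenshtein distance <= 2, so
// 'Open AI' ~ 'OpenAI' (distance 1) and 'Github' ~ 'GitHub' (distance 0).
function levenshtein(a: string, b: string): number {
  const dp: number[][] = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)),
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                   // deletion
        dp[i][j - 1] + 1,                                   // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

function brandsMatch(candidate: string, tracked: string): boolean {
  return levenshtein(candidate.toLowerCase().trim(), tracked.toLowerCase().trim()) <= 2;
}
```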
Built for GEO and brand-monitoring tools
We built MentionsAPI for developers who want one HTTP call to do something their toolkit makes painful: ask every major LLM about their brand and get back a structured answer. If you're building a Generative Engine Optimization product, an SEO agency dashboard, or an internal brand-watch tool, this saves you weeks of glue code.
It's also a clean drop-in for teams already running OpenAI in production. The request body accepts the prompt and (optionally) the model overrides; the response keeps the original `text` so any existing rendering logic still works, and the `mentions[]` and `citations[]` arrays are additive. Most teams adopt it by changing the URL and the auth header. Total integration cost is the time it takes to update an environment variable.
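A hedged sketch of that swap, with URL and env var names as placeholders and `renderAnswer` standing in for whatever already consumes your answer text:

```ts
// Before: fetch("https://api.openai.com/v1/chat/completions", ...) with an
// OpenAI key and a hand-rolled parse of choices[0].message.content.
// After: new URL, new auth header, untouched downstream code.
const renderAnswer = (answer: string) => console.log(answer); // your existing renderer

const res = await fetch(process.env.MENTIONSAPI_URL!, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.MENTIONSAPI_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    prompt: "What are the best CRM tools for startups?",
    track_brands: ["YourBrand"],
  }),
});
const { text, mentions, citations } = await res.json();
renderAnswer(text); // old rendering path keeps working off `text`
// mentions and citations are the additive data your dashboard builds on
```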
How the ChatGPT wrapper works under the hood
Your request hits our edge, gets canonicalized (prompt + provider set + tracked brands + model override), and we hash it for cache lookup. A cache hit returns the stored normalized response in roughly 80-200 ms total round-trip and bills at $0.02. A miss invokes our OpenAI adapter, which sends the chat-completions request through GPT-5 (or your overridden model), unwraps `choices[0].message.content` into `text`, and pulls any tool-call source URLs into a per-provider `citations[]`.
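A sketch of that cache-key step: the canonical fields come from the list above, while the SHA-256 choice and exact canonicalization rules are assumptions. The point is that logically identical requests hash to the same key.

```ts
import { createHash } from "node:crypto";

// Deterministic key over (prompt + provider set + tracked brands + model
// override), order-insensitive where order shouldn't matter.
function cacheKey(req: {
  prompt: string;
  providers: string[];
  track_brands: string[];
  model?: string;
}): string {
  const canonical = JSON.stringify({
    prompt: req.prompt.trim(),
    providers: [...req.providers].sort(),
    track_brands: [...req.track_brands].sort(),
    model: req.model ?? null,
  });
  return createHash("sha256").update(canonical).digest("hex");
}
```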
Brand extraction runs on the returned text in a deterministic pass: we tokenize on sentence boundaries, run case-insensitive plus alias matching for each `track_brands` entry, compute the character offset and sentence index, and score the surrounding sentence for sentiment using a calibrated classifier, not a re-prompt to another LLM. That means the extraction step is cheap (sub-50 ms typical) and reproducible: the same input text always returns the same mentions array.
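Condensed to its skeleton, that pass looks like the sketch below. The sentence splitter and classifier are toy stand-ins, and the fuzzy comparator from the earlier sketch slots in where `indexOf` appears.

```ts
type Sentiment = "positive" | "neutral" | "negative";

// Toy stand-in for the calibrated sentiment classifier described above.
const scoreSentiment = (_sentence: string): Sentiment => "neutral";

interface Mention {
  brand: string;
  offset: number;          // character offset into the full text
  sentence_index: number;  // sentence-level position
  sentiment: Sentiment;
  context: string;         // the surrounding sentence
}

// Deterministic pass: the same text and brand list always yield the same
// mentions array. Production swaps indexOf for the fuzzy matcher.
function extractMentions(text: string, trackBrands: string[]): Mention[] {
  const sentences = text.split(/(?<=[.!?])\s+/); // naive sentence boundaries
  const mentions: Mention[] = [];
  let cursor = 0;
  sentences.forEach((sentence, sentence_index) => {
    for (const brand of trackBrands) {
      const idx = sentence.toLowerCase().indexOf(brand.toLowerCase());
      if (idx !== -1) {
        mentions.push({
          brand,
          offset: cursor + idx,
          sentence_index,
          sentiment: scoreSentiment(sentence),
          context: sentence,
        });
      }
    }
    cursor += sentence.length + 1; // approximate: assumes single-space joins
  });
  return mentions;
}
```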
We measured uncached single-provider OpenAI latency at p50 ~1.6 s, p95 ~3.4 s, p99 ~5.8 s over the last 30 days of production traffic. That's close to a direct OpenAI SDK call once the cache-lookup overhead is amortized. Cache hits on the same prompts run p50 ~140 ms because they skip OpenAI entirely. Repeat queries against the same prompt list cluster around a 70-80% cache hit rate in production agency workloads, which is where the cost story actually wins.
Per the methodology page, brand extraction confidence is reported with a Wilson 95% confidence interval over the rolling sample. If a customer asks 'why didn't ChatGPT mention us in this answer?', you can hand them a deterministic, replayable record, not a maybe-it-was-a-bad-day shrug.
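For the curious, that interval is the standard Wilson score construction (z = 1.96 at 95%); the sample numbers in the usage comment are illustrative, not production figures.

```ts
// Wilson score interval for k successes out of n trials at 95% confidence.
function wilson95(k: number, n: number): [number, number] {
  const z = 1.96;
  const p = k / n;
  const denom = 1 + (z * z) / n;
  const center = (p + (z * z) / (2 * n)) / denom;
  const half = (z / denom) * Math.sqrt((p * (1 - p)) / n + (z * z) / (4 * n * n));
  return [center - half, center + half];
}

// e.g. 460 correct extractions in a 500-answer sample:
// wilson95(460, 500) → approximately [0.893, 0.941]
```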
When to use this wrapper (and when to skip it)
Use this wrapper when you need structured outputs from ChatGPT for monitoring, comparison, or analytics. Anything where the raw `text` is the start of your pipeline, not the end. GEO tools, brand-mention dashboards, content-attribution loops, and AI-search benchmarks are all the right shape: you want the same prompt run repeatedly, normalized, and stored for re-extraction with new brand lists later.
Use it specifically over the OpenAI SDK when you're going to add Claude, Gemini, or Perplexity later. Most teams that start single-provider end up multi-provider within a quarter. Your boss sees the dashboard and asks 'what does Claude say?', and the rewrite cost is real. Adopting the wrapper now makes that future change a one-line array edit.
Skip it for chat products. If you're building a customer-facing assistant, a copy generator, or anything where you want streaming, function calling, or vision inputs, the OpenAI SDK exposes knobs we deliberately don't surface. Multi-provider normalization is overhead you don't need when only one model's answer matters and you want the latest provider-specific feature the day it ships. Use the SDK directly there; come back to MentionsAPI when the question shifts from 'generate good output' to 'measure where our brand shows up'.