For developers building AI-visibility tools

One multi-LLM API for ChatGPT, Claude, Gemini, and Perplexity

A multi-LLM API is a single HTTP endpoint that dispatches one prompt to multiple language-model providers in parallel and returns their answers under one normalized response schema. MentionsAPI's `/v1/check` does exactly that across ChatGPT, Claude, Gemini, and Perplexity. One bearer token, one billable line, one shape to parse.

Comparing answers across four LLM providers used to mean four SDKs, four auth flows, four rate-limit handlers, and four totally different response shapes. By the time you've normalized everything into a comparable format, you've written more glue code than business logic.

MentionsAPI gives you a single endpoint that fans out to all four providers in parallel, normalizes the responses into a single schema, and returns them as one object. Add or remove a provider by editing one line of an array.

If you've ever caught yourself writing `if (provider === 'anthropic')` for the fourth time, this is for you.

Top up from $10 · Pay per call · Credits never expire

Pay as you go · $10 minimum · Credits never expire · No plans

The unified response schema

Every provider returns the same fields: `text` (the raw answer), `model` (the exact model version that handled the request), `latency_ms`, `tokens` (input and output counts), `citations` (URLs the model referenced, if any), and `mentions` (when you pass `track_brands`). Comparing answers becomes a JavaScript array operation, not a parsing project.

Errors are scoped to the provider that failed: the response includes a partial set of successes plus an `errors` array. Your code never has to worry about a single provider taking down the whole request.

We bias toward the lowest-common-denominator field set rather than exposing every provider quirk through the schema. If Gemini returns a `safety_attributes` block that nobody else returns, it lands in a per-provider `raw` field. Your normalization layer keeps working, and the people who care about per-provider extras can still get them.
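
For reference, here's that response shape sketched as TypeScript types. The field names are the ones described on this page; treat the nesting, optionality, and the exact shape of `tokens`, `mentions`, and error entries as illustrative rather than a canonical schema reference.

```typescript
// Sketch of the unified response schema described above. Field names come
// from the prose; nesting and optionality are illustrative.

type Provider = "openai" | "anthropic" | "gemini" | "perplexity";

interface BrandMention {
  brand: string;   // one of the track_brands values
  count: number;   // assumed shape: how often the brand appeared
}

interface ProviderResult {
  provider: Provider;
  text: string;                        // the raw answer
  model: string;                       // exact model version that handled the request
  latency_ms: number;
  tokens: { input: number; output: number };
  citations: string[];                 // URLs the model referenced, if any
  mentions?: BrandMention[];           // present when track_brands is passed
  raw?: Record<string, unknown>;       // provider-specific extras, e.g. Gemini safety_attributes
}

interface ProviderError {
  provider: Provider;
  code: string;                        // e.g. "timeout"
  elapsed_ms?: number;                 // elapsed milliseconds on a timeout
  message?: string;
}

interface CheckResponse {
  results: ProviderResult[];           // successes only; failed providers are omitted
  errors: ProviderError[];             // scoped to the provider that failed
  citations: Array<{ url: string; providers_cited: Provider[] }>; // deduplicated across providers
}
```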

Parallel execution by default

Calling four LLMs sequentially is the slowest possible thing you can do. We dispatch all providers in parallel, with smart timeouts so a slow provider never blocks the response. Total latency is the latency of the slowest single provider, not the sum.

Aggressive caching keeps repeat queries at near-zero latency. We've measured median total response times of ~1.2 seconds for cached prompts across all four providers. Faster than a single uncached OpenAI call. Per-provider timeouts default to 8 s for non-search calls and 25 s for `web_search: true` calls; if Perplexity hangs, the response still ships with the three providers that came back, plus a timeout entry in `errors[]`.

The fan-out is also where most of our cost discipline lives. We cap concurrency per upstream key at the level each provider tolerates without throttling, so a 100-prompt batch doesn't trigger a 429 cascade. You get back the full set of answers, in order, without writing your own retry queue.
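
For intuition, the fan-out pattern itself is easy to sketch in TypeScript: start every provider call at once, race each against its own timeout, and collect successes and failures separately so one slow provider never blocks the rest. This is an illustration of the pattern, not our production code; `callProvider` and the timeout value are placeholders.

```typescript
// Illustration of parallel fan-out with per-provider timeouts.
// callProvider is a placeholder adapter, not a real client.

type Provider = "openai" | "anthropic" | "gemini" | "perplexity";

async function callProvider(provider: Provider, prompt: string): Promise<string> {
  // Placeholder: a real adapter handles the upstream's auth and request shape.
  throw new Error("not implemented");
}

function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    p,
    new Promise<T>((_, reject) =>
      setTimeout(() => reject(new Error("timeout")), ms),
    ),
  ]);
}

async function fanOut(providers: Provider[], prompt: string) {
  // Every provider starts at the same time, so total latency is the slowest
  // provider (or its timeout), never the sum.
  const settled = await Promise.allSettled(
    providers.map((p) => withTimeout(callProvider(p, prompt), 8_000)),
  );

  const results = settled.flatMap((s, i) =>
    s.status === "fulfilled" ? [{ provider: providers[i], text: s.value }] : [],
  );
  const errors = settled.flatMap((s, i) =>
    s.status === "rejected"
      ? [{ provider: providers[i], code: String(s.reason?.message ?? s.reason) }]
      : [],
  );
  return { results, errors };
}
```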

Built for comparison workflows

Most multi-LLM use cases boil down to comparison: 'which model picks our brand,' 'which model has the better citation,' 'which model is cheapest for this query.' The unified schema lets you pivot on any of those dimensions without preprocessing. Build a dashboard column per provider in an afternoon.

Because every raw answer is archived for 30 days, you can also re-run brand extraction over historical answers without re-billing the LLM. Add a competitor to `track_brands`, hit `GET /v1/ask/:id`, and get the new mentions back in milliseconds. Useful for backfilling share-of-voice when a customer asks about a competitor that wasn't on the original list.
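
As a taste of how little glue that takes, a per-provider share-of-voice pivot is a single pass over `results[]`. The mention shape here is illustrative (see the schema sketch above):

```typescript
// Pivot brand mentions into a per-provider share-of-voice column.
// Assumes the illustrative result/mention shapes sketched earlier.

interface Mention { brand: string; count: number }
interface Result { provider: string; mentions?: Mention[] }

function shareOfVoice(results: Result[], brand: string) {
  return results.map((r) => {
    const mentions = r.mentions ?? [];
    const total = mentions.reduce((n, m) => n + m.count, 0);
    const ours = mentions.find((m) => m.brand === brand)?.count ?? 0;
    return {
      provider: r.provider,
      share: total > 0 ? ours / total : 0, // fraction of tracked-brand mentions
    };
  });
}

// shareOfVoice(data.results, "Qdrant")
// -> [{ provider: "openai", share: 0.4 }, { provider: "anthropic", share: 0.25 }, ...]
```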

How the multi-LLM fan-out works under the hood

When `/v1/check` receives a request with `providers: ["openai", "anthropic", "gemini", "perplexity"]`, we hash the canonicalized request body (prompt + provider set + tracked brands + model overrides) and look it up in the shared 24-hour cache. Cache hits return immediately at $0.02. Cache misses trigger four parallel `fetch` calls, wrapped in per-provider adapters that handle each upstream's auth, request shape, and response idiosyncrasies.
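
Conceptually, the cache key is nothing exotic: canonicalize the fields that change the answer, then hash. A rough sketch (sorted keys and arrays are an assumption; the exact canonicalization rules are internal):

```typescript
import { createHash } from "node:crypto";

// Sketch of a cache key over the fields described above:
// prompt + provider set + tracked brands + model overrides.
// Sorting arrays and keys here is an assumption about canonicalization.

interface CheckRequest {
  prompt: string;
  providers: string[];
  track_brands?: string[];
  model?: Record<string, string>;
}

function cacheKey(req: CheckRequest): string {
  const canonical = {
    prompt: req.prompt,
    providers: [...req.providers].sort(),
    track_brands: [...(req.track_brands ?? [])].sort(),
    model: Object.fromEntries(
      Object.entries(req.model ?? {}).sort(([a], [b]) => a.localeCompare(b)),
    ),
  };
  return createHash("sha256").update(JSON.stringify(canonical)).digest("hex");
}
```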

Each adapter normalizes its provider's output into the common schema before the fan-out resolver merges them. OpenAI's `choices[0].message.content` becomes `text`. Anthropic's `content[].text` blocks get joined. Perplexity's inline `[1]` markers get stripped from `text` and rebuilt into the canonical `citations[]` array. Gemini's grounding metadata is parsed for cited URLs. The merge step also computes a top-level `citations[]` (deduplicated across providers, with `providers_cited` per URL) and a `mentions[]` array per provider when `track_brands` is set.
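
To make the adapter step concrete, here's roughly what two of those normalizers look like. The upstream field paths (`choices[0].message.content`, `content[].text`, Perplexity's inline `[1]` markers) are the providers' public response formats; the adapter interface itself is simplified:

```typescript
// Simplified per-provider adapters normalizing upstream responses into a
// common { text } shape. Upstream paths follow the providers' public formats.

interface NormalizedAnswer {
  text: string;
}

// OpenAI chat completion -> text
function fromOpenAI(body: any): NormalizedAnswer {
  return { text: body.choices?.[0]?.message?.content ?? "" };
}

// Anthropic messages response -> text (join the content blocks)
function fromAnthropic(body: any): NormalizedAnswer {
  const blocks: Array<{ type: string; text?: string }> = body.content ?? [];
  return {
    text: blocks
      .filter((b) => b.type === "text" && typeof b.text === "string")
      .map((b) => b.text)
      .join(""),
  };
}

// Perplexity -> text with inline [1]-style citation markers stripped
function fromPerplexity(body: any): NormalizedAnswer {
  const raw: string = body.choices?.[0]?.message?.content ?? "";
  return { text: raw.replace(/\[\d+\]/g, "").trim() };
}
```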

Latency profile across our fleet, sampled over the last 30 days: cached `mode: quick` median ~1.2 s, p99 ~2.1 s. Uncached `mode: quick` (no web search) median ~3.8 s, p99 ~7.4 s. Full fan-out with `web_search: true` median ~6.5 s, p99 ~14 s. Perplexity's web-grounded path is the long pole. If you need predictability over freshness, leave `cache_bypass` at its default of `false` and most production workloads will land in the cached band.

Failure modes are bounded: a single provider 5xx returns an `errors[]` entry, never a 500 on the parent call. A timeout is recorded as `error.code: "timeout"` with the elapsed milliseconds. A schema-level mismatch (rare; usually a provider rolling out a breaking change) is logged, and the provider's slot is omitted from `results[]` rather than silently corrupting the response.
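
On the client, that means partial-failure handling is a couple of loops rather than a try/catch per provider. A minimal sketch against the `results[]` / `errors[]` shape described above:

```typescript
// Minimal client-side handling of a partial-failure response.
// Field shapes follow the results[] / errors[] description on this page.

interface CheckResponse {
  results: Array<{ provider: string; text: string }>;
  errors: Array<{ provider: string; code: string; elapsed_ms?: number }>;
}

function report(data: CheckResponse) {
  for (const r of data.results) {
    console.log(`${r.provider}: ${r.text.slice(0, 80)}`);
  }
  for (const e of data.errors) {
    // e.g. { provider: "perplexity", code: "timeout", elapsed_ms: 25000 }
    console.warn(`${e.provider} failed: ${e.code}`);
  }
}
```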

When to use a multi-LLM API (and when not to)

Use `/v1/check` when you're building anything that needs cross-provider data: brand-monitoring tools, GEO scoring, AI-visibility dashboards, model-comparison reports, prompt-quality benchmarks, or vendor-risk reviews where 'are we too dependent on one model?' is a real question. The endpoint also fits internal tools: pricing teams running competitive prompts daily, content teams checking which engine cites them, data teams running regression tests across model upgrades.

Don't use it for single-shot generative work. If you're building a chat product, a summarization pipeline, or a one-LLM agent loop, the OpenAI or Anthropic SDK is cheaper and more featureful. They expose streaming, function calling, and provider-specific knobs we deliberately don't surface. Multi-LLM fan-out is overhead you don't need when only one model's answer matters.

Don't use it as a routing layer either. If your goal is 'pick the cheapest provider for this prompt and return that one answer,' OpenRouter is purpose-built for that. We're optimized for the inverse problem: you want every provider's answer side by side, normalized, with brand and citation extraction layered on top. Different shape, different bill.

FAQ

Frequently asked questions

Answer-first, dev-to-dev. Each one is also embedded as FAQPage schema for AI engines.

What is a multi-LLM API?
A multi-LLM API is a single endpoint that calls multiple LLM providers in parallel and returns their responses in one normalized response shape. MentionsAPI's `/v1/check` fans out to ChatGPT, Claude, Gemini, and Perplexity simultaneously, returning a `results[]` array where every entry has the same fields. You go from four SDKs and four auth flows to one bearer token.
Can I query ChatGPT, Claude, Gemini, and Perplexity in one call?
Yes. That's the core endpoint. Pass `"providers": ["openai", "anthropic", "gemini", "perplexity"]` in the request body and all four run in parallel. Total latency is the slowest single provider, not the sum. With a cache hit, median total response time is ~1.2 seconds across all four providers.
How fast is a multi-provider call?
Median 1.2 seconds for cached prompts across all four providers. Uncached `mode: quick` calls land around 3-4 s median; full fan-outs with `web_search: true` are 6-8 s median because Perplexity's grounded path is the long pole. Providers run in parallel with smart timeouts, so a hung Gemini call won't stall the response; it just shows up as a timeout entry in `errors[]`.
How much does a multi-provider call cost?
$0.25 for the multi-provider fan-out (2-4 LLMs in parallel, no web search). $0.75 for the full fan-out (4 LLMs + `web_search`). $0.02 if it hits the shared 24-hour cache. One billable line per call, regardless of how many providers were in the fan-out. Pay-as-you-go, $10 minimum top-up, credits never expire.
How is MentionsAPI different from OpenRouter or LiteLLM?
OpenRouter is a routing layer. It forwards requests to one LLM at a time. LiteLLM is self-hosted and you maintain it. MentionsAPI runs all four providers in parallel by default and adds brand mention extraction, citation canonicalization, and a 24-hour shared cache on top. It's an aggregator built specifically for comparison and monitoring workflows, not just routing.
Can I pin specific model versions across providers?
Yes. Pass `model: { openai: "gpt-5", anthropic: "claude-sonnet-4-5", perplexity: "sonar-pro" }` to override the defaults. The response shape stays identical, so your downstream code doesn't change. Useful when you want a cheaper tier for bulk crawls or need to lock in a specific version for regression testing.
How do I handle partial failures in a multi-provider call?
The response includes both successful results and an `errors[]` array. You never get a 500 because one provider hiccupped. Iterate `data.results` for the successes and check `data.errors` for which providers failed and why. You only pay for the providers that actually returned data; full failures cost nothing.
How do I add a fifth provider later?
When we ship a new provider adapter (Mistral and Cohere are on the roadmap), you add it to your `providers` array and ship. The unified schema stays the same, your normalization code keeps working, and the new provider's quirks live inside our adapter. Zero migration work on your side. That's the whole reason to use an aggregator instead of writing your own.
Code example

Query all four providers in one call

Drop in your API key and you're live. Same response shape across every provider.

 POST /v1/ask
curl https://api.mentionsapi.com/v1/ask \
  -H "Authorization: Bearer lvk_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "providers": ["openai", "anthropic", "gemini", "perplexity"],
    "prompt": "Best open-source vector databases?",
    "track_brands": ["Qdrant", "Weaviate", "Pinecone", "Chroma"]
  }'
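
The same request from Node or the browser with `fetch`. The endpoint, headers, and body mirror the curl above; the response handling assumes the `results[]` shape described earlier, and the environment-variable name is just a placeholder for your key.

```typescript
// TypeScript/fetch equivalent of the curl example above.
// MENTIONSAPI_KEY is a placeholder env var for your bearer token.

const res = await fetch("https://api.mentionsapi.com/v1/ask", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.MENTIONSAPI_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    providers: ["openai", "anthropic", "gemini", "perplexity"],
    prompt: "Best open-source vector databases?",
    track_brands: ["Qdrant", "Weaviate", "Pinecone", "Chroma"],
  }),
});

const data = await res.json();
for (const r of data.results) {
  console.log(r.provider, r.model, r.text.slice(0, 120));
}
```
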
Compare

MentionsAPI vs. building your own multi-LLM router

| The other way (DIY, 4 SDKs) | MentionsAPI |
| --- | --- |
| 4 auth flows, 4 retry handlers | 1 bearer token, automatic retries |
| 4 different response shapes | Single normalized schema |
| Sequential = slow | Parallel by default |
| No shared cache | Cross-customer 24h cache |
Pricing

Top up from $10. Pay per call. No plans.

Each `/v1/check?mode=quick` call (up to 4 LLM APIs in parallel) is $0.25, or $0.02 when it hits the shared 24-hour cache; the full fan-out with `web_search: true` is $0.75. One billable line, regardless of provider count. `/v1/check?mode=perplexity_live` (live UI scrape) is $0.25. $1 free signup credit, $10 minimum top-up, no monthly tiers, no commitment.

Stop wiring up four SDKs.

One API key, four answer engines, structured responses. $10 minimum top-up. Credits never expire.