The unified response schema
Every provider returns the same fields: `text` (the raw answer), `model` (the exact model version that handled the request), `latency_ms`, `tokens` (input and output counts), `citations` (URLs the model referenced, if any), and `mentions` (when you pass `track_brands`). Comparing answers becomes a JavaScript array operation, not a parsing project.
Errors are scoped to the provider that failed: the response includes a partial set of successes plus an `errors` array. Your code never has to worry about a single provider taking down the whole request.
We bias toward the lowest-common-denominator field set rather than exposing every provider quirk through the schema. If Gemini returns a `safety_attributes` block that nobody else returns, it lands in a per-provider `raw` field. Your normalization layer keeps working, and the people who care about per-provider extras can still get them.
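For concreteness, here's how that shape might be typed on the consumer side. This is an illustrative sketch, not an official SDK: field names come from the description above, but the exact types, optionality, and the shape of a `mentions` entry are assumptions.

```typescript
// Illustrative consumer-side types for the unified response.
// Field names follow the schema described above; exact types,
// optionality, and the mentions entry shape are assumptions.
interface ProviderResult {
  provider: "openai" | "anthropic" | "gemini" | "perplexity";
  text: string;                                   // the raw answer
  model: string;                                  // exact model version that handled the request
  latency_ms: number;
  tokens: { input: number; output: number };
  citations?: string[];                           // URLs the model referenced, if any
  mentions?: { brand: string; count: number }[];  // present when track_brands is passed
  raw?: unknown;                                  // per-provider extras, e.g. Gemini's safety_attributes
}

interface ProviderError {
  provider: string;
  code: string;                                   // e.g. "timeout"
  message: string;
}

interface CheckResponse {
  results: ProviderResult[];                      // the providers that succeeded
  errors: ProviderError[];                        // scoped to the providers that failed
}
```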
Parallel execution by default
Calling four LLMs sequentially is the slowest possible thing you can do. We dispatch all providers in parallel, with smart timeouts so a slow provider never blocks the response. Total latency is the latency of the slowest single provider, not the sum.
Aggressive caching keeps repeat queries at near-zero latency. We've measured median total response times of ~1.2 seconds for cached prompts across all four providers, which is faster than a single uncached OpenAI call. Per-provider timeouts default to 8 s for non-search calls and 25 s for `web_search: true` calls; if Perplexity hangs, the response still ships with the three providers that came back, plus a timeout entry in `errors[]`.
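The pattern is roughly `Promise.allSettled` with a per-provider abort budget. A minimal sketch, using the illustrative types above and a hypothetical `callProvider` adapter; the real adapters also handle auth and retries, and the `"upstream_error"` code is illustrative.

```typescript
// Minimal sketch of timeout-bounded parallel fan-out.
// `callProvider` is a hypothetical per-provider adapter, not a real API.
declare function callProvider(
  provider: string,
  prompt: string,
  opts: { signal: AbortSignal }
): Promise<ProviderResult>;

async function fanOut(prompt: string, providers: string[], webSearch: boolean): Promise<CheckResponse> {
  const timeoutMs = webSearch ? 25_000 : 8_000;   // per-provider budget from the defaults above
  const settled = await Promise.allSettled(
    providers.map((p) => callProvider(p, prompt, { signal: AbortSignal.timeout(timeoutMs) }))
  );

  const results: ProviderResult[] = [];
  const errors: ProviderError[] = [];
  settled.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") {
      results.push(outcome.value);
    } else {
      const isTimeout = (outcome.reason as { name?: string })?.name === "TimeoutError";
      errors.push({
        provider: providers[i],
        code: isTimeout ? "timeout" : "upstream_error",  // "upstream_error" is an illustrative code
        message: String(outcome.reason),
      });
    }
  });
  return { results, errors };                      // a slow provider never blocks the rest
}
```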
The fan-out is also where most of our cost discipline lives. We cap concurrency per upstream key at the level each provider tolerates without throttling, so a 100-prompt batch doesn't trigger a 429 cascade. You get back the full set of answers, in order, without writing your own retry queue.
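Concurrency capping itself is a standard pattern; here's a toy version for illustration (our actual queueing and retry logic is more involved).

```typescript
// Toy concurrency limiter: run at most `limit` calls at once against one upstream key.
// Illustrative pattern only; a rejected call rejects the whole batch here.
async function mapWithLimit<T, R>(items: T[], limit: number, fn: (item: T) => Promise<R>): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker() {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);   // results come back in the original order
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, worker));
  return results;
}
```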
Built for comparison workflows
Most multi-LLM use cases boil down to comparison: 'which model picks our brand,' 'which model has the better citation,' 'which model is cheapest for this query.' The unified schema lets you pivot on any of those dimensions without preprocessing. Build a dashboard column per provider in an afternoon.
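For example, turning `results[]` into one dashboard column per provider is a single pass. This uses the illustrative `ProviderResult` type from earlier; matching `brand` against `mentions` assumes the entry shape sketched there.

```typescript
// Pivot the unified results[] into one dashboard column per provider.
// Uses the illustrative ProviderResult type sketched earlier.
function toDashboardRow(results: ProviderResult[], brand: string) {
  return Object.fromEntries(
    results.map((r) => [
      r.provider,
      {
        model: r.model,
        latency_ms: r.latency_ms,
        citation_count: r.citations?.length ?? 0,
        mentions_brand: r.mentions?.some((m) => m.brand === brand) ?? false,
      },
    ])
  );
}
```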
Because every raw answer is archived for 30 days, you can also re-run brand extraction over historical answers without re-billing the LLM. Add a competitor to `track_brands`, hit `GET /v1/ask/:id`, and get the new mentions back in milliseconds. Useful for backfilling share-of-voice when a customer asks about a competitor that wasn't on the original list.
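A hedged sketch of that backfill call follows; the base URL, auth header, and the query-parameter shape for passing an updated `track_brands` list are assumptions, not documented parameters.

```typescript
// Sketch: re-read an archived answer and pull the freshly extracted mentions.
// Base URL, auth header, and the track_brands query parameter are assumptions.
async function backfillMentions(askId: string, brands: string[], apiKey: string) {
  const url =
    `https://api.example.com/v1/ask/${askId}` +
    `?track_brands=${encodeURIComponent(brands.join(","))}`;
  const res = await fetch(url, { headers: { Authorization: `Bearer ${apiKey}` } });
  if (!res.ok) throw new Error(`Backfill failed: ${res.status}`);
  const body: CheckResponse = await res.json();
  return body.results.map((r) => ({ provider: r.provider, mentions: r.mentions ?? [] }));
}
```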
How the multi-LLM fan-out works under the hood
When `/v1/check` receives a request with `providers: ["openai", "anthropic", "gemini", "perplexity"]`, we hash the canonicalized request body (prompt + provider set + tracked brands + model overrides) and look it up in the shared 24-hour cache. Cache hits return immediately at $0.02. Cache misses trigger four parallel `fetch` calls, each wrapped in a per-provider adapter that handles that upstream's auth, request shape, and response idiosyncrasies.
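The cache key amounts to a hash of a canonicalized request body, along these lines (the exact canonicalization rules shown here are assumptions):

```typescript
import { createHash } from "node:crypto";

// Sketch of the cache-key derivation described above: hash the canonicalized request
// (prompt + provider set + tracked brands + model overrides). Sorting makes the key
// insensitive to list order; the precise canonicalization rules are assumptions.
function cacheKey(req: {
  prompt: string;
  providers: string[];
  track_brands?: string[];
  model_overrides?: Record<string, string>;
}): string {
  const canonical = JSON.stringify({
    prompt: req.prompt,
    providers: [...req.providers].sort(),
    track_brands: [...(req.track_brands ?? [])].sort(),
    model_overrides: req.model_overrides ?? {},
  });
  return createHash("sha256").update(canonical).digest("hex");
}
```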
Each adapter normalizes its provider's output into the common schema before the fan-out resolver merges them. OpenAI's `choices[0].message.content` becomes `text`. Anthropic's `content[].text` blocks get joined. Perplexity's inline `[1]` markers get stripped from `text` and rebuilt into the canonical `citations[]` array. Gemini's grounding metadata is parsed for cited URLs. The merge step also computes a top-level `citations[]` (deduplicated across providers, with `providers_cited` per URL) and a `mentions[]` array per provider when `track_brands` is set.
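Abbreviated, that normalization looks like the following. The upstream response types are heavily trimmed, and the inline-marker regex is a simplification of the real citation rebuild.

```typescript
// Illustrative normalization of two upstream shapes into the common `text` field.
// Upstream types are heavily abbreviated; real responses carry many more fields.
type OpenAIChat = { choices: { message: { content: string } }[] };
type AnthropicMessage = { content: { type: string; text?: string }[] };

const textFromOpenAI = (r: OpenAIChat): string => r.choices[0].message.content;

const textFromAnthropic = (r: AnthropicMessage): string =>
  r.content.filter((b) => b.type === "text").map((b) => b.text).join("");

// Perplexity-style inline markers like [1] are stripped from `text`; the real pipeline
// also maps them back to URLs for the canonical citations[] array (omitted here).
const stripInlineMarkers = (text: string): string => text.replace(/\s*\[\d+\]/g, "").trim();
```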
Latency profile across our fleet, sampled over the last 30 days: cached `mode: quick` median ~1.2 s, p99 ~2.1 s. Uncached `mode: quick` (no web search) median ~3.8 s, p99 ~7.4 s. Full fan-out with `web_search: true` median ~6.5 s, p99 ~14 s. Perplexity's web-grounded path is the long pole. If you need predictability over freshness, leave `cache_bypass` at `false` (the default) and most production workloads will land in the cached band.
Failure modes are bounded: a single provider 5xx returns an `errors[]` entry, never a 500 on the parent call. A timeout is recorded as `error.code: "timeout"` with the elapsed milliseconds. A schema-level mismatch (rare, and usually a provider rolling out a breaking change) is logged, and that provider's slot is omitted from `results[]` rather than silently corrupting the response.
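On the consumer side, that means treating `errors[]` as degraded data rather than a failed request, for example:

```typescript
// Treat errors[] as degraded data, not a failed request (illustrative).
function summarizeOutcome(res: CheckResponse) {
  const answered = res.results.map((r) => r.provider);
  const degraded = res.errors.map((e) => `${e.provider}: ${e.code}`);
  return { answered, degraded };  // e.g. { answered: ["openai", "anthropic", "gemini"], degraded: ["perplexity: timeout"] }
}
```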
When to use a multi-LLM API (and when not to)
Use `/v1/check` when you're building anything that needs cross-provider data: brand-monitoring tools, GEO scoring, AI-visibility dashboards, model-comparison reports, prompt-quality benchmarks, or vendor-risk reviews where 'are we too dependent on one model?' is a real question. The endpoint also fits internal tools: pricing teams running competitive prompts daily, content teams checking which engine cites them, and data teams running regression tests across model upgrades.
Don't use it for single-shot generative work. If you're building a chat product, a summarization pipeline, or a one-LLM agent loop, the OpenAI or Anthropic SDK is cheaper and more featureful. They expose streaming, function calling, and provider-specific knobs we deliberately don't surface. Multi-LLM fan-out is overhead you don't need when only one model's answer matters.
Don't use it as a routing layer either. If your goal is 'pick the cheapest provider for this prompt and return that one answer,' OpenRouter is purpose-built for that. We're optimized for the inverse problem: you want every provider's answer side by side, normalized, with brand and citation extraction layered on top. Different shape, different bill.