Caching

Caching is the difference between paying a cache-hit price (2¢) and paying for a fresh multi-provider fan-out. Here's how it works and how to control it.

How keys are computed

The cache key is a SHA-256 digest of the normalized inputs:

```plaintext
SHA-256(
  normalized_prompt_or_messages
  + providers[] (sorted)
  + params (temperature, max_tokens)
  + web_search
  + country
  + track_brands[] (sorted)
  + cache_scope ('shared' | 'private')
  + system_message
  + model overrides
  + json_schema
)
```

Any change to any of the above produces a different key — a fresh fan-out runs and the result is cached under the new key. Whitespace and case are normalized before hashing.
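The derivation above can be sketched in a few lines. This is an illustrative model, not the production implementation: the exact normalization rules and serialization are internal, so here we assume normalization means lowercasing plus whitespace collapsing, and we serialize the inputs as canonical JSON before hashing. The function name and parameter set are hypothetical.

```python
import hashlib
import json


def cache_key(prompt, providers, params, cache_scope="shared",
              web_search=False, country=None, track_brands=()):
    """Sketch of the cache-key derivation (assumed normalization rules)."""
    # Assumed normalization: lowercase + collapse runs of whitespace.
    normalized = " ".join(prompt.lower().split())
    material = json.dumps({
        "prompt": normalized,
        "providers": sorted(providers),        # provider order is irrelevant
        "params": params,
        "web_search": web_search,
        "country": country,
        "track_brands": sorted(track_brands),  # brand order is irrelevant
        "cache_scope": cache_scope,
    }, sort_keys=True)
    return hashlib.sha256(material.encode()).hexdigest()
```

Under this model, `["openai", "anthropic"]` and `["anthropic", "openai"]` with the same prompt hash to the same key, while changing `temperature` produces a new key and therefore a fresh fan-out.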

Tiers

  • L1 — Cloudflare Cache API (per-colo). Single-digit-ms reads. Served from the same Cloudflare datacenter that handled your request.
  • L2 — Cloudflare KV (global). ~50ms reads. An L2 hit is asynchronously promoted back to L1 so subsequent calls in that colo are L1 hits.
  • L3 — Postgres (durable backup). Fallback when L1+L2 miss but the entry is still within retention. Slower but survives KV eviction.

Which tier served the call is reported in the response body as cache_tier (l1 / l2 / l3 / miss).
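The read path through the three tiers, including the L2-to-L1 promotion, behaves like this toy model (plain dicts standing in for the Cloudflare Cache API, KV, and Postgres; class and method names are illustrative):

```python
class TieredCache:
    """Toy model of the L1/L2/L3 read path described above."""

    def __init__(self):
        self.l1, self.l2, self.l3 = {}, {}, {}

    def get(self, key):
        if key in self.l1:
            return self.l1[key], "l1"
        if key in self.l2:
            # Promote the entry so the next read in this colo is an L1 hit.
            self.l1[key] = self.l2[key]
            return self.l2[key], "l2"
        if key in self.l3:
            return self.l3[key], "l3"
        return None, "miss"
```

A first read of a KV-only entry reports `l2`; the same read repeated in the same colo reports `l1`, because the promotion already happened.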

Shared vs private scope (cache_scope)

Default is "shared" — if another customer (or you) has already issued the same normalized prompt to the same providers with the same params, we return the cached result. The underlying provider response is the same regardless of who asked, so shared caching is privacy-safe.

```json
{
  "providers": ["openai", "anthropic"],
  "prompt": "Best vector databases?",
  "cache_scope": "private"
}
```

Set "cache_scope": "private" to namespace the cache key by your account_id so cached entries are isolated to your account. Useful when your prompts are internally sensitive and you don't want them matching a shared-cache line.
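The isolation works because private scope changes the key material itself. A minimal sketch, assuming the account_id is simply prepended before hashing (the actual mixing scheme is an internal detail):

```python
import hashlib


def scoped_key(base_material: str, cache_scope: str, account_id: str) -> str:
    """Sketch: private scope mixes the caller's account_id into the key
    material, so identical prompts from different accounts never collide."""
    if cache_scope == "private":
        base_material = f"{account_id}\n{base_material}"
    return hashlib.sha256(base_material.encode()).hexdigest()
```

Two accounts issuing the same prompt share a key under `"shared"` but get disjoint keys under `"private"`.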

Reading cache state

Every /v1/ask response includes top-level cached and cache_tier fields:

```json
{
  "cached": true,
  "cache_tier": "l1",
  "providers": [/* … */],
  "usage": { "billable_units": 0, "latency_ms": 12, "cost_cents": 2 }
}
```
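Client code can branch on these two fields, for example to log or surface hit rates. A small helper sketch (the function name is ours, not part of the API):

```python
def describe_cache(resp: dict) -> str:
    """Summarize the cached/cache_tier fields on a /v1/ask response body."""
    if resp.get("cached"):
        return f"served from {resp['cache_tier']} in {resp['usage']['latency_ms']}ms"
    return "fresh fan-out"
```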

Cache hits and billing

Cache hits are billed at a flat 2¢ regardless of how many providers the original request fanned out to. They do not count against your rate limit — only against your credit balance. See Billing for the full price matrix.

Forcing a fresh call

Today the only supported way to guarantee a fresh fan-out is to vary one of the inputs that goes into the cache key (for example, add a nonce to the prompt or flip cache_scope to private). A cache_bypass parameter is on the roadmap.
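The nonce approach looks like this in practice — append a throwaway token so the normalized prompt (and therefore the cache key) can never match an existing entry. The helper name and nonce format are ours:

```python
import uuid


def bypass_prompt(prompt: str) -> str:
    """Append a one-off nonce so the cache key never matches a prior entry.
    Drop this workaround once the planned cache_bypass parameter ships."""
    return f"{prompt}\n[nonce:{uuid.uuid4().hex}]"
```

Note the nonce does reach the providers as part of the prompt, so keep it short and clearly machine-generated.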