Citation canonicalization

Every provider cites sources differently. We normalize them into two layers: raw per-provider citations inside each ProviderResult, and a deduplicated aggregate at the top level of the response.

Per-provider Citation

Found in providers[].citations[]. Each entry is the citation as we received it from that provider, after URL cleanup but before cross-provider aggregation.

Field               Type                Description
url (optional)      string              Canonical URL. Redirect chains resolved, tracking parameters stripped, protocol forced to https.
title (optional)    string | undefined  Page title if the provider returned one.
snippet (optional)  string | undefined  Short excerpt the provider associated with the citation, if any.

AggregatedCitation

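Found in the top-level citations[]. Entries are deduplicated across providers, so answering “how many of the N providers cited this URL?” is an O(1) read of providers_cited.length on the matching entry, with no need to scan each provider's list.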

Field                       Type                                                  Description
canonical_url (optional)    string                                                The canonical URL after normalization. Use this as the stable key.
domains (optional)          string[]                                              All hostnames represented under this canonical entry (e.g. ["docs.supabase.com", "supabase.com"] if different providers pointed at both).
providers_cited (optional)  ('openai' | 'anthropic' | 'gemini' | 'perplexity')[]  Which providers cited this URL. Length = how many of the N providers referenced it.
title (optional)            string | undefined                                    Page title if any provider returned one.
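
As a rough illustration of the lookup described above, the sketch below transcribes the AggregatedCitation fields into a TypeScript interface and indexes the array by canonical_url. The helper names indexByCanonicalUrl and providerCount are hypothetical and not part of the API; they only show one way a client might consume the response.

```typescript
// Shapes transcribed from the field table above. Every field is marked
// optional in the docs, so the helpers tolerate missing values.
type Provider = 'openai' | 'anthropic' | 'gemini' | 'perplexity';

interface AggregatedCitation {
  canonical_url?: string;
  domains?: string[];
  providers_cited?: Provider[];
  title?: string | undefined;
}

// Hypothetical helper: index the top-level citations[] by canonical_url.
// Entries without a canonical_url cannot be keyed and are skipped.
function indexByCanonicalUrl(
  citations: AggregatedCitation[],
): Map<string, AggregatedCitation> {
  const index = new Map<string, AggregatedCitation>();
  for (const citation of citations) {
    if (citation.canonical_url !== undefined) {
      index.set(citation.canonical_url, citation);
    }
  }
  return index;
}

// Hypothetical helper: how many providers cited this canonical URL?
// Returns 0 when the URL is absent or providers_cited is missing.
function providerCount(
  index: Map<string, AggregatedCitation>,
  canonicalUrl: string,
): number {
  return index.get(canonicalUrl)?.providers_cited?.length ?? 0;
}

const sample: AggregatedCitation[] = [
  {
    canonical_url: 'https://supabase.com/pricing',
    domains: ['supabase.com'],
    providers_cited: ['openai', 'anthropic'],
    title: 'Pricing | Supabase',
  },
];

const index = indexByCanonicalUrl(sample);
console.log(providerCount(index, 'https://supabase.com/pricing')); // 2
```

Building the Map is a one-time pass over citations[]; each later lookup is then a constant-time read.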

What we strip

  • Redirect trackers: we follow short-link hosts (e.g. t.co, lnkd.in, Perplexity's redirect) and record the canonical destination URL.
  • Tracking parameters: every utm_* parameter, plus click-ID and affiliate markers such as gclid and fbclid.
  • Tool-generated fragments, such as the #:~:text= text fragments that Perplexity appends. Semantic anchor fragments are kept.
  • Trailing slashes on root URLs, so example.com/ and example.com deduplicate.
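
The string-level rules in this list could be sketched roughly as follows, using the standard URL API available in Node and browsers. This is an illustration under assumptions, not the service's implementation: cleanCitationUrl and TRACKING_PARAMS are hypothetical names, the parameter list contains only the two markers named above, and redirect resolution is left out because it requires a network request.

```typescript
// Assumed list of non-utm tracking parameters. The docs name only gclid and
// fbclid explicitly; a real list would likely be longer.
const TRACKING_PARAMS = new Set(['gclid', 'fbclid']);

// Hypothetical helper illustrating the cleanup rules. It does not follow
// redirects or short links.
function cleanCitationUrl(raw: string): string {
  const url = new URL(raw);

  // Force the protocol to https.
  if (url.protocol === 'http:') url.protocol = 'https:';

  // Collect keys first so searchParams is not mutated while iterating.
  const keys: string[] = [];
  url.searchParams.forEach((_value, key) => {
    keys.push(key);
  });
  for (const key of keys) {
    const lower = key.toLowerCase();
    if (lower.startsWith('utm_') || TRACKING_PARAMS.has(lower)) {
      url.searchParams.delete(key);
    }
  }

  // Drop the text-fragment directive (":~:text=...") but keep any ordinary
  // anchor that precedes it.
  const directive = url.hash.indexOf(':~:');
  if (directive !== -1) {
    const kept = url.hash.slice(0, directive);
    url.hash = kept === '#' ? '' : kept;
  }

  // Strip the trailing slash only when the URL is a bare root.
  const isBareRoot =
    url.pathname === '/' && url.search === '' && url.hash === '';
  return isBareRoot ? url.href.replace(/\/$/, '') : url.href;
}

console.log(cleanCitationUrl('http://example.com/?utm_source=x&gclid=abc'));
// https://example.com
console.log(
  cleanCitationUrl('https://supabase.com/pricing?utm_medium=ai#:~:text=Free%20plan'),
);
// https://supabase.com/pricing
console.log(cleanCitationUrl('https://docs.supabase.com/guides/auth#providers'));
// https://docs.supabase.com/guides/auth#providers
```

Note that deleting a parameter re-serializes the remaining query string, so the encoding of surviving parameters may change slightly (for example, %20 becoming +).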

Example

Two of the three providers in this fan-out cite overlapping URLs. After per-provider cleanup and cross-provider aggregation, the top-level citations[] is:

```json
[
  {
    "canonical_url": "https://supabase.com/pricing",
    "domains": ["supabase.com"],
    "providers_cited": ["openai", "anthropic"],
    "title": "Pricing | Supabase"
  },
  {
    "canonical_url": "https://docs.supabase.com/guides/auth",
    "domains": ["docs.supabase.com"],
    "providers_cited": ["anthropic"],
    "title": "Auth — Supabase Docs"
  }
]
```

The raw per-provider lists are still available under providers[i].citations if you need to audit what each model returned.
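
For auditing, one option is to re-derive an aggregate from the raw per-provider lists and compare it with the top-level citations[]. The sketch below is a minimal version of that idea under stated assumptions: the provider field on ProviderResult is a guessed name (the docs only say each ProviderResult carries citations[]), aggregateCitations is a hypothetical helper, and domains records only the hostname of the keyed URL because the rule for merging several hostnames under one entry is not described here.

```typescript
type Provider = 'openai' | 'anthropic' | 'gemini' | 'perplexity';

interface Citation {
  url?: string;
  title?: string | undefined;
  snippet?: string | undefined;
}

// Assumed shape: only citations[] is documented; `provider` is a guess.
interface ProviderResult {
  provider: Provider;
  citations: Citation[];
}

interface AggregatedCitation {
  canonical_url?: string;
  domains?: string[];
  providers_cited?: Provider[];
  title?: string | undefined;
}

// Hypothetical helper: group already-cleaned per-provider URLs into one
// entry per URL, in first-seen order.
function aggregateCitations(results: ProviderResult[]): AggregatedCitation[] {
  const buckets = new Map<string, { providers: Provider[]; title?: string | undefined }>();

  for (const result of results) {
    for (const citation of result.citations) {
      if (citation.url === undefined) continue; // nothing to key on

      let bucket = buckets.get(citation.url);
      if (bucket === undefined) {
        bucket = { providers: [] };
        buckets.set(citation.url, bucket);
      }
      // Count each provider once, even if it cites the same URL twice.
      if (bucket.providers.indexOf(result.provider) === -1) {
        bucket.providers.push(result.provider);
      }
      // Keep the first title any provider returned.
      if (bucket.title === undefined && citation.title !== undefined) {
        bucket.title = citation.title;
      }
    }
  }

  const aggregated: AggregatedCitation[] = [];
  buckets.forEach((bucket, url) => {
    aggregated.push({
      canonical_url: url,
      domains: [new URL(url).hostname],
      providers_cited: bucket.providers,
      title: bucket.title,
    });
  });
  return aggregated;
}

const rebuilt = aggregateCitations([
  {
    provider: 'openai',
    citations: [{ url: 'https://supabase.com/pricing', title: 'Pricing | Supabase' }],
  },
  {
    provider: 'anthropic',
    citations: [
      { url: 'https://supabase.com/pricing' },
      { url: 'https://docs.supabase.com/guides/auth' },
    ],
  },
  { provider: 'gemini', citations: [] },
]);

console.log(rebuilt.length); // 2
console.log(rebuilt[0].providers_cited); // [ 'openai', 'anthropic' ]
```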