Citation canonicalization
Every provider cites sources differently. We normalize them into two layers: raw per-provider citations inside each ProviderResult, and a deduplicated aggregate at the top level of the response.
Per-provider Citation
Inside providers[].citations[] — each entry is the citation as we received it from that provider, after URL cleanup but before cross-provider aggregation.
| Field | Type | Description |
|---|---|---|
| `url` (optional) | `string` | Canonical URL. Redirect chains resolved, tracking parameters stripped, protocol forced to `https`. |
| `title` (optional) | `string \| undefined` | Page title if the provider returned one. |
| `snippet` (optional) | `string \| undefined` | Short excerpt the provider associated with the citation, if any. |
AggregatedCitation
Top-level `citations[]`, deduplicated across providers so that, once you have an entry, you can answer “how many of the N providers cited this URL?” in O(1) by reading `providers_cited.length`.
| Field | Type | Description |
|---|---|---|
| `canonical_url` (optional) | `string` | The canonical URL after normalization. Use this as the stable key. |
| `domains` (optional) | `string[]` | All hostnames represented under this canonical entry (e.g. `["docs.supabase.com", "supabase.com"]` if different providers pointed at both). |
| `providers_cited` (optional) | `('openai' \| 'anthropic' \| 'gemini' \| 'perplexity')[]` | Which providers cited this URL. Length = how many of the N providers referenced it. |
| `title` (optional) | `string \| undefined` | Page title if any provider returned one. |
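As a rough illustration of using `canonical_url` as the stable key, the sketch below indexes the top-level list once and then reads `providers_cited.length` per lookup. The interface mirrors the table above; the helper names `indexByCanonicalUrl` and `providerCount` are hypothetical and not part of any SDK.

```typescript
// Assumed shape, copied from the AggregatedCitation table; every field is optional.
type ProviderName = "openai" | "anthropic" | "gemini" | "perplexity";

interface AggregatedCitation {
  canonical_url?: string;
  domains?: string[];
  providers_cited?: ProviderName[];
  title?: string;
}

// Hypothetical helper: build a Map keyed by canonical_url so that each later
// lookup is a single hash access. Entries without a canonical_url are skipped.
function indexByCanonicalUrl(
  citations: AggregatedCitation[],
): Map<string, AggregatedCitation> {
  const index = new Map<string, AggregatedCitation>();
  for (const citation of citations) {
    if (citation.canonical_url !== undefined) {
      index.set(citation.canonical_url, citation);
    }
  }
  return index;
}

// Hypothetical helper: number of providers that cited a URL, 0 if it is absent.
function providerCount(
  index: Map<string, AggregatedCitation>,
  canonicalUrl: string,
): number {
  return index.get(canonicalUrl)?.providers_cited?.length ?? 0;
}
```

Building the index costs one pass over `citations[]`; after that, each "how many providers cited this?" question is a constant-time lookup.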
What we strip
- Redirect trackers: we follow short-link hosts (e.g. `t.co`, `lnkd.in`, Perplexity's redirect) and record the canonical destination URL.
- UTM parameters: every `utm_*`, `gclid`, `fbclid`, and similar affiliate markers.
- Tool-generated fragments like Perplexity's `#:~:text=` text fragments. Semantic anchor fragments are kept.
- Trailing slashes on roots so `example.com/` and `example.com` deduplicate.
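The offline parts of these rules can be approximated with the standard `URL` API. This is a minimal sketch, not the service's actual implementation: the function name `cleanUrl` and the `TRACKING_PARAMS` set are hypothetical, only the parameters named above are handled, and redirect resolution is left out because it requires network requests.

```typescript
// Exact-match tracking parameters named in the list above. The "similar
// affiliate markers" are not enumerated in the docs, so none are guessed here.
const TRACKING_PARAMS = new Set(["gclid", "fbclid"]);

// Hypothetical helper: applies the cleanup rules that need no network access.
function cleanUrl(raw: string): string {
  const url = new URL(raw);

  // Force the protocol to https.
  url.protocol = "https:";

  // Collect keys first, then delete, to avoid mutating while iterating.
  const toDelete: string[] = [];
  url.searchParams.forEach((_value, key) => {
    const lower = key.toLowerCase();
    if (lower.startsWith("utm_") || TRACKING_PARAMS.has(lower)) {
      toDelete.push(key);
    }
  });
  for (const key of toDelete) {
    url.searchParams.delete(key);
  }

  // Drop the text-fragment directive (":~:text=...") but keep any semantic
  // anchor that precedes it, e.g. "#install:~:text=foo" becomes "#install".
  const directiveAt = url.hash.indexOf(":~:");
  if (directiveAt !== -1) {
    const kept = url.hash.slice(0, directiveAt);
    url.hash = kept === "#" ? "" : kept;
  }

  // The URL parser always serializes a root path as "/", so strip it when
  // nothing follows, making "example.com/" and "example.com" identical.
  const href = url.href;
  const isBareRoot = url.pathname === "/" && url.search === "" && url.hash === "";
  return isBareRoot ? href.slice(0, -1) : href;
}
```

Note that editing `searchParams` re-serializes the remaining query string, so the encoding of surviving parameters may be normalized as a side effect.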
Example
Two providers cite overlapping URLs. After per-provider cleanup and cross-provider aggregation:
```json
// Top-level citations[] from a 3-provider fan-out
[
  {
    "canonical_url": "https://supabase.com/pricing",
    "domains": ["supabase.com"],
    "providers_cited": ["openai", "anthropic"],
    "title": "Pricing | Supabase"
  },
  {
    "canonical_url": "https://docs.supabase.com/guides/auth",
    "domains": ["docs.supabase.com"],
    "providers_cited": ["anthropic"],
    "title": "Auth — Supabase Docs"
  }
]
```

The raw per-provider lists are still available under `providers[i].citations` if you need to audit what each model returned.
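When auditing, it can help to rebuild an aggregate from the raw lists and compare it with the top-level `citations[]`. The sketch below is one way to do that under several assumptions: the `provider` field on `ProviderResult` and the function name `aggregate` are hypothetical, URLs are assumed to be already cleaned and absolute, and entries are grouped by exact URL, so the cross-hostname merging described for `domains` is not modeled.

```typescript
type ProviderName = "openai" | "anthropic" | "gemini" | "perplexity";

interface Citation {
  url?: string;
  title?: string;
  snippet?: string;
}

// Assumed shape: only `citations` is documented; `provider` is a guess.
interface ProviderResult {
  provider: ProviderName;
  citations: Citation[];
}

interface AggregatedCitation {
  canonical_url?: string;
  domains?: string[];
  providers_cited?: ProviderName[];
  title?: string;
}

// Internal accumulator with required fields so arrays can be pushed to safely.
interface Accumulator {
  canonical_url: string;
  domains: string[];
  providers_cited: ProviderName[];
  title?: string;
}

// Hypothetical helper: group per-provider citations by exact URL.
function aggregate(providers: ProviderResult[]): AggregatedCitation[] {
  const byUrl = new Map<string, Accumulator>();
  for (const result of providers) {
    for (const citation of result.citations) {
      if (citation.url === undefined) continue; // nothing to key on
      let entry = byUrl.get(citation.url);
      if (entry === undefined) {
        entry = {
          canonical_url: citation.url,
          domains: [new URL(citation.url).hostname],
          providers_cited: [],
        };
        byUrl.set(citation.url, entry);
      }
      // Count each provider once, even if it cited the URL several times.
      if (entry.providers_cited.indexOf(result.provider) === -1) {
        entry.providers_cited.push(result.provider);
      }
      // Keep the first title any provider returned.
      if (entry.title === undefined && citation.title !== undefined) {
        entry.title = citation.title;
      }
    }
  }
  // Map preserves insertion order, so entries appear in first-seen order.
  return Array.from(byUrl.values());
}
```

Fed the two raw lists behind the example above (OpenAI citing the pricing page, Anthropic citing both pages), this grouping yields the same two entries with the same `providers_cited` arrays.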