Why wrap Perplexity?
Perplexity's primary advantage is fresh, web-grounded answers with citations. The API is genuinely good. The pain is that consuming it programmatically still requires nontrivial parsing, especially if you also want to compare against ChatGPT or Claude. You end up writing the same citation parser twice.
MentionsAPI handles the citation parsing once, and you get the same shape from Perplexity, ChatGPT (with browsing), Claude (with web search), and Gemini (with grounding). One parser, four providers.
Perplexity is also the most API-faithful UI of the four. Our 1,000-prompt teardown showed only 42% drift between Perplexity API and the live perplexity.ai UI on citations, versus 81% for ChatGPT and 88% for Gemini. That makes Perplexity the cheapest provider to use as a 'what real users see' proxy when you don't want to pay for live UI scrapes on every call.
Pin the Perplexity model explicitly
Our default is `sonar-pro`. Override per call by passing `model: { perplexity: "sonar" }` in the request body. Useful when you want a cheaper/faster tier for bulk crawls. The response shape is identical, so your code doesn't change.
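A minimal request sketch in TypeScript. The endpoint path, auth header, and the `prompt`/`providers` field names are illustrative placeholders rather than the exact request surface; the `model` override is the part that matters:

```ts
// Per-call model override: drop from sonar-pro to sonar for a bulk crawl.
// Endpoint path, auth header, and the prompt/providers field names are
// illustrative placeholders, not the exact API surface.
const res = await fetch("https://api.example.com/v1/answers", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.MENTIONS_API_KEY}`,
  },
  body: JSON.stringify({
    prompt: "best crm for small agencies",
    providers: ["perplexity"],
    model: { perplexity: "sonar" }, // cheaper/faster tier
  }),
});
const data = await res.json();
```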
Pricing is bundled. You don't pay Perplexity's per-token rate on top of our per-call price. One number, no surprises.
Tier selection is a real cost lever for bulk workloads. `sonar-pro` is the higher-quality tier with longer context and better grounding; plain `sonar` is faster and cheaper at the upstream level. For an agency tool running daily monitor passes, mixing tiers (`sonar-pro` for high-priority client prompts, `sonar` for the long tail) is a one-line change that compounds materially against your monthly burn.
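In practice the mix really is one line. A sketch, where the `priority` flag and prompt-store shape are assumptions about your own tooling, not ours:

```ts
type MonitorPrompt = { text: string; priority: "high" | "low" };

// One-line tier choice: sonar-pro for high-priority client prompts,
// plain sonar for the long tail.
const tierFor = (p: MonitorPrompt) =>
  p.priority === "high" ? "sonar-pro" : "sonar";

const requestBody = (p: MonitorPrompt) => ({
  prompt: p.text,
  model: { perplexity: tierFor(p) },
});
```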
Citation tracking across providers
Every response includes a top-level `citations[]` array of canonicalized URLs plus a `providers_cited` list showing which providers referenced each one. Filter by your domains on the client; cross-reference with your prompt library to see which pages Perplexity is actually recommending.
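Client-side filtering is a few lines. A sketch assuming each `citations[]` entry carries a `url` and `domain` alongside `providers_cited`; the exact entry type is a best guess from the fields described here, not an SDK type:

```ts
type Citation = {
  url: string;               // canonicalized URL
  domain: string;
  providers_cited: string[]; // which providers referenced this URL
};

const MY_DOMAINS = new Set(["example.com", "blog.example.com"]);

// Keep only citations pointing at your own properties, and note which
// providers surfaced each one.
function ownCitations(citations: Citation[]) {
  return citations
    .filter((c) => MY_DOMAINS.has(c.domain))
    .map((c) => ({ url: c.url, providers: c.providers_cited }));
}
```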
Perplexity also tends to surface a wider tail of citations than ChatGPT or Claude: typically 6-12 sources per answer where the LLM-only providers cite 2-4, so the canonical URL store grows fastest from Perplexity calls. That's an asset for content-attribution work: more URLs to attribute, more prompt-to-citation pairs, more raw signal for the 'which content is winning?' question.
How the Perplexity wrapper works under the hood
The Perplexity adapter handles three format quirks the raw Sonar response leaves to you. First, inline `[1]`-style markers in the answer text get matched against the `citations` array and rewritten out of `text`, so the canonical answer string is human-readable without footnote noise. Second, the Sonar `citations[]` URLs go through our redirect resolver and tracker-param stripper before landing in the normalized output. `t.co`, `news.google.com`, AMP caches, and `utm_*` parameters all collapse out. Third, every URL gets `domain`, `title`, and `snippet` extracted from the model's response, so downstream code never has to fetch the page to get a label.
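Two of those steps are easy to picture in code. A simplified sketch of the idea, not the production adapter (redirect resolution is omitted here because it needs network calls):

```ts
// Strip inline [n] footnote markers so the answer text reads cleanly.
function stripInlineMarkers(text: string): string {
  return text.replace(/\s*\[\d+\]/g, "");
}

// Drop utm_* tracking params from a citation URL before canonicalizing.
function stripTrackerParams(rawUrl: string): string {
  const url = new URL(rawUrl);
  for (const key of [...url.searchParams.keys()]) {
    if (key.startsWith("utm_")) url.searchParams.delete(key);
  }
  return url.toString();
}

// stripInlineMarkers("Use HubSpot [1] or Pipedrive [2].")
//   -> "Use HubSpot or Pipedrive."
// stripTrackerParams("https://example.com/post?utm_source=pplx&x=1")
//   -> "https://example.com/post?x=1"
```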
Model selection: default is `sonar-pro` for higher-quality grounding, with `sonar` available as the cheaper/faster tier via `model: { perplexity: "sonar" }`. We always pin a specific model version on the upstream call (not 'latest') and surface the resolved `model` field in the response, so regression-testing against a specific Sonar version stays deterministic across model rolls.
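If you regression-test against a pinned Sonar version, assert on that resolved `model` field rather than the alias you requested. A sketch using the `data` object from the request example above; the version string is hypothetical:

```ts
// Fail loudly if the upstream model rolled underneath the baseline.
const PINNED_SONAR = "sonar-pro-2025-01-15"; // hypothetical version string
if (data.model !== PINNED_SONAR) {
  throw new Error(`Sonar rolled: expected ${PINNED_SONAR}, got ${data.model}`);
}
```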
Latency profile for Perplexity-only calls, sampled over 30 days of production traffic: cached p50 ~140 ms / p99 ~600 ms; uncached `sonar-pro` (no `web_search` flag, since Sonar is always grounded) p50 ~3.2 s / p95 ~5.8 s / p99 ~8.4 s. Perplexity is consistently the slowest single provider in a multi-LLM fan-out. That's the tradeoff for freshness; if you can tolerate the latency on a daily monitor run, the citation density is worth the wait.
Edge cases worth knowing: Sonar occasionally returns the same URL twice with slightly different cited titles (a known upstream quirk on long answers). We dedupe on canonical URL and merge titles into a `titles[]` array if they differ, so you never see ghost duplicates in your `citations[]` count. When Sonar's `citations` array is empty (rare; usually a query that triggered a non-grounded response), we surface that as `citations: []` rather than dropping the call, so your downstream code doesn't have to special-case 'no citations' vs 'parsing failed'.
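Conceptually, the dedupe/merge behaves like this (a simplified sketch using the field names above):

```ts
type RawCitation = { url: string; title: string };

// Collapse repeated URLs from one Sonar answer, keeping every distinct
// title in a titles[] array. An empty input stays an empty array.
function dedupeCitations(raw: RawCitation[]) {
  const byUrl = new Map<string, { url: string; titles: string[] }>();
  for (const c of raw) {
    const entry = byUrl.get(c.url);
    if (!entry) {
      byUrl.set(c.url, { url: c.url, titles: [c.title] });
    } else if (!entry.titles.includes(c.title)) {
      entry.titles.push(c.title);
    }
  }
  return [...byUrl.values()];
}
```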
When to use the Perplexity wrapper (and when to call Sonar directly)
Use the wrapper when you need clean, normalized citations and you're comparing Perplexity against other providers. The combination of canonicalized URLs, tracker-stripping, dedupe across providers, and a unified schema is what makes citation-tracking dashboards viable without writing four format parsers. If your tool ever needs to answer 'which providers cited this URL?', the wrapper is the right shape.
Use it specifically when you want bundled per-call pricing instead of upstream per-token billing. Perplexity Sonar's token-based pricing is fine for one-off generative work, but for repeat-query monitoring (the same 50 prompts polled daily), our cache amortizes the cost across customers and the bundled per-call price stays predictable as token rates drift. PAYG with $0.02 cache hits beats per-token billing for every monitoring workload we've measured.
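Back-of-envelope for that monitoring case, assuming (and this is an assumption) that every call after the first pass lands on the cached rate:

```ts
// 50 prompts polled daily at the $0.02 cache-hit price.
// Assumes every call hits cache; first-run and cache-miss calls
// bill at the normal per-call price instead.
const promptsPerDay = 50;
const cachedRate = 0.02; // USD per cache-hit call
const monthlyCost = promptsPerDay * 30 * cachedRate; // $30/month, flat
console.log(`~$${monthlyCost}/month regardless of answer length`);
```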
Skip it if you're building a Perplexity-only chat product and want streaming, system prompts with full Sonar parameters, or the latest provider-specific feature flags the day they ship. Direct Sonar API is leaner and exposes knobs we deliberately don't surface (we optimize for cross-provider normalization, not per-provider feature parity). Skip it also if you only ever consume citations from one provider and never plan to compare across providers. A 30-line custom parser handles single-provider, single-shape Perplexity output cheaply enough.