Recipes

LLM eval suite

Run the same prompt across all 4 supported LLM providers in parallel for $0.02 per call. Cheap enough that a 1,000-prompt eval suite costs $20, and a drop-in replacement for the usual "fan out across providers" boilerplate.

Why it matters

Most LLM eval suites loop through one provider at a time, which is slow and ratchets up cost through per-provider billing. mode:quick fans out to all 4 in parallel and charges a flat 2¢ per call, with tokens bundled in, so there are no per-provider invoice surprises.
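
For contrast, the per-provider boilerplate that a quick-mode call replaces typically looks something like the sketch below. The provider helpers are hypothetical stand-ins, not part of this API; the point is the shape: one key, one client, and one invoice per provider.

typescript
// Hypothetical stand-ins for four separate provider SDK calls, each needing
// its own API key, client setup, rate-limit handling, and billing account.
declare function callProviderA(prompt: string): Promise<string>;
declare function callProviderB(prompt: string): Promise<string>;
declare function callProviderC(prompt: string): Promise<string>;
declare function callProviderD(prompt: string): Promise<string>;

// The "fan out across providers" boilerplate that mode:quick collapses into one request.
async function fanOutManually(prompt: string) {
  const [a, b, c, d] = await Promise.all([
    callProviderA(prompt),
    callProviderB(prompt),
    callProviderC(prompt),
    callProviderD(prompt),
  ]);
  return { providerA: a, providerB: b, providerC: c, providerD: d };
}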

Code

typescript
import fetch from "node-fetch";

const PROMPTS: { id: string; text: string; expected?: string }[] = [
  { id: "geography", text: "What is the capital of Australia?",          expected: "Canberra" },
  { id: "math",      text: "What is 7 * 6?",                             expected: "42" },
  { id: "code",      text: "Write a Python one-liner to reverse a string." },
];

// Sends one prompt through mode:quick, which fans out to every provider in parallel.
// If an expected answer is given, it is passed as brand so providers[].mentioned
// reports whether that string appeared in each provider's answer.
async function eval_(prompt: string, expected?: string) {
  const res = await fetch("https://api.mentionsapi.com/v1/check", {
    method: "POST",
    headers: { Authorization: `Bearer ${process.env.MENTIONSAPI_KEY}`, "Content-Type": "application/json" },
    body: JSON.stringify({ mode: "quick", query: prompt, brand: expected ?? "" }),
  });
  if (!res.ok) throw new Error(`mentionsapi ${res.status}: ${await res.text()}`);
  const { providers } = (await res.json()) as { providers: Record<string, any> };
  // Keep only the fields the eval cares about: the answer text and the mention flag.
  return Object.fromEntries(
    Object.entries(providers).map(([name, p]) => [name, { context: p.context, mentioned: p.mentioned }]),
  );
}

for (const p of PROMPTS) {
  const result = await eval_(p.text, p.expected);
  console.log(`\n--- ${p.id}: ${p.text} ---`);
  console.log(JSON.stringify(result, null, 2));
}
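
For prompts that declare an expected answer, the raw per-provider output can be rolled up into a pass/fail count. A minimal sketch layered on eval_ above; the score helper is an addition for illustration, not part of the API, and it relies only on the mentioned flag returned for the expected string passed as brand.

typescript
// Counts providers whose answer contained the expected string (eval_ passes it as
// brand, so providers[].mentioned doubles as a correctness flag).
function score(results: Record<string, { context: string; mentioned: boolean }>) {
  const names = Object.keys(results);
  const failing = names.filter((name) => !results[name].mentioned);
  return { passed: names.length - failing.length, total: names.length, failing };
}

// Score only the prompts that have an expected answer.
for (const p of PROMPTS.filter((p) => p.expected)) {
  const { passed, total, failing } = score(await eval_(p.text, p.expected));
  console.log(`${p.id}: ${passed}/${total} providers correct` + (failing.length ? ` (missed by ${failing.join(", ")})` : ""));
}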

Tip

For brand-mention evals (does provider X cite my brand?), pass the brand name in brand and read providers[].mentioned directly. For freeform evals, pass any string in brand (or a sentinel) and use providers[].context as the answer text.
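
Both patterns in practice, reusing eval_ from the recipe above (answer wording varies by provider; only the context and mentioned fields shown earlier are assumed):

typescript
// Brand-mention eval: pass the brand (or expected answer) and read the mention flag.
const geo = await eval_("What is the capital of Australia?", "Canberra");
const citing = Object.keys(geo).filter((provider) => geo[provider].mentioned);
console.log(`Mentioned "Canberra": ${citing.join(", ") || "none"}`);

// Freeform eval: no expected string (eval_ sends brand: ""), so grade the raw answer text.
const freeform = await eval_("Write a Python one-liner to reverse a string.");
for (const [provider, { context }] of Object.entries(freeform)) {
  console.log(`${provider}: ${context}`);
}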