{"id":"responsible-ai","relativePath":"responsible-ai.md","title":"Responsible AI","markdown":"# Responsible AI\n\nHow Meta Museum runs LLM features on a public deployment without leaking the API key, draining the budget, or publishing unverifiable content. Every claim here is enforced in code, not policy.\n\n## Threat model\n\nThree distinct risks, handled separately:\n\n| Risk | Mitigation |\n|---|---|\n| **Key exposure** — the API key reaching the browser | Key is read **only server-side**; no `NEXT_PUBLIC_*` key exists. Next.js never ships server env vars to the client. |\n| **Denial of wallet** — anonymous traffic spending the key | The paid model path is **auth-gated**; anonymous requests transparently fall back to local heuristics. |\n| **Unverifiable output** — hallucinated or rights-unsafe content | **Citation/originality gates** + **rights-aware** rendering + **human approval** before any publication. |\n\n## 1. Server-only key handling\n\n`ANTHROPIC_API_KEY` is read only inside server code (`src/services/agents.ts` → `callClaude`). The model call is a server-side `fetch` to the Anthropic Messages API; the key is never serialized into a response or a client bundle. A `grep` for `NEXT_PUBLIC.*KEY` returns nothing.\n\n## 2. Auth-gate (denial-of-wallet guard)\n\nThe token-spending path (`generateContent` → `callClaude`, reached by `/api/content/generate` and `/api/agents/run`) runs the real model **only for authenticated requests**:\n\n- `src/services/ai-model-access.ts` — `modelAccessAuthenticated(request, session)` (a real Auth.js session, or a non-prod test-token) and `gateUseModel(requested, authenticated)`.\n- The routes resolve the session via `auth()` and **downgrade `useModel` to `false` for anonymous callers**, so the request transparently returns local heuristics instead of a model call.\n- Those write routes are *also* role-gated (editor+) by the middleware. 
Result: a random visitor or bot cannot trigger model spend.\n\nRegression-tested in both directions: an anonymous `useModel: true` request with a key present stays `mode: local-heuristics` (the model never fires); an authenticated request reaches the model. See `tests/services/ai-model-access.test.ts` and `tests/api/content-generate.test.ts`.\n\n**Operational floor:** use a dedicated, spend-capped key. Even if every code guard failed, a hard monthly cap in the provider console bounds the worst case.\n\n## 3. Cite or refuse\n\nGenerated content is gated before it can be marked publish-ready:\n\n- **Citation coverage** — every record in scope must be cited (source URL + provider). Below threshold ⇒ `refused`.\n- **Originality** — semantic-novelty score + a unique-insight minimum; template-only output is rejected.\n- **Rights** — output carries `imageReuse` / `textReuse` / overall rights; unknown rights render an explicit \"do not reuse\" state.\n- **Review state** — output is `ready-for-review`, never auto-published. Agents draft; humans approve.\n\nA real production response shows all of these: `citationCoverage: 57/57`, `originalityScore: 0.96`, `rights: \"Rights unknown — do not reuse\"`, `reviewState: ready-for-review`, `refused: false`.\n\n## 4. Eval harness\n\nAI quality is checked against a golden dataset, not vibes — `docs/evals/golden-museum-questions.md` defines questions + a rubric; `pnpm ai:eval:gate` runs faithfulness / relevance / citation-freshness checks and can gate releases against a recorded baseline (`scripts/ai-eval-gate.ts`). The AI-evals dashboard surfaces results at `/ai-evals`.\n\n## 5. Cost control\n\nThe model receives a **compact, capped projection** of the data, not the full record set: the computed analysis plus a representative sample (≤24 records, key fields). The full record set still drives citations and rights separately, so coverage is unchanged. 
This keeps a 57-record brief at a few thousand tokens (on the order of a cent at the same per-token rate) instead of ~160K tokens (~$0.80) — and makes it fit any model's context window. See the compact-prompt logic in `src/services/agents.ts`.\n\n## Provider & model\n\nThe LLM path uses the **Anthropic Claude** Messages API. Default model `claude-opus-4-8`, overridable via `ANTHROPIC_MODEL` (e.g. `claude-haiku-4-5` for a cheaper public demo). `temperature` is intentionally omitted (rejected by Opus-tier models); `stop_reason: \"refusal\"` is handled as a failure that falls back to the local heuristic draft.\n\n## Turning AI on (for an operator)\n\n1. Configure GitHub OAuth (`AUTH_GITHUB_ID` / `AUTH_GITHUB_SECRET`) so sign-in works — required for the auth-gate to grant model access.\n2. Set a dedicated, spend-capped `ANTHROPIC_API_KEY`.\n3. Public visitors get free heuristics; signed-in users get Claude; the provider spend cap bounds the worst case. Full steps: [ops/go-live-checklist.md](ops/go-live-checklist.md).\n","sections":[{"level":2,"heading":"Threat model","anchor":"threat-model"},{"level":2,"heading":"1. Server-only key handling","anchor":"1-server-only-key-handling"},{"level":2,"heading":"2. Auth-gate (denial-of-wallet guard)","anchor":"2-auth-gate-denial-of-wallet-guard"},{"level":2,"heading":"3. Cite or refuse","anchor":"3-cite-or-refuse"},{"level":2,"heading":"4. Eval harness","anchor":"4-eval-harness"},{"level":2,"heading":"5. Cost control","anchor":"5-cost-control"},{"level":2,"heading":"Provider & model","anchor":"provider-model"},{"level":2,"heading":"Turning AI on (for an operator)","anchor":"turning-ai-on-for-an-operator"}],"html":"<h1 id=\"responsible-ai\">Responsible AI</h1>\n<p>How Meta Museum runs LLM features on a public deployment without leaking the API key, draining the budget, or publishing unverifiable content. 
Every claim here is enforced in code, not policy.</p>\n<h2 id=\"threat-model\">Threat model</h2>\n<p>Three distinct risks, handled separately:</p>\n<table>\n<thead>\n<tr><th>Risk</th><th>Mitigation</th></tr>\n</thead>\n<tbody>\n<tr><td><strong>Key exposure</strong> — the API key reaching the browser</td><td>Key is read <strong>only server-side</strong>; no <code>NEXT_PUBLIC_*</code> key exists. Next.js never ships server env vars to the client.</td></tr>\n<tr><td><strong>Denial of wallet</strong> — anonymous traffic spending the key</td><td>The paid model path is <strong>auth-gated</strong>; anonymous requests transparently fall back to local heuristics.</td></tr>\n<tr><td><strong>Unverifiable output</strong> — hallucinated or rights-unsafe content</td><td><strong>Citation/originality gates</strong> + <strong>rights-aware</strong> rendering + <strong>human approval</strong> before any publication.</td></tr>\n</tbody>\n</table>\n<h2 id=\"1-server-only-key-handling\">1. Server-only key handling</h2>\n<p><code>ANTHROPIC_API_KEY</code> is read only inside server code (<code>src/services/agents.ts</code> → <code>callClaude</code>). The model call is a server-side <code>fetch</code> to the Anthropic Messages API; the key is never serialized into a response or a client bundle. A <code>grep</code> for <code>NEXT_PUBLIC.*KEY</code> returns nothing.</p>\n<h2 id=\"2-auth-gate-denial-of-wallet-guard\">2. Auth-gate (denial-of-wallet guard)</h2>\n<p>The token-spending path (<code>generateContent</code> → <code>callClaude</code>, reached by <code>/api/content/generate</code> and <code>/api/agents/run</code>) runs the real model <strong>only for authenticated requests</strong>:</p>\n<ul><li><code>src/services/ai-model-access.ts</code> — <code>modelAccessAuthenticated(request, session)</code> (a real Auth.js session, or a non-prod test-token) and <code>gateUseModel(requested, authenticated)</code>.</li><li>The routes resolve the session via <code>auth()</code> and <strong>downgrade <code>useModel</code> to <code>false</code> for anonymous callers</strong>, so the request transparently returns local heuristics instead of a model call.</li><li>Those write routes are <em>also</em> role-gated (editor+) by the middleware. 
Result: a random visitor or bot cannot trigger model spend.</li></ul>\n<p>Regression-tested in both directions: an anonymous <code>useModel: true</code> request with a key present stays <code>mode: local-heuristics</code> (the model never fires); an authenticated request reaches the model. See <code>tests/services/ai-model-access.test.ts</code> and <code>tests/api/content-generate.test.ts</code>.</p>\n<p><strong>Operational floor:</strong> use a dedicated, spend-capped key. Even if every code guard failed, a hard monthly cap in the provider console bounds the worst case.</p>\n<h2 id=\"3-cite-or-refuse\">3. Cite or refuse</h2>\n<p>Generated content is gated before it can be marked publish-ready:</p>\n<ul><li><strong>Citation coverage</strong> — every record in scope must be cited (source URL + provider). Below threshold ⇒ <code>refused</code>.</li><li><strong>Originality</strong> — semantic-novelty score + a unique-insight minimum; template-only output is rejected.</li><li><strong>Rights</strong> — output carries <code>imageReuse</code> / <code>textReuse</code> / overall rights; unknown rights render an explicit &quot;do not reuse&quot; state.</li><li><strong>Review state</strong> — output is <code>ready-for-review</code>, never auto-published. Agents draft; humans approve.</li></ul>\n<p>A real production response shows all of these: <code>citationCoverage: 57/57</code>, <code>originalityScore: 0.96</code>, <code>rights: &quot;Rights unknown — do not reuse&quot;</code>, <code>reviewState: ready-for-review</code>, <code>refused: false</code>.</p>\n<h2 id=\"4-eval-harness\">4. Eval harness</h2>\n<p>AI quality is checked against a golden dataset, not vibes — <code>docs/evals/golden-museum-questions.md</code> defines questions + a rubric; <code>pnpm ai:eval:gate</code> runs faithfulness / relevance / citation-freshness checks and can gate releases against a recorded baseline (<code>scripts/ai-eval-gate.ts</code>). The AI-evals dashboard surfaces results at <code>/ai-evals</code>.</p>\n<h2 id=\"5-cost-control\">5. 
Cost control</h2>\n<p>The model receives a <strong>compact, capped projection</strong> of the data, not the full record set: the computed analysis plus a representative sample (≤24 records, key fields). The full record set still drives citations and rights separately, so coverage is unchanged. This keeps a 57-record brief at a few thousand tokens (on the order of a cent at the same per-token rate) instead of ~160K tokens (~$0.80) — and makes it fit any model&#39;s context window. See the compact-prompt logic in <code>src/services/agents.ts</code>.</p>\n<h2 id=\"provider-model\">Provider &amp; model</h2>\n<p>The LLM path uses the <strong>Anthropic Claude</strong> Messages API. Default model <code>claude-opus-4-8</code>, overridable via <code>ANTHROPIC_MODEL</code> (e.g. <code>claude-haiku-4-5</code> for a cheaper public demo). <code>temperature</code> is intentionally omitted (rejected by Opus-tier models); <code>stop_reason: &quot;refusal&quot;</code> is handled as a failure that falls back to the local heuristic draft.</p>\n<h2 id=\"turning-ai-on-for-an-operator\">Turning AI on (for an operator)</h2>\n<ol>\n<li>Configure GitHub OAuth (<code>AUTH_GITHUB_ID</code> / <code>AUTH_GITHUB_SECRET</code>) so sign-in works — required for the auth-gate to grant model access.</li>\n<li>Set a dedicated, spend-capped <code>ANTHROPIC_API_KEY</code>.</li>\n<li>Public visitors get free heuristics; signed-in users get Claude; the provider spend cap bounds the worst case. Full steps: <a href=\"ops/go-live-checklist.md\">ops/go-live-checklist.md</a>.</li>\n</ol>","updatedAt":"2018-10-20T01:46:40.000Z","checksum":"f900066508212bfff42cf8565953e4deb456dc00984a32abb46fd9c348fe6bfc","checksumPrefix":"f90006650821","anchorCount":8,"lineCount":59,"rawUrl":"/api/docs/content?path=responsible-ai.md","htmlUrl":"/docs?doc=responsible-ai.md","apiUrl":"/api/docs/content?path=responsible-ai.md"}