Responsible AI
How Meta Museum runs LLM features on a public deployment without leaking the API key, draining the budget, or publishing unverifiable content. Every claim here is enforced in code, not policy.
Threat model
Three distinct risks, handled separately:
| Risk | Mitigation |
|---|---|
| Key exposure — the API key reaching the browser | Key is read only server-side; no `NEXT_PUBLIC_*` key exists. Next.js never ships server env vars to the client. |
| Denial of wallet — anonymous traffic spending the key | The paid model path is auth-gated; anonymous requests transparently fall back to local heuristics. |
| Unverifiable output — hallucinated or rights-unsafe content | Citation/originality gates + rights-aware rendering + human approval before any publication. |
1. Server-only key handling
`ANTHROPIC_API_KEY` is read only inside server code (`src/services/agents.ts` → `callClaude`). The model call is a server-side `fetch` to the Anthropic Messages API; the key is never serialized into a response or a client bundle. A `grep` for `NEXT_PUBLIC.*KEY` returns nothing.
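As a rough illustration of the pattern described above, the sketch below builds the Messages API request on the server and returns only the draft text to the browser. The function names, the `max_tokens` value, and the payload shape are assumptions made for this example; they are not taken from `src/services/agents.ts`.

```typescript
// Hypothetical sketch: names are illustrative, not the real callClaude code.
interface MessagesRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the server-side request. The key is passed in from the server
// environment and only ever lands in a request header.
function buildMessagesRequest(apiKey: string, model: string, prompt: string): MessagesRequest {
  if (!apiKey) throw new Error("ANTHROPIC_API_KEY is not set");
  return {
    url: "https://api.anthropic.com/v1/messages",
    method: "POST",
    headers: {
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
      "content-type": "application/json",
    },
    body: JSON.stringify({
      model,
      max_tokens: 1024, // assumed value for the sketch
      messages: [{ role: "user", content: prompt }],
    }),
  };
}

// What a route would hand back to the browser: draft text only,
// never the request object, the headers, or the key.
function toClientPayload(draft: string): { draft: string } {
  return { draft };
}
```

Keeping the key out of the JSON body and out of the client payload means a leak would require serializing the headers themselves, which nothing in this shape does.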
2. Auth-gate (denial-of-wallet guard)
The token-spending path (`generateContent` → `callClaude`, reached by `/api/content/generate` and `/api/agents/run`) runs the real model only for authenticated requests:
- `src/services/ai-model-access.ts` — `modelAccessAuthenticated(request, session)` (a real Auth.js session, or a non-prod test-token) and `gateUseModel(requested, authenticated)`.
- The routes resolve the session via `auth()` and downgrade `useModel` to `false` for anonymous callers, so the request transparently returns local heuristics instead of a model call.
- Those write routes are also role-gated (editor+) by the middleware. Result: a random visitor or bot cannot trigger model spend.
Regression-tested in both directions: an anonymous `useModel: true` request with a key present stays `mode: local-heuristics` (the model never fires); an authenticated request reaches the model. See `tests/services/ai-model-access.test.ts` and `tests/api/content-generate.test.ts`.
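The gate itself can be very small. The sketch below assumes `gateUseModel` is a plain boolean AND of the two inputs; only the function name and its parameter order come from the text above, while `resolveMode` and the `"model"` mode label are hypothetical.

```typescript
// Assumed body: the model runs only if it was requested AND the caller is authenticated.
function gateUseModel(requested: boolean, authenticated: boolean): boolean {
  return requested && authenticated;
}

type Mode = "model" | "local-heuristics";

// Hypothetical helper showing how a route might pick the response mode.
function resolveMode(requested: boolean, authenticated: boolean, keyPresent: boolean): Mode {
  const useModel = gateUseModel(requested, authenticated) && keyPresent;
  return useModel ? "model" : "local-heuristics";
}
```

The anonymous case is a downgrade rather than an error, so a bot sending `useModel: true` still gets a valid response and never learns that a paid path exists behind it.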
Operational floor: use a dedicated, spend-capped key. Even if every code guard failed, a hard monthly cap in the provider console bounds the worst case.
3. Cite or refuse
Generated content is gated before it can be marked publish-ready:
- Citation coverage — every record in scope must be cited (source URL + provider). Below threshold ⇒ `refused`.
- Originality — semantic-novelty score + a unique-insight minimum; template-only output is rejected.
- Rights — output carries `imageReuse` / `textReuse` / overall rights; unknown rights render an explicit "do not reuse" state.
- Review state — output is `ready-for-review`, never auto-published. Agents draft; humans approve.
A real production response shows all of these: `citationCoverage: 57/57`, `originalityScore: 0.96`, `rights: "Rights unknown — do not reuse"`, `reviewState: ready-for-review`, `refused: false`.
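One way to picture the citation and originality gates is a single function that counts cited records and refuses when either check falls short. This is a minimal sketch under stated assumptions: the record shape, the default thresholds, the treatment of an empty record set as a refusal, and the `null` review state on refusal are all invented for illustration.

```typescript
// Hypothetical shapes and thresholds; the real gate may differ.
interface SourceRecord {
  id: string;
  sourceUrl?: string;
  provider?: string;
}

interface GateResult {
  citationCoverage: string; // formatted as "cited/total"
  refused: boolean;
  reviewState: "ready-for-review" | null;
}

function citeOrRefuse(
  records: SourceRecord[],
  originalityScore: number,
  minCoverage = 1.0, // assumed: every record in scope must be cited
  minOriginality = 0.5, // assumed threshold
): GateResult {
  // A record counts as cited only when both source URL and provider are present.
  const cited = records.filter((r) => Boolean(r.sourceUrl) && Boolean(r.provider)).length;
  // Assumption: with nothing to cite, coverage is 0 and the draft is refused.
  const coverage = records.length === 0 ? 0 : cited / records.length;
  const refused = coverage < minCoverage || originalityScore < minOriginality;
  return {
    citationCoverage: `${cited}/${records.length}`,
    refused,
    // Never "published": a passing draft only becomes eligible for human review.
    reviewState: refused ? null : "ready-for-review",
  };
}
```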
4. Eval harness
AI quality is checked against a golden dataset, not vibes — `docs/evals/golden-museum-questions.md` defines questions + a rubric; `pnpm ai:eval:gate` runs faithfulness / relevance / citation-freshness checks and can gate releases against a recorded baseline (`scripts/ai-eval-gate.ts`). The AI-evals dashboard surfaces results at `/ai-evals`.
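Gating against a recorded baseline can be sketched as a per-metric comparison with a small tolerance. The metric keys, the tolerance value, and the function name below are assumptions for illustration; they do not describe what `scripts/ai-eval-gate.ts` actually does.

```typescript
type Scores = Record<string, number>;

// Hypothetical gate: fail if any baseline metric is missing or drops by more than `tolerance`.
function evalGate(
  current: Scores,
  baseline: Scores,
  tolerance = 0.02, // assumed allowance for run-to-run noise
): { pass: boolean; regressions: string[] } {
  const regressions = Object.keys(baseline).filter((metric) => {
    const now = current[metric];
    return now === undefined || now < baseline[metric] - tolerance;
  });
  return { pass: regressions.length === 0, regressions };
}
```

Returning the list of regressed metrics, rather than a bare boolean, lets a CI log say which check blocked the release.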
5. Cost control
The model receives a compact, capped projection of the data, not the full record set: the computed analysis plus a representative sample (≤24 records, key fields). The full record set still drives citations and rights separately, so coverage is unchanged. This keeps a 57-record brief at a few thousand tokens (fractions of a cent) instead of ~160K tokens (~$0.80) — and makes it fit any model's context window. See the compact-prompt logic in `src/services/agents.ts`.
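The capped projection and the cost arithmetic can be sketched as follows. The evenly spaced sampling strategy, the record fields, and the function names are assumptions. The rate of $5 per million input tokens is not a quoted price; it is simply what the ~$0.80 for ~160K tokens figure above implies.

```typescript
// Hypothetical record shapes; the real key fields may differ.
interface FullRecord {
  id: string;
  title: string;
  sourceUrl: string;
  description: string; // stands in for the bulky fields the prompt leaves out
}

interface CompactRecord {
  id: string;
  title: string;
  sourceUrl: string;
}

// Picks at most `cap` records, evenly spaced across the input (assumed strategy).
function sampleRecords<T>(records: T[], cap = 24): T[] {
  if (records.length <= cap) return records.slice();
  const out: T[] = [];
  for (let i = 0; i < cap; i++) {
    out.push(records[Math.floor((i * records.length) / cap)]);
  }
  return out;
}

// Keeps only the key fields for the prompt.
function toCompact(r: FullRecord): CompactRecord {
  return { id: r.id, title: r.title, sourceUrl: r.sourceUrl };
}

// Input cost in USD at a given price per million tokens.
function estimateCostUsd(tokens: number, usdPerMillionTokens: number): number {
  return (tokens * usdPerMillionTokens) / 1_000_000;
}
```

Under that implied rate, `estimateCostUsd(160_000, 5)` reproduces the ~$0.80 figure, and the same function can be used to check the cost of whatever token count the compact prompt actually produces.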
Provider & model
The LLM path uses the Anthropic Claude Messages API. Default model `claude-opus-4-8`, overridable via `ANTHROPIC_MODEL` (e.g. `claude-haiku-4-5` for a cheaper public demo). `temperature` is intentionally omitted (rejected by Opus-tier models); `stop_reason: "refusal"` is handled as a failure that falls back to the local heuristic draft.
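The model override, the omitted `temperature`, and the refusal fallback could look roughly like the sketch below. Function names, the `max_tokens` value, and the extra fallback on empty text are assumptions for this example; the default model string and the `stop_reason: "refusal"` handling come from the text above.

```typescript
const DEFAULT_MODEL = "claude-opus-4-8"; // default named in the text above

// Uses the ANTHROPIC_MODEL override when it is set to a non-empty value.
function resolveModel(override?: string): string {
  return override && override.trim() !== "" ? override : DEFAULT_MODEL;
}

// Request body with no `temperature` field at all.
function buildBody(model: string, prompt: string) {
  return {
    model,
    max_tokens: 1024, // assumed value for the sketch
    messages: [{ role: "user", content: prompt }],
  };
}

// Minimal subset of a Messages API response used by this sketch.
interface MessagesResponse {
  stop_reason: string | null;
  content: Array<{ type: string; text?: string }>;
}

// Treats a refusal as a failure and falls back to the local draft.
// Falling back on empty text as well is an extra assumption of this sketch.
function draftFromResponse(
  res: MessagesResponse,
  localDraft: string,
): { mode: "model" | "local-heuristics"; text: string } {
  const text = res.content
    .filter((block) => block.type === "text")
    .map((block) => block.text ?? "")
    .join("");
  if (res.stop_reason === "refusal" || text === "") {
    return { mode: "local-heuristics", text: localDraft };
  }
  return { mode: "model", text };
}
```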
Turning AI on (for an operator)
- Configure GitHub OAuth (`AUTH_GITHUB_ID` / `AUTH_GITHUB_SECRET`) so sign-in works — required for the auth-gate to grant model access.
- Set a dedicated, spend-capped `ANTHROPIC_API_KEY`.
- Public visitors get free heuristics; signed-in users get Claude; the provider cap bounds worst-case spend. Full steps: [ops/go-live-checklist.md](ops/go-live-checklist.md).