Current Document: Golden Eval Dataset: Complex Museum Questions

Source updated 10/20/2018, 1:46:40 AM · SHA-256 2876a2b5e78d · 168 lines

Canonical ID: evals/golden-museum-questions

JSON for this doc: `/api/docs/content?path=evals/golden-museum-questions.md`

Human link: `/docs?doc=evals%2Fgolden-museum-questions.md`

Canonical API endpoint: `/api/docs/content?path=evals%2Fgolden-museum-questions.md`

Sections (stable anchors):

`scope`: Scope (H2)
`prompt-contract`: Prompt contract (H2)
`reliability-rubric`: Reliability rubric (H2)
`conformance-enforcement`: Conformance enforcement (H2)
`ci-gate-ai-layer-prs`: CI gate (AI-layer PRs) (H2)
`regression-thresholds-fail-fast-citation-drift`: Regression thresholds + fail-fast citation drift (H2)

Golden Eval Dataset: Complex Museum Questions

This document defines the golden dataset used for AI/LLM reliability checks on grounding and citation behavior.

Dataset file:

`evals/golden-museum-questions.v1.json`

Scope

Minimum prompt count: `100` (current: `120`)
Focus: complex museum/research questions where poor grounding carries high risk
Required behavior: `cite-or-refuse`

Prompt contract

Each prompt row includes:

stable `id` (`MM-EVAL-xxxx`)
`domain` and `difficulty`
full natural-language `prompt`
`expectedBehavior`, which specifies grounding requirements, per-claim citation requirements, refusal behavior when evidence is missing, and explicit failure behaviors (`mustNot`)
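As an illustration only, the prompt contract can be expressed as a TypeScript type plus a per-row check. The sub-field names inside `expectedBehavior` (`grounding`, `citationRequirements`, `refusal`) and the reading of `xxxx` as exactly four digits are assumptions; this is a sketch, not the schema used by the dataset file or its test.

```typescript
// Hypothetical sub-field names inside expectedBehavior (assumption).
interface ExpectedBehavior {
  grounding: string[];            // grounding requirements
  citationRequirements: string[]; // per-claim citation requirements
  refusal: string;                // behavior when evidence is missing
  mustNot: string[];              // explicit failure behaviors
}

interface PromptRow {
  id: string; // MM-EVAL-xxxx
  domain: string;
  difficulty: string;
  prompt: string;
  expectedBehavior: ExpectedBehavior;
}

// Assumes "xxxx" means exactly four digits.
const PROMPT_ID_PATTERN = /^MM-EVAL-\d{4}$/;

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.trim().length > 0;
}

function isNonEmptyStringList(value: unknown): value is string[] {
  return Array.isArray(value) && value.length > 0 && value.every(isNonEmptyString);
}

// Returns a list of contract violations; an empty list means the row is valid.
function validatePromptRow(row: unknown): string[] {
  if (typeof row !== "object" || row === null) return ["row is not an object"];
  const r = row as Partial<Record<keyof PromptRow, unknown>>;
  const problems: string[] = [];
  const id = r.id;
  if (!isNonEmptyString(id) || !PROMPT_ID_PATTERN.test(id)) problems.push("id must match MM-EVAL-xxxx");
  if (!isNonEmptyString(r.domain)) problems.push("domain is required");
  if (!isNonEmptyString(r.difficulty)) problems.push("difficulty is required");
  if (!isNonEmptyString(r.prompt)) problems.push("prompt is required");
  const eb = r.expectedBehavior;
  if (typeof eb !== "object" || eb === null) return [...problems, "expectedBehavior is required"];
  const e = eb as Partial<Record<keyof ExpectedBehavior, unknown>>;
  if (!isNonEmptyStringList(e.grounding)) problems.push("expectedBehavior.grounding is required");
  if (!isNonEmptyStringList(e.citationRequirements)) problems.push("expectedBehavior.citationRequirements is required");
  if (!isNonEmptyString(e.refusal)) problems.push("expectedBehavior.refusal is required");
  if (!isNonEmptyStringList(e.mustNot)) problems.push("expectedBehavior.mustNot is required");
  return problems;
}
```

Returning a list of problems rather than a boolean lets a test report every missing field for a row in one pass.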

Citation expectations are strict:

per-claim citations
entity identifier references
property-path scoped references
source URL and retrieval timestamp
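A minimal sketch of per-claim citation checking follows, assuming each citation is a flat record with hypothetical fields `claimId`, `entityId`, `propertyPath`, `sourceUrl`, and `retrievedAt` (ISO 8601). The URL check is a deliberately loose `http(s)` pattern, not a full URL parser.

```typescript
// Hypothetical citation record; field names are assumptions.
interface Citation {
  claimId: string;      // which claim this citation supports
  entityId: string;     // entity identifier reference
  propertyPath: string; // property-path scoped reference
  sourceUrl: string;    // source URL
  retrievedAt: string;  // retrieval timestamp, ISO 8601
}

// Loose check: http(s) scheme followed by at least one non-space character.
const HTTP_URL = /^https?:\/\/\S+$/;

function citationProblems(c: Citation): string[] {
  const problems: string[] = [];
  if (c.claimId.trim() === "") problems.push("claimId is empty");
  if (c.entityId.trim() === "") problems.push("entityId is empty");
  if (c.propertyPath.trim() === "") problems.push("propertyPath is empty");
  if (!HTTP_URL.test(c.sourceUrl)) problems.push("sourceUrl is not an http(s) URL");
  if (Number.isNaN(Date.parse(c.retrievedAt))) problems.push("retrievedAt is not a timestamp");
  return problems;
}

// Claims with no well-formed citation (candidates for refusal under cite-or-refuse).
function uncitedClaims(claimIds: string[], citations: Citation[]): string[] {
  const cited = new Set(
    citations.filter((c) => citationProblems(c).length === 0).map((c) => c.claimId),
  );
  return claimIds.filter((id) => !cited.has(id));
}
```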

Reliability rubric

Dataset-level thresholds:

`groundedPrecisionThreshold = 0.95`
`citationCoverageThreshold = 0.95`
`citationFreshnessThreshold = 0.95`
refusal policy = `cite-or-refuse`
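The rubric names three ratios but not their formulas. The sketch below assumes grounded precision = grounded claims / all claims, citation coverage = claims with at least one citation / all claims, and citation freshness = citations no older than a hypothetical `maxAgeDays` / all citations; each result is compared against the 0.95 floor.

```typescript
// Hypothetical per-claim judgement produced by an eval run.
interface ClaimResult {
  grounded: boolean;          // supported by the cited evidence
  citationAgesDays: number[]; // age of each attached citation, in days
}

const RUBRIC_THRESHOLDS = {
  groundedPrecision: 0.95,
  citationCoverage: 0.95,
  citationFreshness: 0.95,
};

type RubricMetric = keyof typeof RUBRIC_THRESHOLDS;

// An empty denominator counts as vacuously passing (assumption).
function ratio(numerator: number, denominator: number): number {
  return denominator === 0 ? 1 : numerator / denominator;
}

function computeRubric(claims: ClaimResult[], maxAgeDays: number): Record<RubricMetric, number> {
  const ages = claims.reduce<number[]>((acc, c) => acc.concat(c.citationAgesDays), []);
  return {
    groundedPrecision: ratio(claims.filter((c) => c.grounded).length, claims.length),
    citationCoverage: ratio(claims.filter((c) => c.citationAgesDays.length > 0).length, claims.length),
    citationFreshness: ratio(ages.filter((a) => a <= maxAgeDays).length, ages.length),
  };
}

// Metrics that fall below their 0.95 floor.
function failingRubricMetrics(metrics: Record<RubricMetric, number>): RubricMetric[] {
  return (Object.keys(RUBRIC_THRESHOLDS) as RubricMetric[]).filter(
    (m) => metrics[m] < RUBRIC_THRESHOLDS[m],
  );
}
```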

Conformance enforcement

Executable quality test:

`tests/quality/ai-eval-golden-dataset.test.ts`

The test enforces:

prompt minimum (`>=100`)
prompt ID uniqueness
domain diversity
required grounding/citation/refusal fields for every prompt row
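The dataset-level checks can be sketched as a single function. This is not the contents of `tests/quality/ai-eval-golden-dataset.test.ts`; the `minDomains` value used for domain diversity is a hypothetical placeholder, since the required number of domains is not stated here.

```typescript
// Only the fields needed for dataset-level checks.
interface DatasetRow {
  id: string;
  domain: string;
}

interface DatasetReport {
  ok: boolean;
  problems: string[];
}

// minDomains is a hypothetical diversity threshold, not a documented value.
function checkDataset(rows: DatasetRow[], minPrompts = 100, minDomains = 5): DatasetReport {
  const problems: string[] = [];
  if (rows.length < minPrompts) {
    problems.push(`expected >= ${minPrompts} prompts, found ${rows.length}`);
  }
  // Collect every id that appears more than once.
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const row of rows) {
    if (seen.has(row.id)) duplicates.add(row.id);
    seen.add(row.id);
  }
  if (duplicates.size > 0) {
    problems.push(`duplicate ids: ${Array.from(duplicates).join(", ")}`);
  }
  const domains = new Set(rows.map((r) => r.domain));
  if (domains.size < minDomains) {
    problems.push(`expected >= ${minDomains} domains, found ${domains.size}`);
  }
  return { ok: problems.length === 0, problems };
}
```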

CI gate (AI-layer PRs)

Eval harness:

`scripts/ai-eval-gate.ts`
`src/services/ai-eval-harness.ts`

Commands:

`pnpm ai:eval:report` (writes the artifact without failing)
`pnpm ai:eval:gate` (fails when thresholds are not met or regressions are detected)
`pnpm ai:eval:baseline:record` (records or updates the baseline for the current dataset/model/prompt identity)
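One way to read the difference between the report and gate commands is as an exit-code rule. The sketch below assumes the harness collects threshold and regression failures separately; `EvalOutcome` and `exitCodeFor` are hypothetical names, not the API of `scripts/ai-eval-gate.ts`.

```typescript
type EvalMode = "report" | "gate";

// Hypothetical outcome shape collected by a harness run.
interface EvalOutcome {
  thresholdFailures: string[];  // absolute floors not met
  regressionFailures: string[]; // drift against the recorded baseline
}

function exitCodeFor(mode: EvalMode, outcome: EvalOutcome): number {
  // report: write the artifact and exit 0 regardless of results.
  if (mode === "report") return 0;
  // gate: any threshold or regression failure fails the job.
  const failed = outcome.thresholdFailures.length + outcome.regressionFailures.length > 0;
  return failed ? 1 : 0;
}
```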

Workflow:

`.github/workflows/ai-eval-gate.yml`

Gate metrics:

faithfulness
relevance
citation accuracy
citation freshness

Output artifacts:

`artifacts/evals/ai-eval-gate-latest.json`
`artifacts/evals/runs/ai-eval-gate-<timestamp>.json`
`artifacts/evals/trend-index.json`
`artifacts/evals/summary.md`

Summary and API contract:

`artifacts/evals/summary.md` includes review priority when at least two retained runs exist.
`/api/ai-evals/summary` exposes the same review priority and severity history as JSON for agents.
`/api/openapi` references `AiEvalSummaryResponse` for `/api/ai-evals/summary`, and route tests parse payloads with `aiEvalSummaryResponseSchema`.
`/api/ai-evals/summary` returns `schemaVersion: 1`; agent contract tests must reject any unsupported version before the payload is consumed.

Schema migration notes:

`schemaVersion: 1` is the current agent contract for `/api/ai-evals/summary`.
`schemaVersion: 2` is reserved for future additive agent fields or review-priority metadata changes that cannot be represented safely in v1.
Do not accept v2 payloads until a v2 parser, v2 fixture, OpenAPI schema update, and migration notes land in the same RSI cycle.
Fixture compatibility tests keep `tests/fixtures/ai-eval-summary/schema-v1-ready.json` accepted and `schema-v2-planned.json` rejected until that migration exists.
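A consumer-side guard for the v1 contract might look like the following. It is a hypothetical sketch that only checks `schemaVersion` before handing the payload on; it does not replace validation with `aiEvalSummaryResponseSchema`, and `parseAiEvalSummary` is an illustrative name.

```typescript
// Only v1 is accepted until the v2 parser, fixture, OpenAPI update, and notes land.
const SUPPORTED_SCHEMA_VERSIONS: ReadonlySet<number> = new Set([1]);

// Hypothetical minimal view of a v1 payload; other fields are left unchecked here.
interface AiEvalSummaryV1 {
  schemaVersion: 1;
  [key: string]: unknown;
}

function parseAiEvalSummary(payload: unknown): AiEvalSummaryV1 {
  if (typeof payload !== "object" || payload === null) {
    throw new Error("AI eval summary must be a JSON object");
  }
  const version = (payload as { schemaVersion?: unknown }).schemaVersion;
  if (typeof version !== "number" || !SUPPORTED_SCHEMA_VERSIONS.has(version)) {
    // Reject instead of guessing, so a planned v2 payload is never half-read as v1.
    throw new Error(`Unsupported AI eval summary schemaVersion: ${String(version)}`);
  }
  return payload as AiEvalSummaryV1;
}
```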

CI summary behavior:

`.github/workflows/ai-eval-gate.yml` appends `artifacts/evals/summary.md` to `$GITHUB_STEP_SUMMARY`.
The same workflow uploads `artifacts/evals/` as `ai-eval-artifacts`.
Summary badges include status, faithfulness, relevance, citation accuracy, `citationFreshness`, and pass rate.
Freshness-aging alerts warn when `citationFreshness` stays flat/green while the oldest cited evidence approaches the max policy window.
Review priority summarizes the latest-vs-previous severity (`regression`, `watch`, `stable`, or `improved`) directly in the CI summary.
Malformed `diffSeverityPolicy` values fail validation instead of silently falling back.
CI logs emit a GitHub warning annotation when review priority is `watch` or `regression`.

CI annotation examples:


::warning title=AI eval regression::Review priority is regression. current-run vs previous-run. Review severity%3A regression.
::warning title=AI eval watch::Review priority is watch. current-run vs previous-run. Review severity%3A watch.
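The two example lines can be reproduced with a small formatter, assuming `current-run` and `previous-run` are the default run labels. The sketch escapes `%`, CR, LF, and `:` in the message because the examples show `:` rendered as `%3A`; `annotationFor` is a hypothetical helper, not the workflow's actual code.

```typescript
type ReviewPriority = "regression" | "watch" | "stable" | "improved";

// Mirrors the examples, which show ":" in the message escaped as %3A.
function escapeMessage(message: string): string {
  return message
    .replace(/%/g, "%25") // escape % first so later escapes are not double-encoded
    .replace(/\r/g, "%0D")
    .replace(/\n/g, "%0A")
    .replace(/:/g, "%3A");
}

function annotationFor(
  priority: ReviewPriority,
  currentLabel = "current-run",
  previousLabel = "previous-run",
): string | null {
  // Only watch and regression produce a warning annotation.
  if (priority !== "regression" && priority !== "watch") return null;
  const message = `Review priority is ${priority}. ${currentLabel} vs ${previousLabel}. Review severity: ${priority}.`;
  return `::warning title=AI eval ${priority}::${escapeMessage(message)}`;
}
```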

Dashboard:

`/ai-evals` is the read-only local artifact dashboard for humans and AI-assisted operators.
It reads git-ignored local `artifacts/evals/*` files when present and shows an explicit, non-failing empty state when they are absent.
It surfaces latest metrics, run identity, `citationFreshness`, freshness-aging pressure, active warnings, trend history, and artifact timestamps.
It includes a latest-vs-previous diff for retained runs with metric deltas, freshness-aging pressure movement, and “what changed” notes for fast review.
It labels latest-vs-previous diffs as `regression`, `watch`, `stable`, or `improved` using explicit metric-drop and freshness-aging thresholds.
It reads those thresholds from `config/ai-eval-regression-policy.json` and shows policy-window severity distribution counts.
It shows a compact severity sparkline/history where `!` = regression, `~` = watch, `=` = stable, and `+` = improved.
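The severity labels and sparkline can be sketched as follows. The policy fields (`watchDrop`, `regressionDrop`, `improvedGain`, `agingWatchRatio`) and the rule of classifying on the worst metric drop are assumptions; the real thresholds and schema live in `config/ai-eval-regression-policy.json`. Only the symbol mapping is taken directly from the text.

```typescript
type Severity = "regression" | "watch" | "stable" | "improved";

// Hypothetical policy shape; real values come from config/ai-eval-regression-policy.json.
interface SeverityPolicy {
  watchDrop: number;       // a drop at least this large is at least "watch"
  regressionDrop: number;  // a drop at least this large is "regression"
  improvedGain: number;    // a gain at least this large, with no watch-level drop, is "improved"
  agingWatchRatio: number; // oldest-evidence age ratio that forces at least "watch"
}

function classifyDiff(
  previous: Record<string, number>,
  latest: Record<string, number>,
  oldestAgeRatio: number,
  policy: SeverityPolicy,
): Severity {
  // Deltas for metrics present in both runs; positive means improvement.
  const deltas = Object.keys(latest)
    .filter((k) => k in previous)
    .map((k) => latest[k] - previous[k]);
  const worstDrop = -Math.min(0, ...deltas);
  const bestGain = Math.max(0, ...deltas);
  if (worstDrop >= policy.regressionDrop) return "regression";
  if (worstDrop >= policy.watchDrop || oldestAgeRatio >= policy.agingWatchRatio) return "watch";
  if (bestGain >= policy.improvedGain) return "improved";
  return "stable";
}

// Symbol mapping from the dashboard description.
const SEVERITY_SYMBOL: Record<Severity, string> = {
  regression: "!",
  watch: "~",
  stable: "=",
  improved: "+",
};

function severitySparkline(history: Severity[]): string {
  return history.map((s) => SEVERITY_SYMBOL[s]).join("");
}
```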

Retention controls:

`METAMUSEUM_EVAL_RETENTION_MAX_RUNS` (default `200`)
optional CLI override: `--retention-max-runs=<n>`
`severityHistoryPolicy.maxComparisons` in `config/ai-eval-regression-policy.json` controls how many latest-vs-previous severity comparisons are shown to humans and agents.
Orphaned JSON files under `artifacts/evals/runs/` are pruned when they fall outside the retained trend index; non-JSON operator notes are preserved.
Retention pruning reports `delete` or `dry-run` mode in `artifacts/evals/summary.md`, including retained, orphaned, deleted, and preserved-file counts.
The latest retention pruning report is also exposed through `/api/ai-evals/summary` as `summary.retentionPruneReport` for agent consumers.
Generated CI summary markdown is snapshot-locked by `tests/fixtures/ai-eval-summary/summary-snapshot.md`.
`/api/openapi` includes the AI eval summary schema migration compatibility table so agents can discover current and planned summary versions.
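A sketch of the pruning plan, assuming the trend index lists run file names oldest first and that the runs directory listing is passed in as plain file names. It computes the retained, orphaned, deleted, and preserved sets reported in the summary; `planPrune` is hypothetical and performs no file I/O.

```typescript
type PruneMode = "delete" | "dry-run";

interface PrunePlan {
  mode: PruneMode;
  retained: string[];  // newest maxRuns entries of the trend index
  orphaned: string[];  // JSON run files outside the retained set
  deleted: string[];   // equals orphaned in delete mode, empty in dry-run
  preserved: string[]; // non-JSON operator notes, never pruned
}

// runIndex: run file names in trend-index order, oldest first (assumption).
// files: everything currently present under artifacts/evals/runs/.
function planPrune(runIndex: string[], files: string[], maxRuns: number, mode: PruneMode): PrunePlan {
  const retained = runIndex.slice(Math.max(0, runIndex.length - maxRuns));
  const keep = new Set(retained);
  const jsonFiles = files.filter((f) => f.endsWith(".json"));
  const preserved = files.filter((f) => !f.endsWith(".json"));
  const orphaned = jsonFiles.filter((f) => !keep.has(f));
  return { mode, retained, orphaned, deleted: mode === "delete" ? orphaned : [], preserved };
}
```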

Regression thresholds + fail-fast citation drift

Policy file:

`config/ai-eval-regression-policy.json`

Policy includes:

absolute floor thresholds (`faithfulness`, `relevance`, `citationAccuracy`, `citationFreshness`, `passRate`)
drift thresholds (`max*Drop`) for regression comparisons
diff severity thresholds (`diffSeverityPolicy`) for dashboard/CI review priority
`diffSeverityPolicy` schema/order validation before severity classification
severity-history retention window (`severityHistoryPolicy.maxComparisons`) with explicit malformed-policy failure tests
`failFastOnCitationDrift` control
`failFastOnCitationFreshnessDrift` control
versioned baselines keyed by:
dataset ID + dataset version
model version
prompt version
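Keying baselines by the full identity means a model or prompt change never compares against a stale baseline. The key format below (URI-encoded parts joined with `|`) and the `EvalIdentity` field names are assumptions for illustration.

```typescript
// Hypothetical identity shape for a versioned baseline.
interface EvalIdentity {
  datasetId: string;
  datasetVersion: string;
  modelVersion: string;  // e.g. read from METAMUSEUM_EVAL_MODEL_VERSION
  promptVersion: string; // e.g. read from METAMUSEUM_EVAL_PROMPT_VERSION
}

// Encoding each part keeps the key unambiguous even if a part contains "|".
function baselineKey(id: EvalIdentity): string {
  return [id.datasetId, id.datasetVersion, id.modelVersion, id.promptVersion]
    .map((part) => encodeURIComponent(part))
    .join("|");
}

// A run only compares against a baseline recorded under exactly the same identity.
function findBaseline<T>(baselines: Map<string, T>, id: EvalIdentity): T | undefined {
  return baselines.get(baselineKey(id));
}
```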

Environment variables (version identity and retention):

`METAMUSEUM_EVAL_MODEL_VERSION`
`METAMUSEUM_EVAL_PROMPT_VERSION`
`METAMUSEUM_EVAL_RETENTION_MAX_RUNS`
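Resolving the retention limit can be sketched as follows. The precedence (CLI flag, then `METAMUSEUM_EVAL_RETENTION_MAX_RUNS`, then the default of 200) is an assumption based on the flag being described as an override, and rejecting malformed values rather than falling back is a design choice of this sketch, not documented behavior.

```typescript
const DEFAULT_RETENTION_MAX_RUNS = 200;
const RETENTION_FLAG = "--retention-max-runs=";

// Accepts only positive integers; anything else is an explicit error.
function parseMaxRuns(raw: string, source: string): number {
  const trimmed = raw.trim();
  if (!/^\d+$/.test(trimmed) || Number(trimmed) < 1) {
    throw new Error(`${source} must be a positive integer, got "${raw}"`);
  }
  return Number(trimmed);
}

// Assumed precedence: CLI flag, then env var, then default.
function resolveRetentionMaxRuns(argv: string[], env: Record<string, string | undefined>): number {
  const flag = argv.find((a) => a.startsWith(RETENTION_FLAG));
  if (flag !== undefined) {
    return parseMaxRuns(flag.slice(RETENTION_FLAG.length), "--retention-max-runs");
  }
  const fromEnv = env["METAMUSEUM_EVAL_RETENTION_MAX_RUNS"];
  if (fromEnv !== undefined && fromEnv.trim() !== "") {
    return parseMaxRuns(fromEnv, "METAMUSEUM_EVAL_RETENTION_MAX_RUNS");
  }
  return DEFAULT_RETENTION_MAX_RUNS;
}
```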

Fail-fast behavior:

If the citation accuracy drop exceeds `maxCitationAccuracyDrop`, the gate exits immediately with a failure.
If the citation freshness drop exceeds `maxCitationFreshnessDrop`, the gate exits immediately with a failure.
If citation freshness is flat or improved while `oldestAgeRatio` crosses the aging-pressure threshold, the summary emits a `citation_freshness_aging_pressure` warning for proactive review.
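The fail-fast rules and the aging-pressure warning can be sketched as one pure function. It assumes drop = previous minus latest, that the two `failFastOn*` controls switch each check on, and that "crosses" means reaching a hypothetical `agingPressureRatio`; exiting the process is left to the caller.

```typescript
// Hypothetical snapshot of the metrics this check needs.
interface CitationSnapshot {
  citationAccuracy: number;
  citationFreshness: number;
  oldestAgeRatio: number; // oldest cited evidence age relative to the max policy window (assumed meaning)
}

interface DriftPolicy {
  maxCitationAccuracyDrop: number;
  maxCitationFreshnessDrop: number;
  agingPressureRatio: number; // hypothetical name for the aging-pressure threshold
  failFastOnCitationDrift: boolean;
  failFastOnCitationFreshnessDrift: boolean;
}

type DriftResult =
  | { kind: "fail"; reason: string }
  | { kind: "warn"; code: "citation_freshness_aging_pressure" }
  | { kind: "ok" };

function checkCitationDrift(previous: CitationSnapshot, latest: CitationSnapshot, policy: DriftPolicy): DriftResult {
  const accuracyDrop = previous.citationAccuracy - latest.citationAccuracy;
  const freshnessDrop = previous.citationFreshness - latest.citationFreshness;
  if (policy.failFastOnCitationDrift && accuracyDrop > policy.maxCitationAccuracyDrop) {
    return { kind: "fail", reason: `citation accuracy dropped by ${accuracyDrop.toFixed(3)}` };
  }
  if (policy.failFastOnCitationFreshnessDrift && freshnessDrop > policy.maxCitationFreshnessDrop) {
    return { kind: "fail", reason: `citation freshness dropped by ${freshnessDrop.toFixed(3)}` };
  }
  // Flat or improved freshness can still hide evidence that is close to aging out.
  if (freshnessDrop <= 0 && latest.oldestAgeRatio >= policy.agingPressureRatio) {
    return { kind: "warn", code: "citation_freshness_aging_pressure" };
  }
  return { kind: "ok" };
}
```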
