Era C Exit-Gate Evidence Pack
This runbook turns the four Era C exit checks into automated, dated pass/fail evidence.
Scope
The evidence pack evaluates:
- SOTA §20.4 p95 SLOs over a rolling 30-day window: cached Record, cold Record, keyword+facet search, whitelisted SPARQL, and IIIF tile serving.
- Public-read uptime >= 99.9% over 30 days.
- >= 3 declared external consumers of `/api/activity` within the adoption window.
- SOTA §26 KPI thresholds.
Artifact schema
Canonical schema:
- `docs/schemas/era-c-exit-gate-evidence.schema.json`
Latest and historical artifacts:
- `artifacts/exit-gate/era-c-exit-gate-latest.json`
- `artifacts/exit-gate/runs/era-c-exit-gate-<timestamp>.json`
- `artifacts/exit-gate/trend-index.json`
The trend index stores each check's pass/fail state plus compact `failedChecks` reasons so agents can identify the next blocking gate without opening every historical run artifact.
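This is a minimal sketch of how an agent might read the trend index to find the next blocking gate and to tell how long a check has been failing. The entry shape (`generatedAt`, `checks`, and `failedChecks` entries with `check`/`reason`) is an assumption for illustration. The canonical field names live in `docs/schemas/era-c-exit-gate-evidence.schema.json`, and the helper names here are hypothetical.

```typescript
// Hypothetical trend-index entry shape; confirm field names against the canonical schema.
interface FailedCheck {
  check: string; // check ID, e.g. "uptime" (illustrative)
  reason: string; // compact reason copied from the run artifact
}

interface TrendEntry {
  generatedAt: string; // ISO-8601 run timestamp
  checks: Record<string, boolean>; // check ID -> passed?
  failedChecks: FailedCheck[];
}

// Sorts a copy of the entries newest first; the input array is left untouched.
function newestFirst(entries: TrendEntry[]): TrendEntry[] {
  return [...entries].sort(
    (a, b) => Date.parse(b.generatedAt) - Date.parse(a.generatedAt),
  );
}

// Failures recorded by the most recent run: the gates blocking exit right now.
function nextBlockingGates(entries: TrendEntry[]): FailedCheck[] {
  const [latest] = newestFirst(entries);
  return latest ? latest.failedChecks : [];
}

// Consecutive runs, counting back from the newest, in which a check failed.
function failureStreak(entries: TrendEntry[], checkId: string): number {
  let streak = 0;
  for (const entry of newestFirst(entries)) {
    if (entry.checks[checkId] !== false) break; // a pass (or absent check) ends the streak
    streak += 1;
  }
  return streak;
}
```

Reading only the newest entry keeps the "what blocks us now" question cheap, and the streak helper shows whether a failure is new or long-standing without opening individual run artifacts.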
Supporting performance trend input:
- `artifacts/performance/k6-slo-summary.json`
- `artifacts/performance/k6-slo-trend.json`
New performance samples must include all five p95 metrics. Legacy three-metric samples remain in history, but the all-SLO gate fails while any of them falls inside the rolling window; it can pass only once the window holds 30 days of complete samples.
Exit-gate reports include `missingMetricsInWindow` plus per-sample `metricDetails` so partial k6 summaries are distinguishable from actual p95 threshold breaches.
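A minimal sketch of the all-SLO window check, assuming each trend sample carries a timestamp and a map of p95 values in milliseconds. The five metric keys and any threshold values passed in are hypothetical; the real thresholds come from SOTA §20.4 and the exit-gate policy. It shows why a legacy three-metric sample is reported under `missingMetricsInWindow` rather than as a breach. It does not check that the window is fully covered by daily samples.

```typescript
// Hypothetical keys for the five SOTA §20.4 p95 SLOs.
const REQUIRED_METRICS = ["cachedRecord", "coldRecord", "search", "sparql", "iiifTile"] as const;
type MetricKey = (typeof REQUIRED_METRICS)[number];
type MetricStatus = "pass" | "breach" | "missing";

interface K6Sample {
  generatedAt: string; // ISO-8601 timestamp
  p95Ms: Partial<Record<MetricKey, number>>; // legacy samples carry only three keys
}

interface SloWindowResult {
  pass: boolean;
  missingMetricsInWindow: MetricKey[];
  metricDetails: { generatedAt: string; metrics: Record<MetricKey, MetricStatus> }[];
}

function evaluateSloWindow(
  samples: K6Sample[],
  thresholdsMs: Record<MetricKey, number>,
  now: Date,
  windowDays = 30,
): SloWindowResult {
  const cutoff = now.getTime() - windowDays * 24 * 60 * 60 * 1000;
  // Undated samples parse to NaN and therefore fall out of the window.
  const inWindow = samples.filter((s) => Date.parse(s.generatedAt) >= cutoff);
  const missing = new Set<MetricKey>();
  let breached = false;

  const metricDetails = inWindow.map((sample) => {
    const metrics = {} as Record<MetricKey, MetricStatus>;
    for (const key of REQUIRED_METRICS) {
      const value = sample.p95Ms[key];
      if (value === undefined) {
        metrics[key] = "missing"; // partial k6 summary, not a threshold breach
        missing.add(key);
      } else if (value > thresholdsMs[key]) {
        metrics[key] = "breach"; // actual p95 regression
        breached = true;
      } else {
        metrics[key] = "pass";
      }
    }
    return { generatedAt: sample.generatedAt, metrics };
  });

  return {
    pass: inWindow.length > 0 && missing.size === 0 && !breached,
    missingMetricsInWindow: REQUIRED_METRICS.filter((k) => missing.has(k)),
    metricDetails,
  };
}
```

Keeping "missing" and "breach" as separate statuses is what lets a reader of the artifact decide whether to fix the k6 scenario set or the service itself.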
Policy and inputs
Policy thresholds:
- `config/era-c-exit-gate-policy.json`
- Public-read uptime evidence must be recent: `uptime.maxSnapshotAgeHours` defaults to `48`, so stale or undated snapshots cannot satisfy the 99.9% gate.
- SOTA §26 KPI evidence must be recent: `kpis26.maxSnapshotAgeHours` defaults to `48`.
- AI query cost must come from real usage telemetry when `kpis26.requireAiQueryCostTelemetry` is `true`; fallback default cost values do not satisfy the KPI gate.
- AI query rows written by `/api/ai/query` include `costUsd`, `costCurrency`, `costSource`, and usage counts. The deterministic local planner records `costUsd: 0` with `costSource: "deterministic-local-planner"` because it makes no paid model call.
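The two KPI evidence guards above can be sketched as follows: snapshot freshness against `maxSnapshotAgeHours` (default `48`) and the `requireAiQueryCostTelemetry` rule. The row fields `costUsd`, `costCurrency`, and `costSource` come from this runbook; the `"fallback-default"` source label, the per-query mean, and the function names are assumptions for illustration.

```typescript
interface AiQueryRow {
  costUsd: number;
  costCurrency: string;
  costSource: string; // e.g. "deterministic-local-planner"
}

const FALLBACK_COST_SOURCE = "fallback-default"; // hypothetical label for default cost values

// Undated, unparseable, or future-dated snapshots are treated as stale in this sketch.
function isSnapshotFresh(generatedAt: string | undefined, now: Date, maxSnapshotAgeHours = 48): boolean {
  if (!generatedAt) return false;
  const ts = Date.parse(generatedAt);
  if (Number.isNaN(ts)) return false;
  const ageHours = (now.getTime() - ts) / (60 * 60 * 1000);
  return ageHours >= 0 && ageHours <= maxSnapshotAgeHours;
}

interface CostResult {
  ok: boolean;
  aiQueryCostUsd: number | null; // assumed: mean cost per query
  reason?: string;
}

function aiQueryCost(rows: AiQueryRow[], requireTelemetry: boolean): CostResult {
  // With the telemetry requirement on, fallback defaults are excluded entirely.
  // Zero-cost rows from the deterministic planner still count: they are real usage.
  const candidates = requireTelemetry
    ? rows.filter((r) => r.costSource !== FALLBACK_COST_SOURCE)
    : rows;
  const usable = candidates.filter((r) => Number.isFinite(r.costUsd));
  if (usable.length === 0) {
    return {
      ok: false,
      aiQueryCostUsd: null,
      reason: requireTelemetry ? "no real usage telemetry in AI query log" : "no AI query rows",
    };
  }
  const total = usable.reduce((sum, r) => sum + r.costUsd, 0);
  return { ok: true, aiQueryCostUsd: total / usable.length };
}
```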
Metric inputs:
- Uptime snapshot: `monitoring/public-read-uptime.json`
- Exit-gate reports include the uptime `source` (`prometheus`, `probe`, or `unavailable`) and source `notes` so missing 30-day proof is actionable from the artifact alone.
- Uptime history: `monitoring/public-read-uptime-history.json`
- KPI snapshot: `monitoring/kpis.json`
- Exit-gate reports include KPI `sources`, snapshot `notes`, and per-failed-metric details (`metric`, `source`, `reason`) so SOTA §26 blockers can be triaged from the evidence artifact.
- Optional aggregate KPI input: `monitoring/kpi-evidence.json` supplies production-like record-enrichment and reconciliation-review counts; populate it before running `pnpm monitoring:telemetry:sync`. See `docs/ops/kpi-evidence.md`.
- `aiQueryCostUsd` is calculated from `storage/ai-query-log.json`; run at least one representative `/api/ai/query` request in the target environment before telemetry sync so the source is no longer the fallback default.
- Activity adoption telemetry: `storage/activity-consumers.json`
- Declared consumer rows must preserve `class: "declared"` and `declaredId`; the exit-gate normalizer intentionally ignores derived fingerprints for adoption credit.
- Proof command: `pnpm activity:adoption:probe` records one partner-owned declared consumer and writes `artifacts/activity-adoption/activity-adoption-proof-latest.json`.
- Matrix proof command: `pnpm activity:adoption:matrix` records and checks three partner-owned declared consumers and writes `artifacts/activity-adoption/activity-adoption-matrix-latest.json`.
Activity adoption counts only declared external consumers (`x-linked-art-consumer-id`). Derived fingerprints remain useful diagnostics, but they do not satisfy the external Linked Art systems exit gate.
See `docs/ops/activity-adoption-proof.md` for partner hand-off instructions and placeholder-ID safeguards.
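A minimal sketch of the adoption count, assuming rows in `storage/activity-consumers.json` carry `class`, `declaredId`, and a hypothetical `lastSeenAt` timestamp used to apply the adoption window. Only `class: "declared"` rows with a non-empty `declaredId` earn credit, and repeat sightings of the same partner count once. The `"derived"` class value and the function name are illustrative; placeholder-ID safeguards are not modeled here.

```typescript
interface ConsumerRow {
  class: string; // "declared", or a derived class such as a fingerprint
  declaredId?: string; // value sent in x-linked-art-consumer-id
  lastSeenAt?: string; // hypothetical ISO-8601 field for the adoption window
}

function declaredAdoption(rows: ConsumerRow[], windowStart: Date, minConsumers = 3) {
  const ids = new Set<string>();
  for (const row of rows) {
    if (row.class !== "declared") continue; // derived fingerprints never earn adoption credit
    const id = row.declaredId?.trim();
    if (!id) continue;
    const seen = row.lastSeenAt ? Date.parse(row.lastSeenAt) : NaN;
    if (Number.isNaN(seen) || seen < windowStart.getTime()) continue; // outside the window
    ids.add(id); // distinct partners only
  }
  return { declaredConsumers: ids.size, pass: ids.size >= minConsumers };
}
```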
Commands
Refresh telemetry snapshots first:
`pnpm monitoring:telemetry:sync`
Generate the evidence report (always writes the artifact, even when checks fail):
`pnpm era-c:exit-gate:evidence`
Enforce as a hard gate (exits non-zero if any check fails):
`pnpm era-c:exit-gate:check`
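The split between the two commands can be sketched as one report builder plus an exit-code rule, assuming the report-only command exits 0 after writing its artifact. The `CheckResult`/`GateReport` shapes and function names are hypothetical, not the scripts' actual internals.

```typescript
// Hypothetical result shape for one of the four Era C checks.
interface CheckResult {
  id: string;
  pass: boolean;
  reason?: string;
}

interface GateReport {
  generatedAt: string;
  pass: boolean;
  failedChecks: { check: string; reason: string }[];
}

// Both commands build the same report; an empty result set never counts as a pass.
function buildGateReport(results: CheckResult[], now: Date): GateReport {
  const failedChecks = results
    .filter((r) => !r.pass)
    .map((r) => ({ check: r.id, reason: r.reason ?? "no reason recorded" }));
  return {
    generatedAt: now.toISOString(),
    pass: results.length > 0 && failedChecks.length === 0,
    failedChecks,
  };
}

// Report mode exits 0 after writing (assumed); enforce mode turns any failure into exit 1.
function exitCodeFor(report: GateReport, enforce: boolean): number {
  return enforce && !report.pass ? 1 : 0;
}
```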
Nightly automation
Workflow:
- `.github/workflows/era-c-exit-gate-evidence.yml`
Nightly job does:
- Run `pnpm k6:slo` against the deployed target when `METAMUSEUM_EVIDENCE_BASE_URL` and `METAMUSEUM_EVIDENCE_IIIF_TILE_URL` are configured; otherwise run local `pnpm k6:slo:ci` as a fallback.
- Send one seed request to the deployed `/api/ai/query` when `METAMUSEUM_EVIDENCE_BASE_URL` is configured, so real cost telemetry flows into the SOTA §26 KPI evidence.
- Probe the declared `/api/activity` adoption matrix when `METAMUSEUM_ACTIVITY_CONSUMER_IDS` is configured; fall back to one declared consumer when only `METAMUSEUM_ACTIVITY_CONSUMER_ID` is configured.
- Run `pnpm era-c:exit-gate:evidence` (auto-runs telemetry sync).
- Upload exit-gate, performance, activity-adoption, uptime, and KPI artifacts.
The workflow always uploads artifacts so failed checks still produce dated proof.
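The branching in the nightly job can be modeled as a pure function of the configured variables. This is a sketch only: the real conditions live in the workflow YAML, and this function mirrors just the conditions stated above. `planNightly` and `NightlyPlan` are hypothetical names.

```typescript
type Env = Record<string, string | undefined>;

interface NightlyPlan {
  k6Command: "pnpm k6:slo" | "pnpm k6:slo:ci";
  seedAiQuery: boolean;
  adoption: { mode: "matrix" | "single" | "skip"; consumerIds: string[] };
}

const isSet = (value?: string): boolean => (value ?? "").trim().length > 0;

function planNightly(env: Env): NightlyPlan {
  // Deployed k6 needs both the app URL and a CDN-backed IIIF tile; otherwise run the local fallback.
  const deployed = isSet(env.METAMUSEUM_EVIDENCE_BASE_URL) && isSet(env.METAMUSEUM_EVIDENCE_IIIF_TILE_URL);

  // Comma-separated matrix IDs win; the single-ID variable is only a fallback.
  const matrixIds = (env.METAMUSEUM_ACTIVITY_CONSUMER_IDS ?? "")
    .split(",")
    .map((id) => id.trim())
    .filter((id) => id.length > 0);
  const singleId = (env.METAMUSEUM_ACTIVITY_CONSUMER_ID ?? "").trim();

  let adoption: NightlyPlan["adoption"];
  if (matrixIds.length > 0) adoption = { mode: "matrix", consumerIds: matrixIds };
  else if (singleId) adoption = { mode: "single", consumerIds: [singleId] };
  else adoption = { mode: "skip", consumerIds: [] };

  return {
    k6Command: deployed ? "pnpm k6:slo" : "pnpm k6:slo:ci",
    seedAiQuery: isSet(env.METAMUSEUM_EVIDENCE_BASE_URL),
    adoption,
  };
}
```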
Deployed-target workflow variables:
- `METAMUSEUM_EVIDENCE_BASE_URL` — deployed app URL used as `BASE_URL` for k6, AI query seeding, and adoption proof.
- `METAMUSEUM_EVIDENCE_IIIF_TILE_URL` — CDN-backed IIIF tile used as `IIIF_TILE_URL` for the tile-serving p95 check.
- `METAMUSEUM_EVIDENCE_SPARQL_URL` — whitelisted SPARQL endpoint used as `SPARQL_URL` by `pnpm k6:slo`.
- `METAMUSEUM_EVIDENCE_SPARQL_QUERY` — optional whitelisted SPARQL query override for `pnpm k6:slo`.
- `METAMUSEUM_EVIDENCE_AI_QUERY` — optional representative query for the deployed `/api/ai/query` telemetry seed.
- `METAMUSEUM_ACTIVITY_CONSUMER_IDS` — comma-separated partner-owned declared consumer IDs for the full adoption matrix proof.
- `METAMUSEUM_ACTIVITY_CONSUMER_ID` — partner-owned declared consumer ID for the single-consumer adoption proof fallback.
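The variable-to-k6 mapping above can be sketched as a small translation step that forwards only the variables that are set and rejects non-HTTP values for URL inputs. The `BASE_URL`, `IIIF_TILE_URL`, and `SPARQL_URL` targets come from this runbook. The `SPARQL_QUERY` target name for the query override is a hypothetical placeholder, as is the function name.

```typescript
// [workflow variable, k6 environment variable, must be an http(s) URL]
const K6_ENV_MAPPING: ReadonlyArray<[string, string, boolean]> = [
  ["METAMUSEUM_EVIDENCE_BASE_URL", "BASE_URL", true],
  ["METAMUSEUM_EVIDENCE_IIIF_TILE_URL", "IIIF_TILE_URL", true],
  ["METAMUSEUM_EVIDENCE_SPARQL_URL", "SPARQL_URL", true],
  ["METAMUSEUM_EVIDENCE_SPARQL_QUERY", "SPARQL_QUERY", false], // hypothetical target name
];

function k6EnvFromWorkflowVars(vars: Record<string, string | undefined>): Record<string, string> {
  const env: Record<string, string> = {};
  for (const [source, target, isUrl] of K6_ENV_MAPPING) {
    const value = vars[source]?.trim();
    if (!value) continue; // unset or blank variables are simply not forwarded
    if (isUrl && !/^https?:\/\//.test(value)) {
      throw new Error(`${source} must be an http(s) URL, got "${value}"`);
    }
    env[target] = value;
  }
  return env;
}
```

Failing fast on a malformed URL keeps a misconfigured variable from silently producing a k6 run against the wrong host.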
Telemetry source configuration
Primary uptime source (recommended):
- `METAMUSEUM_UPTIME_PROMETHEUS_URL`
- `METAMUSEUM_UPTIME_PROMQL_AVAILABILITY`
- `METAMUSEUM_UPTIME_PROMQL_SAMPLE_COUNT`
GitHub Actions wiring:
- secret: `METAMUSEUM_UPTIME_PROMETHEUS_URL`
- repo/org var: `METAMUSEUM_UPTIME_PROMQL_AVAILABILITY`
- repo/org var: `METAMUSEUM_UPTIME_PROMQL_SAMPLE_COUNT`
Fallback uptime source:
- `METAMUSEUM_PUBLIC_READ_BASE_URL`
- `METAMUSEUM_UPTIME_PROBE_PATHS`
- `METAMUSEUM_UPTIME_PROBE_TIMEOUT_MS`
Window controls:
- `METAMUSEUM_UPTIME_WINDOW_DAYS` (default `30`)
- `METAMUSEUM_UPTIME_HISTORY_RETENTION_DAYS` (default `45`)
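The two window controls can be sketched against the uptime history file: retention pruning at 45 days and a 99.9% check over the 30-day window. This assumes each history entry records successful and total public-read checks. The `at`/`ok`/`total` field names are hypothetical, and the ratio here is request-weighted; a Prometheus availability query may compute it differently. `coverageDays` is reported only, since how the real gate treats short coverage is not specified here.

```typescript
// Hypothetical history entry: one probe or scrape summary.
interface UptimeSample {
  at: string; // ISO-8601 timestamp
  ok: number; // successful public-read checks
  total: number; // all public-read checks
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Drops entries older than the retention horizon (default 45 days).
function pruneHistory(history: UptimeSample[], now: Date, retentionDays = 45): UptimeSample[] {
  const cutoff = now.getTime() - retentionDays * DAY_MS;
  return history.filter((s) => Date.parse(s.at) >= cutoff);
}

interface UptimeResult {
  ratio: number | null; // null when the window holds no checks at all
  pass: boolean;
  coverageDays: number; // age of the oldest in-window sample
}

// Request-weighted availability over the evaluation window (default 30 days).
function uptimeOverWindow(history: UptimeSample[], now: Date, windowDays = 30, target = 0.999): UptimeResult {
  const cutoff = now.getTime() - windowDays * DAY_MS;
  const inWindow = history.filter((s) => Date.parse(s.at) >= cutoff);
  const ok = inWindow.reduce((sum, s) => sum + s.ok, 0);
  const total = inWindow.reduce((sum, s) => sum + s.total, 0);
  const coverageDays =
    inWindow.length > 0
      ? (now.getTime() - Math.min(...inWindow.map((s) => Date.parse(s.at)))) / DAY_MS
      : 0;
  const ratio = total > 0 ? ok / total : null;
  return { ratio, pass: ratio !== null && ratio >= target, coverageDays };
}
```

Retaining 45 days while evaluating 30 leaves a buffer, so a late or missed nightly run does not immediately shrink the evidence below a full window.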