
    Current Document: LinkedArt SOTA Web App — Master Build Specification

    Source updated 10/20/2018, 1:46:40 AM · SHA-256 a5f0baca89c6 · 1155 lines

    Canonical ID: linked-art/LinkedArtSOTAWebApp

    JSON for this doc:/api/docs/content?path=linked-art/LinkedArtSOTAWebApp.md

    Human link:/docs?doc=linked-art%2FLinkedArtSOTAWebApp.md

    Canonical API endpoint:/api/docs/content?path=linked-art%2FLinkedArtSOTAWebApp.md

    LinkedArt SOTA Web App — Master Build Specification

    Audience: AI coding agents (Claude Code, Codex, Cursor, Aider) and human engineers building the next-generation Meta Museum / Linked Art platform.
    Status: Authoritative consolidation of every file in `LinkedArt/`. Treat this as the single source of truth. When this document conflicts with the older `LinkedArt*.md` notes, this document wins. When this document conflicts with `CLAUDE.md` / `AGENTS.md` (repo-level conventions), the repo-level conventions win for code-style only — the architectural targets here are aspirational and supersede the current monolithic `server.js` prototype over time.
    Reading time budget: ~45 min. Granular intentionally. Skim §0 → §3 → §27 first; everything else is reference depth.

    ---

    0. TL;DR for Coding Agents

    You are building a multi-modal, event-centric, hypermedia-driven cultural-heritage discovery platform with the following non-negotiables:

    1. Linked Art JSON-LD is the canonical data layer. Everything serializes to / from it.
    2. CIDOC-CRM 7.1 (RDFS encoding) + Linked Art profile. Event-centric: never collapse production / acquisition / exhibition into flat fields.
    3. Hybrid storage: pre-materialized JSON-LD "Record" documents + RDF triple index + full-text/faceted index (Solr or OpenSearch) + vector index. Translating SPARQL graph queries into faceted UI results is non-trivial and requires this dual approach. (Yale LUX pattern; see §8.)
    4. HAL hypermedia API: clients follow named `_links`, they do not construct SPARQL. API design is progressive enhancement: simple, human-readable JSON-LD payloads first; full machine-readable graph network on opt-in.
    5. Shortcut triples materialized at ingest time so runtime queries are single-hop. (See §20.1.)
    6. Rights-aware by default: every record, image, and field carries reuse status (RightsStatements.org + CC).
    7. Component library is data-pattern-driven, not domain-typed. A `<LinkedDate>` renders for a fossil, a painting, or a book identically.
    8. Reconciliation against Getty AAT/ULAN/TGN, PeriodO, VIAF, Wikidata, GeoNames, FAST, LoC is part of the ingest path, not an afterthought. External authority dumps are downloaded and cached locally, never queried on the fly at request time.
    9. NL → SPARQL / NL → HAL with grounded Graph-RAG is the primary discovery affordance for non-technical users. Visual SPARQL builders (Sparnatural-style) are the secondary affordance.
    10. WCAG 2.1 AA, i18n via `language`-tagged `Name` objects, and 99.9% uptime are gates, not stretch goals.
    11. 10 primary entity boundaries, hard. 5 main: `HumanMadeObject`, `LinguisticObject`, `VisualItem`, `DigitalObject`, `Set`. 5 connected/authority: `Person`, `Group`, `Place`, `Type` (Concepts), `Activity` (macro-events incl. provenance, destruction). (See §4.9.)
    12. Modeling anti-patterns are forbidden. No multiple instantiation (use partitioning into sub-activities; see §4.10). No blank nodes: every entity gets a minted URI. Always separate carrier (HumanMadeObject) from content (VisualItem / LinguisticObject) from surrogate (DigitalObject).

    Existing repo (`server.js`, `src/adapters/getty.js`, `src/adapters/met.js`, `public/app.js`) is the MVP prototype. Do not prematurely migrate to Next.js (see `README.md` line ~107). Migrate route-by-route only once the contracts in §5 are stable.
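
    As an illustration of item 4, the following is a minimal sketch of a client that follows a named link instead of constructing a query. It assumes a HAL-style `_links` object that maps a relation name to `{ "href": ... }`; the relation name `works_produced`, the URLs, and the helper names are hypothetical and not taken from the repo.

```python
from typing import Callable

# Hypothetical HAL-style record; relation names and URLs are illustrative only.
RECORD = {
    "id": "https://lod.example.org/person/42",
    "type": "Person",
    "_label": "Example Artist",
    "_links": {
        "self": {"href": "https://lod.example.org/person/42"},
        "works_produced": {"href": "https://lod.example.org/api/search/object?producedBy=42"},
    },
}


def link_href(doc: dict, rel: str) -> str:
    """Return the href of a named link, or raise if the server did not offer it."""
    links = doc.get("_links", {})
    if rel not in links:
        raise KeyError(f"link relation {rel!r} not offered; available: {sorted(links)}")
    return links[rel]["href"]


def follow(doc: dict, rel: str, fetch: Callable[[str], dict]) -> dict:
    """Follow a named link using an injected fetch function (e.g. an HTTP GET wrapper)."""
    return fetch(link_href(doc, rel))


# A stand-in fetcher so the sketch runs without a network connection.
print(follow(RECORD, "works_produced", lambda url: {"requested": url}))
```

    Injecting `fetch` keeps the sketch testable offline; a real client would pass a function that issues the HTTP request and parses the JSON-LD response.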

    ---

    1. Product Vision & Scope

    1.1 Mission

    Unify cultural-heritage data across museums, libraries, archives, and natural-history collections into a single discoverable, queryable, and machine-actionable graph using the Linked Art profile of CIDOC-CRM. Cover ~90% of common use cases with ~10% of the ontology complexity.

    1.2 In-Scope (MVP → v1)

    • Ingest from JSON APIs, CSV, XML, MARC, IIIF manifests.
    • Normalize to Linked Art JSON-LD with strict pattern conformance.
    • Reconcile entities to Getty AAT/ULAN/TGN, Wikidata, VIAF, GeoNames, LoC.
    • Public discovery UI: faceted search, entity hubs, timelines, maps, network graphs.
    • Curator tools: form-based editor, visual ETL mapper, graph debugger.
    • Public read API: REST + JSON-LD + HAL + SPARQL + GraphQL.
    • AI layer: NL→Query, Graph-RAG chat with verifiable CIDOC-CRM citations.
    • IIIF integration: deep-zoom, side-by-side compare, IIIF Change Discovery sync.
    • ActivityStreams 2.0 syndication for downstream aggregators.

    1.3 Out-of-Scope (explicitly)

    • Replacing institutional CMS (TMS, EMu, Adlib, Qi, CollectionSpace). This is a downstream publication layer.
    • Image hosting — defer to IIIF Image API servers (Cantaloupe, IIPImage, Loris).
    • Native audio/video ingestion in v1 (metadata only).
    • Full CIDOC-CRM implementation. Linked Art profile only.
    • Upstream collection management workflows.
    • Full ML model training (use off-the-shelf embeddings & fine-tunes).

    1.4 Scale Targets (Yale LUX-derived)

    • Phase 3 target: ~15M metadata records.
    • Phase 5 target: ~45M total entities (records plus authority entities plus reconciled sameAs clusters).
    • Reference production sizing: 3× 16-vCPU nodes (AWS m6i.4xlarge equivalent) for the storage tier delivers sub-second p95 for the common query set. Scale horizontally; the storage tier is the bottleneck, not the API gateway.
    • Triple count at the upper bound approaches ~2B triples once shortcut predicates (§20.1) are materialized — plan storage IOPS accordingly.

    1.5 Personas (build-driving)

    | Persona | Primary need | Tool surface |
    |---|---|---|
    | Museum Curator / Cataloger | Publish authoritative metadata without writing JSON-LD | Form Builder, Visual ETL Mapper |
    | Digital Humanities Researcher | Cross-collection graph queries, provenance chains | Sparnatural-style builder, NL→SPARQL, Notebook exports |
    | Software Developer | Predictable REST + JSON-LD APIs | HAL-linked endpoints, GraphQL, OpenAPI/SHACL contracts |
    | Public Visitor / Student | Visual exploration, no graph literacy required | Discovery SPA, IIIF deep-zoom, timelines, storytelling |
    | AI Agent (claude/codex) | Programmatic ingest, validation, content generation | Agent endpoints, contract-validated I/O, dry-run modes |

    ---

    2. Reference Architecture (Yale-LUX-derived, modernized)

    
                ┌──────────────────────────────────────────────────────┐
                │  External Sources                                    │
                │  Getty REST/IIIF/SPARQL · Met Open Access · Harvard  │
                │  SAAM · Walters · YCBA · Rijks · Europeana · British │
                │  Museum · AIC · V&A · Tate · MoMA · CMA · LoC ·      │
                │  Wikidata · DBpedia · VIAF · GeoNames · PeriodO      │
                └────────────┬─────────────────────────────────────────┘
                             │ IIIF Change Discovery (W3C ActivityStreams 2.0)
                             │ + REST polling + SPARQL federation
                             ▼
                ┌──────────────────────────────────────────────────────┐
                │  Ingestion (Python)                                  │
                │  Dagster / Airflow / Temporal orchestrator           │
                │  ├─ Extract  → MinIO/S3 raw lake (immutable)         │
                │  ├─ Profile  → Great Expectations + spaCy            │
                │  ├─ Map      → Crom / PyLD / custom mappers          │
                │  ├─ Reconcile→ RecordLinkage + sentence-transformers │
                │  └─ Materialize "Record" + shortcut triples          │
                │  State: PostgreSQL (jobs/lineage) + Redis (URI cache)│
                └────────────┬─────────────────────────────────────────┘
                             │ JSON-LD records + N-Quads
                             ▼
                ┌──────────────────────────────────────────────────────┐
                │  Multi-Modal Storage                                 │
                │  ┌─────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐  │
                │  │MarkLogic│ │  Triple  │ │Elastic / │ │ pgvector │  │
                │  │  -OR-   │ │  store   │ │OpenSearch│ │/Qdrant/  │  │
                │  │ GraphDB │ │ (Fuseki/ │ │  (text + │ │ Weaviate │  │
                │  │  +PG    │ │  Stardog)│ │  facet)  │ │ (vectors)│  │
                │  └─────────┘ └──────────┘ └──────────┘ └──────────┘  │
                │  All four stay in sync via Kafka/NATS event bus      │
                └────────────┬─────────────────────────────────────────┘
                             │ Static JSON-LD docs (CDN-cacheable)
                             │ HAL _links · SPARQL · GraphQL · ActivityStreams
                             ▼
                ┌──────────────────────────────────────────────────────┐
                │  API Gateway (Kong / Tyk / Cloudflare)               │
                │  · OAuth2/OIDC (Keycloak) · API keys · rate-limit    │
                │  · CORS · OpenAPI 3.1 · WAF · request tracing        │
                └────────────┬─────────────────────────────────────────┘
                             ▼
                ┌──────────────────────────────────────────────────────┐
                │  Application Services                                │
                │  · Discovery API     (Node/NestJS or Python/FastAPI) │
                │  · NL→Query Agent    (LLM + schema embeddings)       │
                │  · Graph-RAG Chat    (vector + triple + citations)   │
                │  · Validation API    (PySHACL + JSON Schema)         │
                │  · Reconciliation API(Getty / VIAF / Wikidata)       │
                │  · IIIF Proxy        (Cantaloupe + auth)             │
                │  · ActivityStreams   (publishing endpoint)           │
                └────────────┬─────────────────────────────────────────┘
                             ▼
                ┌──────────────────────────────────────────────────────┐
                │  Frontends (Next.js 15 App Router + TS)              │
                │  · Public SPA (discovery, IIIF, timelines, maps)     │
                │  · Curator Console (Form Builder, ETL Mapper)        │
                │  · Researcher Workspace (Graph Explorer, NB exports) │
                │  · Agent Workspace (job runner, validation reports)  │
                │  · Meta Wiki Art bridge (MediaWiki publish layer)    │
                │  Module-federated micro-frontends, shared component  │
                │  library (`@metamuseum/linked-art-ui`)               │
                └──────────────────────────────────────────────────────┘
    

    2.1 Architectural Invariants

    1. Adapters do not import each other. Bridge via `src/utils/artwork-builder.js` (already enforced; keep it).
    2. Contracts are leaf modules. No deps on adapters or utils.
    3. Provider logic stays in adapters. `server.js` (or future API gateway) only orchestrates.
    4. Source snapshots are immutable. `_source.raw` preserves the original provider response: transform at read time, never mutate stored records in place.
    5. Linked Art JSON-LD is canonical. UI DTOs (`Artwork`) are separate; map at the boundary.
    6. Every write path is idempotent and reprocessable from any stage. No destructive transformations without a recoverable upstream artifact in MinIO/S3.
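
    One way to get both immutable snapshots and idempotent writes is content addressing: derive the storage key from a hash of the raw payload, so replaying an extract can never overwrite or duplicate an artifact. The sketch below assumes this approach; the key layout `raw/{provider}/{digest}.json` and the function names are hypothetical, and a plain dict stands in for the MinIO/S3 bucket.

```python
import hashlib
import json


def snapshot_key(provider: str, raw: dict) -> str:
    """Derive a content-addressed key from the canonical JSON of the raw response."""
    canonical = json.dumps(raw, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"raw/{provider}/{digest}.json"


def put_snapshot(store: dict, provider: str, raw: dict) -> str:
    """Write once; repeating the call with the same payload is a no-op."""
    key = snapshot_key(provider, raw)
    # setdefault never replaces an existing value, so stored snapshots stay immutable.
    store.setdefault(key, json.dumps(raw, sort_keys=True, ensure_ascii=False))
    return key


bucket = {}  # stand-in for an object-store bucket
first = put_snapshot(bucket, "met", {"objectID": 1, "title": "Example"})
again = put_snapshot(bucket, "met", {"title": "Example", "objectID": 1})
print(first == again, len(bucket))
```

    Because the key depends only on the payload, a reprocessing run can start from any stage and re-emit the same artifacts without coordination.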

    ---

    3. Tech Stack (concrete, version-pinned)

    3.1 Languages & Runtimes

    • Node.js ≥ 22 LTS — public API gateway and lightweight services (continuity with current `server.js`).
    • Python 3.12+ — ETL, reconciliation, validation, LLM tooling.
    • TypeScript 5.4+ — all new frontend and any new TS services. `strict: true` non-negotiable.

    3.2 Frontend

    • Next.js 15+ (App Router, RSC). SSR for entity pages, SSG for vocabularies, ISR for facets.
    • React 19.
    • Styling: Tailwind CSS 4 (preferred) or Chakra UI 3. Pick one per project, do not mix.
    • State: TanStack Query 5 for server state; Zustand for local UI state. Avoid Redux for v1.
    • Forms: React Hook Form + Zod schemas mirroring the contracts in §5.
    • Visualization:
      • `d3` 7.x for bespoke chart primitives.
      • `@visx/*` for composable charts.
      • `cytoscape` + `cytoscape-cose-bilkent` for force-directed graphs.
      • `reactflow` 11.x for the Visual ETL Mapper drag-and-drop canvas.
      • `react-leaflet` 4.x + `leaflet` 1.9.x (or `mapbox-gl` 3.x) for maps. Prefer Leaflet for no-account-required dev.
      • `openseadragon` 4.x for IIIF deep-zoom; wrap in `<IIIFCanvasViewer>`.
      • `vis-timeline` 7.x or `@nivo/timeline` for chronologies.
    • IIIF helpers: `manifesto.js`, `@iiif/presentation-3` types.
    • Testing: Vitest + Testing Library + Playwright for e2e + `axe-core` for a11y.

    3.3 Backend Services

    • Python: FastAPI 0.110+, Pydantic v2, uvicorn, httpx, tenacity, structlog.
    • Linked Art libs: `cromulent` (Crom) for instantiation, `PyLD` for JSON-LD framing/compaction/expansion, `rdflib` 7.x for triple manipulation, `pyshacl` for SHACL.
    • Node: NestJS 10 (if you choose Node for the gateway) or stay vanilla-Node per current `server.js` conventions until contracts are stable.

    3.4 Storage

    • Primary: choose one of:
      • MarkLogic (commercial, Yale LUX-grade; multi-model out of box, CTS query language for hybrid text+graph).
      • GraphDB (Ontotext, free Community Edition, SPARQL 1.1 + Lucene plugin), recommended for OSS path.
      • Stardog (commercial, strong reasoning).
      • Apache Jena Fuseki + Lucene for pure-OSS minimal setups.
    • Document store (if NOT MarkLogic): PostgreSQL 16 + JSONB for materialized `Record` documents and operational state (jobs, lineage, RBAC).
    • Search / facets: Apache Solr 9.x (the Yale LUX choice, strongest JSON-LD-friendly faceting + nested document support), OpenSearch 2.x, or Elasticsearch 8.x. Stores `Record`-derived flattened search docs. Pick one per environment. Default for the OSS LUX-aligned path: Solr 9.x.
    • Vectors: pgvector (simple) or Qdrant / Weaviate (scale). Stores embeddings for `LinguisticObject` text, `VisualItem` image descriptors, and entity summaries.
    • Cache: Redis 7 — URI concordance (auth-string → URI), facet results, HAL link cache.
    • Object storage: MinIO (dev/self-host) or S3 for raw extracts, ETL outputs, image originals, BagIt audit bundles.

    3.5 Pipeline & Eventing

    • Orchestrator: Dagster (preferred — asset-centric, strong lineage UI) or Apache Airflow 2.9+. For long-running curator workflows that need durable retries, Temporal.
    • Stream: Kafka (Confluent or Redpanda) or NATS JetStream for ingest → reconcile → publish events.
    • Validation: PySHACL + Great Expectations.
    • Reconciliation: `recordlinkage`, `rapidfuzz`, `sentence-transformers` (all-MiniLM-L6-v2 baseline; instructor-xl for higher fidelity).

    3.6 AI / LLM Stack

    • Primary model: Claude (Sonnet 4.6 default; Opus 4.7+ for complex SPARQL synthesis).
    • Embeddings: `voyage-3` (cultural-heritage-friendly), fallback `text-embedding-3-large`. Image embeddings via SigLIP or OpenCLIP for "find similar" by IIIF region.
    • Agents: Claude Agent SDK / LangGraph. Tool-use endpoints described in §10.
    • Guardrails: every NL→SPARQL must return (a) the SPARQL it generated, (b) the resulting triples, (c) the citation paths in CIDOC-CRM terms. No free-text claims without graph backing.
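
    The guardrail rule can be enforced mechanically before an answer reaches the user. The following is a minimal sketch, assuming a response envelope with the three required parts; the `GroundedAnswer` class, its field names, and `guardrail_violations` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class GroundedAnswer:
    sparql: str                          # (a) the SPARQL that was generated
    triples: list[tuple[str, str, str]]  # (b) the triples the query returned
    citation_paths: list[str]            # (c) citation paths in CIDOC-CRM terms
    text: str = ""                       # optional prose shown to the user


def guardrail_violations(answer: GroundedAnswer) -> list[str]:
    """List every missing part; an empty list means the answer may be shown."""
    problems = []
    if not answer.sparql.strip():
        problems.append("missing generated SPARQL")
    if not answer.triples:
        problems.append("no result triples")
    if not answer.citation_paths:
        problems.append("no CIDOC-CRM citation paths")
    return problems


ungrounded = GroundedAnswer(sparql="", triples=[], citation_paths=[], text="An unsupported claim.")
print(guardrail_violations(ungrounded))
```

    A caller would refuse to render `text` whenever the returned list is non-empty, which keeps free-text claims without graph backing out of the UI.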

    3.7 IIIF

    • Image server: Cantaloupe 5.x.
    • Manifest serving: static (S3 + CDN) or on-the-fly via `iiif-prezi3` (Python).
    • Spec conformance: Image API 3.0, Presentation API 3.0, Change Discovery API 1.0, Content State API 1.0.

    3.8 Auth, Ops, CI/CD

    • Auth: Keycloak (OIDC) — Admin, Data Engineer, Curator, Educator, Public roles. Tenant-aware.
    • Containers: Docker + Docker Compose for dev; Helm charts on Kubernetes (DOKS / EKS / GKE) for prod.
    • Observability: OpenTelemetry → Grafana Tempo (traces) + Prometheus (metrics) + Loki (logs).
    • CI/CD: GitHub Actions. Required checks: `npm test`, `pytest`, SHACL validation of fixture records, OpenAPI lint, Playwright smoke, axe a11y.
    • Feature flags: Unleash (OSS) or GrowthBook.
    • Secrets: Doppler or HashiCorp Vault. Never commit `.env`.

    ---

    4. Linked Art Patterns (the canonical 6)

    Every entity emitted by this system MUST conform to these patterns. The validator (§17) rejects anything that violates them.

    4.1 Activity / Event Pattern

    Never link Object → Person directly. Always Object → Activity → Person.

    
    {
      "id": "https://lod.example.org/object/123",
      "type": "HumanMadeObject",
      "_label": "The Starry Night",
      "produced_by": {
        "id": "https://lod.example.org/object/123/production",
        "type": "Production",
        "carried_out_by": [{ "id": "http://vocab.getty.edu/ulan/500115588", "type": "Person", "_label": "Vincent van Gogh" }],
        "took_place_at": [{ "id": "http://vocab.getty.edu/tgn/7008038", "type": "Place", "_label": "Saint-Rémy-de-Provence" }],
        "timespan": {
          "id": "https://lod.example.org/object/123/production/timespan",
          "type": "TimeSpan",
          "begin_of_the_begin": "1889-06-01T00:00:00Z",
          "end_of_the_end": "1889-06-30T23:59:59Z",
          "identified_by": [{ "id": "https://lod.example.org/object/123/production/timespan/name", "type": "Name", "content": "June 1889" }]
        },
        "technique": [{ "id": "http://vocab.getty.edu/aat/300053349", "type": "Type", "_label": "oil painting (technique)" }]
      }
    }
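
    The same shape can be produced programmatically. The sketch below is one possible mapper helper, written with plain dicts rather than any particular library; the function names, the `example.org` URIs, and the scheme of deriving the activity URI from the object URI are assumptions made for illustration.

```python
def attach_production(obj: dict, agent: dict, place: dict, timespan: dict) -> dict:
    """Return a copy of obj whose maker is reachable only through a Production activity."""
    production = {
        # Hypothetical minting scheme: derive the activity URI from the object URI.
        "id": f"{obj['id']}/production",
        "type": "Production",
        "carried_out_by": [agent],
        "took_place_at": [place],
        "timespan": timespan,
    }
    return {**obj, "produced_by": production}


def makers(obj: dict) -> list[str]:
    """Walk Object -> Activity -> Person and return the makers' labels."""
    activity = obj.get("produced_by", {})
    return [agent["_label"] for agent in activity.get("carried_out_by", [])]


painting = attach_production(
    {"id": "https://lod.example.org/object/123", "type": "HumanMadeObject", "_label": "Example Painting"},
    agent={"id": "https://lod.example.org/person/7", "type": "Person", "_label": "Example Painter"},
    place={"id": "https://lod.example.org/place/3", "type": "Place", "_label": "Example Town"},
    timespan={
        "id": "https://lod.example.org/object/123/production/timespan",
        "type": "TimeSpan",
        "begin_of_the_begin": "1889-06-01T00:00:00Z",
        "end_of_the_end": "1889-06-30T23:59:59Z",
    },
)
print(makers(painting))
```

    Note that the object itself never gains a `carried_out_by` key; the person is only ever reached through the activity.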
    

    4.2 Classification Pattern

    Type with `classified_as` against controlled vocabularies (Getty AAT primary). Distinguish from `broader` (hierarchical placement) and `type` (CRM class).

    4.3 Core Properties Pattern

    Every entity MUST have:

    • `id` — stable, dereferenceable URI.
    • `type` — CIDOC-CRM class (Linked Art profile).
    • `_label` — human-readable summary, language-undeclared default.
    • `identified_by` — at least one `Name` (with `language` if known) and any `Identifier` (accession nums, ULAN IDs, etc.).
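
    A check for these four properties is short enough to run on every entity at ingest. The sketch below is a hedged illustration of such a pre-validation step, not the validator described in §17; the function name and the `identified_by[Name]` marker string are hypothetical.

```python
def missing_core_properties(entity: dict) -> list[str]:
    """Return the core properties an entity lacks; an empty list means it passes."""
    missing = [key for key in ("id", "type", "_label") if not entity.get(key)]
    # At least one Name with non-empty content is required among identified_by.
    names = [
        node for node in entity.get("identified_by", [])
        if node.get("type") == "Name" and node.get("content")
    ]
    if not names:
        missing.append("identified_by[Name]")
    return missing


incomplete = {"type": "Person", "identified_by": [{"type": "Identifier", "content": "500000000"}]}
print(missing_core_properties(incomplete))
```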

    4.4 Statement Pattern

    When you cannot fully structure a fact, attach a `LinguisticObject` to express it as text. Always classify the statement with an AAT term (e.g., `description`, `provenance statement`, `materials statement`).
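
    A minimal sketch of the pattern follows. It assumes statements are attached through `referred_to_by`, the property the Linked Art model uses for this purpose; the helper names and `example.org` URIs are hypothetical, and the AAT identifier used for "description" should be verified against the vocabulary before it is relied on.

```python
def make_statement(uri: str, text: str, aat_id: str, aat_label: str) -> dict:
    """Build a LinguisticObject statement classified with an AAT term."""
    return {
        "id": uri,
        "type": "LinguisticObject",
        "content": text,
        "classified_as": [
            {"id": f"http://vocab.getty.edu/aat/{aat_id}", "type": "Type", "_label": aat_label}
        ],
    }


def add_statement(entity: dict, statement: dict) -> dict:
    """Return a copy of entity with the statement appended to referred_to_by."""
    return {**entity, "referred_to_by": [*entity.get("referred_to_by", []), statement]}


painting = {"id": "https://lod.example.org/object/123", "type": "HumanMadeObject", "_label": "Example Painting"}
described = add_statement(
    painting,
    # 300435416 is assumed to be the AAT term for "description"; check before use.
    make_statement("https://lod.example.org/object/123/statement/1", "Oil on canvas, signed lower left.", "300435416", "description"),
)
print(described["referred_to_by"][0]["classified_as"][0]["_label"])
```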

    4.5 TimeSpan Pattern

    ISO 8601 (`xsd:dateTime`) in `begin_of_the_begin` / `end_of_the_end`, plus a `Name` for display. For uncertainty, widen the span; for fuzzy boundaries, also populate `end_of_the_begin` and `begin_of_the_end`.

    4.6 Expansion Pattern

    Keep primary data simple. Add nuance with `AttributeAssignment` (e.g., previous attributions, contested authorship) so consumers can ignore detail safely.

    4.7 Uncertainty & Contested Attributions

    Use `AttributeAssignment` with `assigned_property_type` and a confidence statement. Never overwrite the primary attribution; layer alternatives.

    4.8 Multilingual Names

    Multiple `Name` objects each with `language` (referenced via AAT or ISO 639). Default UI label resolution: user locale → `en` → first.
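
    The resolution order (user locale, then `en`, then first) can be sketched as below. The sketch assumes each `Name` carries a `language` array of nodes identified by AAT URIs, and it maps only two AAT language URIs to ISO 639-1 codes; the mapping table and function names are hypothetical and would need extending for real data.

```python
# Assumed AAT language URIs mapped to ISO 639-1 codes; extend as needed.
AAT_LANGUAGE_CODES = {
    "http://vocab.getty.edu/aat/300388277": "en",
    "http://vocab.getty.edu/aat/300388306": "fr",
}


def name_language(name: dict):
    """Return the ISO code of the first recognised language on a Name, else None."""
    for lang in name.get("language", []):
        code = AAT_LANGUAGE_CODES.get(lang.get("id"))
        if code:
            return code
    return None


def resolve_label(entity: dict, locale: str) -> str:
    """Pick a display label: user locale, then English, then the first Name."""
    names = [n for n in entity.get("identified_by", []) if n.get("type") == "Name"]
    primary = locale.split("-")[0].lower()  # "fr-FR" -> "fr"
    for wanted in (primary, "en"):
        for name in names:
            if name_language(name) == wanted:
                return name["content"]
    return names[0]["content"] if names else entity.get("_label", "")


work = {
    "_label": "The Starry Night",
    "identified_by": [
        {"type": "Name", "content": "The Starry Night",
         "language": [{"id": "http://vocab.getty.edu/aat/300388277", "type": "Language"}]},
        {"type": "Name", "content": "La Nuit étoilée",
         "language": [{"id": "http://vocab.getty.edu/aat/300388306", "type": "Language"}]},
    ],
}
print(resolve_label(work, "fr-FR"))
```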

    4.9 The 10 Primary Entity Boundaries (hard rule)

    The graph is segmented into exactly 10 entity types. The validator (§17) rejects records that smuggle data into the wrong boundary.

    Main entities (5):

    • `HumanMadeObject` — physical holdings (paintings, sculptures, books-as-objects, specimens, monuments).
    • `LinguisticObject` — texts and transcribed recordings (the work, not the carrier).
    • `VisualItem` — image content / iconography depicted (the work, not the carrier).
    • `DigitalObject` — surrogates: IIIF manifests, JPGs, PDFs, born-digital files.
    • `Set` — archival subdivisions and curated collections (nested for arrangement vs location).

    Connected / authority entities (5):

    • `Person` — individuals.
    • `Group` — organizations, families, studios, firms.
    • `Place` — geospatial entities with coordinates (TGN/GeoNames/PeriodO-grounded).
    • `Type` (Concepts) — AAT-aligned classifications.
    • `Activity` — macro-events: production, acquisition, exhibition, provenance transfers, destruction, conservation.

    Archives = nested `Set` objects separating physical box from intellectual arrangement. Natural-history specimens map to `HumanMadeObject` for cross-domain UI consistency. A `Person` and a `Group` are never collapsed into one record — even sole-proprietor studios.

    4.10 Anti-Patterns (forbidden)

    No multiple instantiation. Do not type a node as multiple unrelated classes. Instead partition into sub-activities. A provenance event becomes two children: `TransferOfCustody` + `Acquisition` (the transfer of title, matching the `ProvenanceEvent` contract in §5.1), each its own `Activity`. A composite acquisition is likewise partitioned into separate purchase and loan events.

    No blank nodes. Every entity — even an intermediate `TimeSpan`, `Name`, `Identifier`, or sub-`Activity` — gets a minted URI. This is what lets two records referring to the same exhibition align. Validator rejects any node lacking an `id`.

    No carrier/content collapse. The physical painting (`HumanMadeObject`) is never the depicted scene (`VisualItem`) and is never the digital JPG (`DigitalObject`). Link with `shows` / `digitally_shown_by` / `digitally_carries`.

    No silent reconciliation overrides. Never replace a curator-asserted URI in place. Add an `AttributeAssignment` recording the alternative.
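
    The blank-node rule lends itself to a simple recursive check. The sketch below treats any dict that has a `type` but no `id` as a violation and reports its JSON path; it is an illustrative pre-check with a hypothetical function name, not the validator from §17.

```python
def nodes_without_id(node, path="$"):
    """Yield the JSON path of every typed node that lacks an `id`."""
    if isinstance(node, dict):
        if "type" in node and not node.get("id"):
            yield path
        for key, value in node.items():
            yield from nodes_without_id(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from nodes_without_id(item, f"{path}[{index}]")


record = {
    "id": "https://lod.example.org/object/1",
    "type": "HumanMadeObject",
    "produced_by": {
        "type": "Production",  # no id: a blank node
        "carried_out_by": [{"id": "https://lod.example.org/person/7", "type": "Person"}],
        "timespan": {"type": "TimeSpan"},  # no id: a blank node
    },
}
print(list(nodes_without_id(record)))
# -> ['$.produced_by', '$.produced_by.timespan']
```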

    4.11 Human vs Machine Dates

    Every `TimeSpan` carries both:

    • Machine-queryable bounds: `begin_of_the_begin` + `end_of_the_end` (and optionally `end_of_the_begin` + `begin_of_the_end` for fuzzy boundaries), all `xsd:dateTime` ISO 8601.
    • Human display: a `Name` with the original phrasing ("circa 1873", "late 19th century", "after 1492").

    Do not try to derive a statistical measure of the uncertainty in "circa". Preserve the phrase in the display `Name`, widen the machine bounds by a fixed, documented convention (e.g., ±5 years for "circa"), and let the UI render both.
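
    A minimal sketch of this rule for the "circa" case is shown below. The helper name, the URI argument, and the default padding of five years are assumptions; the sketch handles only four-digit CE years and leaves the original phrase untouched.

```python
def circa_timespan(uri: str, year: int, display: str, pad_years: int = 5) -> dict:
    """Build a TimeSpan whose bounds are widened around `year`, keeping the display phrase."""
    return {
        "id": uri,
        "type": "TimeSpan",
        # Machine-queryable bounds: start of the earliest year to end of the latest year.
        "begin_of_the_begin": f"{year - pad_years:04d}-01-01T00:00:00Z",
        "end_of_the_end": f"{year + pad_years:04d}-12-31T23:59:59Z",
        # Human display: the original phrasing, preserved verbatim.
        "identified_by": [{"id": f"{uri}/name", "type": "Name", "content": display}],
    }


span = circa_timespan("https://lod.example.org/timespan/1", 1873, "circa 1873")
print(span["begin_of_the_begin"], span["end_of_the_end"])
# -> 1868-01-01T00:00:00Z 1878-12-31T23:59:59Z
```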

    ---

    5. Runtime Contracts (extend the existing `src/contracts/`)

    Keep the current convention: plain JS objects with `create()` and `validate()` — not a schema library. Add Zod mirrors for TS clients.

    5.1 Required contracts

    • `Artwork` (UI DTO — already exists)
    • `SourceRecord` (provider envelope — already exists)
    • `RightsReport` (already exists; extend with RightsStatements.org IDs and CC licenses)
    • `ImportJob` (already exists)
    • NEW `Entity` — discriminated union of `HumanMadeObject | Person | Group | Place | Type | Activity | Period | DigitalObject | LinguisticObject | VisualItem | Set`.
    • NEW `MappingTemplate` — Visual ETL mapper output, JSON-serializable rules.
    • NEW `ReconciliationCandidate` — `{ string, candidates: [{ uri, label, source, score, evidence }] }`.
    • NEW `ProvenanceEvent` — `Acquisition | TransferOfCustody | Activity` shape with buyer, seller, place, timespan, price.
    • NEW `Citation` — `{ text, source: { id, type, _label }, page?, locator? }`.
    • NEW `WikiDraft` — MediaWiki markup + citation rollup + agent review state.
    • NEW `AgentTask` — input contract, output contract, dry-run flag, audit trail.
    • NEW `ValidationReport` — `{ entityId, shaclResults[], jsonSchemaResults[], businessRules[], severity, suggestedFixes[] }`.
    • NEW `PROVRecord` — PROV-O serialization of every pipeline activity.

    5.2 Contract rules

    • Contracts MUST NOT import from `src/adapters/` or `src/utils/`. (Already enforced.)
    • Every contract exports `create<Name>`, `validate<Name>`, and a Zod schema in a parallel `.ts` file under `src/contracts/zod/`.
    • Adding a field: bump a `schemaVersion` in the contract, and write a migration in `src/utils/migrations/`.

    5.3 Backward-compatibility migration

    Use `normalizeIncomingRecord(source)` (already in repo) for any field shape change. Never write destructive migrations.

    ---

    6. Vocabularies & Authority Files (URI cheat sheet)

    | Authority | URI pattern | Access |
    |---|---|---|
    | Getty AAT | `http://vocab.getty.edu/aat/{id}` | SPARQL: `https://data.getty.edu/vocab/sparql` · OpenRefine reconcile · N-Triples dumps |
    | Getty ULAN | `http://vocab.getty.edu/ulan/{id}` (agent: `…/ulan/{id}-agent`) | same as AAT |
    | Getty TGN | `http://vocab.getty.edu/tgn/{id}` (place: `…/tgn/{id}-place`) | same |
    | Getty CONA | `http://vocab.getty.edu/page/cona/{id}` | API only (not full LOD yet) |
    | Getty Iconography | (via CONA Web Services) | API |
    | VIAF | `https://viaf.org/viaf/{id}` | REST + SRU, RDF/N-Triples dumps, no public SPARQL |
    | GeoNames | `http://sws.geonames.org/{id}/` | REST + RDF |
    | Wikidata | `http://www.wikidata.org/entity/Q{id}` | SPARQL: `https://query.wikidata.org/sparql` |
    | LoC NAF | `http://id.loc.gov/authorities/names/{id}` | REST + RDF |
    | Iconclass | `http://iconclass.org/{notation}` | REST + RDF |
    | PeriodO | `http://n2t.net/ark:/99152/p0{id}` | REST + JSON-LD |
    | RightsStatements.org | `http://rightsstatements.org/vocab/{code}/1.0/` | static RDF |
    | Creative Commons | `https://creativecommons.org/licenses/{code}/{version}/` | static |

    License notes:

    • Getty Vocabularies: ODC-By 1.0 (attribution required). XML/Relational formats retired end-2025 — must use LOD path.
    • VIAF: ODC-By 1.0. Dumps frozen as of Aug 2024 during OCLC infra work; plan a fallback to Wikidata `P214`.
    • Wikidata: CC0.
    • GeoNames: CC-BY 4.0.

    6.1 Reconciliation routing rules

    • People: ULAN (primary) → VIAF → Wikidata → LoC NAF. Merge via `owl:sameAs` cluster.
    • Places: TGN → PeriodO (for historical-period place names) → GeoNames → Wikidata. Capture coordinates; flag historical-vs-modern boundary ambiguity.
    • Concepts/Materials/Techniques: AAT → FAST → Wikidata. Iconclass for iconographic subjects.
    • Cultural objects (depicted works, monuments): CONA → Wikidata.
    • Bibliographic: LoC NAF → VIAF → FAST.
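
    The routing rules above can be read as an ordered lookup table. The sketch below assumes a hypothetical `lookup(authority, label)` callable that reads from the local authority cache and returns a URI or `None`; the kind names (`person`, `place`, and so on) are illustrative keys, not identifiers from the repo.

```python
from typing import Callable, Optional, Tuple

# Ordered authority preference per entity kind, copied from the routing rules.
ROUTES = {
    "person": ["ULAN", "VIAF", "Wikidata", "LoC NAF"],
    "place": ["TGN", "PeriodO", "GeoNames", "Wikidata"],
    "concept": ["AAT", "FAST", "Wikidata"],
    "cultural_object": ["CONA", "Wikidata"],
    "bibliographic": ["LoC NAF", "VIAF", "FAST"],
}


def first_match(
    kind: str,
    label: str,
    lookup: Callable[[str, str], Optional[str]],
) -> Optional[Tuple[str, str]]:
    """Try each authority in priority order; return (authority, uri) for the first hit."""
    for authority in ROUTES[kind]:
        uri = lookup(authority, label)
        if uri is not None:
            return authority, uri
    return None  # no authority matched; the caller decides whether to mint a local ID
```

    A production router would collect hits from every authority to build the `owl:sameAs` cluster; this sketch only shows the priority order.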

    6.2 Local-cache rule (performance + reliability)

    Never query an external authority on the request path. Download authority dumps (Wikidata, LoC, Getty N-Triples, GeoNames) into local storage on a scheduled job; reconciliation reads from the local copy. Refresh cadence: Getty monthly, Wikidata weekly delta, LoC monthly, GeoNames weekly. Track dump version in Postgres so a reconciliation result can be reproduced.

    6.3 Graph Crawling Enrichment (hierarchy generation)

    After a first authority match (say, an LoC subject heading), the pipeline crawls outward through `owl:sameAs` / `skos:exactMatch` / `wdt:P244` to collect equivalent URIs across authorities (FAST, Wikidata, AAT). It then walks `skos:broader` / `wdt:P131` (place-in-place) / `gvp:broader` to materialize the hierarchical context (e.g., Yosemite → California → Western United States → North America). These broader-context URIs are written into the materialized `Record` so faceted UIs (concertina lists, regional rollups) can be served without runtime traversal.
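
    A small sketch of the hierarchy walk, assuming the broader-term edges have already been loaded from the local authority copy into a plain dictionary. The `ex:` identifiers and the `max_depth` guard are placeholders for illustration.

```python
from typing import Dict, List


def broader_chain(uri: str, broader: Dict[str, str], max_depth: int = 10) -> List[str]:
    """Follow broader/parent edges upward and return the ancestors, nearest first.

    `broader` maps a URI to its single parent URI (a simplification: real
    vocabularies may have several parents). Cycles and runaway depth are guarded.
    """
    chain: List[str] = []
    seen = {uri}
    current = uri
    while current in broader and len(chain) < max_depth:
        parent = broader[current]
        if parent in seen:  # defensive: stop on a cycle in the source data
            break
        chain.append(parent)
        seen.add(parent)
        current = parent
    return chain


edges = {
    "ex:yosemite": "ex:california",
    "ex:california": "ex:western-united-states",
    "ex:western-united-states": "ex:north-america",
}
ancestors = broader_chain("ex:yosemite", edges)
```

    The resulting list is what would be written into the materialized `Record` so that regional rollups need no runtime traversal.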

    ---

    7. ETL Pipeline (Extract → Load → Transform → Reconcile → Publish)

    Follow ELT, not ETL: preserve raw originals before any transformation. (`alation` best-practice, confirmed by Yale LUX.)

    7.1 Pipeline DAG (Dagster assets)

    
    raw_extract (IO: MinIO/S3, immutable, hashed)
       │
       ▼
    profile (Great Expectations + spaCy entity extraction)
       │
       ▼
    canonical_intermediate (JSON, schema-validated, idempotent ID minting)
       │
       ▼
    mapped_to_linked_art (Crom + PyLD, pattern-conformant JSON-LD)
       │
       ▼
    reconciled (RecordLinkage + sentence-transformers + authority APIs)
       │              \
       │               > queue for human review if confidence < 0.85 (§7.3)
       ▼
    validated (SHACL + JSON Schema + business rules)
       │
       ▼
    materialized_record (denormalized JSON-LD doc + shortcut triples)
       │              \
       │               > emit ActivityStream item
       ▼
    indexed (OpenSearch + triple store + vector store)
       │
       ▼
    published (CDN-cacheable static JSON-LD, ActivityStream entry)
    

    Each asset is idempotent. Re-running from any stage MUST NOT create duplicates. Use SHA-256 of canonical input as the dedupe key.
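
    One way to derive such a dedupe key is sketched below. The canonical form used here (JSON with sorted keys and compact separators) is an assumption; the document does not specify the canonicalization, and a stricter scheme may be preferable in practice.

```python
import hashlib
import json


def dedupe_key(record: dict) -> str:
    """Return the SHA-256 hex digest of a canonical JSON rendering of `record`.

    Sorting keys and fixing separators makes the digest independent of
    key order and whitespace in the incoming payload.
    """
    canonical = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

    A stage can then skip any input whose key it has already processed, which is what keeps re-runs from creating duplicates.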

    7.2 ID minting policy

    • Prefer the authority URI if reconciled.
    • Otherwise mint `{base}/{type-slug}/{ulid}`.
    • Never reuse IDs across re-ingests of the same source row — use a `sameAs` cluster instead.

    7.3 Confidence thresholds

    • ≥ 0.95: auto-approve.
    • 0.85 to < 0.95: auto-approve with `ReviewFlag` and weekly digest to curators.
    • 0.70 to < 0.85: queue for human review in Curator Console.
    • < 0.70: drop the candidate, log evidence for retraining.
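
    The thresholds can be expressed as a single routing function. This sketch assumes that a score sitting exactly on a boundary belongs to the higher tier; the returned action names are hypothetical labels, not values from the repo.

```python
def route_candidate(score: float) -> str:
    """Map a reconciliation confidence score in [0, 1] to a pipeline action."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score >= 0.95:
        return "auto_approve"
    if score >= 0.85:
        return "auto_approve_flagged"  # carries a ReviewFlag, goes in the weekly digest
    if score >= 0.70:
        return "human_review"  # queued in the Curator Console
    return "drop"  # evidence is still logged for retraining
```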

    7.4 Synchronization

    • Pull updates from sources that expose IIIF Change Discovery (W3C ActivityStreams 2.0) — walk backwards through `prev` pages until you reach your last-seen cursor. Persist cursor in PostgreSQL.
    • For sources without Change Discovery (most Met / Harvard / AIC REST APIs), poll on a Dagster schedule with `If-Modified-Since` or compare hash of `_source.raw`.
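
    The backward walk for Change Discovery sources can be sketched over in-memory pages. The structure is simplified: each page holds `orderedItems` (oldest first) and a `prev` page id as a plain string, whereas real responses carry `prev` as an object and are fetched over HTTP. The function and variable names are illustrative.

```python
from typing import Dict, List, Optional


def collect_new_activities(
    pages: Dict[str, dict],
    last_page_id: str,
    cursor: Optional[str],
) -> List[dict]:
    """Walk from the newest page backwards until `cursor` (last-seen activity id).

    Returns the unseen activities in chronological order, ready to apply.
    With `cursor=None` the whole stream is returned.
    """
    newest_first: List[dict] = []
    page_id: Optional[str] = last_page_id
    while page_id is not None:
        page = pages[page_id]
        # Items are oldest-first within a page, so read them in reverse.
        for item in reversed(page["orderedItems"]):
            if item["id"] == cursor:
                return list(reversed(newest_first))
            newest_first.append(item)
        page_id = page.get("prev")
    return list(reversed(newest_first))
```

    After applying the returned activities, the id of the last one becomes the new cursor to persist.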

    7.5 Error handling

    • Every stage emits a structured error to Postgres `pipeline_errors` with `entityId`, `stage`, `error`, `inputHash`, `agentVersion`.
    • Retry policy: exponential backoff, max 5 attempts, then human queue.
    • Never drop a record silently. Failure = visible.
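
    A sketch of the retry policy, assuming a 1-second base delay (the document fixes only the attempt limit) and a hypothetical `human_queue` status for exhausted retries. The `sleep` parameter is injectable so the behavior can be tested without waiting.

```python
import time
from typing import Callable, List


def run_with_retry(
    task: Callable[[], object],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Run `task`, retrying with exponential backoff; never swallow a failure.

    Delays double each time: base, 2*base, 4*base, ... After `max_attempts`
    failures the collected errors are returned for the human review queue.
    """
    errors: List[str] = []
    for attempt in range(1, max_attempts + 1):
        try:
            return {"status": "ok", "result": task(), "attempts": attempt}
        except Exception as exc:  # every failure is recorded, none dropped
            errors.append(repr(exc))
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    return {"status": "human_queue", "errors": errors, "attempts": max_attempts}
```

    In the pipeline, each entry in `errors` would also be written to `pipeline_errors` with the stage and input hash.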

    ---

    8. Storage Strategy (multi-modal)

    8.1 The "Record" abstraction (load-bearing)

    Yale LUX's key insight: do not construct sub-graphs on the fly. At materialization time, build a self-contained JSON-LD document per entity that includes everything the UI needs in one fetch:

    • Core entity fields.
    • Pre-resolved labels for all referenced URIs.
    • HAL `_links` for navigation.
    • Shortcut properties for common queries (see §20.1).
    • Optional `_embedded` summaries for hot relationships (top 5 productions for a person, etc.).

    These documents are cached aggressively (CDN, browser, service worker).

    8.2 Triple store

    Indexes the RDF projection of every `Record` so SPARQL queries can traverse relationships. Use named graphs per source institution for provenance partitioning. Reasoning kept minimal (RDFS + SHACL); no full OWL DL at runtime.

    8.3 Search index

    Flatten each `Record` into a denormalized document for OpenSearch with:

    • `text_all` — concatenation of all `LinguisticObject` and `Name` content.
    • Facets: `type`, `classified_as.id[]`, `produced_by.timespan.year`, `produced_by.took_place_at.id[]`, `current_owner.id[]`, `material.id[]`, `rights.code`, `image_available`, `source_institution`.
    • Bidirectional pointers for "follow your nose" navigation.

    8.4 Vector index

    • One collection per entity type.
    • Embed `_label` + top-N classifications + statement texts (truncated to 512 tokens).
    • Separately embed IIIF thumbnails via SigLIP for visual similarity.
    • Always store the source URI as payload so vector hits resolve to canonical records.

    8.5 Operational state (PostgreSQL)

    • `jobs` — ImportJobs with status, error counts, lineage references.
    • `mappings` — versioned MappingTemplate rows.
    • `reconciliation_queue` — pending human reviews.
    • `audit_log` — append-only, hash-chained.
    • `users` / `roles` / `org_tenants`.

    ---

    9. API Design

    9.1 Endpoint shape (REST + JSON-LD + HAL)

    | Resource | Path | Verb | Notes |
    |---|---|---|---|
    | Object | `/api/objects/{id}` | GET | Returns Linked Art `HumanMadeObject` + HAL `_links` |
    | Object collection | `/api/objects` | GET | `?type=`, `?classified_as=`, `?date=`, `?place=`, `?material=`, `?source=`, `?page=`, `?expand=production,provenance` |
    | Person/Group | `/api/agents/{id}` | GET | |
    | Event | `/api/events/{id}` | GET | Activities, Productions, Acquisitions, Exhibitions |
    | Place | `/api/places/{id}` | GET | |
    | Concept | `/api/concepts/{id}` | GET | AAT-aligned types |
    | Set | `/api/sets/{id}` | GET | Archival groupings |
    | Search | `/api/search` | GET, POST | Mirrors OpenSearch facets; returns HAL `_links` with next/prev |
    | SPARQL | `/api/sparql` | POST | Read-only, rate-limited, query whitelist optional |
    | NL Query | `/api/ai/query` | POST | NL → HAL link or SPARQL, with explanations |
    | Reconcile | `/api/reconcile` | POST | OpenRefine reconciliation service spec |
    | Ingest | `/api/ingest/jobs` | POST | Trigger Dagster run |
    | Validate | `/api/validate` | POST | JSON Schema + SHACL |
    | Activity Stream | `/api/activity` | GET | W3C ActivityStreams 2.0, paged |
    | Best practices | `/api/best-practices` | GET | (Already exists — keep.) |
    | Roadmap | `/api/roadmap` | GET | (Already exists.) |
    | Health | `/api/health` | GET | (Already exists.) |

    Existing routes in `server.js` (`/api/explorer/`, `/api/getty/`, `/api/met/`, `/api/linked-art/`, etc.) are the MVP prototype. Keep them working. Layer new routes alongside per the §27 migration plan.

    9.2 HAL conventions

    Every response includes:

    
    {
      "_links": {
        "self":              { "href": "/api/objects/123" },
        "production":        { "href": "/api/objects/123/production" },
        "provenance":        { "href": "/api/objects/123/provenance" },
        "iiif_manifest":     { "href": "https://iiif.example.org/3/123/manifest.json" },
        "search_works_by_agent": { "href": "/api/search?agent=ulan:500115493" },
        "json_ld":           { "href": "/api/objects/123.jsonld", "type": "application/ld+json" }
      }
    }
    

    Clients MUST prefer following links over constructing query strings. This is the Yale LUX decoupling.

    9.3 Content negotiation

    Serve `application/ld+json` with the Linked Art `@context` by default. Also respect:

    • `text/turtle`
    • `application/n-quads`
    • `application/rdf+xml`
    • `application/activity+json` for ActivityStreams pages
    • `application/hal+json` for HAL-only summaries

    9.4 Pagination

    Use the W3C ActivityStreams paged-collection pattern for everything paginated, exposing `first`, `last`, `next`, `prev`, `current`, and `partOf`.

    9.5 Graph expansion

    `?expand=production,provenance,exhibitions` returns embedded sub-resources. Hard cap depth = 2. Beyond that, clients must follow links.
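
    A sketch of how the depth cap might be enforced server-side, assuming expanded sub-resources are placed under HAL `_embedded` and that a hypothetical `fetch(href)` callable returns the linked document. Past the cap, the `_links` are left in place for the client to follow.

```python
from typing import Callable, Iterable


def embed(
    doc: dict,
    fetch: Callable[[str], dict],
    expand: Iterable[str],
    depth: int = 1,
    max_depth: int = 2,
) -> dict:
    """Return a copy of `doc` with the named links embedded, at most `max_depth` levels deep."""
    out = dict(doc)  # shallow copy so the cached source document is not mutated
    if depth > max_depth:
        return out  # beyond the cap: leave links unexpanded
    names = list(expand)
    embedded = {}
    for name in names:
        link = doc.get("_links", {}).get(name)
        if link is not None:
            child = fetch(link["href"])
            embedded[name] = embed(child, fetch, names, depth + 1, max_depth)
    if embedded:
        out["_embedded"] = embedded
    return out
```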

    9.5.1 Progressive enhancement principle

    The API ramps in complexity:

    • Default tier — simple, flat, human-presentable JSON-LD. Pre-resolved labels. The kind of payload a junior developer can `console.log` and ship.
    • `?profile=expanded` — fully expanded JSON-LD with nested entities (depth ≤ 2).
    • `?profile=graph` — full RDF graph context for machine consumers.
    • `/api/sparql` — direct graph access for power users.

    Documented in OpenAPI 3.1. Default responses MUST be usable without reading the Linked Art spec — that is the bar.

    9.6 GraphQL

    Optional but recommended for the curator console. Schema mirrors the contracts in §5. Use Apollo or `graphql-yoga` with persisted queries to control surface area.

    9.7 Security & governance

    • API keys for non-anonymous tiers.
    • Rate limits: 100 req/min anonymous; 1000 req/min keyed; 60 SPARQL/min keyed.
    • CORS: whitelist per environment.
    • SPARQL query timeout: 10s default, 60s for keyed `researcher` tier.
    • Read-only by default. Write endpoints behind OIDC + RBAC.
    • Audit log: every write goes to append-only Postgres + S3 mirror (BagIt for export).

    ---

    10. AI / LLM Layer

    10.1 NL → SPARQL / NL → HAL agent

    1. Schema embeddings: precompute embeddings for every AAT/ULAN/TGN term used in the corpus + the CIDOC-CRM property list. Store in pgvector.
    2. Retrieval: query NL question → top-k schema elements (classes + properties).
    3. Prompt assembly: ground the LLM with retrieved schema, examples (few-shot), and the user's facet context.
    4. Generate: produce intermediate AST → SPARQL query OR a HAL link if a prebuilt query exists.
    5. Validate against SHACL before execution.
    6. Execute, returning rows + provenance trail.
    7. Refinement loop: if empty results, the agent re-plans (loosen filters, switch authority, federate via `SERVICE`).

    10.2 Graph-RAG chat

    • Hybrid retrieval: vector hit on text + 1-hop graph expansion.
    • Generate answer with mandatory citations: each sentence carries `[entityId, propertyPath]` references resolvable to source records.
    • Every claim must be grounded in a triple or a `LinguisticObject` statement; ungrounded claims are not emitted.

    10.3 LLM-assisted mapping

    For Visual ETL Mapper: when a user uploads a new CSV, an agent proposes a mapping template by matching column headers + sample values against past `MappingTemplate` rows and Linked Art property semantics. User accepts / edits — no auto-publish.

    10.4 LLM-assisted reconciliation

    Used as tiebreaker only. The agent receives the top-5 candidates from RecordLinkage + sentence-transformers and explains the choice. Confidence still must clear thresholds (§7.3).

    10.5 Safety

    • All agent calls are `AgentTask` contracts: input + output + dry-run.
    • `dryRun=true` is the default for any write-side agent. UI surfaces a diff before commit.
    • Cite or refuse. Hallucinated content is a P0 bug.

    ---

    11. Frontend Architecture

    11.1 App shell

    • Next.js 15 App Router.
    • Module federation (`@module-federation/nextjs-mf` 8.x) for micro-frontend composition: Public SPA, Curator Console, Researcher Workspace, Agent Workspace are independently buildable.
    • Shared component library `@metamuseum/linked-art-ui` published to a private npm registry (Verdaccio for dev, GitHub Packages for prod).

    11.2 Routing

    • Public SPA mirrors the existing `app.js` views (`Home`, `Explore`, `ArtworkDetail`, `Stories`, `Compare`, `About Data`); keep the `history.pushState` semantics. URLs: `/artwork/{id}`, `/agent/{id}`, `/place/{id}`, `/concept/{id}`, `/set/{id}`, `/event/{id}`.
    • Server falls back to `index.html` for SPA routing — already in place.

    11.3 Data fetching

    • TanStack Query with HAL link following: `useEntity(id)` returns the record + a `follow(linkName)` helper.
    • Cache key includes the entity URI + facet hash so list views and detail views share cache.
    • Persistent cache via `IDBPersist` so the SPA is usable offline-first within a session.

    11.4 Accessibility

    • WCAG 2.1 AA. Lighthouse + axe-core in CI.
    • Keyboard navigation on all interactive surfaces (already a project quality gate per `README.md`).
    • ARIA live regions for facet updates and search progress.
    • High-contrast theme.
    • Reduced-motion preference respected for timelines and force-directed graphs.

    11.5 Internationalization

    • `next-intl` for app chrome.
    • Entity labels resolved at render via §4.8 rules.
    • Vocabulary terms via AAT preferred labels in the user's locale; fallback to `en`.

    ---

    12. UI Widget Catalog (`@metamuseum/linked-art-ui`)

    Build in this order. Atomics first, macros last. Every widget MUST work with any `Entity` from §5.1 without bespoke domain logic.

    12.1 Atomic

    • `<LinkedDate timespan>` — `TimeSpan` → readable date / range / "Circa 19th C." Falls through `display_label` first.
    • `<LinkedDimensions dimensions[]>` — flattens `Dimension[]` with units. e.g. `24.5 × 30 cm (sheet)`.
    • `<LinkedAttribution object>` — Walks `produced_by → carried_out_by` for primary; surfaces secondary `AttributeAssignment`s as tooltip.
    • `<MaterialTechniqueList object>` — `made_of` + `technique` aggregator.
    • `<LinkedLabel entity locale?>` — multilingual `_label` resolver.
    • `<UriBadge uri>` — pill that shows the authority (AAT/ULAN/TGN/Wikidata/VIAF), opens external link with `rel=external noopener`.
    • `<RightsBadge rights>` — RightsStatements.org + CC.
    • `<SourceBadge source>` — provider attribution.
    • `<CitationBlock citations[]>` — bibliographic formatter (Chicago / MLA / APA toggle).

    12.2 Entity Cards (preview)

    • `<ObjectCard object>` — image, label, date, attribution, status badge, rights.
    • `<ActorCard actor>` — portrait, name, dates, nationality, role.
    • `<PlaceCard place>` — map thumbnail, name, relationship.
    • `<ConceptCard concept>` — AAT term with broader/narrower context.

    12.3 Visualizations

    • `<ProvenanceTimeline provenance>` — vertical/horizontal stepper. Click → buyer / seller / price / place. Shaded ranges for uncertainty.
    • `<ExhibitionHistoryList exhibitions[]>` — chronological, grouped by venue.
    • `<GeoMapViewer entity>` — pins for `produced_at`, `current_location`, `moved_to`. Animation time slider for object movement.
    • `<IIIFCanvasViewer manifestUrl>` — OpenSeadragon wrapper. Multi-canvas (recto/verso), annotations overlay, side-by-side compare mode.
    • `<HierarchyTree set>` — `member_of` traversal.
    • `<NetworkGraph entity depth=2>` — Cytoscape force-directed, drill-down expand, contextual JSON inspector side-panel.
    • `<TimelineRange entities[]>` — overlapping timelines across many entities, with concertina list folding. Histopedia-style rendering for creator publication histories and concurrent exhibitions.
    • `<ConcertinaList entities[] facet>` — accordion-style navigation for dense datasets (e.g., thousands of artists by birth year or nationality). Pairs with a `<FacetHistogram>` head that lets users drill into a year / decade / region; expansion reveals alphabetically sorted entries dynamically without page-by-page pagination.
    • `<FacetHistogram facet bins>` — interactive bar/density chart driving the concertina; clicking a bin filters the list below. Use for date-bands, place-rollups, material counts.
    • `<EntityKnowledgePanel entityId>` — first-order "Knowledge Panel" page for non-work entities (Persons, Places, Concepts, Groups). Merges internal catalog hits with external context: DBpedia biography summary, Wikidata influence graph, ULAN variant names, AAT broader/narrower terms. Treats authorities as peers, not footnotes.

    12.4 Metadata & Context

    • `<ClassificationsBlock classified_as[]>` — clickable AAT pills.
    • `<ReferenceList referred_to_by[]>` — citation list, deep-link to `LinguisticObject`.
    • `<IdentifierBlock identified_by[]>` — accession, ULAN, Wikidata, internal.
    • `<StatementBlock statements[]>` — `LinguisticObject` text with attribution.

    12.5 Participatory

    • `<AnnotationOverlay manifestUrl>` — W3C Web Annotations on IIIF canvas.
    • `<CrowdCorrector entityId fieldPath>` — "Suggest an Edit" modal.
    • `<StoryBuilder>` — drag-and-drop narrative composer (ObjectCard + TextBlock).
    • `<NotebookExport entities[]>` — generate a Jupyter notebook with SPARQL + Python that reproduces the user's current view.

    12.6 Implementation rules

    • All widgets accept the canonical entity shape (§5.1) — no provider-specific props.
    • All widgets degrade gracefully on missing data (use the Statement pattern §4.4 as fallback).
    • All widgets are tree-shakeable and SSR-safe.
    • All widgets ship a Storybook story + visual regression snapshot + axe a11y check.
    • All widgets emit a `data-la-entity-id` attribute for crawlers and agents.

    ---

    13. IIIF Integration

    • Image API 3.0 for high-resolution serving via Cantaloupe behind a CDN.
    • Presentation API 3.0 manifests link `HumanMadeObject` → `DigitalObject` (`digitally_shown_by`).
    • Change Discovery 1.0 for ingest sync (§7.4).
    • Content State 1.0 for deep-links into specific canvas regions.
    • Authorization 2.0 for tiered access (public / Creative Commons / member-only / restricted).
    • Provenance: capture rights at the IIIF level (`requiredStatement`, `rights`) and bubble into `RightsReport`.

    UI: `<IIIFCanvasViewer>` is the single integration point. Supports compare-mode, region annotations, and deep zoom.

    ---

    14. ActivityStreams Syndication

    • Publish a W3C ActivityStreams 2.0 endpoint at `/api/activity`.
    • Activities: `Create`, `Update`, `Delete` for every materialized record.
    • Paged collection pattern (first/last/next/prev/current/partOf).
    • Webhook subscribers register at `/api/activity/subscriptions` (admin-only).
    • Aggregators (Yale LUX-style harvesters) walk backwards via `prev` to a cursor — your endpoint MUST return stable, monotonically increasing IDs and never resequence.

    ---

    15. Search & Discovery

    15.1 Layers

    1. Keyword (OpenSearch, BM25, multi-field).
    2. Faceted (typed facets from §8.3).
    3. Semantic / vector (similar-by-text and similar-by-image).
    4. Graph (SPARQL or HAL-link-follow).
    5. NL agent (§10.1).

    15.2 Autocomplete

    • Edge n-gram analyzer on `_label` + `identified_by.content` + alternate names.
    • Type-ahead returns `{ uri, _label, type, summary }`.
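
    To illustrate what an edge n-gram analyzer produces at index time, here is a plain-Python approximation. It is not the OpenSearch analyzer itself; the whitespace tokenization and the `min_len` / `max_len` defaults are assumptions chosen for the example.

```python
from typing import List


def edge_ngrams(text: str, min_len: int = 2, max_len: int = 10) -> List[str]:
    """Return the leading-edge prefixes of each lowercase token in `text`."""
    grams: List[str] = []
    for token in text.lower().split():
        longest = min(len(token), max_len)
        for size in range(min_len, longest + 1):
            grams.append(token[:size])  # prefix anchored at the token start
    return grams


prefixes = edge_ngrams("Van Gogh")
```

    Because prefixes are indexed up front, a type-ahead query for `gog` becomes an exact term match rather than a wildcard scan.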

    15.3 "Follow your nose"

    Implement bidirectional links by querying the inverse from the target rather than embedding lists on hubs (Yale LUX rule). For example, a Concept page does not carry an array of items — it carries a HAL link `items_using_concept` that runs a parameterized OpenSearch query.

    15.4 Complexity toggle (Art Tracks model)

    Every page renders a "basic" view by default. A "See full scholarly details" toggle reveals `AttributeAssignment`s, raw JSON-LD, statements, full provenance.

    ---

    16. Curator & Researcher Tooling

    16.1 Visual ETL Mapper

    • ReactFlow canvas, columns on the left, Linked Art graph nodes on the right.
    • Drag a column → connect to a node → choose property → optional reconciliation step.
    • Save mapping as `MappingTemplate` (§5.1).
    • Dry-run preview: shows generated JSON-LD for the first 5 rows + SHACL results.
    • Run: submits a Dagster job via `/api/ingest/jobs`.

    16.2 Linked Art Form Builder (Arches / Ogee-based)

    • Built on Arches (open-source heritage data platform) with the Ogee package for Linked Art JSON-LD output. Curators see human-readable forms; the system emits compliant JSON-LD.
    • Documentation-first configuration via Zelge. Every UI field (e.g. "Birthplace") is declared in a Zelge config that maps it programmatically to a specific CIDOC-CRM semantic path (e.g. `crm:P98i_was_born / crm:P7_took_place_at / crm:E53_Place`). The form, the validator, and the JSON-LD generator all read from the same Zelge file — there is one source of truth. Repeatable patterns become reusable templates.
    • Controlled dropdowns only. All classification, typing, and role fields are restricted to a Reference Data Manager that loads local vocabularies or AAT terms as dropdowns. Free-text in a typed field is a validation error, not a warning. This is how you prevent the "1,000 spellings of 'oil paint'" problem at source.
    • Vocabulary-driven inputs auto-fetch URIs (debounced, cached against Redis URI concordance).
    • Real-time validation against `/api/validate`.
    • Domain-specific warnings: artist birthdate vs object production date, place existence in TGN at the production date (using PeriodO).
    • Outputs strict JSON-LD; never persists a freeform string where a URI is expected.

    16.3 Dynamic Graph Explorer

    • Loads JSON-LD by URL, paste, or upload.
    • Cytoscape force-directed, progressive disclosure (top-level entities first).
    • Click node → JSON inspector side-panel with full triple list.
    • Orphan-node detector.
    • Export: PNG, SVG, GraphML, JSON-LD subset.

    16.4 Sparnatural-style Visual SPARQL Builder

    • Faceted query construction grounded in the SHACL shape catalog.
    • Live preview of generated SPARQL.
    • Result preview as RDF triples table + optional CSV/Notebook export.

    16.5 Validation Console

    • Endpoint: `/api/validate` (POST). Returns `ValidationReport`.
    • UI: drag-and-drop JSON-LD, see SHACL violations grouped by shape, with suggested fixes.

    ---

    17. Validation Stack

    17.1 Layers

    1. JSON Schema (Draft 2020-12) — shape sanity checks. Mirrors the Zod schemas in §5.
    2. SHACL (PySHACL) over CIDOC-CRM 7.1 RDFS + Linked Art profile shapes — semantic constraints (cardinality, allowed `classified_as` ranges, required `identified_by` minimum, no blank nodes, 10-entity-boundary conformance, etc.).
    3. Business rules — date sanity (production within agent lifespan), place-existence-at-time, provenance gap detection.
    4. Linked Art conformance — pattern-by-pattern checks from §4 plus the anti-pattern rejections from §4.10.

    17.2 Authoring

    Maintain SHACL shapes in `shapes/linked-art/*.shacl.ttl`. CI runs PySHACL against fixtures in `fixtures/linked-art/*.jsonld`. Pull requests that change shapes require a maintainer review.

    17.3 Acceptance

    • 100% of public-facing records must validate.
    • A failing record cannot be published — only queued.
    • The system always returns a `ValidationReport` even on success, for transparency.

    ---

    18. Provenance & PROV-O

    • Every pipeline activity emits a PROV-O `Activity` to an append-only Postgres table + S3 BagIt mirror.
    • Hash-chained log (each entry includes prev hash) for tamper evidence.
    • The Curator Console renders the lineage graph (`prov:wasDerivedFrom`, `prov:wasGeneratedBy`, `prov:wasAttributedTo`) per record.
    • Audit bundle export = BagIt zip containing raw inputs, mappings, validation reports, output record.
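
    The hash-chaining idea can be sketched in a few lines. This is an in-memory illustration only: the all-zero genesis value, the field names, and the JSON canonicalization are assumptions, and a real log would live in the append-only Postgres table.

```python
import hashlib
import json
from typing import List

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with this entry's payload."""
    body = json.dumps(
        {"prev": prev_hash, "payload": payload},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(log: List[dict], payload: dict) -> dict:
    """Append a new entry that commits to the current tail of the log."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"prev": prev_hash, "payload": payload, "hash": entry_hash(prev_hash, payload)}
    log.append(entry)
    return entry


def verify_chain(log: List[dict]) -> bool:
    """Recompute every hash; any edited, removed, or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(entry["prev"], entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

    Verification only shows that the log is internally consistent; anchoring the latest hash somewhere external (for example in the S3 mirror) is what makes wholesale rewrites detectable.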

    ---

    19. Rights & Licensing

    19.1 Required fields

    Every `HumanMadeObject` and `DigitalObject` MUST carry:

    • `rights` (RightsStatements.org code or CC URI).
    • `rights_holder` (Person/Group).
    • `attribution_string` (display-ready credit line).
    • `reuse_label` (computed: `public_domain | cc_by | cc_by_sa | cc0 | in_copyright | restricted | unknown`).

    19.2 UI rules

    • A record with `reuse_label = unknown` shows an explicit "Rights unknown — do not reuse" badge.
    • Image downloads only enabled for `public_domain`, `cc0`, `cc_by`, `cc_by_sa`.
    • Every Open Access dataset (Met, CMA, Smithsonian Open Access, Rijks, MoMA CSV) carries its source attribution all the way to the UI.
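
    A sketch of how `reuse_label` might be computed and used to gate downloads. The URI-to-label mapping is an assumption made for illustration and is deliberately conservative: any rights URI it does not recognize falls back to `unknown`. The existing helpers in `src/utils/rights.js` remain the source of truth.

```python
from typing import Optional

# Labels for which image download is enabled, per the UI rules above.
DOWNLOADABLE = {"public_domain", "cc0", "cc_by", "cc_by_sa"}


def reuse_label(rights_uri: Optional[str]) -> str:
    """Derive a coarse reuse label from a RightsStatements.org or CC URI."""
    if not rights_uri:
        return "unknown"
    uri = rights_uri.lower()
    if "creativecommons.org/publicdomain/zero/" in uri:
        return "cc0"
    if "creativecommons.org/publicdomain/mark/" in uri:
        return "public_domain"
    if "creativecommons.org/licenses/by-sa/" in uri:
        return "cc_by_sa"
    if "creativecommons.org/licenses/by/" in uri:
        return "cc_by"
    if "creativecommons.org/licenses/" in uri:
        return "restricted"  # assumed: NC / ND variants are not freely downloadable
    if "rightsstatements.org/vocab/inc" in uri:
        return "in_copyright"  # covers InC and its InC-* variants
    return "unknown"


def can_download(rights_uri: Optional[str]) -> bool:
    """True only for labels on the download allow-list."""
    return reuse_label(rights_uri) in DOWNLOADABLE
```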

    19.3 Existing helpers (keep)

    • `src/utils/rights.js`, `src/adapters/getty.js::analyzeGettyRights`, `src/adapters/met.js::analyzeMetRights` already implement provider-specific logic. Extend rather than rewrite.

    ---

    20. Performance Engineering

    20.1 Shortcut triples (Yale LUX)

    At materialization, inject 1-hop predicates:

    
    :object lux:agentOfProduction :person .
    :object lux:placeOfProduction :place .
    :object lux:dateOfProductionStart "1889-06-01T00:00:00Z"^^xsd:dateTime .
    :object lux:dateOfProductionEnd   "1889-06-30T23:59:59Z"^^xsd:dateTime .
    :object lux:currentOwner :institution .
    :object lux:depicts :concept .
    

    Runtime queries that previously required 3+ joins now do single-hop lookups. The shortcut predicates live in a `lux:` (or `mm:`) namespace under `https://lod.metamuseum.org/ns/`.
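
    As a sketch of the materialization step, the function below derives shortcut triples from a Linked Art-shaped object held as a Python dictionary. The function name and the tuple representation are illustrative, and `lux:depicts` is omitted because it requires walking through the `VisualItem`.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def shortcut_triples(obj: dict) -> List[Triple]:
    """Flatten multi-hop production and ownership paths into 1-hop predicates."""
    subject = obj["id"]
    production = obj.get("produced_by", {})
    triples: List[Triple] = []
    # produced_by -> carried_out_by collapses to a direct agent link.
    for agent in production.get("carried_out_by", []):
        triples.append((subject, "lux:agentOfProduction", agent["id"]))
    # produced_by -> took_place_at collapses to a direct place link.
    for place in production.get("took_place_at", []):
        triples.append((subject, "lux:placeOfProduction", place["id"]))
    # produced_by -> timespan bounds become direct date literals.
    timespan = production.get("timespan", {})
    if "begin_of_the_begin" in timespan:
        triples.append((subject, "lux:dateOfProductionStart", timespan["begin_of_the_begin"]))
    if "end_of_the_end" in timespan:
        triples.append((subject, "lux:dateOfProductionEnd", timespan["end_of_the_end"]))
    for owner in obj.get("current_owner", []):
        triples.append((subject, "lux:currentOwner", owner["id"]))
    return triples
```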

    20.2 Caching layers

    • CDN (Cloudflare / Fastly): cache static JSON-LD with `Cache-Control: public, max-age=3600, stale-while-revalidate=86400`.
    • Browser: service worker for the SPA + IndexedDB for entity cache.
    • Redis: URI concordance, facet results (5-min TTL), HAL link cache.
    • Search: OpenSearch request cache.

    20.3 Pre-computation

    • Aggregate counts per facet are pre-computed nightly.
    • "Top 5 productions per agent", "top 5 owned objects per institution" etc. live in the materialized `Record`.

    20.4 Targets

    • API p95 < 200ms for cached `Record` hits.
    • API p95 < 500ms for cold `Record` reads.
    • Search p95 < 300ms for keyword + facet.
    • SPARQL p95 < 2s for whitelisted queries.
    • IIIF tile serving p95 < 100ms (CDN-backed).
    • Ingest: 10k records in < 5 minutes for incremental updates.
    • Uptime: 99.9% public read; 99.5% write.

    ---

    21. Observability

    • OpenTelemetry SDKs in every service.
    • Traces → Grafana Tempo. Span every adapter call, validation, SHACL evaluation, LLM call.
    • Metrics → Prometheus + Grafana. Required dashboards:
      • Ingest throughput / error rate / lag per source.
      • Reconciliation accuracy per authority.
      • API latency p50/p95/p99 per route.
      • SPARQL query mix and timeouts.
      • LLM call cost + cache hit rate.
      • Cache hit ratios (CDN, Redis, OpenSearch).
    • Logs → Loki. Structured (JSON). Required fields: `trace_id`, `tenant`, `actor`, `route`, `entity_id`.
    • Alerts: paging on validation pass rate < 99%, ingest lag > 30 min, API error rate > 1%, vector store sync lag > 10 min.

    ---

    22. Security & API Governance

    • OAuth2/OIDC via Keycloak. Roles: `admin`, `data_engineer`, `curator`, `educator`, `researcher`, `public`.
    • API keys for non-anonymous read tiers; HMAC-signed webhooks for ingest triggers.
    • Rate limits enforced at the gateway (§9.7).
    • SPARQL injection: queries pass through `rdflib`-based parser to reject `INSERT/DELETE/LOAD/CLEAR/DROP` for non-admin tokens.
    • WAF rules: block known LLM-prompt-injection patterns in query params (e.g., `ignore previous instructions`).
    • Secrets in Vault. Never in env files in repo.
    • CSP headers on the SPA. Strict `connect-src` to known authorities only.
    • IIIF Authorization 2.0 for restricted-tier images.

    ---

    23. Testing & TDD Requirements

    23.1 Required coverage

    • Adapters (`src/adapters/*`) — normalization, rights, IIIF discovery, error fallbacks.
    • Utils (`src/utils/*`) — Linked Art helpers, date / dimension parsing, HTML `describedby` discovery, URL safety.
    • Contracts — `validate` rejects malformed input, `create` deterministically produces canonical shape.
    • API — golden responses for each route, JSON-LD validates against the SHACL shapes.
    • SHACL — every shape has at least one passing and one failing fixture.
    • Frontend widgets — Storybook + axe + visual regression.
    • E2E — Playwright: search → detail → IIIF deep-zoom → graph view; agent ingest dry-run; curator form submission.
    • A11y — axe-core in CI, no AA violations.
    • Performance — k6 smoke targeting the SLOs in §20.4 on every deploy.
    • Security — `npm audit` / `pip-audit`; Semgrep ruleset for prompt-injection patterns; ZAP baseline on staging.

    23.2 Existing convention (keep)

    • Node `node:test` + `node:assert`. No external test framework. Globs avoided — each test file listed in `package.json` `scripts.test`.

    23.3 Acceptance gates per merge

    • All tests green.
    • SHACL pass on fixtures.
    • a11y pass.
    • API contract diff reviewed (OpenAPI lint).
    • Loading / empty / error / success states in any new UI.
    • Source URL, provider, rights, attribution, import timestamp preserved on any record-touching code path.
    • Agent-facing features have documented input/output contracts.

    ---

    24. Deployment Topology

    24.1 Local dev

    • `docker-compose.yml` brings up: Postgres, Redis, MinIO, GraphDB, OpenSearch, Cantaloupe, Keycloak, Dagster, the Node gateway, the Next.js frontend.
    • Sample CSV → live SPARQL endpoint within 5 minutes of `docker compose up`.

    24.2 Staging / Prod

    • Kubernetes. One namespace per environment.
    • Helm charts per service.
    • Blue/green for the API gateway. Canary for the SPA.
    • Backups: Postgres daily WAL + weekly snapshot; triple store dump weekly; OpenSearch snapshot daily to S3.
    • DR: cross-region S3 replication for raw + materialized records; RTO 4h, RPO 1h.

    24.3 Cost guidance

    • Smallest viable cloud footprint: DigitalOcean Kubernetes + GraphDB CE + managed Postgres + S3-compatible object storage. ~$50-200/mo for portfolio-scale.
    • Production at 10M+ records: budget for MarkLogic or Stardog licensing if SPARQL p95 matters.

    ---

    25. Phased Build Plan (AI-agent-friendly)

    Each phase ships an independently demoable artifact. Coding agents should pick tasks from one phase at a time and never start a later phase until the prior phase's acceptance criteria pass.

    Phase 0 — Foundations (Weeks 1-2)

    Goal: Lock the contracts and patterns. No new UI.

    Tasks:

    1. Author SHACL shapes for every entity type in `shapes/linked-art/*.shacl.ttl`.
    1. Add Zod mirrors of every contract in `src/contracts/zod/*.ts`.
    1. Extend `src/contracts/` with `Entity`, `MappingTemplate`, `ReconciliationCandidate`, `ProvenanceEvent`, `Citation`, `WikiDraft`, `AgentTask`, `ValidationReport`, `PROVRecord` (§5.1).
    1. Wire PySHACL into a Python validation microservice exposed at `/api/validate`.
    1. Add a Dagster project skeleton with one asset graph for the Met Open Access source as the canary.
    1. Replace ad-hoc fetches in `server.js` with calls into `src/adapters/*` per the existing module-dependency rules.

    Acceptance: Every existing fixture validates; new contracts have `create` + `validate` + Zod; one Dagster run materializes 10 Met records end-to-end into PostgreSQL + OpenSearch + a triple store.

    Phase 1 — Multi-modal Storage & HAL API (Weeks 3-5)

    Goal: Stand up the Yale-LUX-pattern hybrid store and HAL hypermedia.

    Tasks:

    1. Provision Postgres + Redis + OpenSearch + GraphDB + MinIO via `docker-compose.yml`.
    1. Build `src/utils/record-materializer.{js,ts}` that produces the denormalized `Record` doc + shortcut triples per §20.1.
    1. Add `/api/objects/:id`, `/api/agents/:id`, `/api/places/:id`, `/api/concepts/:id`, `/api/events/:id` with HAL `_links`.
    1. Add OpenSearch indexing on every record materialization.
    1. Add `/api/search` with faceted query, ActivityStreams paged collection responses.
    1. Add `/api/activity` syndication endpoint (paged, monotonically increasing IDs).
    1. Add CDN-friendly cache headers + ETag.
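
The paged `/api/activity` feed in the tasks above can be pictured as a sequence of ActivityStreams `OrderedCollectionPage` documents. This is a rough Python sketch under stated assumptions: the `seq` counter field, the `?page=` query parameter, and the `build_page` helper are hypothetical, and the spec itself places the real helper in `src/utils/activity-stream.js`.

```python
AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


def build_page(activities: list[dict], page: int, page_size: int, base_url: str) -> dict:
    """Build one OrderedCollectionPage from activities sorted by ascending sequence ID."""
    ordered = sorted(activities, key=lambda a: a["seq"])  # "seq" is an assumed field
    start = page * page_size
    items = ordered[start:start + page_size]
    doc = {
        "@context": AS_CONTEXT,
        "id": f"{base_url}?page={page}",
        "type": "OrderedCollectionPage",
        "partOf": base_url,
        "orderedItems": items,
    }
    # Only advertise neighbours that actually exist.
    if page > 0:
        doc["prev"] = f"{base_url}?page={page - 1}"
    if start + page_size < len(ordered):
        doc["next"] = f"{base_url}?page={page + 1}"
    return doc


activities = [
    {"seq": n, "type": "Update", "object": f"https://example.org/objects/{n}"}
    for n in (3, 1, 2, 5, 4)
]
first = build_page(activities, page=0, page_size=2, base_url="https://example.org/api/activity")
last = build_page(activities, page=2, page_size=2, base_url="https://example.org/api/activity")
print([a["seq"] for a in first["orderedItems"]], first.get("next"))
```

Because IDs only ever increase, a consumer can remember the highest `seq` it has processed and resume from there.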

    Acceptance: By following HAL links alone, a client can navigate from a search hit → object → agent → other works by agent → place of birth without constructing any SPARQL.
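
To make the Phase 1 acceptance criterion concrete, here is a small Python sketch of a client that navigates purely by link relations. It is illustrative only: the in-memory `RESOURCES` table stands in for the HTTP API, and the relation names (`item`, `creator`, `works`, `birthplace`) and sample URLs are assumptions rather than the spec's final vocabulary.

```python
# In-memory stand-in for the API: URL -> HAL document. All URLs and link
# relation names here are illustrative assumptions.
RESOURCES = {
    "/api/search?q=monet": {
        "_links": {"self": {"href": "/api/search?q=monet"},
                   "item": [{"href": "/api/objects/o1"}]},
    },
    "/api/objects/o1": {
        "_label": "Water Lilies",
        "_links": {"self": {"href": "/api/objects/o1"},
                   "creator": {"href": "/api/agents/a1"}},
    },
    "/api/agents/a1": {
        "_label": "Claude Monet",
        "_links": {"self": {"href": "/api/agents/a1"},
                   "works": {"href": "/api/search?creator=a1"},
                   "birthplace": {"href": "/api/places/p1"}},
    },
    "/api/places/p1": {
        "_label": "Paris",
        "_links": {"self": {"href": "/api/places/p1"}},
    },
}


def follow(doc: dict, rel: str) -> dict:
    """Follow one link relation; take the first target if the rel holds a list."""
    link = doc["_links"][rel]
    if isinstance(link, list):
        link = link[0]
    return RESOURCES[link["href"]]


def traverse(start_href: str, rels: list[str]) -> dict:
    """Walk a chain of link relations starting from start_href."""
    doc = RESOURCES[start_href]
    for rel in rels:
        doc = follow(doc, rel)
    return doc


place = traverse("/api/search?q=monet", ["item", "creator", "birthplace"])
print(place["_label"])
```

The client never builds a URL or a query itself; it only picks which relation to follow next.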

    Phase 2 — Ingest Pipeline & Reconciliation (Weeks 6-9)

    Goal: Curators can ingest a CSV and see it reconciled and published.

    Tasks:

    1. Implement Dagster assets for Extract → Profile → Map → Reconcile → Validate → Materialize → Index → Publish.
    1. Build the Reconciliation microservice (FastAPI) talking to Getty SPARQL, VIAF, Wikidata, GeoNames with Redis-backed URI cache.
    1. Add the human-review queue UI in the Curator Console (`/console/reconciliation`).
    1. Build the Visual ETL Mapper (ReactFlow) with `MappingTemplate` save / load / dry-run.
    1. Build the Form Builder with vocabulary-driven inputs.

    Acceptance: A user can upload `met_sample.csv`, define a mapping, hit Dry Run, see generated JSON-LD + SHACL, queue 50 records for review, approve, and see them in `/api/objects` with HAL links.
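
The Dry Run step in this acceptance flow can be sketched as a pure function from CSV rows to Linked Art JSON-LD. The Python below is a simplified illustration, assuming a hypothetical `MappingTemplate` shape (CSV column name to target slot) and a placeholder base URI; the real template format is defined elsewhere in the spec, and SHACL validation is deliberately left out.

```python
import csv
import io
import json

LINKED_ART_CONTEXT = "https://linked.art/ns/v1/linked-art.json"

# Hypothetical template shape: CSV column name -> target slot in the record.
MAPPING_TEMPLATE = {"object_id": "id", "title": "name", "accession": "identifier"}


def dry_run(csv_text: str, template: dict, base_uri: str) -> list[dict]:
    """Map each CSV row to a minimal Linked Art HumanMadeObject; nothing is persisted."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        slots = {target: row[column] for column, target in template.items()}
        records.append({
            "@context": LINKED_ART_CONTEXT,
            "id": f"{base_uri}/object/{slots['id']}",
            "type": "HumanMadeObject",
            "_label": slots["name"],
            "identified_by": [
                {"type": "Name", "content": slots["name"]},
                {"type": "Identifier", "content": slots["identifier"]},
            ],
        })
    return records


sample = "object_id,title,accession\n1,Untitled Vase,1999.1.1\n"
preview = dry_run(sample, MAPPING_TEMPLATE, "https://example.org")
print(json.dumps(preview[0], indent=2))
```

Keeping the dry run free of side effects means the same function can back both the preview pane and the eventual Map stage of the pipeline.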

    Phase 3 — Discovery UI & Visualizations (Weeks 10-13)

    Goal: Public discovery SPA, IIIF, timelines, maps, graph view.

    Tasks:

    1. Migrate the existing `public/app.js` views into a Next.js 15 App Router scaffold (route-by-route — see §27). Keep `index.html` SPA fallback working.
    1. Build `@metamuseum/linked-art-ui` v0 with all atomics (§12.1) + ObjectCard + ActorCard + PlaceCard.
    1. Add `<IIIFCanvasViewer>`, `<ProvenanceTimeline>`, `<GeoMapViewer>`, `<NetworkGraph>`.
    1. Implement the "complexity toggle" pattern globally.
    1. Storybook + a11y + visual regression for every widget.

    Acceptance: A visitor can land on `/artwork/{id}`, see image / attribution / dates / materials / rights / provenance timeline / map / network graph without any custom per-domain logic in the page.

    Phase 4 — AI Layer (Weeks 14-17)

    Goal: NL→Query, Graph-RAG chat, AI-assisted mapping.

    Tasks:

    1. Build `/api/ai/query` (NL → SPARQL/HAL) with schema embeddings stored in pgvector.
    1. Build `/api/ai/chat` with graph-grounded RAG and mandatory citations.
    1. Surface a chat affordance in the Researcher Workspace and a "Suggest mapping" affordance in the Visual ETL Mapper.
    1. Add agent telemetry: every call logs intermediate SPARQL, result counts, refinement iterations, cost.

    Acceptance: A researcher asks "Find works by French painters depicting Paris before 1900" and gets a result set with citations resolvable to canonical records, plus the SPARQL the agent generated, plus an editable link.
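
One way to enforce the mandatory-citations requirement is a guard between the model and the response. The sketch below is an assumption-laden Python illustration: the `Answer` dataclass, the `enforce_citations` helper, and the refusal message are all hypothetical, and resolving citations against a set of known IDs stands in for a real lookup against canonical records.

```python
from dataclasses import dataclass, field


@dataclass
class Answer:
    text: str
    citations: list[str] = field(default_factory=list)
    sparql: str = ""


def enforce_citations(answer: Answer, canonical_ids: set[str]) -> Answer:
    """Return the answer only if it has citations and all resolve; otherwise refuse."""
    resolvable = bool(answer.citations) and all(c in canonical_ids for c in answer.citations)
    if resolvable:
        return answer
    # Keep the generated SPARQL so the researcher can still inspect and edit it.
    return Answer(text="No grounded answer available.", citations=[], sparql=answer.sparql)


known = {"https://example.org/object/1", "https://example.org/object/2"}
query = "SELECT ?work WHERE { ?work a ?type }"
grounded = Answer("Two works match.", ["https://example.org/object/1"], query)
ungrounded = Answer("Probably many works.", ["https://example.org/object/999"], query)

print(enforce_citations(grounded, known).text)
print(enforce_citations(ungrounded, known).text)
```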

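    Phase 5 — Syndication, Meta Wiki Art Bridge, Hardening (Weeks 18-20)

    Goal: Open syndication to external consumers, ship the Meta Wiki Art bridge, and harden for production.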

    Tasks:

    1. Open the ActivityStreams subscriptions endpoint to external aggregators.
    1. Implement the Meta Wiki Art publish flow (WikiDraft → review → publish with citations).
    1. WCAG 2.1 AA audit on every public page.
    1. Pen test, k6 load test, DR drill.
    1. Public OpenAPI 3.1 docs at `/api/docs`.

    Acceptance: Uptime > 99.9% over a 30-day window with all SLOs in §20.4 met.
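
For planning purposes it helps to translate the uptime target into a downtime budget. A quick Python calculation, where the helper name `downtime_budget_minutes` is purely illustrative:

```python
def downtime_budget_minutes(target: float, window_days: int) -> float:
    """Minutes of downtime permitted while still meeting an availability target."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - target)


budget = downtime_budget_minutes(0.999, 30)
print(round(budget, 1))  # 43.2
```

Since the criterion is strictly greater than 99.9%, total downtime across the 30-day window has to stay under roughly 43 minutes, including planned maintenance unless that is excluded by definition.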

    ---

    26. KPIs & Success Metrics

    • Data quality: ≥ 80% of imported records enriched with ≥ 3 authority URIs (from ULAN, AAT, TGN, Wikidata, VIAF, GeoNames).
    • Validation: 100% pass rate on published records.
    • Reconciliation: ≥ 90% auto-approve at confidence ≥ 0.95 across the Met canary; ≥ 95% precision on human-reviewed samples.
    • Performance: §20.4 SLOs met at p95.
    • UX: 80% of users find the desired object within 3 clicks; 90% successful search rate.
    • Adoption: ≥ 3 external Linked Art systems consume the ActivityStreams feed; ≥ 50 MAU on the discovery portal in the first quarter post-launch.
    • A11y: zero AA violations.
    • AI: ≥ 70% of NL queries return non-empty grounded answers on the gold-set; cost per query < $0.01 at scale.
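
Two of these KPIs reduce to simple ratios that a nightly job could compute. The Python sketch below assumes a hypothetical `authority_uris` list on each record and a flat list of reconciliation confidences; only the 0.95 threshold and the "at least 3 URIs" rule come from the list above.

```python
AUTO_APPROVE_THRESHOLD = 0.95  # from the reconciliation KPI above


def auto_approve_rate(confidences: list[float]) -> float:
    """Share of reconciliation candidates at or above the auto-approve threshold."""
    if not confidences:
        return 0.0
    approved = sum(1 for c in confidences if c >= AUTO_APPROVE_THRESHOLD)
    return approved / len(confidences)


def enrichment_rate(records: list[dict], min_uris: int = 3) -> float:
    """Share of records carrying at least min_uris authority URIs."""
    if not records:
        return 0.0
    enriched = sum(1 for r in records if len(r.get("authority_uris", [])) >= min_uris)
    return enriched / len(records)


confidences = [0.99, 0.97, 0.95, 0.80]
records = [
    {"authority_uris": ["ulan:1", "aat:2", "tgn:3"]},  # placeholder identifiers
    {"authority_uris": ["ulan:4"]},
]
print(auto_approve_rate(confidences), enrichment_rate(records))
```

In this toy sample the auto-approve rate is 0.75 and the enrichment rate is 0.5, so both would miss their targets of 0.90 and 0.80.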

    ---

    27. Migration from the Current Prototype

    Current state (per `README.md`, `CLAUDE.md`, `AGENTS.md`, `server.js`):

    • Monolithic Node.js HTTP server, no framework, no deps.
    • Vanilla-JS SPA in `public/`.
    • JSON file storage in `storage/`.
    • Adapters for Getty and Met. Contracts in `src/contracts/`.
    • Existing routes for `/api/explorer/`, `/api/getty/`, `/api/met/`, `/api/linked-art/`, `/api/patterns`, `/api/agents/run`, `/api/content/generate`, `/api/jobs`, `/api/issues`, etc.

    Migration rules:

    1. Keep `server.js` alive as the API surface until the gateway in §9 is ready. Do not delete working routes.
    1. Add, don't replace, new routes per Phase 1. The new HAL routes (`/api/objects/:id`, etc.) live alongside the existing ones. Public SPA gradually points at the new routes; the old ones stay for backward compatibility for at least one release.
    1. Storage migration: write a one-time exporter that reads `storage/records.json` + `storage/jobs.json` and emits Linked Art JSON-LD documents into the new materialization pipeline. The current `storage/` becomes the legacy canary fixture.
    1. No Next.js until contracts are stable — explicit rule from `README.md` line ~107. Migrate route-by-route once Phase 0 acceptance passes.
    1. Provider adapter split: continue extracting Getty + Met fetch/normalize logic out of `server.js` (already a stated next-build slice in `README.md`).
    1. Tests: every migrated path adds a `node:test` case under `tests/`. Add the file path explicitly to `package.json` `scripts.test`.

    27.1 Files / dirs to add (alongside what exists)

    
    src/
      contracts/
        zod/                       # NEW — Zod mirrors of every contract
        entity.js entity.zod.ts    # NEW
        mapping-template.js
        reconciliation-candidate.js
        provenance-event.js
        citation.js
        wikidraft.js
        agent-task.js
        validation-report.js
        prov-record.js
      utils/
        record-materializer.js     # NEW — Yale-LUX-pattern denormalizer
        shortcut-triples.js        # NEW
        hal.js                     # NEW — HAL link factory
        activity-stream.js         # NEW — paged collection helper
      adapters/
        getty.js                   # existing
        met.js                     # existing
        harvard.js                 # NEW
        smithsonian.js             # NEW
        rijks.js                   # NEW
        europeana.js               # NEW
        british-museum.js          # NEW
        aic.js                     # NEW
        cma.js                     # NEW
        wikidata.js                # NEW
        viaf.js                    # NEW
    shapes/
      linked-art/
        object.shacl.ttl           # NEW
        person.shacl.ttl
        place.shacl.ttl
        activity.shacl.ttl
        digital-object.shacl.ttl
    fixtures/
      linked-art/
        pass/
        fail/
    pipeline/                      # NEW — Python / Dagster
      pyproject.toml
      dagster.yaml
      assets/
        extract.py
        profile.py
        map.py
        reconcile.py
        validate.py
        materialize.py
        index.py
        publish.py
      services/
        validation_api/
        reconciliation_api/
        nl_query_agent/
        rag_chat/
    web/                           # NEW — Next.js 15 (when Phase 3 starts)
      app/
      components/
      lib/
    packages/
      linked-art-ui/               # NEW — shared component library
        src/
        stories/
        package.json
    ops/
      docker-compose.yml
      k8s/
      helm/
    docs/
      architecture/
      contracts/
      shapes/
      api/
    

    27.2 Hard rules during migration

    • Never mutate `_source.raw`.
    • Never import across adapters (use `artwork-builder.js` as the bridge — already enforced).
    • Never create a sibling `web/` Next.js app before Phase 3 starts; contracts must be stable first (Phase 0 acceptance, per README).
    • Always keep contracts as leaf modules — no upstream deps.
    • Always add a test for any risky transform.
    • Always preserve loading / empty / error / success states on any new UI surface.
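
The first rule, never mutating `_source.raw`, is easy to violate by accident when a transform edits nested dictionaries in place. A minimal Python sketch of a copy-first transform plus a test-time guard follows; `normalize_title` and `assert_raw_unchanged` are hypothetical helpers, and the record shape is assumed.

```python
import copy


def normalize_title(record: dict) -> dict:
    """Return a new record with a cleaned title; the input, including _source.raw, is untouched."""
    result = copy.deepcopy(record)
    result["title"] = result["_source"]["raw"]["title"].strip()
    return result


def assert_raw_unchanged(transform, record: dict) -> dict:
    """Run a transform and fail loudly if it altered _source.raw on input or output."""
    snapshot = copy.deepcopy(record["_source"]["raw"])
    output = transform(record)
    if record["_source"]["raw"] != snapshot or output["_source"]["raw"] != snapshot:
        raise AssertionError("_source.raw was mutated")
    return output


record = {"_source": {"raw": {"title": "  Untitled  "}}, "title": None}
clean = assert_raw_unchanged(normalize_title, record)
print(repr(clean["title"]), repr(clean["_source"]["raw"]["title"]))
```

The normalized title lives beside the raw payload rather than replacing it, so the original provider response can always be re-derived or re-processed.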

    ---

    28. Risks & Mitigations

    | Risk | Likelihood | Impact | Mitigation |
    |---|---|---|---|
    | Legacy CMS data quality varies wildly | High | High | Three-tier confidence reconciliation (§7.3); human review queue; SHACL gate; explicit `unknown` rights state. |
    | CIDOC-CRM expertise gap | High | High | Form Builder + Visual ETL Mapper + NL→SPARQL hide complexity. Pedagogical scaffolding (Storytelling). |
    | External authorities change format / retire | Medium | Medium | Getty retires XML/Relational end-2025 — already using LOD. VIAF dumps frozen — fall back to Wikidata `P214`. Wrap every authority in an adapter with a feature flag. |
    | Pure SPARQL too slow at scale | Medium | High | Materialized `Record` docs + shortcut triples + aggressive CDN caching (§20). |
    | LLM hallucinations in Graph-RAG | High | High | Mandatory citations; "cite or refuse" rule; SHACL-validated SPARQL pre-execution; deterministic eval suite. |
    | Rights misclassification | Low | High | `unknown` is the default. Per-provider analyzers (existing). Image actions gated on `reuse_label`. |
    | Vendor lock-in (MarkLogic / Stardog) | Medium | Medium | Storage layer behind an interface; OSS fallback (GraphDB CE) tested in CI. |
    | IIIF server unavailable for partner institutions | Medium | Low | Adapter degrades gracefully to source-hosted image with attribution; manifest-on-the-fly fallback via `iiif-prezi3`. |
    | Prompt injection via NL queries | Medium | High | WAF rules; query sandbox; SPARQL-only execution path; no shell / file system tools in the discovery agent. |
    | Sustainability of links (URI rot) | High | Medium | Persistent identifier strategy; ARK or DOI for canonical records; CDN-cached `sameAs` clusters. |

    ---

    29. Glossary

    • AAT — Art & Architecture Thesaurus (Getty).
    • ActivityStreams — W3C 2.0 spec for activity feeds; basis of IIIF Change Discovery.
    • Arches — Open-source heritage data platform (Getty Conservation Institute + World Monuments Fund) used as the curator backend.
    • BagIt — IETF spec for hierarchical file packaging used for audit bundles.
    • CIDOC-CRM 7.1 — ISO 21127, the formal ontology for cultural heritage, in its current RDFS-encoded version. Linked Art is a profile of it.
    • Crom (cromulent) — Python lib for Linked Art instance creation.
    • FAST — Faceted Application of Subject Terminology (OCLC); subject headings derived from LCSH.
    • HAL — Hypertext Application Language; `_links` envelope.
    • Histopedia — Interactive timeline visualization tool used for publication histories and overlapping exhibitions.
    • Linked Art — JSON-LD profile of CIDOC-CRM optimized for developer usability.
    • LOUD — Linked Open Usable Data (Yale LUX framing).
    • Ogee — Open-source Arches package that maps Arches data entry forms to compliant Linked Art JSON-LD output.
    • PeriodO — Gazetteer of scholarly definitions of named historical periods, each with temporal and spatial coverage.
    • PROV-O — W3C Provenance Ontology.
    • SHACL — Shapes Constraint Language; W3C spec for RDF validation.
    • Shortcut triple — Materialized 1-hop predicate that collapses multi-hop CIDOC-CRM paths for query performance.
    • TGN — Thesaurus of Geographic Names (Getty).
    • ULAN — Union List of Artist Names (Getty).
    • VIAF — Virtual International Authority File (OCLC).
    • Zelge — Documentation-first configuration system. Declares the canonical mapping from each UI form field to its CIDOC-CRM semantic path; consumed by form generator, validator, and JSON-LD emitter so all three stay in sync.
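
The shortcut-triple entry is easier to grasp with a worked example. The Python sketch below collapses the two-hop path object → `produced_by` → `carried_out_by` → agent into a single hop. The predicate name `shortcut:creator`, the short identifiers, and the `materialize_shortcuts` helper are assumptions for illustration; a production version would run inside the record materializer rather than over an in-memory set.

```python
Triple = tuple[str, str, str]


def materialize_shortcuts(triples: set[Triple], path: tuple[str, str], shortcut: str) -> set[Triple]:
    """Collapse a two-hop path (s -p1-> mid -p2-> o) into one-hop triples (s, shortcut, o)."""
    first, second = path
    # Index the second hop by its subject so the join is a dictionary lookup.
    by_subject: dict[str, list[str]] = {}
    for s, p, o in triples:
        if p == second:
            by_subject.setdefault(s, []).append(o)
    return {
        (s, shortcut, target)
        for s, p, mid in triples
        if p == first
        for target in by_subject.get(mid, [])
    }


graph = {
    ("object/1", "produced_by", "production/1"),
    ("production/1", "carried_out_by", "agent/7"),
    ("object/2", "produced_by", "production/2"),  # no known maker, so no shortcut
}
shortcuts = materialize_shortcuts(graph, ("produced_by", "carried_out_by"), "shortcut:creator")
print(shortcuts)
```

A query for "works by agent/7" then matches one triple pattern instead of joining through the intermediate production node.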

    ---

    30. Open Questions (decide before Phase 0 ships)

    1. Primary triple store — MarkLogic (paid, Yale-grade) vs GraphDB CE (OSS) vs Stardog vs Fuseki? Default recommendation: GraphDB CE for the OSS path until p95 SLOs demand otherwise.
    2. Vector store — pgvector (simplest) vs Qdrant (best perf) vs Weaviate (graph-aware)? Default: pgvector until > 5M embeddings.
    3. Frontend styling — Tailwind 4 vs Chakra UI 3? Default: Tailwind 4 for new work; respect existing vanilla styles in `public/`.
    3a. Search engine — Solr 9 vs OpenSearch 2 vs Elasticsearch 8? Default for LUX alignment: Solr 9 (best nested-document faceting). Default for ops familiarity: OpenSearch 2.
    3b. Curator backend — Adopt full Arches + Ogee + Zelge, or build a lighter NestJS/Next.js curator console that emits the same JSON-LD? Default: Arches + Ogee + Zelge if a curator team exists; light console if dev-only.
    4. Orchestrator — Dagster vs Airflow? Default: Dagster for asset-centric semantics + lineage UI.
    5. Mono-repo or multi-repo? Recommendation: mono-repo with `pnpm` workspaces + `turbo` for the Node/TS side, plus a `pipeline/` Python subtree using `uv` + `pyproject.toml`.
    6. License of new code — match existing repo LICENSE.
    7. ID scheme — ULID + namespaced URIs (`https://lod.metamuseum.org/{type}/{ulid}`) — confirm domain.
    8. Meta Wiki Art namespace + citation template + agent approval gates — needs a separate spec doc (see `LinkedArtCollaboration.md` for context).

    When any of the above is decided, update §3.4 / §3.5 / §11 accordingly and remove the question from this list.
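
To help evaluate the proposed ID scheme, here is a dependency-free Python sketch of minting a ULID-based URI. It follows the ULID layout (48-bit millisecond timestamp plus 80 random bits, Crockford base32, 26 characters); the helper names `new_ulid` and `mint_uri` are hypothetical, and the base domain is the unconfirmed one from the ID scheme question above.

```python
import secrets
import time

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid(timestamp_ms=None) -> str:
    """Generate a 26-character ULID: 48-bit millisecond timestamp + 80 random bits."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    value = (timestamp_ms << 80) | secrets.randbits(80)
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 0x1F])  # lowest 5 bits first
        value >>= 5
    return "".join(reversed(chars))


def mint_uri(entity_type: str, base: str = "https://lod.metamuseum.org") -> str:
    """Build a namespaced URI of the form {base}/{type}/{ulid}."""
    return f"{base}/{entity_type}/{new_ulid()}"


print(mint_uri("object"))
```

Because the timestamp occupies the leading characters, IDs minted later sort after earlier ones, which is convenient for the monotonically increasing activity feed.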

    ---

    End of master spec. Linked context: `LinkedArtPRD.md` · `LinkedArtAppFeatures.md` · `YaleLux.md` · `LODPipeline.md` · `LODEngine.md` · `LODTools.md` · `Vocabularies.md` · `SPARQL.md` · `LinkedArtPatterns.md` · `LinkedArtDashboard.md` · `LinkedArtRoadmap.md` · `LinkedArtWidgets.md` · `LinkedArtChallenges.md` · `LinkedArtSaaS.md` · `LinkedArtUnmetNeeds.md` · `LinkedArtUseCases.md` · `LinkedArtJobReady.md` · `LinkedArtSoftware.md` · `LinkedArtCollaboration.md` · `LinkedArtFeatureRoadmap.md` · `LinkedDesign.md` · `Linked Open Art Data Web App - Must-have Data Sources.md` · `Linked Art Notes.md`.

    AI/agent quick endpoints