{"id":"linked-art/LinkedArtSOTAWebApp","relativePath":"linked-art/LinkedArtSOTAWebApp.md","title":"LinkedArt SOTA Web App — Master Build Specification","markdown":"# LinkedArt SOTA Web App — Master Build Specification\n\n> **Audience:** AI coding agents (Claude Code, Codex, Cursor, Aider) and human engineers building the next-generation Meta Museum / Linked Art platform.\n> **Status:** Authoritative consolidation of every file in `LinkedArt/`. Treat this as the single source of truth. When this document conflicts with the older `LinkedArt*.md` notes, this document wins. When this document conflicts with `CLAUDE.md` / `AGENTS.md` (repo-level conventions), the repo-level conventions win for code-style only — the architectural targets here are aspirational and supersede the current monolithic `server.js` prototype over time.\n> **Reading time budget:** ~45 min. Granular intentionally. Skim §0 → §3 → §27 first; everything else is reference depth.\n\n---\n\n## 0. TL;DR for Coding Agents\n\nYou are building a **multi-modal, event-centric, hypermedia-driven cultural-heritage discovery platform** with the following non-negotiables:\n\n1. **Linked Art JSON-LD** is the canonical data layer. Everything serializes to / from it.\n2. **CIDOC-CRM 7.1 (RDFS encoding)** + Linked Art profile. Event-centric — never collapse production / acquisition / exhibition into flat fields.\n3. **Hybrid storage**: pre-materialized **JSON-LD \"Record\" documents** + RDF triple index + full-text/faceted index (Solr or OpenSearch) + vector index. Translating SPARQL graph queries into faceted UI results is non-trivial and requires this dual approach. (Yale LUX pattern — see §8.)\n4. **HAL hypermedia API** — clients follow named `_links`, they do not construct SPARQL. API design is **progressive enhancement**: simple, human-readable JSON-LD payloads first; full machine-readable graph network on opt-in.\n5. **Shortcut triples** materialized at ingest time so runtime queries are single-hop. 
(See §20.1.)\n6. **Rights-aware by default** — every record, image, and field carries reuse status (RightsStatements.org + CC).\n7. **Component library is data-pattern-driven**, not domain-typed. A `<LinkedDate>` renders for a fossil, a painting, or a book identically.\n8. **Reconciliation against Getty AAT/ULAN/TGN, PeriodO, VIAF, Wikidata, GeoNames, FAST, LoC** is part of the ingest path, not an afterthought. External authority dumps are downloaded and cached locally — never queried on the fly at request time.\n9. **NL → SPARQL / NL → HAL** with grounded Graph-RAG is the primary discovery affordance for non-technical users. Visual SPARQL builders (Sparnatural-style) are the secondary affordance.\n10. **WCAG 2.1 AA**, i18n via `language`-tagged `Name` objects, and 99.9% uptime are gates, not stretch goals.\n11. **10 primary entity boundaries, hard.** 5 main: `HumanMadeObject`, `LinguisticObject`, `VisualItem`, `DigitalObject`, `Set`. 5 connected/authority: `Person`, `Group`, `Place`, `Type` (Concepts), `Activity` (macro-events incl. provenance, destruction). (See §4.9.)\n12. **Modeling anti-patterns are forbidden.** No multiple instantiation (use partitioning into sub-activities — see §4.10). No blank nodes — every entity gets a minted URI. Always separate carrier (HumanMadeObject) from content (VisualItem / LinguisticObject) from surrogate (DigitalObject).\n\nExisting repo (`server.js`, `src/adapters/getty.js`, `src/adapters/met.js`, `public/app.js`) is the **MVP prototype**. Do **not** prematurely migrate to Next.js (see `README.md` line ~107). Migrate route-by-route only once the contracts in §5 are stable.\n\n---\n\n## 1. Product Vision & Scope\n\n### 1.1 Mission\nUnify cultural-heritage data across museums, libraries, archives, and natural-history collections into a single discoverable, queryable, and machine-actionable graph using the Linked Art profile of CIDOC-CRM. 
Cover ~90% of common use cases with ~10% of the ontology complexity.\n\n### 1.2 In-Scope (MVP → v1)\n- Ingest from JSON APIs, CSV, XML, MARC, IIIF manifests.\n- Normalize to Linked Art JSON-LD with strict pattern conformance.\n- Reconcile entities to Getty AAT/ULAN/TGN, Wikidata, VIAF, GeoNames, LoC.\n- Public discovery UI: faceted search, entity hubs, timelines, maps, network graphs.\n- Curator tools: form-based editor, visual ETL mapper, graph debugger.\n- Public read API: REST + JSON-LD + HAL + SPARQL + GraphQL.\n- AI layer: NL→Query, Graph-RAG chat with verifiable CIDOC-CRM citations.\n- IIIF integration: deep-zoom, side-by-side compare, IIIF Change Discovery sync.\n- ActivityStreams 2.0 syndication for downstream aggregators.\n\n### 1.3 Out-of-Scope (explicitly)\n- Replacing institutional CMS (TMS, EMu, Adlib, Qi, CollectionSpace). This is a **downstream publication layer**.\n- Image hosting — defer to IIIF Image API servers (Cantaloupe, IIPImage, Loris).\n- Native audio/video ingestion in v1 (metadata only).\n- Full CIDOC-CRM implementation. Linked Art profile only.\n- Upstream collection management workflows.\n- Full ML model training (use off-the-shelf embeddings and fine-tunes).\n\n### 1.4 Scale Targets (Yale LUX-derived)\n- **Phase 3 target**: ~15M metadata records.\n- **Phase 5 target**: ~45M total entities (object records plus authority entities and reconciled `sameAs` clusters).\n- **Reference production sizing**: a storage tier of 3× 16-vCPU nodes (AWS m6i.4xlarge equivalent) delivers sub-second p95 latency for the common query set. 
Scale horizontally; the storage tier is the bottleneck, not the API gateway.\n- Triple count at the upper bound approaches ~2B triples once shortcut predicates (§20.1) are materialized — plan storage IOPS accordingly.\n\n### 1.5 Personas (build-driving)\n| Persona | Primary need | Tool surface |\n|---|---|---|\n| Museum Curator / Cataloger | Publish authoritative metadata without writing JSON-LD | Form Builder, Visual ETL Mapper |\n| Digital Humanities Researcher | Cross-collection graph queries, provenance chains | Sparnatural-style builder, NL→SPARQL, Notebook exports |\n| Software Developer | Predictable REST + JSON-LD APIs | HAL-linked endpoints, GraphQL, OpenAPI/SHACL contracts |\n| Public Visitor / Student | Visual exploration, no graph literacy required | Discovery SPA, IIIF deep-zoom, timelines, storytelling |\n| AI Agent (claude/codex) | Programmatic ingest, validation, content generation | Agent endpoints, contract-validated I/O, dry-run modes |\n\n---\n\n## 2. Reference Architecture (Yale-LUX-derived, modernized)\n\n```\n            ┌──────────────────────────────────────────────────────┐\n            │  External Sources                                    │\n            │  Getty REST/IIIF/SPARQL · Met Open Access · Harvard  │\n            │  SAAM · Walters · YCBA · Rijks · Europeana · British │\n            │  Museum · AIC · V&A · Tate · MoMA · CMA · LoC ·      │\n            │  Wikidata · DBpedia · VIAF · GeoNames · PeriodO      │\n            └────────────┬─────────────────────────────────────────┘\n                         │ IIIF Change Discovery (W3C ActivityStreams 2.0)\n                         │ + REST polling + SPARQL federation\n                         ▼\n            ┌──────────────────────────────────────────────────────┐\n            │  Ingestion (Python)                                  │\n            │  Dagster / Airflow / Temporal orchestrator           │\n            │  ├─ Extract  → MinIO/S3 raw lake (immutable)         │\n            │  
├─ Profile  → Great Expectations + spaCy            │\n            │  ├─ Map      → Crom / PyLD / custom mappers          │\n            │  ├─ Reconcile→ RecordLinkage + sentence-transformers │\n            │  └─ Materialize \"Record\" + shortcut triples          │\n            │  State: PostgreSQL (jobs/lineage) + Redis (URI cache)│\n            └────────────┬─────────────────────────────────────────┘\n                         │ JSON-LD records + N-Quads\n                         ▼\n            ┌──────────────────────────────────────────────────────┐\n            │  Multi-Modal Storage                                 │\n            │  ┌─────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐  │\n            │  │MarkLogic│ │  Triple  │ │Elastic / │ │ pgvector │  │\n            │  │  -OR-   │ │  store   │ │OpenSearch│ │/Qdrant/  │  │\n            │  │ GraphDB │ │ (Fuseki/ │ │  (text + │ │ Weaviate │  │\n            │  │  +PG    │ │  Stardog)│ │  facet)  │ │ (vectors)│  │\n            │  └─────────┘ └──────────┘ └──────────┘ └──────────┘  │\n            │  All four stay in sync via Kafka/NATS event bus      │\n            └────────────┬─────────────────────────────────────────┘\n                         │ Static JSON-LD docs (CDN-cacheable)\n                         │ HAL _links · SPARQL · GraphQL · ActivityStreams\n                         ▼\n            ┌──────────────────────────────────────────────────────┐\n            │  API Gateway (Kong / Tyk / Cloudflare)               │\n            │  · OAuth2/OIDC (Keycloak) · API keys · rate-limit    │\n            │  · CORS · OpenAPI 3.1 · WAF · request tracing        │\n            └────────────┬─────────────────────────────────────────┘\n                         ▼\n            ┌──────────────────────────────────────────────────────┐\n            │  Application Services                                │\n            │  · Discovery API     (Node/NestJS or Python/FastAPI) │\n            │  · NL→Query Agent    (LLM + schema 
embeddings)       │\n            │  · Graph-RAG Chat    (vector + triple + citations)   │\n            │  · Validation API    (PySHACL + JSON Schema)         │\n            │  · Reconciliation API(Getty / VIAF / Wikidata)       │\n            │  · IIIF Proxy        (Cantaloupe + auth)             │\n            │  · ActivityStreams   (publishing endpoint)           │\n            └────────────┬─────────────────────────────────────────┘\n                         ▼\n            ┌──────────────────────────────────────────────────────┐\n            │  Frontends (Next.js 15 App Router + TS)              │\n            │  · Public SPA (discovery, IIIF, timelines, maps)     │\n            │  · Curator Console (Form Builder, ETL Mapper)        │\n            │  · Researcher Workspace (Graph Explorer, NB exports) │\n            │  · Agent Workspace (job runner, validation reports)  │\n            │  · Meta Wiki Art bridge (MediaWiki publish layer)    │\n            │  Module-federated micro-frontends, shared component  │\n            │  library (`@metamuseum/linked-art-ui`)               │\n            └──────────────────────────────────────────────────────┘\n```\n\n### 2.1 Architectural Invariants\n1. **Adapters do not import each other.** Bridge via `src/utils/artwork-builder.js` (already enforced — keep it).\n2. **Contracts are leaf modules.** No deps on adapters or utils.\n3. **Provider logic stays in adapters.** `server.js` (or future API gateway) only orchestrates.\n4. **Source snapshots are immutable.** `_source.raw` preserves the original provider response — transform at read time, never mutate stored records in place.\n5. **Linked Art JSON-LD is canonical.** UI DTOs (`Artwork`) are separate; map at the boundary.\n6. **Every write path is idempotent and reprocessable from any stage.** No destructive transformations without a recoverable upstream artifact in MinIO/S3.\n\n---\n\n## 3. 
Tech Stack (concrete, version-pinned)\n\n### 3.1 Languages & Runtimes\n- **Node.js ≥ 22 LTS** — public API gateway and lightweight services (continuity with current `server.js`).\n- **Python 3.12+** — ETL, reconciliation, validation, LLM tooling.\n- **TypeScript 5.4+** — all new frontend and any new TS services. `strict: true` is non-negotiable.\n\n### 3.2 Frontend\n- **Next.js 15+** (App Router, RSC). SSR for entity pages, SSG for vocabularies, ISR for facets.\n- **React 19**.\n- **Styling**: Tailwind CSS 4 (preferred) **or** Chakra UI 3. Pick one per project; do not mix.\n- **State**: TanStack Query 5 for server state; Zustand for local UI state. Avoid Redux for v1.\n- **Forms**: React Hook Form + Zod schemas mirroring the contracts in §5.\n- **Visualization**:\n  - `d3` 7.x for bespoke chart primitives.\n  - `@visx/*` for composable charts.\n  - `cytoscape` + `cytoscape-cose-bilkent` for force-directed graphs.\n  - `reactflow` 11.x for the Visual ETL Mapper drag-and-drop canvas.\n  - `react-leaflet` 5.x (4.x targets React 18) + `leaflet` 1.9.x (or `mapbox-gl` 3.x) for maps. 
Prefer Leaflet in development, since it needs no account or access token.\n  - `openseadragon` 4.x for IIIF deep-zoom; wrap in `<IIIFCanvasViewer>`.\n  - `vis-timeline` 7.x or `@nivo/timeline` for chronologies.\n- **IIIF helpers**: `manifesto.js`, `@iiif/presentation-3` types.\n- **Testing**: Vitest + Testing Library + Playwright for e2e + `axe-core` for a11y.\n\n### 3.3 Backend Services\n- **Python**: FastAPI 0.110+, Pydantic v2, uvicorn, httpx, tenacity, structlog.\n- **Linked Art libs**: `cromulent` (Crom) for instantiation, `PyLD` for JSON-LD framing/compaction/expansion, `rdflib` 7.x for triple manipulation, `pyshacl` for SHACL.\n- **Node**: NestJS 10 (if you choose Node for the gateway) or stay vanilla-Node per current `server.js` conventions until contracts are stable.\n\n### 3.4 Storage\n- **Primary**: choose one of:\n  - **MarkLogic** (commercial, Yale LUX-grade — multi-model out of the box, CTS query language for hybrid text+graph).\n  - **GraphDB** (Ontotext, free edition, SPARQL 1.1 + Lucene plugin) — **recommended for the no-license-cost path** (free to use, but not open source).\n  - **Stardog** (commercial, strong reasoning).\n  - **Apache Jena Fuseki + Lucene** for pure-OSS minimal setups.\n- **Document store** (if NOT MarkLogic): **PostgreSQL 16 + JSONB** for materialized `Record` documents and operational state (jobs, lineage, RBAC).\n- **Search / facets**: **Apache Solr 9.x** (the Yale LUX choice, strongest JSON-LD-friendly faceting + nested document support), **OpenSearch 2.x**, or **Elasticsearch 8.x**. Stores `Record`-derived flattened search docs. Pick one per environment. Default for the OSS LUX-aligned path: **Solr 9.x**.\n- **Vectors**: **pgvector** (simple) **or** **Qdrant** / **Weaviate** (scale). 
Stores embeddings for `LinguisticObject` text, `VisualItem` image descriptors, and entity summaries.\n- **Cache**: **Redis 7** — URI concordance (auth-string → URI), facet results, HAL link cache.\n- **Object storage**: **MinIO** (dev/self-host) or **S3** for raw extracts, ETL outputs, image originals, BagIt audit bundles.\n\n### 3.5 Pipeline & Eventing\n- **Orchestrator**: **Dagster** (preferred — asset-centric, strong lineage UI) **or** Apache Airflow 2.9+. For long-running curator workflows that need durable retries, **Temporal**.\n- **Stream**: **Kafka** (Confluent or Redpanda) or **NATS JetStream** for ingest → reconcile → publish events.\n- **Validation**: PySHACL + Great Expectations.\n- **Reconciliation**: `recordlinkage`, `rapidfuzz`, `sentence-transformers` (all-MiniLM-L6-v2 baseline; instructor-xl for higher fidelity).\n\n### 3.6 AI / LLM Stack\n- **Primary model**: Claude (Sonnet 4.6 default; Opus 4.7+ for complex SPARQL synthesis).\n- **Embeddings**: `voyage-3` (cultural-heritage-friendly), fallback `text-embedding-3-large`. Image embeddings via SigLIP or OpenCLIP for \"find similar\" by IIIF region.\n- **Agents**: Claude Agent SDK / LangGraph. Tool-use endpoints described in §10.\n- **Guardrails**: every NL→SPARQL must return (a) the SPARQL it generated, (b) the resulting triples, (c) the citation paths in CIDOC-CRM terms. No free-text claims without graph backing.\n\n### 3.7 IIIF\n- **Image server**: Cantaloupe 5.x.\n- **Manifest serving**: static (S3 + CDN) or on-the-fly via `iiif-prezi3` (Python).\n- **Spec conformance**: Image API 3.0, Presentation API 3.0, Change Discovery API 1.0, Content State API 1.0.\n\n### 3.8 Auth, Ops, CI/CD\n- **Auth**: Keycloak (OIDC) — Admin, Data Engineer, Curator, Educator, Public roles. 
Tenant-aware.\n- **Containers**: Docker + Docker Compose for dev; Helm charts on Kubernetes (DOKS / EKS / GKE) for prod.\n- **Observability**: OpenTelemetry → Grafana Tempo (traces) + Prometheus (metrics) + Loki (logs).\n- **CI/CD**: GitHub Actions. Required checks: `npm test`, `pytest`, SHACL validation of fixture records, OpenAPI lint, Playwright smoke, axe a11y.\n- **Feature flags**: Unleash (OSS) or GrowthBook.\n- **Secrets**: Doppler or HashiCorp Vault. Never commit `.env`.\n\n---\n\n## 4. Linked Art Patterns (the canonical 6)\n\nEvery entity emitted by this system MUST conform to these patterns. The validator (§17) rejects anything that violates them.\n\n### 4.1 Activity / Event Pattern\nNever link Object → Person directly. Always Object → Activity → Person.\n\n```jsonc\n{\n  \"id\": \"https://lod.example.org/object/123\",\n  \"type\": \"HumanMadeObject\",\n  \"_label\": \"The Starry Night\",\n  \"produced_by\": {\n    \"type\": \"Production\",\n    \"carried_out_by\": [{ \"id\": \"http://vocab.getty.edu/ulan/500115493\", \"type\": \"Person\", \"_label\": \"Vincent van Gogh\" }],\n    \"took_place_at\": [{ \"id\": \"http://vocab.getty.edu/tgn/7008038\", \"type\": \"Place\", \"_label\": \"Saint-Rémy-de-Provence\" }],\n    \"timespan\": {\n      \"type\": \"TimeSpan\",\n      \"begin_of_the_begin\": \"1889-06-01T00:00:00Z\",\n      \"end_of_the_end\":   \"1889-06-30T23:59:59Z\",\n      \"identified_by\": [{ \"type\": \"Name\", \"content\": \"June 1889\" }]\n    },\n    \"technique\": [{ \"id\": \"http://vocab.getty.edu/aat/300053349\", \"type\": \"Type\", \"_label\": \"oil painting (technique)\" }]\n  }\n}\n```\n\n### 4.2 Classification Pattern\nType with `classified_as` against controlled vocabularies (Getty AAT primary). 
Distinguish from `broader` (hierarchical placement) and `type` (CRM class).\n\n### 4.3 Core Properties Pattern\nEvery entity MUST have:\n- `id` — stable, dereferenceable URI.\n- `type` — CIDOC-CRM class (Linked Art profile).\n- `_label` — human-readable summary, language-undeclared default.\n- `identified_by` — at least one `Name` (with `language` if known) and any `Identifier` (accession nums, ULAN IDs, etc.).\n\n### 4.4 Statement Pattern\nWhen you cannot fully structure a fact, attach a `LinguisticObject` to express it as text. Always classify the statement with an AAT term (e.g., `description`, `provenance statement`, `materials statement`).\n\n### 4.5 TimeSpan Pattern\nISO 8601 (`xsd:dateTime`) in `begin_of_the_begin` / `end_of_the_end`, plus a `Name` for display. For uncertainty, widen the span; for fuzzy boundaries, also populate `end_of_the_begin` and `begin_of_the_end`.\n\n### 4.6 Expansion Pattern\nKeep primary data simple. Add nuance with `AttributeAssignment` (e.g., previous attributions, contested authorship) so consumers can ignore detail safely.\n\n### 4.7 Uncertainty & Contested Attributions\nUse `AttributeAssignment` with `assigned_property_type` and a confidence statement. Never overwrite the primary attribution; layer alternatives.\n\n### 4.8 Multilingual Names\nMultiple `Name` objects each with `language` (referenced via AAT or ISO 639). Default UI label resolution: user locale → `en` → first.\n\n### 4.9 The 10 Primary Entity Boundaries (hard rule)\nThe graph is segmented into exactly 10 entity types. 
The validator (§17) rejects records that smuggle data into the wrong boundary.\n\n**Main entities (5):**\n- `HumanMadeObject` — physical holdings (paintings, sculptures, books-as-objects, specimens, monuments).\n- `LinguisticObject` — texts and transcribed recordings (the work, not the carrier).\n- `VisualItem` — image content / iconography depicted (the work, not the carrier).\n- `DigitalObject` — surrogates: IIIF manifests, JPGs, PDFs, born-digital files.\n- `Set` — archival subdivisions and curated collections (nested for arrangement vs location).\n\n**Connected / authority entities (5):**\n- `Person` — individuals.\n- `Group` — organizations, families, studios, firms.\n- `Place` — geospatial entities with coordinates (TGN/GeoNames/PeriodO-grounded).\n- `Type` (Concepts) — AAT-aligned classifications.\n- `Activity` — macro-events: production, acquisition, exhibition, provenance transfers, destruction, conservation.\n\nArchives are modeled as nested `Set` objects that separate the physical box from the intellectual arrangement. Natural-history specimens map to `HumanMadeObject` for cross-domain UI consistency. A `Person` and a `Group` are never collapsed into one record — even sole-proprietor studios.\n\n### 4.10 Anti-Patterns (forbidden)\n**No multiple instantiation.** Do not type a node as multiple unrelated classes. Instead **partition** into sub-activities. A provenance event becomes two children, each its own sub-`Activity`: `TransferOfCustody` (change of physical custody) + `Acquisition` (transfer of title). A composite acquisition becomes `Purchase` + `Loan` partitioned events.\n\n**No blank nodes.** Every entity — even an intermediate `TimeSpan`, `Name`, `Identifier`, or sub-`Activity` — gets a minted URI. This is what lets two records referring to the same exhibition align. The validator rejects any node lacking an `id`.\n\n**No carrier/content collapse.** The physical painting (`HumanMadeObject`) is never the depicted scene (`VisualItem`) and is never the digital JPG (`DigitalObject`). 
Link with `shows` / `digitally_shown_by` / `digitally_carries`.\n\n**No silent reconciliation overrides.** Never replace a curator-asserted URI in place. Add an `AttributeAssignment` recording the alternative.\n\n### 4.11 Human vs Machine Dates\nEvery `TimeSpan` carries **both**:\n- Machine-queryable bounds: `begin_of_the_begin` + `end_of_the_end` (and optionally `end_of_the_begin` + `begin_of_the_end` for fuzzy boundaries), all `xsd:dateTime` ISO 8601.\n- Human display: a `Name` with the original phrasing (\"circa 1873\", \"late 19th century\", \"after 1492\").\n\nNever attempt to mathematically quantify the uncertainty of \"circa\" — preserve the phrase, widen the bounds to a sensible range (e.g., ±5 years for \"circa\"), and let the UI render both.\n\n---\n\n## 5. Runtime Contracts (extend the existing `src/contracts/`)\n\nKeep the current convention: plain JS objects with `create*()` and `validate*()` — not a schema library. Add Zod mirrors for TS clients.\n\n### 5.1 Required contracts\n- `Artwork` (UI DTO — already exists)\n- `SourceRecord` (provider envelope — already exists)\n- `RightsReport` (already exists; extend with RightsStatements.org IDs and CC licenses)\n- `ImportJob` (already exists)\n- **NEW** `Entity` — discriminated union of `HumanMadeObject | Person | Group | Place | Type | Activity | Period | DigitalObject | LinguisticObject | VisualItem | Set`.\n- **NEW** `MappingTemplate` — Visual ETL mapper output, JSON-serializable rules.\n- **NEW** `ReconciliationCandidate` — `{ string, candidates: [{ uri, label, source, score, evidence }] }`.\n- **NEW** `ProvenanceEvent` — `Acquisition | TransferOfCustody | Activity` shape with buyer, seller, place, timespan, price.\n- **NEW** `Citation` — `{ text, source: { id, type, _label }, page?, locator? 
}`.\n- **NEW** `WikiDraft` — MediaWiki markup + citation rollup + agent review state.\n- **NEW** `AgentTask` — input contract, output contract, dry-run flag, audit trail.\n- **NEW** `ValidationReport` — `{ entityId, shaclResults[], jsonSchemaResults[], businessRules[], severity, suggestedFixes[] }`.\n- **NEW** `PROVRecord` — PROV-O serialization of every pipeline activity.\n\n### 5.2 Contract rules\n- Contracts MUST NOT import from `src/adapters/*` or `src/utils/*`. (Already enforced.)\n- Every contract exports `create<Name>`, `validate<Name>`, and a Zod schema in a parallel `.ts` file under `src/contracts/zod/`.\n- Adding a field: bump a `schemaVersion` in the contract, and write a migration in `src/utils/migrations/`.\n\n### 5.3 Backward-compatibility migration\nUse `normalizeIncomingRecord(source)` (already in repo) for any field shape change. Never write destructive migrations.\n\n---\n\n## 6. Vocabularies & Authority Files (URI cheat sheet)\n\n| Authority | URI pattern | Access |\n|---|---|---|\n| Getty AAT | `http://vocab.getty.edu/aat/{id}` | SPARQL: `https://data.getty.edu/vocab/sparql` · OpenRefine reconcile · N-Triples dumps |\n| Getty ULAN | `http://vocab.getty.edu/ulan/{id}` (agent: `…/ulan/{id}-agent`) | same as AAT |\n| Getty TGN | `http://vocab.getty.edu/tgn/{id}` (place: `…/tgn/{id}-place`) | same |\n| Getty CONA | `http://vocab.getty.edu/page/cona/{id}` | API only (not full LOD yet) |\n| Getty Iconography | (via CONA Web Services) | API |\n| VIAF | `https://viaf.org/viaf/{id}` | REST + SRU, RDF/N-Triples dumps, no public SPARQL |\n| GeoNames | `http://sws.geonames.org/{id}/` | REST + RDF |\n| Wikidata | `http://www.wikidata.org/entity/Q{id}` | SPARQL: `https://query.wikidata.org/sparql` |\n| LoC NAF | `http://id.loc.gov/authorities/names/{id}` | REST + RDF |\n| Iconclass | `http://iconclass.org/{notation}` | REST + RDF |\n| PeriodO | `http://n2t.net/ark:/99152/p0{id}` | REST + JSON-LD |\n| RightsStatements.org | 
`http://rightsstatements.org/vocab/{code}/1.0/` | static RDF |\n| Creative Commons | `https://creativecommons.org/licenses/{code}/{version}/` | static |\n\nLicense notes:\n- Getty Vocabularies: **ODC-By 1.0** (attribution required). XML/Relational formats retired end-2025 — **must use LOD path**.\n- VIAF: **ODC-By 1.0**. Dumps frozen as of Aug 2024 during OCLC infra work; plan a fallback to Wikidata `P214`.\n- Wikidata: **CC0**.\n- GeoNames: **CC-BY 4.0**.\n\n### 6.1 Reconciliation routing rules\n- **People**: ULAN (primary) → VIAF → Wikidata → LoC NAF. Merge via `owl:sameAs` cluster.\n- **Places**: TGN → PeriodO (for historical-period place names) → GeoNames → Wikidata. Capture coordinates; flag historical-vs-modern boundary ambiguity.\n- **Concepts/Materials/Techniques**: AAT → FAST → Wikidata. Iconclass for iconographic subjects.\n- **Cultural objects (depicted works, monuments)**: CONA → Wikidata.\n- **Bibliographic**: LoC NAF → VIAF → FAST.\n\n### 6.2 Local-cache rule (performance + reliability)\n**Never query an external authority on the request path.** Download authority dumps (Wikidata, LoC, Getty N-Triples, GeoNames) into local storage on a scheduled job; reconciliation reads from the local copy. Refresh cadence: Getty monthly, Wikidata weekly delta, LoC monthly, GeoNames weekly. Track dump version in Postgres so a reconciliation result can be reproduced.\n\n### 6.3 Graph Crawling Enrichment (hierarchy generation)\nAfter a first authority match (say, an LoC subject heading), the pipeline crawls outward through `owl:sameAs` / `skos:exactMatch` / `wdt:P244` to collect equivalent URIs across authorities (FAST, Wikidata, AAT). It then walks `skos:broader` / `wdt:P131` (place-in-place) / `gvp:broader` to materialize the hierarchical context (e.g., **Yosemite → California → Western United States → North America**). 
These broader-context URIs are written into the materialized `Record` so faceted UIs (concertina lists, regional rollups) can be served without runtime traversal.\n\n---\n\n## 7. ETL Pipeline (Extract → Load → Transform → Reconcile → Publish)\n\nFollow ELT, not ETL: preserve raw originals before any transformation. (Alation best practice, confirmed by Yale LUX.)\n\n### 7.1 Pipeline DAG (Dagster assets)\n\n```\nraw_extract (IO: MinIO/S3, immutable, hashed)\n   │\n   ▼\nprofile (Great Expectations + spaCy entity extraction)\n   │\n   ▼\ncanonical_intermediate (JSON, schema-validated, idempotent ID minting)\n   │\n   ▼\nmapped_to_linked_art (Crom + PyLD, pattern-conformant JSON-LD)\n   │\n   ▼\nreconciled (RecordLinkage + sentence-transformers + locally cached authority dumps)\n   │              \\\n   │               > queue for human review if 0.70 ≤ confidence < 0.85 (see §7.3)\n   ▼\nvalidated (SHACL + JSON Schema + business rules)\n   │\n   ▼\nmaterialized_record (denormalized JSON-LD doc + shortcut triples)\n   │              \\\n   │               > emit ActivityStream item\n   ▼\nindexed (OpenSearch + triple store + vector store)\n   │\n   ▼\npublished (CDN-cacheable static JSON-LD, ActivityStream entry)\n```\n\nEach asset is **idempotent**. Re-running from any stage MUST NOT create duplicates. 
Use SHA-256 of canonical input as the dedupe key.\n\n### 7.2 ID minting policy\n- Prefer the authority URI if reconciled.\n- Otherwise mint `{base}/{type-slug}/{ulid}`.\n- Never reuse IDs across re-ingests of the same source row — use a `sameAs` cluster instead.\n\n### 7.3 Confidence thresholds\n- **≥ 0.95** auto-approve.\n- **0.85 – 0.95** auto-approve with `ReviewFlag` and weekly digest to curators.\n- **0.70 – 0.85** queue for human review in Curator Console.\n- **< 0.70** drop the candidate, log evidence for retraining.\n\n### 7.4 Synchronization\n- Pull updates from sources that expose **IIIF Change Discovery** (W3C ActivityStreams 2.0) — walk backwards through `prev` pages until you reach your last-seen cursor. Persist cursor in PostgreSQL.\n- For sources without Change Discovery (most Met / Harvard / AIC REST APIs), poll on a Dagster schedule with `If-Modified-Since` or compare hash of `_source.raw`.\n\n### 7.5 Error handling\n- Every stage emits a structured error to Postgres `pipeline_errors` with `entityId`, `stage`, `error`, `inputHash`, `agentVersion`.\n- Retry policy: exponential backoff, max 5 attempts, then human queue.\n- Never drop a record silently. Failure = visible.\n\n---\n\n## 8. Storage Strategy (multi-modal)\n\n### 8.1 The \"Record\" abstraction (load-bearing)\nYale LUX's key insight: **do not** construct sub-graphs on the fly. At materialization time, build a self-contained JSON-LD document per entity that includes everything the UI needs in one fetch:\n- Core entity fields.\n- Pre-resolved labels for all referenced URIs.\n- HAL `_links` for navigation.\n- Shortcut properties for common queries (see §20.1).\n- Optional `_embedded` summaries for hot relationships (top 5 productions for a person, etc.).\n\nThese documents are cached aggressively (CDN, browser, service worker).\n\n### 8.2 Triple store\nIndexes the RDF projection of every `Record` so SPARQL queries can traverse relationships. 
Use named graphs per source institution for provenance partitioning. Reasoning kept minimal (RDFS + SHACL); no full OWL DL at runtime.\n\n### 8.3 Search index\nFlatten each `Record` into a denormalized document for OpenSearch with:\n- `text_all` — concatenation of all `LinguisticObject` and `Name` content.\n- Facets: `type`, `classified_as.id[]`, `produced_by.timespan.year`, `produced_by.took_place_at.id[]`, `current_owner.id[]`, `material.id[]`, `rights.code`, `image_available`, `source_institution`.\n- Bidirectional pointers for \"follow your nose\" navigation.\n\n### 8.4 Vector index\n- One collection per entity type.\n- Embed `_label` + top-N classifications + statement texts (truncated to 512 tokens).\n- Separately embed IIIF thumbnails via SigLIP for visual similarity.\n- Always store the source URI as payload so vector hits resolve to canonical records.\n\n### 8.5 Operational state (PostgreSQL)\n- `jobs` — ImportJobs with status, error counts, lineage references.\n- `mappings` — versioned MappingTemplate rows.\n- `reconciliation_queue` — pending human reviews.\n- `audit_log` — append-only, hash-chained.\n- `users` / `roles` / `org_tenants`.\n\n---\n\n## 9. 
API Design\n\n### 9.1 Endpoint shape (REST + JSON-LD + HAL)\n\n| Resource | Path | Verb | Notes |\n|---|---|---|---|\n| Object | `/api/objects/{id}` | GET | Returns Linked Art `HumanMadeObject` + HAL `_links` |\n| Object collection | `/api/objects` | GET | `?type=`, `?classified_as=`, `?date=`, `?place=`, `?material=`, `?source=`, `?page=`, `?expand=production,provenance` |\n| Person/Group | `/api/agents/{id}` | GET | |\n| Event | `/api/events/{id}` | GET | Activities, Productions, Acquisitions, Exhibitions |\n| Place | `/api/places/{id}` | GET | |\n| Concept | `/api/concepts/{id}` | GET | AAT-aligned types |\n| Set | `/api/sets/{id}` | GET | Archival groupings |\n| Search | `/api/search` | GET, POST | Mirrors OpenSearch facets; returns HAL `_links` with next/prev |\n| SPARQL | `/api/sparql` | POST | Read-only, rate-limited, query whitelist optional |\n| NL Query | `/api/ai/query` | POST | NL → HAL link or SPARQL, with explanations |\n| Reconcile | `/api/reconcile` | POST | OpenRefine reconciliation service spec |\n| Ingest | `/api/ingest/jobs` | POST | Trigger Dagster run |\n| Validate | `/api/validate` | POST | JSON Schema + SHACL |\n| Activity Stream | `/api/activity` | GET | W3C ActivityStreams 2.0, paged |\n| Best practices | `/api/best-practices` | GET | (Already exists — keep.) |\n| Roadmap | `/api/roadmap` | GET | (Already exists.) |\n| Health | `/api/health` | GET | (Already exists.) |\n\nExisting routes in `server.js` (`/api/explorer/*`, `/api/getty/*`, `/api/met/*`, `/api/linked-art/*`, etc.) are the **MVP prototype**. Keep them working. 
Layer new routes alongside per the §27 migration plan.\n\n### 9.2 HAL conventions\nEvery response includes:\n```json\n{\n  \"_links\": {\n    \"self\":              { \"href\": \"/api/objects/123\" },\n    \"production\":        { \"href\": \"/api/objects/123/production\" },\n    \"provenance\":        { \"href\": \"/api/objects/123/provenance\" },\n    \"iiif_manifest\":     { \"href\": \"https://iiif.example.org/3/123/manifest.json\" },\n    \"search_works_by_agent\": { \"href\": \"/api/search?agent=ulan:500115493\" },\n    \"json_ld\":           { \"href\": \"/api/objects/123.jsonld\", \"type\": \"application/ld+json\" }\n  }\n}\n```\nClients **MUST** prefer following links over constructing query strings. This is the Yale LUX decoupling.\n\n### 9.3 Content negotiation\nServe `application/ld+json` with the Linked Art `@context` by default. Also honor these `Accept` types:\n- `text/turtle`\n- `application/n-quads`\n- `application/rdf+xml`\n- `application/activity+json` for ActivityStreams pages\n- `application/hal+json` for HAL-only summaries\n\n### 9.4 Pagination\nUse the W3C ActivityStreams 2.0 paged-collection pattern (`OrderedCollection` / `OrderedCollectionPage`) for everything paginated, with `first`, `last`, `next`, `prev`, `current`, and `partOf` links.\n\n### 9.5 Graph expansion\n`?expand=production,provenance,exhibitions` returns embedded sub-resources. Expansion depth is hard-capped at 2; beyond that, clients must follow links.\n\n### 9.5.1 Progressive enhancement principle\nThe API ramps in complexity:\n- **Default tier** — simple, flat, human-presentable JSON-LD. Pre-resolved labels. The kind of payload a junior developer can `console.log` and ship.\n- **`?profile=expanded`** — fully expanded JSON-LD with nested entities (depth ≤ 2).\n- **`?profile=graph`** — full RDF graph context for machine consumers.\n- **`/api/sparql`** — direct graph access for power users.\n\nDocumented in OpenAPI 3.1. Default responses MUST be usable without reading the Linked Art spec — that is the bar.\n\n### 9.6 GraphQL\nOptional but recommended for the curator console.
Schema mirrors the contracts in §5. Use Apollo or `graphql-yoga` with persisted queries to control surface area.\n\n### 9.7 Security & governance\n- **API keys** for non-anonymous tiers.\n- **Rate limits**: 100 req/min anonymous; 1000 req/min keyed; 60 SPARQL/min keyed.\n- **CORS**: whitelist per environment.\n- **SPARQL query timeout**: 10s default, 60s for keyed `researcher` tier.\n- **Read-only by default**. Write endpoints behind OIDC + RBAC.\n- **Audit log**: every write goes to append-only Postgres + S3 mirror (BagIt for export).\n\n---\n\n## 10. AI / LLM Layer\n\n### 10.1 NL → SPARQL / NL → HAL agent\n1. **Schema embeddings**: precompute embeddings for every AAT/ULAN/TGN term used in the corpus + the CIDOC-CRM property list. Store in pgvector.\n2. **Retrieval**: embed the NL question and retrieve the top-k schema elements (classes + properties).\n3. **Prompt assembly**: ground the LLM with retrieved schema, examples (few-shot), and the user's facet context.\n4. **Generate**: produce intermediate AST → SPARQL query OR a HAL link if a prebuilt query exists.\n5. **Validate against the SHACL shape catalog** before execution: every class and property the query uses must be declared in a shape.\n6. **Execute**, returning rows + provenance trail.\n7. **Refinement loop**: if results are empty, the agent re-plans (loosen filters, switch authority, federate via `SERVICE`).\n\n### 10.2 Graph-RAG chat\n- Hybrid retrieval: vector hit on text + 1-hop graph expansion.\n- Generate answer with **mandatory** citations: each sentence carries `[entityId, propertyPath]` references resolvable to source records.\n- No claim without grounding in a triple or a `LinguisticObject` statement.\n\n### 10.3 LLM-assisted mapping\nFor the Visual ETL Mapper: when a user uploads a new CSV, an agent proposes a mapping template by matching column headers + sample values against past `MappingTemplate` rows and Linked Art property semantics. User accepts / edits — no auto-publish.\n\n### 10.4 LLM-assisted reconciliation\nUsed as **tiebreaker only**.
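
The following is a minimal sketch of the tiebreaker-only rule. It assumes scored candidates arrive as `{ uri, score }` and that `pickWithLlm` is a hypothetical async callback wrapping the LLM call. The model is consulted only when the top scores fall within a tie margin, and it may choose only among the tied candidates. Its pick must still clear the confidence thresholds on its own score. The 0.95 auto-approve value mirrors the §26 KPI; the review floor and tie margin are placeholders until the §7.3 thresholds are wired in.

```javascript
// Placeholder thresholds: 0.95 mirrors the §26 auto-approve KPI; the other two
// are hypothetical until the §7.3 values are wired in.
const AUTO_APPROVE = 0.95;
const REVIEW_FLOOR = 0.6;
const TIE_MARGIN = 0.05;

// Map a single candidate's own score onto the three confidence tiers.
function tier(candidate, via) {
  if (candidate.score >= AUTO_APPROVE) return { decision: 'auto_approve', uri: candidate.uri, via };
  if (candidate.score >= REVIEW_FLOOR) return { decision: 'human_review', uri: candidate.uri, via };
  return { decision: 'reject', uri: candidate.uri, via };
}

// candidates: [{ uri, score }] from the RecordLinkage + sentence-transformers stage.
// pickWithLlm: hypothetical async (tied) => ({ uri, explanation }).
async function reconcile(candidates, pickWithLlm) {
  const top = [...candidates].sort((a, b) => b.score - a.score).slice(0, 5);
  if (top.length === 0) return { decision: 'no_match' };

  const best = top[0];
  const tied = top.filter((c) => best.score - c.score < TIE_MARGIN);
  // A clear winner is decided by scores alone; the LLM is never called.
  if (tied.length === 1) return tier(best, 'scores');

  // Tie: the LLM may only choose among the tied candidates...
  const { uri, explanation } = await pickWithLlm(tied);
  const chosen = tied.find((c) => c.uri === uri);
  if (!chosen) return { decision: 'human_review', reason: 'llm_choice_not_in_candidates' };
  // ...and its pick is still tiered on its own score, never boosted.
  return { ...tier(chosen, 'llm_tiebreak'), explanation };
}
```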
The agent receives the top-5 candidates from RecordLinkage + sentence-transformers and explains the choice. Confidence must still clear thresholds (§7.3).\n\n### 10.5 Safety\n- All agent calls are `AgentTask` contracts: input + output + dry-run.\n- `dryRun=true` is the default for any write-side agent. UI surfaces a diff before commit.\n- Cite or refuse. Hallucinated content is a P0 bug.\n\n---\n\n## 11. Frontend Architecture\n\n### 11.1 App shell\n- Next.js 15 App Router.\n- Module federation (`@module-federation/nextjs-mf` 8.x) for micro-frontend composition: Public SPA, Curator Console, Researcher Workspace, Agent Workspace are independently buildable.\n- Shared component library `@metamuseum/linked-art-ui` published to a private npm registry (Verdaccio for dev, GitHub Packages for prod).\n\n### 11.2 Routing\n- Public SPA mirrors the existing `app.js` views (`Home`, `Explore`, `ArtworkDetail`, `Stories`, `Compare`, `About Data`) — keep the `history.pushState` semantics. URLs:\n  - `/artwork/{id}`, `/agent/{id}`, `/place/{id}`, `/concept/{id}`, `/set/{id}`, `/event/{id}`.\n- Server falls back to `index.html` for SPA routing — already in place.\n\n### 11.3 Data fetching\n- TanStack Query with HAL link following: `useEntity(id)` returns the record + a `follow(linkName)` helper.\n- Cache key includes the entity URI + facet hash so list views and detail views share cache entries.\n- Persistent cache via TanStack Query's `persistQueryClient` with an IndexedDB-backed persister, so previously fetched entities stay available offline.\n\n### 11.4 Accessibility\n- WCAG 2.1 AA.
Lighthouse + axe-core in CI.\n- Keyboard navigation on all interactive surfaces (already a project quality gate per `README.md`).\n- ARIA live regions for facet updates and search progress.\n- High-contrast theme.\n- Reduced-motion preference respected for timelines and force-directed graphs.\n\n### 11.5 Internationalization\n- `next-intl` for app chrome.\n- Entity labels resolved at render via §4.8 rules.\n- Vocabulary terms via AAT preferred labels in the user's locale; fall back to `en`.\n\n---\n\n## 12. UI Widget Catalog (`@metamuseum/linked-art-ui`)\n\nBuild in this order. Atomics first, macros last. Every widget MUST work with **any** `Entity` from §5.1 without bespoke domain logic.\n\n### 12.1 Atomic\n- `<LinkedDate timespan>` — `TimeSpan` → readable date / range / \"Circa 19th C.\" Uses `display_label` first when present.\n- `<LinkedDimensions dimensions[]>` — flattens `Dimension[]` with units, e.g. `24.5 × 30 cm (sheet)`.\n- `<LinkedAttribution object>` — walks `produced_by → carried_out_by` for the primary attribution; surfaces secondary `AttributeAssignment`s as a tooltip.\n- `<MaterialTechniqueList object>` — `made_of` + `technique` aggregator.\n- `<LinkedLabel entity locale?>` — multilingual `_label` resolver.\n- `<UriBadge uri>` — pill that shows the authority (AAT/ULAN/TGN/Wikidata/VIAF), opens external link with `rel=external noopener`.\n- `<RightsBadge rights>` — RightsStatements.org + CC.\n- `<SourceBadge source>` — provider attribution.\n- `<CitationBlock citations[]>` — bibliographic formatter (Chicago / MLA / APA toggle).\n\n### 12.2 Entity Cards (preview)\n- `<ObjectCard object>` — image, label, date, attribution, status badge, rights.\n- `<ActorCard actor>` — portrait, name, dates, nationality, role.\n- `<PlaceCard place>` — map thumbnail, name, relationship.\n- `<ConceptCard concept>` — AAT term with broader/narrower context.\n\n### 12.3 Visualizations\n- `<ProvenanceTimeline provenance>` — vertical/horizontal stepper. Click → buyer / seller / price / place.
Shaded ranges for uncertainty.\n- `<ExhibitionHistoryList exhibitions[]>` — chronological, grouped by venue.\n- `<GeoMapViewer entity>` — pins for `produced_by.took_place_at`, `current_location`, `moved_to`, with an animated time slider for object movement.\n- `<IIIFCanvasViewer manifestUrl>` — OpenSeadragon wrapper. Multi-canvas (recto/verso), annotations overlay, side-by-side compare mode.\n- `<HierarchyTree set>` — `member_of` traversal.\n- `<NetworkGraph entity depth=2>` — Cytoscape force-directed, drill-down expand, contextual JSON inspector side-panel.\n- `<TimelineRange entities[]>` — overlapping timelines across many entities, with concertina list folding. Histopedia-style rendering for creator publication histories and concurrent exhibitions.\n- `<ConcertinaList entities[] facet>` — accordion-style navigation for dense datasets (e.g., thousands of artists by birth year or nationality). Pairs with a `<FacetHistogram>` head that lets users drill into a year / decade / region; expansion reveals alphabetically sorted entries dynamically without page-by-page pagination.\n- `<FacetHistogram facet bins>` — interactive bar/density chart driving the concertina; clicking a bin filters the list below. Use for date-bands, place-rollups, material counts.\n- `<EntityKnowledgePanel entityId>` — first-order \"Knowledge Panel\" page for non-work entities (Persons, Places, Concepts, Groups). Merges internal catalog hits with external context: DBpedia biography summary, Wikidata influence graph, ULAN variant names, AAT broader/narrower terms.
Treats authorities as peers, not footnotes.\n\n### 12.4 Metadata & Context\n- `<ClassificationsBlock classified_as[]>` — clickable AAT pills.\n- `<ReferenceList referred_to_by[]>` — citation list, deep-link to `LinguisticObject`.\n- `<IdentifierBlock identified_by[]>` — accession, ULAN, Wikidata, internal.\n- `<StatementBlock statements[]>` — `LinguisticObject` text with attribution.\n\n### 12.5 Participatory\n- `<AnnotationOverlay manifestUrl>` — W3C Web Annotations on IIIF canvas.\n- `<CrowdCorrector entityId fieldPath>` — \"Suggest an Edit\" modal.\n- `<StoryBuilder>` — drag-and-drop narrative composer (ObjectCard + TextBlock).\n- `<NotebookExport entities[]>` — generate a Jupyter notebook with SPARQL + Python that reproduces the user's current view.\n\n### 12.6 Implementation rules\n- All widgets accept the canonical entity shape (§5.1) — no provider-specific props.\n- All widgets degrade gracefully on missing data (use the Statement pattern §4.4 as fallback).\n- All widgets are tree-shakeable and SSR-safe.\n- All widgets ship a Storybook story + visual regression snapshot + axe a11y check.\n- All widgets emit a `data-la-entity-id` attribute for crawlers and agents.\n\n---\n\n## 13. IIIF Integration\n\n- **Image API 3.0** for high-resolution serving via Cantaloupe behind a CDN.\n- **Presentation API 3.0** manifests link `HumanMadeObject` → `DigitalObject` (`digitally_shown_by`).\n- **Change Discovery 1.0** for ingest sync (§7.4).\n- **Content State 1.0** for deep-links into specific canvas regions.\n- **Authorization 2.0** for tiered access (public / Creative Commons / member-only / restricted).\n- **Provenance**: capture rights at the IIIF level (`requiredStatement`, `rights`) and bubble into `RightsReport`.\n\nUI: `<IIIFCanvasViewer>` is the single integration point. Supports compare-mode, region annotations, and deep zoom.\n\n---\n\n## 14. 
ActivityStreams Syndication\n\n- Publish a W3C ActivityStreams 2.0 endpoint at `/api/activity`.\n- Activities: `Create`, `Update`, `Delete` for every materialized record.\n- Paged collection pattern (first/last/next/prev/current/partOf).\n- Webhook subscribers register at `/api/activity/subscriptions` (admin-only).\n- Aggregators (Yale LUX-style harvesters) walk backwards via `prev` to a cursor — your endpoint MUST return stable, monotonically increasing IDs and never resequence.\n\n---\n\n## 15. Search & Discovery\n\n### 15.1 Layers\n1. **Keyword** (OpenSearch, BM25, multi-field).\n2. **Faceted** (typed facets from §8.3).\n3. **Semantic / vector** (similar-by-text and similar-by-image).\n4. **Graph** (SPARQL or HAL-link-follow).\n5. **NL agent** (§10.1).\n\n### 15.2 Autocomplete\n- Edge n-gram analyzer on `_label` + `identified_by.content` + alternate names.\n- Type-ahead returns `{ uri, _label, type, summary }`.\n\n### 15.3 \"Follow your nose\"\nImplement bidirectional links by querying the inverse from the target rather than embedding lists on hubs (Yale LUX rule). For example, a Concept page does **not** carry an array of items — it carries a HAL link `items_using_concept` that runs a parameterized OpenSearch query.\n\n### 15.4 Complexity toggle (Art Tracks model)\nEvery page renders a \"basic\" view by default. A \"See full scholarly details\" toggle reveals `AttributeAssignment`s, raw JSON-LD, statements, full provenance.\n\n---\n\n## 16. 
Curator & Researcher Tooling\n\n### 16.1 Visual ETL Mapper\n- ReactFlow canvas, columns on the left, Linked Art graph nodes on the right.\n- Drag a column → connect to a node → choose property → optional reconciliation step.\n- Save mapping as `MappingTemplate` (§5.1).\n- Dry-run preview: shows generated JSON-LD for the first 5 rows + SHACL results.\n- Run: submits a Dagster job via `/api/ingest/jobs`.\n\n### 16.2 Linked Art Form Builder (Arches / Ogee-based)\n- Built on **Arches** (open-source heritage data platform) with the **Ogee** package for Linked Art JSON-LD output. Curators see human-readable forms; the system emits compliant JSON-LD.\n- **Documentation-first configuration via Zelge.** Every UI field (e.g. \"Birthplace\") is declared in a Zelge config that maps it programmatically to a specific CIDOC-CRM semantic path (e.g. `crm:P98i_was_born / crm:P7_took_place_at / crm:E53_Place`). The form, the validator, and the JSON-LD generator all read from the same Zelge file — there is one source of truth. Repeatable patterns become reusable templates.\n- **Controlled dropdowns only.** All classification, typing, and role fields are restricted to a Reference Data Manager that loads local vocabularies or AAT terms as dropdowns. Free-text in a typed field is a validation error, not a warning. 
This is how you prevent the \"1,000 spellings of 'oil paint'\" problem at source.\n- Vocabulary-driven inputs auto-fetch URIs (debounced, cached against Redis URI concordance).\n- Real-time validation against `/api/validate`.\n- Domain-specific warnings: artist birthdate vs object production date, place existence in TGN at the production date (using PeriodO).\n- Outputs strict JSON-LD; never persists a freeform string where a URI is expected.\n\n### 16.3 Dynamic Graph Explorer\n- Loads JSON-LD by URL, paste, or upload.\n- Cytoscape force-directed, progressive disclosure (top-level entities first).\n- Click node → JSON inspector side-panel with full triple list.\n- Orphan-node detector.\n- Export: PNG, SVG, GraphML, JSON-LD subset.\n\n### 16.4 Sparnatural-style Visual SPARQL Builder\n- Faceted query construction grounded in the SHACL shape catalog.\n- Live preview of generated SPARQL.\n- Result preview as RDF triples table + optional CSV/Notebook export.\n\n### 16.5 Validation Console\n- Endpoint: `/api/validate` (POST). Returns `ValidationReport`.\n- UI: drag-and-drop JSON-LD, see SHACL violations grouped by shape, with suggested fixes.\n\n---\n\n## 17. Validation Stack\n\n### 17.1 Layers\n1. **JSON Schema** (Draft 2020-12) — shape sanity checks. Mirrors the Zod schemas in §5.\n2. **SHACL** (PySHACL) over **CIDOC-CRM 7.1 RDFS** + Linked Art profile shapes — semantic constraints (cardinality, allowed `classified_as` ranges, required `identified_by` minimum, no blank nodes, 10-entity-boundary conformance, etc.).\n3. **Business rules** — date sanity (production within agent lifespan), place-existence-at-time, provenance gap detection.\n4. **Linked Art conformance** — pattern-by-pattern checks from §4 plus the anti-pattern rejections from §4.10.\n\n### 17.2 Authoring\nMaintain SHACL shapes in `shapes/linked-art/*.shacl.ttl`. CI runs PySHACL against fixtures in `fixtures/linked-art/*.jsonld`. 
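
The no-blank-nodes constraint from §17.1 is cheap to pre-check in plain JavaScript before fixtures reach PySHACL. This hypothetical helper is a sketch that complements the SHACL shape rather than replacing it; `findBlankNodes` is an illustrative name. It flags two cases as blank nodes: any object with a `type` but no `id`, and any object whose `id` starts with `_:`. It reports each one's JSONPath-style location. JSON-LD value objects (those with `@value`) are literals, so they are skipped.

```javascript
// Hypothetical pre-check, not a substitute for the SHACL shape: report every
// typed JSON-LD node that would become a blank node when converted to RDF.
function findBlankNodes(doc) {
  const problems = [];

  const visit = (value, path) => {
    if (Array.isArray(value)) {
      value.forEach((item, i) => visit(item, `${path}[${i}]`));
      return;
    }
    if (value === null || typeof value !== 'object') return; // plain literals
    if (value['@value'] !== undefined) return; // value objects are literals too

    // Linked Art aliases @id/@type as id/type; accept either spelling.
    const isNode = value.type !== undefined || value['@type'] !== undefined;
    const id = value.id ?? value['@id'];
    const isBlank = typeof id !== 'string' || id === '' || id.startsWith('_:');
    if (isNode && isBlank) problems.push(path);

    for (const [key, child] of Object.entries(value)) {
      if (key === '@context') continue; // context definitions are not graph nodes
      visit(child, `${path}.${key}`);
    }
  };

  visit(doc, '$');
  return problems;
}
```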
Pull requests that change shapes require a maintainer review.\n\n### 17.3 Acceptance\n- 100% of public-facing records must validate.\n- A failing record cannot be published — only queued.\n- The system always returns a `ValidationReport` even on success, for transparency.\n\n---\n\n## 18. Provenance & PROV-O\n\n- Every pipeline activity emits a PROV-O `Activity` to an append-only Postgres table + S3 BagIt mirror.\n- Hash-chained log (each entry includes prev hash) for tamper evidence.\n- The Curator Console renders the lineage graph (`prov:wasDerivedFrom`, `prov:wasGeneratedBy`, `prov:wasAttributedTo`) per record.\n- Audit bundle export = BagIt zip containing raw inputs, mappings, validation reports, output record.\n\n---\n\n## 19. Rights & Licensing\n\n### 19.1 Required fields\nEvery `HumanMadeObject` and `DigitalObject` MUST carry:\n- `rights` (RightsStatements.org code or CC URI).\n- `rights_holder` (Person/Group).\n- `attribution_string` (display-ready credit line).\n- `reuse_label` (computed: `public_domain | cc_by | cc_by_sa | cc0 | in_copyright | restricted | unknown`).\n\n### 19.2 UI rules\n- A record with `reuse_label = unknown` shows an explicit \"Rights unknown — do not reuse\" badge.\n- Image downloads only enabled for `public_domain`, `cc0`, `cc_by`, `cc_by_sa`.\n- Every Open Access dataset (Met, CMA, Smithsonian Open Access, Rijks, MoMA CSV) carries its source attribution all the way to the UI.\n\n### 19.3 Existing helpers (keep)\n- `src/utils/rights.js`, `src/adapters/getty.js::analyzeGettyRights`, `src/adapters/met.js::analyzeMetRights` already implement provider-specific logic. Extend rather than rewrite.\n\n---\n\n## 20. 
Performance Engineering\n\n### 20.1 Shortcut triples (Yale LUX)\nAt materialization, inject 1-hop predicates:\n```turtle\n@prefix :    <https://example.org/> .  # placeholder IRIs for the example\n@prefix lux: <https://lod.metamuseum.org/ns/> .\n@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n\n:object lux:agentOfProduction :person .\n:object lux:placeOfProduction :place .\n:object lux:dateOfProductionStart \"1889-06-01T00:00:00Z\"^^xsd:dateTime .\n:object lux:dateOfProductionEnd   \"1889-06-30T23:59:59Z\"^^xsd:dateTime .\n:object lux:currentOwner :institution .\n:object lux:depicts :concept .\n```\nRuntime queries that previously required 3+ joins now do single-hop lookups. The shortcut predicates live in a `lux:` (or `mm:`) namespace under `https://lod.metamuseum.org/ns/`.\n\n### 20.2 Caching layers\n- **CDN** (Cloudflare / Fastly): cache static JSON-LD with `Cache-Control: public, max-age=3600, stale-while-revalidate=86400`.\n- **Browser**: service worker for the SPA + IndexedDB for entity cache.\n- **Redis**: URI concordance, facet results (5-min TTL), HAL link cache.\n- **Search**: OpenSearch request cache.\n\n### 20.3 Pre-computation\n- Aggregate counts per facet are pre-computed nightly.\n- \"Top 5 productions per agent\", \"top 5 owned objects per institution\" etc. live in the materialized `Record`.\n\n### 20.4 Targets\n- API p95 < 200ms for cached `Record` hits.\n- API p95 < 500ms for cold `Record` reads.\n- Search p95 < 300ms for keyword + facet.\n- SPARQL p95 < 2s for whitelisted queries.\n- IIIF tile serving p95 < 100ms (CDN-backed).\n- Ingest: 10k records in < 5 minutes for incremental updates.\n- Uptime: 99.9% public read; 99.5% write.\n\n---\n\n## 21. Observability\n\n- **OpenTelemetry** SDKs in every service.\n- **Traces** → Grafana Tempo. Span every adapter call, validation, SHACL evaluation, LLM call.\n- **Metrics** → Prometheus + Grafana. Required dashboards:\n  - Ingest throughput / error rate / lag per source.\n  - Reconciliation accuracy per authority.\n  - API latency p50/p95/p99 per route.\n  - SPARQL query mix and timeouts.\n  - LLM call cost + cache hit rate.\n  - Cache hit ratios (CDN, Redis, OpenSearch).\n- **Logs** → Loki.
Structured (JSON). Required fields: `trace_id`, `tenant`, `actor`, `route`, `entity_id`.\n- **Alerts**: paging on validation pass rate < 99%, ingest lag > 30 min, API error rate > 1%, vector store sync lag > 10 min.\n\n---\n\n## 22. Security & API Governance\n\n- OAuth2/OIDC via Keycloak. Roles: `admin`, `data_engineer`, `curator`, `educator`, `researcher`, `public`.\n- API keys for non-anonymous read tiers; HMAC-signed webhooks for ingest triggers.\n- Rate limits enforced at the gateway (§9.7).\n- SPARQL injection: queries pass through an `rdflib`-based parser that rejects every SPARQL 1.1 Update operation (`INSERT`, `DELETE`, `LOAD`, `CLEAR`, `CREATE`, `DROP`, `COPY`, `MOVE`, `ADD`) for non-admin tokens.\n- WAF rules: block known LLM-prompt-injection patterns in query params (e.g., `ignore previous instructions`).\n- Secrets live in Vault, never in env files committed to the repo.\n- CSP headers on the SPA. Strict `connect-src` to known authorities only.\n- IIIF Authorization 2.0 for restricted-tier images.\n\n---\n\n## 23. Testing & TDD Requirements\n\n### 23.1 Required coverage\n- **Adapters** (`src/adapters/*`) — normalization, rights, IIIF discovery, error fallbacks.\n- **Utils** (`src/utils/*`) — Linked Art helpers, date / dimension parsing, HTML `describedby` discovery, URL safety.\n- **Contracts** — `validate*` rejects malformed input, `create*` deterministically produces canonical shape.\n- **API** — golden responses for each route, JSON-LD validates against the SHACL shapes.\n- **SHACL** — every shape has at least one passing and one failing fixture.\n- **Frontend widgets** — Storybook + axe + visual regression.\n- **E2E** — Playwright: search → detail → IIIF deep-zoom → graph view; agent ingest dry-run; curator form submission.\n- **A11y** — axe-core in CI, no AA violations.\n- **Performance** — k6 smoke targeting the SLOs in §20.4 on every deploy.\n- **Security** — `npm audit` / `pip-audit`; Semgrep ruleset for prompt-injection patterns; ZAP baseline on staging.\n\n### 23.2 Existing convention (keep)\n- Node `node:test` + `node:assert`.
No external test framework. Globs avoided — each test file listed in `package.json` `scripts.test`.\n\n### 23.3 Acceptance gates per merge\n- All tests green.\n- SHACL pass on fixtures.\n- a11y pass.\n- API contract diff reviewed (OpenAPI lint).\n- Loading / empty / error / success states in any new UI.\n- Source URL, provider, rights, attribution, import timestamp preserved on any record-touching code path.\n- Agent-facing features have documented input/output contracts.\n\n---\n\n## 24. Deployment Topology\n\n### 24.1 Local dev\n- `docker-compose.yml` brings up: Postgres, Redis, MinIO, GraphDB, OpenSearch, Cantaloupe, Keycloak, Dagster, the Node gateway, the Next.js frontend.\n- Sample CSV → live SPARQL endpoint within 5 minutes of `docker compose up`.\n\n### 24.2 Staging / Prod\n- Kubernetes. One namespace per environment.\n- Helm charts per service.\n- Blue/green for the API gateway. Canary for the SPA.\n- Backups: Postgres daily WAL + weekly snapshot; triple store dump weekly; OpenSearch snapshot daily to S3.\n- DR: cross-region S3 replication for raw + materialized records; RTO 4h, RPO 1h.\n\n### 24.3 Cost guidance\n- Smallest viable cloud footprint: DigitalOcean Kubernetes + GraphDB CE + managed Postgres + S3-compatible object storage. ~$50-200/mo for portfolio-scale.\n- Production at 10M+ records: budget for MarkLogic or Stardog licensing if SPARQL p95 matters.\n\n---\n\n## 25. Phased Build Plan (AI-agent-friendly)\n\nEach phase ships an independently demoable artifact. Coding agents should pick tasks from one phase at a time and never start a later phase until the prior phase's acceptance criteria pass.\n\n### Phase 0 — Foundations (Weeks 1-2)\n**Goal:** Lock the contracts and patterns. No new UI.\n\nTasks:\n1. Author SHACL shapes for every entity type in `shapes/linked-art/*.shacl.ttl`.\n2. Add Zod mirrors of every contract in `src/contracts/zod/*.ts`.\n3. 
Extend `src/contracts/` with `Entity`, `MappingTemplate`, `ReconciliationCandidate`, `ProvenanceEvent`, `Citation`, `WikiDraft`, `AgentTask`, `ValidationReport`, `PROVRecord` (§5.1).\n4. Wire PySHACL into a Python validation microservice exposed at `/api/validate`.\n5. Add a Dagster project skeleton with one asset graph for the Met Open Access source as the canary.\n6. Replace ad-hoc fetches in `server.js` with calls into `src/adapters/*` per the existing module-dependency rules.\n\nAcceptance: Every existing fixture validates; new contracts have `create*` + `validate*` + Zod; one Dagster run materializes 10 Met records end-to-end into PostgreSQL + OpenSearch + a triple store.\n\n### Phase 1 — Multi-modal Storage & HAL API (Weeks 3-5)\n**Goal:** Stand up the Yale-LUX-pattern hybrid store and HAL hypermedia.\n\nTasks:\n1. Provision Postgres + Redis + OpenSearch + GraphDB + MinIO via `docker-compose.yml`.\n2. Build `src/utils/record-materializer.{js,ts}` that produces the denormalized `Record` doc + shortcut triples per §20.1.\n3. Add `/api/objects/:id`, `/api/agents/:id`, `/api/places/:id`, `/api/concepts/:id`, `/api/events/:id` with HAL `_links`.\n4. Add OpenSearch indexing on every record materialization.\n5. Add `/api/search` with faceted query, ActivityStreams paged collection responses.\n6. Add `/api/activity` syndication endpoint (paged, monotonically increasing IDs).\n7. Add CDN-friendly cache headers + ETag.\n\nAcceptance: A single HAL link traversal can navigate from a search hit → object → agent → other works by agent → place of birth without the client constructing any SPARQL.\n\n### Phase 2 — Ingest Pipeline & Reconciliation (Weeks 6-9)\n**Goal:** Curators can ingest a CSV and see it reconciled and published.\n\nTasks:\n1. Implement Dagster assets for Extract → Profile → Map → Reconcile → Validate → Materialize → Index → Publish.\n2. 
Build the Reconciliation microservice (FastAPI) talking to Getty SPARQL, VIAF, Wikidata, GeoNames with Redis-backed URI cache.\n3. Add the human-review queue UI in the Curator Console (`/console/reconciliation`).\n4. Build the Visual ETL Mapper (ReactFlow) with `MappingTemplate` save / load / dry-run.\n5. Build the Form Builder with vocabulary-driven inputs.\n\nAcceptance: A user can upload `met_sample.csv`, define a mapping, hit Dry Run, see generated JSON-LD + SHACL, queue 50 records for review, approve, and see them in `/api/objects` with HAL links.\n\n### Phase 3 — Discovery UI & Visualizations (Weeks 10-13)\n**Goal:** Public discovery SPA, IIIF, timelines, maps, graph view.\n\nTasks:\n1. Migrate the existing `public/app.js` views into a Next.js 15 App Router scaffold (route-by-route — see §27). Keep `index.html` SPA fallback working.\n2. Build `@metamuseum/linked-art-ui` v0 with all atomics (§12.1) + ObjectCard + ActorCard + PlaceCard.\n3. Add `<IIIFCanvasViewer>`, `<ProvenanceTimeline>`, `<GeoMapViewer>`, `<NetworkGraph>`.\n4. Implement the \"complexity toggle\" pattern globally.\n5. Storybook + a11y + visual regression for every widget.\n\nAcceptance: A visitor can land on `/artwork/{id}`, see image / attribution / dates / materials / rights / provenance timeline / map / network graph without any custom per-domain logic in the page.\n\n### Phase 4 — AI Layer (Weeks 14-17)\n**Goal:** NL→Query, Graph-RAG chat, AI-assisted mapping.\n\nTasks:\n1. Build `/api/ai/query` (NL → SPARQL/HAL) with schema embeddings stored in pgvector.\n2. Build `/api/ai/chat` with graph-grounded RAG and mandatory citations.\n3. Surface a chat affordance in the Researcher Workspace and a \"Suggest mapping\" affordance in the Visual ETL Mapper.\n4. 
Add agent telemetry: every call logs intermediate SPARQL, result counts, refinement iterations, cost.\n\nAcceptance: A researcher asks \"Find works by French painters depicting Paris before 1900\" and gets a result set with citations resolvable to canonical records, plus the SPARQL the agent generated, plus an editable link.\n\n### Phase 5 — Syndication, Meta Wiki Art Bridge, Hardening (Weeks 18-20)\nTasks:\n1. Open the ActivityStreams subscriptions endpoint to external aggregators.\n2. Implement the Meta Wiki Art publish flow (WikiDraft → review → publish with citations).\n3. WCAG 2.1 AA audit on every public page.\n4. Pen test, k6 load test, DR drill.\n5. Public OpenAPI 3.1 docs at `/api/docs`.\n\nAcceptance: Uptime > 99.9% over a 30-day window with all SLOs in §20.4 met.\n\n---\n\n## 26. KPIs & Success Metrics\n\n- **Data quality**: ≥ 80% of imported records enriched with ≥ 3 authority URIs (ULAN, AAT, TGN/Wikidata/VIAF/GeoNames).\n- **Validation**: 100% pass rate on published records.\n- **Reconciliation**: ≥ 90% auto-approve at confidence ≥ 0.95 across the Met canary; ≥ 95% precision on human-reviewed samples.\n- **Performance**: §20.4 SLOs met at p95.\n- **UX**: 80% of users find the desired object within 3 clicks; 90% successful search rate.\n- **Adoption**: ≥ 3 external Linked Art systems consume the ActivityStreams feed; ≥ 50 MAU on the discovery portal in the first quarter post-launch.\n- **A11y**: zero AA violations.\n- **AI**: ≥ 70% of NL queries return non-empty grounded answers on the gold-set; cost per query < $0.01 at scale.\n\n---\n\n## 27. Migration from the Current Prototype\n\nCurrent state (per `README.md`, `CLAUDE.md`, `AGENTS.md`, `server.js`):\n- Monolithic Node.js HTTP server, no framework, no deps.\n- Vanilla-JS SPA in `public/`.\n- JSON file storage in `storage/`.\n- Adapters for Getty and Met. 
Contracts in `src/contracts/`.\n- Existing routes for `/api/explorer/*`, `/api/getty/*`, `/api/met/*`, `/api/linked-art/*`, `/api/patterns`, `/api/agents/run`, `/api/content/generate`, `/api/jobs`, `/api/issues`, etc.\n\nMigration rules:\n1. **Keep `server.js` alive** as the API surface until the gateway in §9 is ready. Do not delete working routes.\n2. **Add, don't replace**, new routes per Phase 1. The new HAL routes (`/api/objects/:id`, etc.) live alongside the existing ones. Public SPA gradually points at the new routes; the old ones stay for backward compatibility for at least one release.\n3. **Storage migration**: write a one-time exporter that reads `storage/records.json` + `storage/jobs.json` and emits Linked Art JSON-LD documents into the new materialization pipeline. The current `storage/` becomes the legacy canary fixture.\n4. **No Next.js until contracts are stable** — explicit rule from `README.md` line ~107. Migrate route-by-route once Phase 0 acceptance passes.\n5. **Provider adapter split**: continue extracting Getty + Met fetch/normalize logic out of `server.js` (already a stated next-build slice in `README.md`).\n6. **Tests**: every migrated path adds a `node:test` case under `tests/`. 
Add the file path explicitly to `package.json` `scripts.test`.\n\n### 27.1 Files / dirs to add (alongside what exists)\n```\nsrc/\n  contracts/\n    zod/                       # NEW — Zod mirrors of every contract\n    entity.js entity.zod.ts    # NEW\n    mapping-template.js\n    reconciliation-candidate.js\n    provenance-event.js\n    citation.js\n    wikidraft.js\n    agent-task.js\n    validation-report.js\n    prov-record.js\n  utils/\n    record-materializer.js     # NEW — Yale-LUX-pattern denormalizer\n    shortcut-triples.js        # NEW\n    hal.js                     # NEW — HAL link factory\n    activity-stream.js         # NEW — paged collection helper\n  adapters/\n    getty.js                   # existing\n    met.js                     # existing\n    harvard.js                 # NEW\n    smithsonian.js             # NEW\n    rijks.js                   # NEW\n    europeana.js               # NEW\n    british-museum.js          # NEW\n    aic.js                     # NEW\n    cma.js                     # NEW\n    wikidata.js                # NEW\n    viaf.js                    # NEW\nshapes/\n  linked-art/\n    object.shacl.ttl           # NEW\n    person.shacl.ttl\n    place.shacl.ttl\n    activity.shacl.ttl\n    digital-object.shacl.ttl\nfixtures/\n  linked-art/\n    pass/\n    fail/\npipeline/                      # NEW — Python / Dagster\n  pyproject.toml\n  dagster.yaml\n  assets/\n    extract.py\n    profile.py\n    map.py\n    reconcile.py\n    validate.py\n    materialize.py\n    index.py\n    publish.py\n  services/\n    validation_api/\n    reconciliation_api/\n    nl_query_agent/\n    rag_chat/\nweb/                           # NEW — Next.js 15 (when Phase 3 starts)\n  app/\n  components/\n  lib/\npackages/\n  linked-art-ui/               # NEW — shared component library\n    src/\n    stories/\n    package.json\nops/\n  docker-compose.yml\n  k8s/\n  helm/\ndocs/\n  architecture/\n  contracts/\n  shapes/\n  api/\n```\n\n### 27.2 Hard rules 
during migration\n- **Never** mutate `_source.raw`.\n- **Never** import across adapters (use `artwork-builder.js` as the bridge — already enforced).\n- **Never** create a sibling `web/` Next.js app before Phase 3 starts (per README).\n- **Always** keep contracts as leaf modules — no upstream deps.\n- **Always** add a test for any risky transform.\n- **Always** preserve loading / empty / error / success states on any new UI surface.\n\n---\n\n## 28. Risks & Mitigations\n\n| Risk | Likelihood | Impact | Mitigation |\n|---|---|---|---|\n| Legacy CMS data quality varies wildly | High | High | Three-tier confidence reconciliation (§7.3); human review queue; SHACL gate; explicit `unknown` rights state. |\n| CIDOC-CRM expertise gap | High | High | Form Builder + Visual ETL Mapper + NL→SPARQL hide complexity. Pedagogical scaffolding (Storytelling). |\n| External authorities change format / retire | Medium | Medium | Getty retires XML/Relational end-2025 — already using LOD. VIAF dumps frozen — fall back to Wikidata `P214`. Wrap every authority in an adapter with a feature flag. |\n| Pure SPARQL too slow at scale | Medium | High | Materialized `Record` docs + shortcut triples + aggressive CDN caching (§20). |\n| LLM hallucinations in Graph-RAG | High | High | Mandatory citations; \"cite or refuse\" rule; SHACL-validated SPARQL pre-execution; deterministic eval suite. |\n| Rights misclassification | Low | High | `unknown` is the default. Per-provider analyzers (existing). Image actions gated on `reuse_label`. |\n| Vendor lock-in (MarkLogic / Stardog) | Medium | Medium | Storage layer behind an interface; OSS fallback (GraphDB CE) tested in CI. |\n| IIIF server unavailable for partner institutions | Medium | Low | Adapter degrades gracefully to source-hosted image with attribution; manifest-on-the-fly fallback via `iiif-prezi3`.
|\n| Prompt injection via NL queries | Medium | High | WAF rules; query sandbox; SPARQL-only execution path; no shell / file system tools in the discovery agent. |\n| Sustainability of links (URI rot) | High | Medium | Persistent identifier strategy; ARK or DOI for canonical records; CDN-cached `sameAs` clusters. |\n\n---\n\n## 29. Glossary\n\n- **AAT** — Art & Architecture Thesaurus (Getty).\n- **ActivityStreams** — W3C Activity Streams 2.0 specification for activity feeds; basis of IIIF Change Discovery.\n- **Arches** — Open-source heritage data platform (Getty Conservation Institute + WMF) used as the curator backend.\n- **BagIt** — IETF spec for hierarchical file packaging used for audit bundles.\n- **CIDOC-CRM 7.1** — ISO 21127, the formal ontology for cultural heritage, in its current RDFS-encoded version. Linked Art is a profile of it.\n- **Crom (cromulent)** — Python lib for Linked Art instance creation.\n- **FAST** — Faceted Application of Subject Terminology (OCLC); subject headings derived from LCSH.\n- **HAL** — Hypertext Application Language; `_links` envelope.\n- **Histropedia** — Interactive timeline visualization tool used for publication histories and overlapping exhibitions.\n- **Linked Art** — JSON-LD profile of CIDOC-CRM optimized for developer usability.\n- **LOUD** — Linked Open Usable Data (Yale LUX framing).\n- **Ogee** — Open-source Arches package that maps Arches data entry forms to compliant Linked Art JSON-LD output.\n- **PeriodO** — Authority for named historical periods and historical place names.\n- **PROV-O** — W3C Provenance Ontology.\n- **SHACL** — Shapes Constraint Language; W3C spec for RDF validation.\n- **Shortcut triple** — Materialized 1-hop predicate that collapses multi-hop CIDOC-CRM paths for query performance.\n- **TGN** — Thesaurus of Geographic Names (Getty).\n- **ULAN** — Union List of Artist Names (Getty).\n- **VIAF** — Virtual International Authority File (OCLC).\n- **Zelge** — Documentation-first configuration system. 
Declares the canonical mapping from each UI form field to its CIDOC-CRM semantic path; consumed by form generator, validator, and JSON-LD emitter so all three stay in sync.\n\n---\n\n## 30. Open Questions (decide before Phase 0 ships)\n\n1. **Primary triple store** — MarkLogic (paid, Yale-grade) vs GraphDB CE (OSS) vs Stardog vs Fuseki? Default recommendation: **GraphDB CE** for the OSS path until p95 SLOs demand otherwise.\n2. **Vector store** — pgvector (simplest) vs Qdrant (best perf) vs Weaviate (graph-aware)? Default: **pgvector** until > 5M embeddings.\n3. **Frontend styling** — Tailwind 4 vs Chakra UI 3? Default: **Tailwind 4** for new work; respect existing vanilla styles in `public/`.\n3a. **Search engine** — Solr 9 vs OpenSearch 2 vs Elasticsearch 8? Default for LUX alignment: **Solr 9** (best nested-document faceting). Default for ops familiarity: **OpenSearch 2**.\n3b. **Curator backend** — Adopt full Arches + Ogee + Zelge, or build a lighter NestJS/Next.js curator console that emits the same JSON-LD? Default: **Arches + Ogee + Zelge** if a curator team exists; **light console** if dev-only.\n4. **Orchestrator** — Dagster vs Airflow? Default: **Dagster** for asset-centric semantics + lineage UI.\n5. **Mono-repo or multi-repo?** Recommendation: **mono-repo** with `pnpm` workspaces + `turbo` for the Node/TS side, plus a `pipeline/` Python subtree using `uv` + `pyproject.toml`.\n6. **License of new code** — match existing repo LICENSE.\n7. **ID scheme** — ULID + namespaced URIs (`https://lod.metamuseum.org/{type}/{ulid}`) — confirm domain.\n8. **Meta Wiki Art** namespace + citation template + agent approval gates — needs a separate spec doc (see `LinkedArtCollaboration.md` for context).\n\nWhen any of the above is decided, update §3.4 / §3.5 / §11 accordingly and remove the question from this list.\n\n---\n\n*End of master spec. 
Linked context: `LinkedArtPRD.md` · `LinkedArtAppFeatures.md` · `YaleLux.md` · `LODPipeline.md` · `LODEngine.md` · `LODTools.md` · `Vocabularies.md` · `SPARQL.md` · `LinkedArtPatterns.md` · `LinkedArtDashboard.md` · `LinkedArtRoadmap.md` · `LinkedArtWidgets.md` · `LinkedArtChallenges.md` · `LinkedArtSaaS.md` · `LinkedArtUnmetNeeds.md` · `LinkedArtUseCases.md` · `LinkedArtJobReady.md` · `LinkedArtSoftware.md` · `LinkedArtCollaboration.md` · `LinkedArtFeatureRoadmap.md` · `LinkedDesign.md` · `Linked Open Art Data Web App - Must-have Data Sources.md` · `Linked Art Notes.md`.*\n","sections":[{"level":2,"heading":"0. TL;DR for Coding Agents","anchor":"0-tl-dr-for-coding-agents"},{"level":2,"heading":"1. Product Vision & Scope","anchor":"1-product-vision-scope"},{"level":3,"heading":"1.1 Mission","anchor":"1-1-mission"},{"level":3,"heading":"1.2 In-Scope (MVP → v1)","anchor":"1-2-in-scope-mvp-v1"},{"level":3,"heading":"1.3 Out-of-Scope (explicitly)","anchor":"1-3-out-of-scope-explicitly"},{"level":3,"heading":"1.4 Scale Targets (Yale LUX-derived)","anchor":"1-4-scale-targets-yale-lux-derived"},{"level":3,"heading":"1.5 Personas (build-driving)","anchor":"1-5-personas-build-driving"},{"level":2,"heading":"2. Reference Architecture (Yale-LUX-derived, modernized)","anchor":"2-reference-architecture-yale-lux-derived-modernized"},{"level":3,"heading":"2.1 Architectural Invariants","anchor":"2-1-architectural-invariants"},{"level":2,"heading":"3. 
Tech Stack (concrete, version-pinned)","anchor":"3-tech-stack-concrete-version-pinned"},{"level":3,"heading":"3.1 Languages & Runtimes","anchor":"3-1-languages-runtimes"},{"level":3,"heading":"3.2 Frontend","anchor":"3-2-frontend"},{"level":3,"heading":"3.3 Backend Services","anchor":"3-3-backend-services"},{"level":3,"heading":"3.4 Storage","anchor":"3-4-storage"},{"level":3,"heading":"3.5 Pipeline & Eventing","anchor":"3-5-pipeline-eventing"},{"level":3,"heading":"3.6 AI / LLM Stack","anchor":"3-6-ai-llm-stack"},{"level":3,"heading":"3.7 IIIF","anchor":"3-7-iiif"},{"level":3,"heading":"3.8 Auth, Ops, CI/CD","anchor":"3-8-auth-ops-ci-cd"},{"level":2,"heading":"4. Linked Art Patterns (the canonical 6)","anchor":"4-linked-art-patterns-the-canonical-6"},{"level":3,"heading":"4.1 Activity / Event Pattern","anchor":"4-1-activity-event-pattern"},{"level":3,"heading":"4.2 Classification Pattern","anchor":"4-2-classification-pattern"},{"level":3,"heading":"4.3 Core Properties Pattern","anchor":"4-3-core-properties-pattern"},{"level":3,"heading":"4.4 Statement Pattern","anchor":"4-4-statement-pattern"},{"level":3,"heading":"4.5 TimeSpan Pattern","anchor":"4-5-timespan-pattern"},{"level":3,"heading":"4.6 Expansion Pattern","anchor":"4-6-expansion-pattern"},{"level":3,"heading":"4.7 Uncertainty & Contested Attributions","anchor":"4-7-uncertainty-contested-attributions"},{"level":3,"heading":"4.8 Multilingual Names","anchor":"4-8-multilingual-names"},{"level":3,"heading":"4.9 The 10 Primary Entity Boundaries (hard rule)","anchor":"4-9-the-10-primary-entity-boundaries-hard-rule"},{"level":3,"heading":"4.10 Anti-Patterns (forbidden)","anchor":"4-10-anti-patterns-forbidden"},{"level":3,"heading":"4.11 Human vs Machine Dates","anchor":"4-11-human-vs-machine-dates"},{"level":2,"heading":"5. 
Runtime Contracts (extend the existing `src/contracts/`)","anchor":"5-runtime-contracts-extend-the-existing-src-contracts"},{"level":3,"heading":"5.1 Required contracts","anchor":"5-1-required-contracts"},{"level":3,"heading":"5.2 Contract rules","anchor":"5-2-contract-rules"},{"level":3,"heading":"5.3 Backward-compatibility migration","anchor":"5-3-backward-compatibility-migration"},{"level":2,"heading":"6. Vocabularies & Authority Files (URI cheat sheet)","anchor":"6-vocabularies-authority-files-uri-cheat-sheet"},{"level":3,"heading":"6.1 Reconciliation routing rules","anchor":"6-1-reconciliation-routing-rules"},{"level":3,"heading":"6.2 Local-cache rule (performance + reliability)","anchor":"6-2-local-cache-rule-performance-reliability"},{"level":3,"heading":"6.3 Graph Crawling Enrichment (hierarchy generation)","anchor":"6-3-graph-crawling-enrichment-hierarchy-generation"},{"level":2,"heading":"7. ETL Pipeline (Extract → Load → Transform → Reconcile → Publish)","anchor":"7-etl-pipeline-extract-load-transform-reconcile-publish"},{"level":3,"heading":"7.1 Pipeline DAG (Dagster assets)","anchor":"7-1-pipeline-dag-dagster-assets"},{"level":3,"heading":"7.2 ID minting policy","anchor":"7-2-id-minting-policy"},{"level":3,"heading":"7.3 Confidence thresholds","anchor":"7-3-confidence-thresholds"},{"level":3,"heading":"7.4 Synchronization","anchor":"7-4-synchronization"},{"level":3,"heading":"7.5 Error handling","anchor":"7-5-error-handling"},{"level":2,"heading":"8. 
Storage Strategy (multi-modal)","anchor":"8-storage-strategy-multi-modal"},{"level":3,"heading":"8.1 The \"Record\" abstraction (load-bearing)","anchor":"8-1-the-record-abstraction-load-bearing"},{"level":3,"heading":"8.2 Triple store","anchor":"8-2-triple-store"},{"level":3,"heading":"8.3 Search index","anchor":"8-3-search-index"},{"level":3,"heading":"8.4 Vector index","anchor":"8-4-vector-index"},{"level":3,"heading":"8.5 Operational state (PostgreSQL)","anchor":"8-5-operational-state-postgresql"},{"level":2,"heading":"9. API Design","anchor":"9-api-design"},{"level":3,"heading":"9.1 Endpoint shape (REST + JSON-LD + HAL)","anchor":"9-1-endpoint-shape-rest-json-ld-hal"},{"level":3,"heading":"9.2 HAL conventions","anchor":"9-2-hal-conventions"},{"level":3,"heading":"9.3 Content negotiation","anchor":"9-3-content-negotiation"},{"level":3,"heading":"9.4 Pagination","anchor":"9-4-pagination"},{"level":3,"heading":"9.5 Graph expansion","anchor":"9-5-graph-expansion"},{"level":3,"heading":"9.5.1 Progressive enhancement principle","anchor":"9-5-1-progressive-enhancement-principle"},{"level":3,"heading":"9.6 GraphQL","anchor":"9-6-graphql"},{"level":3,"heading":"9.7 Security & governance","anchor":"9-7-security-governance"},{"level":2,"heading":"10. AI / LLM Layer","anchor":"10-ai-llm-layer"},{"level":3,"heading":"10.1 NL → SPARQL / NL → HAL agent","anchor":"10-1-nl-sparql-nl-hal-agent"},{"level":3,"heading":"10.2 Graph-RAG chat","anchor":"10-2-graph-rag-chat"},{"level":3,"heading":"10.3 LLM-assisted mapping","anchor":"10-3-llm-assisted-mapping"},{"level":3,"heading":"10.4 LLM-assisted reconciliation","anchor":"10-4-llm-assisted-reconciliation"},{"level":3,"heading":"10.5 Safety","anchor":"10-5-safety"},{"level":2,"heading":"11. 
Frontend Architecture","anchor":"11-frontend-architecture"},{"level":3,"heading":"11.1 App shell","anchor":"11-1-app-shell"},{"level":3,"heading":"11.2 Routing","anchor":"11-2-routing"},{"level":3,"heading":"11.3 Data fetching","anchor":"11-3-data-fetching"},{"level":3,"heading":"11.4 Accessibility","anchor":"11-4-accessibility"},{"level":3,"heading":"11.5 Internationalization","anchor":"11-5-internationalization"},{"level":2,"heading":"12. UI Widget Catalog (`@metamuseum/linked-art-ui`)","anchor":"12-ui-widget-catalog-metamuseum-linked-art-ui"},{"level":3,"heading":"12.1 Atomic","anchor":"12-1-atomic"},{"level":3,"heading":"12.2 Entity Cards (preview)","anchor":"12-2-entity-cards-preview"},{"level":3,"heading":"12.3 Visualizations","anchor":"12-3-visualizations"},{"level":3,"heading":"12.4 Metadata & Context","anchor":"12-4-metadata-context"},{"level":3,"heading":"12.5 Participatory","anchor":"12-5-participatory"},{"level":3,"heading":"12.6 Implementation rules","anchor":"12-6-implementation-rules"},{"level":2,"heading":"13. IIIF Integration","anchor":"13-iiif-integration"},{"level":2,"heading":"14. ActivityStreams Syndication","anchor":"14-activitystreams-syndication"},{"level":2,"heading":"15. Search & Discovery","anchor":"15-search-discovery"},{"level":3,"heading":"15.1 Layers","anchor":"15-1-layers"},{"level":3,"heading":"15.2 Autocomplete","anchor":"15-2-autocomplete"},{"level":3,"heading":"15.3 \"Follow your nose\"","anchor":"15-3-follow-your-nose"},{"level":3,"heading":"15.4 Complexity toggle (Art Tracks model)","anchor":"15-4-complexity-toggle-art-tracks-model"},{"level":2,"heading":"16. 
Curator & Researcher Tooling","anchor":"16-curator-researcher-tooling"},{"level":3,"heading":"16.1 Visual ETL Mapper","anchor":"16-1-visual-etl-mapper"},{"level":3,"heading":"16.2 Linked Art Form Builder (Arches / Ogee-based)","anchor":"16-2-linked-art-form-builder-arches-ogee-based"},{"level":3,"heading":"16.3 Dynamic Graph Explorer","anchor":"16-3-dynamic-graph-explorer"},{"level":3,"heading":"16.4 Sparnatural-style Visual SPARQL Builder","anchor":"16-4-sparnatural-style-visual-sparql-builder"},{"level":3,"heading":"16.5 Validation Console","anchor":"16-5-validation-console"},{"level":2,"heading":"17. Validation Stack","anchor":"17-validation-stack"},{"level":3,"heading":"17.1 Layers","anchor":"17-1-layers"},{"level":3,"heading":"17.2 Authoring","anchor":"17-2-authoring"},{"level":3,"heading":"17.3 Acceptance","anchor":"17-3-acceptance"},{"level":2,"heading":"18. Provenance & PROV-O","anchor":"18-provenance-prov-o"},{"level":2,"heading":"19. Rights & Licensing","anchor":"19-rights-licensing"},{"level":3,"heading":"19.1 Required fields","anchor":"19-1-required-fields"},{"level":3,"heading":"19.2 UI rules","anchor":"19-2-ui-rules"},{"level":3,"heading":"19.3 Existing helpers (keep)","anchor":"19-3-existing-helpers-keep"},{"level":2,"heading":"20. Performance Engineering","anchor":"20-performance-engineering"},{"level":3,"heading":"20.1 Shortcut triples (Yale LUX)","anchor":"20-1-shortcut-triples-yale-lux"},{"level":3,"heading":"20.2 Caching layers","anchor":"20-2-caching-layers"},{"level":3,"heading":"20.3 Pre-computation","anchor":"20-3-pre-computation"},{"level":3,"heading":"20.4 Targets","anchor":"20-4-targets"},{"level":2,"heading":"21. Observability","anchor":"21-observability"},{"level":2,"heading":"22. Security & API Governance","anchor":"22-security-api-governance"},{"level":2,"heading":"23. 
Testing & TDD Requirements","anchor":"23-testing-tdd-requirements"},{"level":3,"heading":"23.1 Required coverage","anchor":"23-1-required-coverage"},{"level":3,"heading":"23.2 Existing convention (keep)","anchor":"23-2-existing-convention-keep"},{"level":3,"heading":"23.3 Acceptance gates per merge","anchor":"23-3-acceptance-gates-per-merge"},{"level":2,"heading":"24. Deployment Topology","anchor":"24-deployment-topology"},{"level":3,"heading":"24.1 Local dev","anchor":"24-1-local-dev"},{"level":3,"heading":"24.2 Staging / Prod","anchor":"24-2-staging-prod"},{"level":3,"heading":"24.3 Cost guidance","anchor":"24-3-cost-guidance"},{"level":2,"heading":"25. Phased Build Plan (AI-agent-friendly)","anchor":"25-phased-build-plan-ai-agent-friendly"},{"level":3,"heading":"Phase 0 — Foundations (Weeks 1-2)","anchor":"phase-0-foundations-weeks-1-2"},{"level":3,"heading":"Phase 1 — Multi-modal Storage & HAL API (Weeks 3-5)","anchor":"phase-1-multi-modal-storage-hal-api-weeks-3-5"},{"level":3,"heading":"Phase 2 — Ingest Pipeline & Reconciliation (Weeks 6-9)","anchor":"phase-2-ingest-pipeline-reconciliation-weeks-6-9"},{"level":3,"heading":"Phase 3 — Discovery UI & Visualizations (Weeks 10-13)","anchor":"phase-3-discovery-ui-visualizations-weeks-10-13"},{"level":3,"heading":"Phase 4 — AI Layer (Weeks 14-17)","anchor":"phase-4-ai-layer-weeks-14-17"},{"level":3,"heading":"Phase 5 — Syndication, Meta Wiki Art Bridge, Hardening (Weeks 18-20)","anchor":"phase-5-syndication-meta-wiki-art-bridge-hardening-weeks-18-20"},{"level":2,"heading":"26. KPIs & Success Metrics","anchor":"26-kpis-success-metrics"},{"level":2,"heading":"27. Migration from the Current Prototype","anchor":"27-migration-from-the-current-prototype"},{"level":3,"heading":"27.1 Files / dirs to add (alongside what exists)","anchor":"27-1-files-dirs-to-add-alongside-what-exists"},{"level":3,"heading":"27.2 Hard rules during migration","anchor":"27-2-hard-rules-during-migration"},{"level":2,"heading":"28. 
Risks & Mitigations","anchor":"28-risks-mitigations"},{"level":2,"heading":"29. Glossary","anchor":"29-glossary"},{"level":2,"heading":"30. Open Questions (decide before Phase 0 ships)","anchor":"30-open-questions-decide-before-phase-0-ships"}],"html":"<h1 id=\"linkedart-sota-web-app-master-build-specification\">LinkedArt SOTA Web App — Master Build Specification</h1>\n<blockquote><strong>Audience:</strong> AI coding agents (Claude Code, Codex, Cursor, Aider) and human engineers building the next-generation Meta Museum / Linked Art platform.</blockquote>\n<blockquote><strong>Status:</strong> Authoritative consolidation of every file in <code>LinkedArt/</code>. Treat this as the single source of truth. When this document conflicts with the older <code>LinkedArt*.md</code> notes, this document wins. When this document conflicts with <code>CLAUDE.md</code> / <code>AGENTS.md</code> (repo-level conventions), the repo-level conventions win for code-style only — the architectural targets here are aspirational and supersede the current monolithic <code>server.js</code> prototype over time.</blockquote>\n<blockquote><strong>Reading time budget:</strong> ~45 min. Granular intentionally. Skim §0 → §3 → §27 first; everything else is reference depth.</blockquote>\n<hr>\n<h2 id=\"0-tl-dr-for-coding-agents\">0. TL;DR for Coding Agents</h2>\n<p>You are building a <strong>multi-modal, event-centric, hypermedia-driven cultural-heritage discovery platform</strong> with the following non-negotiables:</p>\n<ol><li><strong>Linked Art JSON-LD</strong> is the canonical data layer. Everything serializes to / from it.</li>\n<li><strong>CIDOC-CRM 7.1 (RDFS encoding)</strong> + Linked Art profile. Event-centric — never collapse production / acquisition / exhibition into flat fields.</li>\n<li><strong>Hybrid storage</strong>: pre-materialized <strong>JSON-LD &quot;Record&quot; documents</strong> + RDF triple index + full-text/faceted index (Solr or OpenSearch) + vector index. 
Translating SPARQL graph queries into faceted UI results is non-trivial and requires this dual approach. (Yale LUX pattern — see §8.)</li>\n<li><strong>HAL hypermedia API</strong> — clients follow named <code>_links</code>, they do not construct SPARQL. API design is <strong>progressive enhancement</strong>: simple, human-readable JSON-LD payloads first; full machine-readable graph network on opt-in.</li>\n<li><strong>Shortcut triples</strong> materialized at ingest time so runtime queries are single-hop. (See §20.1.)</li>\n<li><strong>Rights-aware by default</strong> — every record, image, and field carries reuse status (RightsStatements.org + CC).</li>\n<li><strong>Component library is data-pattern-driven</strong>, not domain-typed. A <code>&lt;LinkedDate&gt;</code> renders for a fossil, a painting, or a book identically.</li>\n<li><strong>Reconciliation against Getty AAT/ULAN/TGN, PeriodO, VIAF, Wikidata, GeoNames, FAST, LoC</strong> is part of the ingest path, not an afterthought. External authority dumps are downloaded and cached locally — never queried on the fly at request time.</li>\n<li><strong>NL → SPARQL / NL → HAL</strong> with grounded Graph-RAG is the primary discovery affordance for non-technical users. Visual SPARQL builders (Sparnatural-style) are the secondary affordance.</li>\n<li><strong>WCAG 2.1 AA</strong>, i18n via <code>language</code>-tagged <code>Name</code> objects, and 99.9% uptime are gates, not stretch goals.</li>\n<li><strong>10 primary entity boundaries, hard.</strong> 5 main: <code>HumanMadeObject</code>, <code>LinguisticObject</code>, <code>VisualItem</code>, <code>DigitalObject</code>, <code>Set</code>. 5 connected/authority: <code>Person</code>, <code>Group</code>, <code>Place</code>, <code>Type</code> (Concepts), <code>Activity</code> (macro-events incl. provenance, destruction). (See §4.9.)</li>\n<li><strong>Modeling anti-patterns are forbidden.</strong> No multiple instantiation (use partitioning into sub-activities — see §4.10). No blank nodes — every entity gets a minted URI. 
Always separate carrier (HumanMadeObject) from content (VisualItem / LinguisticObject) from surrogate (DigitalObject).</li></ol>\n<p>Existing repo (<code>server.js</code>, <code>src/adapters/getty.js</code>, <code>src/adapters/met.js</code>, <code>public/app.js</code>) is the <strong>MVP prototype</strong>. Do <strong>not</strong> prematurely migrate to Next.js (see <code>README.md</code> line ~107). Migrate route-by-route only once the contracts in §5 are stable.</p>\n<hr>\n<h2 id=\"1-product-vision-scope\">1. Product Vision &amp; Scope</h2>\n<h3 id=\"1-1-mission\">1.1 Mission</h3>\n<p>Unify cultural-heritage data across museums, libraries, archives, and natural-history collections into a single discoverable, queryable, and machine-actionable graph using the Linked Art profile of CIDOC-CRM. Cover ~90% of common use cases with ~10% of the ontology complexity.</p>\n<h3 id=\"1-2-in-scope-mvp-v1\">1.2 In-Scope (MVP → v1)</h3>\n<ul><li>Ingest from JSON APIs, CSV, XML, MARC, IIIF manifests.</li><li>Normalize to Linked Art JSON-LD with strict pattern conformance.</li><li>Reconcile entities to Getty AAT/ULAN/TGN, Wikidata, VIAF, GeoNames, LoC.</li><li>Public discovery UI: faceted search, entity hubs, timelines, maps, network graphs.</li><li>Curator tools: form-based editor, visual ETL mapper, graph debugger.</li><li>Public read API: REST + JSON-LD + HAL + SPARQL + GraphQL.</li><li>AI layer: NL→Query, Graph-RAG chat with verifiable CIDOC-CRM citations.</li><li>IIIF integration: deep-zoom, side-by-side compare, IIIF Change Discovery sync.</li><li>ActivityStreams 2.0 syndication for downstream aggregators.</li></ul>\n<h3 id=\"1-3-out-of-scope-explicitly\">1.3 Out-of-Scope (explicitly)</h3>\n<ul><li>Replacing institutional CMS (TMS, EMu, Adlib, Qi, CollectionSpace). This is a <strong>downstream publication layer</strong>.</li><li>Image hosting — defer to IIIF Image API servers (Cantaloupe, IIPImage, Loris).</li><li>Native audio/video ingestion in v1 (metadata only).</li><li>Full CIDOC-CRM implementation. 
Linked Art profile only.</li><li>Upstream collection management workflows.</li><li>Full ML model training (use off-the-shelf embeddings &amp; fine-tunes).</li></ul>\n<h3 id=\"1-4-scale-targets-yale-lux-derived\">1.4 Scale Targets (Yale LUX-derived)</h3>\n<ul><li><strong>Phase 3 target</strong>: ~15M metadata records.</li><li><strong>Phase 5 target</strong>: ~45M total entities (records × authority + reconciled sameAs clusters).</li><li><strong>Reference production sizing</strong>: 3× 16-vCPU nodes (AWS m6i.4xlarge equivalent) for the storage tier delivers sub-second p95 for the common query set. Scale horizontally; the storage tier is the bottleneck, not the API gateway.</li><li>Triple count at the upper bound approaches ~2B triples once shortcut predicates (§20.1) are materialized — plan storage IOPS accordingly.</li></ul>\n<h3 id=\"1-5-personas-build-driving\">1.5 Personas (build-driving)</h3>\n<table>\n<thead><tr><th>Persona</th><th>Primary need</th><th>Tool surface</th></tr></thead>\n<tbody>\n<tr><td>Museum Curator / Cataloger</td><td>Publish authoritative metadata without writing JSON-LD</td><td>Form Builder, Visual ETL Mapper</td></tr>\n<tr><td>Digital Humanities Researcher</td><td>Cross-collection graph queries, provenance chains</td><td>Sparnatural-style builder, NL→SPARQL, Notebook exports</td></tr>\n<tr><td>Software Developer</td><td>Predictable REST + JSON-LD APIs</td><td>HAL-linked endpoints, GraphQL, OpenAPI/SHACL contracts</td></tr>\n<tr><td>Public Visitor / Student</td><td>Visual exploration, no graph literacy required</td><td>Discovery SPA, IIIF deep-zoom, timelines, storytelling</td></tr>\n<tr><td>AI Agent (claude/codex)</td><td>Programmatic ingest, validation, content generation</td><td>Agent endpoints, contract-validated I/O, dry-run modes</td></tr>\n</tbody>\n</table>\n<hr>\n<h2 id=\"2-reference-architecture-yale-lux-derived-modernized\">2. 
Reference Architecture (Yale-LUX-derived, modernized)</h2>\n<pre><code>\n            ┌──────────────────────────────────────────────────────┐\n            │  External Sources                                    │\n            │  Getty REST/IIIF/SPARQL · Met Open Access · Harvard  │\n            │  SAAM · Walters · YCBA · Rijks · Europeana · British │\n            │  Museum · AIC · V&amp;A · Tate · MoMA · CMA · LoC ·      │\n            │  Wikidata · DBpedia · VIAF · GeoNames · PeriodO      │\n            └────────────┬─────────────────────────────────────────┘\n                         │ IIIF Change Discovery (W3C ActivityStreams 2.0)\n                         │ + REST polling + SPARQL federation\n                         ▼\n            ┌──────────────────────────────────────────────────────┐\n            │  Ingestion (Python)                                  │\n            │  Dagster / Airflow / Temporal orchestrator           │\n            │  ├─ Extract  → MinIO/S3 raw lake (immutable)         │\n            │  ├─ Profile  → Great Expectations + spaCy            │\n            │  ├─ Map      → Crom / PyLD / custom mappers          │\n            │  ├─ Reconcile→ RecordLinkage + sentence-transformers │\n            │  └─ Materialize &quot;Record&quot; + shortcut triples          │\n            │  State: PostgreSQL (jobs/lineage) + Redis (URI cache)│\n            └────────────┬─────────────────────────────────────────┘\n                         │ JSON-LD records + N-Quads\n                         ▼\n            ┌──────────────────────────────────────────────────────┐\n            │  Multi-Modal Storage                                 │\n            │  ┌─────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐  │\n            │  │MarkLogic│ │  Triple  │ │Elastic / │ │ pgvector │  │\n            │  │  -OR-   │ │  store   │ │OpenSearch│ │/Qdrant/  │  │\n            │  │ GraphDB │ │ (Fuseki/ │ │  (text + │ │ Weaviate │  │\n            │  │  +PG    │ │  Stardog)│ │  facet)  │ 
│ (vectors)│  │\n            │  └─────────┘ └──────────┘ └──────────┘ └──────────┘  │\n            │  All four stay in sync via Kafka/NATS event bus      │\n            └────────────┬─────────────────────────────────────────┘\n                         │ Static JSON-LD docs (CDN-cacheable)\n                         │ HAL _links · SPARQL · GraphQL · ActivityStreams\n                         ▼\n            ┌──────────────────────────────────────────────────────┐\n            │  API Gateway (Kong / Tyk / Cloudflare)               │\n            │  · OAuth2/OIDC (Keycloak) · API keys · rate-limit    │\n            │  · CORS · OpenAPI 3.1 · WAF · request tracing        │\n            └────────────┬─────────────────────────────────────────┘\n                         ▼\n            ┌──────────────────────────────────────────────────────┐\n            │  Application Services                                │\n            │  · Discovery API     (Node/NestJS or Python/FastAPI) │\n            │  · NL→Query Agent    (LLM + schema embeddings)       │\n            │  · Graph-RAG Chat    (vector + triple + citations)   │\n            │  · Validation API    (PySHACL + JSON Schema)         │\n            │  · Reconciliation API(Getty / VIAF / Wikidata)       │\n            │  · IIIF Proxy        (Cantaloupe + auth)             │\n            │  · ActivityStreams   (publishing endpoint)           │\n            └────────────┬─────────────────────────────────────────┘\n                         ▼\n            ┌──────────────────────────────────────────────────────┐\n            │  Frontends (Next.js 15 App Router + TS)              │\n            │  · Public SPA (discovery, IIIF, timelines, maps)     │\n            │  · Curator Console (Form Builder, ETL Mapper)        │\n            │  · Researcher Workspace (Graph Explorer, NB exports) │\n            │  · Agent Workspace (job runner, validation reports)  │\n            │  · Meta Wiki Art bridge (MediaWiki publish layer)    │\n         
   │  Module-federated micro-frontends, shared component  │\n            │  library (`@metamuseum/linked-art-ui`)               │\n            └──────────────────────────────────────────────────────┘\n</code></pre>\n<h3 id=\"2-1-architectural-invariants\">2.1 Architectural Invariants</h3>\n<ol><li><strong>Adapters do not import each other.</strong> Bridge via <code>src/utils/artwork-builder.js</code> (already enforced — keep it).</li>\n<li><strong>Contracts are leaf modules.</strong> No deps on adapters or utils.</li>\n<li><strong>Provider logic stays in adapters.</strong> <code>server.js</code> (or future API gateway) only orchestrates.</li>\n<li><strong>Source snapshots are immutable.</strong> <code>_source.raw</code> preserves the original provider response — transform at read time, never mutate stored records in place.</li>\n<li><strong>Linked Art JSON-LD is canonical.</strong> UI DTOs (<code>Artwork</code>) are separate; map at the boundary.</li>\n<li><strong>Every write path is idempotent and reprocessable from any stage.</strong> No destructive transformations without a recoverable upstream artifact in MinIO/S3.</li></ol>\n<hr>\n<h2 id=\"3-tech-stack-concrete-version-pinned\">3. Tech Stack (concrete, version-pinned)</h2>\n<h3 id=\"3-1-languages-runtimes\">3.1 Languages &amp; Runtimes</h3>\n<ul><li><strong>Node.js ≥ 22 LTS</strong> — public API gateway and lightweight services (continuity with current <code>server.js</code>).</li><li><strong>Python 3.12+</strong> — ETL, reconciliation, validation, LLM tooling.</li><li><strong>TypeScript 5.4+</strong> — all new frontend and any new TS services. <code>strict: true</code> non-negotiable.</li></ul>\n<h3 id=\"3-2-frontend\">3.2 Frontend</h3>\n<ul><li><strong>Next.js 15+</strong> (App Router, RSC). SSR for entity pages, SSG for vocabularies, ISR for facets.</li><li><strong>React 19</strong>.</li><li><strong>Styling</strong>: Tailwind CSS 4 (preferred) <strong>or</strong> Chakra UI 3. 
Pick one per project, do not mix.</li><li><strong>State</strong>: TanStack Query 5 for server state; Zustand for local UI state. Avoid Redux for v1.</li><li><strong>Forms</strong>: React Hook Form + Zod schemas mirroring the contracts in §5.</li><li><strong>Visualization</strong>:<ul><li><code>d3</code> 7.x for bespoke chart primitives.</li><li><code>@visx/*</code> for composable charts.</li><li><code>cytoscape</code> + <code>cytoscape-cose-bilkent</code> for force-directed graphs.</li><li><code>reactflow</code> 11.x for the Visual ETL Mapper drag-and-drop canvas.</li><li><code>react-leaflet</code> 4.x + <code>leaflet</code> 1.9.x (or <code>mapbox-gl</code> 3.x) for maps. Prefer Leaflet for no-account-required dev.</li><li><code>openseadragon</code> 4.x for IIIF deep-zoom; wrap in <code>&lt;IIIFCanvasViewer&gt;</code>.</li><li><code>vis-timeline</code> 7.x or <code>@nivo/timeline</code> for chronologies.</li></ul></li><li><strong>IIIF helpers</strong>: <code>manifesto.js</code>, <code>@iiif/presentation-3</code> types.</li><li><strong>Testing</strong>: Vitest + Testing Library + Playwright for e2e + <code>axe-core</code> for a11y.</li></ul>\n<h3 id=\"3-3-backend-services\">3.3 Backend Services</h3>\n<ul><li><strong>Python</strong>: FastAPI 0.110+, Pydantic v2, uvicorn, httpx, tenacity, structlog.</li><li><strong>Linked Art libs</strong>: <code>cromulent</code> (Crom) for instantiation, <code>PyLD</code> for JSON-LD framing/compaction/expansion, <code>rdflib</code> 7.x for triple manipulation, <code>pyshacl</code> for SHACL.</li><li><strong>Node</strong>: NestJS 10 (if you choose Node for the gateway) or stay vanilla-Node per current <code>server.js</code> conventions until contracts are stable.</li></ul>\n<h3 id=\"3-4-storage\">3.4 Storage</h3>\n<ul><li><strong>Primary</strong>: choose one of:<ul><li><strong>MarkLogic</strong> (commercial, Yale LUX-grade — multi-model out of box, CTS query language for hybrid text+graph).</li><li><strong>GraphDB</strong> (Ontotext, free Community Edition, SPARQL 1.1 + Lucene plugin) — <strong>recommended for OSS path</strong>.</li><li><strong>Stardog</strong> (commercial, strong reasoning).</li><li><strong>Apache Jena Fuseki + Lucene</strong> for pure-OSS 
minimal setups.</li></ul></li><li><strong>Document store</strong> (if NOT MarkLogic): <strong>PostgreSQL 16 + JSONB</strong> for materialized <code>Record</code> documents and operational state (jobs, lineage, RBAC).</li><li><strong>Search / facets</strong>: <strong>Apache Solr 9.x</strong> (the Yale LUX choice, strongest JSON-LD-friendly faceting + nested document support), <strong>OpenSearch 2.x</strong>, or <strong>Elasticsearch 8.x</strong>. Stores <code>Record</code>-derived flattened search docs. Pick one per environment. Default for the OSS LUX-aligned path: <strong>Solr 9.x</strong>.</li><li><strong>Vectors</strong>: <strong>pgvector</strong> (simple) <strong>or</strong> <strong>Qdrant</strong> / <strong>Weaviate</strong> (scale). Stores embeddings for <code>LinguisticObject</code> text, <code>VisualItem</code> image descriptors, and entity summaries.</li><li><strong>Cache</strong>: <strong>Redis 7</strong> — URI concordance (auth-string → URI), facet results, HAL link cache.</li><li><strong>Object storage</strong>: <strong>MinIO</strong> (dev/self-host) or <strong>S3</strong> for raw extracts, ETL outputs, image originals, BagIt audit bundles.</li></ul>\n<h3 id=\"3-5-pipeline-eventing\">3.5 Pipeline &amp; Eventing</h3>\n<ul><li><strong>Orchestrator</strong>: <strong>Dagster</strong> (preferred — asset-centric, strong lineage UI) <strong>or</strong> Apache Airflow 2.9+. 
For long-running curator workflows that need durable retries, <strong>Temporal</strong>.</li><li><strong>Stream</strong>: <strong>Kafka</strong> (Confluent or Redpanda) or <strong>NATS JetStream</strong> for ingest → reconcile → publish events.</li><li><strong>Validation</strong>: PySHACL + Great Expectations.</li><li><strong>Reconciliation</strong>: `recordlinkage`, `rapidfuzz`, `sentence-transformers` (all-MiniLM-L6-v2 baseline; instructor-xl for higher fidelity).</li></ul>\n<h3 id=\"3-6-ai-llm-stack\">3.6 AI / LLM Stack</h3>\n<ul><li><strong>Primary model</strong>: Claude (Sonnet 4.6 default; Opus 4.7+ for complex SPARQL synthesis).</li><li><strong>Embeddings</strong>: `voyage-3` (cultural-heritage-friendly), fallback `text-embedding-3-large`. Image embeddings via SigLIP or OpenCLIP for &quot;find similar&quot; by IIIF region.</li><li><strong>Agents</strong>: Claude Agent SDK / LangGraph. Tool-use endpoints described in §10.</li><li><strong>Guardrails</strong>: every NL→SPARQL must return (a) the SPARQL it generated, (b) the resulting triples, (c) the citation paths in CIDOC-CRM terms. No free-text claims without graph backing.</li></ul>\n<h3 id=\"3-7-iiif\">3.7 IIIF</h3>\n<ul><li><strong>Image server</strong>: Cantaloupe 5.x.</li><li><strong>Manifest serving</strong>: static (S3 + CDN) or on-the-fly via `iiif-prezi3` (Python).</li><li><strong>Spec conformance</strong>: Image API 3.0, Presentation API 3.0, Change Discovery API 1.0, Content State API 1.0.</li></ul>\n<h3 id=\"3-8-auth-ops-ci-cd\">3.8 Auth, Ops, CI/CD</h3>\n<ul><li><strong>Auth</strong>: Keycloak (OIDC) — Admin, Data Engineer, Curator, Educator, Public roles. Tenant-aware.</li><li><strong>Containers</strong>: Docker + Docker Compose for dev; Helm charts on Kubernetes (DOKS / EKS / GKE) for prod.</li><li><strong>Observability</strong>: OpenTelemetry → Grafana Tempo (traces) + Prometheus (metrics) + Loki (logs).</li><li><strong>CI/CD</strong>: GitHub Actions. 
Required checks: `npm test`, `pytest`, SHACL validation of fixture records, OpenAPI lint, Playwright smoke, axe a11y.</li><li><strong>Feature flags</strong>: Unleash (OSS) or GrowthBook.</li><li><strong>Secrets</strong>: Doppler or HashiCorp Vault. Never commit `.env`.</li></ul>\n<p>---</p>\n<h2 id=\"4-linked-art-patterns-the-canonical-6\">4. Linked Art Patterns (the canonical 6)</h2>\n<p>Every entity emitted by this system MUST conform to these patterns. The validator (§17) rejects anything that violates them.</p>\n<h3 id=\"4-1-activity-event-pattern\">4.1 Activity / Event Pattern</h3>\n<p>Never link Object → Person directly. Always Object → Activity → Person.</p>\n<pre><code>\n{\n  &quot;id&quot;: &quot;https://lod.example.org/object/123&quot;,\n  &quot;type&quot;: &quot;HumanMadeObject&quot;,\n  &quot;_label&quot;: &quot;The Starry Night&quot;,\n  &quot;produced_by&quot;: {\n    &quot;id&quot;: &quot;https://lod.example.org/activity/123-production&quot;,\n    &quot;type&quot;: &quot;Production&quot;,\n    &quot;carried_out_by&quot;: [{ &quot;id&quot;: &quot;http://vocab.getty.edu/ulan/500115493&quot;, &quot;type&quot;: &quot;Person&quot;, &quot;_label&quot;: &quot;Vincent van Gogh&quot; }],\n    &quot;took_place_at&quot;: [{ &quot;id&quot;: &quot;http://vocab.getty.edu/tgn/7008038&quot;, &quot;type&quot;: &quot;Place&quot;, &quot;_label&quot;: &quot;Saint-Rémy-de-Provence&quot; }],\n    &quot;timespan&quot;: {\n      &quot;id&quot;: &quot;https://lod.example.org/timespan/123-production&quot;,\n      &quot;type&quot;: &quot;TimeSpan&quot;,\n      &quot;begin_of_the_begin&quot;: &quot;1889-06-01T00:00:00Z&quot;,\n      &quot;end_of_the_end&quot;:   &quot;1889-06-30T23:59:59Z&quot;,\n      &quot;identified_by&quot;: [{ &quot;id&quot;: &quot;https://lod.example.org/name/123-production-date&quot;, &quot;type&quot;: &quot;Name&quot;, &quot;content&quot;: &quot;June 1889&quot; }]\n    },\n    &quot;technique&quot;: [{ &quot;id&quot;: &quot;http://vocab.getty.edu/aat/300053349&quot;, &quot;type&quot;: &quot;Type&quot;, &quot;_label&quot;: &quot;oil painting (technique)&quot; }]\n  }\n}\n</code></pre>\n<h3 id=\"4-2-classification-pattern\">4.2 Classification Pattern</h3>\n<p>Type with `classified_as` against
controlled vocabularies (Getty AAT primary). Distinguish from `broader` (hierarchical placement) and `type` (CRM class).</p>\n<h3 id=\"4-3-core-properties-pattern\">4.3 Core Properties Pattern</h3>\n<p>Every entity MUST have:</p>\n<ul><li>`id` — stable, dereferenceable URI.</li><li>`type` — CIDOC-CRM class (Linked Art profile).</li><li>`_label` — human-readable summary, language-undeclared default.</li><li>`identified_by` — at least one `Name` (with `language` if known) and any `Identifier` (accession nums, ULAN IDs, etc.).</li></ul>\n<h3 id=\"4-4-statement-pattern\">4.4 Statement Pattern</h3>\n<p>When you cannot fully structure a fact, attach a `LinguisticObject` to express it as text. Always classify the statement with an AAT term (e.g., `description`, `provenance statement`, `materials statement`).</p>\n<h3 id=\"4-5-timespan-pattern\">4.5 TimeSpan Pattern</h3>\n<p>ISO 8601 (`xsd:dateTime`) in `begin_of_the_begin` / `end_of_the_end`, plus a `Name` for display. For uncertainty, widen the span; for fuzzy boundaries, also populate `end_of_the_begin` and `begin_of_the_end`.</p>\n<h3 id=\"4-6-expansion-pattern\">4.6 Expansion Pattern</h3>\n<p>Keep primary data simple. Add nuance with `AttributeAssignment` (e.g., previous attributions, contested authorship) so consumers can ignore detail safely.</p>\n<h3 id=\"4-7-uncertainty-contested-attributions\">4.7 Uncertainty &amp; Contested Attributions</h3>\n<p>Use `AttributeAssignment` with `assigned_property_type` and a confidence statement. Never overwrite the primary attribution; layer alternatives.</p>\n<h3 id=\"4-8-multilingual-names\">4.8 Multilingual Names</h3>\n<p>Multiple `Name` objects each with `language` (referenced via AAT or ISO 639). Default UI label resolution: user locale → `en` → first.</p>\n<h3 id=\"4-9-the-10-primary-entity-boundaries-hard-rule\">4.9 The 10 Primary Entity Boundaries (hard rule)</h3>\n<p>The graph is segmented into exactly 10 entity types. 
The validator (§17) rejects records that smuggle data into the wrong boundary.</p>\n<p><strong>Main entities (5):</strong></p>\n<ul><li>`HumanMadeObject` — physical holdings (paintings, sculptures, books-as-objects, specimens, monuments).</li><li>`LinguisticObject` — texts and transcribed recordings (the work, not the carrier).</li><li>`VisualItem` — image content / iconography depicted (the work, not the carrier).</li><li>`DigitalObject` — surrogates: IIIF manifests, JPGs, PDFs, born-digital files.</li><li>`Set` — archival subdivisions and curated collections (nested for arrangement vs location).</li></ul>\n<p><strong>Connected / authority entities (5):</strong></p>\n<ul><li>`Person` — individuals.</li><li>`Group` — organizations, families, studios, firms.</li><li>`Place` — geospatial entities with coordinates (TGN/GeoNames/PeriodO-grounded).</li><li>`Type` (Concepts) — AAT-aligned classifications.</li><li>`Activity` — macro-events: production, acquisition, exhibition, provenance transfers, destruction, conservation.</li></ul>\n<p>Archives are modeled as nested `Set` objects that separate the physical box from the intellectual arrangement. Natural-history specimens map to `HumanMadeObject` for cross-domain UI consistency. A `Person` and a `Group` are never collapsed into one record, not even for sole-proprietor studios.</p>\n<h3 id=\"4-10-anti-patterns-forbidden\">4.10 Anti-Patterns (forbidden)</h3>\n<p><strong>No multiple instantiation.</strong> Do not type a node as multiple unrelated classes. Instead, <strong>partition</strong> it into sub-activities. A provenance event becomes two children, `TransferOfCustody` + `Acquisition` (the transfer of title), each its own `Activity`. A composite acquisition becomes `Purchase` + `Loan` partitioned events.</p>\n<p><strong>No blank nodes.</strong> Every entity — even an intermediate `TimeSpan`, `Name`, `Identifier`, or sub-`Activity` — gets a minted URI. This is what lets two records referring to the same exhibition align.
Validator rejects any node lacking an `id`.</p>\n<p><strong>No carrier/content collapse.</strong> The physical painting (`HumanMadeObject`) is never the depicted scene (`VisualItem`) and is never the digital JPG (`DigitalObject`). Link with `shows` / `digitally_shown_by` / `digitally_carries`.</p>\n<p><strong>No silent reconciliation overrides.</strong> Never replace a curator-asserted URI in place. Add an `AttributeAssignment` recording the alternative.</p>\n<h3 id=\"4-11-human-vs-machine-dates\">4.11 Human vs Machine Dates</h3>\n<p>Every `TimeSpan` carries <strong>both</strong>:</p>\n<ul><li>Machine-queryable bounds: `begin_of_the_begin` + `end_of_the_end` (and optionally `end_of_the_begin` + `begin_of_the_end` for fuzzy boundaries), all `xsd:dateTime` ISO 8601.</li><li>Human display: a `Name` with the original phrasing (&quot;circa 1873&quot;, &quot;late 19th century&quot;, &quot;after 1492&quot;).</li></ul>\n<p>Never attempt to mathematically quantify the uncertainty of &quot;circa&quot; — preserve the phrase, widen the bounds to a sensible range (e.g., ±5 years for &quot;circa&quot;), and let the UI render both.</p>\n<p>---</p>\n<h2 id=\"5-runtime-contracts-extend-the-existing-src-contracts\">5. Runtime Contracts (extend the existing `src/contracts/`)</h2>\n<p>Keep the current convention: plain JS objects with `create*()` and `validate*()` — not a schema library.
Add Zod mirrors for TS clients.</p>\n<h3 id=\"5-1-required-contracts\">5.1 Required contracts</h3>\n<ul><li>`Artwork` (UI DTO — already exists)</li><li>`SourceRecord` (provider envelope — already exists)</li><li>`RightsReport` (already exists; extend with RightsStatements.org IDs and CC licenses)</li><li>`ImportJob` (already exists)</li><li><strong>NEW</strong> `Entity` — discriminated union of `HumanMadeObject | Person | Group | Place | Type | Activity | Period | DigitalObject | LinguisticObject | VisualItem | Set`.</li><li><strong>NEW</strong> `MappingTemplate` — Visual ETL mapper output, JSON-serializable rules.</li><li><strong>NEW</strong> `ReconciliationCandidate` — `{ string, candidates: [{ uri, label, source, score, evidence }] }`.</li><li><strong>NEW</strong> `ProvenanceEvent` — `Acquisition | TransferOfCustody | Activity` shape with buyer, seller, place, timespan, price.</li><li><strong>NEW</strong> `Citation` — `{ text, source: { id, type, _label }, page?, locator? }`.</li><li><strong>NEW</strong> `WikiDraft` — MediaWiki markup + citation rollup + agent review state.</li><li><strong>NEW</strong> `AgentTask` — input contract, output contract, dry-run flag, audit trail.</li><li><strong>NEW</strong> `ValidationReport` — `{ entityId, shaclResults[], jsonSchemaResults[], businessRules[], severity, suggestedFixes[] }`.</li><li><strong>NEW</strong> `PROVRecord` — PROV-O serialization of every pipeline activity.</li></ul>\n<h3 id=\"5-2-contract-rules\">5.2 Contract rules</h3>\n<ul><li>Contracts MUST NOT import from `src/adapters/*` or `src/utils/*`.
(Already enforced.)</li><li>Every contract exports `create&lt;Name&gt;`, `validate&lt;Name&gt;`, and a Zod schema in a parallel `.ts` file under `src/contracts/zod/`.</li><li>Adding a field: bump the contract&#39;s `schemaVersion` and write a migration in `src/utils/migrations/`.</li></ul>\n<h3 id=\"5-3-backward-compatibility-migration\">5.3 Backward-compatibility migration</h3>\n<p>Use `normalizeIncomingRecord(source)` (already in repo) for any field shape change. Never write destructive migrations.</p>\n<p>---</p>\n<h2 id=\"6-vocabularies-authority-files-uri-cheat-sheet\">6. Vocabularies &amp; Authority Files (URI cheat sheet)</h2>\n<table>\n<thead><tr><th>Authority</th><th>URI pattern</th><th>Access</th></tr></thead>\n<tbody>\n<tr><td>Getty AAT</td><td>`http://vocab.getty.edu/aat/{id}`</td><td>SPARQL: `https://data.getty.edu/vocab/sparql` · OpenRefine reconcile · N-Triples dumps</td></tr>\n<tr><td>Getty ULAN</td><td>`http://vocab.getty.edu/ulan/{id}` (agent: `…/ulan/{id}-agent`)</td><td>same as AAT</td></tr>\n<tr><td>Getty TGN</td><td>`http://vocab.getty.edu/tgn/{id}` (place: `…/tgn/{id}-place`)</td><td>same as AAT</td></tr>\n<tr><td>Getty CONA</td><td>`http://vocab.getty.edu/page/cona/{id}`</td><td>API only (not full LOD yet)</td></tr>\n<tr><td>Getty Iconography</td><td>(via CONA Web Services)</td><td>API</td></tr>\n<tr><td>VIAF</td><td>`https://viaf.org/viaf/{id}`</td><td>REST + SRU, RDF/N-Triples dumps, no public SPARQL</td></tr>\n<tr><td>GeoNames</td><td>`http://sws.geonames.org/{id}/`</td><td>REST + RDF</td></tr>\n<tr><td>Wikidata</td><td>`http://www.wikidata.org/entity/Q{id}`</td><td>SPARQL: `https://query.wikidata.org/sparql`</td></tr>\n<tr><td>LoC NAF</td><td>`http://id.loc.gov/authorities/names/{id}`</td><td>REST + RDF</td></tr>\n<tr><td>Iconclass</td><td>`http://iconclass.org/{notation}`</td><td>REST + RDF</td></tr>\n<tr><td>PeriodO</td><td>`http://n2t.net/ark:/99152/p0{id}`</td><td>REST + JSON-LD</td></tr>\n<tr><td>RightsStatements.org</td><td>`http://rightsstatements.org/vocab/{code}/1.0/`</td><td>static RDF</td></tr>\n<tr><td>Creative Commons</td><td>`https://creativecommons.org/licenses/{code}/{version}/`</td><td>static</td></tr>\n</tbody>\n</table>\n<p>License notes:</p>\n<ul><li>Getty Vocabularies: <strong>ODC-By 1.0</strong> (attribution
required). XML/Relational formats retired end-2025 — <strong>must use LOD path</strong>.</li><li>VIAF: <strong>ODC-By 1.0</strong>. Dumps frozen as of Aug 2024 during OCLC infra work; plan a fallback to Wikidata `P214`.</li><li>Wikidata: <strong>CC0</strong>.</li><li>GeoNames: <strong>CC-BY 4.0</strong>.</li></ul>\n<h3 id=\"6-1-reconciliation-routing-rules\">6.1 Reconciliation routing rules</h3>\n<ul><li><strong>People</strong>: ULAN (primary) → VIAF → Wikidata → LoC NAF. Merge via `owl:sameAs` cluster.</li><li><strong>Places</strong>: TGN → PeriodO (for historical-period place names) → GeoNames → Wikidata. Capture coordinates; flag historical-vs-modern boundary ambiguity.</li><li><strong>Concepts/Materials/Techniques</strong>: AAT → FAST → Wikidata. Iconclass for iconographic subjects.</li><li><strong>Cultural objects (depicted works, monuments)</strong>: CONA → Wikidata.</li><li><strong>Bibliographic</strong>: LoC NAF → VIAF → FAST.</li></ul>\n<h3 id=\"6-2-local-cache-rule-performance-reliability\">6.2 Local-cache rule (performance + reliability)</h3>\n<p><strong>Never query an external authority on the request path.</strong> Download authority dumps (Wikidata, LoC, Getty N-Triples, GeoNames) into local storage on a scheduled job; reconciliation reads from the local copy. Refresh cadence: Getty monthly, Wikidata weekly delta, LoC monthly, GeoNames weekly. Track dump version in Postgres so a reconciliation result can be reproduced.</p>\n<h3 id=\"6-3-graph-crawling-enrichment-hierarchy-generation\">6.3 Graph Crawling Enrichment (hierarchy generation)</h3>\n<p>After a first authority match (say, an LoC subject heading), the pipeline crawls outward through `owl:sameAs` / `skos:exactMatch` / `wdt:P244` to collect equivalent URIs across authorities (FAST, Wikidata, AAT). 
It then walks `skos:broader` / `wdt:P131` (place-in-place) / `gvp:broader` to materialize the hierarchical context (e.g., <strong>Yosemite → California → Western United States → North America</strong>). These broader-context URIs are written into the materialized `Record` so faceted UIs (concertina lists, regional rollups) can be served without runtime traversal.</p>\n<p>---</p>\n<h2 id=\"7-etl-pipeline-extract-load-transform-reconcile-publish\">7. ETL Pipeline (Extract → Load → Transform → Reconcile → Publish)</h2>\n<p>Follow ELT, not ETL: preserve raw originals before any transformation. (An Alation best practice, confirmed by Yale LUX.)</p>\n<h3 id=\"7-1-pipeline-dag-dagster-assets\">7.1 Pipeline DAG (Dagster assets)</h3>\n<pre><code>\nraw_extract (IO: MinIO/S3, immutable, hashed)\n   │\n   ▼\nprofile (Great Expectations + spaCy entity extraction)\n   │\n   ▼\ncanonical_intermediate (JSON, schema-validated, idempotent ID minting)\n   │\n   ▼\nmapped_to_linked_art (Crom + PyLD, pattern-conformant JSON-LD)\n   │\n   ▼\nreconciled (RecordLinkage + sentence-transformers + local authority caches, §6.2)\n   │              \\\n   │               &gt; queue for human review if confidence is 0.70–0.85 (§7.3)\n   ▼\nvalidated (SHACL + JSON Schema + business rules)\n   │\n   ▼\nmaterialized_record (denormalized JSON-LD doc + shortcut triples)\n   │              \\\n   │               &gt; emit ActivityStream item\n   ▼\nindexed (Solr/OpenSearch + triple store + vector store)\n   │\n   ▼\npublished (CDN-cacheable static JSON-LD, ActivityStream entry)\n</code></pre>\n<p>Each asset is <strong>idempotent</strong>. Re-running from any stage MUST NOT create duplicates.
Use SHA-256 of canonical input as the dedupe key.</p>\n<h3 id=\"7-2-id-minting-policy\">7.2 ID minting policy</h3>\n<ul><li>Prefer the authority URI if reconciled.</li><li>Otherwise mint `{base}/{type-slug}/{ulid}`.</li><li>Never reuse IDs across re-ingests of the same source row — use a `sameAs` cluster instead.</li></ul>\n<h3 id=\"7-3-confidence-thresholds\">7.3 Confidence thresholds</h3>\n<ul><li><strong>≥ 0.95</strong> auto-approve.</li><li><strong>0.85 – 0.95</strong> auto-approve with `ReviewFlag` and weekly digest to curators.</li><li><strong>0.70 – 0.85</strong> queue for human review in Curator Console.</li><li><strong>&lt; 0.70</strong> drop the candidate, log evidence for retraining.</li></ul>\n<h3 id=\"7-4-synchronization\">7.4 Synchronization</h3>\n<ul><li>Pull updates from sources that expose <strong>IIIF Change Discovery</strong> (W3C ActivityStreams 2.0) — walk backwards through `prev` pages until you reach your last-seen cursor. Persist cursor in PostgreSQL.</li><li>For sources without Change Discovery (most Met / Harvard / AIC REST APIs), poll on a Dagster schedule with `If-Modified-Since` or compare hash of `_source.raw`.</li></ul>\n<h3 id=\"7-5-error-handling\">7.5 Error handling</h3>\n<ul><li>Every stage emits a structured error to Postgres `pipeline_errors` with `entityId`, `stage`, `error`, `inputHash`, `agentVersion`.</li><li>Retry policy: exponential backoff, max 5 attempts, then human queue.</li><li>Never drop a record silently. Failure = visible.</li></ul>\n<p>---</p>\n<h2 id=\"8-storage-strategy-multi-modal\">8. Storage Strategy (multi-modal)</h2>\n<h3 id=\"8-1-the-record-abstraction-load-bearing\">8.1 The &quot;Record&quot; abstraction (load-bearing)</h3>\n<p>Yale LUX&#39;s key insight: <strong>do not</strong> construct sub-graphs on the fly. 
At materialization time, build a self-contained JSON-LD document per entity that includes everything the UI needs in one fetch:</p>\n<ul><li>Core entity fields.</li><li>Pre-resolved labels for all referenced URIs.</li><li>HAL `_links` for navigation.</li><li>Shortcut properties for common queries (see §20.1).</li><li>Optional `_embedded` summaries for hot relationships (top 5 productions for a person, etc.).</li></ul>\n<p>These documents are cached aggressively (CDN, browser, service worker).</p>\n<h3 id=\"8-2-triple-store\">8.2 Triple store</h3>\n<p>Indexes the RDF projection of every `Record` so SPARQL queries can traverse relationships. Use named graphs per source institution for provenance partitioning. Keep reasoning minimal (RDFS + SHACL); no full OWL DL at runtime.</p>\n<h3 id=\"8-3-search-index\">8.3 Search index</h3>\n<p>Flatten each `Record` into a denormalized document for the search index (Solr or OpenSearch, §3.4) with:</p>\n<ul><li>`text_all` — concatenation of all `LinguisticObject` and `Name` content.</li><li>Facets: `type`, `classified_as.id[]`, `produced_by.timespan.year`, `produced_by.took_place_at.id[]`, `current_owner.id[]`, `material.id[]`, `rights.code`, `image_available`, `source_institution`.</li><li>Bidirectional pointers for &quot;follow your nose&quot; navigation.</li></ul>\n<h3 id=\"8-4-vector-index\">8.4 Vector index</h3>\n<ul><li>One collection per entity type.</li><li>Embed `_label` + top-N classifications + statement texts (truncated to 512 tokens).</li><li>Separately embed IIIF thumbnails via SigLIP for visual similarity.</li><li>Always store the source URI as payload so vector hits resolve to canonical records.</li></ul>\n<h3 id=\"8-5-operational-state-postgresql\">8.5 Operational state (PostgreSQL)</h3>\n<ul><li>`jobs` — ImportJobs with status, error counts, lineage references.</li><li>`mappings` — versioned MappingTemplate rows.</li><li>`reconciliation_queue` — pending human reviews.</li><li>`audit_log` — append-only, hash-chained.</li><li>`users` / `roles` /
`org_tenants`.</li></ul>\n<p>---</p>\n<h2 id=\"9-api-design\">9. API Design</h2>\n<h3 id=\"9-1-endpoint-shape-rest-json-ld-hal\">9.1 Endpoint shape (REST + JSON-LD + HAL)</h3>\n<table>\n<thead><tr><th>Resource</th><th>Path</th><th>Verb</th><th>Notes</th></tr></thead>\n<tbody>\n<tr><td>Object</td><td>`/api/objects/{id}`</td><td>GET</td><td>Returns Linked Art `HumanMadeObject` + HAL `_links`</td></tr>\n<tr><td>Object collection</td><td>`/api/objects`</td><td>GET</td><td>`?type=`, `?classified_as=`, `?date=`, `?place=`, `?material=`, `?source=`, `?page=`, `?expand=production,provenance`</td></tr>\n<tr><td>Person/Group</td><td>`/api/agents/{id}`</td><td>GET</td><td></td></tr>\n<tr><td>Event</td><td>`/api/events/{id}`</td><td>GET</td><td>Activities, Productions, Acquisitions, Exhibitions</td></tr>\n<tr><td>Place</td><td>`/api/places/{id}`</td><td>GET</td><td></td></tr>\n<tr><td>Concept</td><td>`/api/concepts/{id}`</td><td>GET</td><td>AAT-aligned types</td></tr>\n<tr><td>Set</td><td>`/api/sets/{id}`</td><td>GET</td><td>Archival groupings</td></tr>\n<tr><td>Search</td><td>`/api/search`</td><td>GET, POST</td><td>Mirrors the search-index facets (§8.3); returns HAL `_links` with next/prev</td></tr>\n<tr><td>SPARQL</td><td>`/api/sparql`</td><td>POST</td><td>Read-only, rate-limited, optional query whitelist</td></tr>\n<tr><td>NL Query</td><td>`/api/ai/query`</td><td>POST</td><td>NL → HAL link or SPARQL, with explanations</td></tr>\n<tr><td>Reconcile</td><td>`/api/reconcile`</td><td>POST</td><td>Implements the OpenRefine reconciliation service spec</td></tr>\n<tr><td>Ingest</td><td>`/api/ingest/jobs`</td><td>POST</td><td>Triggers a Dagster run</td></tr>\n<tr><td>Validate</td><td>`/api/validate`</td><td>POST</td><td>JSON Schema + SHACL</td></tr>\n<tr><td>Activity Stream</td><td>`/api/activity`</td><td>GET</td><td>W3C ActivityStreams 2.0, paged</td></tr>\n<tr><td>Best practices</td><td>`/api/best-practices`</td><td>GET</td><td>(Already exists — keep.)</td></tr>\n<tr><td>Roadmap</td><td>`/api/roadmap`</td><td>GET</td><td>(Already exists.)</td></tr>\n<tr><td>Health</td><td>`/api/health`</td><td>GET</td><td>(Already exists.)</td></tr>\n</tbody>\n</table>\n<p>Existing routes in `server.js` (`/api/explorer/*`, `/api/getty/*`, `/api/met/*`, `/api/linked-art/*`, etc.) are the <strong>MVP prototype</strong>. Keep them working.
Layer new routes alongside per the §27 migration plan.</p>\n<h3 id=\"9-2-hal-conventions\">9.2 HAL conventions</h3>\n<p>Every response includes:</p>\n<pre><code>\n{\n  &quot;_links&quot;: {\n    &quot;self&quot;:              { &quot;href&quot;: &quot;/api/objects/123&quot; },\n    &quot;production&quot;:        { &quot;href&quot;: &quot;/api/objects/123/production&quot; },\n    &quot;provenance&quot;:        { &quot;href&quot;: &quot;/api/objects/123/provenance&quot; },\n    &quot;iiif_manifest&quot;:     { &quot;href&quot;: &quot;https://iiif.example.org/3/123/manifest.json&quot; },\n    &quot;search_works_by_agent&quot;: { &quot;href&quot;: &quot;/api/search?agent=ulan:500115493&quot; },\n    &quot;json_ld&quot;:           { &quot;href&quot;: &quot;/api/objects/123.jsonld&quot;, &quot;type&quot;: &quot;application/ld+json&quot; }\n  }\n}\n</code></pre>\n<p>Clients <strong>MUST</strong> prefer following links over constructing query strings. This is the Yale LUX decoupling.</p>\n<h3 id=\"9-3-content-negotiation\">9.3 Content negotiation</h3>\n<p>Serve `application/ld+json` with the Linked Art `@context` by default. Also honor these media types when requested via `Accept`:</p>\n<ul><li>`text/turtle`</li><li>`application/n-quads`</li><li>`application/rdf+xml`</li><li>`application/activity+json` for ActivityStreams pages</li><li>`application/hal+json` for HAL-only summaries</li></ul>\n<h3 id=\"9-4-pagination\">9.4 Pagination</h3>\n<p>Use the W3C ActivityStreams &quot;Paged Collection&quot; pattern for everything paginated, with `first`, `last`, `next`, `prev`, `current`, and `partOf` links.</p>\n<h3 id=\"9-5-graph-expansion\">9.5 Graph expansion</h3>\n<p>`?expand=production,provenance,exhibitions` returns embedded sub-resources. Expansion depth is hard-capped at 2; beyond that, clients must follow links.</p>\n<h3 id=\"9-5-1-progressive-enhancement-principle\">9.5.1 Progressive enhancement principle</h3>\n<p>The API ramps in complexity:</p>\n<ul><li><strong>Default tier</strong> — simple, flat, human-presentable JSON-LD. Pre-resolved labels.
The kind of payload a junior developer can `console.log` and ship.</li><li><strong>`?profile=expanded`</strong> — fully expanded JSON-LD with nested entities (depth ≤ 2).</li><li><strong>`?profile=graph`</strong> — full RDF graph context for machine consumers.</li><li><strong>`/api/sparql`</strong> — direct graph access for power users.</li></ul>\n<p>All tiers are documented in OpenAPI 3.1. Default responses MUST be usable without reading the Linked Art spec — that is the bar.</p>\n<h3 id=\"9-6-graphql\">9.6 GraphQL</h3>\n<p>Optional but recommended for the curator console. The schema mirrors the contracts in §5. Use Apollo or `graphql-yoga` with persisted queries to limit the exposed query surface.</p>\n<h3 id=\"9-7-security-governance\">9.7 Security &amp; governance</h3>\n<ul><li><strong>API keys</strong> for non-anonymous tiers.</li><li><strong>Rate limits</strong>: 100 req/min anonymous; 1000 req/min keyed; 60 SPARQL/min keyed.</li><li><strong>CORS</strong>: whitelist per environment.</li><li><strong>SPARQL query timeout</strong>: 10s default, 60s for keyed `researcher` tier.</li><li><strong>Read-only by default</strong>. Write endpoints behind OIDC + RBAC.</li><li><strong>Audit log</strong>: every write goes to append-only Postgres + S3 mirror (BagIt for export).</li></ul>\n<p>---</p>\n<h2 id=\"10-ai-llm-layer\">10. AI / LLM Layer</h2>\n<h3 id=\"10-1-nl-sparql-nl-hal-agent\">10.1 NL → SPARQL / NL → HAL agent</h3>\n<ol><li><strong>Schema embeddings</strong>: precompute embeddings for every AAT/ULAN/TGN term used in the corpus + the CIDOC-CRM property list.
Store in pgvector.</li><li><strong>Retrieval</strong>: embed the NL question and retrieve the top-k schema elements (classes + properties).</li><li><strong>Prompt assembly</strong>: ground the LLM with the retrieved schema, few-shot examples, and the user&#39;s facet context.</li><li><strong>Generate</strong>: produce an intermediate AST, then a SPARQL query, OR return a HAL link if a prebuilt query exists.</li><li><strong>Validate</strong> the generated query against the SHACL shapes (every class and property it uses must exist in the profile) before execution.</li><li><strong>Execute</strong>, returning rows + provenance trail.</li><li><strong>Refinement loop</strong>: if the results are empty, the agent re-plans (loosen filters, switch authority, federate via `SERVICE`).</li></ol>\n<h3 id=\"10-2-graph-rag-chat\">10.2 Graph-RAG chat</h3>\n<ul><li>Hybrid retrieval: vector hit on text + 1-hop graph expansion.</li><li>Generate answers with <strong>mandatory</strong> citations: each sentence carries `[entityId, propertyPath]` references resolvable to source records.</li><li>No claim may be made without a supporting triple or `LinguisticObject` statement.</li></ul>\n<h3 id=\"10-3-llm-assisted-mapping\">10.3 LLM-assisted mapping</h3>\n<p>For the Visual ETL Mapper: when a user uploads a new CSV, an agent proposes a mapping template by matching column headers + sample values against past `MappingTemplate` rows and Linked Art property semantics. The user accepts or edits the proposal; nothing is auto-published.</p>\n<h3 id=\"10-4-llm-assisted-reconciliation\">10.4 LLM-assisted reconciliation</h3>\n<p>The LLM is used as a <strong>tiebreaker only</strong>. The agent receives the top-5 candidates from RecordLinkage + sentence-transformers and explains its choice. The chosen candidate must still clear the confidence thresholds (§7.3).</p>\n<h3 id=\"10-5-safety\">10.5 Safety</h3>\n<ul><li>All agent calls are `AgentTask` contracts: input + output + dry-run.</li><li>`dryRun=true` is the default for any write-side agent. The UI surfaces a diff before commit.</li><li>Cite or refuse.
Hallucinated content is a P0 bug.</li></ul>\n<p>---</p>\n<h2 id=\"11-frontend-architecture\">11. Frontend Architecture</h2>\n<h3 id=\"11-1-app-shell\">11.1 App shell</h3>\n<ul><li>Next.js 15 App Router.</li><li>Module federation (`@module-federation/nextjs-mf` 8.x) for micro-frontend composition: the Public SPA, Curator Console, Researcher Workspace, and Agent Workspace are independently buildable.</li><li>Shared component library `@metamuseum/linked-art-ui` published to a private npm registry (Verdaccio for dev, GitHub Packages for prod).</li></ul>\n<h3 id=\"11-2-routing\">11.2 Routing</h3>\n<ul><li>Public SPA mirrors the existing `app.js` views (`Home`, `Explore`, `ArtworkDetail`, `Stories`, `Compare`, `About Data`) — keep the `history.pushState` semantics. URLs: `/artwork/{id}`, `/agent/{id}`, `/place/{id}`, `/concept/{id}`, `/set/{id}`, `/event/{id}`.</li><li>Server falls back to `index.html` for SPA routing — already in place.</li></ul>\n<h3 id=\"11-3-data-fetching\">11.3 Data fetching</h3>\n<ul><li>TanStack Query with HAL link following: `useEntity(id)` returns the record + a `follow(linkName)` helper.</li><li>Cache key includes the entity URI + facet hash so list views and detail views share the cache.</li><li>Persistent cache via `IDBPersist` so the SPA is usable offline-first within a session.</li></ul>\n<h3 id=\"11-4-accessibility\">11.4 Accessibility</h3>\n<ul><li>WCAG 2.1 AA.
Lighthouse + axe-core in CI.</li><li>Keyboard navigation on all interactive surfaces (already a project quality gate per `README.md`).</li><li>ARIA live regions for facet updates and search progress.</li><li>High-contrast theme.</li><li>Reduced-motion preference respected for timelines and force-directed graphs.</li></ul>\n<h3 id=\"11-5-internationalization\">11.5 Internationalization</h3>\n<ul><li>`next-intl` for app chrome.</li><li>Entity labels resolved at render via §4.8 rules.</li><li>Vocabulary terms via AAT preferred labels in the user&#39;s locale; fallback to `en`.</li></ul>\n<p>---</p>\n<h2 id=\"12-ui-widget-catalog-metamuseum-linked-art-ui\">12. UI Widget Catalog (`@metamuseum/linked-art-ui`)</h2>\n<p>Build in this order: atomics first, macros last. Every widget MUST work with <strong>any</strong> `Entity` from §5.1 without bespoke domain logic.</p>\n<h3 id=\"12-1-atomic\">12.1 Atomic</h3>\n<ul><li>`&lt;LinkedDate timespan&gt;` — `TimeSpan` → readable date / range / &quot;Circa 19th C.&quot; Prefers `display_label` when present.</li><li>`&lt;LinkedDimensions dimensions[]&gt;` — flattens `Dimension[]` with units. e.g.
`24.5 × 30 cm (sheet)`.</li><li>`&lt;LinkedAttribution object&gt;` — walks `produced_by → carried_out_by` for the primary attribution; surfaces secondary `AttributeAssignment`s in a tooltip.</li><li>`&lt;MaterialTechniqueList object&gt;` — `made_of` + `technique` aggregator.</li><li>`&lt;LinkedLabel entity locale?&gt;` — multilingual `_label` resolver.</li><li>`&lt;UriBadge uri&gt;` — pill that shows the authority (AAT/ULAN/TGN/Wikidata/VIAF) and opens the external link with `rel=external noopener`.</li><li>`&lt;RightsBadge rights&gt;` — RightsStatements.org + CC.</li><li>`&lt;SourceBadge source&gt;` — provider attribution.</li><li>`&lt;CitationBlock citations[]&gt;` — bibliographic formatter (Chicago / MLA / APA toggle).</li></ul>\n<h3 id=\"12-2-entity-cards-preview\">12.2 Entity Cards (preview)</h3>\n<ul><li>`&lt;ObjectCard object&gt;` — image, label, date, attribution, status badge, rights.</li><li>`&lt;ActorCard actor&gt;` — portrait, name, dates, nationality, role.</li><li>`&lt;PlaceCard place&gt;` — map thumbnail, name, relationship.</li><li>`&lt;ConceptCard concept&gt;` — AAT term with broader/narrower context.</li></ul>\n<h3 id=\"12-3-visualizations\">12.3 Visualizations</h3>\n<ul><li>`&lt;ProvenanceTimeline provenance&gt;` — vertical/horizontal stepper. Clicking a step shows buyer, seller, price, and place. Shaded ranges mark uncertainty.</li><li>`&lt;ExhibitionHistoryList exhibitions[]&gt;` — chronological, grouped by venue.</li><li>`&lt;GeoMapViewer entity&gt;` — pins for `produced_at`, `current_location`, `moved_to`, with a time slider that animates object movement.</li><li>`&lt;IIIFCanvasViewer manifestUrl&gt;` — OpenSeadragon wrapper.
Multi-canvas (recto/verso), annotations overlay, side-by-side compare mode.</li><li>`&lt;HierarchyTree set&gt;` — `member_of` traversal.</li><li>`&lt;NetworkGraph entity depth=2&gt;` — Cytoscape force-directed, drill-down expand, contextual JSON inspector side-panel.</li><li>`&lt;TimelineRange entities[]&gt;` — overlapping timelines across many entities, with concertina list folding. Histropedia-style rendering for creator publication histories and concurrent exhibitions.</li><li>`&lt;ConcertinaList entities[] facet&gt;` — accordion-style navigation for dense datasets (e.g., thousands of artists by birth year or nationality). Pairs with a `&lt;FacetHistogram&gt;` head that lets users drill into a year / decade / region; expanding a section reveals its alphabetically sorted entries dynamically, without paging.</li><li>`&lt;FacetHistogram facet bins&gt;` — interactive bar/density chart driving the concertina; clicking a bin filters the list below. Use for date-bands, place-rollups, material counts.</li><li>`&lt;EntityKnowledgePanel entityId&gt;` — first-order &quot;Knowledge Panel&quot; page for non-work entities (Persons, Places, Concepts, Groups). Merges internal catalog hits with external context: DBpedia biography summary, Wikidata influence graph, ULAN variant names, AAT broader/narrower terms.
Treats authorities as peers, not footnotes.</li></ul>\n<h3 id=\"12-4-metadata-context\">12.4 Metadata &amp; Context</h3>\n<ul><li>`&lt;ClassificationsBlock classified_as[]&gt;` — clickable AAT pills.</li><li>`&lt;ReferenceList referred_to_by[]&gt;` — citation list, deep-link to `LinguisticObject`.</li><li>`&lt;IdentifierBlock identified_by[]&gt;` — accession, ULAN, Wikidata, internal.</li><li>`&lt;StatementBlock statements[]&gt;` — `LinguisticObject` text with attribution.</li></ul>\n<h3 id=\"12-5-participatory\">12.5 Participatory</h3>\n<ul><li>`&lt;AnnotationOverlay manifestUrl&gt;` — W3C Web Annotations on IIIF canvas.</li><li>`&lt;CrowdCorrector entityId fieldPath&gt;` — &quot;Suggest an Edit&quot; modal.</li><li>`&lt;StoryBuilder&gt;` — drag-and-drop narrative composer (ObjectCard + TextBlock).</li><li>`&lt;NotebookExport entities[]&gt;` — generate a Jupyter notebook with SPARQL + Python that reproduces the user&#39;s current view.</li></ul>\n<h3 id=\"12-6-implementation-rules\">12.6 Implementation rules</h3>\n<ul><li>All widgets accept the canonical entity shape (§5.1) — no provider-specific props.</li><li>All widgets degrade gracefully on missing data (use the Statement pattern §4.4 as fallback).</li><li>All widgets are tree-shakeable and SSR-safe.</li><li>All widgets ship a Storybook story + visual regression snapshot + axe a11y check.</li><li>All widgets emit a `data-la-entity-id` attribute for crawlers and agents.</li></ul>\n<p>---</p>\n<h2 id=\"13-iiif-integration\">13. 
IIIF Integration</h2>\n<ul><li><strong>Image API 3.0</strong> for high-resolution serving via Cantaloupe behind a CDN.</li><li><strong>Presentation API 3.0</strong> manifests link `HumanMadeObject` → `DigitalObject` (`digitally_shown_by`).</li><li><strong>Change Discovery 1.0</strong> for ingest sync (§7.4).</li><li><strong>Content State 1.0</strong> for deep-links into specific canvas regions.</li><li><strong>Authorization Flow API 2.0</strong> for tiered access (public / Creative Commons / member-only / restricted).</li><li><strong>Provenance</strong>: capture rights at the IIIF level (`requiredStatement`, `rights`) and bubble them up into `RightsReport`.</li></ul>\n<p>UI: `&lt;IIIFCanvasViewer&gt;` is the single integration point. Supports compare-mode, region annotations, and deep zoom.</p>\n<p>---</p>\n<h2 id=\"14-activitystreams-syndication\">14. ActivityStreams Syndication</h2>\n<ul><li>Publish a W3C ActivityStreams 2.0 endpoint at `/api/activity`.</li><li>Activities: `Create`, `Update`, `Delete` for every materialized record.</li><li>Paged collection pattern (first/last/next/prev/current/partOf).</li><li>Webhook subscribers register at `/api/activity/subscriptions` (admin-only).</li><li>Aggregators (Yale LUX-style harvesters) walk backwards via `prev` until they reach their last-seen cursor, so your endpoint MUST return stable, monotonically increasing IDs and never resequence.</li></ul>\n<p>---</p>\n<h2 id=\"15-search-discovery\">15.
Search &amp; Discovery</h2>\n<h3 id=\"15-1-layers\">15.1 Layers</h3>\n<ol><li><strong>Keyword</strong> (OpenSearch, BM25, multi-field).</li><li><strong>Faceted</strong> (typed facets from §8.3).</li><li><strong>Semantic / vector</strong> (similar-by-text and similar-by-image).</li><li><strong>Graph</strong> (SPARQL or HAL-link-follow).</li><li><strong>NL agent</strong> (§10.1).</li></ol>\n<h3 id=\"15-2-autocomplete\">15.2 Autocomplete</h3>\n<ul><li>Edge n-gram analyzer on `_label` + `identified_by.content` + alternate names.</li><li>Type-ahead returns `{ uri, _label, type, summary }`.</li></ul>\n<h3 id=\"15-3-follow-your-nose\">15.3 &quot;Follow your nose&quot;</h3>\n<p>Implement bidirectional links by querying the inverse from the target rather than embedding lists on hubs (Yale LUX rule). For example, a Concept page does <strong>not</strong> carry an array of items — it carries a HAL link `items_using_concept` that runs a parameterized OpenSearch query.</p>\n<h3 id=\"15-4-complexity-toggle-art-tracks-model\">15.4 Complexity toggle (Art Tracks model)</h3>\n<p>Every page renders a &quot;basic&quot; view by default. A &quot;See full scholarly details&quot; toggle reveals `AttributeAssignment`s, raw JSON-LD, statements, and full provenance.</p>\n<p>---</p>\n<h2 id=\"16-curator-researcher-tooling\">16.
Curator &amp; Researcher Tooling</h2>\n<h3 id=\"16-1-visual-etl-mapper\">16.1 Visual ETL Mapper</h3>\n<ul><li>ReactFlow canvas, columns on the left, Linked Art graph nodes on the right.</li><li>Drag a column → connect to a node → choose property → optional reconciliation step.</li><li>Save mapping as `MappingTemplate` (§5.1).</li><li>Dry-run preview: shows generated JSON-LD for the first 5 rows + SHACL results.</li><li>Run: submits a Dagster job via `/api/ingest/jobs`.</li></ul>\n<h3 id=\"16-2-linked-art-form-builder-arches-ogee-based\">16.2 Linked Art Form Builder (Arches / Ogee-based)</h3>\n<ul><li>Built on <strong>Arches</strong> (open-source heritage data platform) with the <strong>Ogee</strong> package for Linked Art JSON-LD output. Curators see human-readable forms; the system emits compliant JSON-LD.</li><li><strong>Documentation-first configuration via Zelge.</strong> Every UI field (e.g. &quot;Birthplace&quot;) is declared in a Zelge config that maps it programmatically to a specific CIDOC-CRM semantic path (e.g. `crm:P98i_was_born / crm:P7_took_place_at / crm:E53_Place`). The form, the validator, and the JSON-LD generator all read from the same Zelge file — there is one source of truth. Repeatable patterns become reusable templates.</li><li><strong>Controlled dropdowns only.</strong> All classification, typing, and role fields are restricted to a Reference Data Manager that loads local vocabularies or AAT terms as dropdowns. Free-text in a typed field is a validation error, not a warning. 
This is how you prevent the &quot;1,000 spellings of &#39;oil paint&#39;&quot; problem at source.</li><li>Vocabulary-driven inputs auto-fetch URIs (debounced, cached against Redis URI concordance).</li><li>Real-time validation against `/api/validate`.</li><li>Domain-specific warnings: artist birthdate vs object production date, place existence in TGN at the production date (using PeriodO).</li><li>Outputs strict JSON-LD; never persists a freeform string where a URI is expected.</li></ul>\n<h3 id=\"16-3-dynamic-graph-explorer\">16.3 Dynamic Graph Explorer</h3>\n<ul><li>Loads JSON-LD by URL, paste, or upload.</li><li>Cytoscape force-directed, progressive disclosure (top-level entities first).</li><li>Click node → JSON inspector side-panel with full triple list.</li><li>Orphan-node detector.</li><li>Export: PNG, SVG, GraphML, JSON-LD subset.</li></ul>\n<h3 id=\"16-4-sparnatural-style-visual-sparql-builder\">16.4 Sparnatural-style Visual SPARQL Builder</h3>\n<ul><li>Faceted query construction grounded in the SHACL shape catalog.</li><li>Live preview of generated SPARQL.</li><li>Result preview as RDF triples table + optional CSV/Notebook export.</li></ul>\n<h3 id=\"16-5-validation-console\">16.5 Validation Console</h3>\n<ul><li>Endpoint: `/api/validate` (POST). Returns `ValidationReport`.</li><li>UI: drag-and-drop JSON-LD, see SHACL violations grouped by shape, with suggested fixes.</li></ul>\n<p>---</p>\n<h2 id=\"17-validation-stack\">17. Validation Stack</h2>\n<h3 id=\"17-1-layers\">17.1 Layers</h3>\n<ol><li><strong>JSON Schema</strong> (Draft 2020-12) — shape sanity checks. 
Mirrors the Zod schemas in §5.</li><li><strong>SHACL</strong> (PySHACL) over <strong>CIDOC-CRM 7.1 RDFS</strong> + Linked Art profile shapes — semantic constraints (cardinality, allowed `classified_as` ranges, required `identified_by` minimum, no blank nodes, 10-entity-boundary conformance, etc.).</li><li><strong>Business rules</strong> — date sanity (production within agent lifespan), place-existence-at-time, provenance gap detection.</li><li><strong>Linked Art conformance</strong> — pattern-by-pattern checks from §4 plus the anti-pattern rejections from §4.10.</li></ol>\n<h3 id=\"17-2-authoring\">17.2 Authoring</h3>\n<p>Maintain SHACL shapes in `shapes/linked-art/*.shacl.ttl`. CI runs PySHACL against fixtures in `fixtures/linked-art/*.jsonld`. Pull requests that change shapes require a maintainer review.</p>\n<h3 id=\"17-3-acceptance\">17.3 Acceptance</h3>\n<ul><li>100% of public-facing records must validate.</li><li>A failing record cannot be published — only queued.</li><li>The system always returns a `ValidationReport` even on success, for transparency.</li></ul>\n<p>---</p>\n<h2 id=\"18-provenance-prov-o\">18. Provenance &amp; PROV-O</h2>\n<ul><li>Every pipeline activity emits a PROV-O `Activity` to an append-only Postgres table + S3 BagIt mirror.</li><li>Hash-chained log (each entry includes the hash of the previous entry) for tamper evidence.</li><li>The Curator Console renders the lineage graph (`prov:wasDerivedFrom`, `prov:wasGeneratedBy`, `prov:wasAttributedTo`) per record.</li><li>Audit bundle export = BagIt zip containing raw inputs, mappings, validation reports, output record.</li></ul>\n<p>---</p>\n<h2 id=\"19-rights-licensing\">19.
Rights &amp; Licensing</h2>\n<h3 id=\"19-1-required-fields\">19.1 Required fields</h3>\n<p>Every `HumanMadeObject` and `DigitalObject` MUST carry:</p>\n<ul><li>`rights` (RightsStatements.org statement URI or Creative Commons license URI).</li><li>`rights_holder` (Person/Group).</li><li>`attribution_string` (display-ready credit line).</li><li>`reuse_label` (computed: `public_domain | cc_by | cc_by_sa | cc0 | in_copyright | restricted | unknown`).</li></ul>\n<h3 id=\"19-2-ui-rules\">19.2 UI rules</h3>\n<ul><li>A record with `reuse_label = unknown` shows an explicit &quot;Rights unknown — do not reuse&quot; badge.</li><li>Image downloads are enabled only for `public_domain`, `cc0`, `cc_by`, `cc_by_sa`.</li><li>Every Open Access dataset (Met, CMA, Smithsonian Open Access, Rijks, MoMA CSV) carries its source attribution all the way to the UI.</li></ul>\n<h3 id=\"19-3-existing-helpers-keep\">19.3 Existing helpers (keep)</h3>\n<ul><li>`src/utils/rights.js`, `src/adapters/getty.js::analyzeGettyRights`, `src/adapters/met.js::analyzeMetRights` already implement provider-specific logic. Extend rather than rewrite.</li></ul>\n<p>---</p>\n<h2 id=\"20-performance-engineering\">20. Performance Engineering</h2>\n<h3 id=\"20-1-shortcut-triples-yale-lux\">20.1 Shortcut triples (Yale LUX)</h3>\n<p>At materialization, inject 1-hop predicates:</p>\n<pre><code>\n:object lux:agentOfProduction :person .\n:object lux:placeOfProduction :place .\n:object lux:dateOfProductionStart &quot;1889-06-01T00:00:00Z&quot;^^xsd:dateTime .\n:object lux:dateOfProductionEnd   &quot;1889-06-30T23:59:59Z&quot;^^xsd:dateTime .\n:object lux:currentOwner :institution .\n:object lux:depicts :concept .\n</code></pre>\n<p>Runtime queries that previously required 3+ joins now do single-hop lookups.
The shortcut predicates live in a `lux:` (or `mm:`) namespace under `https://lod.metamuseum.org/ns/`.</p>\n<h3 id=\"20-2-caching-layers\">20.2 Caching layers</h3>\n<ul><li><strong>CDN</strong> (Cloudflare / Fastly): cache static JSON-LD with `Cache-Control: public, max-age=3600, stale-while-revalidate=86400`.</li><li><strong>Browser</strong>: service worker for the SPA + IndexedDB for entity cache.</li><li><strong>Redis</strong>: URI concordance, facet results (5-min TTL), HAL link cache.</li><li><strong>Search</strong>: OpenSearch request cache.</li></ul>\n<h3 id=\"20-3-pre-computation\">20.3 Pre-computation</h3>\n<ul><li>Aggregate counts per facet are pre-computed nightly.</li><li>Summaries such as &quot;top 5 productions per agent&quot; and &quot;top 5 owned objects per institution&quot; live in the materialized `Record`.</li></ul>\n<h3 id=\"20-4-targets\">20.4 Targets</h3>\n<ul><li>API p95 &lt; 200ms for cached `Record` hits.</li><li>API p95 &lt; 500ms for cold `Record` reads.</li><li>Search p95 &lt; 300ms for keyword + facet.</li><li>SPARQL p95 &lt; 2s for whitelisted queries.</li><li>IIIF tile serving p95 &lt; 100ms (CDN-backed).</li><li>Ingest: 10k records in &lt; 5 minutes for incremental updates.</li><li>Uptime: 99.9% public read; 99.5% write.</li></ul>\n<p>---</p>\n<h2 id=\"21-observability\">21. Observability</h2>\n<ul><li><strong>OpenTelemetry</strong> SDKs in every service.</li><li><strong>Traces</strong> → Grafana Tempo. Create a span for every adapter call, validation run, SHACL evaluation, and LLM call.</li><li><strong>Metrics</strong> → Prometheus + Grafana. Required dashboards:<ul><li>Ingest throughput / error rate / lag per source.</li><li>Reconciliation accuracy per authority.</li><li>API latency p50/p95/p99 per route.</li><li>SPARQL query mix and timeouts.</li><li>LLM call cost + cache hit rate.</li><li>Cache hit ratios (CDN, Redis, OpenSearch).</li></ul></li><li><strong>Logs</strong> → Loki. Structured (JSON).
Required fields: `trace_id`, `tenant`, `actor`, `route`, `entity_id`.</li><li><strong>Alerts</strong>: page when validation pass rate &lt; 99%, ingest lag &gt; 30 min, API error rate &gt; 1%, or vector store sync lag &gt; 10 min.</li></ul>\n<p>---</p>\n<h2 id=\"22-security-api-governance\">22. Security &amp; API Governance</h2>\n<ul><li>OAuth2/OIDC via Keycloak. Roles: `admin`, `data_engineer`, `curator`, `educator`, `researcher`, `public`.</li><li>API keys for non-anonymous read tiers; HMAC-signed webhooks for ingest triggers.</li><li>Rate limits enforced at the gateway (§9.7).</li><li>SPARQL injection: queries pass through an `rdflib`-based parser that rejects SPARQL 1.1 Update operations (`INSERT`, `DELETE`, `LOAD`, `CLEAR`, `DROP`, `CREATE`, `ADD`, `MOVE`, `COPY`) for non-admin tokens.</li><li>WAF rules: block known LLM-prompt-injection patterns in query params (e.g., `ignore previous instructions`).</li><li>Secrets live in Vault, never in env files committed to the repo.</li><li>CSP headers on the SPA. Strict `connect-src` to known authorities only.</li><li>IIIF Authorization Flow API 2.0 for restricted-tier images.</li></ul>\n<p>---</p>\n<h2 id=\"23-testing-tdd-requirements\">23.
Testing &amp; TDD Requirements</h2>\n<h3 id=\"23-1-required-coverage\">23.1 Required coverage</h3>\n<ul><li><strong>Adapters</strong> (`src/adapters/*`) — normalization, rights, IIIF discovery, error fallbacks.</li><li><strong>Utils</strong> (`src/utils/*`) — Linked Art helpers, date / dimension parsing, HTML `describedby` discovery, URL safety.</li><li><strong>Contracts</strong> — `validate*` rejects malformed input; `create*` deterministically produces the canonical shape.</li><li><strong>API</strong> — golden responses for each route; response JSON-LD validates against the SHACL shapes.</li><li><strong>SHACL</strong> — every shape has at least one passing and one failing fixture.</li><li><strong>Frontend widgets</strong> — Storybook + axe + visual regression.</li><li><strong>E2E</strong> — Playwright: search → detail → IIIF deep-zoom → graph view; agent ingest dry-run; curator form submission.</li><li><strong>A11y</strong> — axe-core in CI, no AA violations.</li><li><strong>Performance</strong> — k6 smoke targeting the SLOs in §20.4 on every deploy.</li><li><strong>Security</strong> — `npm audit` / `pip-audit`; Semgrep ruleset for prompt-injection patterns; ZAP baseline on staging.</li></ul>\n<h3 id=\"23-2-existing-convention-keep\">23.2 Existing convention (keep)</h3>\n<ul><li>Node `node:test` + `node:assert`. No external test framework. No globs: each test file is listed explicitly in `package.json` `scripts.test`.</li></ul>\n<h3 id=\"23-3-acceptance-gates-per-merge\">23.3 Acceptance gates per merge</h3>\n<ul><li>All tests green.</li><li>SHACL pass on fixtures.</li><li>a11y pass.</li><li>API contract diff reviewed (OpenAPI lint).</li><li>Loading / empty / error / success states in any new UI.</li><li>Source URL, provider, rights, attribution, import timestamp preserved on any record-touching code path.</li><li>Agent-facing features have documented input/output contracts.</li></ul>\n<p>---</p>\n<h2 id=\"24-deployment-topology\">24.
Deployment Topology</h2>\n<h3 id=\"24-1-local-dev\">24.1 Local dev</h3>\n<ul><li>`docker-compose.yml` brings up: Postgres, Redis, MinIO, GraphDB, OpenSearch, Cantaloupe, Keycloak, Dagster, the Node gateway, the Next.js frontend.</li><li>Sample CSV → live SPARQL endpoint within 5 minutes of `docker compose up`.</li></ul>\n<h3 id=\"24-2-staging-prod\">24.2 Staging / Prod</h3>\n<ul><li>Kubernetes. One namespace per environment.</li><li>Helm charts per service.</li><li>Blue/green for the API gateway. Canary for the SPA.</li><li>Backups: Postgres daily WAL + weekly snapshot; triple store dump weekly; OpenSearch snapshot daily to S3.</li><li>DR: cross-region S3 replication for raw + materialized records; RTO 4h, RPO 1h.</li></ul>\n<h3 id=\"24-3-cost-guidance\">24.3 Cost guidance</h3>\n<ul><li>Smallest viable cloud footprint: DigitalOcean Kubernetes + GraphDB CE + managed Postgres + S3-compatible object storage. ~$50-200/mo for portfolio-scale.</li><li>Production at 10M+ records: budget for MarkLogic or Stardog licensing if SPARQL p95 matters.</li></ul>\n<p>---</p>\n<h2 id=\"25-phased-build-plan-ai-agent-friendly\">25. Phased Build Plan (AI-agent-friendly)</h2>\n<p>Each phase ships an independently demoable artifact. Coding agents should pick tasks from one phase at a time and never start a later phase until the prior phase&#39;s acceptance criteria pass.</p>\n<h3 id=\"phase-0-foundations-weeks-1-2\">Phase 0 — Foundations (Weeks 1-2)</h3>\n<p><strong>Goal:</strong> Lock the contracts and patterns. 
No new UI.</p>\n<p>Tasks:</p>\n<ol><li>Author SHACL shapes for every entity type in `shapes/linked-art/*.shacl.ttl`.</li><li>Add Zod mirrors of every contract in `src/contracts/zod/*.ts`.</li><li>Extend `src/contracts/` with `Entity`, `MappingTemplate`, `ReconciliationCandidate`, `ProvenanceEvent`, `Citation`, `WikiDraft`, `AgentTask`, `ValidationReport`, `PROVRecord` (§5.1).</li><li>Wire PySHACL into a Python validation microservice exposed at `/api/validate`.</li><li>Add a Dagster project skeleton with one asset graph for the Met Open Access source as the canary.</li><li>Replace ad-hoc fetches in `server.js` with calls into `src/adapters/*` per the existing module-dependency rules.</li></ol>\n<p>Acceptance: Every existing fixture validates; new contracts have `create*` + `validate*` + Zod; one Dagster run materializes 10 Met records end-to-end into PostgreSQL + OpenSearch + a triple store.</p>\n<h3 id=\"phase-1-multi-modal-storage-hal-api-weeks-3-5\">Phase 1 — Multi-modal Storage &amp; HAL API (Weeks 3-5)</h3>\n<p><strong>Goal:</strong> Stand up the Yale-LUX-pattern hybrid store and HAL hypermedia.</p>\n<p>Tasks:</p>\n<ol><li>Provision Postgres + Redis + OpenSearch + GraphDB + MinIO via `docker-compose.yml`.</li><li>Build `src/utils/record-materializer.{js,ts}` that produces the denormalized `Record` doc + shortcut triples per §20.1.</li><li>Add `/api/objects/:id`, `/api/agents/:id`, `/api/places/:id`, `/api/concepts/:id`, `/api/events/:id` with HAL `_links`.</li><li>Add OpenSearch indexing on every record materialization.</li><li>Add `/api/search` with faceted queries and paged responses that follow the ActivityStreams collection pattern.</li><li>Add `/api/activity` syndication endpoint (paged, monotonically increasing IDs).</li><li>Add CDN-friendly cache headers + ETag.</li></ol>\n<p>Acceptance: A chain of HAL link traversals can navigate from a search hit → object →
agent → other works by agent → place of birth without the client constructing any SPARQL.</p>\n<h3 id=\"phase-2-ingest-pipeline-reconciliation-weeks-6-9\">Phase 2 — Ingest Pipeline &amp; Reconciliation (Weeks 6-9)</h3>\n<p><strong>Goal:</strong> Curators can ingest a CSV and see it reconciled and published.</p>\n<p>Tasks:</p>\n<ol><li>Implement Dagster assets for Extract → Profile → Map → Reconcile → Validate → Materialize → Index → Publish.</li><li>Build the Reconciliation microservice (FastAPI) talking to Getty SPARQL, VIAF, Wikidata, GeoNames with a Redis-backed URI cache.</li><li>Add the human-review queue UI in the Curator Console (`/console/reconciliation`).</li><li>Build the Visual ETL Mapper (ReactFlow) with `MappingTemplate` save / load / dry-run.</li><li>Build the Form Builder with vocabulary-driven inputs.</li></ol>\n<p>Acceptance: A user can upload `met_sample.csv`, define a mapping, hit Dry Run, see the generated JSON-LD and SHACL results, queue 50 records for review, approve them, and see them in `/api/objects` with HAL links.</p>\n<h3 id=\"phase-3-discovery-ui-visualizations-weeks-10-13\">Phase 3 — Discovery UI &amp; Visualizations (Weeks 10-13)</h3>\n<p><strong>Goal:</strong> Public discovery SPA, IIIF, timelines, maps, graph view.</p>\n<p>Tasks:</p>\n<ol><li>Migrate the existing `public/app.js` views into a Next.js 15 App Router scaffold (route-by-route — see §27).
Keep the `index.html` SPA fallback working.</li><li>Build `@metamuseum/linked-art-ui` v0 with all atomics (§12.1) + ObjectCard + ActorCard + PlaceCard.</li><li>Add `&lt;IIIFCanvasViewer&gt;`, `&lt;ProvenanceTimeline&gt;`, `&lt;GeoMapViewer&gt;`, `&lt;NetworkGraph&gt;`.</li><li>Implement the &quot;complexity toggle&quot; pattern globally.</li><li>Add Storybook stories, a11y checks, and visual regression snapshots for every widget.</li></ol>\n<p>Acceptance: A visitor can land on `/artwork/{id}` and see image / attribution / dates / materials / rights / provenance timeline / map / network graph without any custom per-domain logic in the page.</p>\n<h3 id=\"phase-4-ai-layer-weeks-14-17\">Phase 4 — AI Layer (Weeks 14-17)</h3>\n<p><strong>Goal:</strong> NL→Query, Graph-RAG chat, AI-assisted mapping.</p>\n<p>Tasks:</p>\n<ol><li>Build `/api/ai/query` (NL → SPARQL/HAL) with schema embeddings stored in pgvector.</li><li>Build `/api/ai/chat` with graph-grounded RAG and mandatory citations.</li><li>Surface a chat affordance in the Researcher Workspace and a &quot;Suggest mapping&quot; affordance in the Visual ETL Mapper.</li><li>Add agent telemetry: every call logs intermediate SPARQL, result counts, refinement iterations, and cost.</li></ol>\n<p>Acceptance: A researcher asks &quot;Find works by French painters depicting Paris before 1900&quot; and gets a result set with citations resolvable to canonical records, along with the SPARQL the agent generated and an editable link.</p>\n<h3 id=\"phase-5-syndication-meta-wiki-art-bridge-hardening-weeks-18-20\">Phase 5 — Syndication, Meta Wiki Art Bridge, Hardening (Weeks 18-20)</h3>\n<p>Tasks:</p>\n<ol><li>Open the ActivityStreams subscriptions endpoint to external aggregators.</li><li>Implement the Meta Wiki Art publish flow (WikiDraft → review → publish with citations).</li><li>WCAG 2.1 AA audit on every public page.</li><li>Pen test, k6 load test, DR
drill.</li><li>Public OpenAPI 3.1 docs at `/api/docs`.</li></ol>\n<p>Acceptance: Uptime &gt; 99.9% over a 30-day window with all SLOs in §20.4 met.</p>\n<p>---</p>\n<h2 id=\"26-kpis-success-metrics\">26. KPIs &amp; Success Metrics</h2>\n<ul><li><strong>Data quality</strong>: ≥ 80% of imported records enriched with ≥ 3 authority URIs (ULAN, AAT, TGN/Wikidata/VIAF/GeoNames).</li><li><strong>Validation</strong>: 100% pass rate on published records.</li><li><strong>Reconciliation</strong>: ≥ 90% auto-approve at confidence ≥ 0.95 across the Met canary; ≥ 95% precision on human-reviewed samples.</li><li><strong>Performance</strong>: §20.4 SLOs met at p95.</li><li><strong>UX</strong>: 80% of users find the desired object within 3 clicks; 90% successful search rate.</li><li><strong>Adoption</strong>: ≥ 3 external Linked Art systems consume the ActivityStreams feed; ≥ 50 MAU on the discovery portal in the first quarter post-launch.</li><li><strong>A11y</strong>: zero AA violations.</li><li><strong>AI</strong>: ≥ 70% of NL queries return non-empty grounded answers on the gold-set; cost per query &lt; $0.01 at scale.</li></ul>\n<p>---</p>\n<h2 id=\"27-migration-from-the-current-prototype\">27. Migration from the Current Prototype</h2>\n<p>Current state (per `README.md`, `CLAUDE.md`, `AGENTS.md`, `server.js`):</p>\n<ul><li>Monolithic Node.js HTTP server, no framework, no deps.</li><li>Vanilla-JS SPA in `public/`.</li><li>JSON file storage in `storage/`.</li><li>Adapters for Getty and Met. Contracts in `src/contracts/`.</li><li>Existing routes for `/api/explorer/*`, `/api/getty/*`, `/api/met/*`, `/api/linked-art/*`, `/api/patterns`, `/api/agents/run`, `/api/content/generate`, `/api/jobs`, `/api/issues`, etc.</li></ul>\n<p>Migration rules:</p>\n<ol><li><strong>Keep `server.js` alive</strong> as the API surface until the gateway in §9 is ready.
Do not delete working routes.</li><li><strong>Add, don&#39;t replace</strong>, new routes per Phase 1. The new HAL routes (`/api/objects/:id`, etc.) live alongside the existing ones. Public SPA gradually points at the new routes; the old ones stay for backward compatibility for at least one release.</li><li><strong>Storage migration</strong>: write a one-time exporter that reads `storage/records.json` + `storage/jobs.json` and emits Linked Art JSON-LD documents into the new materialization pipeline. The current `storage/` becomes the legacy canary fixture.</li><li><strong>No Next.js until contracts are stable</strong> — explicit rule from `README.md` line ~107. Migrate route-by-route once Phase 0 acceptance passes.</li><li><strong>Provider adapter split</strong>: continue extracting Getty + Met fetch/normalize logic out of `server.js` (already a stated next-build slice in `README.md`).</li><li><strong>Tests</strong>: every migrated path adds a `node:test` case under `tests/`.
Add the file path explicitly to `package.json` `scripts.test`.</li></ol>\n<h3 id=\"27-1-files-dirs-to-add-alongside-what-exists\">27.1 Files / dirs to add (alongside what exists)</h3>\n<pre><code>\nsrc/\n  contracts/\n    zod/                       # NEW — Zod mirrors of every contract\n    entity.js entity.zod.ts    # NEW\n    mapping-template.js\n    reconciliation-candidate.js\n    provenance-event.js\n    citation.js\n    wikidraft.js\n    agent-task.js\n    validation-report.js\n    prov-record.js\n  utils/\n    record-materializer.js     # NEW — Yale-LUX-pattern denormalizer\n    shortcut-triples.js        # NEW\n    hal.js                     # NEW — HAL link factory\n    activity-stream.js         # NEW — paged collection helper\n  adapters/\n    getty.js                   # existing\n    met.js                     # existing\n    harvard.js                 # NEW\n    smithsonian.js             # NEW\n    rijks.js                   # NEW\n    europeana.js               # NEW\n    british-museum.js          # NEW\n    aic.js                     # NEW\n    cma.js                     # NEW\n    wikidata.js                # NEW\n    viaf.js                    # NEW\nshapes/\n  linked-art/\n    object.shacl.ttl           # NEW\n    person.shacl.ttl\n    place.shacl.ttl\n    activity.shacl.ttl\n    digital-object.shacl.ttl\nfixtures/\n  linked-art/\n    pass/\n    fail/\npipeline/                      # NEW — Python / Dagster\n  pyproject.toml\n  dagster.yaml\n  assets/\n    extract.py\n    profile.py\n    map.py\n    reconcile.py\n    validate.py\n    materialize.py\n    index.py\n    publish.py\n  services/\n    validation_api/\n    reconciliation_api/\n    nl_query_agent/\n    rag_chat/\nweb/                           # NEW — Next.js 15 (when Phase 3 starts)\n  app/\n  components/\n  lib/\npackages/\n  linked-art-ui/               # NEW — shared component library\n    src/\n    stories/\n    package.json\nops/\n  docker-compose.yml\n  k8s/\n  helm/\ndocs/\n  
architecture/\n  contracts/\n  shapes/\n  api/\n</code></pre>\n<h3 id=\"27-2-hard-rules-during-migration\">27.2 Hard rules during migration</h3>\n<ul><li><strong>Never</strong> mutate `_source.raw`.</li><li><strong>Never</strong> import across adapters (use `artwork-builder.js` as the bridge — already enforced).</li><li><strong>Never</strong> create a sibling `web/` Next.js app before Phase 3 starts (per README).</li><li><strong>Always</strong> keep contracts as leaf modules — no upstream deps.</li><li><strong>Always</strong> add a test for any risky transform.</li><li><strong>Always</strong> preserve loading / empty / error / success states on any new UI surface.</li></ul>\n<p>---</p>\n<h2 id=\"28-risks-mitigations\">28. Risks &amp; Mitigations</h2>\n<table><thead><tr><th>Risk</th><th>Likelihood</th><th>Impact</th><th>Mitigation</th></tr></thead><tbody><tr><td>Legacy CMS data quality varies wildly</td><td>High</td><td>High</td><td>Three-tier confidence reconciliation (§7.3); human review queue; SHACL gate; explicit `unknown` rights state.</td></tr><tr><td>CIDOC-CRM expertise gap</td><td>High</td><td>High</td><td>Form Builder + Visual ETL Mapper + NL→SPARQL hide complexity. Pedagogical scaffolding (Storytelling).</td></tr><tr><td>External authorities change format / retire</td><td>Medium</td><td>Medium</td><td>Getty retires XML/Relational end-2025 — already using LOD. VIAF dumps frozen — fall back to Wikidata `P214`. Wrap every authority in an adapter with a feature flag.</td></tr><tr><td>Pure SPARQL too slow at scale</td><td>Medium</td><td>High</td><td>Materialized `Record` docs + shortcut triples + aggressive CDN caching (§20).</td></tr><tr><td>LLM hallucinations in Graph-RAG</td><td>High</td><td>High</td><td>Mandatory citations; &quot;cite or refuse&quot; rule; SHACL-validated SPARQL pre-execution; deterministic eval suite.</td></tr><tr><td>Rights misclassification</td><td>Low</td><td>High</td><td>`unknown` is the default. Per-provider analyzers (existing). Image actions gated on `reuse_label`.</td></tr>
<tr><td>Vendor lock-in (MarkLogic / Stardog)</td><td>Medium</td><td>Medium</td><td>Storage layer behind an interface; OSS fallback (GraphDB CE) tested in CI.</td></tr><tr><td>IIIF server unavailable for partner institutions</td><td>Medium</td><td>Low</td><td>Adapter degrades gracefully to source-hosted image with attribution; manifest-on-the-fly fallback via `iiif-prezi3`.</td></tr><tr><td>Prompt injection via NL queries</td><td>Medium</td><td>High</td><td>WAF rules; query sandbox; SPARQL-only execution path; no shell / file system tools in the discovery agent.</td></tr><tr><td>Sustainability of links (URI rot)</td><td>High</td><td>Medium</td><td>Persistent identifier strategy; ARK or DOI for canonical records; CDN-cached `sameAs` clusters.</td></tr></tbody></table>\n<p>---</p>\n<h2 id=\"29-glossary\">29. Glossary</h2>\n<ul><li><strong>AAT</strong> — Art &amp; Architecture Thesaurus (Getty).</li><li><strong>ActivityStreams</strong> — W3C 2.0 spec for activity feeds; basis of IIIF Change Discovery.</li><li><strong>Arches</strong> — Open-source heritage data platform (Getty Conservation Institute + World Monuments Fund) used as the curator backend.</li><li><strong>BagIt</strong> — IETF spec for hierarchical file packaging used for audit bundles.</li><li><strong>CIDOC-CRM 7.1</strong> — ISO 21127, the formal ontology for cultural heritage, in its current RDFS-encoded version.
Linked Art is a profile of it.</li><li><strong>Crom (cromulent)</strong> — Python library for Linked Art instance creation.</li><li><strong>FAST</strong> — Faceted Application of Subject Terminology (OCLC); subject headings derived from LCSH.</li><li><strong>HAL</strong> — Hypertext Application Language; `_links` envelope.</li><li><strong>Histropedia</strong> — Interactive timeline visualization tool used for publication histories and overlapping exhibitions.</li><li><strong>Linked Art</strong> — JSON-LD profile of CIDOC-CRM optimized for developer usability.</li><li><strong>LOUD</strong> — Linked Open Usable Data (Yale LUX framing).</li><li><strong>Ogee</strong> — Open-source Arches package that maps Arches data entry forms to compliant Linked Art JSON-LD output.</li><li><strong>PeriodO</strong> — Gazetteer of scholarly definitions of historical, art-historical, and archaeological periods, each with a temporal and spatial extent.</li><li><strong>PROV-O</strong> — W3C Provenance Ontology.</li><li><strong>SHACL</strong> — Shapes Constraint Language; W3C spec for RDF validation.</li><li><strong>Shortcut triple</strong> — Materialized 1-hop predicate that collapses multi-hop CIDOC-CRM paths for query performance.</li><li><strong>TGN</strong> — Thesaurus of Geographic Names (Getty).</li><li><strong>ULAN</strong> — Union List of Artist Names (Getty).</li><li><strong>VIAF</strong> — Virtual International Authority File (OCLC).</li><li><strong>Zelge</strong> — Documentation-first configuration system. Declares the canonical mapping from each UI form field to its CIDOC-CRM semantic path; consumed by the form generator, validator, and JSON-LD emitter so all three stay in sync.</li></ul>\n<p>---</p>\n<h2 id=\"30-open-questions-decide-before-phase-0-ships\">30. Open Questions (decide before Phase 0 ships)</h2>\n<ol><li><strong>Primary triple store</strong> — MarkLogic (paid, Yale-grade) vs GraphDB CE (OSS) vs Stardog vs Fuseki?
Default recommendation: <strong>GraphDB CE</strong> for the OSS path until p95 SLOs demand otherwise.</li></ol>\n<ol start=\"2\"><li><strong>Vector store</strong> — pgvector (simplest) vs Qdrant (best performance) vs Weaviate (graph-aware)? Default: <strong>pgvector</strong> until &gt; 5M embeddings.</li></ol>\n<ol start=\"3\"><li><strong>Frontend styling</strong> — Tailwind 4 vs Chakra UI 3? Default: <strong>Tailwind 4</strong> for new work; respect existing vanilla styles in `public/`.</li></ol>\n<p>3a. <strong>Search engine</strong> — Solr 9 vs OpenSearch 2 vs Elasticsearch 8? Default for LUX alignment: <strong>Solr 9</strong> (best nested-document faceting). Default for ops familiarity: <strong>OpenSearch 2</strong>.</p>\n<p>3b. <strong>Curator backend</strong> — Adopt full Arches + Ogee + Zelge, or build a lighter NestJS/Next.js curator console that emits the same JSON-LD? Default: <strong>Arches + Ogee + Zelge</strong> if a curator team exists; <strong>light console</strong> if dev-only.</p>\n<ol start=\"4\"><li><strong>Orchestrator</strong> — Dagster vs Airflow? Default: <strong>Dagster</strong> for asset-centric semantics + lineage UI.</li></ol>\n<ol start=\"5\"><li><strong>Mono-repo or multi-repo?</strong> Recommendation: <strong>mono-repo</strong> with `pnpm` workspaces + `turbo` for the Node/TS side, plus a `pipeline/` Python subtree using `uv` + `pyproject.toml`.</li></ol>\n<ol start=\"6\"><li><strong>License of new code</strong> — match existing repo LICENSE.</li></ol>\n<ol start=\"7\"><li><strong>ID scheme</strong> — ULID + namespaced URIs (`https://lod.metamuseum.org/{type}/{ulid}`) — confirm domain.</li></ol>\n<ol start=\"8\"><li><strong>Meta Wiki Art</strong> namespace, citation template, and agent approval gates — needs a separate spec doc (see `LinkedArtCollaboration.md` for context).</li></ol>\n<p>When any of the above is decided, update §3.4 / §3.5 / §11 accordingly and remove the question from this list.</p>\n<p>---</p>\n<p><em>End of master spec. 
Linked context: `LinkedArtPRD.md` · `LinkedArtAppFeatures.md` · `YaleLux.md` · `LODPipeline.md` · `LODEngine.md` · `LODTools.md` · `Vocabularies.md` · `SPARQL.md` · `LinkedArtPatterns.md` · `LinkedArtDashboard.md` · `LinkedArtRoadmap.md` · `LinkedArtWidgets.md` · `LinkedArtChallenges.md` · `LinkedArtSaaS.md` · `LinkedArtUnmetNeeds.md` · `LinkedArtUseCases.md` · `LinkedArtJobReady.md` · `LinkedArtSoftware.md` · `LinkedArtCollaboration.md` · `LinkedArtFeatureRoadmap.md` · `LinkedDesign.md` · `Linked Open Art Data Web App - Must-have Data Sources.md` · `Linked Art Notes.md`.</em></p>","updatedAt":"2018-10-20T01:46:40.000Z","checksum":"a5f0baca89c688f779cc11e6108f44484ca738796196e1d1761ba04f9c4ba85e","checksumPrefix":"a5f0baca89c6","anchorCount":129,"lineCount":1155,"rawUrl":"/api/docs/content?path=linked-art%2FLinkedArtSOTAWebApp.md","htmlUrl":"/docs?doc=linked-art%2FLinkedArtSOTAWebApp.md","apiUrl":"/api/docs/content?path=linked-art%2FLinkedArtSOTAWebApp.md"}