Meta Museum Documentation

Catalog (70)

ID	Document	Updated	Anchors	SHA
agents/ag2-extraction-notes	AG2 Extraction Notes agents/ag2-extraction-notes.md	10/20/2018, 1:46:40 AM	11	e8d0072ebec1
asset-provenance	Asset Provenance asset-provenance.md	10/20/2018, 1:46:40 AM	4	1025c0acc117
closeout-notes	AI-RSI one-click closeout notes closeout-notes.md	10/20/2018, 1:46:40 AM	2	1f560f6a8535
content-credibility-engine	Content Credibility Engine content-credibility-engine.md	10/20/2018, 1:46:40 AM	8	d9aa32358670
demo-script	Demo Video — Shot List & Script (60–90s) demo-script.md	10/20/2018, 1:46:40 AM	2	131ddae42e6e
deployment	Deployment — Vercel + Render deployment.md	10/20/2018, 1:46:40 AM	8	4911b1f459b5
development-roadmap	Meta Museum Development Roadmap development-roadmap.md	10/20/2018, 1:46:40 AM	23	624a8a089d72
development/aidd-tdd	AIDD + TDD Discipline development/aidd-tdd.md	10/20/2018, 1:46:40 AM	5	cd0a0524525a
env	Environment Variables env.md	10/20/2018, 1:46:40 AM	10	9c18634cab1a
evals/golden-museum-questions	Golden Eval Dataset: Complex Museum Questions evals/golden-museum-questions.md	10/20/2018, 1:46:40 AM	6	2876a2b5e78d
linked-art/conformance-matrix	Linked Art 1.0 — Conformance Matrix linked-art/conformance-matrix.md	10/20/2018, 1:46:40 AM	5	53ff87000bf4
linked-art/Linked%20Art%20Notes	Linked Art Notes.md linked-art/Linked Art Notes.md	10/20/2018, 1:46:40 AM	0	aca66d51107b
linked-art/Linked%20Open%20Art%20Data%20Web%20App%20-%20Must-have%20Data%20Sources	Linked Open Art Data Web App (AI) — Must-have Data Sources linked-art/Linked Open Art Data Web App - Must-have Data Sources.md	10/20/2018, 1:46:40 AM	7	7b7d350fe8a0
linked-art/LinkedArtAppFeatures	🏛️ Art Explorer: Linked Art Application & Ecosystem linked-art/LinkedArtAppFeatures.md	10/20/2018, 1:46:40 AM	14	e23b890ecd2a
linked-art/LinkedArtChallenges	LinkedArtChallenges.md linked-art/LinkedArtChallenges.md	10/20/2018, 1:46:40 AM	0	d8c987070277
linked-art/LinkedArtCollaboration	LinkedArtCollaboration.md linked-art/LinkedArtCollaboration.md	10/20/2018, 1:46:40 AM	1	14ccf63edef3
linked-art/LinkedArtDashboard	LinkedArtDashboard.md linked-art/LinkedArtDashboard.md	10/20/2018, 1:46:40 AM	0	6d04d4b2bf79
linked-art/LinkedArtFeatureRoadmap	Feature Roadmap for Linked Open Art Data Apps linked-art/LinkedArtFeatureRoadmap.md	10/20/2018, 1:46:40 AM	8	ac10d8e79c20
linked-art/LinkedArtJobReady	LinkedArtJobReady.md linked-art/LinkedArtJobReady.md	10/20/2018, 1:46:40 AM	0	c60b357bcb87
linked-art/LinkedArtModel1.0-Reference	Linked Art Model 1.0 Reference (Round 1) linked-art/LinkedArtModel1.0-Reference.md	10/20/2018, 1:46:40 AM	344	e6d48d474b3e
linked-art/LinkedArtPatterns	LinkedArtPatterns.md linked-art/LinkedArtPatterns.md	10/20/2018, 1:46:40 AM	0	d45bbbb02d70
linked-art/LinkedArtPRD	🖼️ Product Requirements Document linked-art/LinkedArtPRD.md	10/20/2018, 1:46:40 AM	20	91bc1f37307c
linked-art/LinkedArtRoadmap	LinkedArtRoadmap.md linked-art/LinkedArtRoadmap.md	10/20/2018, 1:46:40 AM	0	e52e71c6bd28
linked-art/LinkedArtSaaS	LinkedArtSaaS.md linked-art/LinkedArtSaaS.md	10/20/2018, 1:46:40 AM	0	3d260738fb29
linked-art/LinkedArtSoftware	Code and Tools linked-art/LinkedArtSoftware.md	10/20/2018, 1:46:40 AM	8	9e8fef24aea9
linked-art/LinkedArtSOTAWebApp	LinkedArt SOTA Web App — Master Build Specification linked-art/LinkedArtSOTAWebApp.md	10/20/2018, 1:46:40 AM	129	a5f0baca89c6
linked-art/LinkedArtUnmetNeeds	LinkedArtUnmetNeeds.md linked-art/LinkedArtUnmetNeeds.md	10/20/2018, 1:46:40 AM	0	cb35fac29cc1
linked-art/LinkedArtUseCases	LinkedArtUseCases.md linked-art/LinkedArtUseCases.md	10/20/2018, 1:46:40 AM	0	5c572ce8e7f3
linked-art/LinkedArtWidgets	LinkedArtWidgets.md linked-art/LinkedArtWidgets.md	10/20/2018, 1:46:40 AM	0	b39911c7d97d
linked-art/LinkedDesign	LinkedDesign.md linked-art/LinkedDesign.md	10/20/2018, 1:46:40 AM	0	0a02240471e5
linked-art/LODEngine	LODEngine.md linked-art/LODEngine.md	10/20/2018, 1:46:40 AM	0	ef73426f80db
linked-art/LODPipeline	LODPipeline.md linked-art/LODPipeline.md	10/20/2018, 1:46:40 AM	0	fe95e61ed9da
linked-art/LODTools	LODTools.md linked-art/LODTools.md	10/20/2018, 1:46:40 AM	0	3167947fc4e4
linked-art/SPARQL	SPARQL.md linked-art/SPARQL.md	10/20/2018, 1:46:40 AM	0	50e00ed51733
linked-art/Vocabularies	Vocabularies.md linked-art/Vocabularies.md	10/20/2018, 1:46:40 AM	0	e0574a338aaa
linked-art/YaleLux	YaleLux.md linked-art/YaleLux.md	10/20/2018, 1:46:40 AM	0	74fd47fae749
meta-wiki-art-bridge	Meta Wiki Art Bridge (MediaWiki + Wikibase) meta-wiki-art-bridge.md	10/20/2018, 1:46:40 AM	7	7a43fb0c48b8
ops/activity-adoption-proof	Activity Feed Adoption Proof Runbook ops/activity-adoption-proof.md	10/20/2018, 1:46:40 AM	5	68a80b43ae58
ops/ag2-worker	AG2 Worker and Bridge Runbook ops/ag2-worker.md	10/20/2018, 1:46:40 AM	9	50efcd4e3318
ops/auth-credential-rotation	Auth credential rotation runbook ops/auth-credential-rotation.md	10/20/2018, 1:46:40 AM	4	449b8b8eecb6
ops/deployment-preflight	Deployment Preflight Runbook ops/deployment-preflight.md	10/20/2018, 1:46:40 AM	5	ac60432d0aed
ops/era-c-exit-gate-evidence	Era C Exit-Gate Evidence Pack ops/era-c-exit-gate-evidence.md	10/20/2018, 1:46:40 AM	6	656b9c7f85c6
ops/go-live-checklist	Go-Live & Evidence-Pipeline Checklist ops/go-live-checklist.md	10/20/2018, 1:46:40 AM	6	ae7f5d71f7dc
ops/k6-slo	k6 SLO Load Test (SOTA §20.4) ops/k6-slo.md	10/20/2018, 1:46:40 AM	4	328b5b3163d4
ops/kpi-evidence	SOTA §26 KPI Evidence Input ops/kpi-evidence.md	10/20/2018, 1:46:40 AM	5	d7b2973d2927
ops/launch-review	Launch Review Packet ops/launch-review.md	10/20/2018, 1:46:40 AM	5	880e41ebcbe3
ops/managed-linked-art-pilot-runbook	Managed Linked Art Pilot Runbook ops/managed-linked-art-pilot-runbook.md	10/20/2018, 1:46:40 AM	11	d4f125c2ddae
ops/otel-local	Local OpenTelemetry Wiring (Tempo / Jaeger) ops/otel-local.md	10/20/2018, 1:46:40 AM	5	1ebbc3b33f92
ops/outbox-projector	Transactional Outbox Projector (Postgres -> Solr/GraphDB) ops/outbox-projector.md	10/20/2018, 1:46:40 AM	5	dc70ad766471
ops/procurement-readiness-packet	Procurement Readiness Packet ops/procurement-readiness-packet.md	10/20/2018, 1:46:40 AM	9	c5685e82cca7
ops/reconciliation-service	Reconciliation Service (C2) ops/reconciliation-service.md	10/20/2018, 1:46:40 AM	6	05162c313ea9
ops/search-graph-provisioning	Solr 9 + GraphDB Provisioning ops/search-graph-provisioning.md	10/20/2018, 1:46:40 AM	6	fc1b15279a84
ops/security-dr-drill	Pen Test Baseline + DR Drill Runbook ops/security-dr-drill.md	10/20/2018, 1:46:40 AM	3	a766ef3e2afc
progress/2026-05-31/era-c-readiness-snapshot	Era C Readiness Snapshot (May 31, 2026) progress/2026-05-31/era-c-readiness-snapshot.md	10/20/2018, 1:46:40 AM	3	9672614ceb53
progress/era-history	Meta Museum — Era Delivery History progress/era-history.md	10/20/2018, 1:46:40 AM	47	cc030755d1e5
providers/harvard-art-museums	Harvard Art Museums API Integration Plan providers/harvard-art-museums.md	10/20/2018, 1:46:40 AM	11	fa8b980154f5
providers/louvre-collections-json	Louvre Collections JSON Integration Plan providers/louvre-collections-json.md	10/20/2018, 1:46:40 AM	11	775f91a8d813
providers/nga-open-data	National Gallery of Art (NGA) Open Data Integration Plan providers/nga-open-data.md	10/20/2018, 1:46:40 AM	11	51c4807c8de0
providers/princeton-art-museum	Princeton University Art Museum API Integration Plan providers/princeton-art-museum.md	10/20/2018, 1:46:40 AM	11	c8823f65ee41
providers/rkd-knowledge-graph	RKD Knowledge Graph Integration Plan providers/rkd-knowledge-graph.md	10/20/2018, 1:46:40 AM	16	2b4b42f2ad42
providers/smithsonian-open-access	Smithsonian Open Access Integration Plan providers/smithsonian-open-access.md	10/20/2018, 1:46:40 AM	12	db1ffa4cab02
providers/vanda-collections-api	Victoria and Albert Museum (V&A) Collections API Integration Plan providers/vanda-collections-api.md	10/20/2018, 1:46:40 AM	11	755d93972233
quality	Quality & Performance quality.md	10/20/2018, 1:46:40 AM	6	174add040960
reconciliation/exhibition-literature-reconciliation	Exhibition + Literature Reconciliation (B6.1) reconciliation/exhibition-literature-reconciliation.md	10/20/2018, 1:46:40 AM	7	293e9d81dd7c
responsible-ai	Responsible AI responsible-ai.md	10/20/2018, 1:46:40 AM	8	f90006650821
risk-register	Risk Register risk-register.md	10/20/2018, 1:46:40 AM	4	becb213d5c5e
roadmap-to-10	Roadmap to 10/10 roadmap-to-10.md	10/20/2018, 1:46:40 AM	15	40a11000dc7e
roadmap	Meta Museum Roadmap roadmap.md	10/20/2018, 1:46:40 AM	18	145d0cbbe54a
rsi-wiki	AI-RSI compounding wiki rsi-wiki.md	10/20/2018, 1:46:40 AM	8	b64914fe6f20
wikibase-cloud-migration-checklist	Wikibase Cloud -> Self-Host Migration Checklist wikibase-cloud-migration-checklist.md	10/20/2018, 1:46:40 AM	12	170657fcbf2b

Current Document: LODPipeline.md

Source updated 10/20/2018, 1:46:40 AM · SHA-256 fe95e61ed9da · 82 lines

Canonical ID: linked-art/LODPipeline

JSON for this doc: /api/docs/content?path=linked-art/LODPipeline.md

Human link: /docs?doc=linked-art%2FLODPipeline.md

Canonical API endpoint: /api/docs/content?path=linked-art%2FLODPipeline.md

Sections (stable anchors):

No detectable headings.

Optimal Linked Open Data (LOD) pipelines follow a modular ELT (Extract-Load-Transform) architecture with strong emphasis on reconciliation, validation, and governance, so they can handle heterogeneous metadata of the kind Yale LuX has had to contend with.[alation +1]

Ingestion Layer

• Extract from diverse sources (CMS exports, APIs, CSVs) using tools like Apache Airflow or Dagster for scheduling/orchestration.

• Load raw data into a staging lake (e.g., S3, MinIO), preserving originals for auditability. Avoid early transformation so that implicit data in the source is not lost.[alation]

• Support both batch loads (daily dumps) and streaming updates (real-time changes, e.g., via Kafka) for hybrid needs.[striim]
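The "load raw, preserve originals" step can be sketched with the standard library alone. This is a minimal illustration, not the pipeline's actual loader: a local directory stands in for S3/MinIO, and the function name `stage_raw_file` and the `manifest.jsonl` layout are assumptions made for the example.

```python
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def stage_raw_file(source: Path, staging_root: Path) -> dict:
    """Copy a raw export into staging unchanged and record a manifest entry."""
    digest = hashlib.sha256(source.read_bytes()).hexdigest()
    # Content-addressed path: identical bytes always map to the same location,
    # so re-running the load does not create duplicate copies.
    target = staging_root / digest[:2] / f"{digest}{source.suffix}"
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        shutil.copyfile(source, target)
    entry = {
        "source_name": source.name,
        "sha256": digest,
        "staged_path": str(target),
        "staged_at": datetime.now(timezone.utc).isoformat(),
    }
    # Append-only manifest: every load attempt leaves an audit record.
    with (staging_root / "manifest.jsonl").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry
```

Calling `stage_raw_file(Path("export.csv"), Path("staging"))` twice stores one copy of the file and two manifest lines, which keeps the original bytes available for later re-transformation.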

Profiling & Cleaning

• Auto-profile schemas with Great Expectations or OpenRefine APIs to detect inconsistencies (e.g., date formats, nulls).

• Normalize basics: standardize vocabularies, extract entities (NLP via spaCy), cluster variants (e.g., “Andy Warhol” → canonical form).

• Flag low-quality records for human review; aim to automate roughly 80% of cleaning.
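Variant clustering can be illustrated with a key-collision fingerprint, similar in spirit to the fingerprint clustering OpenRefine offers. The sketch below is a standard-library approximation, not OpenRefine's code; the function names are chosen for the example.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(name: str) -> str:
    """Build a key that ignores case, accents, punctuation and token order."""
    # Decompose accented characters, then drop the non-ASCII combining marks.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    tokens = re.sub(r"[^\w\s]", " ", ascii_name.lower()).split()
    return " ".join(sorted(set(tokens)))


def cluster_variants(names: list[str]) -> dict[str, list[str]]:
    """Group names whose fingerprints collide; each group is one candidate entity."""
    clusters = defaultdict(list)
    for name in names:
        clusters[fingerprint(name)].append(name)
    return dict(clusters)
```

With this key, "Andy Warhol", "Warhol, Andy" and "ANDY WARHOL" fall into one cluster, while a genuinely different string such as "Andy Warhola" stays separate and would need fuzzy matching or human review.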

Reconciliation & Enrichment

• Core LOD step: Match entities to authorities (Wikidata, Getty ULAN, GeoNames) using fuzzy matching (the Python recordlinkage library) + ML (sentence-transformers embeddings).

• Adopt Linked Art/CIDOC-CRM profiles: map to scoped classes (e.g., crm:E21_Person for people, crm:E22_Human-Made_Object for artworks) with tiered completeness, core fields first.

• Enrich via SPARQL queries; resolve ambiguities with confidence scores (auto-approve matches above 90%, queue the rest for review).[dagster]
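The confidence-threshold routing can be sketched as follows. To stay dependency-free, this example swaps in `difflib.SequenceMatcher` from the standard library for the fuzzy and ML matchers named above; the candidate URIs are hypothetical `example.org` placeholders, not real authority identifiers.

```python
from difflib import SequenceMatcher

AUTO_APPROVE_THRESHOLD = 0.90  # matches scoring above this are accepted automatically


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def reconcile(label: str, candidates: dict[str, str]) -> dict:
    """Pick the best authority candidate and route it by confidence.

    `candidates` maps an authority URI to its preferred label.
    """
    best_uri, best_score = None, 0.0
    for uri, candidate_label in candidates.items():
        score = similarity(label, candidate_label)
        if score > best_score:
            best_uri, best_score = uri, score
    decision = "auto_approve" if best_score > AUTO_APPROVE_THRESHOLD else "review_queue"
    return {"label": label, "uri": best_uri, "score": best_score, "decision": decision}
```

An exact label match is auto-approved, while an abbreviated form such as "A. Warhol" scores below the threshold against "Andy Warhol" and lands in the review queue. A production matcher would also compare dates, roles, or places before accepting a match.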

Transformation to LOD

• Convert to RDF triples using custom mappers (Python/RDFLib) or Sinopia/Linked Art tools.

• Build knowledge graph: infer relationships (e.g., a “painted by” link from an artist and a work appearing together in one record).

• Validate with SHACL shapes for LOD compliance (no orphans, proper URIs).
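To make the mapping and the "no orphans, proper URIs" checks concrete, here is a dependency-free sketch. It is a stand-in for RDFLib and SHACL, not a use of them: triples are plain tuples, output is N-Triples text, and the checks are hand-written. The `https://example.org/` namespace and the record fields (`id`, `title`, `artist_id`, `artist`) are assumptions, and identifiers are assumed to be URL-safe.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

BASE = "https://example.org/"  # hypothetical local namespace
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


@dataclass(frozen=True)
class Literal:
    value: str


def record_to_triples(record: dict) -> list:
    """Map one flat record to object -> production -> person triples."""
    obj = f"{BASE}object/{record['id']}"
    production = f"{obj}/production"
    person = f"{BASE}person/{record['artist_id']}"
    return [
        (obj, RDF_TYPE, CRM + "E22_Human-Made_Object"),
        (obj, RDFS_LABEL, Literal(record["title"])),
        (obj, CRM + "P108i_was_produced_by", production),
        (production, RDF_TYPE, CRM + "E12_Production"),
        (production, CRM + "P14_carried_out_by", person),
        (person, RDF_TYPE, CRM + "E21_Person"),
        (person, RDFS_LABEL, Literal(record["artist"])),
    ]


def to_ntriples(triples) -> str:
    """Serialize tuples as N-Triples, escaping literal values."""
    def term(t):
        if isinstance(t, Literal):
            escaped = t.value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
            return f'"{escaped}"'
        return f"<{t}>"
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


def find_problems(triples) -> list:
    """Report malformed URIs and local resources that are referenced but never typed."""
    typed = {s for s, p, _ in triples if p == RDF_TYPE}
    problems = []
    for s, p, o in triples:
        for uri in [s, p] + ([] if isinstance(o, Literal) else [o]):
            parts = urlparse(uri)
            if parts.scheme not in ("http", "https") or not parts.netloc:
                problems.append(f"bad URI: {uri}")
        if isinstance(o, str) and o.startswith(BASE) and o not in typed:
            problems.append(f"orphan: {o}")
    return problems
```

In a real pipeline the same constraints would more likely be declared as SHACL shapes and run by a validator, which keeps the rules in data rather than in code.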

Storage & Serving

• Triplestore: Blazegraph, Stardog, or GraphDB for scalable querying.

• Use a multi-model database (e.g., MarkLogic, as LuX does) for hybrid text/graph queries.

• Expose via SPARQL endpoint + IIIF for images; add faceting for usability.

Orchestration & Monitoring

• Pipeline Orchestrator: LangGraph/CrewAI agents manage workflows by auto-generating subtasks, retrying failures, and escalating to humans.

• Observability: Prometheus/Grafana for metrics (reconciliation accuracy, throughput), with lineage recorded in the metadata store.

• Governance: Automated audits, versioning (Git for schemas), zero-ETL federation for external sources.[alation]
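The retry-then-escalate behavior can be shown without any agent framework. The sketch below is plain Python rather than LangGraph or CrewAI; `run_stage` and its parameters are names invented for this example.

```python
import time
from typing import Callable


def run_stage(
    name: str,
    stage: Callable[[], object],
    max_attempts: int = 3,
    base_delay: float = 0.0,
    escalate: Callable[[str], None] = print,
) -> dict:
    """Run one pipeline stage, retrying on failure and escalating if retries run out."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            result = stage()
            return {"stage": name, "status": "ok", "attempts": attempt, "result": result}
        except Exception as exc:  # a real pipeline would catch narrower error types
            last_error = exc
            if attempt < max_attempts:
                # Exponential backoff between attempts: base, 2x base, 4x base, ...
                time.sleep(base_delay * 2 ** (attempt - 1))
    # All attempts failed: hand the problem to a human instead of looping forever.
    escalate(f"stage {name!r} failed after {max_attempts} attempts: {last_error!r}")
    return {"stage": name, "status": "escalated", "attempts": max_attempts, "error": repr(last_error)}
```

Passing a list's `append` method as `escalate` gives a simple in-memory review queue; in practice this hook might post to a ticketing system or a chat channel.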

Best Practices

• Modular microservices: Independent stages scale via Kubernetes.[dagster]

• Idempotent & recoverable: Reprocess from any point without duplicates.

• Portfolio Tip: Publish on GitHub with a Streamlit UI that demos the pipeline end-to-end on sample data and yields a queryable LOD endpoint. This mirrors LuX but automates pain points, aiming for 10x faster iteration.[ones]
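The "idempotent and recoverable" practice above can be sketched with a ledger of already-processed records. This is one possible approach under simple assumptions: the ledger and store are in-memory stand-ins for durable tables, and every record carries an `id` field.

```python
import hashlib
import json
from typing import Callable


def record_key(record: dict) -> str:
    """Stable key: the record id plus a hash of its canonical JSON form."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return f"{record['id']}:{hashlib.sha256(canonical.encode('utf-8')).hexdigest()}"


def process_batch(records: list, ledger: set, store: dict, transform: Callable[[dict], object]) -> int:
    """Apply `transform` to unseen records only; return how many were processed."""
    processed = 0
    for record in records:
        key = record_key(record)
        if key in ledger:
            continue  # already handled in an earlier run
        # Upsert keyed by record id, so a changed record replaces its old output.
        store[record["id"]] = transform(record)
        ledger.add(key)
        processed += 1
    return processed
```

Re-running the same batch processes nothing, and a run interrupted halfway can be restarted from the top because completed records are skipped.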

The goals of a Linked Open Data (LOD) pipeline center on transforming heterogeneous, siloed metadata into interoperable, queryable RDF triples that reveal connections across collections. This enables discovery of the kind Yale LuX offers while upholding the FAIR principles (Findable, Accessible, Interoperable, Reusable).[youtube]

Primary Goals

• Interoperability: Map local schemas (e.g., museum CMS) to standards like Linked Art/CIDOC-CRM, reconciling entities to authorities (Wikidata, Getty) for linked graphs.

• Data Quality: Automate 80%+ of cleaning/reconciliation to resolve completeness-usability tradeoffs, flagging edge cases for humans.

• Scalability & Discoverability: Handle 10k+ records efficiently, exposing SPARQL endpoints, IIIF images, and faceted search for serendipitous exploration.

• Governance: Track lineage, validate SHACL compliance, and support iterative improvements without reprocessing everything.

Scope Boundaries

In Scope:

• Ingestion from CSVs/APIs/databases; profiling/normalization; entity reconciliation/enrichment; RDF transformation; triplestore loading; monitoring/orchestration.

• LOD-specific: Knowledge graph inference, tiered metadata profiles, hybrid batch/streaming for updates.

Out of Scope:

• Upstream collection management (focus on export processing).

• End-user apps (deliver queryable backend only).

• Non-cultural data unless extended (e.g., stick to art/archives like LuX).

• Full ML training (use off-shelf for reconciliation).

Success Metrics

• 90%+ reconciliation accuracy; <1hr batch runtime for 10k records; 100% SHACL validation pass rate.

• Portfolio demo: end-to-end run on sample data yielding a live SPARQL query.
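These targets are easy to encode as an automated gate at the end of a run. The sketch below is illustrative: `RunStats` and its fields are hypothetical, and the runtime check assumes runtime scales linearly with record count so that runs of other sizes can be compared against the 10k-record target.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    matches_checked: int    # reconciled matches reviewed against ground truth
    matches_correct: int
    records: int            # records processed in the batch
    runtime_seconds: float
    shapes_checked: int     # resources validated against SHACL shapes
    shapes_passed: int


def evaluate(stats: RunStats) -> dict:
    """Return pass/fail per success metric for one pipeline run."""
    accuracy = stats.matches_correct / stats.matches_checked if stats.matches_checked else 0.0
    pass_rate = stats.shapes_passed / stats.shapes_checked if stats.shapes_checked else 0.0
    # Assumption: linear scaling, so normalize to seconds per 10k records.
    per_10k = stats.runtime_seconds / stats.records * 10_000 if stats.records else float("inf")
    return {
        "reconciliation_accuracy": accuracy >= 0.90,
        "batch_runtime": per_10k < 3600,
        "shacl_pass_rate": pass_rate == 1.0,
    }
```

A run passes the gate only when every value in the returned dictionary is true, for example via `all(evaluate(stats).values())`.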

Deploying a Linked Open Data (LOD) pipeline requires scalable, containerized infrastructure with orchestration for ETL, triplestores, and monitoring, optimized for metadata-heavy workloads like those of Yale LuX.

Core Tech Stack

• Orchestration: Dagster or Apache Airflow (Python-native for LOD scripting with RDFLib).

• Triplestore: GraphDB (free community edition) or Blazegraph—scales to millions of triples, SPARQL 1.1 compliant.

• Staging/Storage: MinIO (S3-compatible) for raw CSVs/JSON; Postgres for metadata lineage.

• Reconciliation: OpenRefine server (Dockerized) + Python (recordlinkage for fuzzy matching, sentence-transformers for embedding similarity).

• Validation: SHACL shapes via pySHACL; Great Expectations for data quality.

• Agents/UI: LangGraph for task orchestration; Streamlit for pipeline dashboard.

Deployment Options (From MVP to Prod)

Local/Dev (Free, Portfolio-Ready)

docker-compose up # Single YAML: Airflow + GraphDB + MinIO + OpenRefine

• GitHub repo with `docker-compose.yml`, sample Yale-like CSV → live SPARQL demo.

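• Run: `docker-compose up`; open the GraphDB Workbench SPARQL editor at `http://localhost:7200/sparql` (programmatic queries go to the repository endpoint, `http://localhost:7200/repositories/<repository-id>`).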

Cloud Kubernetes (Scalable, ~$50-200/mo)

Provider: DigitalOcean Kubernetes (DOKS) or AWS EKS—easy Helm charts.

Helm deploy:

helm repo add apache-airflow https://airflow.apache.org

helm install lod-pipeline apache-airflow/airflow --values values.yaml

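helm install graphdb ontotext/graphdb # assumes Ontotext's Helm repo has already been added under the name "ontotext"; verify the chart name before running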

LOD-Specific Optimizations

• Multi-Model: GraphDB handles text/graph hybrid like LuX’s MarkLogic.

• IIIF: Add Cantaloupe server for image serving.

• Federation: SPARQL SERVICE for Wikidata/GeoNames without full ingest.

• Security: OAuth2 on SPARQL (GraphDB plugin); no gateway needed initially.
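Federation with `SERVICE` can be illustrated by building, without sending, a SPARQL 1.1 Protocol request. Several things here are assumptions for the example: the repository id `lod` in the endpoint URL, and the modeling choice that local person records link to Wikidata entities via `owl:sameAs`.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical local repository endpoint; "lod" is a placeholder repository id.
LOCAL_ENDPOINT = "http://localhost:7200/repositories/lod"

# The SERVICE clause asks the local store to fetch labels from Wikidata at
# query time, so Wikidata data does not have to be ingested locally.
FEDERATED_QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?person ?wikidataLabel WHERE {
  ?person owl:sameAs ?wd .
  SERVICE <https://query.wikidata.org/sparql> {
    ?wd rdfs:label ?wikidataLabel .
    FILTER(LANG(?wikidataLabel) = "en")
  }
}
LIMIT 10
"""


def build_query_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL Protocol POST request with a URL-encoded `query` parameter."""
    body = urlencode({"query": query}).encode("utf-8")
    headers = {
        "Accept": "application/sparql-results+json",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return Request(endpoint, data=body, headers=headers, method="POST")
```

Sending the request with `urllib.request.urlopen(build_query_request(LOCAL_ENDPOINT, FEDERATED_QUERY))` would require a running triplestore that permits federated queries. Remote services can be slow or rate-limited, so results from `SERVICE` calls are often cached.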

This setup can deploy a full LOD pipeline in under a day, handle 100k+ records, and show well in portfolios through live demos, directly addressing LuX's data challenges at a fraction of the cost of a custom build.

AI/agent quick endpoints

/api/docs/manifest — current catalog with IDs, anchors, source timestamps, and checksums.
/api/docs/content?path=roadmap.md — roadmap JSON payload + rendered HTML for downstream tooling.
