    Current Document: LODPipeline.md

    Source updated 10/20/2018, 1:46:40 AM · SHA-256 fe95e61ed9da · 82 lines

    Canonical ID: linked-art/LODPipeline

    JSON for this doc: /api/docs/content?path=linked-art/LODPipeline.md

    Human link: /docs?doc=linked-art%2FLODPipeline.md

    Canonical API endpoint: /api/docs/content?path=linked-art%2FLODPipeline.md

    Optimal Linked Open Data (LOD) pipelines follow a modular ELT (Extract-Load-Transform) architecture with strong emphasis on reconciliation, validation, and governance, so that they can handle heterogeneous metadata of the kind Yale LuX has had to contend with.[alation +1]

    1. Ingestion Layer

    • Extract from diverse sources (CMS exports, APIs, CSVs) using tools like Apache Airflow or Dagster for scheduling/orchestration.

    • Load raw data into a staging lake (e.g., S3, MinIO), preserving originals for auditability; avoid early transformation so that implicit information in the source is not lost.[alation]

    • Handle batch loads (daily dumps) alongside streaming updates (real-time changes, e.g., via Kafka) where both are needed.[striim]
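
The "load raw, preserve originals" step above can be sketched with the standard library alone. This is a minimal illustration, assuming a local directory stands in for the S3/MinIO staging lake; the function name `stage_raw` and the `<source>/<sha256>.raw` layout are hypothetical choices, not part of any tool named above.

```python
import hashlib
import tempfile
from pathlib import Path


def stage_raw(payload: bytes, source: str, staging_root: Path) -> Path:
    """Store the payload byte-for-byte under <root>/<source>/<sha256>.raw."""
    digest = hashlib.sha256(payload).hexdigest()
    target = staging_root / source / f"{digest}.raw"
    if not target.exists():  # staging identical content twice is a no-op
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(payload)
    return target


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    export = b"id,artist,date\n1,Andy Warhol,1962\n"
    first = stage_raw(export, "cms_export", root)
    second = stage_raw(export, "cms_export", root)
    print(first == second)  # same content maps to the same staged file
```

Keying the file by its content hash keeps the original untouched for audits and makes re-ingestion of an unchanged dump harmless.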

    2. Profiling & Cleaning

    • Auto-profile schemas with Great Expectations or OpenRefine APIs to detect inconsistencies (e.g., date formats, nulls).

    • Normalize basics: standardize vocabularies, extract entities (NLP via spaCy), cluster variants (e.g., “Andy Warhol” → canonical form).

    • Flag low-quality records for human review; aim for 80% automation [prior context].
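
As a rough illustration of the profiling and variant-clustering ideas above, the sketch below uses only the standard library. It is not Great Expectations or OpenRefine code; the fingerprint keying is a simplified version of the key-collision approach, and the function names are hypothetical.

```python
import re
import string
from collections import defaultdict

# Accepts YYYY, YYYY-MM, or YYYY-MM-DD; anything else is reported as non-ISO.
ISO_DATE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")


def profile_dates(values):
    """Count empty values and list values that are not ISO-style dates."""
    present = [str(v).strip() for v in values if v is not None and str(v).strip()]
    return {
        "total": len(values),
        "nulls": len(values) - len(present),
        "non_iso": [v for v in present if not ISO_DATE.match(v)],
    }


def fingerprint(name):
    """Lowercase, drop ASCII punctuation, and sort the unique tokens."""
    cleaned = name.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))


def cluster_variants(names):
    """Group spellings that share a fingerprint."""
    clusters = defaultdict(list)
    for name in names:
        clusters[fingerprint(name)].append(name)
    return dict(clusters)


if __name__ == "__main__":
    print(profile_dates(["1962", "1962-08-05", "c. 1960", "", None]))
    print(cluster_variants(["Andy Warhol", "Warhol, Andy", "ANDY WARHOL."]))
```

Values listed under `non_iso`, and clusters with more than one spelling, are natural candidates for the human-review queue.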

    3. Reconciliation & Enrichment

    • Core LOD step: Match entities to authorities (Wikidata, Getty ULAN, GeoNames) using fuzzy matching (e.g., the Python recordlinkage library) plus embedding similarity (sentence-transformers).

    • Adopt Linked Art/CIDOC-CRM profiles: map to scoped classes (e.g., crm:E21_Person) with tiered completeness, core fields first [prior context].

    • Enrich via SPARQL queries; resolve ambiguities with confidence scores (>90% auto-approve, else queue).[dagster]
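
The confidence-threshold routing described above can be sketched as follows. This is a stand-in, not the recordlinkage or sentence-transformers API: it uses `difflib` string similarity as the score, and the authority table holds made-up `example.org` identifiers rather than real Wikidata or ULAN IDs.

```python
from difflib import SequenceMatcher

# Hypothetical slice of an authority file: preferred label -> identifier.
AUTHORITY = {
    "Andy Warhol": "https://example.org/authority/warhol",
    "Georgia O'Keeffe": "https://example.org/authority/okeeffe",
    "Andrew Wyeth": "https://example.org/authority/wyeth",
}

AUTO_APPROVE = 0.90  # scores above this are accepted without review


def similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def reconcile(label, authority=AUTHORITY, threshold=AUTO_APPROVE):
    """Pick the best-scoring authority entry and route it by confidence."""
    best_label, best_score = max(
        ((candidate, similarity(label, candidate)) for candidate in authority),
        key=lambda pair: pair[1],
    )
    decision = "auto-approve" if best_score > threshold else "review-queue"
    return {
        "input": label,
        "match": authority[best_label],
        "score": round(best_score, 3),
        "decision": decision,
    }


if __name__ == "__main__":
    for raw in ["Andy Warhol", "Andy Warhola", "Warhol, A."]:
        print(reconcile(raw))
```

A production matcher would add blocking, multiple comparison fields, and embedding-based scores; the routing logic around the threshold stays the same.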

    4. Transformation to LOD

    • Convert to RDF triples using custom mappers (Python/RDFLib) or Sinopia/Linked Art tools.

    • Build knowledge graph: infer relationships (e.g., “painted by” from an artist and a work co-occurring in the same source record).

    • Validate with SHACL shapes for LOD compliance (no orphans, proper URIs).
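
To make the triple-generation and "no orphans, proper URIs" checks concrete, here is a small standard-library sketch. It hand-writes N-Triples instead of using RDFLib, and `check_graph` is only a simplified stand-in for SHACL validation. The `example.org` URIs and helper names are assumptions for illustration; the CIDOC-CRM terms used (E22, E12, E21, P108i, P14) are standard.

```python
from urllib.parse import urlparse

CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BASE = "https://example.org/"  # hypothetical local namespace


def uri(value):
    return ("uri", value)


def lit(value):
    return ("lit", value)


def sample_graph():
    """A painting produced by a production event carried out by a person."""
    obj, prod, person = BASE + "object/1", BASE + "production/1", BASE + "person/1"
    return [
        (obj, RDF_TYPE, uri(CRM + "E22_Human-Made_Object")),
        (obj, RDFS_LABEL, lit("Sample painting")),
        (obj, CRM + "P108i_was_produced_by", uri(prod)),
        (prod, RDF_TYPE, uri(CRM + "E12_Production")),
        (prod, CRM + "P14_carried_out_by", uri(person)),
        (person, RDF_TYPE, uri(CRM + "E21_Person")),
    ]


def to_ntriples(triples):
    """Serialize (subject, predicate, (kind, value)) tuples as N-Triples."""
    lines = []
    for s, p, (kind, value) in triples:
        if kind == "uri":
            obj = f"<{value}>"
        else:
            escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                       .replace("\n", "\\n").replace("\r", "\\r"))
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


def is_absolute_uri(value):
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_graph(triples, allowed_external=()):
    """Report non-absolute URIs and URI objects that are never described."""
    subjects = {s for s, _, _ in triples}
    problems = []
    for s, p, (kind, value) in triples:
        for term in (s, p) + ((value,) if kind == "uri" else ()):
            if not is_absolute_uri(term):
                problems.append(f"not an absolute URI: {term}")
        if (kind == "uri" and p != RDF_TYPE and value not in subjects
                and not value.startswith(allowed_external)):
            problems.append(f"orphan reference: {value}")
    return problems


if __name__ == "__main__":
    graph = sample_graph()
    print(to_ntriples(graph))
    print(check_graph(graph))
```

Passing a tuple of authority prefixes as `allowed_external` lets links out to reconciled authorities through without flagging them as orphans.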

    5. Storage & Serving

    • Triplestore: Blazegraph, Stardog, or GraphDB for scalable querying.

    • Use multi-model DB (e.g., MarkLogic like LuX) for hybrid text/graph [prior context].

    • Expose via SPARQL endpoint + IIIF for images; add faceting for usability.
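
For the IIIF part of the serving layer, image requests follow the IIIF Image API URL pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The helper below is a minimal sketch of building such a URL; the server base and identifier are hypothetical, and the defaults assume Image API 3.0 (where `max` requests the largest available size).

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL for one image."""
    # The identifier is percent-encoded so that slashes inside it
    # are not mistaken for path separators.
    encoded = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{encoded}/{region}/{size}/{rotation}/{quality}.{fmt}"


if __name__ == "__main__":
    # Hypothetical image server and identifier
    print(iiif_image_url("https://example.org/iiif/3", "objects/123"))
    # A fixed-width derivative, e.g. for search-result thumbnails
    print(iiif_image_url("https://example.org/iiif/3", "objects/123", size="400,"))
```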

    6. Orchestration & Monitoring

    • Pipeline Orchestrator: LangGraph/CrewAI agents manage workflows—auto-generate subtasks, retry failures, escalate to humans [prior context].

    • Observability: Prometheus/Grafana for metrics (reconciliation accuracy, throughput); track lineage alongside them.

    • Governance: Automated audits, versioning (Git for schemas), zero-ETL federation for external sources.[alation]
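
The "retry failures, escalate to humans" behaviour can be illustrated in plain Python. This is a sketch of the control flow only, not LangGraph, CrewAI, Dagster, or Airflow code; `run_with_retry` and the list-based review queue are hypothetical.

```python
import time


def run_with_retry(task, record, retries=3, base_delay=0.5, review_queue=None):
    """Run task(record); retry with exponential backoff, then escalate."""
    last_error = None
    for attempt in range(retries):
        try:
            return task(record)
        except Exception as exc:  # a real pipeline would catch narrower errors
            last_error = exc
            time.sleep(base_delay * (2 ** attempt))
    # All attempts failed: hand the record to a human instead of dropping it.
    if review_queue is not None:
        review_queue.append({"record": record, "error": str(last_error)})
    return None


if __name__ == "__main__":
    queue = []

    def always_fails(record):
        raise ValueError("authority service unavailable")

    run_with_retry(always_fails, {"id": 1}, retries=2, base_delay=0, review_queue=queue)
    print(queue)
```

Real orchestrators provide retry policies of their own; the point here is that a record which exhausts its retries lands in a review queue rather than being lost.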

    Best Practices

    • Modular microservices: Independent stages scale via Kubernetes.[dagster]

    • Idempotent & recoverable: Reprocess from any point without duplicates.

    • Portfolio Tip: Publish the repository on GitHub with a Streamlit UI that runs the pipeline end-to-end on sample data and yields a queryable LOD endpoint. This mirrors LuX while automating its pain points, aiming for roughly 10x faster iteration.[ones]
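
One common way to get the "idempotent, no duplicates on reprocessing" property listed above is to mint identifiers deterministically from the source key. The sketch below assumes a hypothetical `example.org` namespace and an in-memory dict standing in for the triplestore.

```python
import uuid

BASE = "https://example.org/object/"  # hypothetical URI namespace


def mint_uri(source, local_id):
    """Derive the same URI every time from the same source and local ID."""
    key = f"{source}:{local_id}"
    return BASE + str(uuid.uuid5(uuid.NAMESPACE_URL, key))


def upsert(store, source, local_id, record):
    """Insert or overwrite by URI, so a re-run never creates a duplicate."""
    store[mint_uri(source, local_id)] = record
    return store


if __name__ == "__main__":
    store = {}
    upsert(store, "cms_export", "1", {"title": "Sample painting"})
    upsert(store, "cms_export", "1", {"title": "Sample painting (corrected)"})
    print(len(store))  # one entry, updated in place
```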

    The goals of a Linked Open Data (LOD) pipeline center on transforming heterogeneous, siloed metadata into interoperable, queryable RDF triples that reveal connections across collections. This enables discovery of the kind Yale LuX offers while upholding the FAIR principles (Findable, Accessible, Interoperable, Reusable) [prior context].[youtube]

    Primary Goals

    • Interoperability: Map local schemas (e.g., museum CMS) to standards like Linked Art/CIDOC-CRM, reconciling entities to authorities (Wikidata, Getty) for linked graphs.

    • Data Quality: Automate 80%+ of cleaning/reconciliation to resolve completeness-usability tradeoffs, flagging edge cases for humans.

    • Scalability & Discoverability: Handle 10k+ records efficiently, exposing SPARQL endpoints, IIIF images, and faceted search for serendipitous exploration.

    • Governance: Track lineage, validate SHACL compliance, and support iterative improvements without reprocessing everything.

    Scope Boundaries

    In Scope:

    • Ingestion from CSVs/APIs/databases; profiling/normalization; entity reconciliation/enrichment; RDF transformation; triplestore loading; monitoring/orchestration.

    • LOD-specific: Knowledge graph inference, tiered metadata profiles, hybrid batch/streaming for updates.

    Out of Scope:

    • Upstream collection management (focus on export processing).

    • End-user apps (deliver queryable backend only).

    • Non-cultural data unless extended (i.e., stick to art and archives, as LuX does).

    • Full ML training (use off-the-shelf models for reconciliation).

    Success Metrics

    • 90%+ reconciliation accuracy; <1hr batch runtime for 10k records; 100% SHACL validation pass rate. Portfolio demo: End-to-end on sample data yielding live SPARQL query [prior context].
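
The three targets above can be turned into a simple pass/fail gate for a pipeline run. This is a sketch under one loud assumption: runtime is scaled linearly to a 10k-record batch, which may not hold for real workloads. The function and parameter names are hypothetical.

```python
def evaluate_run(correct_matches, total_matches, runtime_seconds,
                 record_count, shacl_passed, shacl_total):
    """Compare one run against the stated success metrics."""
    accuracy = correct_matches / total_matches
    # Assumption: runtime scales linearly with the number of records.
    runtime_per_10k = runtime_seconds * 10_000 / record_count
    return {
        "reconciliation_accuracy": accuracy >= 0.90,   # 90%+ accuracy
        "batch_runtime": runtime_per_10k < 3600,       # under 1 hr per 10k records
        "shacl_pass_rate": shacl_passed == shacl_total,  # 100% pass rate
    }


if __name__ == "__main__":
    result = evaluate_run(
        correct_matches=930, total_matches=1000,
        runtime_seconds=1800, record_count=10_000,
        shacl_passed=10_000, shacl_total=10_000,
    )
    print(result, all(result.values()))
```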

    Deploying a Linked Open Data (LOD) pipeline requires scalable, containerized infrastructure with orchestration for ELT, triplestores, and monitoring, optimized for metadata-heavy workloads like Yale LuX [prior context].

    Core Tech Stack

    • Orchestration: Dagster or Apache Airflow (Python-native for LOD scripting with RDFLib).

    • Triplestore: GraphDB (free community edition) or Blazegraph—scales to millions of triples, SPARQL 1.1 compliant.

    • Staging/Storage: MinIO (S3-compatible) for raw CSVs/JSON; Postgres for metadata lineage.

    • Reconciliation: OpenRefine server (Dockerized) + Python (RecordLinkage, sentence-transformers for fuzzy matching).

    • Validation: SHACL shapes via PySHACL; Great Expectations for data quality.

    • Agents/UI: LangGraph for task orchestration; Streamlit for pipeline dashboard.

    Deployment Options (From MVP to Prod)

    1. Local/Dev (Free, Portfolio-Ready)

    docker-compose up # Single YAML: Airflow + GraphDB + MinIO + OpenRefine

    • GitHub repo with `docker-compose.yml`, sample Yale-like CSV → live SPARQL demo.

    • Run: `docker-compose up`; query through the GraphDB Workbench at `http://localhost:7200/sparql`.

    2. Cloud Kubernetes (Scalable, ~$50-200/mo)

    Provider: DigitalOcean Kubernetes (DOKS) or AWS EKS—easy Helm charts.

    Helm deploy:

    helm repo add apache-airflow https://airflow.apache.org

    helm install lod-pipeline apache-airflow/airflow --values values.yaml

    helm repo add ontotext https://maven.ontotext.com/repository/helm-public/

    helm install graphdb ontotext/graphdb

    LOD-Specific Optimizations

    • Multi-Model: GraphDB covers hybrid text/graph querying through its full-text search connectors, similar to LuX’s use of MarkLogic.

    • IIIF: Add Cantaloupe server for image serving.

    • Federation: SPARQL SERVICE for Wikidata/GeoNames without full ingest.

    • Security: OAuth2 on SPARQL (GraphDB plugin); no gateway needed initially.
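
The federation item above relies on the SPARQL 1.1 `SERVICE` clause. The sketch below builds such a query and wraps it in a SPARQL Protocol POST request without sending it. The local repository URL (`.../repositories/lod`) and the use of `owl:sameAs` links to Wikidata are assumptions for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

WIKIDATA = "https://query.wikidata.org/sparql"

# Local entities are assumed to carry owl:sameAs links to Wikidata items;
# the English labels are fetched remotely instead of being ingested.
QUERY = f"""
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?entity ?wd ?label WHERE {{
  ?entity owl:sameAs ?wd .
  SERVICE <{WIKIDATA}> {{
    ?wd rdfs:label ?label .
    FILTER(LANG(?label) = "en")
  }}
}}
LIMIT 10
"""


def build_request(endpoint, query=QUERY):
    """Prepare a SPARQL Protocol POST (form-encoded 'query' parameter)."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical local repository; pass the request to
    # urllib.request.urlopen() to actually run it.
    req = build_request("http://localhost:7200/repositories/lod")
    print(req.get_method(), req.full_url)
```

Because the remote service is queried at request time, results depend on its availability and rate limits; caching frequently used labels locally is a common complement.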

    This setup can deploy a full LOD pipeline in under a day, handle 100k+ records, and serve as a portfolio piece with live demos, addressing LuX-style data challenges at a fraction of the cost of a custom build [prior context].
