Meta Museum Documentation

Catalog (70)

ID	Document	Updated	Anchors	SHA
agents/ag2-extraction-notes	AG2 Extraction Notes agents/ag2-extraction-notes.md	10/20/2018, 1:46:40 AM	11	e8d0072ebec1
asset-provenance	Asset Provenance asset-provenance.md	10/20/2018, 1:46:40 AM	4	1025c0acc117
closeout-notes	AI-RSI one-click closeout notes closeout-notes.md	10/20/2018, 1:46:40 AM	2	1f560f6a8535
content-credibility-engine	Content Credibility Engine content-credibility-engine.md	10/20/2018, 1:46:40 AM	8	d9aa32358670
demo-script	Demo Video — Shot List & Script (60–90s) demo-script.md	10/20/2018, 1:46:40 AM	2	131ddae42e6e
deployment	Deployment — Vercel + Render deployment.md	10/20/2018, 1:46:40 AM	8	4911b1f459b5
development-roadmap	Meta Museum Development Roadmap development-roadmap.md	10/20/2018, 1:46:40 AM	23	624a8a089d72
development/aidd-tdd	AIDD + TDD Discipline development/aidd-tdd.md	10/20/2018, 1:46:40 AM	5	cd0a0524525a
env	Environment Variables env.md	10/20/2018, 1:46:40 AM	10	9c18634cab1a
evals/golden-museum-questions	Golden Eval Dataset: Complex Museum Questions evals/golden-museum-questions.md	10/20/2018, 1:46:40 AM	6	2876a2b5e78d
linked-art/conformance-matrix	Linked Art 1.0 — Conformance Matrix linked-art/conformance-matrix.md	10/20/2018, 1:46:40 AM	5	53ff87000bf4
linked-art/Linked%20Art%20Notes	Linked Art Notes.md linked-art/Linked Art Notes.md	10/20/2018, 1:46:40 AM	0	aca66d51107b
linked-art/Linked%20Open%20Art%20Data%20Web%20App%20-%20Must-have%20Data%20Sources	Linked Open Art Data Web App (AI) — Must-have Data Sources linked-art/Linked Open Art Data Web App - Must-have Data Sources.md	10/20/2018, 1:46:40 AM	7	7b7d350fe8a0
linked-art/LinkedArtAppFeatures	🏛️ Art Explorer: Linked Art Application & Ecosystem linked-art/LinkedArtAppFeatures.md	10/20/2018, 1:46:40 AM	14	e23b890ecd2a
linked-art/LinkedArtChallenges	LinkedArtChallenges.md linked-art/LinkedArtChallenges.md	10/20/2018, 1:46:40 AM	0	d8c987070277
linked-art/LinkedArtCollaboration	LinkedArtCollaboration.md linked-art/LinkedArtCollaboration.md	10/20/2018, 1:46:40 AM	1	14ccf63edef3
linked-art/LinkedArtDashboard	LinkedArtDashboard.md linked-art/LinkedArtDashboard.md	10/20/2018, 1:46:40 AM	0	6d04d4b2bf79
linked-art/LinkedArtFeatureRoadmap	Feature Roadmap for Linked Open Art Data Apps linked-art/LinkedArtFeatureRoadmap.md	10/20/2018, 1:46:40 AM	8	ac10d8e79c20
linked-art/LinkedArtJobReady	LinkedArtJobReady.md linked-art/LinkedArtJobReady.md	10/20/2018, 1:46:40 AM	0	c60b357bcb87
linked-art/LinkedArtModel1.0-Reference	Linked Art Model 1.0 Reference (Round 1) linked-art/LinkedArtModel1.0-Reference.md	10/20/2018, 1:46:40 AM	344	e6d48d474b3e
linked-art/LinkedArtPatterns	LinkedArtPatterns.md linked-art/LinkedArtPatterns.md	10/20/2018, 1:46:40 AM	0	d45bbbb02d70
linked-art/LinkedArtPRD	🖼️ Product Requirements Document linked-art/LinkedArtPRD.md	10/20/2018, 1:46:40 AM	20	91bc1f37307c
linked-art/LinkedArtRoadmap	LinkedArtRoadmap.md linked-art/LinkedArtRoadmap.md	10/20/2018, 1:46:40 AM	0	e52e71c6bd28
linked-art/LinkedArtSaaS	LinkedArtSaaS.md linked-art/LinkedArtSaaS.md	10/20/2018, 1:46:40 AM	0	3d260738fb29
linked-art/LinkedArtSoftware	Code and Tools linked-art/LinkedArtSoftware.md	10/20/2018, 1:46:40 AM	8	9e8fef24aea9
linked-art/LinkedArtSOTAWebApp	LinkedArt SOTA Web App — Master Build Specification linked-art/LinkedArtSOTAWebApp.md	10/20/2018, 1:46:40 AM	129	a5f0baca89c6
linked-art/LinkedArtUnmetNeeds	LinkedArtUnmetNeeds.md linked-art/LinkedArtUnmetNeeds.md	10/20/2018, 1:46:40 AM	0	cb35fac29cc1
linked-art/LinkedArtUseCases	LinkedArtUseCases.md linked-art/LinkedArtUseCases.md	10/20/2018, 1:46:40 AM	0	5c572ce8e7f3
linked-art/LinkedArtWidgets	LinkedArtWidgets.md linked-art/LinkedArtWidgets.md	10/20/2018, 1:46:40 AM	0	b39911c7d97d
linked-art/LinkedDesign	LinkedDesign.md linked-art/LinkedDesign.md	10/20/2018, 1:46:40 AM	0	0a02240471e5
linked-art/LODEngine	LODEngine.md linked-art/LODEngine.md	10/20/2018, 1:46:40 AM	0	ef73426f80db
linked-art/LODPipeline	LODPipeline.md linked-art/LODPipeline.md	10/20/2018, 1:46:40 AM	0	fe95e61ed9da
linked-art/LODTools	LODTools.md linked-art/LODTools.md	10/20/2018, 1:46:40 AM	0	3167947fc4e4
linked-art/SPARQL	SPARQL.md linked-art/SPARQL.md	10/20/2018, 1:46:40 AM	0	50e00ed51733
linked-art/Vocabularies	Vocabularies.md linked-art/Vocabularies.md	10/20/2018, 1:46:40 AM	0	e0574a338aaa
linked-art/YaleLux	YaleLux.md linked-art/YaleLux.md	10/20/2018, 1:46:40 AM	0	74fd47fae749
meta-wiki-art-bridge	Meta Wiki Art Bridge (MediaWiki + Wikibase) meta-wiki-art-bridge.md	10/20/2018, 1:46:40 AM	7	7a43fb0c48b8
ops/activity-adoption-proof	Activity Feed Adoption Proof Runbook ops/activity-adoption-proof.md	10/20/2018, 1:46:40 AM	5	68a80b43ae58
ops/ag2-worker	AG2 Worker and Bridge Runbook ops/ag2-worker.md	10/20/2018, 1:46:40 AM	9	50efcd4e3318
ops/auth-credential-rotation	Auth credential rotation runbook ops/auth-credential-rotation.md	10/20/2018, 1:46:40 AM	4	449b8b8eecb6
ops/deployment-preflight	Deployment Preflight Runbook ops/deployment-preflight.md	10/20/2018, 1:46:40 AM	5	ac60432d0aed
ops/era-c-exit-gate-evidence	Era C Exit-Gate Evidence Pack ops/era-c-exit-gate-evidence.md	10/20/2018, 1:46:40 AM	6	656b9c7f85c6
ops/go-live-checklist	Go-Live & Evidence-Pipeline Checklist ops/go-live-checklist.md	10/20/2018, 1:46:40 AM	6	ae7f5d71f7dc
ops/k6-slo	k6 SLO Load Test (SOTA §20.4) ops/k6-slo.md	10/20/2018, 1:46:40 AM	4	328b5b3163d4
ops/kpi-evidence	SOTA §26 KPI Evidence Input ops/kpi-evidence.md	10/20/2018, 1:46:40 AM	5	d7b2973d2927
ops/launch-review	Launch Review Packet ops/launch-review.md	10/20/2018, 1:46:40 AM	5	880e41ebcbe3
ops/managed-linked-art-pilot-runbook	Managed Linked Art Pilot Runbook ops/managed-linked-art-pilot-runbook.md	10/20/2018, 1:46:40 AM	11	d4f125c2ddae
ops/otel-local	Local OpenTelemetry Wiring (Tempo / Jaeger) ops/otel-local.md	10/20/2018, 1:46:40 AM	5	1ebbc3b33f92
ops/outbox-projector	Transactional Outbox Projector (Postgres -> Solr/GraphDB) ops/outbox-projector.md	10/20/2018, 1:46:40 AM	5	dc70ad766471
ops/procurement-readiness-packet	Procurement Readiness Packet ops/procurement-readiness-packet.md	10/20/2018, 1:46:40 AM	9	c5685e82cca7
ops/reconciliation-service	Reconciliation Service (C2) ops/reconciliation-service.md	10/20/2018, 1:46:40 AM	6	05162c313ea9
ops/search-graph-provisioning	Solr 9 + GraphDB Provisioning ops/search-graph-provisioning.md	10/20/2018, 1:46:40 AM	6	fc1b15279a84
ops/security-dr-drill	Pen Test Baseline + DR Drill Runbook ops/security-dr-drill.md	10/20/2018, 1:46:40 AM	3	a766ef3e2afc
progress/2026-05-31/era-c-readiness-snapshot	Era C Readiness Snapshot (May 31, 2026) progress/2026-05-31/era-c-readiness-snapshot.md	10/20/2018, 1:46:40 AM	3	9672614ceb53
progress/era-history	Meta Museum — Era Delivery History progress/era-history.md	10/20/2018, 1:46:40 AM	47	cc030755d1e5
providers/harvard-art-museums	Harvard Art Museums API Integration Plan providers/harvard-art-museums.md	10/20/2018, 1:46:40 AM	11	fa8b980154f5
providers/louvre-collections-json	Louvre Collections JSON Integration Plan providers/louvre-collections-json.md	10/20/2018, 1:46:40 AM	11	775f91a8d813
providers/nga-open-data	National Gallery of Art (NGA) Open Data Integration Plan providers/nga-open-data.md	10/20/2018, 1:46:40 AM	11	51c4807c8de0
providers/princeton-art-museum	Princeton University Art Museum API Integration Plan providers/princeton-art-museum.md	10/20/2018, 1:46:40 AM	11	c8823f65ee41
providers/rkd-knowledge-graph	RKD Knowledge Graph Integration Plan providers/rkd-knowledge-graph.md	10/20/2018, 1:46:40 AM	16	2b4b42f2ad42
providers/smithsonian-open-access	Smithsonian Open Access Integration Plan providers/smithsonian-open-access.md	10/20/2018, 1:46:40 AM	12	db1ffa4cab02
providers/vanda-collections-api	Victoria and Albert Museum (V&A) Collections API Integration Plan providers/vanda-collections-api.md	10/20/2018, 1:46:40 AM	11	755d93972233
quality	Quality & Performance quality.md	10/20/2018, 1:46:40 AM	6	174add040960
reconciliation/exhibition-literature-reconciliation	Exhibition + Literature Reconciliation (B6.1) reconciliation/exhibition-literature-reconciliation.md	10/20/2018, 1:46:40 AM	7	293e9d81dd7c
responsible-ai	Responsible AI responsible-ai.md	10/20/2018, 1:46:40 AM	8	f90006650821
risk-register	Risk Register risk-register.md	10/20/2018, 1:46:40 AM	4	becb213d5c5e
roadmap-to-10	Roadmap to 10/10 roadmap-to-10.md	10/20/2018, 1:46:40 AM	15	40a11000dc7e
roadmap	Meta Museum Roadmap roadmap.md	10/20/2018, 1:46:40 AM	18	145d0cbbe54a
rsi-wiki	AI-RSI compounding wiki rsi-wiki.md	10/20/2018, 1:46:40 AM	8	b64914fe6f20
wikibase-cloud-migration-checklist	Wikibase Cloud -> Self-Host Migration Checklist wikibase-cloud-migration-checklist.md	10/20/2018, 1:46:40 AM	12	170657fcbf2b

Current Document: YaleLux.md

Source updated 10/20/2018, 1:46:40 AM · SHA-256 74fd47fae749 · 390 lines

Canonical ID: linked-art/YaleLux

JSON for this doc: /api/docs/content?path=linked-art/YaleLux.md

Human link: /docs?doc=linked-art%2FYaleLux.md

Canonical API endpoint: /api/docs/content?path=linked-art%2FYaleLux.md

Sections (stable anchors):

No detectable headings.

Based on the paper describing Yale University's LUX project, the key technical details for app design and development are summarized below, grouped into data modeling, system architecture, front-end design, and performance optimization.

Data Modeling: The Linked Art Paradigm

The core of the system is built on Linked Open Usable Data (LOUD) principles, specifically using the Linked Art profile. This ensures cross-domain consistency (Art, Natural History, Archives, Libraries).

• Ontology Foundation: Built on CIDOC-CRM but simplified for developer usability.

• Format: JSON-LD is used to make data accessible to standard web developers without deep semantic web expertise.

• Core Classes:

• HumanMadeObject: Physical cultural items.

• DigitalObject: Digital files/surrogates.

• LinguisticObject (Texts) & VisualItem (Images): Separates the "work" from the physical carrier.

• Set: Used for archival collections and hierarchies.

• Activity: Connects actors, places, and times (e.g., provenance, creation, exhibitions).

• Handling Complexity:

• Natural History: Geological and biological specimens are mapped to HumanMadeObject for consistency; taxonomic hierarchies are modeled with Type.

• Archives: Modeled as nested Sets to separate physical location from intellectual arrangement.

System Architecture & Technology Stack

Yale rejected a pure "triplestore" approach in favor of a Multi-Modal Database architecture. This hybrid approach handles both document retrieval (text search) and graph traversal (relationship discovery).

A. The Database Engine

• Technology: MarkLogic (Commercial).

• Rationale:

• It supports CTS (Core Text Search) for high-speed document retrieval and relevancy scoring.

• It natively stores JSON-LD and triples, allowing for graph queries (SPARQL) to be joined with text queries.

• Performance Note: Yale found MarkLogic’s internal CTS query language significantly faster than standard SPARQL for their specific use cases.

B. The Ingestion Pipeline

Data is harvested from disparate systems (museum CMS, library catalogs) and normalized.

• Synchronization Protocol: IIIF Change Discovery API (based on W3C Activity Streams 2.0). This allows the harvester to "walk" backward through time to find record updates.

• Processing Stack:

• Language: Python.

• Caching/State: PostgreSQL (stores intermediate record states) and Redis (Key/Value store for fast URI concordance/matching).

• Reconciliation: An algorithm matches internal records with external authorities (Getty, Wikidata, LOC) to enrich data and merge duplicate entities into a single "Hub" page.
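A minimal sketch of the backward walk described above, assuming a change stream shaped like the IIIF Change Discovery API: an `OrderedCollection` whose `last` points to the newest page, `OrderedCollectionPage`s linked by `prev`, and activities carrying `type`, `object.id`, and `endTime`. The `fetch` callable, the function name, and the example URLs are hypothetical stand-ins for HTTP requests; this is not LUX's harvester code.

```python
from datetime import datetime
from typing import Callable, Dict, Optional


def collect_changes(
    collection_url: str,
    fetch: Callable[[str], dict],
    since: Optional[datetime] = None,
) -> Dict[str, str]:
    """Walk a change stream backward and return {object id: latest activity type}.

    `since` must be timezone-aware; activities at or before it are treated as
    already harvested, so the walk stops there.
    """
    collection = fetch(collection_url)
    page_url: Optional[str] = collection["last"]["id"]
    latest: Dict[str, str] = {}
    while page_url:
        page = fetch(page_url)
        # Activities inside a page run oldest to newest, so read them in reverse.
        for activity in reversed(page.get("orderedItems", [])):
            when = datetime.fromisoformat(activity["endTime"].replace("Z", "+00:00"))
            if since is not None and when <= since:
                return latest
            # The first activity seen for an object is its most recent one.
            latest.setdefault(activity["object"]["id"], activity["type"])
        prev_link = page.get("prev")
        page_url = prev_link["id"] if prev_link else None
    return latest
```

A harvester built this way would re-fetch objects whose latest activity is a Create or Update and drop those whose latest activity is a Delete, then store the newest `endTime` it saw as the next `since` value.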

Front-End Application Design

The discovery interface (LUX) is a Single Page Application (SPA) designed to decouple the user interface from the complex backend logic.

• Framework: React.

• API Strategy:

• HAL (Hypertext Application Language): The backend returns responses with "named links" (_links). The front end simply follows these links rather than constructing complex query strings. This decouples the frontend code from the backend query syntax.

• Activity Streams Pattern: Search results follow the "Paged Collection" pattern (next/previous links) for standard pagination.

• Component Reuse: The UI components are built around data patterns (e.g., a "Name" component, a "Location" component) rather than domain-specific pages, allowing the same code to render a fossil, a painting, or a book.

Performance Optimization & Design Lessons

Yale learned that pure graph traversal is too slow for a consumer-facing application at the scale of 41 million records (over 2 billion RDF triples).

• The "Record" Concept: Despite using a graph, they maintain the concept of a "Record" (a pre-constructed JSON-LD document). Constructing sub-graphs on the fly was computationally too expensive.

• Shortcut Triples: The pipeline injects artificial "shortcut" triples to reduce join depth.

• Example: Instead of traversing Object -> produced_by -> Production -> part -> Production part -> carried_out_by -> Person (3 hops), they index lux:agentOfProduction (1 hop).

• Aggressive Caching:

• Static JSON-LD: The system relies on retrieving static JSON-LD representations from web caches rather than dynamic database hits whenever possible.

• Query Caching: Because queries are delivered as stable HAL link URLs in the response, the results behind those links can be cached like any other URL.


The following analysis of the Yale LUX project focuses on the engineering patterns and architectural decisions that made the project viable.

The Core Data Paradigm: "Linked Open Usable Data" (LOUD)

The foundational technical decision was to prioritize usability over semantic perfection. Yale adopted Linked Art, a profile of the CIDOC-CRM ontology, to unify disparate collections (Art, Natural History, Archives, and Libraries).

Unified Data Modeling

The system normalizes 41 million records from different domains into a single conceptual model using JSON-LD. This allows a single frontend codebase to render diverse content without domain-specific logic.

• Shared Classes: "Dinosaur fossils" (Science) and "Paintings" (Art) are both modeled as HumanMadeObject.

• Separation of Content: The model strictly separates the physical item (HumanMadeObject) from the intellectual work it carries (LinguisticObject or VisualItem). This handles complex library cases where one book (object) contains multiple texts (works).

• Archival Hierarchies: Archives are modeled as nested Sets to separate the physical location of a box from its intellectual arrangement in a finding aid.

The "Record" Abstraction

Crucially, the team rejected the pure semantic web notion that "there are no records, only a graph."

• Engineering Reality: To build a performant app, the system maintains the concept of a Record—a discrete, self-contained JSON-LD document representing an entity (e.g., a Person or an Object).

• Benefit: This allows for standard document-based indexing, caching, and retrieval, which is significantly faster than dynamically assembling sub-graphs at runtime.

System Architecture: The Multi-Modal Strategy

Yale moved away from "pure" triplestores, finding them too slow and complex for consumer-facing apps. Instead, they adopted a Multi-Modal Architecture.

Hybrid Database Engine

The system uses MarkLogic, which functions simultaneously as:

A Document Store: Stores the JSON-LD records for fast retrieval and text search.

A Graph Store: Indexes the RDF triples embedded in the JSON-LD for relationship traversal.

Why this matters: This allows the application to use the "right tool for the job." It uses CTS (Core Text Search) for high-speed keyword queries and relevance ranking, and Graph Queries to find connections (e.g., "Find all students of this artist").

Ingestion Pipeline

The data pipeline is designed for massive scale and eventual consistency.

• Synchronization: It uses the IIIF Change Discovery API (based on W3C Activity Streams) to "crawl" for updates rather than relying on full dumps.

• Processing Stack: The pipeline is written in Python, using PostgreSQL for intermediate state caching and Redis for high-speed URI matching.

• Reconciliation: An automated process merges records from internal systems with external authorities (Getty, LOC, Wikidata) to create a single "Hub" record for every person, place, and concept.

Performance Optimization Techniques

Performance was a primary requirement, leading to several clever engineering shortcuts.

Index-Time "Shortcut" Triples

Graph queries that require multiple "hops" (joins) are expensive. To solve this, the pipeline materializes artificial triples during ingestion.

• The Problem: Finding the creator of an object normally requires traversing: Object → produced_by → Production → part → Production (part) → carried_out_by → Person.

• The Fix: The pipeline injects a shortcut triple: Object → lux:agentOfProduction → Person.

• Result: Runtime queries become single-hop lookups instead of multi-hop joins.
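The shortcut can be sketched as an ingest-time function over a Linked Art-style record, assuming the production is embedded under `produced_by` and may be split into `part`s, each with `carried_out_by`. Only the `lux:agentOfProduction` predicate comes from the text; the helper names, record shape, and URIs are illustrative assumptions.

```python
from typing import Iterator, List, Tuple

Triple = Tuple[str, str, str]
SHORTCUT = "lux:agentOfProduction"  # predicate named in the text; prefix left unexpanded


def production_agents(production: dict) -> Iterator[str]:
    """Yield agent URIs from a production and, recursively, from its parts."""
    for agent in production.get("carried_out_by", []):
        yield agent["id"]
    for part in production.get("part", []):
        yield from production_agents(part)


def shortcut_triples(record: dict) -> List[Triple]:
    """Materialize Object -> lux:agentOfProduction -> Person at ingest time."""
    production = record.get("produced_by")
    if not production:
        return []
    # dict.fromkeys de-duplicates while keeping first-seen order.
    agents = dict.fromkeys(production_agents(production))
    return [(record["id"], SHORTCUT, agent) for agent in agents]
```

The traversal cost is paid once per record during ingestion, so a query such as "works by this person" only has to match one predicate at runtime.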

Aggressive Caching Strategy

The application architecture assumes that computing facets and graph connections is expensive, so it avoids doing it live whenever possible.

• Pre-Computation: Complex sub-graphs needed for UI rendering are materialized into the static JSON-LD record.

• Web Caching: The architecture relies heavily on standard web caches (CDNs, browser caches) to serve the static JSON-LD files.

• Query Caching: Because queries are predefined (see HAL below), the search results themselves can be aggressively cached.

Frontend & API Design: Decoupling via Hypermedia

The most distinct design pattern in LUX is the complete decoupling of the frontend UI from the backend query logic.

Hypertext Application Language (HAL)

Instead of the frontend constructing complex SPARQL or SQL queries, the backend API returns Named Links in the response.

• Mechanism: A record for "Claude Monet" includes a _links section with pre-built URLs like search_works_by_agent.

• Benefit: The React frontend simply renders a link. If the backend team changes the database query language or schema, they only update the link generation logic on the server. The frontend code remains untouched.
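A minimal sketch of the split described above, assuming a middle tier that adds `_links` to a record before returning it. The link name `search_works_by_agent` comes from the text; the base URL, the `q=` parameter, and the JSON query body are hypothetical, since LUX's actual search syntax is not given here.

```python
import json
from urllib.parse import quote

API_BASE = "https://lux.example/api"  # hypothetical middle-tier base URL


def add_hal_links(record: dict) -> dict:
    """Middle tier: return a copy of `record` with named HAL links added."""
    uri = record["id"]
    links = {"self": {"href": uri}}
    if record.get("type") == "Person":
        # The query syntax is a backend detail; the frontend never parses it.
        query = json.dumps({"createdBy": {"id": uri}}, separators=(",", ":"))
        links["search_works_by_agent"] = {
            "href": f"{API_BASE}/search/work?q={quote(query, safe='')}"
        }
    return {**record, "_links": links}


def follow(record: dict, name: str) -> str:
    """Frontend: resolve a link by name without building any query."""
    return record["_links"][name]["href"]
```

If the backend later changes its query format, only the body of `add_hal_links` changes; `follow` and every caller of it stay the same.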

Component-Based UI

Because the data is normalized to Linked Art, the UI uses reusable components based on data patterns rather than domain types.

• A "Date/Time" component handles the TimeSpan pattern.

• A "Name" component handles the identified_by pattern.

• These components are composed to build pages, meaning the same code renders a fossil, a painting, or a book.
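The pattern-based components can be illustrated with plain rendering functions: each reads one Linked Art pattern (`identified_by`, or a `timespan` on `produced_by` / `created_by`) and never checks the entity's domain. This is a hypothetical Python stand-in for the React components, and it assumes four-digit CE years in `begin_of_the_begin` / `end_of_the_end`.

```python
def primary_name(entity: dict, default: str = "(untitled)") -> str:
    """"Name" component: first Name entry in the identified_by pattern."""
    for ident in entity.get("identified_by", []):
        if ident.get("type") == "Name" and ident.get("content"):
            return ident["content"]
    return default


def timespan_label(activity: dict) -> str:
    """"Date/Time" component: year or year range from a TimeSpan.

    Simplification: assumes ISO strings that start with a four-digit CE year.
    """
    timespan = activity.get("timespan", {})
    begin = timespan.get("begin_of_the_begin", "")[:4]
    end = timespan.get("end_of_the_end", "")[:4]
    if begin and end and begin != end:
        return f"{begin}-{end}"
    return begin or end


def render_card(entity: dict) -> str:
    """Compose the two components; works the same for objects and works."""
    activity = entity.get("produced_by") or entity.get("created_by") or {}
    date_label = timespan_label(activity)
    name = primary_name(entity)
    return f"{name} ({date_label})" if date_label else name
```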

Lessons for Developers

Consistency over Completeness

The team found that strict data consistency (forcing everything into the Linked Art model) was more valuable than capturing every nuance of the original data. This consistency reduced frontend development time and bugs.

"Follow Your Nose" Discovery

The user experience is designed around navigation, not just search. The API supports this by making inverse relationships discoverable through search instead of storing them on every record: rather than the "Concept: Physics" record listing every Book that links to it, the Concept page runs a search for "items that link to me," preventing massive overhead on the Concept record itself.
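The "items that link to me" lookup can be sketched as a reverse index built at ingest time, so the Concept record itself never stores back-links. The index structure and function names are hypothetical; in LUX the equivalent lookup would be a database search, not an in-memory dictionary.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set


def referenced_ids(node, own_id: str) -> Iterator[str]:
    """Yield every nested "id" in a record except the record's own id."""
    if isinstance(node, dict):
        ref = node.get("id")
        if ref and ref != own_id:
            yield ref
        for value in node.values():
            yield from referenced_ids(value, own_id)
    elif isinstance(node, list):
        for item in node:
            yield from referenced_ids(item, own_id)


def build_reverse_index(records: Iterable[dict]) -> Dict[str, Set[str]]:
    """Map each referenced URI to the set of records that point at it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for record in records:
        for ref in referenced_ids(record, record["id"]):
            index[ref].add(record["id"])
    return index


def items_linking_to(index: Dict[str, Set[str]], uri: str) -> List[str]:
    """What a Concept page would request instead of reading stored back-links."""
    return sorted(index.get(uri, set()))
```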

Developer Experience (DX)

The choice of JSON-LD and Python was intentional to ensure that "junior" developers could contribute without needing deep expertise in semantic web standards like RDF/XML or SPARQL.


💡 Yale LUX Project: In-Depth Technical Extraction

Linked Art Data Model and Cross-Domain Details

The LUX system is built on the Linked Art metadata application profile, a constrained subset and extension of the CIDOC Conceptual Reference Model (CIDOC-CRM), ensuring a consistent conceptual model across diverse collections.

Core Classes and Concepts

Linked Art uses a set of classes that serve as the foundational entities for the knowledge graph (KG):

• Item Entities:

• HumanMadeObject (physical collection items, even repaired natural specimens).

• DigitalObject (digital files, e.g., web home pages).

• Work Entities (Intellectual/Conceptual):

• LinguisticObject (text-based works, e.g., a book's text).

• VisualItem (image-based works, e.g., the visual content of a photograph).

• Contextual Entities ("Hub Pages"):

• Person and Group (organizations).

• Place (geographic locations).

• Activity (events like exhibitions, provenance, production/creation).

• Modeling Extensions:

• Set: A small Linked Art extension to CIDOC-CRM, representing a conceptual collection or aggregation (e.g., archival collections, departmental sets) capable of containing members of different types (objects, digital items, other sets).

Relationships and Data Patterns

The model is structured to separate physical holdings from intellectual works, crucial for library data:

• Holding-to-Work: A HumanMadeObject carries a LinguisticObject (carries) or shows a VisualItem (shows); a DigitalObject uses digitally_carries / digitally_shows.

• Production: Physical items use produced_by (an embedded Production activity); digital items use created_by (an embedded Creation activity).

• Relationships between entities are expressed by properties like member_of (to a Set), classified_as (to a Type), part_of (spatial hierarchy for Place), and about (subject relationship).

Key data patterns (used consistently across all classes) are implemented as properties to hold structured information directly on the entity record, facilitating front-end consistency:

• identified_by: For names, titles, and unique identifiers.

• referred_to_by: For descriptions or textual statements.

• dimension: For measurements, linking to a MeasurementUnit.

• subject_of: For external digital references (websites, APIs) via an access_point URI.

• equivalent: Links to external resources describing the same entity.

• Time: Dates are embedded within Activity records using a TimeSpan construction.
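To make the patterns above concrete, here is a sketch that builds one record for a physical copy of a book. The `@context` URL is the published Linked Art context; all `lux.example` identifiers, the function name, and its parameters are hypothetical, and a production record would usually also carry Getty AAT classifications that are omitted here.

```python
BASE = "https://lux.example"  # hypothetical identifier namespace


def make_book_copy(copy_id: str, title: str, text_id: str, year: int,
                   height_cm: float, note: str) -> dict:
    """Build a Linked Art-style record for one physical copy of a book."""
    return {
        "@context": "https://linked.art/ns/v1/linked-art.json",
        "id": f"{BASE}/object/{copy_id}",
        "type": "HumanMadeObject",
        "_label": title,
        # identified_by: names and identifiers share one pattern.
        "identified_by": [
            {"type": "Name", "content": title},
            {"type": "Identifier", "content": copy_id},
        ],
        # referred_to_by: free-text statements about the object.
        "referred_to_by": [{"type": "LinguisticObject", "content": note}],
        # dimension: a measured value plus its MeasurementUnit.
        "dimension": [{
            "type": "Dimension",
            "value": height_cm,
            "unit": {"id": f"{BASE}/unit/cm", "type": "MeasurementUnit", "_label": "centimeters"},
        }],
        # produced_by: the making of this copy, dated with an embedded TimeSpan.
        "produced_by": {
            "type": "Production",
            "timespan": {
                "type": "TimeSpan",
                "begin_of_the_begin": f"{year}-01-01T00:00:00Z",
                "end_of_the_end": f"{year}-12-31T23:59:59Z",
            },
        },
        # carries: the text is a separate LinguisticObject with its own record.
        "carries": [{"id": f"{BASE}/text/{text_id}", "type": "LinguisticObject"}],
        # member_of: membership in a Set, e.g. a departmental collection.
        "member_of": [{"id": f"{BASE}/set/rare-books", "type": "Set"}],
    }
```

Serializing the result with `json.dumps(..., indent=2)` gives a JSON-LD document; note that authorship would sit on the carried LinguisticObject's `created_by`, not on this physical copy.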

Cross-Domain Modeling Lessons

• Natural History (Yale Peabody Museum): Specimens, which lack human creative intent, are still modeled as HumanMadeObject for consistency, leveraging the model's strength in separating the physical object from an intellectual work.

• Archives: Modeled by separating the archival collection's conceptual arrangement from the objects' physical location. The arrangement is represented as a hierarchy of Set entities.

• Library Data (13M+ Works): MARC data was successfully mapped, but the challenge shifted from modeling to extracting knowledge (e.g., proper place subject headings) from MARC, exposing errors previously hidden by simple HTML rendering.
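For the archival case above, the intellectual arrangement can be read by following `member_of` links from an item up through nested Sets, while the physical location lives on a separate property that this walk deliberately ignores. The function name, the `current_location` usage, and the "first parent wins" rule are assumptions for illustration.

```python
from typing import Dict, List, Set


def archival_path(entity_id: str, records: Dict[str, dict]) -> List[str]:
    """Return Set labels from the top of the arrangement down to the item's Set."""
    path: List[str] = []
    seen: Set[str] = set()
    current = records[entity_id]
    while True:
        parents = current.get("member_of", [])
        if not parents:
            break
        parent_id = parents[0]["id"]  # assumption: first parent is the archival arrangement
        if parent_id in seen:
            raise ValueError(f"cycle in member_of at {parent_id}")
        seen.add(parent_id)
        current = records[parent_id]
        path.append(current["_label"])
    return list(reversed(path))
```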

Linked Data Technology and Architecture

The system handles 41 million records resulting in 2+ billion RDF triples and required a dedicated, performant technology stack.

System Selection

Yale discarded multi-component integrations (e.g., AWS Neptune combined with Elasticsearch) because of the extensive effort required to build a performant, integrated graph and document query capability. The final choice was the proprietary MarkLogic multi-modal database, which can process both graph and document queries simultaneously, a key performance requirement.

System Optimization and Performance

Triple Materialization: Instead of relying on the slow JSON-LD expansion to materialize all RDF triples into the triplestore, a custom triple generator was implemented. This generator materializes only the necessary triples, dramatically reducing the dataset's on-disk size by approximately 50% and speeding up loading/indexing.
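The idea of materializing "only the necessary triples" can be sketched as a generator that walks a compact JSON-LD record and emits links for a chosen set of properties, skipping names, statements, and other embedded structures. The predicate selection and function names are hypothetical; the paper does not list which triples LUX keeps.

```python
from typing import FrozenSet, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical selection of predicates that the application's queries need.
INDEXED_PREDICATES: FrozenSet[str] = frozenset(
    {"classified_as", "member_of", "carries", "about", "took_place_at"}
)


def generate_triples(
    record: dict, predicates: FrozenSet[str] = INDEXED_PREDICATES
) -> List[Triple]:
    """Emit subject-predicate-object links only for selected top-level properties."""
    subject = record["id"]
    triples: List[Triple] = [(subject, "type", record["type"])]
    for predicate in sorted(predicates):
        value = record.get(predicate)
        if value is None:
            continue
        targets = value if isinstance(value, list) else [value]
        for target in targets:
            # Only references that carry an identifier become triples.
            if isinstance(target, dict) and "id" in target:
                triples.append((subject, predicate, target["id"]))
    return triples
```

Compared with full JSON-LD expansion, nothing here creates blank nodes for names or statements, which is where the on-disk savings described above would come from.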

Shortcut Triples: The pipeline adds artificial "shortcut" triples (e.g., lux:agentOfProduction) to bypass intermediate nodes, thus avoiding complex, slow joins across large intermediate result sets in graph queries.

Query Language: The MarkLogic query language CTS was found to be many times faster than SPARQL for LUX's required queries. Efficient joining of graph and document queries is achieved by aligning the document identifier with the entity's URI.

Data Acquisition Pipeline

• Change Synchronization: The system relies on the IIIF Change Discovery API (based on W3C Activity Streams 2.0) implemented by collecting units. This API provides a chronological stream of update/delete events, allowing the harvester to synchronize records over the web.

• Processing Stack: The processing pipeline uses Python with open-source tools:

• PostgreSQL for caching harvested, intermediate, and final record representations.

• Redis for a fast key/value store, essential for managing the 100 million+ URIs needed for reconciliation concordances.

Reconciliation and Enrichment

The pipeline involves three phases. Each phase works on one record at a time, so the work can run in parallel and scale to 41M+ records:

1. Reconciliation: Identifies equivalent URIs for entities (people, places) from internal and 20+ external authority sources (e.g., Getty, LoC, Wikidata). If URIs are unavailable, it attempts exact name matching against the closest authorities.

2. Internal URI Update: All record identifiers and outbound links are updated to use the LUX internal URI.

3. Merging: All records with the same internal URI are merged into a single, enriched description for export to MarkLogic.
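The three phases can be sketched as follows, with an in-memory dictionary standing in for the Redis concordance. Everything here is an illustrative assumption: the class and function names, the minted `lux.example/data/N` URIs, the "first source wins" rule for single values, and the omission of exact name matching and transitive merges.

```python
import itertools
from typing import Dict, List


class Concordance:
    """In-memory stand-in for the Redis concordance (external URI -> internal URI)."""

    def __init__(self, base: str = "https://lux.example/data/"):
        self._map: Dict[str, str] = {}
        self._base = base
        self._counter = itertools.count(1)

    def resolve(self, uris: List[str]) -> str:
        # Reuse an internal URI if any equivalent is already known; else mint one.
        internal = next((self._map[u] for u in uris if u in self._map), None)
        if internal is None:
            internal = f"{self._base}{next(self._counter)}"
        for u in uris:
            self._map.setdefault(u, internal)
        return internal

    def lookup(self, uri: str) -> str:
        return self._map.get(uri, uri)


def reconcile(records: List[dict]) -> Concordance:
    """Phase 1: group each record's own URI with its declared equivalents."""
    conc = Concordance()
    for rec in records:
        conc.resolve([rec["id"]] + [e["id"] for e in rec.get("equivalent", [])])
    return conc


def rewrite(node, conc: Concordance):
    """Phase 2: replace record ids and outbound links with internal URIs."""
    if isinstance(node, list):
        return [rewrite(item, conc) for item in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for key, value in node.items():
        if key == "id":
            out[key] = conc.lookup(value)
        elif key == "equivalent":
            out[key] = value  # keep external URIs as links to the authorities
        else:
            out[key] = rewrite(value, conc)
    return out


def merge(records: List[dict]) -> Dict[str, dict]:
    """Phase 3: combine records that now share an internal URI."""
    merged: Dict[str, dict] = {}
    for rec in records:
        target = merged.setdefault(rec["id"], {"id": rec["id"], "type": rec["type"]})
        for key, value in rec.items():
            if isinstance(value, list):
                bucket = target.setdefault(key, [])
                bucket.extend(item for item in value if item not in bucket)
            else:
                target.setdefault(key, value)  # first source wins for single values
    return merged
```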

Front-End Design and Hypermedia API

The front-end is a React single-page application focused on discovery over repeated searching, emphasizing actionable links to "hub pages."

The Essential Record Construct

Despite Linked Data's emphasis on a unified graph, the record construct remains essential for usability and computational tractability. Performance relies on materializing computed sub-graphs into discrete, serialized JSON-LD chunks, which function as pre-constructed records. This avoids the overwhelming computational cost of repeatedly constructing sub-graphs and facet counts on the fly.

Performance and Caching

• The application uses a REST-oriented paradigm with multiple layers of web caches (browser, React, web cache infrastructure) relying on the static JSON-LD representation of records.

• Performance is high because records in a richly connected graph are reused frequently within a session, so many requests are cache hits. Without caching, the system supports 150 concurrent queries; caching dramatically reduces database load.
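The effect of record reuse on database load can be sketched with a small least-recently-used cache in front of a record fetcher. This is a hypothetical single-process stand-in for the browser, React, and web-cache layers; the class name, capacity, and fetcher are assumptions.

```python
from collections import OrderedDict
from typing import Callable


class RecordCache:
    """Minimal LRU cache in front of a record fetcher, counting hits and misses."""

    def __init__(self, fetch: Callable[[str], dict], capacity: int = 1000):
        self._fetch = fetch
        self._capacity = capacity
        self._store: "OrderedDict[str, dict]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, uri: str) -> dict:
        if uri in self._store:
            self._store.move_to_end(uri)  # mark as most recently used
            self.hits += 1
            return self._store[uri]
        self.misses += 1
        record = self._fetch(uri)  # only misses reach the database
        self._store[uri] = record
        if len(self._store) > self._capacity:
            self._store.popitem(last=False)  # evict the least recently used record
        return record
```

In a browsing session that keeps returning to the same hub pages, most `get` calls become hits, which is the mechanism the paragraph above relies on.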

Decoupling with Hypermedia (HAL)

To maintain separation between the front-end code and the evolving back-end query syntax/data model, the Hypertext Application Language (HAL) is used.

• HAL Links: Named search links (e.g., _links.related_objects) are added at request time by the middle tier, separated from the semantic knowledge.

• The front-end only needs to follow the named link, completely decoupling it from the query logic (MarkLogic CTS, scope/parameters).

• Paged Responses: The results of these HAL searches follow the W3C Activity Streams Paged Collection pattern for standardized, paged retrieval of results.

This approach yielded significant benefits: a new system could be implemented and deployed in less than one person-day of engineering effort by reusing components and configuring new HAL links, demonstrating a highly sustainable and reusable codebase.
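The Activity Streams Paged Collection pattern used for HAL search results can be sketched as follows: the server splits ordered results into an `OrderedCollection` with `first` / `last` links and `OrderedCollectionPage`s chained by `next` / `prev`, and the client simply follows `next`. The `?page=N` URL scheme, the base URL, and the function names are hypothetical.

```python
import math
from typing import Callable, Dict, Iterator, List, Tuple

AS2_CONTEXT = "https://www.w3.org/ns/activitystreams"


def page_id(base_url: str, n: int) -> str:
    return f"{base_url}?page={n}"  # hypothetical page URL scheme


def build_paged_collection(
    base_url: str, item_ids: List[str], page_size: int
) -> Tuple[dict, Dict[str, dict]]:
    """Split ordered results (page_size > 0) into a collection plus linked pages."""
    n_pages = max(1, math.ceil(len(item_ids) / page_size))
    collection = {
        "@context": AS2_CONTEXT,
        "id": base_url,
        "type": "OrderedCollection",
        "totalItems": len(item_ids),
        "first": {"id": page_id(base_url, 1), "type": "OrderedCollectionPage"},
        "last": {"id": page_id(base_url, n_pages), "type": "OrderedCollectionPage"},
    }
    pages: Dict[str, dict] = {}
    for n in range(1, n_pages + 1):
        chunk = item_ids[(n - 1) * page_size : n * page_size]
        page = {
            "@context": AS2_CONTEXT,
            "id": page_id(base_url, n),
            "type": "OrderedCollectionPage",
            "partOf": {"id": base_url, "type": "OrderedCollection"},
            "orderedItems": [{"id": item} for item in chunk],
        }
        if n > 1:
            page["prev"] = {"id": page_id(base_url, n - 1), "type": "OrderedCollectionPage"}
        if n < n_pages:
            page["next"] = {"id": page_id(base_url, n + 1), "type": "OrderedCollectionPage"}
        pages[page["id"]] = page
    return collection, pages


def iterate_items(collection: dict, fetch: Callable[[str], dict]) -> Iterator[str]:
    """Client side: follow `next` links from the first page until none remain."""
    url = collection["first"]["id"]
    while url:
        page = fetch(url)
        for item in page["orderedItems"]:
            yield item["id"]
        url = page.get("next", {}).get("id")
```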

The original paper explicitly notes that LUX is a "work in progress" with ideas for improvement. Based on the technical design and general best practices for Linked Data applications, the following recommendations for improving the LUX application are grouped by technical focus and user experience:

⚙️ Technical and Data Improvements

The primary goal here is to enhance performance, scalability, and data quality.

Advanced Reconciliation & Data Quality

The paper highlighted that reconciliation of subjects and concepts was less successful than for people and places, and MARC data revealed hidden quality issues.

• Improve Subject/Concept Matching: Focus data engineering effort on developing more sophisticated algorithms for matching concepts, subjects, and genres, perhaps leveraging word embeddings or knowledge graph embeddings to find conceptual similarity beyond exact string matches.

• Stratigraphy Modeling: Finalize and implement the new class model for stratigraphy and other complex natural history features mentioned in the paper, ensuring all domains are fully represented.

• Automated Data Auditing: Develop scripts that flag common MARC-to-Linked Art mapping errors (like barcodes appearing as dates) before they hit the enrichment pipeline, improving the data quality at the source.
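A pre-enrichment audit can start as a simple classifier over date strings extracted from MARC. The rules below are assumptions for illustration: nine or more bare digits are treated as "barcode-like", only `YYYY` and `YYYY-MM-DD` count as well-formed, and years after a cutoff are flagged. Flags are meant for human review; a legitimate form such as "c. 1850" would be flagged as unparseable, not corrected.

```python
import re
from datetime import date
from typing import Iterable, List, Optional, Tuple

YEAR_OR_ISO = re.compile(r"(\d{4})(-\d{2}-\d{2})?")
BARCODE_LIKE = re.compile(r"\d{9,}")  # assumption: long digit runs are item barcodes


def classify_date(value: str, max_year: Optional[int] = None) -> Optional[str]:
    """Return a problem label for a suspicious date string, or None if it looks fine."""
    value = value.strip()
    if BARCODE_LIKE.fullmatch(value):
        return "barcode-like"
    match = YEAR_OR_ISO.fullmatch(value)
    if not match:
        return "unparseable"
    limit = max_year if max_year is not None else date.today().year
    if int(match.group(1)) > limit:
        return "future year"
    return None


def audit_dates(rows: Iterable[Tuple[str, str]],
                max_year: Optional[int] = None) -> List[Tuple[str, str, str]]:
    """Collect (record id, value, problem) for every flagged date."""
    issues = []
    for record_id, value in rows:
        problem = classify_date(value, max_year)
        if problem:
            issues.append((record_id, value, problem))
    return issues
```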

Knowledge Graph Optimization

The paper noted that they already implemented "shortcut" triples and a custom generator, but further optimization is always possible.

• Implement Read-Only Replica: For a production system like LUX, a dedicated, replicated MarkLogic instance (or a similar system) optimized purely for front-end read queries would help keep performance stable under heavy user load.

• Refine Shortcut Triples: Profile the queries behind the most frequently used HAL links (the named searches) to identify where additional shortcut triples could reduce query execution time for common user actions (e.g., "Find all exhibition records related to this item").

🎨 User Experience (UX) and Discovery Improvements

The goal is to move beyond finding specific items to facilitating deeper, unexpected exploration.

Enhanced Graph Visualization

While the current app is highly functional, Linked Data's power is best demonstrated visually.

• Interactive Relationship Viewer: Add a feature on "hub pages" (Person, Place, Activity) that displays a small, interactive visualization of the immediate relationships (e.g., this Person → was a member of → this Group → was involved in → this Activity). This makes the abstract "graph" concept concrete.

• Time-Based Exploration: Leverage the highly structured TimeSpan data to create an interactive timeline for entities. A user on a Person page could see a visual timeline of their activities (birth, production events, exhibitions) and filter results based on historical periods.
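The data side of such a timeline can be sketched by collecting the `timespan` of each related activity (for a Person, for example, Birth plus productions and exhibitions), sorting by `begin_of_the_begin`, and filtering by overlap with a chosen period. The class and function names are hypothetical, and sorting ISO strings lexically assumes four-digit CE years in a consistent format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TimelineEvent:
    label: str
    begin: str  # ISO 8601 begin_of_the_begin
    end: str    # ISO 8601 end_of_the_end


def events_from_activities(activities: List[dict]) -> List[TimelineEvent]:
    """Keep dated activities and order them by when they could first have begun."""
    events = []
    for activity in activities:
        timespan = activity.get("timespan")
        if not timespan:
            continue  # undated activities cannot be placed on a timeline
        label = activity.get("_label", activity.get("type", "Activity"))
        events.append(TimelineEvent(label, timespan["begin_of_the_begin"], timespan["end_of_the_end"]))
    # Lexical order matches chronological order for same-format, four-digit CE years.
    return sorted(events, key=lambda event: event.begin)


def within_period(events: List[TimelineEvent], start_year: int, end_year: int) -> List[TimelineEvent]:
    """Events whose span overlaps [start_year, end_year]."""
    return [e for e in events if int(e.begin[:4]) <= end_year and int(e.end[:4]) >= start_year]
```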

Cross-Domain Clarity and Context

Since LUX blends wildly different domains (Art, Archives, Natural History), clarifying context is vital.

• Visual Domain Tags: Implement clear, concise visual indicators (e.g., icons or small color tags) next to every item in search results to immediately signify the originating collection/domain (e.g., 🏛️ for Art, 🦖 for Peabody, 📚 for Library).

• Progressive Disclosure: Use progressive disclosure principles (as noted in UX best practices) to initially show only the most essential, connected information, and require an explicit click to view highly granular, domain-specific metadata (like full MARC fields or detailed stratigraphy data).

Personalization and Reusability

The GitHub repository shows work on "Personal Collections," which is a perfect foundation for next-level features.

• User-Generated Graph Edges: Allow authenticated users to create and save their own temporary "virtual collections" (Sets). This would let a researcher link an item from the Art Gallery to a manuscript in the Library via a user-defined relationship, empowering digital scholarship.

• Search and Export Personalization: Allow users to save complex queries and facets, and provide highly structured export formats (like custom JSON-LD or CSV) of their personalized results.


AI/agent quick endpoints

/api/docs/manifest — current catalog with IDs, anchors, source timestamps, and checksums.
/api/docs/content?path=roadmap.md — roadmap JSON payload + rendered HTML for downstream tooling.
