    Current Document: YaleLux.md

    Source updated 10/20/2018, 1:46:40 AM · SHA-256 74fd47fae749 · 390 lines

    Canonical ID: linked-art/YaleLux

    JSON for this doc:/api/docs/content?path=linked-art/YaleLux.md

    Human link:/docs?doc=linked-art%2FYaleLux.md

    Canonical API endpoint:/api/docs/content?path=linked-art%2FYaleLux.md


    Based on the paper on Yale University’s LUX project, here are the key technical details for app design and development, organized by data modeling, system architecture, front-end design, and performance optimization.

    1. Data Modeling: The Linked Art Paradigm

    The core of the system is built on Linked Open Usable Data (LOUD) principles, specifically using the Linked Art profile. This ensures cross-domain consistency (Art, Natural History, Archives, Libraries).

    • Ontology Foundation: Built on CIDOC-CRM but simplified for developer usability.

    • Format: JSON-LD is used to make data accessible to standard web developers without deep semantic web expertise.

    • Core Classes:

    • HumanMadeObject: Physical cultural items.

    • DigitalObject: Digital files/surrogates.

    • LinguisticObject (Texts) & VisualItem (Images): Separates the "work" from the physical carrier.

    • Set: Used for archival collections and hierarchies.

    • Activity: Connects actors, places, and times (e.g., provenance, creation, exhibitions).

    • Handling Complexity:

    • Natural History: Geological and biological specimens are mapped to HumanMadeObject for consistency, or Type for taxonomic hierarchies.

    • Archives: Modeled as nested Sets to separate physical location from intellectual arrangement.
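
As a rough illustration of the classes listed above, the sketch below writes one record as a Python dictionary in the JSON-LD shape that Linked Art uses. The class and property names follow the Linked Art vocabulary. The example.org identifiers, the label, and the `linked_ids` helper are assumptions made for illustration and do not come from the LUX data.

```python
# Hypothetical identifiers under example.org; only the class and property
# names follow the Linked Art vocabulary described above.
painting = {
    "@context": "https://linked.art/ns/v1/linked-art.json",
    "id": "https://example.org/object/1",
    "type": "HumanMadeObject",
    "_label": "Example Painting",
    "identified_by": [{"type": "Name", "content": "Example Painting"}],
    # The physical object "shows" the image-based work it carries.
    "shows": [{"id": "https://example.org/visual/1", "type": "VisualItem"}],
    "member_of": [{"id": "https://example.org/set/paintings", "type": "Set"}],
    # The production is an embedded Activity without its own id.
    "produced_by": {
        "type": "Production",
        "carried_out_by": [
            {"id": "https://example.org/person/1", "type": "Person"}
        ],
    },
}


def linked_ids(record, prop):
    """Return the ids of entities referenced through one property."""
    value = record.get(prop, [])
    refs = value if isinstance(value, list) else [value]
    return [ref["id"] for ref in refs if "id" in ref]


print(linked_ids(painting, "shows"))
print(linked_ids(painting, "member_of"))
```

Because every class shares this shape, a helper like `linked_ids` works unchanged on a fossil, a book, or an archival Set.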

    2. System Architecture & Technology Stack

    Yale rejected a pure "triplestore" approach in favor of a Multi-Modal Database architecture. This hybrid approach handles both document retrieval (text search) and graph traversal (relationship discovery).

    A. The Database Engine

    • Technology: MarkLogic (Commercial).

    • Rationale:

    • It supports CTS (Core Text Search) for high-speed document retrieval and relevancy scoring.

    • It natively stores JSON-LD and triples, allowing for graph queries (SPARQL) to be joined with text queries.

    • Performance Note: Yale found MarkLogic’s internal CTS query language significantly faster than standard SPARQL for their specific use cases.

    B. The Ingestion Pipeline

    Data is harvested from disparate systems (museum CMS, library catalogs) and normalized.

    • Synchronization Protocol: IIIF Change Discovery API (based on W3C Activity Streams 2.0). This allows the harvester to "walk" backward through time to find record updates.

    • Processing Stack:

    • Language: Python.

    • Caching/State: PostgreSQL (stores intermediate record states) and Redis (Key/Value store for fast URI concordance/matching).

    • Reconciliation: An algorithm matches internal records with external authorities (Getty, Wikidata, LOC) to enrich data and merge duplicate entities into a single "Hub" page.
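
The backward "walk" through an activity stream can be sketched as follows. This is a minimal illustration, not the LUX harvester: the pages live in an in-memory dictionary instead of being fetched over HTTP, and the page names, object ids, and timestamps are invented. The field names (`last`, `prev`, `orderedItems`, `endTime`, `object`) are the Activity Streams 2.0 terms that the IIIF Change Discovery API builds on.

```python
from datetime import datetime

# In-memory stand-in for an activity stream. A real harvester would fetch
# these pages over HTTP. All ids and timestamps here are made up.
PAGES = {
    "page/1": {"orderedItems": [
        {"type": "Create", "endTime": "2024-01-01T00:00:00Z",
         "object": {"id": "obj/a"}},
        {"type": "Create", "endTime": "2024-01-02T00:00:00Z",
         "object": {"id": "obj/b"}},
    ]},
    "page/2": {"prev": {"id": "page/1"}, "orderedItems": [
        {"type": "Update", "endTime": "2024-02-01T00:00:00Z",
         "object": {"id": "obj/a"}},
        {"type": "Delete", "endTime": "2024-02-02T00:00:00Z",
         "object": {"id": "obj/b"}},
    ]},
}
COLLECTION = {"last": {"id": "page/2"}}


def parse_time(value):
    """Parse an ISO 8601 UTC timestamp such as 2024-02-01T00:00:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def changes_since(collection, fetch_page, last_crawl):
    """Walk pages newest-first; return {object id: latest activity type}."""
    cutoff = parse_time(last_crawl)
    seen = {}
    page_ref = collection.get("last")
    while page_ref:
        page = fetch_page(page_ref["id"])
        # Items within a page are oldest-first, so read them in reverse.
        for activity in reversed(page["orderedItems"]):
            if parse_time(activity["endTime"]) <= cutoff:
                return seen  # older activities were handled by an earlier crawl
            # Keep only the most recent activity per object.
            seen.setdefault(activity["object"]["id"], activity["type"])
        page_ref = page.get("prev")
    return seen


print(changes_since(COLLECTION, PAGES.__getitem__, "2024-01-15T00:00:00Z"))
```

Stopping at the first activity older than the previous crawl is what lets the harvester avoid full dumps: it reads only as far back as it needs to.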

    3. Front-End Application Design

    The discovery interface (LUX) is a Single Page Application (SPA) designed to decouple the user interface from the complex backend logic.

    • Framework: React.

    • API Strategy:

    • HAL (Hypertext Application Language): The backend returns responses with "named links" (_links). The front end simply follows these links rather than constructing complex query strings. This decouples the frontend code from the backend query syntax.

    • Activity Streams Pattern: Search results follow the "Paged Collection" pattern (next/previous links) for standard pagination.

    • Component Reuse: The UI components are built around data patterns (e.g., a "Name" component, a "Location" component) rather than domain specific pages, allowing the same code to render a fossil, a painting, or a book.

    4. Performance Optimization & Design Lessons

    Yale learned that pure graph traversal is too slow for a consumer-facing application at the scale of 41 million records (expanding to 2 billion triples).

    • The "Record" Concept: Despite using a graph, they maintain the concept of a "Record" (a pre-constructed JSON-LD document). Constructing sub-graphs on the fly was computationally too expensive.

    • Shortcut Triples: The pipeline injects artificial "shortcut" triples to reduce join depth.

    • Example: Instead of traversing Object -> produced_by -> part -> carried_out_by -> Person (3 hops), they index lux:agentOfProduction (1 hop).

    • Aggressive Caching:

    • Static JSON-LD: The system relies on retrieving static JSON-LD representations from web caches rather than dynamic database hits whenever possible.

    • Query Caching: Because queries are provided via HAL links in the response, the query links themselves are cached.


    This analysis of the Yale LUX project focuses on the engineering patterns and architectural decisions that made the project viable.

    1. The Core Data Paradigm: "Linked Open Usable Data" (LOUD)

    The foundational technical decision was to prioritize usability over semantic perfection. Yale adopted Linked Art, a profile of the CIDOC-CRM ontology, to unify disparate collections (Art, Natural History, Archives, and Libraries).

    Unified Data Modeling

    The system normalizes 41 million records from different domains into a single conceptual model using JSON-LD. This allows a single frontend codebase to render diverse content without domain-specific logic.

    • Shared Classes: "Dinosaur fossils" (Science) and "Paintings" (Art) are both modeled as HumanMadeObject.

    • Separation of Content: The model strictly separates the physical item (HumanMadeObject) from the intellectual work it carries (LinguisticObject or VisualItem). This handles complex library cases where one book (object) contains multiple texts (works).

    • Archival Hierarchies: Archives are modeled as nested Sets to separate the physical location of a box from its intellectual arrangement in a finding aid.

    The "Record" Abstraction

    Crucially, the team rejected the pure semantic web notion that "there are no records, only a graph."

    • Engineering Reality: To build a performant app, the system maintains the concept of a Record—a discrete, self-contained JSON-LD document representing an entity (e.g., a Person or an Object).

    • Benefit: This allows for standard document-based indexing, caching, and retrieval, which is significantly faster than dynamically assembling sub-graphs at runtime.

    2. System Architecture: The Multi-Modal Strategy

    Yale moved away from "pure" triplestores, finding them too slow and complex for consumer-facing apps. Instead, they adopted a Multi-Modal Architecture.

    Hybrid Database Engine

    The system uses MarkLogic, which functions simultaneously as:

    1. A Document Store: Stores the JSON-LD records for fast retrieval and text search.
    2. A Graph Store: Indexes the RDF triples embedded in the JSON-LD for relationship traversal.

    Why this matters: This allows the application to use the "right tool for the job." It uses CTS (Core Text Search) for high-speed keyword queries and relevance ranking, and Graph Queries to find connections (e.g., "Find all students of this artist").

    Ingestion Pipeline

    The data pipeline is designed for massive scale and eventual consistency.

    • Synchronization: It uses the IIIF Change Discovery API (based on W3C Activity Streams) to "crawl" for updates rather than relying on full dumps.

    • Processing Stack: The pipeline is written in Python, using PostgreSQL for intermediate state caching and Redis for high-speed URI matching.

    • Reconciliation: An automated process merges records from internal systems with external authorities (Getty, LOC, Wikidata) to create a single "Hub" record for every person, place, and concept.

    3. Performance Optimization Techniques

    Performance was a primary requirement, leading to several clever engineering shortcuts.

    Index-Time "Shortcut" Triples

    Graph queries that require multiple "hops" (joins) are expensive. To solve this, the pipeline materializes artificial triples during ingestion.

    • The Problem: Finding the creator of an object normally requires traversing: Object → Production event → part of that event → Person who carried it out.

    • The Fix: The pipeline injects a shortcut triple: Object → lux:agentOfProduction → Person.

    • Result: Runtime queries become single-hop lookups instead of multi-join traversals.
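
The materialization step described above can be sketched as a small generator that runs at ingestion time. The predicate name `lux:agentOfProduction` comes from the text. The record layout, the example.org URIs, and the `shortcut_triples` function are illustrative assumptions, not the LUX pipeline's actual code.

```python
SHORTCUT = "lux:agentOfProduction"  # predicate name taken from the text above


def shortcut_triples(record):
    """Yield (object, SHORTCUT, agent) for every agent of the production,
    whether attached to the Production itself or to one of its parts."""
    production = record.get("produced_by", {})
    activities = [production] + production.get("part", [])
    for activity in activities:
        for agent in activity.get("carried_out_by", []):
            yield (record["id"], SHORTCUT, agent["id"])


# Hypothetical record: the agent sits on a part of the production, so the
# normal path is object -> produced_by -> part -> carried_out_by.
record = {
    "id": "https://example.org/object/1",
    "type": "HumanMadeObject",
    "produced_by": {
        "type": "Production",
        "part": [
            {
                "type": "Production",
                "carried_out_by": [
                    {"id": "https://example.org/person/7", "type": "Person"}
                ],
            }
        ],
    },
}

triples = list(shortcut_triples(record))
print(triples)
```

The nested walk happens once per record during ingestion, so a query for "works by this person" only has to match the single shortcut predicate.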

    Aggressive Caching Strategy

    The application architecture assumes that computing facets and graph connections is expensive, so it avoids doing it live whenever possible.

    • Pre-Computation: Complex sub-graphs needed for UI rendering are materialized into the static JSON-LD record.

    • Web Caching: The architecture relies heavily on standard web caches (CDNs, browser caches) to serve the static JSON-LD files.

    • Query Caching: Because queries are predefined (see HAL below), the search results themselves can be aggressively cached.

    4. Frontend & API Design: Decoupling via Hypermedia

    The most distinct design pattern in LUX is the complete decoupling of the frontend UI from the backend query logic.

    Hypertext Application Language (HAL)

    Instead of the frontend constructing complex SPARQL or SQL queries, the backend API returns Named Links in the response.

    • Mechanism: A record for "Claude Monet" includes a _links section with pre-built URLs like search_works_by_agent.

    • Benefit: The React frontend simply renders a link. If the backend team changes the database query language or schema, they only update the link generation logic on the server. The frontend code remains untouched.
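
A minimal sketch of the client side of this pattern is shown below, in Python for brevity instead of the React code LUX uses. The link name `search_works_by_agent` comes from the text. The URLs, the opaque query string, the `follow` helper, and the fake backend are assumptions for illustration. The `_links` object with an `href` per link follows the HAL convention.

```python
def follow(record, rel, fetch):
    """Resolve a named HAL link on a record without knowing the query syntax."""
    link = record.get("_links", {}).get(rel)
    if link is None:
        raise KeyError(f"record has no link named {rel!r}")
    return fetch(link["href"])


# Hypothetical record: the client treats the href as an opaque string.
monet = {
    "id": "https://example.org/person/monet",
    "type": "Person",
    "_links": {
        "self": {"href": "https://example.org/person/monet"},
        "search_works_by_agent": {
            "href": "https://example.org/api/search?q=opaque-query-1"
        },
    },
}

# Fake backend keyed by URL, standing in for an HTTP GET.
FAKE_BACKEND = {
    "https://example.org/api/search?q=opaque-query-1": {
        "orderedItems": [{"id": "https://example.org/object/1"}]
    }
}

results = follow(monet, "search_works_by_agent", FAKE_BACKEND.__getitem__)
print(results["orderedItems"])
```

If the server later changes how the query behind `search_works_by_agent` is expressed, only the `href` value changes; the call to `follow` stays the same.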

    Component-Based UI

    Because the data is normalized to Linked Art, the UI uses reusable components based on data patterns rather than domain types.

    • A "Date/Time" component handles the TimeSpan pattern.

    • A "Name" component handles the identified_by pattern.

    • These components are composed to build pages, meaning the same code renders a fossil, a painting, or a book.
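
The idea of components keyed to data patterns can be sketched with plain functions, again in Python as a stand-in for the React components. The `identified_by` and TimeSpan patterns are from the text, and `begin_of_the_begin` / `end_of_the_end` are Linked Art TimeSpan properties. The function names and the three sample records are hypothetical.

```python
def name_component(entity):
    """Render the identified_by pattern: first Name, else the label."""
    names = [
        n["content"]
        for n in entity.get("identified_by", [])
        if n.get("type") == "Name"
    ]
    return names[0] if names else entity.get("_label", "(unnamed)")


def timespan_component(activity):
    """Render the TimeSpan pattern as a year or a year range."""
    span = activity.get("timespan")
    if not span:
        return ""
    begin = span.get("begin_of_the_begin", "")[:4]
    end = span.get("end_of_the_end", "")[:4]
    return begin if begin == end else f"{begin}-{end}"


def render(entity):
    """Compose the pattern components; no per-domain branches are needed."""
    parts = [name_component(entity)]
    for prop in ("produced_by", "created_by"):
        date = timespan_component(entity.get(prop, {}))
        if date:
            parts.append(date)
    return ", ".join(parts)


# Hypothetical records from three different domains.
fossil = {"type": "HumanMadeObject",
          "identified_by": [{"type": "Name", "content": "Example Fossil"}]}
painting = {"type": "HumanMadeObject",
            "identified_by": [{"type": "Name", "content": "Example Painting"}],
            "produced_by": {"type": "Production", "timespan": {
                "begin_of_the_begin": "1872-01-01T00:00:00Z",
                "end_of_the_end": "1872-12-31T23:59:59Z"}}}
scan = {"type": "DigitalObject",
        "identified_by": [{"type": "Name", "content": "Example Scan"}],
        "created_by": {"type": "Creation", "timespan": {
            "begin_of_the_begin": "2019-01-01T00:00:00Z",
            "end_of_the_end": "2021-12-31T23:59:59Z"}}}

for entity in (fossil, painting, scan):
    print(render(entity))
```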

    5. Lessons for Developers

    Consistency over Completeness

    The team found that strict data consistency (forcing everything into the Linked Art model) was more valuable than capturing every nuance of the original data. This consistency reduced frontend development time and bugs.

    "Follow Your Nose" Discovery

    The user experience is designed around navigation, not just search. The API supports this by ensuring bidirectional linking is discoverable. Instead of every Book linking to "Concept: Physics," the Concept page searches for "items that link to me," preventing massive overhead on the Concept record itself.
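
The "items that link to me" lookup can be sketched with an inverted index. In LUX this is a database search, so the in-memory index, the short ids, and the `build_reverse_index` function below are simplifying assumptions. Only the `about` property comes from the model described in this document.

```python
from collections import defaultdict


def build_reverse_index(records, prop="about"):
    """Map each referenced id to the ids of the records that point at it."""
    index = defaultdict(list)
    for record in records:
        for target in record.get(prop, []):
            index[target["id"]].append(record["id"])
    return index


# Hypothetical works; each one links forward to its subject concepts.
texts = [
    {"id": "text/1", "type": "LinguisticObject",
     "about": [{"id": "concept/physics"}]},
    {"id": "text/2", "type": "LinguisticObject",
     "about": [{"id": "concept/physics"}, {"id": "concept/optics"}]},
]

# The concept record itself carries no list of works.
physics = {"id": "concept/physics", "type": "Type", "_label": "Physics"}

index = build_reverse_index(texts)
print(index[physics["id"]])
```

The concept record stays small no matter how many works reference it, because the list of works is computed from the forward links.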

    Developer Experience (DX)

    The choice of JSON-LD and Python was intentional to ensure that "junior" developers could contribute without needing deep expertise in semantic web standards like RDF/XML or SPARQL.


    💡 Yale LUX Project: In-Depth Technical Extraction

    1. Linked Art Data Model and Cross-Domain Details

    The LUX system is built on the Linked Art metadata application profile, a constrained subset and extension of the CIDOC Conceptual Reference Model (CIDOC-CRM), ensuring a consistent conceptual model across diverse collections.

    Core Classes and Concepts

    Linked Art uses a set of classes that serve as the foundational entities for the knowledge graph (KG):

    • Item Entities:

    • HumanMadeObject (physical collection items, even repaired natural specimens).

    • DigitalObject (digital files, e.g., web home pages).

    • Work Entities (Intellectual/Conceptual):

    • LinguisticObject (text-based works, e.g., a book's text).

    • VisualItem (image-based works, e.g., the visual content of a photograph).

    • Contextual Entities ("Hub Pages"):

    • Person and Group (organizations).

    • Place (geographic locations).

    • Activity (events like exhibitions, provenance, production/creation).

    • Modeling Extensions:

    • Set: A small Linked Art extension to CIDOC-CRM, representing a conceptual collection or aggregation (e.g., archival collections, departmental sets) capable of containing members of different types (objects, digital items, other sets).

    Relationships and Data Patterns

    The model is structured to separate physical holdings from intellectual works, crucial for library data:

    • Holding-to-Work: HumanMadeObject can carry a LinguisticObject or show a VisualItem. DigitalObject uses digitally_carry/digitally_show.

    • Production: Physical items use produced_by (an embedded Activity); digital items use created_by (an embedded Activity).

    • Relationships between entities are expressed by properties like member_of (to a Set), classified_as (to a Type), part_of (spatial hierarchy for Place), and about (subject relationship).

    Key data patterns (used consistently across all classes) are implemented as properties to hold structured information directly on the entity record, facilitating front-end consistency:

    • identified_by: For names, titles, and unique identifiers.

    • referred_to_by: For descriptions or textual statements.

    • dimension: For measurements, linking to a MeasurementUnit.

    • subject_of: For external digital references (websites, APIs) via an access_point URI.

    • equivalent: Links to external resources describing the same entity.

    • Time: Dates are embedded within Activity records using a TimeSpan construction.

    Cross-Domain Modeling Lessons

    • Natural History (Yale Peabody Museum): Specimens, which lack human creative intent, are still modeled as HumanMadeObject for consistency, leveraging the model's strength in separating the physical object from an intellectual work.

    • Archives: Modeled by separating the archival collection's conceptual arrangement from the objects' physical location. The arrangement is represented as a hierarchy of Set entities.

    • Library Data (13M+ Works): MARC data was successfully mapped, but the challenge shifted from modeling to extracting knowledge (e.g., proper place subject headings) from MARC, exposing errors previously hidden by simple HTML rendering.

    2. Linked Data Technology and Architecture

    The system handles 41 million records resulting in 2+ billion RDF triples and required a dedicated, performant technology stack.

    System Selection

    Yale ruled out integrating separate graph and search components (e.g., AWS Neptune + Elastic) because of the extensive effort required to get performant, integrated graph/document queries. The final choice was the proprietary MarkLogic multi-modal database, which can process graph and document queries together, a key performance requirement.

    System Optimization and Performance

    1. Triple Materialization: Instead of relying on the slow JSON-LD expansion to materialize all RDF triples into the triplestore, a custom triple generator was implemented. This generator materializes only the necessary triples, reducing the dataset's on-disk size by approximately 50% and speeding up loading/indexing.
    2. Shortcut Triples: The pipeline adds artificial "shortcut" triples (e.g., lux:agentOfProduction) to bypass intermediate nodes, thus avoiding complex, slow joins across large intermediate result sets in graph queries.
    3. Query Language: The MarkLogic query language CTS was found to be many times faster than SPARQL for LUX's required queries. Efficient joining of graph and document queries is achieved by aligning the document identifier with the entity's URI.

    Data Acquisition Pipeline

    • Change Synchronization: The system relies on the IIIF Change Discovery API (based on W3C Activity Streams 2.0) implemented by collecting units. This API provides a chronological stream of update/delete events, allowing the harvester to synchronize records over the web.

    • Processing Stack: The processing pipeline uses Python with open-source tools:

    • PostgreSQL for caching harvested, intermediate, and final record representations.

    • Redis for a fast key/value store, essential for managing the 100 million+ URIs needed for reconciliation concordances.

    Reconciliation and Enrichment

    The pipeline involves three phases, operating in parallel across single records to ensure scalability with 41M+ records:

    1. Reconciliation: Identifies equivalent URIs for entities (people, places) from internal and 20+ external authority sources (e.g., Getty, LoC, Wikidata). If URIs are unavailable, it attempts exact name matching against the closest authorities.
    2. Internal URI Update: All record identifiers and outbound links are updated to use the LUX internal URI.
    3. Merging: All records with the same internal URI are merged into a single, enriched description for export to MarkLogic.
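
The three phases can be sketched end to end on two toy records. This is a simplified illustration under stated assumptions: a plain dictionary stands in for the Redis concordance, matching is by shared `equivalent` URI only (no name matching), and all URIs and function names are hypothetical.

```python
import itertools


def reconcile(records, concordance):
    """Phase 1: give records that share any URI the same internal URI."""
    counter = itertools.count(1)
    for rec in records:
        uris = [rec["id"]] + [e["id"] for e in rec.get("equivalent", [])]
        internal = next((concordance[u] for u in uris if u in concordance), None)
        if internal is None:
            internal = f"https://example.org/internal/{next(counter)}"
        for uri in uris:
            concordance.setdefault(uri, internal)
    return concordance


def rewrite(node, concordance):
    """Phase 2: swap known ids for internal ones; keep 'equivalent' as is."""
    if isinstance(node, list):
        return [rewrite(item, concordance) for item in node]
    if isinstance(node, dict):
        out = {k: (v if k == "equivalent" else rewrite(v, concordance))
               for k, v in node.items()}
        if out.get("id") in concordance:
            out["id"] = concordance[out["id"]]
        return out
    return node


def merge(records):
    """Phase 3: fold records with the same internal id into one description."""
    merged = {}
    for rec in records:
        hub = merged.setdefault(
            rec["id"], {"id": rec["id"], "identified_by": [], "equivalent": []})
        for key in ("identified_by", "equivalent"):
            for item in rec.get(key, []):
                if item not in hub[key]:
                    hub[key].append(item)
    return merged


museum = {"id": "https://museum.example/person/12",
          "identified_by": [{"type": "Name", "content": "A. Painter"}],
          "equivalent": [{"id": "https://authority.example/agent/99"}]}
library = {"id": "https://library.example/agent/abc",
           "identified_by": [{"type": "Name", "content": "Painter, A."}],
           "equivalent": [{"id": "https://authority.example/agent/99"}]}

concordance = reconcile([museum, library], {})
hubs = merge([rewrite(r, concordance) for r in (museum, library)])
print(hubs)
```

Phases 2 and 3 touch one record at a time and only read from the concordance, which is what makes them straightforward to run in parallel.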

    3. Front-End Design and Hypermedia API

    The front-end is a React single-page application focused on discovery over repeated searching, emphasizing actionable links to "hub pages."

    The Essential Record Construct

    Despite Linked Data's emphasis on a unified graph, the record construct remains essential for usability and computational tractability. Performance relies on materializing computed sub-graphs into discrete, serialized JSON-LD chunks, which function as pre-constructed records. This avoids the overwhelming computational cost of repeatedly constructing sub-graphs and facet counts on the fly.

    Performance and Caching

    • The application uses a REST-oriented paradigm with multiple layers of web caches (browser, React, web cache infrastructure) relying on the static JSON-LD representation of records.

    • Performance is high because the richly connected graph means the same records are reused frequently in a session, resulting in a cache hit. The system without cache supports 150 concurrent queries, but caching dramatically reduces database load.
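
The effect of record reuse on cache hits can be illustrated with a toy in-process cache. This is not the layered web caching LUX uses; `functools.lru_cache` stands in for any cache in front of the database, and the URIs, the counter, and both function names are assumptions.

```python
from functools import lru_cache

backend_hits = 0  # counts how often the "database" is actually queried


def load_from_database(uri):
    """Stand-in for an expensive database read of a static JSON-LD record."""
    global backend_hits
    backend_hits += 1
    return f'{{"id": "{uri}"}}'


@lru_cache(maxsize=1024)
def get_record(uri):
    """Serve repeated requests for the same record from the cache."""
    return load_from_database(uri)


# A browsing session that keeps returning to the same hub page.
for uri in ["person/1", "object/1", "person/1", "person/1"]:
    get_record(uri)

print(backend_hits, get_record.cache_info().hits)
```

Caching like this is only safe because the record is a static, pre-built document; a sub-graph assembled per request would have nothing stable to cache.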

    Decoupling with Hypermedia (HAL)

    To maintain separation between the front-end code and the evolving back-end query syntax/data model, the Hypertext Application Language (HAL) is used.

    • HAL Links: Named search links (e.g., _links.related_objects) are added at request time by the middle tier, separated from the semantic knowledge.

    • The front-end only needs to follow the named link, completely decoupling it from the query logic (MarkLogic CTS, scope/parameters).

    • Paged Responses: The results of these HAL searches follow the W3C Activity Streams Paged Collection pattern for standardized, paged retrieval of results.
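
Consuming a paged result set can be sketched as a generator that follows `next` links until none remain. `orderedItems` and `next` are Activity Streams 2.0 terms. The exact shape of a LUX response, the URLs, and the `iter_results` helper are assumptions for illustration.

```python
def iter_results(first_url, fetch):
    """Yield every item across pages by following 'next' links."""
    url = first_url
    while url:
        page = fetch(url)
        yield from page.get("orderedItems", [])
        nxt = page.get("next")
        # 'next' may be a bare URL string or an object with an id.
        url = nxt["id"] if isinstance(nxt, dict) else nxt


# Fake two-page result set keyed by URL, standing in for HTTP responses.
FAKE_PAGES = {
    "https://example.org/search?page=1": {
        "type": "OrderedCollectionPage",
        "orderedItems": [{"id": "object/1"}, {"id": "object/2"}],
        "next": {"id": "https://example.org/search?page=2"},
    },
    "https://example.org/search?page=2": {
        "type": "OrderedCollectionPage",
        "orderedItems": [{"id": "object/3"}],
    },
}

ids = [item["id"] for item in
       iter_results("https://example.org/search?page=1", FAKE_PAGES.__getitem__)]
print(ids)
```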

    This approach yielded significant benefits: a new system could be implemented and deployed in less than one person-day of engineering effort by reusing components and configuring new HAL links, demonstrating a highly sustainable and reusable codebase.

    The original paper explicitly notes that LUX is a "work in progress" with ideas for improvement. Based on the technical design and general best practices for Linked Data applications, here are key areas and specific recommendations for improving the LUX application, categorized by technical focus and user experience:

    ⚙️ Technical and Data Improvements

    The primary goal here is to enhance performance, scalability, and data quality.

    1. Advanced Reconciliation & Data Quality

    The paper highlighted that reconciliation of subjects and concepts was less successful than for people and places, and MARC data revealed hidden quality issues.

    • Improve Subject/Concept Matching: Focus data engineering effort on developing more sophisticated algorithms for matching concepts, subjects, and genres, perhaps leveraging Word Embeddings or Knowledge Graph Embeddings to find conceptual similarity beyond exact string matches.

    • Stratigraphy Modeling: Finalize and implement the new class model for stratigraphy and other complex natural history features mentioned in the paper, ensuring all domains are fully represented.

    • Automated Data Auditing: Develop scripts that flag common MARC-to-Linked Art mapping errors (like barcodes appearing as dates) before they hit the enrichment pipeline, improving the data quality at the source.
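
An audit check of the kind suggested above might look like the sketch below, which flags TimeSpan values that cannot be dates (such as a barcode that landed in a date field). The pattern, the function name, and the sample values are assumptions; the check only covers four-digit years in UTC and would need extending for BCE or partial dates.

```python
import re
from datetime import datetime

# Accepts only timestamps such as 1872-01-01T00:00:00Z.
ISO = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$")


def audit_timespan(span):
    """Return a list of problems found in one TimeSpan dictionary."""
    problems = []
    parsed = {}
    for key in ("begin_of_the_begin", "end_of_the_end"):
        value = span.get(key)
        if value is None:
            continue
        if not ISO.match(value):
            problems.append(f"{key}: not an ISO 8601 timestamp: {value!r}")
            continue
        try:
            parsed[key] = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
        except ValueError:
            problems.append(f"{key}: impossible date: {value!r}")
    if len(parsed) == 2 and parsed["begin_of_the_begin"] > parsed["end_of_the_end"]:
        problems.append("begin is after end")
    return problems


# Made-up samples: a valid span, a barcode-like value, and a reversed range.
print(audit_timespan({"begin_of_the_begin": "1872-01-01T00:00:00Z",
                      "end_of_the_end": "1872-12-31T23:59:59Z"}))
print(audit_timespan({"begin_of_the_begin": "39002012345678"}))
print(audit_timespan({"begin_of_the_begin": "1900-01-01T00:00:00Z",
                      "end_of_the_end": "1800-01-01T00:00:00Z"}))
```

Running a check like this on mapped records before reconciliation keeps bad values out of the enrichment pipeline.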

    2. Knowledge Graph Optimization

    The paper noted that they already implemented "shortcut" triples and a custom generator, but further optimization is always possible.

    • Implement Read-Only Replica: For a production system like LUX, implementing a dedicated, replicated MarkLogic instance or similar system optimized purely for front-end read queries would guarantee performance under heavy user load.

    • Refine Shortcut Triples: Conduct A/B testing on the HAL links (the named searches) to see if additional shortcut triples can reduce query execution time for the most frequent user actions (e.g., "Find all exhibition records related to this item").

    🎨 User Experience (UX) and Discovery Improvements

    The goal is to move beyond finding specific items to facilitating deeper, unexpected exploration.

    3. Enhanced Graph Visualization

    While the current app is highly functional, Linked Data's power is best demonstrated visually.

    • Interactive Relationship Viewer: Add a feature on "hub pages" (Person, Place, Activity) that displays a small, interactive visualization of the immediate relationships (e.g., this Person → was a member of → this Group → was involved in → this Activity). This makes the abstract "graph" concept concrete.

    • Time-Based Exploration: Leverage the highly structured TimeSpan data to create an interactive timeline for entities. A user on a Person page could see a visual timeline of their activities (birth, production events, exhibitions) and filter results based on historical periods.

    4. Cross-Domain Clarity and Context

    Since LUX blends wildly different domains (Art, Archives, Natural History), clarifying context is vital.

    • Visual Domain Tags: Implement clear, concise visual indicators (e.g., icons or small color tags) next to every item in search results to immediately signify the originating collection/domain (e.g., 🏛️ for Art, 🦖 for Peabody, 📚 for Library).

    • Progressive Disclosure: Use progressive disclosure principles (as noted in UX best practices) to initially show only the most essential, connected information, and require an explicit click to view highly granular, domain-specific metadata (like full MARC fields or detailed stratigraphy data).

    5. Personalization and Reusability

    The GitHub repository shows work on "Personal Collections," which is a perfect foundation for next-level features.

    • User-Generated Graph Edges: Allow authenticated users to create and save their own temporary "virtual collections" (Sets). This would let a researcher link an item from the Art Gallery to a manuscript in the Library via a user-defined relationship, empowering digital scholarship.

    • Search and Export Personalization: Allow users to save complex queries and facets, and provide highly structured export formats (like custom JSON-LD or CSV) of their personalized results.

