🖼️ Product Requirements Document
State‑of‑the‑Art Linked Art Web Application
---
🎯 1. Product Vision & Objectives
The goal is to build a state‑of‑the‑art cross‑collection discovery platform powered by the Linked Art data model (a profile of CIDOC‑CRM) to unify cultural heritage data across museums, libraries, and archives.
The platform emphasizes:
- Maximizing usable open data
- Balancing ontological rigor with developer accessibility
- Supporting ~90% of common cultural heritage use cases without forcing unnecessary semantic complexity
---
👥 2. Target Audience & Personas
🏛️ Data Creators & Catalogers
Museum and library staff who need intuitive backend interfaces to create semantic relationships without touching raw JSON‑LD.
💻 Developers & Data Consumers
Technologists who require predictable, well‑structured JSON‑LD REST APIs for downstream apps, UI layers, and integrations.
🔍 Researchers & Public Users
End users who want visual, exploratory discovery tools that hide graph complexity while surfacing rich contextual connections across institutions.
---
🚀 3. Key Features & Functionality
A. 🏗️ Backend Data Management & Ingestion
- Semantic Abstraction UI: Use an Arches‑based framework (e.g., Ogee) to provide user‑friendly forms that automatically generate Linked Art‑compliant JSON‑LD.
- Automated Authority Reconciliation: Pipelines reconcile legacy strings against Getty AAT/ULAN/TGN, LoC, and Wikidata.
- Legacy Data Transformation: ETL pipelines convert flat CMS exports (XML/JSON/CSV) into structured Linked Art nodes.
---
B. 🔎 Discovery & Search Architecture
- Hybrid Datastore: Support both:
  - JSON‑LD document storage (fast faceted search)
  - Triple stores (deep graph querying)
- Advanced Graph Querying: Enable multi‑entity, cross‑collection questions such as: “Find works by European artists depicting the American West in the 19th century.”
---
C. 🎨 Advanced UI & Visualizations
- Entity Knowledge Panels: First‑class pages for people, places, and concepts enriched with Wikidata/DBpedia context.
- Interactive Concertina Lists & Histograms: Dynamic browsing of large datasets (e.g., artists by birth year, exhibition histories).
- Overlapping Timelines & Interactive Maps: Zoomable timelines + map layers linked to gazetteers like PeriodO.
- IIIF Image Integration: Deep zoom, high‑resolution viewing, and side‑by‑side comparisons.
---
🧩 4. Data Architecture & Modeling
- Event‑Centric Structure: Shift from object‑centric to activity‑centric modeling: production, provenance, exhibitions, etc.
- Entity Separation: Distinguish the following, which prevents redundant data entry and improves reuse:
  - Physical object
  - Visual item
  - Digital object
- Standardized Vocabulary Integration: Enforce hierarchical concept vocabularies for seamless navigation and faceted filtering.
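The entity separation above can be illustrated with a minimal Python sketch that builds three linked records. The property names (`shows`, `digitally_shows`, `access_point`) reflect a reading of the Linked Art model rather than validated output, and every `example.org` URI is a placeholder that does not come from the PRD.

```python
import json

CONTEXT = "https://linked.art/ns/v1/linked-art.json"
BASE = "https://example.org"  # placeholder base URI (assumption)

# The image content, shared by the physical work and its photographs.
visual_item = {
    "@context": CONTEXT,
    "id": f"{BASE}/visual/1",
    "type": "VisualItem",
    "_label": "Image content of the painting",
}

# The physical thing that hangs on the wall.
physical_object = {
    "@context": CONTEXT,
    "id": f"{BASE}/object/1",
    "type": "HumanMadeObject",
    "_label": "Example painting",
    "shows": [{"id": visual_item["id"], "type": "VisualItem"}],
}

# A digital surrogate that depicts the same image content.
digital_object = {
    "@context": CONTEXT,
    "id": f"{BASE}/digital/1",
    "type": "DigitalObject",
    "_label": "Photograph of the painting",
    "digitally_shows": [{"id": visual_item["id"], "type": "VisualItem"}],
    "access_point": [
        {"id": f"{BASE}/iiif/1/full/max/0/default.jpg", "type": "DigitalObject"}
    ],
}

print(json.dumps(physical_object, indent=2))
```

Because both the physical and the digital record point at the same visual item, subject and depiction data only has to be entered once.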
---
If you'd like, I can also generate:
- a diagrammatic architecture map,
- a feature‑to‑persona matrix,
- or a roadmap with milestones.
This is a very strong baseline PRD that accurately captures the core philosophy of Linked Art (CIDOC-CRM profile), its dual audience, and the necessary tech stack.
However, to make this truly State-of-the-Art (SOTA) for modern software development and current cultural heritage standards, we need to bridge the gap between traditional semantic web technologies and modern AI/data paradigms, while also filling out standard PRD structural gaps (KPIs, Risks, Out-of-Scope).
Here is how we can improve and elevate this PRD:
1. 🤖 Inject Modern "SOTA" AI & Data Capabilities
The current PRD describes a standard 2020-era Linked Art implementation. To make it cutting-edge, integrate these features:
- AI-Assisted Legacy ETL & Mapping: Mapping legacy CMS data (TMS, EMu, Adlib) to CIDOC-CRM is notoriously difficult. Add an LLM-assisted mapping pipeline that analyzes legacy data schemas and suggests Linked Art JSON-LD mappings.
- Natural Language to Graph (NL2Graph): Instead of expecting researchers to write SPARQL or complex API queries, integrate an LLM layer that translates natural language questions ("Find works by female European artists exhibiting in Paris before 1900") directly into graph queries.
- Graph RAG (Retrieval-Augmented Generation): Combine the structured knowledge graph with vector databases to allow users to "chat with the collection," backed by verifiable CIDOC-CRM citations to prevent hallucinations.
2. 🏛️ Strengthen Cultural Heritage Domain Specifics
- Activity Streams for Data Syndication: Linked Art's approach to sharing changes between institutions builds on the W3C Activity Streams 2.0 specification. Your PRD needs a section on how this platform publishes and consumes data updates.
- Deep Provenance Modeling: Emphasize the modeling of ownership history. Provenance is a massive use case for Linked Art; the UI must support visualizing chronological chains of ownership, identifying gaps, and flagging red-flag eras (e.g., WWII-era spoliation).
- Rights & Licensing Statements: Explicitly include support for RightsStatements.org and Creative Commons in the data model. Discovery is useless if developers don't know if they can legally use the image.
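As a rough illustration of the provenance point above, the following Python sketch finds gaps in an ownership chain and flags those that overlap a sensitive period. The `Ownership` record, the year-level granularity, and the 1933 to 1945 review window are assumptions made for the example, not requirements stated in the PRD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ownership:
    owner: str
    start: int  # first year of documented ownership
    end: int    # last year of documented ownership


SENSITIVE_ERA = (1933, 1945)  # assumed review window, inclusive


def find_gaps(chain):
    """Return (first_year, last_year) spans with no documented owner."""
    if not chain:
        return []
    ordered = sorted(chain, key=lambda record: record.start)
    gaps = []
    latest_end = ordered[0].end
    for record in ordered[1:]:
        if record.start > latest_end + 1:
            gaps.append((latest_end + 1, record.start - 1))
        latest_end = max(latest_end, record.end)
    return gaps


def overlaps_sensitive_era(span, era=SENSITIVE_ERA):
    """True when the gap shares at least one year with the review window."""
    return span[0] <= era[1] and span[1] >= era[0]


chain = [
    Ownership("Collector A", 1890, 1931),
    Ownership("Dealer B", 1947, 1952),
    Ownership("Museum C", 1952, 2024),
]
gaps = find_gaps(chain)
flagged = [gap for gap in gaps if overlaps_sensitive_era(gap)]
print(gaps, flagged)
```

A provenance view could render the flagged spans as highlighted breaks in the ownership timeline.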
3. 📊 Add Missing "Product Management" Sections
A professional PRD needs to define what success looks like and what could go wrong.
- Success Metrics (KPIs):
- Data Quality: % of records successfully reconciled to Getty AAT/ULAN.
- Performance: API response time < 200ms for hybrid graph queries.
- Adoption: Number of external API calls, Monthly Active Users (MAU) on the discovery portal.
- Non-Functional Requirements (NFRs): Accessibility (WCAG 2.1 AA compliant—crucial for museums), internationalization (i18n), and mobile-responsive UI.
- Out of Scope: Explicitly state what you are not building (e.g., "This is NOT a replacement for the museum's core Collections Management System (CMS). It is a downstream publication layer.")
- Risks & Mitigations: Mention the steep learning curve of CIDOC-CRM and mitigate it via the "Semantic Abstraction UI" you outlined.
---
📝 How the Improved Sections Look in the PRD:
Add the following sections to your document to instantly elevate it:
🆕 5. Syndication & Interoperability
- ActivityStreams Integration: Implement a W3C Activity Streams 2.0 endpoint that publishes creations, updates, and deletions. External aggregators can then poll the stream to pick up the museum's data changes shortly after they happen.
- Jupyter Notebook & Data Dumps: Provide automated, regular JSON-LD data dumps and maintain a repository of Jupyter Notebooks demonstrating how researchers can query the dataset natively in Python.
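A change feed of the kind described above could look roughly like this Python sketch, which builds one Activity Streams 2.0 `OrderedCollectionPage`. The URL layout under `example.org` and the helper names are hypothetical; only the Activity Streams context URL and vocabulary terms are taken from the W3C specification.

```python
import json
from datetime import datetime, timezone

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"
BASE = "https://example.org/activity"  # placeholder endpoint (assumption)
ALLOWED = {"Create", "Update", "Delete"}


def make_activity(seq, kind, record_uri, record_type, when):
    """Describe one change to one published record."""
    if kind not in ALLOWED:
        raise ValueError(f"unsupported activity type: {kind}")
    return {
        "id": f"{BASE}/{seq}",
        "type": kind,
        "endTime": when.isoformat(),
        "object": {"id": record_uri, "type": record_type},
    }


def make_page(page_no, activities):
    """Wrap activities, oldest first, in a single feed page."""
    return {
        "@context": AS_CONTEXT,
        "id": f"{BASE}/page/{page_no}",
        "type": "OrderedCollectionPage",
        "partOf": {"id": f"{BASE}/", "type": "OrderedCollection"},
        "orderedItems": activities,
    }


stamp = datetime(2024, 1, 1, tzinfo=timezone.utc)
page = make_page(1, [
    make_activity(1, "Create", "https://example.org/object/1", "HumanMadeObject", stamp),
    make_activity(2, "Update", "https://example.org/object/1", "HumanMadeObject", stamp),
])
print(json.dumps(page, indent=2))
```

An aggregator would read pages from newest to oldest and stop once it reaches an activity it has already processed.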
🆕 6. Modern AI & Machine Learning Enhancements
- LLM-Assisted Reconciliation: Utilize fine-tuned models to assist catalogers in reconciling ambiguous entity names against Wikidata and Getty vocabularies, reducing manual data entry.
- Semantic Vector Search: Complement the exact-match graph querying with vector search (embeddings of artwork descriptions and visual IIIF features) to enable "find similar artworks" functionality even without perfect metadata.
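The "find similar artworks" idea above reduces to nearest-neighbor search over embeddings. This is a minimal pure-Python sketch using cosine similarity. The three-dimensional vectors are made-up toy data, and a real deployment would presumably use model-generated embeddings and a vector index instead of a dictionary scan.

```python
import math

# Toy embeddings keyed by record id (invented values for illustration).
EMBEDDINGS = {
    "object/1": [0.9, 0.1, 0.0],
    "object/2": [0.8, 0.2, 0.1],
    "object/3": [0.0, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_similar(record_id, k=2):
    """Return the k most similar other records as (id, score) pairs."""
    query = EMBEDDINGS[record_id]
    scored = [
        (other, cosine(query, vector))
        for other, vector in EMBEDDINGS.items()
        if other != record_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


print(find_similar("object/1", k=1))
```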
🆕 7. Success Metrics & KPIs
- Data Enrichment: >80% of imported records enriched with at least 3 external URIs (ULAN, AAT, TGN).
- API Performance: 95th percentile query response time under 300ms.
- System Uptime: 99.9% uptime for the public-facing IIIF and JSON-LD endpoints.
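To make the 95th-percentile target above checkable, here is a small Python sketch that computes a nearest-rank percentile over latency samples. The sample values are invented, and the nearest-rank method is one of several common percentile definitions, chosen here for simplicity.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct/100 * n, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


# Invented latency samples in milliseconds.
latencies_ms = [120, 95, 180, 240, 310, 150, 130, 170, 160, 140,
                110, 125, 135, 145, 155, 165, 175, 185, 195, 205]

p95 = percentile(latencies_ms, 95)
meets_target = p95 < 300  # target taken from the KPI above
print(p95, meets_target)
```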
🆕 8. Out of Scope (For MVP)
- Replacing the institution's primary CMS (e.g., TMS, EMu, or Qi).
- Native ingestion of raw audio/video files (focusing strictly on IIIF image and text metadata for MVP).
---
Where to go next?
I recommend generating the Diagrammatic Architecture Map next.
Because we are talking about a hybrid datastore (Triple Store + Document DB), an ETL pipeline from legacy CMS systems, IIIF servers, and a new AI query layer, visualizing how data flows from the Museum CMS ➡️ ETL Pipeline ➡️ Linked Art Graph ➡️ React/Next.js Frontend is the most crucial next step to get engineering buy-in.
Would you like me to map out that Architecture Diagram detailing the specific modern tech stack (e.g., Elasticsearch + Blazegraph/Neptune + Next.js + IIIF Image Server)?
Here is the upgraded 10/10 PRD.
I have removed the residual citation markers, added a concrete 3-phase rollout roadmap, explicitly defined the reconciliation logic (using external authorities), and established strict API security and rate-limiting requirements to protect the database from heavy graph queries.
🎨 Linked Art Web Application — Product Requirements Document (PRD)
🧭 1. Executive Summary
The Linked Art Web Application is a modern, standards‑compliant platform for ingesting, transforming, validating, and publishing cultural heritage data using the Linked Art profile, a JSON‑LD implementation of the CIDOC‑CRM ontology.
Linked Art intentionally reduces the complexity of CIDOC‑CRM while retaining its expressive power, aiming to cover roughly 90% of real museum use cases with about 10% of the complexity.
This application enables museums, archives, libraries, and research institutions to publish interoperable, event‑based cultural heritage data for public access, scholarly research, and cross‑institutional aggregation.
🧩 2. Problem Statement
CIDOC‑CRM is powerful but extremely complex, containing dozens of classes and hundreds of relationships. Most institutions lack the resources to implement it directly.
Linked Art provides a practical, interoperable subset of CRM, but institutions still need:
A turnkey ingestion pipeline
A validation and mapping engine
A searchable public interface
A JSON‑LD API
A graph‑based visualization layer
This PRD defines a complete system that solves these gaps.
🎯 3. Goals & Non‑Goals
Goals
Publish cultural heritage data in Linked Art JSON‑LD format.
Provide a complete ETL pipeline: ingest → map → validate → publish.
Offer a public search and browsing interface.
Support event‑based modeling (production, acquisition, attribution, etc.).
Ensure interoperability with other Linked Art and IIIF systems.
Provide developer‑friendly APIs for reuse and integration.
Enable research workflows through graph and timeline visualizations.
Non‑Goals
Full implementation of the entire CIDOC‑CRM ontology (intentionally avoided to maintain simplicity).
Acting as a collections management system (CMS).
Providing image hosting (integrates with IIIF instead).
👥 4. User Personas
🖼️ Museum Curator
Needs to publish authoritative object metadata and provenance.
🧪 Digital Humanities Researcher
Needs structured, event‑based data for analysis and visualization.
🛠️ Software Developer
Needs clean JSON‑LD APIs for integration with external systems.
🏛️ Cultural Heritage Institution
Needs a standards‑compliant, low‑maintenance publishing platform.
🔄 5. User Flows
5.1 Data Ingestion Flow
1. Upload source data (CSV, JSON API, XML).
2. System maps fields to Linked Art patterns.
3. Validation engine checks conformance.
4. User reviews and resolves any reconciliation conflicts.
5. Data is published as JSON‑LD.
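The ingestion flow above can be sketched end to end in a few lines of Python. This is only an outline under stated assumptions: the CSV column names (`object_id`, `title`), the `example.org` URI pattern, and the validation rules are hypothetical, and the manual reconciliation review step is left out.

```python
import csv
import io
import json

CONTEXT = "https://linked.art/ns/v1/linked-art.json"
BASE = "https://example.org/object"  # placeholder URI pattern (assumption)


def ingest(csv_text):
    """Step 1: read uploaded CSV text into row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def map_row(row):
    """Step 2: map source fields onto a minimal Linked Art record."""
    return {
        "@context": CONTEXT,
        "id": f"{BASE}/{row['object_id']}",
        "type": "HumanMadeObject",
        "_label": row["title"].strip(),
    }


def validate(record):
    """Step 3: return a list of problems; an empty list means the record passes."""
    errors = [f"missing {key}" for key in ("@context", "id", "type") if not record.get(key)]
    if not record.get("_label"):
        errors.append("empty _label")
    return errors


def run_pipeline(csv_text):
    """Steps 1-3 and 5: publish valid records, collect the rest for review."""
    published, rejected = [], []
    for row in ingest(csv_text):
        record = map_row(row)
        errors = validate(record)
        if errors:
            rejected.append((row["object_id"], errors))
        else:
            published.append(json.dumps(record, sort_keys=True))
    return published, rejected


SAMPLE = "object_id,title\n1,Landscape with River\n2,   \n"
published, rejected = run_pipeline(SAMPLE)
print(len(published), rejected)
```

Rejected rows would surface in the Admin interface instead of being silently dropped.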
5.2 Public Browsing Flow
1. User searches for an object, person, or event.
2. Results appear with facets (type, date, location).
3. User opens a detail page.
4. Page displays Linked Art JSON‑LD, images, events, relationships.
5. User explores graph or timeline views.
🧱 6. Functional Requirements
6.1 ETL Pipeline
Based on real Linked Art workflows, which include source → mapped → Linked Art JSON‑LD stages.
Source ingestion: Accept JSON API, CSV, XML.
Mapping engine: Transform source data into intermediate model.
Linked Art generator: Produce final JSON‑LD.
Validation: Ensure strict conformance with the Linked Art profile.
Reconciliation Engine: Deduplicate agents, places, and concepts by mapping against external authorities (Getty ULAN, AAT, TGN, Wikidata). Flag conflicts for manual review in the Admin UI.
Versioning: Track changes and updates over time.
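The reconciliation behavior described in this list could be prototyped along these lines. The authority index below is a hypothetical in-memory stand-in with placeholder URIs. A real engine would query Getty or Wikidata services and use fuzzier matching than exact normalized labels.

```python
import unicodedata

# Hypothetical index: normalized label -> candidate authority URIs.
AUTHORITY_INDEX = {
    "albrecht durer": ["https://example.org/authority/agent/1"],
    "john smith": [
        "https://example.org/authority/agent/2",
        "https://example.org/authority/agent/3",
    ],
}


def normalize(label):
    """Strip accents, case, and extra whitespace so variant spellings collide."""
    decomposed = unicodedata.normalize("NFKD", label)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_marks.casefold().split())


def reconcile(label):
    """Match one legacy string; ambiguous hits are flagged for manual review."""
    candidates = AUTHORITY_INDEX.get(normalize(label), [])
    if len(candidates) == 1:
        return {"label": label, "status": "matched", "uri": candidates[0]}
    if candidates:
        return {"label": label, "status": "conflict", "candidates": candidates}
    return {"label": label, "status": "unmatched"}


for name in ("Albrecht Dürer", "JOHN  SMITH", "Anonymous Workshop"):
    print(reconcile(name))
```

Only the `conflict` results need a cataloger's attention; `matched` results can be written back automatically.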
6.2 JSON‑LD API
REST endpoints for objects, agents, events, places.
JSON‑LD @context support.
Pagination, filtering, faceting.
Graph expansion (?expand=events,actors).
IIIF image linking.
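One possible reading of the `?expand=events,actors` parameter is that it replaces reference stubs with full embedded records. The sketch below works on that assumption. The mapping from expand names to properties, the depth limit, and the in-memory store are all hypothetical.

```python
from urllib.parse import parse_qs, urlparse

BASE = "https://example.org"  # placeholder base URI (assumption)

# Hypothetical mapping from ?expand= names to the properties they embed.
EXPANDABLE = {"events": "produced_by", "actors": "carried_out_by"}

# Tiny in-memory stand-in for the document store.
STORE = {
    f"{BASE}/object/1": {
        "id": f"{BASE}/object/1", "type": "HumanMadeObject",
        "_label": "Example painting",
        "produced_by": {"id": f"{BASE}/event/1", "type": "Production"},
    },
    f"{BASE}/event/1": {
        "id": f"{BASE}/event/1", "type": "Production",
        "carried_out_by": [{"id": f"{BASE}/agent/1", "type": "Person"}],
    },
    f"{BASE}/agent/1": {
        "id": f"{BASE}/agent/1", "type": "Person", "_label": "Example Artist",
    },
}


def parse_expand(url):
    """Read the comma-separated expand parameter, ignoring unknown names."""
    raw = parse_qs(urlparse(url).query).get("expand", [""])[0]
    return [name for name in raw.split(",") if name in EXPANDABLE]


def expand(record, names, depth=2):
    """Return a copy with requested references embedded, up to `depth` levels."""
    result = dict(record)
    if depth <= 0:
        return result
    for prop in {EXPANDABLE[name] for name in names}:
        if prop not in result:
            continue
        value = result[prop]
        if isinstance(value, list):
            result[prop] = [_embed(ref, names, depth) for ref in value]
        else:
            result[prop] = _embed(value, names, depth)
    return result


def _embed(ref, names, depth):
    """Swap a stub for the stored record, leaving unknown references as-is."""
    return expand(STORE.get(ref["id"], ref), names, depth - 1)


names = parse_expand("/objects/1?expand=events,actors")
expanded = expand(STORE[f"{BASE}/object/1"], names)
print(expanded["produced_by"]["carried_out_by"][0]["_label"])
```

The depth limit keeps a single request from walking the whole graph, which matters for the rate-limiting concerns raised elsewhere in this PRD.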
6.3 Search & Discovery
Full‑text search.
Faceted navigation.
Autocomplete.
Timeline view.
Network graph view.
6.4 Admin Interface
Upload datasets.
Review validation errors and reconciliation conflicts.
Approve publication.
Manage field mappings.
⚙️ 7. Non‑Functional Requirements
Performance
Ingest 10k objects in under 5 minutes.
API responses < 200ms for common queries.
Reliability
99.9% uptime.
Automatic retries for ingestion jobs.
Security & API Governance
Access Control: Role‑based access control (RBAC) for the Admin interface.
Audit Logs: Track all data ingestion and mapping changes.
Rate Limiting: Throttling policies (e.g., 100 requests/minute per IP) to protect the database from expensive SPARQL/Graph expansion queries.
CORS: Configurable Cross-Origin Resource Sharing policies for external web integrations.
API Keys: Provisioning for developer tiers allowing higher rate limits.
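The throttling policy above (for example, 100 requests per minute per IP) could be enforced with a sliding-window counter. This Python sketch keeps state in memory and takes the current time as an argument so it is easy to test. A production service would more likely use a shared store and a monotonic clock, and the class name here is invented.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within `window_seconds`."""

    def __init__(self, limit=100, window_seconds=60.0):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # client id -> timestamps of allowed hits

    def allow(self, client_ip, now):
        hits = self._hits[client_ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


# Small limit so the behavior is visible; 203.0.113.5 is a documentation address.
limiter = SlidingWindowLimiter(limit=3, window_seconds=60.0)
decisions = [limiter.allow("203.0.113.5", t) for t in (0.0, 1.0, 2.0, 3.0)]
print(decisions)
```

Developer tiers with API keys could be modeled by keying the limiter on the key instead of the IP and passing a higher `limit`.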
Accessibility
WCAG 2.1 AA compliance for all public UI views.
🧬 8. Data Model
8.1 Standards
Linked Art: JSON‑LD profile built on CIDOC‑CRM.
CIDOC‑CRM: Event‑based ontology for cultural heritage.
JSON‑LD: Serialization format for Linked Data.
8.2 Core Entities
HumanMadeObject
Person / Group
Production / Acquisition Events
Place
DigitalObject (IIIF)
8.3 Required Patterns
Event‑based modeling.
Identifiers & references.
Multilingual labels.
Provenance chains.
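The required patterns above can be seen together in one small record. The Python sketch below is illustrative only: the property names follow a reading of the Linked Art model, and all URIs, including the language identifiers, are `example.org` placeholders where a real record would use vocabulary URIs.

```python
CONTEXT = "https://linked.art/ns/v1/linked-art.json"
BASE = "https://example.org"  # placeholder base URI (assumption)


def name(content, language_label):
    """Build a Name entry tagged with a placeholder language reference."""
    return {
        "type": "Name",
        "content": content,
        "language": [{
            "id": f"{BASE}/language/{language_label.lower()}",
            "type": "Language",
            "_label": language_label,
        }],
    }


record = {
    "@context": CONTEXT,
    "id": f"{BASE}/object/1",
    "type": "HumanMadeObject",
    "_label": "River Landscape",
    # Identifiers and multilingual labels.
    "identified_by": [
        name("River Landscape", "English"),
        name("Flusslandschaft", "German"),
        {"type": "Identifier", "content": "1998.12"},
    ],
    # Event-based modeling: the maker hangs off the production event.
    "produced_by": {
        "type": "Production",
        "carried_out_by": [
            {"id": f"{BASE}/agent/1", "type": "Person", "_label": "Example Artist"}
        ],
    },
}


def names_by_language(rec):
    """Group Name contents by language label, skipping plain identifiers."""
    grouped = {}
    for entry in rec.get("identified_by", []):
        if entry["type"] != "Name":
            continue
        for language in entry.get("language", []):
            grouped.setdefault(language["_label"], []).append(entry["content"])
    return grouped


print(names_by_language(record))
```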
🔧 9. ETL Requirements (Detailed)
Based on real Linked Art workflows: source → mapped → Linked Art.
Source Data
JSON API (e.g., PIA API).
CSV exports from existing CMS.
XML from legacy systems.
Mapping Layer
Extract relevant fields.
Normalize dates, names, identifiers.
Map to Linked Art patterns.
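Date normalization in the mapping layer might look like the following Python sketch, which turns a few common free-text date forms into a Linked Art style `TimeSpan`. The supported patterns and the five-year padding applied to "circa" dates are assumptions chosen for the example; real legacy data would need many more cases.

```python
import re

# Matches "1650", "1500-1600", and "c. / ca. / circa" variants of either.
DATE_PATTERN = re.compile(r"(c\.|ca\.|circa)?\s*(\d{4})(?:\s*[-–]\s*(\d{4}))?")
CIRCA_PADDING_YEARS = 5  # assumed uncertainty for approximate dates


def normalize_date(raw):
    """Return a TimeSpan dict, or None when the string is not understood."""
    match = DATE_PATTERN.fullmatch(raw.strip().lower())
    if not match:
        return None
    approximate, start, end = match.groups()
    first, last = int(start), int(end or start)
    if approximate:
        first, last = first - CIRCA_PADDING_YEARS, last + CIRCA_PADDING_YEARS
    return {
        "type": "TimeSpan",
        "_label": raw.strip(),  # keep the original wording for display
        "begin_of_the_begin": f"{first:04d}-01-01T00:00:00Z",
        "end_of_the_end": f"{last:04d}-12-31T23:59:59Z",
    }


for text in ("1500-1600", "c. 1650", "undated"):
    print(text, "->", normalize_date(text))
```

Strings that return `None` are good candidates for the validation report instead of being guessed at.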
Linked Art Output
JSON‑LD with correct @context.
Conformance to Linked Art profile.
Stable, dereferenceable URIs.
🛠️ 10. API Requirements
Endpoints
/objects
/agents
/events
/places
/search
Features
JSON‑LD framing.
Filtering (?type=painting&date=1500-1600).
Sorting.
Graph expansion.
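The filtering syntax shown above (`?type=painting&date=1500-1600`) can be parsed with the standard library. This Python sketch assumes the `date` parameter is an inclusive year range and that each record carries a single `year` field. Both are simplifications, and the sample records are invented.

```python
from urllib.parse import parse_qs

# Invented, flattened records standing in for indexed search documents.
RECORDS = [
    {"id": "object/1", "type": "painting", "year": 1520},
    {"id": "object/2", "type": "painting", "year": 1710},
    {"id": "object/3", "type": "sculpture", "year": 1550},
]


def parse_filters(query_string):
    """Turn a raw query string into a small filter dictionary."""
    params = parse_qs(query_string.lstrip("?"))
    filters = {}
    if "type" in params:
        filters["type"] = params["type"][0]
    if "date" in params:
        start, end = params["date"][0].split("-", 1)
        filters["years"] = (int(start), int(end))
    return filters


def apply_filters(records, filters):
    """Keep records that satisfy every supplied filter."""
    matches = []
    for record in records:
        if "type" in filters and record["type"] != filters["type"]:
            continue
        if "years" in filters:
            first, last = filters["years"]
            if not first <= record["year"] <= last:
                continue
        matches.append(record)
    return matches


hits = apply_filters(RECORDS, parse_filters("?type=painting&date=1500-1600"))
print([record["id"] for record in hits])
```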
🖥️ 11. Frontend Requirements
Search Interface
Keyword search.
Facets: type, date, location, material.
Object Detail Page
Title, description, IIIF images.
Production event.
Attribution.
Provenance.
Related objects.
Visualizations
Timeline.
Network graph.
Map view.
🗺️ 12. Implementation Roadmap & Phasing
Phase 1: MVP (Core Foundation)
Goal: Establish the underlying data architecture and public read access.
Deliverables: Core JSON-LD API, rigid CSV ingestion (pre-mapped), basic search interface, and IIIF image integration.
Phase 2: The Publishing Engine (ETL Focus)
Goal: Allow non-technical curators to ingest and publish data independently.
Deliverables: Full Admin interface, robust mapping engine, JSON/XML ingestion, automated validation checks, and the reconciliation engine (Getty ULAN/Wikidata).
Phase 3: Research & Discovery (Advanced UI)
Goal: Unlock the full potential of event-based semantic data.
Deliverables: Network graph views, interactive timelines, map views for geographic provenance, and advanced API graph expansion (?expand=).
🧪 13. TDD Requirements
Test Coverage
ETL mapping tests.
JSON‑LD validation tests.
API contract tests.
UI component tests.
Accessibility tests.
Acceptance Criteria
Every Linked Art entity must validate against the profile.
Every API endpoint must return valid JSON‑LD.
Every ingestion must produce deterministic output.
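The determinism criterion above can be expressed as a test by comparing fingerprints of canonically serialized output. This Python sketch uses sorted-key JSON plus SHA-256. It is not a full JSON-LD canonicalization, and in particular it does not reorder list items, so it assumes the pipeline emits lists in a stable order.

```python
import hashlib
import json


def canonical_bytes(record):
    """Serialize with sorted keys and fixed separators so key order is irrelevant."""
    return json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def fingerprint(record):
    """Stable hash of a record, suitable for comparing two ingestion runs."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


def test_ingestion_is_deterministic():
    # Two runs that build the same record with keys in a different order.
    first_run = {"id": "https://example.org/object/1", "type": "HumanMadeObject", "_label": "X"}
    second_run = {"_label": "X", "type": "HumanMadeObject", "id": "https://example.org/object/1"}
    assert fingerprint(first_run) == fingerprint(second_run)


test_ingestion_is_deterministic()
```

Storing the fingerprint alongside each published record would also give the versioning requirement a cheap way to detect real changes.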
📏 14. Success Metrics
Technical
100% JSON‑LD validation pass rate.
<1% ingestion error rate.
<200ms API response time.
User Experience
90% successful search rate.
80% of users find the desired object within 3 clicks.
Adoption
Integration with at least 3 external Linked Art systems.
⚠️ 15. Risks & Dependencies
Risks
Source data quality from legacy CMS varies widely and requires heavy normalization.
Institutions may lack compliant IIIF servers for image fetching.
CIDOC‑CRM updates may require upstream model changes.
Dependencies
Linked Art specification.
CIDOC‑CRM ontology.
IIIF Image API.
External data sources and authority files (Getty Vocabularies, Wikidata).