RKD Knowledge Graph Integration Plan
Status: Planned (next B5 provider slice)
Updated: May 30, 2026
Source
- Dataset page: <https://rkd.triply.cc/rkd/RKD-Knowledge-Graph>
- Services page: <https://rkd.triply.cc/rkd/RKD-Knowledge-Graph/services>
- Triply SPARQL API reference: <https://docs.triply.cc/triply-api/>
As of May 30, 2026, the public dataset reports 600M+ statements and describes its modeling as oriented toward CIDOC-CRM and Linked Art.
Why this provider
- Strong Linked Data fit for Meta Museum’s event-centric model.
- High-value coverage in Netherlandish art and archival/provenance-rich domains.
- Natural complement to Met, Getty, and Rijks.
Integration goals
- Add RKD as a first-class provider through `provider-interface`.
- Keep Linked Art JSON-LD canonical in storage and import flows.
- Preserve attribution + license metadata at ingest and display boundaries.
- Keep runtime safe at scale (bounded/paged queries only).
Proposed implementation slice
Adapter
- `src/adapters/rkd.ts`
  - provider descriptor/profile
  - URI parsing/normalization helpers
  - candidate extraction from SPARQL/search payloads
  - normalization into `SourceRecord` + `Artwork` boundary DTOs
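A minimal sketch of the URI parsing/normalization helper, assuming entity IRIs arrive as caller input on `/api/rkd/entity` and `/api/rkd/import`. The name `normalizeEntityIri` and the host allowlist are hypothetical; the real RKD IRI hosts should be taken from the dataset itself. The helper only trims and validates, and otherwise returns the input unchanged, so the IRI stays opaque in line with the B8 URI opacity check.

```typescript
export type IriCheck =
  | { ok: true; iri: string }
  | { ok: false; reason: string };

/**
 * Hypothetical helper: validates an entity IRI received by /api/rkd/entity or
 * /api/rkd/import. It trims surrounding whitespace and checks scheme and host,
 * but returns the IRI otherwise unchanged (no path, query, or fragment
 * rewriting) so the identifier stays opaque.
 */
export function normalizeEntityIri(
  raw: string,
  allowedHosts: ReadonlySet<string>, // assumption: lowercase hosts supplied from config
): IriCheck {
  const trimmed = raw.trim();
  let url: URL;
  try {
    url = new URL(trimmed);
  } catch {
    return { ok: false, reason: "not an absolute IRI" };
  }
  if (url.protocol !== "https:" && url.protocol !== "http:") {
    return { ok: false, reason: `unsupported scheme ${url.protocol}` };
  }
  // URL.hostname is already lowercased by the WHATWG URL parser.
  if (!allowedHosts.has(url.hostname)) {
    return { ok: false, reason: `host ${url.hostname} is not allowlisted` };
  }
  return { ok: true, iri: trimmed };
}
```

Returning the trimmed input rather than `url.href` is deliberate: `href` can add a trailing slash or re-encode characters, which would silently change the identifier.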
Routes
- `GET /api/rkd/profile`
- `POST /api/rkd/search`
- `POST /api/rkd/entity`
- `POST /api/rkd/import`
- Optional `POST /api/rkd/sparql` (read-only, allowlisted templates only)
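For the optional `POST /api/rkd/sparql` route, "allowlisted templates only" could look like the sketch below: callers pick a template id and supply typed variables, never raw SPARQL. The template ids, their query text, and the cap of 100 on integer variables are hypothetical. IRI values are checked against the characters SPARQL forbids inside an `IRIREF`, which stops a value from closing the `<...>` and injecting extra clauses.

```typescript
// Hypothetical allowlist for the optional POST /api/rkd/sparql route.
type VarKind = "iri" | "int";

interface QueryTemplate {
  text: string; // SPARQL with {{name}} placeholders
  vars: Record<string, VarKind>; // every placeholder must be declared here
}

const TEMPLATES: Record<string, QueryTemplate> = {
  describeEntity: { text: "DESCRIBE <{{iri}}>", vars: { iri: "iri" } },
  labelsOf: {
    text:
      "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n" +
      "SELECT ?label WHERE { <{{iri}}> rdfs:label ?label } LIMIT {{limit}}",
    vars: { iri: "iri", limit: "int" },
  },
};

const MAX_INT_VAR = 100; // hypothetical cap for numeric variables such as LIMIT
// Characters SPARQL forbids inside an IRIREF: <>"{}|^`\ and U+0000 to U+0020.
const IRIREF_FORBIDDEN = /[<>"{}|^`\\\u0000-\u0020]/;
const own = (o: object, k: string): boolean => Object.prototype.hasOwnProperty.call(o, k);

export function renderTemplate(id: string, values: Record<string, string | number>): string {
  if (!own(TEMPLATES, id)) throw new Error(`unknown template: ${id}`);
  const tpl = TEMPLATES[id]!;
  // Reject variables the template does not declare instead of ignoring them.
  for (const key of Object.keys(values)) {
    if (!own(tpl.vars, key)) throw new Error(`unexpected variable: ${key}`);
  }
  return tpl.text.replace(/\{\{(\w+)\}\}/g, (_match, name: string) => {
    const kind = tpl.vars[name];
    const value = values[name];
    if (kind === undefined || value === undefined) throw new Error(`missing variable: ${name}`);
    if (kind === "int") {
      const n = Number(value);
      if (!Number.isInteger(n) || n < 0 || n > MAX_INT_VAR) throw new Error(`bad ${name}`);
      return String(n);
    }
    const iri = String(value);
    if (!/^https?:\/\//.test(iri) || IRIREF_FORBIDDEN.test(iri)) throw new Error(`bad ${name}`);
    return iri;
  });
}
```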
UI
- Add `rkd` source toggle to `/explore`.
- Show provider/license attribution consistently in cards/details/import output.
Configuration model
Use environment-driven endpoint config to avoid hardcoding service assumptions:
- `RKD_TRIPLY_ACCOUNT` (default: `rkd`)
- `RKD_TRIPLY_DATASET` (default: `RKD-Knowledge-Graph`)
- `RKD_SPARQL_SERVICE` (for example, `speedy`, or another configured service name)
- `RKD_SPARQL_ENDPOINT` (explicit override, if provided)
- `RKD_TRIPLY_INSTANCE` (default: `triplydb.com`)
- `RKD_TRIPLY_TOKEN` (optional; required for private/internal datasets or write-capable service management)
- `RKD_USE_SAVED_QUERY_API` (`true` by default in production)
- `RKD_SAVED_QUERY_BASE` (optional explicit saved-query API base override)
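A sketch of reading these variables into a typed config, assuming the environment is passed in as a plain record so tests can supply a fake one instead of touching real secrets. `RkdConfig` and `loadRkdConfig` are hypothetical names. Defaults mirror the list above; treating `RKD_USE_SAVED_QUERY_API` as `false` outside production is an assumption, since the plan only specifies the production default.

```typescript
export interface RkdConfig {
  instance: string;
  account: string;
  dataset: string;
  service?: string;
  endpointOverride?: string;
  token?: string;
  useSavedQueryApi: boolean;
  savedQueryBase?: string;
}

type Env = Record<string, string | undefined>;

// Blank values count as unset, so `RKD_SPARQL_ENDPOINT=` does not act as an override.
function read(env: Env, key: string): string | undefined {
  const value = env[key]?.trim();
  return value ? value : undefined;
}

export function loadRkdConfig(env: Env, isProduction: boolean): RkdConfig {
  const savedQueryFlag = read(env, "RKD_USE_SAVED_QUERY_API");
  return {
    instance: read(env, "RKD_TRIPLY_INSTANCE") ?? "triplydb.com",
    account: read(env, "RKD_TRIPLY_ACCOUNT") ?? "rkd",
    dataset: read(env, "RKD_TRIPLY_DATASET") ?? "RKD-Knowledge-Graph",
    service: read(env, "RKD_SPARQL_SERVICE"),
    endpointOverride: read(env, "RKD_SPARQL_ENDPOINT"),
    token: read(env, "RKD_TRIPLY_TOKEN"),
    // Production default is `true` per the plan; `false` elsewhere is an assumption.
    useSavedQueryApi: savedQueryFlag === undefined ? isProduction : savedQueryFlag === "true",
    savedQueryBase: read(env, "RKD_SAVED_QUERY_BASE"),
  };
}
```

In Node this would be called as `loadRkdConfig(process.env, process.env.NODE_ENV === "production")`; tests pass a literal object instead.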
Resolution order:
- `RKD_SPARQL_ENDPOINT` explicit override
- computed Triply endpoint from account/dataset/service
Computed endpoint patterns (from official Triply API docs):
- Dataset API: `https://api.INSTANCE/datasets/ACCOUNT/DATASET/`
- Service API: `https://api.INSTANCE/datasets/ACCOUNT/DATASET/services/SERVICE/`
- SPARQL endpoint: `https://api.INSTANCE/datasets/ACCOUNT/DATASET/services/SERVICE/sparql`
- Linked-data export: `https://api.INSTANCE/datasets/ACCOUNT/DATASET/download`
- Graph list: `https://api.INSTANCE/datasets/ACCOUNT/DATASET/graphs`
- IRI describe: `https://api.INSTANCE/datasets/ACCOUNT/DATASET/describe.nt?resource=RESOURCE`
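The resolution order and the endpoint patterns above can be sketched as small URL builders. `TriplyTarget` and the function names are hypothetical; the URL shapes follow the listed patterns, with path segments percent-encoded as a precaution.

```typescript
export interface TriplyTarget {
  instance: string; // e.g. "triplydb.com"
  account: string;
  dataset: string;
  service?: string;
  endpointOverride?: string; // RKD_SPARQL_ENDPOINT
}

const seg = encodeURIComponent; // path segments are encoded as a precaution

export function datasetApi(t: TriplyTarget): string {
  return `https://api.${t.instance}/datasets/${seg(t.account)}/${seg(t.dataset)}/`;
}

export const downloadUrl = (t: TriplyTarget): string => `${datasetApi(t)}download`;
export const graphsUrl = (t: TriplyTarget): string => `${datasetApi(t)}graphs`;
export const describeUrl = (t: TriplyTarget, resource: string): string =>
  `${datasetApi(t)}describe.nt?resource=${encodeURIComponent(resource)}`;

// Resolution order: explicit override first, then the computed service endpoint.
export function resolveSparqlEndpoint(t: TriplyTarget): string {
  if (t.endpointOverride) return t.endpointOverride;
  if (!t.service) {
    throw new Error("set RKD_SPARQL_SERVICE or RKD_SPARQL_ENDPOINT");
  }
  return `${datasetApi(t)}services/${seg(t.service)}/sparql`;
}
```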
Authentication and security
From official Triply API guidance:
- Public datasets: most read operations work without authentication.
- Internal/private datasets (and write operations): require a bearer token.
- Header format: `Authorization: Bearer TOKEN`
- Token handling requirements:
  - never commit tokens to git
  - never share tokens outside authorized operators
  - rotate tokens regularly and after any suspected compromise
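Header construction for the two access modes might look like this. `AccessMode`, `MissingTokenError`, and `buildRkdHeaders` are hypothetical names; the only behavior taken from the Triply guidance is that public reads need no `Authorization` header and protected operations send `Bearer TOKEN`. Failing fast on a missing token avoids a round trip that can only end in an auth error.

```typescript
export type AccessMode = "public-read" | "protected";

export class MissingTokenError extends Error {
  constructor() {
    // The message names the variable, never its value.
    super("RKD_TRIPLY_TOKEN is required for protected operations");
    this.name = "MissingTokenError";
  }
}

/**
 * Hypothetical header builder. Authorization is added only when a token is
 * configured; a protected operation without one fails before any request.
 */
export function buildRkdHeaders(
  accept: string,
  mode: AccessMode,
  token?: string,
): Record<string, string> {
  const headers: Record<string, string> = { Accept: accept };
  const trimmed = token?.trim();
  if (trimmed) {
    headers.Authorization = `Bearer ${trimmed}`;
  } else if (mode === "protected") {
    throw new MissingTokenError();
  }
  return headers;
}
```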
Project policy:
- `RKD_TRIPLY_TOKEN` must come from environment/secret manager only.
- No token logging in route handler errors.
- Tests must stub auth headers and never use real secrets.
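To keep tokens out of route-handler errors, one option is to pass every message through a redaction step before logging. `redactSecrets` is a hypothetical helper, not an existing project utility. Besides the configured token, it masks anything shaped like a bearer credential (RFC 6750 `b64token` characters), in case a raw header dump ends up in an error string.

```typescript
// RFC 6750 b64token characters following "Bearer", plus optional "=" padding.
const BEARER_PATTERN = /Bearer\s+[A-Za-z0-9\-._~+\/]+=*/g;

/**
 * Hypothetical helper: removes configured secrets, and anything shaped like a
 * bearer credential, from a message before it is logged or returned.
 */
export function redactSecrets(
  message: string,
  secrets: ReadonlyArray<string | undefined>,
): string {
  let out = message;
  for (const secret of secrets) {
    // split/join replaces every occurrence without regex-escaping the secret.
    if (secret) out = out.split(secret).join("[REDACTED]");
  }
  return out.replace(BEARER_PATTERN, "Bearer [REDACTED]");
}
```

A handler would then log `redactSecrets(String(err), [config.token])` rather than the raw error.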
Scale and safety guardrails
- Enforce small page windows (`limit` with `offset` or a cursor) and hard maximums.
- Time-box upstream calls and retry transient failures with backoff.
- Reject SPARQL Update operations (`INSERT`, `DELETE`, `LOAD`, `CLEAR`, `DROP`, etc.).
- Prefer allowlisted query templates with controlled variables.
- Add cache headers/internal memoization for repeated lookups.
- Default to saved query APIs for production retrieval/pagination; use raw SPARQL only for controlled admin/research paths.
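The first three guardrails could be sketched as below. The caps (20 default, 100 max limit, 10,000 max offset), the retry parameters, and the keyword heuristic are assumptions, not tuned values. `assertReadOnlyQuery` blanks comments, string literals, IRIs, and variables in a single left-to-right pass before scanning for SPARQL Update keywords; its word-level denylist errs toward rejection and is meant as defense in depth on top of the allowlisted templates.

```typescript
// Hypothetical caps; tune against real RKD latency and result sizes.
const DEFAULT_LIMIT = 20;
const MAX_LIMIT = 100;
const MAX_OFFSET = 10_000;

export function clampPage(limit?: number, offset?: number): { limit: number; offset: number } {
  const l = Number.isFinite(limit) ? Math.trunc(limit as number) : DEFAULT_LIMIT;
  const o = Number.isFinite(offset) ? Math.trunc(offset as number) : 0;
  return {
    limit: Math.min(Math.max(l, 1), MAX_LIMIT),
    offset: Math.min(Math.max(o, 0), MAX_OFFSET),
  };
}

const UPDATE_KEYWORDS = /\b(INSERT|DELETE|LOAD|CLEAR|CREATE|DROP|COPY|MOVE|ADD|WITH|USING)\b/i;
const QUERY_FORMS = /\b(SELECT|ASK|CONSTRUCT|DESCRIBE)\b/i;
// One pass, so whichever token starts first wins: a "#" inside a string or IRI
// is not treated as a comment, and quotes inside a comment do not open a string.
const LEXEMES =
  /#[^\n]*|"""[\s\S]*?"""|'''[\s\S]*?'''|"(?:[^"\\\n]|\\.)*"|'(?:[^'\\\n]|\\.)*'|<[^<>"{}|^`\\\s]*>/g;

/** Heuristic read-only check (hypothetical); throws on anything that looks like an update. */
export function assertReadOnlyQuery(query: string): void {
  const scanned = query
    .replace(LEXEMES, (tok) => (tok.startsWith("#") ? " " : tok.startsWith("<") ? "<>" : '""'))
    .replace(/[?$][A-Za-z0-9_]+/g, "?v"); // variables such as ?insert
  if (UPDATE_KEYWORDS.test(scanned)) throw new Error("SPARQL Update is not allowed");
  if (!QUERY_FORMS.test(scanned)) throw new Error("not a SELECT/ASK/CONSTRUCT/DESCRIBE query");
}

/**
 * Time-boxes each attempt with an AbortSignal and retries with exponential
 * backoff. The timeout only takes effect if `attempt` passes the signal on
 * (e.g. to fetch). Every rejection is retried here; a real version would
 * skip non-transient failures such as 4xx responses.
 */
export async function withRetry<T>(
  attempt: (signal: AbortSignal) => Promise<T>,
  { attempts = 3, timeoutMs = 5_000, baseDelayMs = 250 } = {},
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      return await attempt(controller.signal);
    } catch (err) {
      lastError = err;
    } finally {
      clearTimeout(timer);
    }
    if (i < attempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
    }
  }
  throw lastError;
}
```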
SPARQL protocol + formats (official)
Triply supports the SPARQL 1.1 query protocol with:
- GET query (`query` parameter) for small requests
- POST urlencoded (`application/x-www-form-urlencoded`)
- POST direct (`application/sparql-query`) for larger/custom requests
Implementation preference:
- Saved queries for production flows
- Direct POST for ad hoc long queries
- GET only for tiny diagnostics
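The three protocol forms differ only in where the query travels. A sketch, assuming a fetch-style `{ method, headers, body }` request shape; `buildSparqlRequest` and `SparqlForm` are hypothetical names. The `query` parameter name and the two POST content types come from the SPARQL 1.1 Protocol.

```typescript
export type SparqlForm = "get" | "post-urlencoded" | "post-direct";

export interface SparqlRequest {
  url: string;
  init: { method: "GET" | "POST"; headers: Record<string, string>; body?: string };
}

/** Builds a request in one of the three SPARQL 1.1 Protocol query forms. */
export function buildSparqlRequest(
  endpoint: string,
  query: string,
  form: SparqlForm,
  accept: string,
): SparqlRequest {
  const headers: Record<string, string> = { Accept: accept };
  switch (form) {
    case "get": {
      // Query travels in the URL, so keep this for tiny diagnostic queries.
      const url = new URL(endpoint);
      url.searchParams.set("query", query);
      return { url: url.toString(), init: { method: "GET", headers } };
    }
    case "post-urlencoded":
      return {
        url: endpoint,
        init: {
          method: "POST",
          headers: { ...headers, "Content-Type": "application/x-www-form-urlencoded" },
          body: new URLSearchParams({ query }).toString(),
        },
      };
    case "post-direct":
      // Unencoded query as the body; suited to long ad hoc queries.
      return {
        url: endpoint,
        init: {
          method: "POST",
          headers: { ...headers, "Content-Type": "application/sparql-query" },
          body: query,
        },
      };
    default:
      throw new Error(`unknown SPARQL form: ${String(form)}`);
  }
}
```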
Result formats (via `Accept` header or suffix) include:
- `application/json`, `application/sparql-results+json`, `text/csv`, `text/tab-separated-values`
- `application/ld+json`, `application/n-triples`, `application/n-quads`, `application/trig`, `text/turtle`
For Linked Art integration routes, prefer:
- `application/ld+json` / `application/n-triples` for graph materialization paths (CONSTRUCT/DESCRIBE results)
- `application/sparql-results+json` for SELECT (tabular) and ASK (boolean) results
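Choosing the `Accept` header from the query form could be as simple as the heuristic below; `acceptFor` is a hypothetical helper. It blanks IRIs and comments first, so that a prefix such as `<http://example.org/select#>` is not mistaken for the `SELECT` keyword.

```typescript
export const SPARQL_RESULTS_JSON = "application/sparql-results+json";
export const JSON_LD = "application/ld+json";

/** Hypothetical helper: picks the Accept header from the first query-form keyword. */
export function acceptFor(query: string): string {
  const scanned = query.replace(/<[^<>\s]*>/g, "<>").replace(/#[^\n]*/g, "");
  const form = /\b(SELECT|ASK|CONSTRUCT|DESCRIBE)\b/i.exec(scanned)?.[1]?.toUpperCase();
  switch (form) {
    case "SELECT":
    case "ASK":
      return SPARQL_RESULTS_JSON; // tabular bindings or a boolean
    case "CONSTRUCT":
    case "DESCRIBE":
      return JSON_LD; // graph result, framed into Linked Art downstream
    default:
      throw new Error("could not determine SPARQL query form");
  }
}
```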
Linked-data serialization handling
Triply linked-data export supports:
- TriG (`application/trig`)
- N-Triples (`application/n-triples`)
- N-Quads (`application/n-quads`)
- Turtle (`text/turtle`)
- JSON-LD (`application/ld+json`)
Meta Museum adapter behavior:
- ingest JSON-LD or N-Triples/N-Quads, depending on which serialization the endpoint returns
- normalize to canonical Linked Art JSON-LD at storage boundary
- preserve source serialization metadata in `_source` diagnostics where useful
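A sketch of the serialization bookkeeping: classify the response `Content-Type`, decide whether it can be framed directly as JSON-LD or needs an RDF parse first, and record what was requested versus received in `_source`. The field names inside the diagnostics object are hypothetical; the media types are the five export formats listed above.

```typescript
export type Serialization = "json-ld" | "n-triples" | "n-quads" | "turtle" | "trig";

// The five export media types listed above.
const BY_MEDIA_TYPE = new Map<string, Serialization>([
  ["application/ld+json", "json-ld"],
  ["application/n-triples", "n-triples"],
  ["application/n-quads", "n-quads"],
  ["text/turtle", "turtle"],
  ["application/trig", "trig"],
]);

export function classifyContentType(header: string | null): Serialization | undefined {
  if (!header) return undefined;
  // Drop parameters such as "; charset=utf-8" before matching.
  const mediaType = (header.split(";")[0] ?? "").trim().toLowerCase();
  return BY_MEDIA_TYPE.get(mediaType);
}

/** Hypothetical shape for the `_source` diagnostics kept next to canonical JSON-LD. */
export interface SourceDiagnostics {
  provider: "rkd";
  endpoint: string;
  requestedAccept: string;
  receivedContentType: string | null;
  serialization: Serialization | "unknown";
  needsRdfParse: boolean; // known non-JSON-LD formats go through an RDF parser first
  retrievedAt: string;
}

export function sourceDiagnostics(
  endpoint: string,
  requestedAccept: string,
  receivedContentType: string | null,
  now: Date = new Date(),
): SourceDiagnostics {
  const serialization = classifyContentType(receivedContentType) ?? "unknown";
  return {
    provider: "rkd",
    endpoint,
    requestedAccept,
    receivedContentType,
    serialization,
    // "unknown" is left to the fallback path rather than guessed at.
    needsRdfParse: serialization !== "json-ld" && serialization !== "unknown",
    retrievedAt: now.toISOString(),
  };
}
```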
Standards and conformance mapping
All RKD provider PRs must include a Standards Mapping note with round and fixture anchors from:
- object + provenance rounds
- shared structures rounds
- schema rounds (as applicable)
- search/protocol rounds (71+)
Required conformance checks:
- event-centric modeling preserved (no object-person shortcut flattening)
- ownership vs custody distinction preserved
- ambiguous transfers mapped conservatively (`Transfer`)
- carrier/content separation preserved (`HumanMadeObject`/`DigitalObject` vs `VisualItem`/`LinguisticObject`)
- protocol/profile checks (B8): context/profile negotiation, CORS/OPTIONS, URI opacity, array cardinality safety
- Triply transport checks: auth behavior for public/private access modes, controlled media-type negotiation, safe fallback when requested serialization is unavailable
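The conservative transfer rule can be expressed as a small mapping. The `TransferEvidence` flags are a hypothetical intermediate shape the adapter would extract from RKD provenance data; the outputs are the standard Linked Art classes `Acquisition` and `TransferOfCustody`, with `Transfer` as the fallback whenever neither an ownership nor a custody change is stated explicitly.

```typescript
/** Hypothetical evidence extracted from an RKD provenance record. */
export interface TransferEvidence {
  ownershipChanged?: boolean; // set only when the source states it explicitly
  custodyChanged?: boolean;
}

export type LinkedArtTransferClass = "Acquisition" | "TransferOfCustody" | "Transfer";

/**
 * Ownership and custody stay separate: explicit evidence of each yields its own
 * class, and a record with no explicit evidence falls back to generic Transfer
 * rather than guessing at a sale or a loan.
 */
export function transferClassesFor(e: TransferEvidence): LinkedArtTransferClass[] {
  const classes: LinkedArtTransferClass[] = [];
  if (e.ownershipChanged === true) classes.push("Acquisition");
  if (e.custodyChanged === true) classes.push("TransferOfCustody");
  return classes.length > 0 ? classes : ["Transfer"];
}
```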
License and attribution handling
- Dataset license: Open Data Commons Attribution License 1.0 (ODC-By 1.0).
- Persist and render source attribution and license metadata with imported records.
- Keep reuse status conservative when image rights are not explicit.
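Attribution and a conservative reuse status could be attached at ingest with a helper like the one below. The `Attribution` shape, the credit wording, and the small rights-statement allowlists are hypothetical; the license name and URL are those of ODC-By 1.0. Image reuse stays `unknown` unless an explicit, recognized rights statement is present.

```typescript
export type ReuseStatus = "open" | "restricted" | "unknown";

export interface Attribution {
  provider: "RKD";
  sourceIri: string;
  datasetLicense: { name: string; url: string };
  imageReuse: ReuseStatus;
  credit: string;
}

// Hypothetical allowlists; only exact, recognized statements change the default.
const OPEN_STATEMENTS = new Set([
  "http://creativecommons.org/publicdomain/zero/1.0/",
  "http://creativecommons.org/publicdomain/mark/1.0/",
]);
const RESTRICTED_STATEMENTS = new Set(["http://rightsstatements.org/vocab/InC/1.0/"]);

export function classifyImageRights(statement?: string): ReuseStatus {
  if (!statement) return "unknown";
  const normalized = statement.trim().replace(/^https:/, "http:");
  if (OPEN_STATEMENTS.has(normalized)) return "open";
  if (RESTRICTED_STATEMENTS.has(normalized)) return "restricted";
  return "unknown"; // anything unrecognized stays conservative
}

export function buildAttribution(sourceIri: string, imageRightsStatement?: string): Attribution {
  return {
    provider: "RKD",
    sourceIri,
    // ODC-By applies to the dataset; image reuse is judged separately and conservatively.
    datasetLicense: { name: "ODC-By 1.0", url: "https://opendatacommons.org/licenses/by/1-0/" },
    imageReuse: classifyImageRights(imageRightsStatement),
    credit: `Data: RKD Knowledge Graph, ODC-By 1.0 (${sourceIri})`,
  };
}
```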
Test plan (failing-first)
- `tests/adapters/rkd.test.ts`
- `tests/api/rkd/profile.test.ts`
- `tests/api/rkd/search.test.ts`
- `tests/api/rkd/entity.test.ts`
- `tests/api/rkd/import.test.ts`
- optional `tests/api/rkd/sparql.test.ts`
- extend `tests/quality/protocol-conformance.test.ts` with RKD route coverage once routes land
- add `tests/api/rkd/auth.test.ts` covering:
  - public-read unauthenticated path
  - bearer token forwarding path
  - missing/invalid token behavior for protected operations
- add `tests/api/rkd/serialization.test.ts` covering Accept-header negotiation and fallback behavior
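The `auth.test.ts` cases can be exercised without network access or real secrets by injecting a recording stub in place of `fetch`. The sketch below is framework-neutral (plain throws rather than a specific runner's `expect`), and `fetchEntity` stands in for whatever adapter function actually issues the request; both it and the stub are hypothetical.

```typescript
type HeaderMap = Record<string, string>;
type FetchLike = (url: string, init: { headers: HeaderMap }) => Promise<{ status: number }>;
type AccessMode = "public-read" | "protected";

// Hypothetical stand-in for the adapter function that calls Triply.
async function fetchEntity(fetchFn: FetchLike, url: string, mode: AccessMode, token?: string): Promise<number> {
  if (mode === "protected" && !token) throw new Error("missing token");
  const headers: HeaderMap = { Accept: "application/ld+json" };
  if (token) headers.Authorization = `Bearer ${token}`;
  return (await fetchFn(url, { headers })).status;
}

// Recording stub: captures request headers and returns a canned status, no network.
function recordingFetch(status: number) {
  const seen: HeaderMap[] = [];
  const fn: FetchLike = async (_url, init) => {
    seen.push(init.headers);
    return { status };
  };
  return { fn, seen };
}

const URL_UNDER_TEST = "https://api.example.invalid/entity"; // never contacted
const FAKE_TOKEN = "test-token-not-a-secret";

export async function runAuthCases(): Promise<void> {
  // Public read: exactly one request, with no Authorization header.
  const open = recordingFetch(200);
  await fetchEntity(open.fn, URL_UNDER_TEST, "public-read");
  if (open.seen.length !== 1 || "Authorization" in open.seen[0]!) throw new Error("public read sent a token");

  // Bearer forwarding.
  const authed = recordingFetch(200);
  await fetchEntity(authed.fn, URL_UNDER_TEST, "protected", FAKE_TOKEN);
  if (authed.seen[0]?.Authorization !== `Bearer ${FAKE_TOKEN}`) throw new Error("token not forwarded");

  // Missing token on a protected operation fails before any request is made.
  const missing = recordingFetch(200);
  const rejected = await fetchEntity(missing.fn, URL_UNDER_TEST, "protected").then(() => false, () => true);
  if (!rejected || missing.seen.length !== 0) throw new Error("missing token not caught early");

  // Invalid token: the upstream 401 is surfaced, not swallowed.
  const denied = recordingFetch(401);
  if ((await fetchEntity(denied.fn, URL_UNDER_TEST, "protected", "expired")) !== 401) throw new Error("401 not surfaced");
}
```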
Exit criteria for RKD slice
- Routes and adapter shipped with green tests.
- `/explore` can discover + import RKD records.
- Standards Mapping note included in PR.
- B8 protocol checks implemented for RKD endpoints.