Triple Storage for Random-Access
Versioned Querying of RDF Archives

Ruben Taelman

ISWC 2019, 30 October 2019

Ghent University – imec – IDLab, Belgium

Versioning: Variants of datasets exist
in parallel

How to store multiple RDF dataset versions
to enable efficient versioned queries?

Fernández et al. 2018

RDF versioning

Fernández et al. 2018

Versioning storage strategies

Hybrid storage has been done before

Hybrid Independent Copy (IC) and Change-Based (CB)

Regular delta chain

We make deltas relative to the snapshot

This makes version materialization fast for any version
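As an illustration of why snapshot-relative deltas help, here is a minimal sketch that contrasts the two chain layouts. The function names, the tuple-based triple representation, and the example data are assumptions made for this illustration; they are not taken from the OSTRICH implementation.

```python
from typing import FrozenSet, List, Set, Tuple

Triple = Tuple[str, str, str]
# A delta is a pair (additions, deletions); this layout is an assumption.
Delta = Tuple[FrozenSet[Triple], FrozenSet[Triple]]


def materialize_regular(snapshot: Set[Triple], chain: List[Delta], version: int) -> Set[Triple]:
    """Regular chain: delta i is relative to version i-1, so every delta
    up to the requested version has to be replayed in order."""
    result = set(snapshot)
    for additions, deletions in chain[:version]:
        result -= deletions
        result |= additions
    return result


def materialize_aggregated(snapshot: Set[Triple], chain: List[Delta], version: int) -> Set[Triple]:
    """Alternative chain: every delta is relative to the snapshot, so a
    single delta is applied regardless of the requested version."""
    if version == 0:
        return set(snapshot)
    additions, deletions = chain[version - 1]
    return (snapshot - deletions) | additions


# Hypothetical example data: version 0 is the snapshot.
a, b, c = ("s", "p", "a"), ("s", "p", "b"), ("s", "p", "c")
snapshot = {a, b}
# Version 1 adds c; version 2 removes a.
regular_chain = [(frozenset({c}), frozenset()), (frozenset(), frozenset({a}))]
aggregated_chain = [(frozenset({c}), frozenset()), (frozenset({c}), frozenset({a}))]

v2_regular = materialize_regular(snapshot, regular_chain, 2)
v2_aggregated = materialize_aggregated(snapshot, aggregated_chain, 2)
```

Both functions produce the same version, but the regular chain does work proportional to the version number, whereas the snapshot-relative chain applies exactly one delta. The trade-off is visible in the example data: the addition of c is repeated in every later delta, which is the redundancy the combined store is meant to compress.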

Regular delta chain / Alternative delta chain

We combine all deltas in a
combined timestamp-based store

Compresses all redundant triples, and annotates them with versions.

Storage overview
t1 ∈ {1,2}, t2 ∈ {3,5}, t3 ∈ {1,2,4}, ...
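The version annotation above can be sketched as a map from each triple to the set of versions whose delta contains it, so that a triple shared by several deltas is stored once. This is a simplified in-memory illustration; `build_version_index` and `triples_in_version` are hypothetical names, and the actual on-disk layout is not shown on the slides.

```python
from collections import defaultdict
from typing import Dict, Set


def build_version_index(deltas: Dict[int, Set[str]]) -> Dict[str, Set[int]]:
    """Merge per-version deltas into one store: each triple appears once,
    annotated with the versions in which it occurs."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for version, triples in deltas.items():
        for triple in triples:
            index[triple].add(version)
    return dict(index)


def triples_in_version(index: Dict[str, Set[int]], version: int) -> Set[str]:
    """Recover the delta of a single version from the combined store."""
    return {triple for triple, versions in index.items() if version in versions}


# Hypothetical deltas matching the annotation in the storage overview.
deltas = {
    1: {"t1", "t3"},
    2: {"t1", "t3"},
    3: {"t2"},
    4: {"t3"},
    5: {"t2"},
}
index = build_version_index(deltas)
```

Here the five deltas contain eight triple entries in total, while the combined store holds only three entries plus their version sets.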

Streaming, offset-enabled querying
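A minimal sketch of what streaming, offset-enabled triple pattern querying can look like, assuming results are produced lazily and the caller passes an offset and an optional limit. The `match` and `query` functions are hypothetical. This naive version skips results by consuming them; a store with suitable indexes could instead jump to the offset directly.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def match(triples: Iterable[Triple], pattern: Pattern) -> Iterator[Triple]:
    """Lazily yield triples matching the pattern; None acts as a wildcard."""
    for triple in triples:
        if all(p is None or p == v for p, v in zip(pattern, triple)):
            yield triple


def query(triples: Iterable[Triple], pattern: Pattern,
          offset: int = 0, limit: Optional[int] = None) -> Iterator[Triple]:
    """Stream matching triples, skipping `offset` results and stopping
    after `limit` results when a limit is given."""
    stop = None if limit is None else offset + limit
    return islice(match(triples, pattern), offset, stop)


# Hypothetical example data.
triples = [
    ("s1", "p", "o1"),
    ("s1", "p", "o2"),
    ("s1", "p", "o3"),
    ("s2", "p", "o1"),
]
page = list(query(triples, ("s1", None, None), offset=1, limit=1))
```

Because `query` returns an iterator, a consumer can stop early without the remaining matches ever being computed.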

Implementation: OSTRICH

OSTRICH
Fernández et al. 2018

Evaluation using BEAR

BEAR

Experiments

We highlight some results; more can be found in the paper.

Storage size and time scale linearly
for a few large versions

BEAR-A: 10 versions, 30M~66M triples per version.

[Figures: BEAR-A storage size; BEAR-A time]

Storage size and time mostly scale linearly for many small versions

BEAR-B-hourly: 1299 versions, ~48K triples per version.

[Figures: BEAR-B-hourly storage size; BEAR-B-hourly time]

→ Delta chain becomes too long after version 1100

Version Materialization times are constant with respect to version

[Figure: VM query times]

Median BEAR-B-hourly VM query results for all triple patterns for all versions.

Delta Materialization times are constant with respect to version

[Figure: DM query times]

Median BEAR-B-hourly DM query results for all triple patterns for all versions.

Version Queries are always fast

[Figure: VQ query times]

Median BEAR-B-hourly VQ query results for all triple patterns.
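For readers unfamiliar with the three query types evaluated above, the following sketch shows what each one returns, using fully materialized versions for simplicity. The function names and example data are assumptions for illustration; an actual archive would answer these queries from its compressed store rather than from full copies.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def vm(versions: Dict[int, Set[Triple]], version: int) -> Set[Triple]:
    """Version Materialization: all triples present in one version."""
    return set(versions[version])


def dm(versions: Dict[int, Set[Triple]], start: int, end: int) -> Tuple[Set[Triple], Set[Triple]]:
    """Delta Materialization: (additions, deletions) between two versions."""
    return versions[end] - versions[start], versions[start] - versions[end]


def vq(versions: Dict[int, Set[Triple]]) -> Dict[Triple, List[int]]:
    """Version Query: every triple annotated with the versions containing it."""
    result: Dict[Triple, List[int]] = {}
    for version in sorted(versions):
        for triple in versions[version]:
            result.setdefault(triple, []).append(version)
    return result


# Hypothetical example data: three versions of a tiny dataset.
a, b, c = ("s", "p", "a"), ("s", "p", "b"), ("s", "p", "c")
versions = {0: {a, b}, 1: {a, b, c}, 2: {b, c}}

additions, deletions = dm(versions, 0, 2)
annotated = vq(versions)
```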

Conclusions

Additional links