On the Semantics of TPF-QS, Towards Publishing and Querying RDF Streams at Web-scale

On the Semantics of TPF-QS

Towards Publishing and Querying RDF Streams at a lower cost

Ruben Taelman¹
Riccardo Tommasini²
Joachim Van Herwegen¹
Miel Vander Sande¹
Emanuele Della Valle²
Ruben Verborgh¹

¹Ghent University – imec – IDLab, Belgium

²Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Italy

Song information as a slow RDF stream

09:15 - 09:20
```
song-x
```
09:20 - 09:25
```
song-y
```
09:25 - 09:30
```
song-z
```

How to publish slow streams
at a low cost?

TPF-QS: Triple Pattern Fragments Query Streamer (Taelman 2016)

Client-side RDF Stream Processor that continuously queries over RDF streams
Streams are published through a Triple Pattern Fragments (TPF) interface

How does TPF-QS compare to other RDF Stream Processing (RSP) engines, both formally and experimentally?

TPF-QS: a Client-side RDF Stream Processor
RSP-QL Formalization
Experimental comparison with other RSP engines

TPF-QS is a client-side RSP engine

Time interval annotation
RDF streams are annotated with their time validity

Low-cost server
Expose (slow) RDF streams through a TPF interface

Just-in-time
Client re-executes the query when the result expires

RDF stream annotation to indicate expiration times

Assumption: expiration time of each stream element is predefined

09:15 - 09:20
```
radio:my m:plays song-x
```
09:20 - 09:25
```
radio:my m:plays song-y
```
09:25 - 09:30
```
radio:my m:plays song-z
```
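The annotation idea can be sketched in a few lines of Python. This is a minimal illustration, not the TPF-QS data model: the class name `AnnotatedTriple`, the helper `valid_triples`, and the choice of half-open intervals (valid from the start instant up to, but excluding, the expiration instant) are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class AnnotatedTriple:
    """An RDF statement plus the interval in which it is valid (hypothetical type)."""
    subject: str
    predicate: str
    obj: str
    valid_from: time   # start of validity (inclusive, by assumption)
    valid_until: time  # expiration time (exclusive, by assumption)

    def is_valid_at(self, instant: time) -> bool:
        return self.valid_from <= instant < self.valid_until


# The radio stream from the slide, one element per five-minute interval.
stream = [
    AnnotatedTriple("radio:my", "m:plays", "song-x", time(9, 15), time(9, 20)),
    AnnotatedTriple("radio:my", "m:plays", "song-y", time(9, 20), time(9, 25)),
    AnnotatedTriple("radio:my", "m:plays", "song-z", time(9, 25), time(9, 30)),
]


def valid_triples(elements, instant):
    """Return the stream elements that are valid at the given instant."""
    return [t for t in elements if t.is_valid_at(instant)]
```

With these assumptions, `valid_triples(stream, time(9, 22))` selects only the `song-y` element.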

Triple Pattern Fragments (Verborgh 2016),
a low-cost Linked Data interface

Server exposes triple pattern interface over HTTP
RDF streams are published as annotated RDF statements
Clients evaluate full SPARQL queries client-side
→ Lowers server load
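To make the division of labour concrete, the sketch below mimics in memory what a triple pattern selection does and how a client turns the selected triples into variable bindings. It is only an illustration of the selector semantics: a real TPF server answers over HTTP with paged responses and metadata, none of which is modelled here, and the function names `select` and `solutions` are hypothetical.

```python
def matches(pattern, triple):
    """A pattern term matches if it is a variable (starts with '?') or equals the triple term."""
    return all(p.startswith("?") or p == v for p, v in zip(pattern, triple))


def select(dataset, pattern):
    """Server-side part: return the triples matching a single triple pattern."""
    return [t for t in dataset if matches(pattern, t)]


def solutions(dataset, pattern):
    """Client-side part: turn the matching triples into variable bindings."""
    return [
        {p: v for p, v in zip(pattern, triple) if p.startswith("?")}
        for triple in select(dataset, pattern)
    ]


# Illustrative data; 'radio:other' and 'song-q' are made up for the example.
dataset = [
    ("radio:my", "m:plays", "song-x"),
    ("radio:other", "m:plays", "song-q"),
]
```

Here `solutions(dataset, ("radio:my", "m:plays", "?song"))` yields a single binding of `?song` to `song-x`. The server only ever answers single-pattern requests, while joins and other SPARQL operators stay on the client.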

Client evaluates only when needed

Continuous SPARQL query:

SELECT ?song WHERE { radio:my m:plays ?song }

RDF Stream:

09:15 - 09:20
```
radio:my m:plays song-x
```
09:20 - 09:25
```
radio:my m:plays song-y
```
09:25 - 09:30
```
radio:my m:plays song-z
```

Query results:

09:15
```
song-x
```
09:20
```
song-y
```
09:25
```
song-z
```
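The evaluation schedule above can be simulated with a short loop: evaluate once, then sleep until the earliest expiration among the current results. This is a sketch under stated assumptions, not the TPF-QS implementation. The stream is a hard-coded list, intervals are treated as half-open, and `evaluate` and `run` are hypothetical helpers.

```python
from datetime import time

# (song, valid_from, valid_until) for the pattern: radio:my m:plays ?song
STREAM = [
    ("song-x", time(9, 15), time(9, 20)),
    ("song-y", time(9, 20), time(9, 25)),
    ("song-z", time(9, 25), time(9, 30)),
]


def evaluate(stream, now):
    """Return (results, next_evaluation_time) for the query at instant `now`."""
    current = [(song, until) for song, start, until in stream if start <= now < until]
    results = [song for song, _ in current]
    # The next evaluation is due when the first current result expires.
    next_time = min((until for _, until in current), default=None)
    return results, next_time


def run(stream, start):
    """Simulate the client: re-evaluate only when the previous result expires."""
    log = []
    now = start
    while now is not None:
        results, next_time = evaluate(stream, now)
        if not results:  # nothing valid any more: the (finite) example stream ended
            break
        log.append((now, results))
        now = next_time
    return log
```

Calling `run(STREAM, time(9, 15))` performs three evaluations, at 09:15, 09:20 and 09:25, which matches the query results listed above.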

TPF-QS: a Client-side RDF Stream Processor
RSP-QL Formalization
Experimental comparison with other RSP engines

RSP-QL: A unifying RSP Query Model (Dell'Aglio 2014)

Captures the differences between systems such as CQELS and C-SPARQL.

Extension of RDF and SPARQL, adding time semantics.

RDF stream S: unbounded sequence of pairs (τ, t).

τ: RDF statement

t: Time instant

RSP-QL applies windows to streams

A window operation W takes a chunk out of the potentially infinite stream.
The output of a window operation is a time-varying graph G_W.
Query evaluation over G_W at certain time instants.
Different policies exist to determine these time instants.

e.g.: periodic, on content-change, ...
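As a rough illustration of the window operation, the following sketch applies time-based windows to a finite prefix of a stream of (τ, t) pairs. Integer time instants, half-open window bounds, and the function names `window` and `tumbling_windows` are assumptions made for the example; RSP-QL itself defines these notions formally.

```python
def window(stream, open_t, close_t):
    """Return the set of statements whose time instant lies in [open_t, close_t)."""
    return {statement for statement, t in stream if open_t <= t < close_t}


def tumbling_windows(stream, width, start, end):
    """Yield (open_t, graph) for consecutive, non-overlapping windows of equal width."""
    for open_t in range(start, end, width):
        yield open_t, window(stream, open_t, open_t + width)


# A finite prefix of a stream S: pairs of (RDF statement, time instant).
S = [
    (("radio:my", "m:plays", "song-x"), 0),
    (("radio:my", "m:plays", "song-y"), 5),
    (("radio:my", "m:plays", "song-z"), 10),
]

# Map each window's opening instant to the graph it selects.
graphs = dict(tumbling_windows(S, width=5, start=0, end=15))
```

Each entry of `graphs` is the content of one window; a query would then be evaluated over these graphs at the instants chosen by the evaluation policy.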

TPF-QS in terms of RSP-QL (summary)

Stream elements annotated with time intervals
Intervals indicate valid time.

Tumbling time-based window, width = 1 time unit
Forms a sequence of consecutive windows, one per time unit.
Simple query evaluation, as the resulting G_W is atemporal.

Mapping-expire policy
Query is (re-)evaluated when previous result mappings expire.
Does not exist in other RSP engines; it is only possible because expiration times are known in advance.
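The difference between a periodic policy and the mapping-expire policy can be illustrated by counting evaluation instants. The sketch below assumes integer minutes, half-open validity intervals, and a one-minute period for the periodic policy; these numbers and the function names are chosen only for illustration.

```python
def periodic_instants(start, end, period):
    """Evaluation instants under a periodic policy."""
    return list(range(start, end, period))


def mapping_expire_instants(intervals, start, end):
    """Evaluation instants when the query is re-run only once current results expire."""
    instants = []
    now = start
    while now < end:
        # Expiration times of the elements that are valid right now.
        expirations = [until for begin, until in intervals if begin <= now < until]
        if not expirations:
            break
        instants.append(now)
        now = min(expirations)
    return instants


# Validity intervals in minutes after 09:15, as in the radio example.
intervals = [(0, 5), (5, 10), (10, 15)]
```

Over the same fifteen minutes, `periodic_instants(0, 15, 1)` gives fifteen evaluations, whereas `mapping_expire_instants(intervals, 0, 15)` gives three, at minutes 0, 5 and 10.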

TPF-QS: a Client-side RDF Stream Processor
RSP-QL Formalization
Experimental comparison with other RSP engines

Experimental comparison using CityBench (Ali 2015)

RSP benchmarking suite based on real-world sensor data
Streams evolve slowly, ideal for TPF-QS

Relevant metrics for RSP
Result completeness: result streams vs expected streams
Scalability: CPU usage and execution time for increasing number of clients
Result latency: delay until each stream element appears as result

Adapters for benchmarking C-SPARQL and CQELS
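One plausible reading of the completeness and latency metrics is sketched below. These formulas are assumptions for illustration and are not taken from CityBench: completeness is computed as the fraction of expected results that were observed, and latency as the mean delay between an element arriving and appearing in the results.

```python
def completeness(observed, expected):
    """Fraction of the expected results that actually appeared (assumed definition)."""
    expected = set(expected)
    if not expected:
        return 1.0
    return len(expected & set(observed)) / len(expected)


def mean_latency(arrival, emitted):
    """Mean delay in seconds between arrival and emission, per stream element.

    Both arguments map an element identifier to a timestamp in seconds.
    """
    delays = [emitted[e] - arrival[e] for e in emitted if e in arrival]
    return sum(delays) / len(delays) if delays else None
```

For example, observing two of four expected results gives a completeness of 0.5, and delays of 1 and 3 seconds give a mean latency of 2 seconds.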

Adaptation of CityBench for TPF-QS

CityBench does not work out-of-the-box with TPF-QS, so changes were needed.

Queries only available in C-SPARQL and CQELS syntax
Equivalent SPARQL 1.1 queries were created for TPF-QS
Transformation algorithm in appendix

Assumption of server-side engines
Multi-client support was added for a single TPF server and multiple TPF-QS clients
Client CPU usage and bandwidth usage are also measured

Experimental Setup of CityBench

TPF-QS, C-SPARQL, CQELS
1 server machine, 8 client machines
6 CityBench queries (resulting frequency of 10 seconds)
Concurrent queries: 1, 16, 32, 64
15 minutes query evaluation runtime

TPF-QS achieves lower server CPU usage and lower latency for simple queries

[Plots: CPU usage and latency for Q1, Q2 and Q3]

TPF-QS reaches higher server load for complex queries because of increased data transfer

[Plots: CPU usage and latency for Q6, Q9 and Q11*]

*C-SPARQL fails for Q11

TPF-QS client load initially peaks,
and then drops

Load does not drop for Q2 and Q9,
because evaluating the query takes longer than the query period (10 seconds)

Conclusion: there is potential for lightweight streaming interfaces

TPF is good for publishing slow streams
Effective for simple queries, but not for complex queries.

TPF is not good for publishing fast streams
HTTP delay becomes too significant.

Alternative interfaces are needed for different types of streams
TPF with a temporal index?
TPF with streaming responses?
...
