Song information as a slow RDF stream
- 09:15 - 09:20
song-x
- 09:20 - 09:25
song-y
- 09:25 - 09:30
song-z
How to publish slow streams at a low cost?
TPF-QS: Triple Pattern Fragments Query Streamer (Taelman 2016)
- Client-side RDF Stream Processor that continuously queries over RDF streams
- Streams are published through a Triple Pattern Fragments (TPF) interface
How does TPF-QS compare to other RDF Stream Processing (RSP) engines, both formally and experimentally?
TPF-QS is a client-side RSP engine
- RDF streams are annotated with their time validity
- Expose (slow) RDF streams through a TPF interface
- Client re-executes the query when the result expires
RDF stream annotation to indicate expiration times
Assumption: expiration time of each stream element is predefined
- 09:15 - 09:20
radio:my m:plays song-x
- 09:20 - 09:25
radio:my m:plays song-y
- 09:25 - 09:30
radio:my m:plays song-z
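The annotated stream above can be modelled as triples that carry a validity interval. The sketch below is a minimal Python illustration, assuming half-open intervals [valid_from, valid_until), so that an element expires exactly when the next one becomes valid. The class and function names are hypothetical; TPF-QS publishes these annotations as RDF statements, not as Python objects.

```python
from dataclasses import dataclass
from datetime import time
from typing import List


@dataclass(frozen=True)
class AnnotatedTriple:
    """An RDF statement annotated with its (assumed half-open) validity interval."""
    subject: str
    predicate: str
    obj: str
    valid_from: time   # inclusive: the statement becomes valid here
    valid_until: time  # exclusive: the statement expires here

    def is_valid_at(self, t: time) -> bool:
        return self.valid_from <= t < self.valid_until


# The song stream from the slide, using hypothetical Python names.
STREAM: List[AnnotatedTriple] = [
    AnnotatedTriple("radio:my", "m:plays", "song-x", time(9, 15), time(9, 20)),
    AnnotatedTriple("radio:my", "m:plays", "song-y", time(9, 20), time(9, 25)),
    AnnotatedTriple("radio:my", "m:plays", "song-z", time(9, 25), time(9, 30)),
]


def valid_at(stream: List[AnnotatedTriple], t: time) -> List[AnnotatedTriple]:
    """Return the stream elements that are valid at time t."""
    return [a for a in stream if a.is_valid_at(t)]


if __name__ == "__main__":
    print([a.obj for a in valid_at(STREAM, time(9, 22))])  # ['song-y']
```

Because of the half-open assumption, at exactly 09:20 only song-y is valid; a closed-interval encoding would make two songs valid at each boundary.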
Triple Pattern Fragments (Verborgh 2016): a low-cost Linked Data interface
- Server exposes triple pattern interface over HTTP
- RDF streams are published as annotated RDF statements
- Clients evaluate full SPARQL queries client-side
→ Lowers server load
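To illustrate this split of work, the sketch below models a TPF server as a function that only answers single triple patterns (returning one page of matches plus a match count), while the client performs the join itself. This is a simplified illustration, not the actual TPF protocol or client algorithm: the `fragment` function, its paging parameters, the naive nested-loop join, and the `m:by` predicate are all hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class Triple(NamedTuple):
    s: str
    p: str
    o: str


def fragment(store: List[Triple], s: Optional[str] = None, p: Optional[str] = None,
             o: Optional[str] = None, page: int = 1, page_size: int = 100) -> Dict:
    """Server side: match one triple pattern (None = variable) and return one page."""
    hits = [t for t in store
            if (s is None or t.s == s) and (p is None or t.p == p) and (o is None or t.o == o)]
    start = (page - 1) * page_size
    return {"triples": hits[start:start + page_size], "count": len(hits)}


def client_query(store: List[Triple]) -> List[Dict[str, str]]:
    """Client side: SELECT ?song ?artist WHERE { radio:my m:plays ?song . ?song m:by ?artist }.

    Each binding of ?song from the first pattern is substituted into the
    second pattern, which becomes a new, simpler request to the server.
    Only the first page of each fragment is fetched here, for brevity.
    """
    results = []
    for t1 in fragment(store, s="radio:my", p="m:plays")["triples"]:
        for t2 in fragment(store, s=t1.o, p="m:by")["triples"]:
            results.append({"song": t1.o, "artist": t2.o})
    return results


STORE = [
    Triple("radio:my", "m:plays", "song-x"),
    Triple("song-x", "m:by", "artist-a"),  # hypothetical static metadata
    Triple("song-y", "m:by", "artist-b"),
]

if __name__ == "__main__":
    print(client_query(STORE))  # [{'song': 'song-x', 'artist': 'artist-a'}]
```

The server never sees the full query; it only answers lookups, which is what keeps its load low.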
Client evaluates only when needed
Continuous SPARQL query:
SELECT ?song WHERE { radio:my m:plays ?song }
RDF Stream:
- 09:15 - 09:20
radio:my m:plays song-x
- 09:20 - 09:25
radio:my m:plays song-y
- 09:25 - 09:30
radio:my m:plays song-z
Query results:
- 09:15
song-x
- 09:20
song-y
- 09:25
song-z
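This schedule can be simulated: the client evaluates the query, checks when the returned results expire, and only re-evaluates at that moment. The Python sketch below is a hypothetical illustration that represents times as minutes since midnight and assumes half-open validity intervals; for comparison, it also counts how many evaluations naive polling every minute would need over the same period.

```python
from typing import List, Tuple

# (song, valid_from, valid_until) in minutes since midnight; validity is [from, until).
Element = Tuple[str, int, int]


def hm(text: str) -> int:
    """Convert 'HH:MM' to minutes since midnight."""
    hours, minutes = map(int, text.split(":"))
    return hours * 60 + minutes


def fmt(minutes: int) -> str:
    """Convert minutes since midnight back to 'HH:MM'."""
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


STREAM: List[Element] = [
    ("song-x", hm("09:15"), hm("09:20")),
    ("song-y", hm("09:20"), hm("09:25")),
    ("song-z", hm("09:25"), hm("09:30")),
]


def evaluate(stream: List[Element], t: int) -> Tuple[List[str], int]:
    """Evaluate SELECT ?song at time t; also return when the result expires."""
    valid = [e for e in stream if e[1] <= t < e[2]]
    expires = min((e[2] for e in valid), default=-1)
    return [e[0] for e in valid], expires


def expiration_driven(stream: List[Element], start: int, end: int) -> List[Tuple[str, List[str]]]:
    """Re-evaluate only when the previous results expire."""
    log, t = [], start
    while t < end:
        songs, expires = evaluate(stream, t)
        if not songs:  # nothing is valid: this sketch simply stops
            break
        log.append((fmt(t), songs))
        t = expires
    return log


def polling_count(start: int, end: int, period: int = 1) -> int:
    """Number of evaluations a client polling every `period` minutes would make."""
    return len(range(start, end, period))


if __name__ == "__main__":
    print(expiration_driven(STREAM, hm("09:15"), hm("09:30")))
    # [('09:15', ['song-x']), ('09:20', ['song-y']), ('09:25', ['song-z'])]
    print(polling_count(hm("09:15"), hm("09:30")))  # 15
```

Three evaluations suffice where minute-by-minute polling would issue fifteen, most of which would return an unchanged result.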
RSP-QL: A unifying RSP Query Model (Dell'Aglio 2014)
Captures the differences between systems such as CQELS and C-SPARQL.
Extension of RDF and SPARQL, adding time semantics.
RDF stream $S$: unbounded sequence of pairs $(\tau, t)$
- $\tau$: RDF statement
- $t$: time instant
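Because $S$ is unbounded, a natural programming model is a lazy generator of $(\tau, t)$ pairs, from which only a finite prefix is ever materialised. The Python sketch below is hypothetical: it cycles through the three songs every 5 time units (minutes since midnight) to mimic the radio example.

```python
from itertools import count, islice
from typing import Iterator, Tuple

Triple = Tuple[str, str, str]


def radio_stream(start: int, period: int = 5) -> Iterator[Tuple[Triple, int]]:
    """Unbounded RDF stream: yields (tau, t) pairs with non-decreasing timestamps t."""
    songs = ["song-x", "song-y", "song-z"]
    for i in count():
        tau = ("radio:my", "m:plays", songs[i % len(songs)])
        yield tau, start + i * period


if __name__ == "__main__":
    # Only a finite prefix of the infinite stream is ever taken.
    for tau, t in islice(radio_stream(9 * 60 + 15), 4):
        print(t, tau)
```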
RSP-QL applies windows to streams
- A window operation $W$ takes a chunk out of the potentially infinite stream.
- The output of a window operation is a time-varying graph $G_W$.
- The query is evaluated over $G_W$ at certain time instants.
Different policies exist to determine these time instants.
e.g.: periodic, on content-change, ...
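The sketch below illustrates a time-based window with a width and a slide, together with two of the reporting policies mentioned above. It is a simplified, hypothetical model rather than the formal RSP-QL definitions: windows are half-open intervals $[o, o + \text{width})$ opened every `slide` time units from `t0`, the content of each window stands in for $G_W$, and a report is emitted when a window closes.

```python
from typing import FrozenSet, Iterator, List, Tuple

Triple = Tuple[str, str, str]
Stream = List[Tuple[Triple, int]]            # finite prefix of (tau, t) pairs
Window = Tuple[int, int, FrozenSet[Triple]]  # (open, close, graph)


def windows(stream: Stream, width: int, slide: int, t0: int, until: int) -> Iterator[Window]:
    """Yield each window [o, o + width) that closes no later than `until`."""
    o = t0
    while o + width <= until:
        c = o + width
        graph = frozenset(tau for tau, t in stream if o <= t < c)
        yield o, c, graph
        o += slide


def periodic(wins: List[Window]) -> List[Tuple[int, FrozenSet[Triple]]]:
    """Report at every window close."""
    return [(c, g) for _, c, g in wins]


def on_content_change(wins: List[Window]) -> List[Tuple[int, FrozenSet[Triple]]]:
    """Report only when the window content differs from the previous report."""
    reports, previous = [], None
    for _, c, g in wins:
        if g != previous:
            reports.append((c, g))
            previous = g
    return reports


A = ("radio:my", "m:plays", "song-x")
B = ("radio:my", "m:plays", "song-y")

if __name__ == "__main__":
    wins = list(windows([(A, 0), (A, 3), (B, 6)], width=3, slide=3, t0=0, until=9))
    print(len(periodic(wins)))           # 3
    print(len(on_content_change(wins)))  # 2
```

With `slide == width`, as in this usage, consecutive windows do not overlap (a tumbling window); a smaller slide gives overlapping, sliding windows.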
TPF-QS in terms of RSP-QL (summary)
- Stream elements annotated with time intervals
Intervals indicate valid time.
- Tumbling time-based window, width = 1 time unit
Forms a sequence of consecutive windows, one per time unit.
Simple query evaluation, as the resulting $G_W$ is atemporal.
- Mapping-expire policy
The query is (re-)evaluated when the previous result mappings expire.
This policy does not exist in other RSP engines; it is only possible because the expiration times are known.
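The mapping-expire policy can be made concrete with a join between a stream pattern and static data. The Python sketch below is a hypothetical illustration under stated assumptions: time is discrete, the width-1 tumbling window at instant t holds exactly the triples valid at t (so $G_W$ is an ordinary, atemporal graph), static triples never expire, and a result mapping is assumed to expire when the earliest-expiring triple it was derived from expires. The `m:by` predicate is hypothetical.

```python
import math
from typing import Dict, List, Tuple

# (s, p, o, valid_from, valid_until); static triples use valid_until = math.inf.
Annotated = Tuple[str, str, str, float, float]


def window_graph(data: List[Annotated], t: int) -> List[Annotated]:
    """Width-1 tumbling window at instant t: the triples valid at t."""
    return [a for a in data if a[3] <= t < a[4]]


def evaluate(data: List[Annotated], t: int) -> List[Tuple[Dict[str, str], float]]:
    """SELECT ?song ?artist WHERE { radio:my m:plays ?song . ?song m:by ?artist }.

    Each mapping is paired with its assumed expiration time: the minimum
    validity end of the two triples it was derived from.
    """
    graph = window_graph(data, t)
    out = []
    for s1, p1, song, _, end1 in graph:
        if (s1, p1) != ("radio:my", "m:plays"):
            continue
        for s2, p2, artist, _, end2 in graph:
            if s2 == song and p2 == "m:by":
                out.append(({"song": song, "artist": artist}, min(end1, end2)))
    return out


DATA: List[Annotated] = [
    ("radio:my", "m:plays", "song-x", 0, 5),
    ("radio:my", "m:plays", "song-y", 5, 10),
    ("song-x", "m:by", "artist-a", 0, math.inf),
    ("song-y", "m:by", "artist-b", 0, math.inf),
]

if __name__ == "__main__":
    t = 0
    while t < 10:
        results = evaluate(DATA, t)
        if not results:
            break
        print(t, [m for m, _ in results])
        # Mapping-expire: the next evaluation happens when the first mapping expires.
        t = min(expiry for _, expiry in results)
```

Here the query runs at t = 0 and t = 5 only, because the static artist triples never expire and the stream triples determine when each result becomes stale.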
Experimental comparison using CityBench (Ali 2015)
- RSP benchmarking suite based on real-world sensor data
Streams evolve slowly, which is ideal for TPF-QS.
- Relevant metrics for RSP
Result completeness: result streams vs. expected streams
Scalability: CPU usage and execution time for an increasing number of clients
Result latency: delay until each stream element appears as a result
- Adapters for benchmarking C-SPARQL and CQELS
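Two of these metrics can be sketched as simple computations. The Python functions below are hypothetical illustrations, not CityBench's actual implementation: completeness is taken as the fraction of expected results that were received, and latency as the difference between the time a stream element was emitted and the time it first appeared as a result. The example values are made up.

```python
from statistics import mean
from typing import Dict, Iterable


def completeness(expected: Iterable[str], received: Iterable[str]) -> float:
    """Fraction of the expected results that appear in the received results."""
    expected_set = set(expected)
    if not expected_set:
        return 1.0
    return len(expected_set & set(received)) / len(expected_set)


def latencies(emitted_at: Dict[str, float], observed_at: Dict[str, float]) -> Dict[str, float]:
    """Delay (seconds) between emitting each element and observing it as a result.

    Elements that never appeared as a result are left out here; they count
    against completeness instead.
    """
    return {k: observed_at[k] - emitted_at[k] for k in emitted_at if k in observed_at}


if __name__ == "__main__":
    expected = ["song-x", "song-y", "song-z"]
    received = ["song-x", "song-z"]
    print(completeness(expected, received))  # 0.6666666666666666
    delays = latencies({"song-x": 0.0, "song-y": 300.0, "song-z": 600.0},
                       {"song-x": 2.5, "song-z": 601.0})
    print(delays, mean(delays.values()))  # {'song-x': 2.5, 'song-z': 1.0} 1.75
```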
Adaptation of CityBench for TPF-QS
CityBench does not work out-of-the-box with TPF-QS, so changes were needed.
- Queries are only available in C-SPARQL and CQELS syntax
Equivalent SPARQL 1.1 queries were created for TPF-QS.
Transformation algorithm in appendix
- CityBench assumes server-side engines
Multi-client support was added for a single TPF server with multiple TPF-QS clients.
Client CPU usage and bandwidth usage are also measured.
Experimental Setup of CityBench
- TPF-QS, C-SPARQL, CQELS
- 1 server machine, 8 client machines
- 6 CityBench queries (query frequency of 10 seconds)
- Concurrent queries: 1, 16, 32, 64
- 15 minutes query evaluation runtime
TPF-QS achieves lower server CPU usage and lower latency for simple queries
TPF-QS reaches higher server load for complex queries because of increased data transfer
*C-SPARQL fails for Q11
TPF-QS client load initially peaks, and then drops
- Load does not drop for Q2 and Q9, because query evaluation takes longer than the query frequency (10 seconds)
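Why the load stays high when evaluation outlasts the 10-second query frequency can be illustrated with a small simulation. The sketch below assumes, hypothetically, that one client runs evaluations one after another in first-in-first-out order and that every evaluation takes a fixed duration; it reports how late each evaluation starts relative to its schedule. The durations are illustrative, not measured values.

```python
from typing import List


def start_lags(period: float, duration: float, n: int) -> List[float]:
    """Delay between each evaluation's scheduled start and its actual start.

    Evaluations are scheduled every `period` seconds but run sequentially,
    so a new one can only start once the previous one has finished.
    """
    free_at = 0.0
    lags = []
    for i in range(n):
        scheduled = i * float(period)
        start = max(scheduled, free_at)
        free_at = start + duration
        lags.append(start - scheduled)
    return lags


if __name__ == "__main__":
    print(start_lags(period=10, duration=8, n=4))   # [0.0, 0.0, 0.0, 0.0]
    print(start_lags(period=10, duration=12, n=4))  # [0.0, 2.0, 4.0, 6.0]
```

When `duration` exceeds `period`, the lag grows by `duration - period` with every evaluation and the client is never idle, which is consistent with the load not dropping for Q2 and Q9.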
Conclusion: there is potential for lightweight streaming interfaces
- TPF is good for publishing slow streams
Effective for simple queries, but not for complex queries.
- TPF is not good for publishing fast streams
HTTP delay becomes too significant.
- Alternative interfaces are needed for different types of streams
TPF with a temporal index?
TPF with streaming responses?
...