Link Traversal-based Query Processing

Ruben Taelman

Web Fundamentals course 2023, 4 May 2023

Link Traversal-based
Query Processing

Query execution by following links between documents

Ghent University – imec – IDLab, Belgium

SPARQL processing over centralized data

How to query over decentralized data?

Example: decentralized address book

Example: Find Alice's contact names

SELECT ?name WHERE {
    <https://alice.pods.org/profile#me>
        foaf:knows ?person.
    ?person foaf:name ?name.
}

Query process:

  1. Start from Alice's profile
  2. Follow links to profiles of Bob and Carol
  3. Query over union of all profiles
  4. Find query results: [ { "name": "Bob" }, { "name": "Carol" } ]

Link Traversal has practical problems

But solutions exist!

Process query during link queue handling

→ Cope with infinitely growing link queues

Pre-filter links before they enter queue

→ Cope with large number of links

Different filtering strategies exist

Follow if triple matches query: example

SELECT ?name WHERE {
    <https://alice.pods.org/profile#me> foaf:knows ?person.
    ?person foaf:name ?name.
}

Contents of https://alice.pods.org/profile:
✅ </profile#me> :knows <https://bob.pods.org/profile#me>.
✅ </profile#me> :knows <https://carol.org/#i>.
❌ </profile#me> :likes <https://bob.pods.org/posts/hello-world>.

Filtering links and query semantics

Zero-knowledge query planning

→ Cope with existing query planning algorithms not working

Score-based heuristics for determining triple pattern order in BGPs

Slow query execution

Query never terminates

Potential security issues

Conclusions