Link Traversal Query Processing

Ruben Taelman

Knowledge Graphs course 2024, 14 March 2024

Link Traversal
Query Processing

Query execution by following links between documents

Ghent University – imec – IDLab, Belgium

SPARQL processing over centralized data

Centralization not always possible

How to query over decentralized data?

Approaches for querying over decentralized data

Client distributes query over query APIs

Federation over SPARQL endpoints

SELECT ?drug ?title WHERE {
  ?drug db:drugCategory dbc:micronutrient.
  ?drug db:casRegistryNumber ?id.
  ?keggDrug rdf:type kegg :Drug.
  ?keggDrug bio2rdf:xRef ?id.
  ?keggDrug purl:title ?title.
}


SELECT ?drug ?title WHERE {
  SERVICE <http://example.com/drb> {
    ?drug db:drugCategory dbc:micronutrient.
    ?drug db:casRegistryNumber ?id.
  }
  SERVICE <http://example.com/kegg> {
    ?keggDrug rdf:type kegg :Drug.
    ?keggDrug bio2rdf:xRef ?id.
    ?keggDrug purl:title ?title.
  }
}

Federation over heterogeneous sources

Limitations of federated querying

Example: decentralized address book

Example: Find Alice's contact names

SELECT ?name WHERE {
    <https://alice.pods.org/profile#me>
        foaf:knows ?person.
    ?person foaf:name ?name.
}

Query process:

  1. Start from Alice's profile
  2. Follow links to profiles of Bob and Carol
  3. Query over union of all profiles
  4. Find query results: [ { "name": "Bob" }, { "name": "Carol" } ]

Link Traversal has practical problems

But solutions exist!

Process query during link queue handling

→ Cope with infinitely growing link queues

Pre-filter links before they enter queue

→ Cope with large number of links

Different filtering strategies exist

Follow if triple matches query: example

SELECT ?name WHERE {
    <https://alice.pods.org/profile#me> foaf:knows ?person.
    ?person foaf:name ?name.
}

Contents of https://alice.pods.org/profile:
✅ </profile#me> :knows <https://bob.pods.org/profile#me>.
✅ </profile#me> :knows <https://carol.org/#i>.
❌ </profile#me> :likes <https://bob.pods.org/posts/hello-world>.

Filtering links and query semantics

Zero-knowledge query planning

→ Cope with existing query planning algorithms not working

Score-based heuristics for determining triple pattern order in BGPs

Slow query execution

Query never terminates

Potential security issues

Conclusions