Web Querying

Ruben Taelman

VAIA course Linked Data & Solid, 24 October 2024

Ghent University – imec – IDLab, Belgium

Linked Data on the Web is multifaceted

Difficulties for app developers

Query engines abstract data access

Hide the complexities of reading and writing data from app developers

(Figure: Application → Query Engine → Web)

Queries as an abstraction layer

SQL, SPARQL, GQL, GraphQL, ...

SPARQL Protocol and RDF Query Language

Different query forms in SPARQL: SELECT, CONSTRUCT, ASK, and DESCRIBE

Find all artists born in York

?name                      ?deathDate
Albert Joseph Moore
Charles Francis Hansom     1888
David Reed (comedian)
Dustin Gee
E Ridsdale Tate            1922

How do query engines process a query?

RDF dataset + SPARQL query → SPARQL query processing → query results

Basic Graph Patterns enable graph pattern matching
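A minimal sketch of BGP matching, assuming a tiny in-memory graph of string triples and a convention made up for this example: terms starting with `?` are variables. Each triple pattern is matched against every triple, and every solution mapping found so far is extended so that shared variables stay bound to the same term. The `ex:` data is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Mapping = Dict[str, str]

def is_var(term: str) -> bool:
    # Hypothetical convention for this sketch: variables start with "?"
    return term.startswith("?")

def match_pattern(pattern: Triple, triple: Triple, mapping: Mapping) -> Optional[Mapping]:
    """Extend `mapping` so that `pattern` matches `triple`, or return None."""
    result = dict(mapping)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            bound = result.get(p_term)
            if bound is None:
                result[p_term] = t_term
            elif bound != t_term:
                return None  # variable already bound to a different term
        elif p_term != t_term:
            return None  # constant term does not match
    return result

def evaluate_bgp(patterns: List[Triple], graph: List[Triple]) -> List[Mapping]:
    """All solution mappings under which every pattern matches some triple."""
    solutions: List[Mapping] = [{}]  # start from a single empty mapping
    for pattern in patterns:
        next_solutions: List[Mapping] = []
        for mapping in solutions:
            for triple in graph:
                extended = match_pattern(pattern, triple, mapping)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions

# Hypothetical example data
graph = [
    ("ex:alice", "rdf:type", "ex:Artist"),
    ("ex:alice", "ex:birthPlace", "ex:York"),
    ("ex:bob", "rdf:type", "ex:Artist"),
    ("ex:bob", "ex:birthPlace", "ex:Leeds"),
    ("ex:carol", "ex:birthPlace", "ex:York"),
]
bgp = [("?person", "rdf:type", "ex:Artist"), ("?person", "ex:birthPlace", "ex:York")]
print(evaluate_bgp(bgp, graph))  # [{'?person': 'ex:alice'}]
```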

Query results representation

One solution sequence with three solution mappings:

?name                  ?birthPlace
Bob Brockmann          http://dbpedia.org/resource/Louisiana
Bennie Nawahi          http://dbpedia.org/resource/Honolulu
Weird Al Yankovic      http://dbpedia.org/resource/Downey,_California
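As a sketch, a solution sequence like this one can be serialized in the W3C SPARQL 1.1 Query Results JSON Format. The variable names `name` and `birthPlace` are assumed here, and treating every `http(s)://` string as an IRI is a simplification for this example only; a real engine keeps track of the RDF term type of each value.

```python
import json
from typing import Dict, List, Optional

def to_term(value: str) -> Dict[str, str]:
    # Simplifying assumption for this sketch: http(s) strings are IRIs, all else literals
    if value.startswith(("http://", "https://")):
        return {"type": "uri", "value": value}
    return {"type": "literal", "value": value}

def to_results_json(variables: List[str], solutions: List[Dict[str, Optional[str]]]) -> str:
    """Serialize a solution sequence as SPARQL 1.1 Query Results JSON."""
    bindings = [
        # Unbound variables (None here) are left out of a binding object
        {var: to_term(val) for var, val in solution.items() if val is not None}
        for solution in solutions
    ]
    return json.dumps({"head": {"vars": variables}, "results": {"bindings": bindings}}, indent=2)

# Values from the slide; the variable names are assumed
solutions: List[Dict[str, Optional[str]]] = [
    {"name": "Bob Brockmann", "birthPlace": "http://dbpedia.org/resource/Louisiana"},
    {"name": "Bennie Nawahi", "birthPlace": "http://dbpedia.org/resource/Honolulu"},
    {"name": "Weird Al Yankovic", "birthPlace": "http://dbpedia.org/resource/Downey,_California"},
]
print(to_results_json(["name", "birthPlace"], solutions))
```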

Steps in SPARQL query processing

1. Parsing: Transform query string into SPARQL Algebra

1. Parsing: SPARQL Algebra example

SELECT ?x ?y ?z WHERE {
  ?x ?y ?z
}

{
  "type": "project",
  "input": {
    "type": "bgp",
    "patterns": [
      {
        "type": "pattern",
        "subject": {
          "termType": "Variable",
          "value": "x"
        },
        "predicate": {
          "termType": "Variable",
          "value": "y"
        },
        "object": {
          "termType": "Variable",
          "value": "z"
        }
      }
    ]
  },
  "variables": [
    { "termType": "Variable",
      "value": "x" },
    { "termType": "Variable",
      "value": "y" },
    { "termType": "Variable",
      "value": "z" }
  ]
}

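Real engines implement the full SPARQL grammar for this step (Comunica, for example, relies on the sparqlalgebrajs library). As a much smaller sketch, the hypothetical `parse` function below only handles `SELECT ?var ... WHERE { ... }` with variables and `<IRI>` terms, and returns the same project/bgp shape as the JSON above. Prefixed names, literals, `SELECT *`, FILTER and everything else are out of scope.

```python
import re
from typing import Any, Dict, List

# Tokens: variables, <IRIs>, braces and dots, and bare keywords
TOKEN = re.compile(r"\?\w+|<[^>]*>|[{}.]|\w+")

def parse_term(token: str) -> Dict[str, str]:
    if token.startswith("?"):
        return {"termType": "Variable", "value": token[1:]}
    if token.startswith("<") and token.endswith(">"):
        return {"termType": "NamedNode", "value": token[1:-1]}
    # Literals, prefixed names, blank nodes, ... are not supported in this sketch
    raise ValueError(f"unsupported term: {token}")

def parse(query: str) -> Dict[str, Any]:
    """Translate 'SELECT ?v ... WHERE { s p o . ... }' into a project(bgp) tree."""
    tokens = TOKEN.findall(query)
    upper = [t.upper() for t in tokens]
    if not tokens or upper[0] != "SELECT" or "WHERE" not in upper:
        raise ValueError("expected SELECT ... WHERE { ... }")
    where = upper.index("WHERE")
    variables = [parse_term(t) for t in tokens[1:where]]
    body = tokens[where + 1:]
    if len(body) < 2 or body[0] != "{" or body[-1] != "}":
        raise ValueError("expected a { ... } group")
    terms = [t for t in body[1:-1] if t != "."]
    if len(terms) % 3 != 0:
        raise ValueError("each triple pattern needs exactly three terms")
    patterns: List[Dict[str, Any]] = []
    for i in range(0, len(terms), 3):
        s, p, o = (parse_term(t) for t in terms[i:i + 3])
        patterns.append({"type": "pattern", "subject": s, "predicate": p, "object": o})
    return {"type": "project", "input": {"type": "bgp", "patterns": patterns}, "variables": variables}

print(parse("SELECT ?x ?y ?z WHERE { ?x ?y ?z }"))
```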
2. Optimization: Why do we need it?

2. Optimization: Different levels

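Algebraic optimization

{
  "type": "distinct",
  "input": {
    "type": "orderby",
    "variable": "x",
    "input": { ... }
  }
}

is rewritten into

{
  "type": "orderby",
  "variable": "x",
  "input": {
    "type": "distinct",
    "input": { ... }
  }
}

Removing duplicates before sorting leaves fewer solutions to sort, while the output is still the same solutions sorted on ?x.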

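A sketch of this rewrite as a recursive rule over plan trees, assuming a hypothetical representation of algebra nodes as Python dicts shaped like the JSON above. Children are rewritten first, and the newly created DISTINCT node is rewritten again in case it now sits on top of another ORDER BY.

```python
from typing import Any, Dict

Node = Dict[str, Any]  # hypothetical: algebra nodes as plain dicts, like the JSON above

def push_distinct_below_orderby(node: Node) -> Node:
    """Rewrite distinct(orderby(X)) into orderby(distinct(X)) everywhere in the tree."""
    # Rewrite child nodes first (only dict-valued fields such as "input" are visited)
    node = {k: (push_distinct_below_orderby(v) if isinstance(v, dict) else v) for k, v in node.items()}
    child = node.get("input")
    if node.get("type") == "distinct" and isinstance(child, dict) and child.get("type") == "orderby":
        return {
            "type": "orderby",
            "variable": child["variable"],
            # The new distinct node may now sit on another orderby, so rewrite it again
            "input": push_distinct_below_orderby({"type": "distinct", "input": child["input"]}),
        }
    return node

plan = {"type": "distinct", "input": {"type": "orderby", "variable": "x",
                                      "input": {"type": "bgp", "patterns": []}}}
print(push_distinct_below_orderby(plan))
```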
Low-level optimization

Example: joining in BGPs

How, and in what order, should triple patterns be evaluated?

1. ?person a dbpedia-owl:Artist.

2. ?person rdfs:label ?name.

3. ?person dbpedia-owl:birthPlace ?birthPlace.

Different factors influence join order

Obtain the cardinality of each pattern

1. ?person a dbpedia-owl:Artist. → 200

2. ?person rdfs:label ?name. → 10,000

3. ?person dbpedia-owl:birthPlace ?birthPlace. → 1,000

Cardinalities can be pre-computed or estimated at runtime.
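One common heuristic, shown here as an assumption and not necessarily what any specific engine does, is a greedy join order: start with the pattern with the lowest cardinality, then keep adding the cheapest remaining pattern that shares a variable with the patterns already chosen, so that cartesian products are avoided where possible. With the cardinalities above, this gives the order 1, 3, 2.

```python
from typing import Dict, List, Set, Tuple

Pattern = Tuple[str, str, str]

def variables_of(pattern: Pattern) -> Set[str]:
    return {term for term in pattern if term.startswith("?")}

def greedy_join_order(cardinalities: Dict[Pattern, int]) -> List[Pattern]:
    """Smallest pattern first, then repeatedly the smallest pattern that shares a
    variable with the patterns chosen so far."""
    remaining = sorted(cardinalities, key=lambda p: cardinalities[p])
    if not remaining:
        return []
    order = [remaining.pop(0)]
    bound = variables_of(order[0])
    while remaining:
        connected = [p for p in remaining if variables_of(p) & bound]
        # If nothing is connected, a cartesian product is unavoidable: take the smallest
        chosen = connected[0] if connected else remaining[0]
        remaining.remove(chosen)
        order.append(chosen)
        bound |= variables_of(chosen)
    return order

# Cardinalities taken from the slide
cardinalities = {
    ("?person", "a", "dbpedia-owl:Artist"): 200,
    ("?person", "rdfs:label", "?name"): 10_000,
    ("?person", "dbpedia-owl:birthPlace", "?birthPlace"): 1_000,
}
for pattern in greedy_join_order(cardinalities):
    print(pattern, cardinalities[pattern])
```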

Select appropriate join algorithms

Joins usually happen pairwise, between two operands with n and m solutions

Join orders and algorithms influence worst-case complexity

→ Restrictions and preprocessing steps (sorting, hashing, ...) are not yet taken into account

→ Trade-off between different factors that influence join order
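To make the difference between join algorithms concrete, here is a generic sketch of two classic ones over solution mappings (not Comunica's implementation). A nested-loop join checks all n·m pairs; a hash join builds a hash table on one input and probes it once per mapping of the other, which is roughly n + m operations plus the output when the key covers all shared variables. The hash join here assumes every join variable is bound in every mapping, and the example data is hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

Mapping = Dict[str, str]

def compatible(a: Mapping, b: Mapping) -> bool:
    # Compatible = the mappings agree on every variable they share
    return all(a[v] == b[v] for v in a.keys() & b.keys())

def nested_loop_join(left: List[Mapping], right: List[Mapping]) -> List[Mapping]:
    # Checks every pair of mappings: n * m comparisons
    return [{**lm, **rm} for lm in left for rm in right if compatible(lm, rm)]

def hash_join(left: List[Mapping], right: List[Mapping], key: List[str]) -> List[Mapping]:
    # Build phase: index the left mappings on their values for the join variables
    table: DefaultDict[Tuple[str, ...], List[Mapping]] = defaultdict(list)
    for lm in left:
        table[tuple(lm[v] for v in key)].append(lm)
    # Probe phase: one lookup per right mapping instead of a scan over all of `left`
    result: List[Mapping] = []
    for rm in right:
        for lm in table.get(tuple(rm[v] for v in key), []):
            if compatible(lm, rm):  # re-check shared variables outside the key
                result.append({**lm, **rm})
    return result

# Hypothetical example data
labels = [{"?person": "ex:alice", "?name": "Alice"}, {"?person": "ex:bob", "?name": "Bob"}]
places = [{"?person": "ex:alice", "?birthPlace": "ex:York"},
          {"?person": "ex:carol", "?birthPlace": "ex:Leeds"}]
print(nested_loop_join(labels, places))
print(hash_join(labels, places, ["?person"]))
```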

3. Evaluation: executing the plan
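A common way to execute a plan is as a pipeline of iterators, so that first results can appear before all input has been processed. A minimal sketch with Python generators, assuming a simplified plan format (triple patterns as tuples, projected variables as plain strings) instead of the JSON term objects shown earlier; only `bgp`, `project` and `distinct` are supported, and the data is hypothetical.

```python
from itertools import islice
from typing import Any, Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Mapping = Dict[str, str]

def match(pattern: Triple, triple: Triple, mapping: Mapping) -> Optional[Mapping]:
    """Extend `mapping` so that `pattern` matches `triple`, or return None."""
    result = dict(mapping)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # setdefault binds unbound variables; a different existing binding is a mismatch
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return result

def evaluate(node: Dict[str, Any], graph: List[Triple]) -> Iterator[Mapping]:
    """Evaluate a plan tree as a pipeline of generators: results are produced lazily."""
    kind = node["type"]
    if kind == "bgp":
        def extend(mappings: Iterator[Mapping], pattern: Triple) -> Iterator[Mapping]:
            for mapping in mappings:
                for triple in graph:
                    extended = match(pattern, triple, mapping)
                    if extended is not None:
                        yield extended
        stream: Iterator[Mapping] = iter([{}])
        for pattern in node["patterns"]:
            stream = extend(stream, pattern)  # chain one generator per triple pattern
        return stream
    if kind == "project":
        variables = node["variables"]
        return ({v: m[v] for v in variables if v in m} for m in evaluate(node["input"], graph))
    if kind == "distinct":
        def dedup(mappings: Iterator[Mapping]) -> Iterator[Mapping]:
            seen: Set[Tuple[Tuple[str, str], ...]] = set()
            for mapping in mappings:
                key = tuple(sorted(mapping.items()))
                if key not in seen:
                    seen.add(key)
                    yield mapping
        return dedup(evaluate(node["input"], graph))
    raise ValueError(f"operator not supported in this sketch: {kind}")

# Hypothetical example data and plan
graph = [
    ("ex:alice", "rdf:type", "ex:Artist"), ("ex:alice", "rdfs:label", "Alice"),
    ("ex:bob", "rdf:type", "ex:Artist"), ("ex:bob", "rdfs:label", "Bob"),
]
plan = {"type": "project", "variables": ["?name"], "input": {"type": "bgp", "patterns": [
    ("?person", "rdf:type", "ex:Artist"), ("?person", "rdfs:label", "?name")]}}
print(list(evaluate(plan, graph)))             # [{'?name': 'Alice'}, {'?name': 'Bob'}]
print(list(islice(evaluate(plan, graph), 1)))  # only the first result is computed
```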

Common solution to Web querying: Centralization

Centralization is not always possible

How to query over decentralized data?

Approaches for querying over decentralized data

The client distributes the query over query APIs

Federation over SPARQL endpoints

SELECT ?drug ?title WHERE {
  ?drug db:drugCategory dbc:micronutrient.
  ?drug db:casRegistryNumber ?id.
  ?keggDrug rdf:type kegg:Drug.
  ?keggDrug bio2rdf:xRef ?id.
  ?keggDrug purl:title ?title.
}

SELECT ?drug ?title WHERE {
  SERVICE <http://example.com/drb> {
    ?drug db:drugCategory dbc:micronutrient.
    ?drug db:casRegistryNumber ?id.
  }
  SERVICE <http://example.com/kegg> {
    ?keggDrug rdf:type kegg:Drug.
    ?keggDrug bio2rdf:xRef ?id.
    ?keggDrug purl:title ?title.
  }
}
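The SERVICE version above can be derived from the original query through source selection. A minimal sketch, assuming a hypothetical static index of which predicates each endpoint contains; real federation engines may instead probe endpoints at runtime (for example with ASK queries). A pattern that several sources could answer would need a UNION, which this sketch does not handle.

```python
from typing import Dict, List, Set, Tuple

Pattern = Tuple[str, str, str]

def select_sources(patterns: List[Pattern], index: Dict[str, Set[str]]) -> Dict[str, List[Pattern]]:
    """Group triple patterns per source, based on a (hypothetical) predicate index."""
    groups: Dict[str, List[Pattern]] = {source: [] for source in index}
    for pattern in patterns:
        sources = [s for s, predicates in index.items() if pattern[1] in predicates]
        if not sources:
            # Also triggered by variable predicates, which this sketch does not support
            raise ValueError(f"no source can answer {pattern}")
        if len(sources) > 1:
            # Would require a UNION over the sources; out of scope here
            raise NotImplementedError(f"pattern matches several sources: {pattern}")
        groups[sources[0]].append(pattern)
    # Drop sources that received no patterns
    return {source: group for source, group in groups.items() if group}

def to_service_query(projection: str, groups: Dict[str, List[Pattern]]) -> str:
    """Render one SERVICE block per source (prefix declarations omitted, as on the slide)."""
    lines = [f"SELECT {projection} WHERE {{"]
    for source, group in groups.items():
        lines.append(f"  SERVICE <{source}> {{")
        lines.extend(f"    {s} {p} {o}." for s, p, o in group)
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines)

index = {  # hypothetical predicate index per endpoint
    "http://example.com/drb": {"db:drugCategory", "db:casRegistryNumber"},
    "http://example.com/kegg": {"rdf:type", "bio2rdf:xRef", "purl:title"},
}
patterns = [
    ("?drug", "db:drugCategory", "dbc:micronutrient"),
    ("?drug", "db:casRegistryNumber", "?id"),
    ("?keggDrug", "rdf:type", "kegg:Drug"),
    ("?keggDrug", "bio2rdf:xRef", "?id"),
    ("?keggDrug", "purl:title", "?title"),
]
print(to_service_query("?drug ?title", select_sources(patterns, index)))
```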

Federation over heterogeneous sources

Limitations of federated querying

The following slides make use of Comunica

💻 Using Comunica in a JS app

Learn more about using Comunica
