SPARQL Query Processing

Ruben Taelman

Web Fundamentals course 2022, 12 May 2022

SPARQL Query Processing

How do SPARQL query engines work?

Ghent University – imec – IDLab, Belgium

SPARQL querying over RDF datasets

Find all artists born in York

namedeathDate
Albert Joseph Moore
Charles Francis Hansom1888
David Reed (comedian)
Dustin Gee
E Ridsdale Tate1922

How do query engines process a query?

RDF dataset + SPARQL query

...

query results

How do query engines process a query?

RDF dataset + SPARQL query

SPARQL query processing

query results

Basic Graph Patterns enable graph pattern matching

Query results representation

1 solution sequence with 3 solution mappings:
namebirthplace
Bob Brockmannhttp://dbpedia.org/resource/Louisiana
Bennie Nawahihttp://dbpedia.org/resource/Honolulu
Weird Al Yankovichttp://dbpedia.org/resource/Downey,_California

Steps in SPARQL query processing

Parsing: transform query string into SPARQL Algebra

SPARQL Algebra example

Input:
SELECT ?x ?y ?z WHERE {
  ?x ?y ?z
}
Output:
{
  "type": "project",
  "input": {
    "type": "bgp",
    "patterns": [
      {
        "type": "pattern",
        "subject": {
          "termType": "Variable",
          "value": "x"
        },
        "predicate": {
          "termType": "Variable",
          "value": "y"
        },
        "object": {
          "termType": "Variable",
          "value": "z"
        }
      }
    ]
  },
  "variables": [
    { "termType": "Variable",
      "value": "x" },
    { "termType": "Variable",
      "value": "y" },
    { "termType": "Variable",
      "value": "z" }
  ]
}

Why is optimization needed?

Different levels of optimization

Algebraic optimization

Example: ORDER BY + DISTINCT

Input:
{
  "type": "distinct",
  "input": {
    "type": "orderby",
    "variable": "x",
    "input": { ... }
  }
}
Output:
{
  "type": "orderby",
  "variable": "x",
  "input": {
    "type": "distinct",
    "input": { ... }
  }
}

Low-level optimization

Example: joining in BGPs

How, and in what order should triple patterns be evaluated?

1. ?person a dbpedia-owl:Artist.

2. ?person rdfs:label ?name.

3. ?person dbpedia-owl:birthPlace ?birthPlace.

Earlier topic in Web Fundamentals course showed one technique
→ different approaches are possible!

Different factors influence join order

Obtain cardinality of each pattern

1. ?person a dbpedia-owl:Artist. 200

2. ?person rdfs:label ?name. 10.000

3. ?person dbpedia-owl:birthPlace ?birthPlace. 1.000

Can be pre-computed, or estimated at runtime.

Select appropriate join algorithms

Usually happens in pairs of two operators (n, m)

Join orders and algorithms influence worst-case complexity

→ Restrictions and preprocessing steps (sorting, hashing, ...) not yet taken into account

→ Trade-off between different factors that influence join order

Evaluating the optimized query plan

Evaluation iteratively handles leaf nodes

Based on operator type

{
  "type": "project",
  "variables": [ ... ]
  "input": {
    "type": "bgp",
    "patterns": [
      {
        "type": "pattern",
        "subject": { ... },
        "predicate": { ... },
        "object": { ... }
      },
      {
        "type": "pattern",
        "subject": { ... },
        "predicate": { ... },
        "object": { ... }
      }
    ]
  }
}

Evaluation iteratively handles leaf nodes

{
  "type": "project",
  "variables": [ ... ]
  "input": {
    "type": "bgp",
    "patterns": [
      {
        "type": "pattern",
        "subject": { ... },
        "predicate": { ... },
        "object": { ... }
      },
      {
        "type": "pattern",
        "subject": { ... },
        "predicate": { ... },
        "object": { ... }
      }
    ]
  }
}

Evaluation iteratively handles leaf nodes

{
  "type": "project",
  "variables": [ ... ]
  "input": {
    "type": "bgp",
    "patterns": [
      {
        ...solution sequence...
      },
      {
        "type": "pattern",
        "subject": { ... },
        "predicate": { ... },
        "object": { ... }
      }
    ]
  }
}

Evaluation iteratively handles leaf nodes

{
  "type": "project",
  "variables": [ ... ]
  "input": {
    "type": "bgp",
    "patterns": [
      {
        ...solution sequence...
      },
      {
        "type": "pattern",
        "subject": { ... },
        "predicate": { ... },
        "object": { ... }
      }
    ]
  }
}

Evaluation iteratively handles leaf nodes

{
  "type": "project",
  "variables": [ ... ]
  "input": {
    "type": "bgp",
    "patterns": [
      {
        ...solution sequence...
      },
      {
        ...solution sequence...
      }
    ]
  }
}

Evaluation iteratively handles leaf nodes

{
  "type": "project",
  "variables": [ ... ]
  "input": {
    "type": "bgp",
    "patterns": [
      {
        ...solution sequence...
      },
      {
        ...solution sequence...
      }
    ]
  }
}

Evaluation iteratively handles leaf nodes

{
  "type": "project",
  "variables": [ ... ]
  "input": {
    ...solution sequence...
  }
}

Evaluation iteratively handles leaf nodes

{
  "type": "project",
  "variables": [ ... ]
  "input": {
    ...solution sequence...
  }
}

Evaluation iteratively handles leaf nodes

...solution sequence...

Some operations can be parallelised

Pipelining produces results as soon as possible

A pipeline of operators

Adaptive processing responds to changing conditions

Conclusions