Storing and Querying Evolving Knowledge Graphs on the Web

Ruben Taelman

Public PhD Defense, IDLab, Ghent University — imec, 17 February 2020


Advisors: prof. dr. ir. Ruben Verborgh, dr. Miel Vander Sande

Ghent University – imec – IDLab, Belgium


The Web is a global information space

a.k.a. The World Wide Web (WWW)

News
Social
Media

The Web is focused on humans

Achieving tasks requires manual effort

What if intelligent agents
could do these tasks for us?

I Have a Dream


A Web where intelligent
agents are able to achieve
our day-to-day tasks.
2001

Tim Berners-Lee
inventor of the Web

1989, CERN, Switzerland

The dream: We're getting there

Agents are being introduced that can perform simple tasks

How do these agents work?

→ Execute Queries over Knowledge Graphs

Query = Structured Question
Knowledge Graph = Structured information

→ Structured questions over structured information
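Here is a minimal sketch of both ideas. It assumes a knowledge graph is a set of subject-predicate-object triples, and that a structured question is a single triple pattern with `None` in place of a variable. Real SPARQL queries use named variables such as `?person` and can combine many patterns. The `ex:` identifiers are made-up examples.

```python
from typing import Optional

Triple = tuple[str, str, str]
# Simplification: None marks a variable position in the pattern.
Pattern = tuple[Optional[str], Optional[str], Optional[str]]

def match(graph: set[Triple], pattern: Pattern) -> list[Triple]:
    """Return all triples that agree with the pattern on every bound position."""
    return sorted(
        t for t in graph
        if all(p is None or p == v for p, v in zip(pattern, t))
    )

graph = {
    ("ex:Ruben", "ex:worksAt", "ex:IDLab"),
    ("ex:Miel", "ex:worksAt", "ex:IDLab"),
    ("ex:IDLab", "ex:partOf", "ex:UGent"),
}

# Structured question: "Who works at IDLab?"
print(match(graph, (None, "ex:worksAt", "ex:IDLab")))
# [('ex:Miel', 'ex:worksAt', 'ex:IDLab'), ('ex:Ruben', 'ex:worksAt', 'ex:IDLab')]
```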

A Knowledge Graph is
a collection of structured information

An Evolving Knowledge Graph is
a collection of structured information that evolves over time

How to store and query evolving knowledge graphs on the Web?

Focus on four sub-challenges

  1. Generating evolving knowledge graphs

  2. Storing evolving knowledge graphs

  3. Querying knowledge graphs on the Web

  4. Querying evolving knowledge graphs on the Web

Experimentation requires realistic data

Experimentation on
evolving knowledge graph systems
requires realistic evolving data

Generating Public Transport Data

Population Distributions as Input

Population Distribution Belgium

Generation Algorithm
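The slides do not spell out the algorithm, so this is only a hypothetical sketch of a first step. It assumes the population distribution is a grid of cells with inhabitant counts. It also assumes stops are placed by weighted sampling, so densely populated cells are more likely to receive a stop. The function `place_stops` and the grid layout are illustrative, not PoDiGG's code. The later steps (edges, routes) are not shown.

```python
import random

# Hypothetical population grid: (x, y) cell -> number of inhabitants.
Grid = dict[tuple[int, int], int]

def place_stops(grid: Grid, k: int, seed: int = 42) -> list[tuple[int, int]]:
    """Pick up to k distinct populated cells, each draw weighted by population."""
    rng = random.Random(seed)  # fixed seed: the same grid yields the same stops
    remaining = {cell: pop for cell, pop in grid.items() if pop > 0}
    stops = []
    for _ in range(min(k, len(remaining))):
        cells = list(remaining)
        chosen = rng.choices(cells, weights=[remaining[c] for c in cells], k=1)[0]
        stops.append(chosen)
        del remaining[chosen]  # sample without replacement
    return stops

grid = {(0, 0): 5000, (0, 1): 1200, (1, 0): 30, (1, 1): 0}
print(place_stops(grid, 2))
```

Sampling without replacement guarantees distinct stops. With a single `random.choices` call, a dense cell could be picked repeatedly.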

Open-source Implementation

PoDiGG

Population Distribution-based GTFS Generator

https://github.com/PoDiGG/podigg

PoDiGG: Example

Stops

Edges

Routes

Realism Evaluation

Measured using distance functions


[Figure panels: Generated, Real, Random]
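This sketch uses one possible distance function. That choice is an assumption: the exact functions used in the evaluation are not given here. For every real stop, it sums the distance to the nearest generated stop. A lower score means the generated network lies closer to the real one, and a random network serves as a baseline.

```python
import math

Point = tuple[float, float]

def nearest_distance_sum(reference: list[Point], candidate: list[Point]) -> float:
    """Sum, over all reference points, of the distance to the closest candidate point."""
    return sum(min(math.dist(r, c) for c in candidate) for r in reference)

real = [(0.0, 0.0), (3.0, 4.0)]
generated = [(0.0, 1.0), (3.0, 4.0)]
random_pts = [(10.0, 10.0), (-5.0, 2.0)]

print(nearest_distance_sum(real, generated))  # 1.0
print(nearest_distance_sum(real, generated) < nearest_distance_sum(real, random_pts))  # True
```

This function is asymmetric: it ignores generated stops far away from any real stop. Adding the reverse direction would make it symmetric.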

Generate realistic
evolving knowledge graphs

How to store evolving knowledge efficiently,
while still allowing efficient queries?

Storing multiple knowledge graph versions

Problem: overlapping data between versions

Solution 1: only store the differences between versions

Regular Delta Chain
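Here is a minimal sketch of a regular delta chain. It assumes each delta is a pair of triple sets `(additions, deletions)` relative to the previous version. Reconstructing version i replays i deltas, so the cost grows with the length of the chain.

```python
Triple = tuple[str, str, str]
# A delta holds (additions, deletions) relative to the PREVIOUS version.
Delta = tuple[frozenset[Triple], frozenset[Triple]]

def materialize(snapshot: set[Triple], chain: list[Delta], version: int) -> set[Triple]:
    """Rebuild a version by replaying every delta from the snapshot (version 0) up to it."""
    triples = set(snapshot)
    for additions, deletions in chain[:version]:  # cost grows with the version number
        triples -= deletions
        triples |= additions
    return triples

a = ("ex:s", "ex:p", "ex:a")
b = ("ex:s", "ex:p", "ex:b")
c = ("ex:s", "ex:p", "ex:c")
snapshot = {a, b}
chain = [
    (frozenset({c}), frozenset({a})),  # version 1: {b, c}
    (frozenset({a}), frozenset({b})),  # version 2: {a, c}
]
print(materialize(snapshot, chain, 2) == {a, c})  # True
```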

Problem: querying becomes inefficient for long chains, because each version must be rebuilt by applying every preceding delta

Solution 2: store each version's differences relative to a snapshot

Alternative Delta Chain
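In the alternative chain sketched below, each delta is computed against the snapshot instead of against the previous version. It uses the same assumed `(additions, deletions)` representation. Any version is then a single delta application away from the snapshot. The price is that later deltas repeat earlier changes: here the delta of version 2 still contains the addition of `c` that first appeared in version 1.

```python
Triple = tuple[str, str, str]

def snapshot_delta(snapshot: set[Triple], version: set[Triple]) -> tuple[frozenset, frozenset]:
    """Delta of a version relative to the snapshot, not relative to the previous version."""
    return frozenset(version - snapshot), frozenset(snapshot - version)

def materialize(snapshot: set[Triple], delta: tuple[frozenset, frozenset]) -> set[Triple]:
    """One delta application suffices, whatever the version's position in the chain."""
    additions, deletions = delta
    return (snapshot - deletions) | additions

a = ("ex:s", "ex:p", "ex:a")
b = ("ex:s", "ex:p", "ex:b")
c = ("ex:s", "ex:p", "ex:c")
snapshot = {a, b}
versions = [{a, b}, {b, c}, {a, c}]
deltas = [snapshot_delta(snapshot, v) for v in versions]
print(deltas[2] == (frozenset({c}), frozenset({b})))  # True: c is added again
```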

New Query Algorithms

Open-source Implementation

OSTRICH

Offset-enabled Triple store for Changesets

https://github.com/rdfostrich/ostrich
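Offsets matter because a client paging through results should not force the store to enumerate every skipped result. The sketch below is a hypothetical in-memory index, not OSTRICH's storage layout. It only illustrates the general principle for one bound subject. When triples are kept sorted, the matches form a contiguous range, so offset k is found with two binary searches plus k.

```python
from bisect import bisect_left, bisect_right
from typing import Optional

Triple = tuple[str, str, str]

class SubjectIndex:
    """Hypothetical index sorted on (subject, predicate, object)."""

    def __init__(self, triples: list[Triple]):
        self.rows = sorted(triples)
        self.subjects = [s for s, _, _ in self.rows]  # key column for binary search

    def match_subject(self, subject: str, offset: int = 0,
                      limit: Optional[int] = None) -> list[Triple]:
        lo = bisect_left(self.subjects, subject)
        hi = bisect_right(self.subjects, subject)
        start = min(lo + offset, hi)  # jump straight to the offset, no scanning
        end = hi if limit is None else min(start + limit, hi)
        return self.rows[start:end]

index = SubjectIndex([
    ("ex:b", "ex:p", "ex:3"),
    ("ex:a", "ex:p", "ex:1"),
    ("ex:b", "ex:p", "ex:1"),
    ("ex:b", "ex:p", "ex:2"),
    ("ex:c", "ex:p", "ex:1"),
])
print(index.match_subject("ex:b", offset=1, limit=1))  # [('ex:b', 'ex:p', 'ex:2')]
```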

Experiment: Querying in single versions

VM (version materialization)

Experiment: Querying between versions

DM (delta materialization)

Experiment: Querying across versions

VQ (version query)
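The three query types can be stated compactly over a snapshot-plus-deltas layout. This naive sketch uses hypothetical class and method names. It materializes whole versions to answer each query, so it only pins down what VM, DM, and VQ return. OSTRICH's dedicated query algorithms are meant to avoid this cost.

```python
from typing import Optional

Triple = tuple[str, str, str]
Pattern = tuple[Optional[str], Optional[str], Optional[str]]

def matches(triple: Triple, pattern: Pattern) -> bool:
    return all(p is None or p == v for p, v in zip(pattern, triple))

class VersionedStore:
    """Naive store: version 0 is the snapshot, every version a delta relative to it."""

    def __init__(self, versions: list[set[Triple]]):
        self.snapshot = frozenset(versions[0])
        self.deltas = [(frozenset(v - self.snapshot), frozenset(self.snapshot - v))
                       for v in versions]

    def version(self, i: int) -> set[Triple]:
        additions, deletions = self.deltas[i]
        return set((self.snapshot - deletions) | additions)

    def vm(self, i: int, pattern: Pattern) -> set[Triple]:
        """Version materialization: pattern results within version i."""
        return {t for t in self.version(i) if matches(t, pattern)}

    def dm(self, i: int, j: int, pattern: Pattern) -> tuple[set[Triple], set[Triple]]:
        """Delta materialization: results (added, removed) going from version i to j."""
        before, after = self.vm(i, pattern), self.vm(j, pattern)
        return after - before, before - after

    def vq(self, pattern: Pattern) -> dict[Triple, list[int]]:
        """Version query: every result annotated with the versions it occurs in."""
        result: dict[Triple, list[int]] = {}
        for i in range(len(self.deltas)):
            for t in sorted(self.vm(i, pattern)):
                result.setdefault(t, []).append(i)
        return result

A = ("ex:s", "ex:p", "ex:a")
B = ("ex:s", "ex:p", "ex:b")
C = ("ex:s", "ex:p", "ex:c")
D = ("ex:t", "ex:p", "ex:d")
store = VersionedStore([{A, B}, {A, C}, {C, D}])
print(store.vq(("ex:s", None, None)))
```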

How to store evolving knowledge efficiently,
while still allowing efficient queries?

Web Querying is multifaceted

Need for a flexible querying platform,
to implement and compare different facets

A set of building blocks
for building query engines
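Comunica is built from actors, which subscribe to buses, and mediators, which select among the actors on a bus. The Python sketch below is only an analogy of that pattern, not Comunica's TypeScript API. `Actor`, `Bus`, `mediate_cheapest`, and the two join actors with their cost formulas are hypothetical. Each actor first tests whether it can handle an action and at what estimated cost. The mediator then runs the cheapest capable actor.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

@dataclass
class Actor:
    name: str
    test: Callable[[dict], Optional[float]]  # None: cannot handle; else estimated cost
    run: Callable[[dict], Any]

@dataclass
class Bus:
    actors: list[Actor] = field(default_factory=list)

    def subscribe(self, actor: Actor) -> None:
        self.actors.append(actor)

def mediate_cheapest(bus: Bus, action: dict) -> Any:
    """Mediator: test every actor on the bus, then run the cheapest capable one."""
    candidates = [(cost, actor) for actor in bus.actors
                  if (cost := actor.test(action)) is not None]
    if not candidates:
        raise LookupError("no actor on the bus can handle this action")
    _, best = min(candidates, key=lambda pair: pair[0])
    return best.run(action)

join_bus = Bus()
join_bus.subscribe(Actor(
    "nested-loop-join",
    test=lambda a: a["left"] * a["right"],  # handles any join; cost grows with the product
    run=lambda a: "nested-loop-join",
))
join_bus.subscribe(Actor(
    "hash-join",
    test=lambda a: a["left"] + a["right"] if a["equi"] else None,  # equality joins only
    run=lambda a: "hash-join",
))
print(mediate_cheapest(join_bus, {"left": 100, "right": 50, "equi": True}))  # hash-join
```

With this pattern, adding a new join algorithm means subscribing one more actor. Neither the mediator nor the other actors change, which is what makes different facets easy to implement and compare.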

Open-source Implementation

Comunica

A highly flexible query engine platform


https://github.com/comunica/comunica

Comunica is flexible at different levels

Example: find common interests between Ruben Verborgh and me

Handling different Web interfaces

→ Comunica detects the type of each interface to plan query execution
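Here is a hypothetical sketch of such detection. It classifies a source from vocabulary terms in its RDF response, assuming full IRIs appear in the body. The IRIs are real terms from the SPARQL 1.1 Service Description and Hydra vocabularies; TPF interfaces advertise their search form with `hydra:search`. The function and the plan descriptions are illustrative and simpler than what Comunica does.

```python
SD_ENDPOINT = "http://www.w3.org/ns/sparql-service-description#endpoint"
HYDRA_SEARCH = "http://www.w3.org/ns/hydra/core#search"

# Hypothetical execution strategy per detected interface type.
PLANS = {
    "sparql-endpoint": "send the whole query to the server",
    "tpf-interface": "request one fragment per triple pattern and join on the client",
    "rdf-document": "download the document and evaluate the query locally",
}

def detect_interface(body: str) -> str:
    """Classify a source by the vocabulary terms in its RDF response."""
    if SD_ENDPOINT in body:
        return "sparql-endpoint"
    if HYDRA_SEARCH in body:
        return "tpf-interface"
    return "rdf-document"  # no interface hints: treat it as a plain data dump

body = f"<http://example.org/fragment> <{HYDRA_SEARCH}> _:form ."
print(PLANS[detect_interface(body)])
```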

Need for a flexible querying platform,
to implement and compare different facets

How to query
evolving knowledge graphs
over the Web?

The Web is like a large audience
continuously asking questions

Theatre

[Figure: server and client load in the theatre model (Verborgh, 2014)]

The audience solves questions themselves

Library

[Figure: server and client load in the library model (Verborgh, 2014)]
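Triple Pattern Fragments follow the library model. The server only answers single triple patterns, and the client joins the results itself. This simplified nested-loop evaluation uses hypothetical names and made-up data for the common-interests example. Real TPF clients also use metadata in each fragment, such as estimated result counts, to choose a good pattern order.

```python
Triple = tuple[str, str, str]
Binding = dict[str, str]

def is_var(term: str) -> bool:
    return term.startswith("?")

def fragment(data: list[Triple], pattern: tuple[str, str, str]) -> list[Triple]:
    """Server side: answer one triple pattern; variables match any term."""
    return [t for t in data if all(is_var(p) or p == v for p, v in zip(pattern, t))]

def extend(binding: Binding, pattern: tuple[str, str, str], triple: Triple):
    """Add the pattern's variable values from a matching triple; None on a clash."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term) and new.setdefault(term, value) != value:
            return None  # the same variable would get two different values
    return new

def evaluate(data: list[Triple], patterns: list[tuple[str, str, str]]) -> list[Binding]:
    """Client side: nested-loop join, one fragment request per partially bound pattern."""
    bindings: list[Binding] = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            bound = tuple(binding.get(term, term) for term in pattern)  # substitute values
            for triple in fragment(data, bound):
                extended = extend(binding, bound, triple)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

data = [
    ("ex:RubenV", "foaf:topic_interest", "ex:LinkedData"),
    ("ex:RubenV", "foaf:topic_interest", "ex:Web"),
    ("ex:RubenT", "foaf:topic_interest", "ex:Web"),
    ("ex:RubenT", "foaf:topic_interest", "ex:Querying"),
]
query = [("ex:RubenV", "foaf:topic_interest", "?topic"),
         ("ex:RubenT", "foaf:topic_interest", "?topic")]
print(evaluate(data, query))  # [{'?topic': 'ex:Web'}]
```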

Focus on evolving data

Data that changes at predictable rates

Annotate data with time intervals
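Here is a minimal sketch that assumes each triple carries a validity interval `[valid_from, valid_until)`. The plain Python record stands in for however TPF-QS actually encodes time annotations in RDF. From the intervals, a client can filter the currently valid facts. It can also compute when its query results will next change, so it can schedule its next re-evaluation.

```python
from dataclasses import dataclass
from typing import Optional

Triple = tuple[str, str, str]

@dataclass(frozen=True)
class TimedTriple:
    triple: Triple
    valid_from: float   # start of validity (inclusive), e.g. seconds since some epoch
    valid_until: float  # end of validity (exclusive)

def valid_at(data: list[TimedTriple], t: float) -> list[Triple]:
    """Facts that hold at moment t."""
    return [d.triple for d in data if d.valid_from <= t < d.valid_until]

def next_refresh(data: list[TimedTriple], t: float) -> Optional[float]:
    """Earliest moment after t at which the valid set changes: when to re-evaluate."""
    changes = [d.valid_until for d in data if d.valid_from <= t < d.valid_until]
    changes += [d.valid_from for d in data if d.valid_from > t]
    return min(changes, default=None)

# Hypothetical train delay that changes at a predictable moment.
delays = [
    TimedTriple(("ex:train1", "ex:delay", "0"), 0, 60),
    TimedTriple(("ex:train1", "ex:delay", "5"), 60, 120),
]
print(valid_at(delays, 30), next_refresh(delays, 30))  # [('ex:train1', 'ex:delay', '0')] 60
```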

Open-source Implementation

TPF-QS

TPF-QS: Triple Pattern Fragments Query Streamer

https://github.com/LinkedDataFragments/QueryStreamer.js

Experiment: Publisher effort lowered

Scalability

How to query
evolving knowledge graphs
over the Web?

Storing and querying evolving knowledge graphs on the Web

How will this affect you?