Storing and Querying Evolving Knowledge Graphs on the Web

Public PhD Defense, IDLab, Ghent University — imec, 17 February 2020

Ruben Taelman

Advisors: prof. dr. ir. Ruben Verborgh, dr. Miel Vander Sande

Ghent University – imec – IDLab, Belgium

The Web is a global information space

a.k.a. The World Wide Web (WWW)

Mostly used by humans through Web browsers

Chrome, Edge, Firefox, Internet Explorer, Safari

News

Web is focused on humans

Web pages show information
- Visualized using Web browsers

Clicking on links
- To discover new information

Searching for information
- Using search engines such as Google, Bing, ...

Achieving tasks requires manual effort

Will it rain next week?
1. Find a weather prediction website
2. Select your location
3. Navigate to next week
Finding a travel time and destination for a group of people
- Comparing agendas
- Comparing interests: nature, museums, ...
- Regional temperatures
- Geopolitical status
- ...

What if intelligent agents
could do these tasks for us?

I Have a Dream

A Web where intelligent
agents are able to achieve
our day-to-day tasks.
2001

Tim Berners-Lee
inventor of the Web

1989, CERN Switzerland

The dream: We're getting there

Agents are being introduced that can perform simple tasks

Google Assistant, Alexa, Siri, ...

How do these agents work?

→ Execute Queries over Knowledge Graphs

Query = Structured Question
Knowledge Graph = Structured information

→ Structured questions over structured information
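
As a minimal sketch of "structured questions over structured information", the snippet below stores a tiny knowledge graph as (subject, predicate, object) statements and answers a question by pattern matching. The data, the predicate names, and the `match` helper are all made up for illustration; real agents use far richer graphs and query languages.

```python
from typing import Optional

Triple = tuple[str, str, str]

# A tiny knowledge graph: structured (subject, predicate, object) statements.
# All names and values here are hypothetical.
GRAPH: set[Triple] = {
    ("Ghent", "locatedIn", "Belgium"),
    ("Ghent", "forecastNextWeek", "rain"),
    ("Brussels", "locatedIn", "Belgium"),
    ("Brussels", "forecastNextWeek", "sun"),
}


def match(
    graph: set[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> list[Triple]:
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Structured question: "What is the forecast for Ghent next week?"
print(match(GRAPH, s="Ghent", p="forecastNextWeek"))
```

The same question that took three manual steps in a browser becomes a single pattern that a program can evaluate.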

A Knowledge Graph is
a collection of structured information

News

Graph

An Evolving Knowledge Graph is
a collection of structured information that evolves over time

News

Graph

Web ✅
Knowledge Graphs ✅

How to store and query evolving knowledge graphs on the Web?

Focus on four sub-challenges

Generating evolving knowledge graphs
Storing evolving knowledge graphs
Querying knowledge graphs on the Web
Querying evolving knowledge graphs on the Web

Generation
Storage
Querying
Querying Evolving

Experimentation requires realistic data

Experimentation on
evolving knowledge graph systems
requires realistic evolving data

Generating Public Transport Data

Time
Space
Complex relationships

Population Distributions as Input

Population Distribution Belgium

Positive correlation between population and stops

Generation Algorithm

Based on existing techniques
- Public transport network design

Multiple generation phases
- Stops, Edges, Routes, Trips
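
To illustrate the idea of using a population distribution as input, here is a rough sketch of a first "Stops" phase only. It places stops in grid cells with a probability proportional to the cell's population. This is not PoDiGG's actual algorithm: the grid, the `generate_stops` function, and the sampling strategy are assumptions chosen for brevity, and the Edges, Routes, and Trips phases are left out.

```python
import random

Cell = tuple[int, int]


def generate_stops(population: dict[Cell, int], n_stops: int, seed: int = 0) -> list[Cell]:
    """Sample cells for stops, weighted by population (illustrative only).

    A cell may be drawn more than once, which models several stops in one cell.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    cells = list(population)
    weights = [population[cell] for cell in cells]
    return rng.choices(cells, weights=weights, k=n_stops)


# Hypothetical 2x2 population grid: one dense cell, one empty cell.
population = {(0, 0): 5000, (0, 1): 100, (1, 0): 0, (1, 1): 900}
stops = generate_stops(population, n_stops=200)

for cell in population:
    print(cell, stops.count(cell))
```

Densely populated cells receive most of the stops and unpopulated cells receive none, which mirrors the positive correlation between population and stops.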

Open-source Implementation

Population Distribution-based GTFS Generator

https://github.com/PoDiGG/podigg

PoDiGG: Example

Stops
Edges
Routes

Realism Evaluation

Measured using distance functions

Generated, Real, Random
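
The slides do not state which distance functions were used. As one possible illustration, the sketch below compares two sets of stop coordinates with a symmetric mean nearest-neighbour distance; the function names and the sample coordinates are assumptions.

```python
import math

Point = tuple[float, float]


def mean_nearest_distance(a: list[Point], b: list[Point]) -> float:
    """Average distance from each point in a to its closest point in b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def set_distance(a: list[Point], b: list[Point]) -> float:
    """Symmetric variant; it is 0.0 when both sets coincide."""
    return (mean_nearest_distance(a, b) + mean_nearest_distance(b, a)) / 2


# Hypothetical stop coordinates.
real = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
generated = [(0.0, 0.1), (1.0, 0.1), (2.0, 0.1)]
random_like = [(5.0, 5.0), (6.0, 7.0), (9.0, 1.0)]

print(set_distance(real, generated))
print(set_distance(real, random_like))
```

Under such a measure, a realistic generator should score closer to the real data than a purely random one does.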

Generate realistic
evolving knowledge graphs

✅

How to store evolving knowledge efficiently,
while still allowing efficient queries?

Storing multiple knowledge graph versions

Problem: overlapping data between versions

Solution 1: only store differences between versions

Regular Delta Chain

Problem: querying inefficient for long chains

Solution 2: store differences relative to a snapshot

Alternative Delta Chain
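
The two chain layouts can be contrasted with a small sketch. It models each version as a set of triples and each delta as a pair of additions and deletions. This is only a conceptual model under those assumptions, not OSTRICH's storage format; all function names are hypothetical.

```python
Triple = tuple[str, str, str]
Delta = tuple[set[Triple], set[Triple]]  # (additions, deletions)


def diff(old: set[Triple], new: set[Triple]) -> Delta:
    return (new - old, old - new)


def apply_delta(base: set[Triple], delta: Delta) -> set[Triple]:
    additions, deletions = delta
    return (base - deletions) | additions


def regular_chain(versions: list[set[Triple]]) -> list[Delta]:
    """Delta i describes version i+1 relative to version i."""
    return [diff(versions[i], versions[i + 1]) for i in range(len(versions) - 1)]


def snapshot_chain(versions: list[set[Triple]]) -> list[Delta]:
    """Delta i describes version i+1 relative to the initial snapshot."""
    return [diff(versions[0], v) for v in versions[1:]]


def materialize_regular(snapshot: set[Triple], deltas: list[Delta], version: int) -> set[Triple]:
    graph = set(snapshot)
    for delta in deltas[:version]:  # cost grows with the chain length
        graph = apply_delta(graph, delta)
    return graph


def materialize_snapshot(snapshot: set[Triple], deltas: list[Delta], version: int) -> set[Triple]:
    if version == 0:
        return set(snapshot)
    return apply_delta(snapshot, deltas[version - 1])  # always a single delta


def book(name: str) -> Triple:
    return ("library", "hasBook", name)


versions = [
    {book("A"), book("B")},
    {book("A"), book("B"), book("C")},
    {book("B"), book("C")},
    {book("B"), book("C"), book("D")},
]
regular = regular_chain(versions)
relative = snapshot_chain(versions)
print(materialize_regular(versions[0], regular, 3) == materialize_snapshot(versions[0], relative, 3))
```

The trade-off is visible in the two `materialize_*` functions: the regular chain replays every delta up to the requested version, while the snapshot-relative chain applies exactly one, at the price of deltas that repeat earlier changes.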

New Query Algorithms

Querying within a single version
- Which books were available in the library yesterday?
Querying between versions
- Which books were exchanged between last week and today?
Querying across versions
- At what times was this book present?
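
The three kinds of versioned queries can be sketched over a plain list of fully materialized versions. The storage layout is deliberately ignored here, and the data and function names are assumptions for illustration.

```python
# Each entry is the set of books present in one version of the library.
VERSIONS: list[set[str]] = [
    {"Dune", "Emma"},             # version 0
    {"Dune", "Emma", "Ulysses"},  # version 1
    {"Emma", "Ulysses"},          # version 2
]


def query_version(versions: list[set[str]], version: int) -> list[str]:
    """Within a single version: which books were available?"""
    return sorted(versions[version])


def query_between(versions: list[set[str]], start: int, end: int) -> tuple[list[str], list[str]]:
    """Between two versions: which books were added and which were removed?"""
    added = sorted(versions[end] - versions[start])
    removed = sorted(versions[start] - versions[end])
    return added, removed


def query_across(versions: list[set[str]], book: str) -> list[int]:
    """Across all versions: in which versions was this book present?"""
    return [v for v, books in enumerate(versions) if book in books]


print(query_version(VERSIONS, 1))
print(query_between(VERSIONS, 0, 2))
print(query_across(VERSIONS, "Dune"))
```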

Open-source Implementation

OSTRICH

Offset-enabled Triple store for Changesets

https://github.com/rdfostrich/ostrich

Experiment: Querying in single versions

Experiment: Querying between versions

Experiment: Querying across versions

How to store evolving knowledge efficiently,
while still allowing efficient queries?

✅

Web Querying is multifaceted

Query languages and extensions
Web interfaces
Query algorithms
...

Need for a flexible querying platform,
to implement and compare different facets

A set of building blocks
for building query engines

Each block is very simple
Blocks can connect to other blocks via strict interfaces
Combinations of blocks can achieve complex tasks
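
The building-block idea can be sketched as follows: every block obeys one strict interface (triples in, triples out), so blocks can be connected freely. This only illustrates the principle; it is not Comunica's actual API, and all names below are hypothetical.

```python
from functools import reduce
from itertools import islice
from typing import Callable, Iterable

Triple = tuple[str, str, str]
# The strict interface: a block turns a stream of triples into a stream of triples.
Block = Callable[[Iterable[Triple]], Iterable[Triple]]


def with_subject(subject: str) -> Block:
    return lambda triples: (t for t in triples if t[0] == subject)


def with_predicate(predicate: str) -> Block:
    return lambda triples: (t for t in triples if t[1] == predicate)


def limit(n: int) -> Block:
    return lambda triples: islice(triples, n)


def connect(*blocks: Block) -> Block:
    """Chain simple blocks into one block that performs a more complex task."""
    return lambda triples: reduce(lambda data, block: block(data), blocks, triples)


# Made-up data and a small "engine" assembled from three blocks.
DATA: list[Triple] = [
    ("ruben", "interest", "Web"),
    ("ruben", "name", "Ruben"),
    ("ruben", "interest", "Music"),
    ("miel", "interest", "Web"),
]
engine = connect(with_subject("ruben"), with_predicate("interest"), limit(1))
print(list(engine(DATA)))
```

Because `connect` itself returns a block, assembled engines can in turn be reused as parts of larger ones.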

Open-source Implementation

Comunica

A highly flexible query engine platform

https://github.com/comunica/comunica

Comunica is flexible at different levels

Example: find common interests between Ruben Verborgh and me

Handling different Web interfaces

Ruben Verborgh's profile: RDF Turtle file
Ruben Taelman's profile: HTML file with RDFa
DBpedia: Triple Pattern Fragments interface

→ Comunica detects type of interface to plan query execution
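
How such detection could work in principle is sketched below. The slides do not describe Comunica's detection logic, so this is a guess at the general shape: the `response` dictionary, its `content_type` and `has_tpf_controls` fields, and the returned labels are all assumptions. Only the media types `text/turtle` and `text/html` are standard.

```python
def detect_interface(response: dict) -> str:
    """Pick an interface type from (hypothetical) response metadata."""
    # Assumed flag: set when the response advertises triple pattern search controls.
    if response.get("has_tpf_controls"):
        return "triple-pattern-fragments"
    content_type = response.get("content_type", "")
    if content_type.startswith("text/turtle"):
        return "turtle-file"
    if content_type.startswith("text/html"):
        return "html-with-rdfa"
    return "unknown"


# The three sources from the example, described with made-up metadata.
sources = {
    "Ruben Verborgh's profile": {"content_type": "text/turtle"},
    "Ruben Taelman's profile": {"content_type": "text/html; charset=utf-8"},
    "DBpedia": {"content_type": "text/turtle", "has_tpf_controls": True},
}
for name, response in sources.items():
    print(name, "->", detect_interface(response))
```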

Need for a flexible querying platform,
to implement and compare different facets

✅

How to query
evolving knowledge graphs
over the Web?

The Web is like a large audience
continuously asking questions

Theatre (figure: server and client load; Verborgh 2014)

Audience solves questions themselves

Library (figure: server and client load; Verborgh 2014)

Focus on evolving data

Static data: clients ask a single question
Evolving data: clients continuously ask questions
→ Increase in publisher effort

→ How to limit this increase?

Data that changes at predictable rates

Playlist of radio station
17:20 - 17:27: Max Richter - On The Nature Of Daylight
17:27 - 17:31: Hans Zimmer - Time

Annotate data with time intervals

Publisher:
17:20 - 17:27: Max Richter - On The Nature Of Daylight
17:27 - 17:31: Hans Zimmer - Time
Query at 17:25:
17:20 - 17:27: Max Richter - On The Nature Of Daylight
Query at 17:27:
17:27 - 17:31: Hans Zimmer - Time
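
The playlist example can be sketched in a few lines. The sketch assumes half-open intervals (a song is valid from its start up to, but not including, its end), which matches the query results shown above. The `query` function and the data layout are illustrative assumptions, not the TPF-QS implementation.

```python
from datetime import time
from typing import Optional

# Publisher data: each song is annotated with its validity interval [start, end).
PLAYLIST = [
    (time(17, 20), time(17, 27), "Max Richter - On The Nature Of Daylight"),
    (time(17, 27), time(17, 31), "Hans Zimmer - Time"),
]


def query(playlist, now: time) -> Optional[tuple[str, time]]:
    """Return the current song and the time until which that answer stays valid."""
    for start, end, song in playlist:
        if start <= now < end:
            return song, end
    return None


print(query(PLAYLIST, time(17, 25)))
print(query(PLAYLIST, time(17, 27)))
```

Because each answer carries its own expiry time, a client can wait until that moment before asking again instead of polling continuously, which is what keeps the publisher's effort low.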

Open-source Implementation

TPF-QS: Triple Pattern Fragments Query Streamer

https://github.com/LinkedDataFragments/QueryStreamer.js

Experiment: Publisher effort lowered

Scalability

How to query
evolving knowledge graphs
over the Web?

✅

Storing and querying evolving knowledge graphs on the Web

How will this affect you?

Behind the scenes
- No direct contact
Intelligent agents
- More complex questions, continuous questions
Personal agents
- Control, transparency, privacy, ...

→ More research is needed

https://phd.rubensworks.net/