Querying in Solid

Ruben Taelman

Ghent University – imec – IDLab, Belgium

Solid

a Web-based decentralization ecosystem

Users are in full control over their own data
Built on Web standards
→ How to query efficiently?

How Solid works
Querying in Solid
Ongoing query research

Personal data pods

Full control of where your pod is stored and who can access it

Pods can store any kind of data

Personal data, photos, friends, ...

Data becomes decoupled from apps

Today: data and app are tightly coupled
No choice over where and how data is stored, and who can access it
Solid: data and app are decoupled
Apps require read/write permissions from the user

A paradigm shift in app design

Storage of data is decentralised
Data is stored in the user's pod instead of in the app
Combining multiple data pods
Apps become views over one or more data pods
Explicit access control
Apps can only view or modify (parts of) your data after explicit approval

Solid spreads data across many sources.
How to build efficient Solid apps?

How Solid works
Querying in Solid
Ongoing query research

Queries as abstraction layer

Hide the complexities of reading and writing for app developers

Say what needs to happen, not how
Declarative queries hide complexities of data retrieval
Queries are reusable
Queries are not API-specific
Execution via a generic, reusable query engine
Abstracts away complexities for executing queries
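
To make "say what needs to happen, not how" concrete, here is a minimal sketch of a generic engine that evaluates declarative triple patterns over in-memory data. The data, the shortened term names and the function names are assumptions for illustration only; this is not Comunica's API.

```typescript
// A triple and a triple pattern; terms starting with "?" are variables.
type Triple = [string, string, string];
type Bindings = Record<string, string>;

// Try to extend existing bindings so that `pattern` matches `triple`.
function matchPattern(pattern: Triple, triple: Triple, bindings: Bindings): Bindings | undefined {
  const result: Bindings = { ...bindings };
  for (let i = 0; i < 3; i++) {
    const term = pattern[i];
    if (term.startsWith('?')) {
      // A variable must keep the value it was bound to earlier.
      if (result[term] !== undefined && result[term] !== triple[i]) return undefined;
      result[term] = triple[i];
    } else if (term !== triple[i]) {
      return undefined;
    }
  }
  return result;
}

// Generic engine: evaluates any list of patterns over any list of triples.
function evaluate(patterns: Triple[], data: Triple[]): Bindings[] {
  let solutions: Bindings[] = [{}];
  for (const pattern of patterns) {
    const next: Bindings[] = [];
    for (const solution of solutions) {
      for (const triple of data) {
        const extended = matchPattern(pattern, triple, solution);
        if (extended) next.push(extended);
      }
    }
    solutions = next;
  }
  return solutions;
}

// Hypothetical data, as it might be gathered from a pod.
const data: Triple[] = [
  ['alice', 'knows', 'bob'],
  ['alice', 'knows', 'carol'],
  ['bob', 'name', 'Bob'],
  ['carol', 'name', 'Carol'],
];

// The app only states what it wants: the names of people Alice knows.
const friendNames = evaluate(
  [['alice', 'knows', '?friend'], ['?friend', 'name', '?name']],
  data,
).map(b => b['?name']);
```

The app-side code is only the final pattern list; the same `evaluate` function can be reused for any other query over any other data.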

Comunica is a meta query engine

Collection of building blocks
Independent modules implement specific functionality
Runs anywhere (Web browser, server-side, ...)
Written in TypeScript/JavaScript
Suitable for both research and production use
Open governance under the Comunica Association

https://comunica.dev/

How to query within the Solid ecosystem?

Link Traversal-based Query Processing

Query engine follows links based on the Linked Data principles

Traditional link traversal has problems

Termination
Many documents on the Web with many links to follow
Worst case: download the full Web for each query execution
Speed
Following links is expensive: each one may require setting up a TCP/HTTP connection
No indexing and query planning
Completeness
Depending on seed documents, different results may be obtained
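
The termination and completeness problems can be illustrated with a minimal sketch of link traversal over a hypothetical in-memory "Web". The URLs, the `web` map and the `maxDocuments` bound are assumptions of this sketch; a real engine would dereference each URL over HTTP and extract links from the parsed RDF.

```typescript
// Hypothetical in-memory "Web": each document URL maps to the links it contains.
const web: Record<string, string[]> = {
  'https://a.example/': ['https://b.example/', 'https://c.example/'],
  'https://b.example/': ['https://d.example/'],
  'https://c.example/': [],
  'https://d.example/': ['https://a.example/'],
  'https://e.example/': ['https://c.example/'],
};

// Breadth-first traversal from seed documents, bounded by maxDocuments
// so that it terminates even on a very large or cyclic link graph.
function traverse(seeds: string[], maxDocuments: number): string[] {
  const visited = new Set<string>();
  const queue = seeds.slice();
  while (queue.length > 0 && visited.size < maxDocuments) {
    const url = queue.shift() as string;
    if (visited.has(url)) continue;
    visited.add(url); // stands in for an HTTP GET plus parsing
    for (const link of web[url] ?? []) {
      if (!visited.has(link)) queue.push(link);
    }
  }
  return Array.from(visited);
}

const fromA = traverse(['https://a.example/'], 10); // reaches a, b, c and d
const fromE = traverse(['https://e.example/'], 10); // reaches only e and c
const capped = traverse(['https://a.example/'], 2); // stops after two documents
```

Different seeds reach different documents, and therefore may yield different query results; the document bound trades completeness for termination.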

Link traversal works quite well for Solid

Exploit structural properties of Solid pods
Pods use Linked Data Platform
Guaranteed to find all files in a user's pod
Make use of document-based indexes
Solid type index: links to all resources of a certain class
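
A minimal sketch of how these two structural properties might be exploited, assuming a hypothetical pod layout held in memory. Containers stand in for Linked Data Platform containers and their ldp:contains listings, and the type index is modelled as a plain map from class to locations; all URLs and function names are illustrative.

```typescript
// Hypothetical pod layout: each container URL maps to the resources it
// lists via ldp:contains. URLs ending in "/" are containers.
const containers: Record<string, string[]> = {
  'https://pod.example/': ['https://pod.example/profile/', 'https://pod.example/photos/'],
  'https://pod.example/profile/': ['https://pod.example/profile/card'],
  'https://pod.example/photos/': ['https://pod.example/photos/1.ttl', 'https://pod.example/photos/2.ttl'],
};

// Hypothetical type index: class IRI -> locations holding instances of it.
const typeIndex: Record<string, string[]> = {
  'http://schema.org/Photograph': ['https://pod.example/photos/'],
};

// Enumerate all non-container documents below a container, recursively.
function listDocuments(container: string): string[] {
  const documents: string[] = [];
  for (const member of containers[container] ?? []) {
    if (member.endsWith('/')) documents.push(...listDocuments(member));
    else documents.push(member);
  }
  return documents;
}

// Use the type index to restrict traversal to relevant locations,
// falling back to the whole pod when the class is not indexed.
function documentsForClass(root: string, classIri: string): string[] {
  const locations = typeIndex[classIri] ?? [root];
  const documents: string[] = [];
  for (const location of locations) {
    if (location.endsWith('/')) documents.push(...listDocuments(location));
    else documents.push(location);
  }
  return documents;
}

const allDocuments = listDocuments('https://pod.example/');
const photoDocuments = documentsForClass('https://pod.example/', 'http://schema.org/Photograph');
```

Container listings give the guarantee of finding every file from the pod root; the type index lets the engine skip the parts of the pod that cannot contain the requested class.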

https://comunica.dev/research/link_traversal/

How Solid works
Querying in Solid
Ongoing query research

Inefficient query plans

Traditionally: the number of links is the bottleneck for link traversal
Due to structural properties of Solid pods, this is less of a problem
Query engines must use heuristics for query planning
No statistics available prior to query execution
Hartig, Olaf. "Zero-knowledge query planning for an iterator implementation of link traversal based query execution."
Need for adaptive query planning
Modify query plan during traversal
Discovery of cardinality estimates and indexes
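
One possible shape of adaptive planning, as a minimal sketch: start from a heuristic order, then re-plan once cardinality estimates are discovered during traversal. The heuristic (fewer variables first), the rule that patterns with known estimates go first, and all names are assumptions of this sketch, not the approach of any particular engine.

```typescript
interface Pattern {
  id: string;
  variables: number; // number of variables in the triple pattern
  estimate?: number; // cardinality estimate, unknown before execution
}

// Order patterns: known estimates first (ascending); patterns without
// an estimate fall back to the heuristic "fewer variables first".
function plan(patterns: Pattern[]): string[] {
  return patterns
    .slice()
    .sort((a, b) => {
      if (a.estimate !== undefined && b.estimate !== undefined) return a.estimate - b.estimate;
      if (a.estimate !== undefined) return -1;
      if (b.estimate !== undefined) return 1;
      return a.variables - b.variables;
    })
    .map(p => p.id);
}

// Attach estimates discovered during traversal to the matching patterns.
function refine(patterns: Pattern[], discovered: Record<string, number>): Pattern[] {
  return patterns.map(p =>
    discovered[p.id] !== undefined ? { ...p, estimate: discovered[p.id] } : p);
}

const patterns: Pattern[] = [
  { id: 'p1', variables: 2 },
  { id: 'p2', variables: 1 },
  { id: 'p3', variables: 2 },
];

// Before execution: heuristics only.
const initialPlan = plan(patterns);
// During traversal: estimates for p1 and p3 become available, so re-plan.
const adaptedPlan = plan(refine(patterns, { p1: 500, p3: 5 }));
```

Here the initial plan is p2, p1, p3, while the adapted plan becomes p3, p1, p2 once the estimates are known.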

Hybrid query execution

Solid pods are currently document-based
Collection of Linked Data documents
Pods could expose more expressive interfaces
SPARQL endpoints, TPF, SPF, ...
Need for query execution over heterogeneous sources
How to do this in an adaptive manner?
Query engines only discover these interfaces during query execution
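
A minimal sketch of interface selection over heterogeneous sources, assuming each discovered source advertises the interfaces it supports. The `SourceInfo` shape, the preference order and the URLs are hypothetical.

```typescript
type Interface = 'sparql' | 'tpf' | 'document';

interface SourceInfo {
  url: string;
  interfaces: Interface[]; // interfaces advertised by the source, if any
}

// Preference order: most expressive interface first (an assumption of this sketch).
const preference: Interface[] = ['sparql', 'tpf', 'document'];

function selectInterface(source: SourceInfo): Interface {
  for (const candidate of preference) {
    if (source.interfaces.indexOf(candidate) >= 0) return candidate;
  }
  return 'document'; // nothing advertised: treat as a plain Linked Data document
}

// Group sources by the interface they will be queried through.
// This can be re-run whenever traversal discovers a new source.
function assign(sources: SourceInfo[]): Record<Interface, string[]> {
  const groups: Record<Interface, string[]> = { sparql: [], tpf: [], document: [] };
  for (const source of sources) groups[selectInterface(source)].push(source.url);
  return groups;
}

const groups = assign([
  { url: 'https://pod1.example/', interfaces: ['document'] },
  { url: 'https://pod2.example/sparql', interfaces: ['tpf', 'sparql'] },
  { url: 'https://pod3.example/', interfaces: [] },
]);
```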

Exploit structural information

Users can structure their pod in a certain way
Place all photos in directories based on country
Query engines may exploit this information
If pods expose this information
Relevant for query planning
Pruning of documents and prioritization
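
A minimal sketch of pruning and prioritization, assuming a pod exposes that photos live under /photos/{country}/. The URL template, the pod URLs and the function name are hypothetical.

```typescript
// Hypothetical structure description exposed by a pod:
// photos are stored under /photos/{country}/.
const template = /^https:\/\/pod\.example\/photos\/([^\/]+)\//;

// Decide which documents to fetch, and in what order, for a query
// about photos taken in a given country.
function planDocuments(documents: string[], country: string): string[] {
  const first: string[] = []; // inside the structure and matching the country
  const later: string[] = []; // outside the known structure: cannot be ruled out
  for (const url of documents) {
    const match = template.exec(url);
    if (!match) later.push(url);
    else if (match[1] === country) first.push(url);
    // Documents for other countries are pruned.
  }
  return first.concat(later);
}

const toFetch = planDocuments(
  [
    'https://pod.example/photos/belgium/1.ttl',
    'https://pod.example/photos/japan/2.ttl',
    'https://pod.example/profile/card',
    'https://pod.example/photos/belgium/3.ttl',
  ],
  'belgium',
);
```

The Japanese photo document is never fetched, and the two Belgian documents are scheduled ahead of the profile document.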

Reasoning at query time

Different pods/apps may use different vocabularies
Schema.org, FOAF, Wikidata, ...
Apps issue queries in a single vocabulary
Query engine should perform schema alignment
Reasoning over partial and streaming knowledge
How to do this efficiently?
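
One simple form of schema alignment that works on streaming data is to rewrite each incoming triple to the app's vocabulary as it arrives. The sketch below assumes a hypothetical alignment that treats schema:name as equivalent to foaf:name; whether such a mapping is appropriate depends on the vocabularies involved.

```typescript
type Triple = [string, string, string];

const FOAF_NAME = 'http://xmlns.com/foaf/0.1/name';
const SCHEMA_NAME = 'http://schema.org/name';

// Hypothetical alignment: foreign predicate -> predicate used by the app.
const toAppVocabulary: Record<string, string> = {
  [SCHEMA_NAME]: FOAF_NAME,
};

// Rewrite a single triple; it needs no knowledge of other triples,
// so it can be applied while documents are still being discovered.
function align(triple: Triple): Triple {
  const [subject, predicate, object] = triple;
  return [subject, toAppVocabulary[predicate] ?? predicate, object];
}

// Triples as they might arrive from two pods using different vocabularies.
const incoming: Triple[] = [
  ['https://alice.example/#me', SCHEMA_NAME, 'Alice'],
  ['https://bob.example/#me', FOAF_NAME, 'Bob'],
  ['https://bob.example/#me', 'http://xmlns.com/foaf/0.1/knows', 'https://alice.example/#me'],
];

// The app queries in a single vocabulary (FOAF) and still finds both names.
const names = incoming
  .map(align)
  .filter(t => t[1] === FOAF_NAME)
  .map(t => t[2]);
```

Per-triple rewriting covers only simple equivalences; richer reasoning may need rules that span several triples.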

Summarization across multiple pods

Data may be aggregated across multiple Solid pods
Usage within a family context, workplace, ...
Query engines can exploit these summaries
Query planning and source selection
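
A minimal sketch of source selection with summaries, assuming each pod's summary is simply the set of predicates it contains. The `PodSummary` shape and the pod URLs are hypothetical; real summaries could be far more compact or more detailed.

```typescript
interface PodSummary {
  pod: string;
  predicates: string[]; // predicates that occur somewhere in the pod
}

// Keep only pods that can contribute to at least one predicate of the
// query. With joins across pods, a pod need not hold every predicate.
function selectSources(summaries: PodSummary[], queryPredicates: string[]): string[] {
  return summaries
    .filter(s => queryPredicates.some(p => s.predicates.indexOf(p) >= 0))
    .map(s => s.pod);
}

const summaries: PodSummary[] = [
  { pod: 'https://alice.example/', predicates: ['foaf:name', 'foaf:knows'] },
  { pod: 'https://bob.example/', predicates: ['schema:image'] },
  { pod: 'https://carol.example/', predicates: ['foaf:name'] },
];

const selected = selectSources(summaries, ['foaf:name', 'foaf:knows']);
```

Bob's pod is skipped without a single request, because its summary shows it cannot contribute to the query.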

How Solid works
Querying in Solid
Ongoing query research

Link traversal is promising for Solid

Improvements are required to make it usable in practice
Adaptive query planning, hybrid query execution, ...
Investigating different techniques, but more work is needed
Open for collaborations
