Solid
a Web-based decentralization ecosystem
Users are in full control of their own data
Built on Web standards
→ How to query efficiently?
Personal data pods
Full control of where your pod is stored and who can access it
Pods can store any kind of data
Personal data, photos, friends, ...
Data become decoupled from apps
-
Today: data and app are tightly coupled
No choice over where and how data is stored, and who can access it
-
Solid: data and app are decoupled
Apps require read/write permissions from the user
A paradigm shift in app design
-
Storage of data is decentralised
Data is stored in the user's pod instead of in the app
-
Combining multiple data pods
Apps become views over one or more data pods
-
Explicit access control
Apps can only view or modify (parts of) your data after explicit approval
Solid spreads data across many sources.
How to build efficient Solid apps?
Queries as abstraction layer
Hide the complexities of reading and writing for app developers
-
Say what needs to happen, not how
Declarative queries hide complexities of data retrieval
-
Queries are reusable
Queries are not API-specific
-
Execution via a generic, reusable query engine
Abstracts away complexities for executing queries
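To make "say what, not how" concrete, here is a minimal sketch that separates a declarative pattern from a tiny generic evaluator. The in-memory data, the prefixed names, and the `evaluate` helper are assumptions made for illustration; in practice the query would be written in SPARQL and executed by a full query engine.

```typescript
// A triple is a subject-predicate-object statement; terms are plain strings here.
type Triple = [string, string, string];
type Bindings = Record<string, string>;

// Hypothetical in-memory data standing in for documents read from a pod.
const data: Triple[] = [
  ['ex:alice', 'foaf:name', 'Alice'],
  ['ex:alice', 'foaf:knows', 'ex:bob'],
  ['ex:bob', 'foaf:name', 'Bob'],
];

// The declarative part: WHAT is wanted (the names of Alice's friends).
// Terms starting with "?" are variables.
const query: Triple[] = [
  ['ex:alice', 'foaf:knows', '?friend'],
  ['?friend', 'foaf:name', '?name'],
];

// Match one pattern term against one data term, extending the bindings if needed.
function matchTerm(pattern: string, value: string, b: Bindings): Bindings | null {
  if (pattern.charAt(0) !== '?') return pattern === value ? b : null;
  if (b[pattern] !== undefined) return b[pattern] === value ? b : null;
  const extended: Bindings = {};
  for (const key of Object.keys(b)) extended[key] = b[key];
  extended[pattern] = value;
  return extended;
}

// The generic part: HOW any list of patterns is matched (nested-loop join).
function evaluate(patterns: Triple[], triples: Triple[]): Bindings[] {
  let solutions: Bindings[] = [{}];
  for (const pattern of patterns) {
    const next: Bindings[] = [];
    for (const solution of solutions) {
      for (const triple of triples) {
        let b: Bindings | null = solution;
        for (let i = 0; i < 3 && b !== null; i++) {
          b = matchTerm(pattern[i], triple[i], b);
        }
        if (b !== null) next.push(b);
      }
    }
    solutions = next;
  }
  return solutions;
}

const names = evaluate(query, data).map(b => b['?name']);
```

The same `evaluate` function works unchanged for any other pattern, which is the sense in which queries are reusable and not tied to a specific API.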
Comunica is a meta query engine
-
Collection of building blocks
Independent modules implement specific functionality
-
Runs anywhere (Web browser, server-side, ...)
Written in TypeScript/JavaScript
-
Suitable for both research and production use
Open governance under the Comunica Association
How to query within the Solid ecosystem?
Link Traversal-based Query Processing
Query engine follows links based on the Linked Data principles
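The traversal idea can be sketched as a queue of document URLs that grows as new links are found. This is a simplified illustration, not how any particular engine implements it: the `web` object is a hypothetical in-memory stand-in for HTTP requests, and the `maxDocuments` bound is an assumption added so that the sketch always terminates.

```typescript
type Triple = [string, string, string];

// Hypothetical in-memory Web: document URL -> triples found in that document.
const web: Record<string, Triple[]> = {
  'https://alice.example/profile': [
    ['https://alice.example/profile#me', 'foaf:knows', 'https://bob.example/profile#me'],
  ],
  'https://bob.example/profile': [
    ['https://bob.example/profile#me', 'foaf:name', 'Bob'],
    ['https://bob.example/profile#me', 'foaf:knows', 'https://carol.example/profile#me'],
  ],
  'https://carol.example/profile': [
    ['https://carol.example/profile#me', 'foaf:name', 'Carol'],
  ],
};

// Strip the fragment to obtain the document URL that would be dereferenced.
function documentUrl(iri: string): string {
  const hash = iri.indexOf('#');
  return hash === -1 ? iri : iri.slice(0, hash);
}

// Breadth-first traversal starting from seed documents.
function traverse(seeds: string[], maxDocuments: number): { triples: Triple[]; visited: string[] } {
  const queue = seeds.slice();
  const visited: string[] = [];
  const triples: Triple[] = [];
  while (queue.length > 0 && visited.length < maxDocuments) {
    const url = queue.shift() as string;
    if (visited.indexOf(url) !== -1) continue; // never fetch a document twice
    visited.push(url);
    const doc = web[url]; // stands in for an HTTP GET
    if (doc === undefined) continue;
    for (const triple of doc) {
      triples.push(triple);
      // Every IRI in the data is a candidate link to follow.
      for (const term of triple) {
        if (term.indexOf('https://') === 0) queue.push(documentUrl(term));
      }
    }
  }
  return { triples, visited };
}

const full = traverse(['https://alice.example/profile'], 10);
const bounded = traverse(['https://alice.example/profile'], 2);
```

Starting from Alice's profile, the unbounded run reaches Bob's and Carol's documents through `foaf:knows` links, while the bounded run stops after two documents and therefore sees fewer triples.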
Traditional link traversal has problems
-
Termination
Many documents on the Web with many links to follow
Worst case: download the full Web for each query execution
-
Speed
Following links is expensive: each one requires setting up a TCP/HTTP connection
No indexing and query planning
-
Completeness
Depending on seed documents, different results may be obtained
Link traversal works quite well for Solid
-
Exploit structural properties of Solid pods
Pods use Linked Data Platform
Guaranteed to find all files in a user's pod
-
Make use of document-based indexes
Solid type index: links to all resources of a certain class
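As a rough sketch of why these structural properties help: Linked Data Platform containers list their members via `ldp:contains`, so walking the container tree from the pod root enumerates every document. The pod layout and the type index below are hypothetical and heavily simplified (a real type index is itself an RDF document), and the lookups stand in for HTTP requests.

```typescript
// Hypothetical pod layout: container URL -> member URLs listed via ldp:contains.
const containers: Record<string, string[]> = {
  'https://alice.example/': ['https://alice.example/profile/', 'https://alice.example/photos/'],
  'https://alice.example/profile/': ['https://alice.example/profile/card'],
  'https://alice.example/photos/': ['https://alice.example/photos/beach.ttl'],
};

// Hypothetical, simplified type index: class -> locations holding instances of that class.
const typeIndex: Record<string, string[]> = {
  'schema:Photograph': ['https://alice.example/photos/'],
};

// Walk the containment tree and return all non-container documents below `root`.
function listDocuments(root: string): string[] {
  const documents: string[] = [];
  const stack = [root];
  while (stack.length > 0) {
    const url = stack.pop() as string;
    const members = containers[url];
    if (members === undefined) {
      documents.push(url); // not a container, so it is a document
      continue;
    }
    for (const member of members) stack.push(member);
  }
  return documents.sort();
}

// Without an index: enumerate the whole pod.
const allDocuments = listDocuments('https://alice.example/');

// With the type index: jump straight to the locations relevant for one class.
const photoDocuments = typeIndex['schema:Photograph']
  .map(listDocuments)
  .reduce((all, docs) => all.concat(docs), [] as string[]);
```

The type index lets the engine skip the profile container entirely when the query only asks for photographs.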
Inefficient query plans
-
Traditionally: the number of links is the bottleneck for link traversal
Due to structural properties of Solid pods, this is less of a problem
-
Query engines must use heuristics for query planning
No statistics available prior to query execution
Hartig, Olaf. "Zero-knowledge query planning for an iterator implementation of link traversal based query execution."
-
Need for adaptive query planning
Modify query plan during traversal
Discovery of cardinality estimates and indexes
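One possible shape of adaptive planning is sketched below: patterns start in their heuristic order and are re-sorted whenever traversal discovers a cardinality estimate. The `plan` helper, the fallback value of 100, and the estimates themselves are made-up assumptions for illustration; real engines work on parsed query algebra and must also handle work already done under the old plan.

```typescript
// Patterns are kept as plain strings here; a real engine would use parsed algebra.
type Pattern = string;

// Order patterns by estimated cardinality, smallest first.
// Patterns without an estimate get a fallback value; ties keep their original order.
function plan(patterns: Pattern[], estimates: Record<string, number>, fallback: number): Pattern[] {
  const cost = (p: Pattern): number => (estimates[p] !== undefined ? estimates[p] : fallback);
  return patterns
    .map((pattern, index) => ({ pattern, index }))
    .sort((a, b) => cost(a.pattern) - cost(b.pattern) || a.index - b.index)
    .map(entry => entry.pattern);
}

const patterns: Pattern[] = [
  '?photo a schema:Photograph',
  '?photo schema:creator ?person',
  '?person foaf:name ?name',
];

// Before traversal: no statistics are available, so the given order is kept.
const estimates: Record<string, number> = {};
const initialPlan = plan(patterns, estimates, 100);

// During traversal: estimates are discovered (values are invented), and the plan is revised.
estimates['?person foaf:name ?name'] = 3;
estimates['?photo a schema:Photograph'] = 500;
const revisedPlan = plan(patterns, estimates, 100);
```

After the two estimates arrive, the most selective pattern moves to the front and the least selective one to the back.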
Hybrid query execution
-
Solid pods are currently document-based
Collection of Linked Data documents
-
Pods could expose more expressive interfaces
SPARQL endpoints, TPF, SPF, ...
-
Need for query execution over heterogeneous sources
How to do this in an adaptive manner?
Query engines only discover these interfaces during query execution
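A minimal sketch of the dispatch step in hybrid execution: once a source's interface is discovered, the engine decides how much of the query it can delegate. The types, the `requestsFor` helper, and the string payloads are hypothetical simplifications; they only illustrate that a SPARQL endpoint can take a whole pattern group, a TPF interface one triple pattern at a time, and a plain document nothing beyond a download.

```typescript
type InterfaceKind = 'document' | 'tpf' | 'sparql';

interface DiscoveredSource { url: string; kind: InterfaceKind; }
interface PlannedRequest { url: string; payload: string; }

// Decide which requests to send to a source, given the interface it turned out to expose.
function requestsFor(source: DiscoveredSource, patterns: string[]): PlannedRequest[] {
  if (source.kind === 'sparql') {
    // The endpoint can evaluate the full pattern group in a single request.
    return [{ url: source.url, payload: 'SELECT * WHERE { ' + patterns.join(' . ') + ' }' }];
  }
  if (source.kind === 'tpf') {
    // One request per triple pattern; joins happen in the client.
    return patterns.map(pattern => ({ url: source.url, payload: pattern }));
  }
  // Plain document: download it and evaluate everything locally.
  return [{ url: source.url, payload: 'GET' }];
}

const groupPatterns = ['?photo a schema:Photograph', '?photo schema:creator ?person'];
const viaSparql = requestsFor({ url: 'https://alice.example/sparql', kind: 'sparql' }, groupPatterns);
const viaTpf = requestsFor({ url: 'https://bob.example/tpf', kind: 'tpf' }, groupPatterns);
const viaDocument = requestsFor({ url: 'https://carol.example/photos.ttl', kind: 'document' }, groupPatterns);
```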
Exploit structural information
-
Users can structure their pod in a certain way
Place all photos in directories based on country
-
Query engines may exploit this information
If pods expose this information
-
Relevant for query planning
Pruning of documents and prioritization
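The country example can be sketched as follows, assuming the pod exposes a description of its layout. The `PodStructure` shape and the URLs are hypothetical; no such description format is implied by the slides. When no structure is exposed, the sketch prunes nothing, since skipping documents would risk missing results.

```typescript
// Hypothetical description a pod could expose about its own layout.
interface PodStructure { photosRoot: string; groupedBy: 'country'; }

// Keep only the documents that can contain photos for the requested country.
function selectDocuments(candidates: string[], country: string, structure?: PodStructure): string[] {
  // Without exposed structure, nothing can be pruned safely.
  if (structure === undefined) return candidates.slice();
  const prefix = structure.photosRoot + country + '/';
  return candidates.filter(url => url.indexOf(prefix) === 0);
}

const candidates = [
  'https://alice.example/photos/BE/ghent.ttl',
  'https://alice.example/photos/JP/kyoto.ttl',
  'https://alice.example/photos/BE/bruges.ttl',
];
const exposed: PodStructure = { photosRoot: 'https://alice.example/photos/', groupedBy: 'country' };

const pruned = selectDocuments(candidates, 'BE', exposed);
const unpruned = selectDocuments(candidates, 'BE');
```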
Reasoning at query time
-
Different pods/apps may use different vocabularies
Schema.org, FOAF, Wikidata, ...
-
Apps issue queries in a single vocabulary
Query engine should perform schema alignment
-
Reasoning over partial and streaming knowledge
How to do this efficiently?
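The simplest form of schema alignment is a predicate rewrite applied to each triple as it arrives, which is sketched here. The mapping table is a hypothetical alignment chosen for the example, and real reasoning (class hierarchies, inverse properties, and so on) goes well beyond this.

```typescript
type Triple = [string, string, string];

// Hypothetical alignment from Schema.org terms to the FOAF terms the app queries with.
const toAppVocabulary: Record<string, string> = {
  'schema:name': 'foaf:name',
  'schema:knows': 'foaf:knows',
};

// Rewrites one triple at a time, so it can be applied to a stream of incoming triples
// without waiting for the complete dataset.
function align(triple: Triple): Triple {
  const mapped = toAppVocabulary[triple[1]];
  return mapped === undefined ? triple : [triple[0], mapped, triple[2]];
}

// Triples arriving from two pods that use different vocabularies.
const incoming: Triple[] = [
  ['ex:bob', 'schema:name', 'Bob'],
  ['ex:carol', 'foaf:name', 'Carol'],
];

const aligned = incoming.map(t => align(t));
const foundNames = aligned.filter(t => t[1] === 'foaf:name').map(t => t[2]);
```

A query written only against `foaf:name` now finds both names, even though one pod stores them under `schema:name`.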
Summarization across multiple pods
-
Data may be aggregated across multiple Solid pods
Usage within family context, work place, ...
-
Query engines can exploit these summaries
Query planning and source selection
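As an illustration of source selection, assume each pod contributes a summary that lists the predicates it uses. The summary format, the pod URLs, and the `selectSources` helper are assumptions of this sketch; actual summaries could be far more compact or more detailed.

```typescript
// Hypothetical summaries: for each pod, the predicates that occur in its data.
const podSummaries: Record<string, string[]> = {
  'https://alice.example/': ['foaf:name', 'foaf:knows'],
  'https://bob.example/': ['foaf:name', 'schema:creator'],
  'https://carol.example/': ['schema:price'],
};

// Keep every pod that can contribute to at least one predicate in the query.
// Pods matching only some predicates are kept, because joins may span pods.
function selectSources(summaries: Record<string, string[]>, queryPredicates: string[]): string[] {
  return Object.keys(summaries).filter(pod =>
    queryPredicates.some(predicate => summaries[pod].indexOf(predicate) !== -1));
}

const selectedPods = selectSources(podSummaries, ['schema:creator', 'foaf:knows']);
```

Carol's pod is skipped without a single request being sent to it, which is the saving that summaries aim for.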
Link traversal is promising for Solid
-
Improvements are required to make it usable in practice
Adaptive query planning, hybrid query execution, ...
-
Investigating different techniques, but more work is needed
Open for collaborations