Challenges in Query-driven Data Integration for a Decentralized Web

Ruben Taelman

GOBLIN COST Action - Task 1.3 and 1.4 Meeting 4, 26 November 2025

Ghent University – imec – IDLab, Belgium

Data is highly decentralized

Decentralized data integration is challenging for application developers

Query engines abstract access to decentralized data

Hide the complexities of reading and writing for app developers

SPARQL processing over centralized data

Centralization not always possible

How to query over decentralized data?

Approaches for querying over decentralized data

Client distributes query over query APIs

Federation over SPARQL endpoints

SELECT ?drug ?title WHERE {
  ?drug db:drugCategory dbc:micronutrient.
  ?drug db:casRegistryNumber ?id.
  ?keggDrug rdf:type kegg:Drug.
  ?keggDrug bio2rdf:xRef ?id.
  ?keggDrug purl:title ?title.
}


SELECT ?drug ?title WHERE {
  SERVICE <http://example.com/drb> {
    ?drug db:drugCategory dbc:micronutrient.
    ?drug db:casRegistryNumber ?id.
  }
  SERVICE <http://example.com/kegg> {
    ?keggDrug rdf:type kegg:Drug.
    ?keggDrug bio2rdf:xRef ?id.
    ?keggDrug purl:title ?title.
  }
}
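
As a rough illustration of what a federation client does with the SERVICE query above, the sketch below joins the partial results of the two sub-queries on the shared ?id variable. The two in-memory lists stand in for endpoint responses. All data values and the helper names `hash_join` and `project` are made up for illustration; real engines add source selection and join optimization on top.

```python
from collections import defaultdict

# Hypothetical bindings returned by the first SERVICE clause (?drug, ?id).
drug_bindings = [
    {"drug": "ex:drugA", "id": "cas-1"},
    {"drug": "ex:drugB", "id": "cas-2"},
]
# Hypothetical bindings returned by the second SERVICE clause (?keggDrug, ?id, ?title).
kegg_bindings = [
    {"keggDrug": "ex:kegg1", "id": "cas-1", "title": "Title A"},
    {"keggDrug": "ex:kegg9", "id": "cas-9", "title": "Title Z"},
]

def hash_join(left, right, var):
    """Join two lists of solution mappings on one shared variable."""
    index = defaultdict(list)
    for binding in left:
        index[binding[var]].append(binding)
    joined = []
    for binding in right:
        for match in index.get(binding[var], []):
            joined.append({**match, **binding})
    return joined

def project(bindings, variables):
    """Keep only the variables named in the SELECT clause."""
    return [{v: b[v] for v in variables} for b in bindings]

results = project(hash_join(drug_bindings, kegg_bindings, "id"), ["drug", "title"])
print(results)  # [{'drug': 'ex:drugA', 'title': 'Title A'}]
```

Only the drug whose ?id appears in both sources survives the join, which mirrors how the two SERVICE blocks are connected in the query.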

Federation over heterogeneous sources

Limitations of federated querying

Example: decentralized address book

Example: Find Alice's contact names

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE {
    <https://alice.pods.org/profile#me>
        foaf:knows ?person.
    ?person foaf:name ?name.
}

Query process:

  1. Start from Alice's address book
  2. Follow links to profiles of Bob and Carol
  3. Query over union of all profiles
  4. Find query results: [ { "name": "Bob" }, { "name": "Carol" } ]
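
The four steps above can be sketched with a small in-memory stand-in for the Web. This is a minimal sketch, not how any particular engine implements traversal: the `WEB` dictionary, the profile URLs for Bob and Carol, and the function names are assumptions for illustration, and a real engine would dereference documents over HTTP and evaluate the full SPARQL query.

```python
from collections import deque

FOAF_KNOWS = "foaf:knows"
FOAF_NAME = "foaf:name"
ALICE = "https://alice.pods.org/profile#me"

# Hypothetical stand-in for the Web: document URL -> triples it contains.
WEB = {
    "https://alice.pods.org/profile": [
        (ALICE, FOAF_KNOWS, "https://bob.pods.org/profile#me"),
        (ALICE, FOAF_KNOWS, "https://carol.pods.org/profile#me"),
    ],
    "https://bob.pods.org/profile": [
        ("https://bob.pods.org/profile#me", FOAF_NAME, "Bob"),
    ],
    "https://carol.pods.org/profile": [
        ("https://carol.pods.org/profile#me", FOAF_NAME, "Carol"),
    ],
}

def document_of(iri):
    """Strip the fragment to get the document that describes an IRI."""
    return iri.split("#", 1)[0]

def traverse(seed):
    """Follow foaf:knows links from the seed; return all triples and visited documents."""
    queue = deque([document_of(seed)])  # the link queue
    visited = set()
    triples = []
    while queue:
        url = queue.popleft()
        if url in visited or url not in WEB:
            continue
        visited.add(url)
        for s, p, o in WEB[url]:
            triples.append((s, p, o))
            if p == FOAF_KNOWS:
                queue.append(document_of(o))
    return triples, visited

def contact_names(seed):
    """Evaluate the two triple patterns over the union of all fetched triples."""
    triples, _ = traverse(seed)
    contacts = [o for s, p, o in triples if s == seed and p == FOAF_KNOWS]
    return [{"name": o} for c in contacts
            for s, p, o in triples if s == c and p == FOAF_NAME]

print(contact_names(ALICE))  # [{'name': 'Bob'}, {'name': 'Carol'}]
```

The number of documents in `visited` corresponds to the number of followed links, which is the main cost factor in this style of querying.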

Conclusion: Query engines are a good match for integrating decentralized data

Query execution time mainly influenced by number of followed links

[Figure: link queue, panels "Discover 4" and "Complex 2"]

If pods expose more information, complex querying can become faster

Pods that expose shape information allow query engines to skip many links

Shape index
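
To make the link-skipping idea concrete, the sketch below filters candidate links against a shape index before they enter the link queue. The index layout (document URL mapped to the predicates its shape allows), the `ex:` predicates, the URLs under the pod, and the function name `relevant_links` are all assumptions for illustration; the actual shape index format is not specified here.

```python
# Hypothetical shape index: document URL -> predicates its shape allows.
SHAPE_INDEX = {
    "https://alice.pods.org/profile": {"foaf:knows", "foaf:name"},
    "https://alice.pods.org/posts/1": {"ex:content", "ex:created"},
    "https://alice.pods.org/posts/2": {"ex:content", "ex:created"},
    "https://alice.pods.org/comments/1": {"ex:replyOf", "ex:content"},
}

def relevant_links(candidate_links, query_predicates, shape_index):
    """Keep links that may hold matching data.

    Links without an index entry are kept, because nothing is known about them.
    """
    kept = []
    for url in candidate_links:
        shape = shape_index.get(url)
        if shape is None or shape & query_predicates:
            kept.append(url)
    return kept

query_predicates = {"foaf:knows", "foaf:name"}
links = list(SHAPE_INDEX) + ["https://alice.pods.org/unknown"]
to_follow = relevant_links(links, query_predicates, SHAPE_INDEX)
print(to_follow)
```

In this toy setup three of the five candidate links are skipped, since their declared shapes cannot contribute to the query.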

Pods that expose cardinality information allow engines to make better query plans

Query plan times
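
One way cardinality information can feed into planning is a greedy ordering that evaluates the most selective triple pattern first. The sketch below assumes a pod exposes per-predicate triple counts; the `CARDINALITIES` values and the function names are hypothetical, and real planners use richer estimates than a single count per predicate.

```python
# Hypothetical cardinality estimates exposed by a pod: predicate -> triple count.
CARDINALITIES = {"foaf:knows": 2, "foaf:name": 5000}

def estimate(pattern, cardinalities, default=10**9):
    """Estimate how many triples match a (subject, predicate, object) pattern."""
    _, predicate, _ = pattern
    # Unknown predicates get a large default so they are scheduled last.
    return cardinalities.get(predicate, default)

def plan(patterns, cardinalities):
    """Greedy join order: lowest estimated cardinality first."""
    return sorted(patterns, key=lambda p: estimate(p, cardinalities))

patterns = [
    ("?person", "foaf:name", "?name"),
    ("<https://alice.pods.org/profile#me>", "foaf:knows", "?person"),
]
ordered = plan(patterns, CARDINALITIES)
print(ordered)
```

With these counts the foaf:knows pattern is evaluated first, so the engine only looks up names for the few persons it has already bound.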