Blog
-
Did AI Crawlers Kill SPARQL Federation?
Public Knowledge Graph infrastructure is degrading due to AI crawlers.
RDF provides the basis for distributing Knowledge Graphs (KGs) across different locations, which is useful when KGs cover different data domains with varying purposes, are managed by different teams and organizations, or are exposed under different access policies. One of the most popular ways of publishing a KG is through a SPARQL endpoint, which offers queryable access. When multiple of these KGs need to be integrated, techniques such as SPARQL federation can be used. While many KGs have been available as public SPARQL endpoints, their openness is currently being challenged by the huge load placed on them by modern AI crawlers that power LLMs. Recently, public SPARQL endpoints have started putting stricter usage restrictions in place to avoid going down under this increased server load. While these restrictions limit the range of SPARQL queries that can be executed, they are especially problematic for federated SPARQL queries, which often involve sending many smaller queries to endpoints within a short timeframe.
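To make the request volume concrete, here is a minimal sketch (with hypothetical helper names, not any engine's actual API) of why federated queries stress endpoints: a bound join substitutes each intermediate binding from one endpoint into a pattern and sends it as a separate small query to another endpoint.

```javascript
// Sketch: estimate how many HTTP requests a bound-join-based federated
// query sends to a remote endpoint. Each intermediate binding becomes a
// small query; FedX-style engines batch several bindings per request
// by combining them in a UNION block.
function boundJoinRequestCount(intermediateBindings, blockSize = 1) {
  // With a block size of 1, every binding is its own request;
  // larger blocks reduce the request count proportionally.
  return Math.ceil(intermediateBindings / blockSize);
}

// 10,000 intermediate results without batching means 10,000 requests
// in quick succession, which can trip rate limits that endpoints
// introduced to fend off AI crawlers.
console.log(boundJoinRequestCount(10000));     // 10000
console.log(boundJoinRequestCount(10000, 20)); // 500
```

Even with batching, a single federated query can translate into hundreds of requests, which is hard to distinguish from crawler traffic on the server side.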
-
The cost of modularity in SPARQL
How much do modularity and decentralization conflict with centralized speed?
The JavaScript-based Comunica SPARQL query engine is designed for querying decentralized environments, e.g. through federation and link traversal. In addition, it makes use of a modular architecture to achieve high flexibility for developers. To determine the impact of these design decisions, and to put the baseline performance of Comunica in perspective, I compared its performance to state-of-the-art centralized SPARQL engines when querying centralized Knowledge Graphs. Results show that Comunica can closely match the performance of state-of-the-art SPARQL query engines, despite having vastly different optimization criteria.
-
Querying a Decentralized Web
The road towards effective query execution of Decentralized Knowledge Graphs.
Most of today’s applications are built around the assumption that data is centralized. However, with recent decentralization efforts such as Solid quickly gaining popularity, we may be evolving towards a future where data is massively decentralized. In order to enable applications over decentralized data, we need new querying techniques that can execute effectively over it. This post discusses the impact of decentralization on query execution, and the problems that need to be solved before we can query effectively in a decentralized Web.
-
5 rules for open source maintenance
Guidelines for publishing and maintaining open source projects.
Thanks to continuing innovation in software development tools and services, it has never been easier to start a software project and publish it under an open license. While this has led to the availability of a huge arsenal of open source software projects, the number of high-quality projects that are worth reusing is an order of magnitude smaller. Based on personal experience, I provide five guidelines in this post that will help you publish and maintain high-quality open-source software.
-
Who says using RDF is hard?
A brief overview on RDF and Linked Data development in JavaScript.
The Linked Data ecosystem and RDF as its graph data model have been around for many years already. Even though there is increasing interest in knowledge graphs, many developers are scared off by RDF due to its reputation of being complicated. In this post, I show some concrete examples of how RDF can be used in JavaScript applications, to illustrate that RDF is actually pretty easy to work with if you use the right tools.
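As a taste of how lightweight this can be, here is a minimal sketch of the RDF/JS data model using plain objects; real applications would use a library implementing the same interface (such as a data factory implementation), but the shape of the data is just this:

```javascript
// Sketch of the RDF/JS data model with plain objects: every term has a
// termType and a value, and a triple/quad is just three (or four) terms.
const namedNode = (value) => ({ termType: 'NamedNode', value });
const literal = (value) => ({ termType: 'Literal', value });

const quad = {
  subject: namedNode('https://example.org/alice'),
  predicate: namedNode('http://xmlns.com/foaf/0.1/name'),
  object: literal('Alice'),
};

// Working with it is ordinary JavaScript property access.
console.log(`${quad.subject.value} has name ${quad.object.value}`);
```

Because the RDF/JS interfaces are this simple, parsers, stores, and query engines in the JavaScript ecosystem can interoperate by exchanging plain quad objects.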
-
A story of streaming RDF parsers
How I implemented streaming JSON-LD and RDF/XML parsers in JavaScript.
The W3C recommends multiple serialization formats for representing RDF. JSON-LD and RDF/XML are examples of RDF serializations that are based on the JSON and XML formats, respectively. The ability to parse RDF serializations in a streaming way offers many advantages, such as handling huge documents with only a limited amount of memory, and processing elements as soon as they are parsed. In this post, I discuss the motivation behind my streaming parser implementations for JSON-LD and RDF/XML, their architecture, and I show some live examples.