Hi!
I'm Ruben Taelman, a Web postdoctoral researcher at IDLab,
with a focus on decentralization, Linked Data publishing, and querying.

My goal is to make data accessible for everyone by providing
intelligent infrastructure and algorithms for data publication and retrieval.

To support my research, I develop various open source JavaScript libraries such as streaming RDF parsers and the Comunica engine to query Linked Data on the Web.
As this website itself contains Linked Data, you can query it live with Comunica.

Have a look at my publications or projects
and contact me if any of those topics interest you.

Latest blog posts

  • Querying a Decentralized Web
    The road towards effective query execution of Decentralized Knowledge Graphs.

    Most of today’s applications are built based around the assumption that data is centralized. However, with recent decentralization efforts such as Solid quickly gaining popularity, we may be evolving towards a future where data is massively decentralized. In order to enable applications over decentralized data, there is a need for new querying techniques that can effectively execute over it. This post discusses the impact of decentralization on query execution, and the problems that need to be solved before we can use it effectively in a decentralized Web.

  • 5 rules for open source maintenance
    Guidelines for publishing and maintaining open source projects.

    Thanks to continuing innovation of software development tools and services, it has never been easier to start a software project and publish it under an open license. While this has lead to the availability of a huge arsenal of open source software projects, the number of qualitative projects that are worth reusing is of a significantly smaller order of magnitude. Based on personal experience, I provide five guidelines in this post that will help you to publish and maintain highly qualitative open-source software.

More blog posts

Highlighted publications

  1. Conference Link Traversal Query Processing over Decentralized Environments with Structural Assumptions
    1. 0
    2. 1
    In Proceedings of the 22nd International Semantic Web Conference To counter societal and economic problems caused by data silos on the Web, efforts such as Solid strive to reclaim private data by storing it in permissioned documents over a large number of personal vaults across the Web. Building applications on top of such a decentralized Knowledge Graph involves significant technical challenges: centralized aggregation prior to query processing is excluded for legal reasons, and current federated querying techniques cannot handle this large scale of distribution at the expected performance. We propose an extension to Link Traversal Query Processing (LTQP) that incorporates structural properties within decentralized environments to tackle their unprecedented scale. In this article, we analyze the structural properties of the Solid decentralization ecosystem that are relevant for query execution, we introduce novel LTQP algorithms leveraging these structural properties, and evaluate their effectiveness. Our experiments indicate that these new algorithms obtain accurate results in the order of seconds, which existing algorithms cannot achieve. This work reveals that a traversal-based querying method using structural assumptions can be effective for large-scale decentralization, but that advances are needed in the area of query planning for LTQP to handle more complex queries. These insights open the door to query-driven decentralized applications, in which declarative queries shield developers from the inherent complexity of a decentralized landscape. 2023
    More
  2. Conference Comunica: a Modular SPARQL Query Engine for the Web
    1. 0
    2. 1
    3. 2
    4. 3
    In Proceedings of the 17th International Semantic Web Conference Query evaluation over Linked Data sources has become a complex story, given the multitude of algorithms and techniques for single- and multi-source querying, as well as the heterogeneity of Web interfaces through which data is published online. Today’s query processors are insufficiently adaptable to test multiple query engine aspects in combination, such as evaluating the performance of a certain join algorithm over a federation of heterogeneous interfaces. The Semantic Web research community is in need of a flexible query engine that allows plugging in new components such as different algorithms, new or experimental SPARQL features, and support for new Web interfaces. We designed and developed a Web-friendly and modular meta query engine called Comunica that meets these specifications. In this article, we introduce this query engine and explain the architectural choices behind its design. We show how its modular nature makes it an ideal research platform for investigating new kinds of Linked Data interfaces and querying algorithms. Comunica facilitates the development, testing, and evaluation of new query processing capabilities, both in isolation and in combination with others. 2018
    More
  3. Journal Triple Storage for Random-Access Versioned Querying of RDF Archives
    1. 0
    2. 1
    3. 2
    4. 3
    5. 4
    In Journal of Web Semantics When publishing Linked Open Datasets on the Web, most attention is typically directed to their latest version. Nevertheless, useful information is present in or between previous versions. In order to exploit this historical information in dataset analysis, we can maintain history in RDF archives. Existing approaches either require much storage space, or they expose an insufficiently expressive or efficient interface with respect to querying demands. In this article, we introduce an RDF archive indexing technique that is able to store datasets with a low storage overhead, by compressing consecutive versions and adding metadata for reducing lookup times. We introduce algorithms based on this technique for efficiently evaluating queries at a certain version, between any two versions, and for versions. Using the BEAR RDF archiving benchmark, we evaluate our implementation, called OSTRICH. Results show that OSTRICH introduces a new trade-off regarding storage space, ingestion time, and querying efficiency. By processing and storing more metadata during ingestion time, it significantly lowers the average lookup time for versioning queries. OSTRICH performs better for many smaller dataset versions than for few larger dataset versions. Furthermore, it enables efficient offsets in query result streams, which facilitates random access in results. Our storage technique reduces query evaluation time for versioned queries through a preprocessing step during ingestion, which only in some cases increases storage space when compared to other approaches. This allows data owners to store and query multiple versions of their dataset efficiently, lowering the barrier to historical dataset publication and analysis. 2018
    More
More publications

Latest publications

  1. Demo Demonstration of Link Traversal SPARQL Query Processing over the Decentralized Solid Environment
    1. 0
    2. 1
    In Proceedings of the 27th International Conference on Extending Database Technology (EDBT) To tackle economic and societal problems originating from the centralization of Knowledge Graphs on the Web, there has been an increasing interest towards decentralizing Knowledge Graphs across a large number of small authoritative sources. In order to effectively build user-facing applications, there is a need for efficient query engines that abstract the complexities around accessing such massively Decentralized Knowledge Graphs (DKGs). As such, we have designed and implemented novel Link Traversal Query Processing algorithms into the Comunica query engine framework that are capable of efficiently evaluating SPARQL queries across DKGs provided by the Solid decentralization initiative. In this article, we demonstrate this query engine through a Web-based interface over which SPARQL queries can be executed over simulated and real-world Solid environments. Our demonstration shows the benefits of a traversal-based approach towards querying DKGs, and uncovers opportunities for further optimizations in future work in terms of both query execution and discovery algorithms. 2024
    More
  2. Poster Personalized Medicine Through Personal Data Pods
    1. 1
    2. 4
    In Proceedings of the 15th International SWAT4HCLS Conference Medical care is in the process of becoming increasingly personalized through the use of patient genetic information. Concurrently, privacy concerns regarding collection and storage of sensitive personal genome sequence data have necessitated public debate and legal regulation. Here we identify two fundamental challenges associated with privacy and shareability of genomic data storage and propose the use of Solid pods to address these challenges. We establish that personal data pods using Solid specifications can enable decentralized storage, increased patient control over their data, and support of Linked Data formats, which when combined, could offer solutions to challenges currently restricting personalized medicine in practice. 2024
    More
  3. Poster Towards Algebraic Mapping Operators for Knowledge Graph Construction
    1. 1
    2. 2
    3. 3
    In Proceedings of the 22nd International Semantic Web Conference: Posters and Demos Declarative knowledge graph construction has matured to the point where state of the art techniques are focusing on optimizing the mapping processes. However, these optimization techniques use the syntax of the mapping language without considering the impact of the semantics. As a result, it is difficult to compare different engines fairly due to the obscurity in their semantic differences. In this poster paper, we propose an initial set of algebraic mapping operators to define the operational semantics of mapping processes, and provide a first step towards a theoretical foundation for mapping languages. We translated a series of RML documents to algebraic mapping operators to show the feasibility of our approach. We believe that further pursuing these initial results will lead to greater interoperability of mapping engines and languages, intensify requirements analysis for the upcoming RML standardization work, and an improved developer experience for all current and future mapping engines. 2023
    More
  4. Workshop The Need for Better RDF Archiving Benchmarks
    1. 0
    2. 1
    3. 2
    4. 3
    In Proceedings of the 9th Workshop on Managing the Evolution and Preservation of the Data Web The advancements and popularity of Semantic Web technologies in the last decades have led to an exponential adoption and availability of Web-accessible datasets. While most solutions consider such datasets to be static, they often evolve over time. Hence, efficient archiving solutions are needed to meet the users’ and maintainers’ needs. While some solutions to these challenges already exist, standardized benchmarks are needed to systematically test the different capabilities of existing solutions and identify their limitations. Unfortunately, the development of new benchmarks has not kept pace with the evolution of RDF archiving systems. In this paper, we therefore identify the current state of the art in RDF archiving benchmarks and discuss to what degree such benchmarks reflect the current needs of real-world use cases and their requirements. Through this empirical assessment, we highlight the need for the development of more advanced and comprehensive benchmarks that align with the evolving landscape of RDF archiving. 2023
    More
  5. Workshop How Does the Link Queue Evolve during Traversal-Based Query Processing?
    1. 0
    2. 1
    3. 2
    In Proceedings of the 7th International Workshop on Storing, Querying and Benchmarking Knowledge Graphs Link Traversal-based Query Processing (LTQP) is an integrated querying approach that allows the query engine to start with zero knowledge of the data to query and discover data sources on the fly. The query engine starts with some seed documents and dynamically discovers new data sources by dereferencing hyperlinks in previously ingested data. Given the dynamic nature of source discovery, query processing tends to be relatively slow. Optimization techniques exist, such as exploiting existing structural information, but they depend on a deep understanding of the link queue during LTQP. To this end, we investigate the evolution of the types of link sources in the link queue and introduce metrics that describe key link queue characteristics. This paper analyses the link queue to guide future work on LTQP query optimization approaches that exploit structural information within a Solid environment. We find that queries exhibit two different execution patterns, one where the link queue is primarily empty and the other where the link queue fills faster than the engine can process. Our results show that the link queue is not functioning optimally and that our current approach to link discovery is not sufficiently selective. 2023
    More
More publications