Hi!
I'm Ruben Taelman, a postdoctoral Web researcher at IDLab,
focusing on decentralization, Linked Data publishing, and querying.

My goal is to make data accessible for everyone by providing
intelligent infrastructure and algorithms for data publication and retrieval.

To support my research, I develop various open-source JavaScript libraries, such as streaming RDF parsers and the Comunica engine for querying Linked Data on the Web.
As this website itself contains Linked Data, you can query it live with Comunica.
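For instance, a live query over this site with the Comunica SPARQL engine could look like the sketch below. This is a minimal illustration, not a supported snippet: the FOAF predicates and the exact data exposed by the page are assumptions, and the `@comunica/query-sparql` package must be installed first.

```javascript
// Sketch: querying the Linked Data embedded in this website with Comunica.
// The FOAF predicates below are illustrative and may not match the
// vocabulary actually used in the page's RDFa/JSON-LD.
const query = `
  SELECT ?name ?homepage WHERE {
    ?person <http://xmlns.com/foaf/0.1/name> ?name .
    OPTIONAL { ?person <http://xmlns.com/foaf/0.1/homepage> ?homepage . }
  }`;

async function queryThisSite() {
  // Require lazily so the sketch only needs the package when actually run.
  const { QueryEngine } = require('@comunica/query-sparql');
  const engine = new QueryEngine();
  // Comunica fetches the page and extracts the RDF it serves.
  const bindingsStream = await engine.queryBindings(query, {
    sources: ['https://www.rubensworks.net/'],
  });
  for (const bindings of await bindingsStream.toArray()) {
    console.log(bindings.get('name').value);
  }
}

module.exports = { query, queryThisSite };
```

Calling `queryThisSite()` streams name bindings from whatever RDF the page exposes; the same engine can federate over multiple documents by adding entries to the `sources` array.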

Have a look at my publications or projects
and contact me if any of those topics interest you.

Latest blog posts

  • Querying a Decentralized Web
    The road towards effective query execution of Decentralized Knowledge Graphs.

    Most of today’s applications are built around the assumption that data is centralized. However, with recent decentralization efforts such as Solid quickly gaining popularity, we may be evolving towards a future where data is massively decentralized. To enable applications over decentralized data, we need new querying techniques that can execute effectively over it. This post discusses the impact of decentralization on query execution, and the problems that must be solved before querying can work effectively on a decentralized Web.

  • 5 rules for open source maintenance
    Guidelines for publishing and maintaining open source projects.

    Thanks to continuing innovation in software development tools and services, it has never been easier to start a software project and publish it under an open license. While this has led to a huge arsenal of open source software projects, the number of high-quality projects worth reusing is significantly smaller. Based on personal experience, I provide five guidelines in this post that will help you publish and maintain high-quality open-source software.

More blog posts

Highlighted publications

  1. [Conference] Link Traversal Query Processing over Decentralized Environments with Structural Assumptions
    In Proceedings of the 22nd International Semantic Web Conference, 2023.
    To counter societal and economic problems caused by data silos on the Web, efforts such as Solid strive to reclaim private data by storing it in permissioned documents over a large number of personal vaults across the Web. Building applications on top of such a decentralized Knowledge Graph involves significant technical challenges: centralized aggregation prior to query processing is excluded for legal reasons, and current federated querying techniques cannot handle this large scale of distribution at the expected performance. We propose an extension to Link Traversal Query Processing (LTQP) that incorporates structural properties within decentralized environments to tackle their unprecedented scale. In this article, we analyze the structural properties of the Solid decentralization ecosystem that are relevant for query execution, we introduce novel LTQP algorithms leveraging these structural properties, and evaluate their effectiveness. Our experiments indicate that these new algorithms obtain accurate results in the order of seconds, which existing algorithms cannot achieve. This work reveals that a traversal-based querying method using structural assumptions can be effective for large-scale decentralization, but that advances are needed in the area of query planning for LTQP to handle more complex queries. These insights open the door to query-driven decentralized applications, in which declarative queries shield developers from the inherent complexity of a decentralized landscape.
  2. [Conference] Comunica: a Modular SPARQL Query Engine for the Web
    In Proceedings of the 17th International Semantic Web Conference, 2018.
    Query evaluation over Linked Data sources has become a complex story, given the multitude of algorithms and techniques for single- and multi-source querying, as well as the heterogeneity of Web interfaces through which data is published online. Today’s query processors are insufficiently adaptable to test multiple query engine aspects in combination, such as evaluating the performance of a certain join algorithm over a federation of heterogeneous interfaces. The Semantic Web research community is in need of a flexible query engine that allows plugging in new components such as different algorithms, new or experimental SPARQL features, and support for new Web interfaces. We designed and developed a Web-friendly and modular meta query engine called Comunica that meets these specifications. In this article, we introduce this query engine and explain the architectural choices behind its design. We show how its modular nature makes it an ideal research platform for investigating new kinds of Linked Data interfaces and querying algorithms. Comunica facilitates the development, testing, and evaluation of new query processing capabilities, both in isolation and in combination with others.
  3. [Journal] Triple Storage for Random-Access Versioned Querying of RDF Archives
    In Journal of Web Semantics, 2018.
    When publishing Linked Open Datasets on the Web, most attention is typically directed to their latest version. Nevertheless, useful information is present in or between previous versions. In order to exploit this historical information in dataset analysis, we can maintain history in RDF archives. Existing approaches either require large amounts of storage space or expose an insufficiently expressive or efficient interface with respect to querying demands. In this article, we introduce an RDF archive indexing technique that is able to store datasets with a low storage overhead, by compressing consecutive versions and adding metadata for reducing lookup times. We introduce algorithms based on this technique for efficiently evaluating queries at a certain version, between any two versions, and across all versions. Using the BEAR RDF archiving benchmark, we evaluate our implementation, called OSTRICH. Results show that OSTRICH introduces a new trade-off regarding storage space, ingestion time, and querying efficiency. By processing and storing more metadata during ingestion time, it significantly lowers the average lookup time for versioning queries. OSTRICH performs better for many smaller dataset versions than for few larger dataset versions. Furthermore, it enables efficient offsets in query result streams, which facilitates random access in results. Our storage technique reduces query evaluation time for versioned queries through a preprocessing step during ingestion, which only in some cases increases storage space when compared to other approaches. This allows data owners to store and query multiple versions of their dataset efficiently, lowering the barrier to historical dataset publication and analysis.
More publications

Latest publications

  1. [Poster] Towards Algebraic Mapping Operators for Knowledge Graph Construction
    In Proceedings of the 22nd International Semantic Web Conference: Posters and Demos, 2023.
    Declarative knowledge graph construction has matured to the point where state-of-the-art techniques are focusing on optimizing the mapping processes. However, these optimization techniques use the syntax of the mapping language without considering the impact of the semantics. As a result, it is difficult to compare different engines fairly due to the obscurity in their semantic differences. In this poster paper, we propose an initial set of algebraic mapping operators to define the operational semantics of mapping processes, and provide a first step towards a theoretical foundation for mapping languages. We translated a series of RML documents to algebraic mapping operators to show the feasibility of our approach. We believe that further pursuing these initial results will lead to greater interoperability of mapping engines and languages, sharpen requirements analysis for the upcoming RML standardization work, and improve the developer experience for all current and future mapping engines.
  2. [Workshop] The Need for Better RDF Archiving Benchmarks
    In Proceedings of the 9th Workshop on Managing the Evolution and Preservation of the Data Web, 2023.
    The advancements and popularity of Semantic Web technologies in the last decades have led to an exponential adoption and availability of Web-accessible datasets. While most solutions consider such datasets to be static, datasets often evolve over time. Hence, efficient archiving solutions are needed to meet the users’ and maintainers’ needs. While some solutions to these challenges already exist, standardized benchmarks are needed to systematically test the different capabilities of existing solutions and identify their limitations. Unfortunately, the development of new benchmarks has not kept pace with the evolution of RDF archiving systems. In this paper, we therefore identify the current state of the art in RDF archiving benchmarks and discuss to what degree such benchmarks reflect the current needs of real-world use cases and their requirements. Through this empirical assessment, we highlight the need for the development of more advanced and comprehensive benchmarks that align with the evolving landscape of RDF archiving.
  3. [Workshop] How Does the Link Queue Evolve during Traversal-Based Query Processing?
    In Proceedings of the 7th International Workshop on Storing, Querying and Benchmarking Knowledge Graphs, 2023.
    Link Traversal-based Query Processing (LTQP) is an integrated querying approach that allows the query engine to start with zero knowledge of the data to query and discover data sources on the fly. The query engine starts with some seed documents and dynamically discovers new data sources by dereferencing hyperlinks in previously ingested data. Given the dynamic nature of source discovery, query processing tends to be relatively slow. Optimization techniques exist, such as exploiting existing structural information, but they depend on a deep understanding of the link queue during LTQP. To this end, we investigate the evolution of the types of link sources in the link queue and introduce metrics that describe key link queue characteristics. This paper analyzes the link queue to guide future work on LTQP query optimization approaches that exploit structural information within a Solid environment. We find that queries exhibit two different execution patterns: one where the link queue is primarily empty, and another where the link queue fills faster than the engine can process it. Our results show that the link queue is not functioning optimally and that our current approach to link discovery is not sufficiently selective.
  4. [Workshop] In-Memory Dictionary-Based Indexing of Quoted RDF Triples
    In Proceedings of the 7th International Workshop on Storing, Querying and Benchmarking Knowledge Graphs, 2023.
    The upcoming RDF 1.2 recommendation is scheduled to introduce the concept of quoted triples, which allows statements to be made about other statements. Since quoted triples enable new forms of data access in SPARQL 1.2, in the form of quoted triple patterns, there is a need for new indexing strategies that can efficiently handle these data access patterns. As such, we explore and evaluate different in-memory indexing approaches for quoted triples. In this paper, we investigate four indexing approaches, and evaluate their performance over an artificial dataset with custom triple pattern queries. Our findings show that the so-called indexed quoted triples dictionary vastly outperforms other approaches in terms of query execution time at the cost of increased storage size and ingestion time. Our work shows that indexing quoted triples in a dictionary separate from non-quoted RDF terms achieves good performance, and can be implemented in existing systems using well-known indexing techniques. Therefore, we illustrate that the addition of quoted triples into the RDF stack can be achieved in a performant manner.
  5. [Conference] LDkit: Linked Data Object Graph Mapping toolkit for Web Applications
    In Proceedings of the 22nd International Semantic Web Conference, 2023.
    The adoption of Semantic Web and Linked Data technologies in web application development has been hindered by the complexity of numerous standards, such as RDF and SPARQL, as well as the challenges associated with querying data from distributed sources and a variety of interfaces. Understandably, web developers often prefer traditional solutions based on relational or document databases due to the higher level of data predictability and superior developer experience. To address these issues, we present LDkit, a novel Object Graph Mapping (OGM) framework for TypeScript designed to provide a model-based abstraction for RDF. LDkit facilitates the direct utilization of Linked Data in web applications, effectively working as the data access layer. It accomplishes this by querying and retrieving data, and transforming it into TypeScript primitives according to user-defined data schemas, while ensuring end-to-end data type safety. This paper describes the design and implementation of LDkit, highlighting its ability to simplify the integration of Semantic Web technologies into web applications, while adhering to both general web standards and Linked Data specific standards. Furthermore, we discuss how the LDkit framework has been designed to integrate seamlessly with popular web application technologies that developers are already familiar with. This approach promotes ease of adoption, allowing developers to harness the power of Linked Data without disrupting their current workflows. Through the provision of an efficient and intuitive toolkit, LDkit aims to enhance the web ecosystem by promoting the widespread adoption of Linked Data and Semantic Web technologies.
More publications