As a researcher, I write publications about my work, which are listed on this page. If you are interested in any of them, feel free to contact me to discuss further.

2024

  1. Demo Demonstration of Link Traversal SPARQL Query Processing over the Decentralized Solid Environment
    In Proceedings of the 27th International Conference on Extending Database Technology (EDBT). To tackle economic and societal problems originating from the centralization of Knowledge Graphs on the Web, there has been increasing interest in decentralizing Knowledge Graphs across a large number of small authoritative sources. In order to effectively build user-facing applications, there is a need for efficient query engines that abstract the complexities around accessing such massively Decentralized Knowledge Graphs (DKGs). As such, we have designed and implemented novel Link Traversal Query Processing algorithms into the Comunica query engine framework that are capable of efficiently evaluating SPARQL queries across DKGs provided by the Solid decentralization initiative. In this article, we demonstrate this query engine through a Web-based interface over which SPARQL queries can be executed over simulated and real-world Solid environments. Our demonstration shows the benefits of a traversal-based approach to querying DKGs, and uncovers opportunities for further optimization of both query execution and discovery algorithms in future work.
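    For a flavour of what traversal-based querying over Solid looks like from a developer's perspective, here is a minimal sketch using Comunica's link-traversal distribution; the package name reflects recent Comunica releases, and the pod URL is a placeholder.
    ```ts
    import { QueryEngine } from "@comunica/query-sparql-link-traversal-solid";

    const engine = new QueryEngine();

    // The engine starts from the seed document and discovers further
    // sources on the fly by following links in the retrieved data.
    const bindingsStream = await engine.queryBindings(`
      SELECT ?name WHERE {
        ?person <http://xmlns.com/foaf/0.1/name> ?name.
      }`, {
      sources: ["https://pod.example.org/profile/card"], // placeholder seed document
      lenient: true, // skip documents that fail to dereference or parse
    });

    bindingsStream.on("data", (bindings) => console.log(bindings.get("name")?.value));
    ```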
  2. Poster Personalized Medicine Through Personal Data Pods
    In Proceedings of the 15th International SWAT4HCLS Conference. Medical care is in the process of becoming increasingly personalized through the use of patient genetic information. Concurrently, privacy concerns regarding the collection and storage of sensitive personal genome sequence data have necessitated public debate and legal regulation. Here we identify two fundamental challenges associated with the privacy and shareability of genomic data storage, and propose the use of Solid pods to address these challenges. We establish that personal data pods using Solid specifications can enable decentralized storage, increased patient control over their data, and support for Linked Data formats, which, when combined, could offer solutions to challenges currently restricting personalized medicine in practice.

2023

  1. Poster Towards Algebraic Mapping Operators for Knowledge Graph Construction
    In Proceedings of the 22nd International Semantic Web Conference: Posters and Demos. Declarative knowledge graph construction has matured to the point where state-of-the-art techniques focus on optimizing the mapping processes. However, these optimization techniques use the syntax of the mapping language without considering the impact of the semantics. As a result, it is difficult to compare different engines fairly due to the obscurity of their semantic differences. In this poster paper, we propose an initial set of algebraic mapping operators to define the operational semantics of mapping processes, providing a first step towards a theoretical foundation for mapping languages. We translated a series of RML documents to algebraic mapping operators to show the feasibility of our approach. We believe that further pursuing these initial results will lead to greater interoperability of mapping engines and languages, intensify requirements analysis for the upcoming RML standardization work, and improve the developer experience for all current and future mapping engines.
  2. Workshop The Need for Better RDF Archiving Benchmarks
    In Proceedings of the 9th Workshop on Managing the Evolution and Preservation of the Data Web. The advancements and popularity of Semantic Web technologies in the last decades have led to an exponential adoption and availability of Web-accessible datasets. While most solutions consider such datasets to be static, they often evolve over time. Hence, efficient archiving solutions are needed to meet the users’ and maintainers’ needs. While some solutions to these challenges already exist, standardized benchmarks are needed to systematically test the different capabilities of existing solutions and identify their limitations. Unfortunately, the development of new benchmarks has not kept pace with the evolution of RDF archiving systems. In this paper, we therefore identify the current state of the art in RDF archiving benchmarks and discuss to what degree such benchmarks reflect the current needs of real-world use cases and their requirements. Through this empirical assessment, we highlight the need for the development of more advanced and comprehensive benchmarks that align with the evolving landscape of RDF archiving.
  3. Workshop How Does the Link Queue Evolve during Traversal-Based Query Processing?
    In Proceedings of the 7th International Workshop on Storing, Querying and Benchmarking Knowledge Graphs. Link Traversal-based Query Processing (LTQP) is an integrated querying approach that allows the query engine to start with zero knowledge of the data to query and discover data sources on the fly. The query engine starts with some seed documents and dynamically discovers new data sources by dereferencing hyperlinks in previously ingested data. Given the dynamic nature of source discovery, query processing tends to be relatively slow. Optimization techniques exist, such as exploiting existing structural information, but they depend on a deep understanding of the link queue during LTQP. To this end, we investigate the evolution of the types of link sources in the link queue and introduce metrics that describe key link queue characteristics. This paper analyses the link queue to guide future work on LTQP query optimization approaches that exploit structural information within a Solid environment. We find that queries exhibit two different execution patterns, one where the link queue is primarily empty and the other where the link queue fills faster than the engine can process. Our results show that the link queue is not functioning optimally and that our current approach to link discovery is not sufficiently selective.
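    To make the notion of link queue characteristics concrete, the following hypothetical sketch (not the instrumentation used in the paper) records the queue size at every push and pop, and derives one simple metric: how often the queue runs empty.
    ```ts
    // Hypothetical instrumentation of a link queue during traversal-based querying.
    type QueueEvent = { time: number; size: number };

    class InstrumentedLinkQueue {
      private queue: string[] = [];
      readonly events: QueueEvent[] = [];

      push(url: string): void {
        this.queue.push(url);
        this.events.push({ time: Date.now(), size: this.queue.length });
      }

      pop(): string | undefined {
        const url = this.queue.shift();
        this.events.push({ time: Date.now(), size: this.queue.length });
        return url;
      }
    }

    // Fraction of observations at which the queue was empty: high values suggest
    // a "starved" execution, low values a queue that fills faster than it drains.
    const emptyRatio = (q: InstrumentedLinkQueue): number =>
      q.events.filter((e) => e.size === 0).length / Math.max(q.events.length, 1);
    ```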
  4. Workshop In-Memory Dictionary-Based Indexing of Quoted RDF Triples
    In Proceedings of the 7th International Workshop on Storing, Querying and Benchmarking Knowledge Graphs. The upcoming RDF 1.2 recommendation is scheduled to introduce the concept of quoted triples, which allows statements to be made about other statements. Since quoted triples enable new forms of data access in SPARQL 1.2, in the form of quoted triple patterns, there is a need for new indexing strategies that can efficiently handle these data access patterns. As such, we explore and evaluate different in-memory indexing approaches for quoted triples. In this paper, we investigate four indexing approaches, and evaluate their performance over an artificial dataset with custom triple pattern queries. Our findings show that the so-called indexed quoted triples dictionary vastly outperforms other approaches in terms of query execution time at the cost of increased storage size and ingestion time. Our work shows that indexing quoted triples in a dictionary separate from non-quoted RDF terms achieves good performance, and can be implemented using well-known indexing techniques into existing systems. Therefore, we illustrate that the addition of quoted triples into the RDF stack can be achieved in a performant manner.
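    The core idea of the best-performing approach can be sketched as follows: quoted triples are interned in their own dictionary, so each quoted triple collapses to a single integer ID that regular triple indexes can store. This is an illustrative reconstruction, not the evaluated implementation.
    ```ts
    // Illustrative sketch of a quoted-triples dictionary, kept separate from the
    // dictionary for regular RDF terms.
    type TermId = number;

    class QuotedTripleDictionary {
      private byKey = new Map<string, TermId>();
      private byId: [TermId, TermId, TermId][] = [];

      // Encode a quoted triple (given as already-encoded term IDs) into one ID.
      encode(s: TermId, p: TermId, o: TermId): TermId {
        const key = `${s},${p},${o}`;
        let id = this.byKey.get(key);
        if (id === undefined) {
          id = this.byId.length;
          this.byId.push([s, p, o]);
          this.byKey.set(key, id);
        }
        return id;
      }

      // Decode an ID back into its constituent term IDs.
      decode(id: TermId): [TermId, TermId, TermId] {
        return this.byId[id];
      }
    }
    ```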
  5. Conference LDkit: Linked Data Object Graph Mapping toolkit for Web Applications
    In Proceedings of the 22nd International Semantic Web Conference. The adoption of Semantic Web and Linked Data technologies in web application development has been hindered by the complexity of numerous standards, such as RDF and SPARQL, as well as the challenges associated with querying data from distributed sources and a variety of interfaces. Understandably, web developers often prefer traditional solutions based on relational or document databases due to the higher level of data predictability and superior developer experience. To address these issues, we present LDkit, a novel Object Graph Mapping (OGM) framework for TypeScript designed to provide a model-based abstraction for RDF. LDkit facilitates the direct utilization of Linked Data in web applications, effectively working as the data access layer. It accomplishes this by querying and retrieving data, and transforming it into TypeScript primitives according to user-defined data schemas, while ensuring end-to-end data type safety. This paper describes the design and implementation of LDkit, highlighting its ability to simplify the integration of Semantic Web technologies into web applications, while adhering to both general web standards and Linked Data specific standards. Furthermore, we discuss how the LDkit framework has been designed to integrate seamlessly with popular web application technologies that developers are already familiar with. This approach promotes ease of adoption, allowing developers to harness the power of Linked Data without disrupting their current workflows. Through the provision of an efficient and intuitive toolkit, LDkit aims to enhance the web ecosystem by promoting the widespread adoption of Linked Data and Semantic Web technologies.
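    A minimal usage sketch, following the style of LDkit's documentation; treat the exact API surface and the source URL as indicative rather than authoritative.
    ```ts
    import { createLens } from "ldkit";
    import { schema, xsd } from "ldkit/namespaces";

    // A data schema maps RDF properties to typed TypeScript fields.
    const PersonSchema = {
      "@type": schema.Person,
      name: schema.name,
      birthDate: { "@id": schema.birthDate, "@type": xsd.date },
    } as const;

    // The lens handles SPARQL generation and result mapping underneath.
    const Persons = createLens(PersonSchema, {
      sources: ["https://example.org/sparql"], // placeholder data source
    });

    // Retrieved entities are plain typed objects.
    const people = await Persons.find();
    console.log(people.map((p) => p.name));
    ```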
  6. Conference Link Traversal Query Processing over Decentralized Environments with Structural Assumptions
    In Proceedings of the 22nd International Semantic Web Conference. To counter societal and economic problems caused by data silos on the Web, efforts such as Solid strive to reclaim private data by storing it in permissioned documents over a large number of personal vaults across the Web. Building applications on top of such a decentralized Knowledge Graph involves significant technical challenges: centralized aggregation prior to query processing is excluded for legal reasons, and current federated querying techniques cannot handle this large scale of distribution at the expected performance. We propose an extension to Link Traversal Query Processing (LTQP) that incorporates structural properties within decentralized environments to tackle their unprecedented scale. In this article, we analyze the structural properties of the Solid decentralization ecosystem that are relevant for query execution, we introduce novel LTQP algorithms leveraging these structural properties, and evaluate their effectiveness. Our experiments indicate that these new algorithms obtain accurate results in the order of seconds, which existing algorithms cannot achieve. This work reveals that a traversal-based querying method using structural assumptions can be effective for large-scale decentralization, but that advances are needed in the area of query planning for LTQP to handle more complex queries. These insights open the door to query-driven decentralized applications, in which declarative queries shield developers from the inherent complexity of a decentralized landscape.
  7. Poster Reinforcement Learning-based SPARQL Join Ordering Optimizer
    In Proceedings of the 20th Extended Semantic Web Conference: Posters and Demos. In recent years, relational databases successfully leverage reinforcement learning to optimize query plans. For graph databases and RDF quad stores, such research has been limited, so there is a need to understand the impact of reinforcement learning techniques. We explore a reinforcement learning-based join plan optimizer that we design specifically for optimizing join plans during SPARQL query planning. This paper presents key aspects of this method and highlights open research problems. We argue that while we can reuse aspects of relational database optimization, SPARQL query optimization presents unique challenges not encountered in relational databases. Nevertheless, initial benchmarks show promising results that warrant further exploration.
  8. Demo GLENDA: Querying RDF Archives with full SPARQL
    In Proceedings of the 20th Extended Semantic Web Conference: Posters and Demos. The dynamicity of semantic data has propelled the research on RDF archiving, i.e., the task of storing and making the full history of large RDF datasets accessible. However, existing archiving techniques fail to scale when confronted with very large RDF datasets and support only simple SPARQL queries. In this demonstration, we therefore showcase GLENDA, a system that can run full SPARQL 1.1 compliant queries over large RDF archives. We achieve this through a multi-snapshot change-based storage architecture that we interface using the Comunica query engine. Thanks to this integration, we demonstrate that fast SPARQL query processing over multiple versions of a knowledge graph is possible. Moreover, our demonstration provides various statistics about the history of RDF datasets, which can be useful for tasks beyond querying by providing insights into the evolution dynamics of the data.
  9. Workshop Distributed Social Benefit Allocation using Reasoning over Personal Data in Solid
    In Proceedings of the 1st International Workshop on Data Management for Knowledge Graphs. When interacting with government institutions, citizens may often be asked to provide a number of documents to various officials, due to the way the data is being processed by the government, and regulation or guidelines that restrict sharing of that data between institutions. Occasionally, documents from third parties, such as the private sector, are involved, as the data, rules, regulations and individual private data may be controlled by different parties. Facilitating efficient flow of information in such cases is therefore important, while still respecting the ownership and privacy of that data. Addressing these types of use cases in data storage and sharing, the Solid initiative allows individuals, organisations and the public sector to store their data in personal online datastores. Solid has previously been applied to data storage within government contexts, so we decided to extend that work by adding data processing services on top of such data and including multiple parties such as citizens and the private sector. However, introducing multiple parties within the data processing flow may impose new challenges, and implementing such data processing services in practice on top of Solid might present opportunities for improvement from the perspective of the implementer of the services. Within this work, together with the City of Antwerp in Belgium, we have produced a proof-of-concept service implementation operating at the described intersection of the public sector, citizens and the private sector, to manage social benefit allocation in a distributed environment. The service operates on distributed Linked Data stored in multiple Solid pods in RDF, using Notation3 rules to process that data and SPARQL queries to access and modify it. This way, our implementation seeks to respect the design principles of Solid, while taking advantage of the related technologies for representing, processing and modifying Linked Data. This document describes our chosen use case, service design and implementation, and our observations resulting from this experiment. Through the proof-of-concept implementation, we have established a preliminary understanding of the current challenges in implementing such a service using the chosen technologies. We have identified topics that should be addressed when using such an approach in practice, such as verification of data, assumptions related to data locations, and the tight coupling of our logic between the rules and program code. Addressing these topics in future work should help further the adoption of Linked Data as a means to solve challenges around data sharing, processing and ownership, such as in government processes involving multiple parties.
  10. Journal Distributed Subweb Specifications for Traversing the Web
    In Theory and Practice of Logic Programming. Link Traversal–based Query Processing (LTQP), in which a SPARQL query is evaluated over a web of documents rather than a single dataset, is often seen as a theoretically interesting yet impractical technique. However, in a time where the hypercentralization of data has increasingly come under scrutiny, a decentralized Web of Data with a simple document-based interface is appealing, as it enables data publishers to control their data and access rights. While LTQP allows evaluating complex queries over such webs, it suffers from performance issues (due to the high number of documents containing data) as well as information quality concerns (due to the many sources providing such documents). In existing LTQP approaches, the burden of finding sources to query is entirely in the hands of the data consumer. In this paper, we argue that to solve these issues, data publishers should also be able to suggest sources of interest and guide the data consumer towards relevant and trustworthy data. We introduce a theoretical framework that enables such guided link traversal and study its properties. We illustrate with a theoretic example that this can improve query results and reduce the number of network requests. We evaluate our proposal experimentally on a virtual linked web with specifications and indeed observe that not just the data quality but also the efficiency of querying improves.
  11. Conference Scaling Large RDF Archives To Very Long Histories
    In Proceedings of the 17th IEEE International Conference on Semantic Computing. In recent years, research in RDF archiving has gained traction due to the ever-growing nature of semantic data and the emergence of community-maintained knowledge bases. Several solutions have been proposed to manage the history of large RDF graphs, including approaches based on independent copies, time-based indexes, and change-based schemes. In particular, aggregated changesets have been shown to be relatively efficient at handling very large datasets. However, ingestion time can still become prohibitive as the revision history increases. To tackle this challenge, we propose a hybrid storage approach based on aggregated changesets, snapshots, and multiple delta chains. We evaluate different snapshot creation strategies on the BEAR benchmark for RDF archives, and show that our techniques can speed up ingestion time up to two orders of magnitude while keeping competitive performance for version materialization and delta queries. This allows us to support revision histories of lengths that are beyond reach with existing approaches.
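    As a hypothetical illustration of one such snapshot creation strategy, a change-ratio policy starts a new snapshot and delta chain once the accumulated changesets outgrow the last snapshot; the names and threshold below are illustrative, not the paper's implementation.
    ```ts
    // Hypothetical change-ratio snapshot policy for a hybrid RDF archive.
    interface ArchiveStats {
      snapshotSize: number;            // triples in the most recent snapshot
      aggregatedChangesetSize: number; // triples accumulated in changesets since
    }

    // With threshold = 1.0, a new snapshot is created once the aggregated
    // deltas grow as large as the snapshot they are based on.
    function shouldCreateSnapshot(stats: ArchiveStats, threshold = 1.0): boolean {
      return stats.aggregatedChangesetSize >= threshold * stats.snapshotSize;
    }
    ```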

2022

  1. Workshop A Prospective Analysis of Security Vulnerabilities within Link Traversal-Based Query Processing
    In Proceedings of the 6th International Workshop on Storing, Querying and Benchmarking Knowledge Graphs. The societal and economic consequences surrounding Big Data-driven platforms have increased the call for decentralized solutions. However, retrieving and querying data in more decentralized environments requires fundamentally different approaches, whose properties are not yet well understood. Link-Traversal-based Query Processing (LTQP) is a technique for querying over decentralized data networks, in which a client-side query engine discovers data by traversing links between documents. Since decentralized environments are potentially unsafe due to their non-centrally controlled nature, there is a need for client-side LTQP query engines to be resistant against security threats aimed at the query engine’s host machine or the query initiator’s personal data. As such, we have performed an analysis of potential security vulnerabilities of LTQP. This article provides an overview of security threats in related domains, which are used as inspiration for the identification of 10 LTQP security threats. This list of security threats forms a basis for future work in which mitigations for each of these threats need to be developed and tested for their effectiveness. With this work, we start filling the unknowns for enabling query execution over decentralized environments. Aside from future work on security, wider research will be needed to uncover missing building blocks for enabling true data decentralization.
  2. Demo Solid Web Monetization
    In the International Conference on Web Engineering. The Solid decentralization effort decouples data from services, so that users are in full control over their personal data. In this light, Web Monetization has been proposed as an alternative business model for web services that does not depend on data collection anymore. Integrating Web Monetization with Solid, however, remains difficult because of the heterogeneity of Interledger wallet implementations, the lack of mechanisms for securely paying on behalf of a user, and an inherent issue of trusting content providers to handle payments. We propose the Web Monetization Provider (WMP) as a solution to these challenges. The WMP acts as a third party, hiding the underlying complexity of transactions and acting as a source of trust in Web Monetization interactions. This demo shows a working end-to-end example including a website providing monetized content, a WMP, and a dashboard for configuring WMP into a Solid identity.
  3. Workshop A Policy-Oriented Architecture for Enforcing Consent in Solid
    In Proceedings of the 2nd International Workshop on Consent Management in Online Services, Networks and Things. The Solid project aims to restore end-users’ control over their data by decoupling services and applications from data storage. To realize data governance by the user, the Solid Protocol 0.9 relies on Web Access Control, which has limited expressivity and interpretability. In contrast, recent privacy and data protection regulations impose strict requirements on personal data processing applications and the scope of their operation. The Web Access Control mechanism lacks the granularity and contextual awareness needed to enforce these regulatory requirements. Therefore, we suggest a possible architecture for relating Solid’s low-level technical access control rules with higher-level concepts such as the legal basis and purpose for data processing, the abstract types of information being processed, and the data sharing preferences of the data subject. Our architecture combines recent technical efforts by the Solid community panels with prior proposals made by researchers on the use of ODRL and SPECIAL policies as an extension to Solid’s authorization mechanism. While our approach appears to avoid a number of pitfalls identified in previous research, further work is needed before it can be implemented and used in a practical setting.
  4. Journal Components.js: Semantic Dependency Injection
    In Semantic Web Journal. A common practice within object-oriented software is using composition to realize complex object behavior in a reusable way. Such compositions can be managed by Dependency Injection (DI), a popular technique in which components only depend on minimal interfaces and have their concrete dependencies passed into them. Instead of requiring program code, this separation enables describing the desired instantiations in declarative configuration files, such that objects can be wired together automatically at runtime. Configurations for existing DI frameworks typically only have local semantics, which limits their usage in other contexts. Yet some cases require configurations outside of their local scope, such as for the reproducibility of experiments, static program analysis, and semantic workflows. As such, there is a need for globally interoperable, addressable, and discoverable configurations, which can be achieved by leveraging Linked Data. We created Components.js as an open-source semantic DI framework for TypeScript and JavaScript applications, providing global semantics via Linked Data-based configuration files. In this article, we report on the Components.js framework by explaining its architecture and configuration, and discuss its impact by mentioning where and how applications use it. We show that Components.js is a stable framework that has seen significant uptake during the last couple of years. We recommend it for software projects that require high flexibility, configuration without code changes, sharing configurations with others, or applying these configurations in other contexts such as experimentation or static program analysis. We anticipate that Components.js will continue driving concrete research and development projects that require high degrees of customization to facilitate experimentation and testing, including the Comunica query engine and the Community Solid Server for decentralized data publication.
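    A minimal sketch of how an application typically uses Components.js, based on its documented API; the config file path and instance IRI are placeholders.
    ```ts
    import { ComponentsManager } from "componentsjs";

    // Build a manager that resolves components relative to this module.
    const manager = await ComponentsManager.build({
      mainModulePath: __dirname,
    });

    // A declarative JSON-LD config describes which class to instantiate with
    // which parameters; instances are addressable by a global IRI.
    await manager.configRegistry.register("./config/config.jsonld");
    const myInstance = await manager.instantiate("urn:example:myInstance");
    ```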

2021

  1. Conference Towards a personal data vault society: an interplay between technological and business perspectives
    In the 60th FITCE Communication Days Congress for ICT Professionals: Industrial Data–Cloud, Low Latency and Privacy (FITCE). The Web is evolving more and more into a small set of walled gardens, where a very small number of platforms determine the way that people get access to the Web (i.e. "log in via Facebook"). As a pushback against this ongoing centralization of data and services on the Web, decentralization efforts are taking place that move data into personal data vaults. This positioning paper discusses a potential personal data vault society and the research steps required to get there. Emphasis is placed on the needed interplay between technological and business research perspectives. The focus is on the situation of Flanders, based on a recent announcement by the Flemish government to set up a Data Utility Company. The concepts as well as the suggested path for adoption can, however, easily be extended to other situations and regions as well.
  2. Journal Optimizing Storage of RDF Archives using Bidirectional Delta Chains
    In Semantic Web Journal. Linked Open Datasets on the Web that are published as RDF can evolve over time. There is a need to be able to store such evolving RDF datasets, and query across their versions. Different storage strategies are available for managing such versioned datasets, each being efficient for specific types of versioned queries. In recent work, a hybrid storage strategy has been introduced that combines these different strategies to lead to more efficient query execution for all versioned query types, at the cost of increased ingestion time. While this trade-off is beneficial in the context of Web querying, it suffers from exponential ingestion times in terms of the number of versions, which becomes problematic for RDF datasets with many versions. As such, there is a need for an improved storage strategy that scales better in terms of ingestion time for many versions. We have designed, implemented, and evaluated a change to the hybrid storage strategy where we make use of a bidirectional delta chain instead of the default unidirectional delta chain. In this article, we introduce a concrete architecture for this change, together with accompanying ingestion and querying algorithms. Experimental results from our implementation show that the ingestion time is significantly reduced. As an additional benefit, this change also leads to lower total storage size and even improved query execution performance in some cases. This work shows that modifying the structure of delta chains within the hybrid storage strategy can be highly beneficial for RDF archives. In future work, other modifications to this delta chain structure deserve to be investigated, to further improve the scalability of ingestion and querying of datasets with many versions.
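    Conceptually, version materialization in a delta chain applies deltas outward from the nearest snapshot; a bidirectional chain places the snapshot in the middle so that no version is far from it. A simplified sketch of the materialization step (not the actual implementation):
    ```ts
    // Simplified version materialization over a delta chain.
    type Triple = string; // simplified triple representation
    interface Delta { additions: Triple[]; deletions: Triple[] }

    function materialize(snapshot: Set<Triple>, deltas: Delta[]): Set<Triple> {
      // Apply the deltas from the snapshot out to the requested version;
      // in a bidirectional chain, this path is at most half the chain length.
      const result = new Set(snapshot);
      for (const delta of deltas) {
        delta.deletions.forEach((t) => result.delete(t));
        delta.additions.forEach((t) => result.add(t));
      }
      return result;
    }
    ```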
  3. Conference Link Traversal with Distributed Subweb Specifications
    In Rules and Reasoning: 5th International Joint Conference, RuleML+RR 2021, Leuven, Belgium, September 8 – September 15, 2021, Proceedings. Link Traversal–based Query Processing (LTQP), in which a SPARQL query is evaluated over a web of documents rather than a single dataset, is often seen as a theoretically interesting yet impractical technique. However, in a time where the hypercentralization of data has increasingly come under scrutiny, a decentralized Web of Data with a simple document-based interface is appealing, as it enables data publishers to control their data and access rights. While LTQP allows evaluating complex queries over such webs, it suffers from performance issues (due to the high number of documents containing data) as well as information quality concerns (due to the many sources providing such documents). In existing LTQP approaches, the burden of finding sources to query is entirely in the hands of the data consumer. In this paper, we argue that to solve these issues, data publishers should also be able to suggest sources of interest and guide the data consumer towards relevant and trustworthy data. We introduce a theoretical framework that enables such guided link traversal and study its properties. We illustrate with a theoretic example that this can improve query results and reduce the number of network requests.
  4. Demo PROV4ITDaTa: Transparent and direct transfer of personal data to personal stores
    In WWW2021, the International World Wide Web Conference. Data is scattered across service providers, heterogeneously structured in various formats. Due to a lack of interoperability, data portability is hindered, and thus user control is inhibited. An interoperable data portability solution for transferring personal data is needed. We demo PROV4ITDaTa: a Web application that allows users to transfer personal data into an interoperable format to their personal data store. PROV4ITDaTa leverages the open-source solutions RML.io, Comunica, and Solid: (i) the RML.io toolset to describe how to access data from service providers and generate interoperable datasets; (ii) Comunica to query these and more flexibly generate enriched datasets; and (iii) Solid pods to store the generated data as Linked Data in personal data stores. As opposed to other (hard-coded) solutions, PROV4ITDaTa is fully transparent: each component of the pipeline is fully configurable and automatically generates detailed provenance trails. Furthermore, transforming the personal data into RDF allows for an interoperable solution. By maximizing the use of open-source tools and open standards, PROV4ITDaTa facilitates the shift towards a data ecosystem wherein users have control of their data, and providers can focus on their service instead of trying to adhere to interoperability requirements.
  5. Journal Semantic micro-contributions with decentralized nanopublication services
    In PeerJ Computer Science. While the publication of Linked Data has become increasingly common, the process tends to be a relatively complicated and heavy-weight one. Linked Data is typically published by centralized entities in the form of larger dataset releases, which has the downside that there is a central bottleneck in the form of the organization or individual responsible for the releases. Moreover, certain kinds of data entries, in particular those with subjective or original content, currently do not fit into any existing dataset and are therefore more difficult to publish. To address these problems, we present here an approach to use nanopublications and a decentralized network of services to allow users to directly publish small Linked Data statements through a simple and user-friendly interface, called Nanobench, powered by semantic templates that are themselves published as nanopublications. The published nanopublications are cryptographically verifiable and can be queried through a redundant and decentralized network of services, based on the grlc API generator and a new quad extension of Triple Pattern Fragments. We show here that these two kinds of services are complementary and together allow us to query nanopublications in a reliable and efficient manner. We also show that Nanobench makes it indeed very easy for users to publish Linked Data statements, even for those who have no prior experience in Linked Data publishing.

2020

  1. Workshop Optimizing Approximate Membership Metadata in Triple Pattern Fragments for Clients and Servers
    In Proceedings of the 13th International Workshop on Scalable Semantic Web Knowledge Base Systems. Depending on the HTTP interface used for publishing Linked Data, the effort of evaluating a SPARQL query can be redistributed differently between clients and servers. For instance, lower server-side CPU usage can be realized at the expense of higher bandwidth consumption. Previous work has shown that complementing lightweight interfaces such as Triple Pattern Fragments (TPF) with additional metadata can positively impact the performance of clients and servers. Specifically, Approximate Membership Filters (AMFs)—data structures that are small and probabilistic—in the context of TPF were shown to reduce the number of HTTP requests, at the expense of increasing query execution times. In order to mitigate this significant drawback, we have investigated unexplored aspects of AMFs as metadata on TPF interfaces. In this article, we introduce and evaluate alternative approaches for server-side publication and client-side consumption of AMFs within TPF to achieve faster query execution, while maintaining low server-side effort. Our alternative client-side algorithm and the proposed server configurations significantly reduce both the number of HTTP requests and query execution time, with only a small increase in server load, thereby mitigating the major bottleneck of AMFs within TPF. Compared to regular TPF, average query execution is more than 2 times faster and requires only 10% of the number of HTTP requests, at the cost of at most a 10% increase in server load. These findings translate into a set of concrete guidelines for data publishers on how to configure AMF metadata on their servers.
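    The client-side intuition can be sketched as follows: before requesting a fragment, consult the AMF attached to its metadata, and skip the HTTP request on a definite miss. This is a hypothetical sketch of the idea, not the evaluated algorithm.
    ```ts
    // An approximate membership filter (e.g. a Bloom filter) answers "maybe"
    // or "definitely not" for a given triple.
    interface ApproximateMembershipFilter {
      mightContain(tripleKey: string): boolean; // false => definitely absent
    }

    async function fetchIfMaybePresent(
      filter: ApproximateMembershipFilter,
      tripleKey: string,
      fragmentUrl: string,
    ): Promise<Response | undefined> {
      if (!filter.mightContain(tripleKey)) {
        return undefined; // skip the request: the triple cannot be in the fragment
      }
      return fetch(fragmentUrl); // may still be a false positive
    }
    ```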
  2. Workshop RDF Test Suite: Improving Developer Experience for Building Specification-Compliant Libraries
    In Proceedings of the 1st Workshop on The Semantic Web in Practice: Tools and Pedagogy. Guaranteeing compliance of software libraries to certain specifications is crucial for ensuring interoperability within the Semantic Web. While many specifications publish their own declarative test suite for testing compliance, the actual execution of these test suites can add a huge overhead for library developers. In order to remove this burden from these developers, we introduce a JavaScript tool called RDF Test Suite that takes care of all the required bookkeeping behind test suite execution. In this report, we discuss the design goals of RDF Test Suite, how it has been implemented, and how it can be used. In practice, this tool is being used in a variety of libraries, and it ensures their specification compliance with minimal overhead.
  3. Conference LDflex: a Read/Write Linked Data Abstraction for Front-End Web Developers
    In Proceedings of the 19th International Semantic Web Conference. Many Web developers nowadays are trained to build applications with a user-facing browser front-end that obtains predictable data structures from a single, well-known back-end. Linked Data invalidates such assumptions, since data can combine several ontologies and span multiple servers with different APIs. Front-end developers, who specialize in creating end-user experiences rather than back-ends, need an abstraction layer to the Web of Data that matches their existing mindset and workflow. We have developed LDflex, a domain-specific language that exposes common Linked Data access patterns as concise JavaScript expressions. In this article, we describe the design and embedding of the language, and discuss its daily usage within two companies. LDflex succeeds in eliminating a dedicated data layer for common and straightforward data access patterns, without striving to be a replacement for more complex cases. Our experiences reveal that designing a Linked Data developer experience—analogous to a user experience—is crucial for adoption by the target group, who can create Linked Data applications for end users. Crucially, simple abstractions require research to hide the underlying complexity.
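    A minimal sketch following LDflex's documented pattern; the profile URL is a placeholder, and package setup may differ per version.
    ```ts
    import { PathFactory } from "ldflex";
    import ComunicaEngine from "@ldflex/comunica";
    import { namedNode } from "@rdfjs/data-model";

    // A JSON-LD context maps JavaScript property names to RDF predicates.
    const context = {
      "@context": { name: "http://xmlns.com/foaf/0.1/name" },
    };
    const queryEngine = new ComunicaEngine("https://example.org/profile/card");
    const paths = new PathFactory({ context, queryEngine });

    // Each property access is resolved to a SPARQL query behind the scenes.
    const person = paths.create({
      subject: namedNode("https://example.org/profile/card#me"),
    });
    console.log(`Name: ${await person.name}`);
    ```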
  4. Conference Facilitating the Analysis of COVID-19 Literature Through a Knowledge Graph
    In Proceedings of the 19th International Semantic Web Conference. At the end of 2019, Chinese authorities alerted the World Health Organization (WHO) of the outbreak of a new strain of the coronavirus, called SARS-CoV-2, which confronted humanity with an unprecedented disaster a few months later. In response to this pandemic, a publicly available dataset was released on Kaggle containing information on over 63,000 papers. In order to facilitate the analysis of this large body of literature, we have created a knowledge graph based on this dataset. Within this knowledge graph, all information of the original dataset is linked together, which makes it easier to search for relevant information. The knowledge graph is also enriched with additional links to appropriate, already existing external resources. In this paper, we elaborate on the different steps performed to construct such a knowledge graph from structured documents. Moreover, we discuss, on a conceptual level, several possible applications and analyses that can be built on top of this knowledge graph. As such, we aim to provide a resource that allows people to more easily build applications that give more insights into the COVID-19 pandemic.
  5. Workshop Towards Querying in Decentralized Environments with Privacy-Preserving Aggregation
    In Proceedings of the 4th Workshop on Storing, Querying, and Benchmarking the Web of Data. The Web is a ubiquitous economic, educational, and collaborative space. However, it also serves as a haven for personal information harvesting. Existing decentralised Web-based ecosystems, such as Solid, aim to combat personal data exploitation on the Web by enabling individuals to manage their data in the personal data store of their choice. Since personal data in these decentralised ecosystems are distributed across many sources, there is a need for techniques to support efficient privacy-preserving query execution over personal data stores. Towards this end, in this position paper we present a framework for efficient privacy-preserving federated querying, and highlight open research challenges and opportunities. The overarching goal is to provide a means of positioning future research on privacy-preserving querying within decentralised environments.
  6. Conference Pattern-based Access Control in a Decentralised Collaboration Environment
    In Proceedings of LDAC2020 - 8th Linked Data in Architecture and Construction Workshop. As the building industry is rapidly catching up with digital advancements, and Web technologies grow in both maturity and security, a data- and Web-based construction practice comes within reach. In such an environment, private project information and open online data can be combined to allow cross-domain interoperability at the data level, using Semantic Web technologies. As construction projects often feature complex and temporary networks of stakeholder firms and their employees, a property-based access control mechanism is necessary to enable a flexible and automated management of distributed building projects. In this article, we propose a method to facilitate such a mechanism using existing Web technologies: RDF, SHACL, WebIDs, nanopublications and the Linked Data Platform. The proposed method will be illustrated with an extension of a custom Node.js Solid server. The potential of the Solid ecosystem has been put forward earlier as a basis for a Linked Data-based Common Data Environment: its decentralised setup, connection of both RDF and non-RDF resources, and fine-grained access control mechanisms are considered an apt foundation to manage distributed building data.
  7. Preprint Guided Link-Traversal-Based Query Processing
    Link-Traversal-Based Query Processing (LTBQP) is a technique for evaluating queries over a web of data by starting with a set of seed documents that is dynamically expanded through following hyperlinks. Compared to query evaluation over a static set of sources, LTBQP is significantly slower because of the number of needed network requests. Furthermore, there are concerns regarding relevance and trustworthiness of results, given that sources are selected dynamically. To address both issues, we propose guided LTBQP, a technique in which information about document linking structure and content policies is passed to a query processor. Thereby, the processor can prune the search tree of documents by only following relevant links, and restrict the result set to desired results by limiting which documents are considered for what kinds of content. In this exploratory paper, we describe the technique at a high level and sketch some of its applications. We argue that such guidance can make LTBQP a valuable query strategy in decentralized environments, where data is spread across documents with varying levels of user trust.
  8. PhD Thesis Storing and Querying Evolving Knowledge Graphs on the Web
    The Web has become our most valuable tool for sharing information. Currently, this Web is mainly targeted at humans, whereas machines typically have a hard time understanding information on the Web. Using knowledge graphs, this information can be linked in a structured way, so that intelligent agents can act upon this data autonomously. Current knowledge graphs remain however rather static. As there is a lot of value in acting upon evolving knowledge, there is a need for evolving knowledge graphs, and ways to manage them. As such, the goal of this PhD is to allow such evolving knowledge graphs to be stored and queried, taking into account the decentralized nature of the Web where anyone should be able to say anything about anything. Concretely, four challenges related to this goal are investigated: (1) generation of evolving data, (2) storage of evolving data, (3) querying over heterogeneous datasets, and (4) querying evolving data. For each of these challenges, techniques and algorithms have been developed, which prove to be valuable for storing and querying evolving knowledge graphs on the Web. This work therefore brings us closer towards a Web in which both human and machine can act upon evolving knowledge.

2019

  1. Conference Streamlining governmental processes by putting citizens in control of their personal data
    In Proceedings of the 6th International Conference on Electronic Governance and Open Society: Challenges in Eurasia. Governments typically store large amounts of personal information on their citizens, such as a home address, marital status, and occupation, to offer public services. Because governments consist of various governmental agencies, multiple copies of this data often exist. This raises concerns regarding data consistency, privacy, and access control, especially under recent legal frameworks such as GDPR. To solve these problems, and to give citizens true control over their data, we explore an approach using the decentralised Solid ecosystem, which enables citizens to maintain their data in personal data pods. We have applied this approach to two high-impact use cases, where citizen information is stored in personal data pods, and both public and private organisations are selectively granted access. Our findings indicate that Solid allows reshaping the relationship between citizens, their personal data, and the applications they use in the public and private sector. We strongly believe that the insights from this Flemish Solid Pilot can speed up the process for public administrations and private organisations that want to put the users in control of their data.
  2. Demo Discovering Data Sources in a Distributed Network of Heritage Information
    In Proceedings of the Posters and Demo Track of the 15th International Conference on Semantic Systems. The Netwerk Digitaal Erfgoed is a Dutch partnership that focuses on improving the visibility, usability and sustainability of digital collections in the cultural heritage sector. The vision is to improve the usability of the data by surmounting the borders between the separate collections of the cultural heritage institutions. Key concepts in this vision are the alignment of the data by using shared descriptions (e.g. thesauri), and the publication of the data as Linked Open Data. This demo paper describes a Proof of Concept to test this vision. It uses a register where only summaries of datasets are stored, instead of all the data. Based on these summaries, a portal can query the register to find which data sources might be of interest, and then query the data directly from the relevant data sources.
  3. Conference Reflections on: Triple Storage for Random-Access Versioned Querying of RDF Archives
    In Proceedings of the 18th International Semantic Web Conference. In addition to their latest version, Linked Open Datasets on the Web can also contain useful information in or between previous versions. In order to exploit this information, we can maintain history in RDF archives. Existing approaches either require much storage space, or they do not meet sufficiently expressive querying demands. In this extended abstract, we discuss an RDF archive indexing technique that has a low storage overhead, and adds metadata for reducing lookup times. We introduce algorithms based on this technique for efficiently evaluating versioned queries. Using the BEAR RDF archiving benchmark, we evaluate our implementation, called OSTRICH. Results show that OSTRICH introduces a new trade-off regarding storage space, ingestion time, and querying efficiency. By processing and storing more metadata during ingestion time, it significantly lowers the average lookup time for versioning queries. Our storage technique reduces query evaluation time through a preprocessing step during ingestion, which only in some cases increases storage space when compared to other approaches. This allows data owners to store and query multiple versions of their dataset efficiently, lowering the barrier to historical dataset publication and analysis.
  4. Demo Using an existing website as a queryable low-cost LOD publishing interface
    In Proceedings of the 16th Extended Semantic Web Conference: Posters and Demos. Maintaining an Open Dataset comes at an extra recurring cost when it is published in a dedicated Web interface. As there is often no direct financial return from publishing a dataset publicly, these extra costs need to be minimized. Therefore, we want to explore reusing existing infrastructure by enriching existing websites with Linked Data. In this demonstrator, we advised the data owner to annotate a digital heritage website with JSON-LD snippets, resulting in a dataset of more than three million triples that is now available and officially maintained. The website itself is paged, and thus Hydra partial collection view controls were added in the snippets. We then extended the modular query engine Comunica to support following page controls and extracting data from HTML documents while querying. This way, a SPARQL or GraphQL query over multiple heterogeneous data sources can power automated data reuse. While the query performance on such an interface is visibly poor, it becomes easy to create composite data dumps. As a result of implementing these building blocks in Comunica, any paged collection and enriched HTML page now becomes queryable by the query engine. This enables heterogeneous data interfaces to share functionality and become technically interoperable.
  5. Position Statement Bridges between GraphQL and RDF
    In W3C Workshop on Web Standardization for Graph Data. GraphQL is a highly popular query language for graphs, well known among Web developers. Each GraphQL data graph is scoped within an interface-specific schema, which makes it difficult to integrate data from multiple interfaces. The RDF graph model offers a solution to this integration problem. Different techniques can enable querying over RDF graphs using GraphQL queries. In this position statement, we provide a high-level overview of the existing techniques, and how they differ. We argue that each of these techniques has its merits, but standardization is essential to simplify the link between GraphQL and RDF.

2018

  1. Conference Comunica: a Modular SPARQL Query Engine for the Web
    In Proceedings of the 17th International Semantic Web Conference. Query evaluation over Linked Data sources has become a complex story, given the multitude of algorithms and techniques for single- and multi-source querying, as well as the heterogeneity of Web interfaces through which data is published online. Today’s query processors are insufficiently adaptable to test multiple query engine aspects in combination, such as evaluating the performance of a certain join algorithm over a federation of heterogeneous interfaces. The Semantic Web research community is in need of a flexible query engine that allows plugging in new components such as different algorithms, new or experimental SPARQL features, and support for new Web interfaces. We designed and developed a Web-friendly and modular meta query engine called Comunica that meets these specifications. In this article, we introduce this query engine and explain the architectural choices behind its design. We show how its modular nature makes it an ideal research platform for investigating new kinds of Linked Data interfaces and querying algorithms. Comunica facilitates the development, testing, and evaluation of new query processing capabilities, both in isolation and in combination with others.
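    For a flavour of how Comunica abstracts heterogeneous interfaces, here is a short sketch using its SPARQL engine over a Triple Pattern Fragments interface and a SPARQL endpoint simultaneously; package and API names follow current Comunica releases, which postdate this paper.
    ```ts
    import { QueryEngine } from "@comunica/query-sparql";

    const engine = new QueryEngine();

    // The engine detects each source's interface type and adapts its
    // query algorithms accordingly.
    const bindingsStream = await engine.queryBindings(`
      SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10`, {
      sources: [
        "https://fragments.dbpedia.org/2016-04/en", // Triple Pattern Fragments
        "https://query.wikidata.org/sparql",        // SPARQL endpoint
      ],
    });

    const bindings = await bindingsStream.toArray();
    console.log(bindings.map((b) => b.get("s")?.value));
    ```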
  2. Demo GraphQL-LD: Linked Data Querying with GraphQL
    In Proceedings of the 17th International Semantic Web Conference: Posters and Demos. The Linked Open Data cloud has the potential of significantly enhancing and transforming end-user applications. For example, the use of URIs to identify things allows data joining between separate data sources. Most popular (Web) application frameworks, such as React and Angular, have limited support for querying the Web of Linked Data, which leads to a high entry barrier for Web application developers. Instead, these developers increasingly use the highly popular GraphQL query language for retrieving data from GraphQL APIs, because GraphQL is tightly integrated into these frameworks. In order to lower the barrier for developers towards Linked Data consumption, the Linked Open Data cloud needs to be queryable with GraphQL as well. In this article, we introduce a method for transforming GraphQL queries coupled with a JSON-LD context to SPARQL, and a method for converting SPARQL results to a GraphQL query-compatible response. We demonstrate this method by implementing it in the Comunica framework. This approach brings us one step closer towards widespread Linked Data consumption for application development.
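    A sketch of the idea, following the graphql-ld project's documented usage; treat the package and class names as indicative, and the endpoint as an example.
    ```ts
    import { Client } from "graphql-ld";
    import { QueryEngineComunica } from "graphql-ld-comunica";

    // The JSON-LD context maps GraphQL fields to RDF predicates.
    const context = {
      "@context": {
        label: { "@id": "http://www.w3.org/2000/01/rdf-schema#label" },
      },
    };

    const client = new Client({
      context,
      queryEngine: new QueryEngineComunica({
        sources: ["https://dbpedia.org/sparql"],
      }),
    });

    // The GraphQL query is translated to SPARQL, and results are shaped
    // back into a GraphQL-compatible tree.
    const { data } = await client.query({ query: `{ label }` });
    console.log(data);
    ```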
  3. Demo Demonstration of Comunica, a Web framework for querying heterogeneous Linked Data interfaces
    In Proceedings of the 17th International Semantic Web Conference: Posters and Demos. Linked Data sources can appear in a variety of forms, going from SPARQL endpoints to Triple Pattern Fragments and data dumps. This heterogeneity among Linked Data sources creates an added layer of complexity when querying or combining results from those sources. To ease this problem, we created a modular engine, Comunica, that has modules for evaluating SPARQL queries and supports heterogeneous interfaces. Other modules for other query or source types can easily be added. In this paper we showcase a Web client that uses Comunica to evaluate federated SPARQL queries through automatic source type identification and interaction.
  4. Workshop The Fundamentals of Semantic Versioned Querying
    In Proceedings of the 12th International Workshop on Scalable Semantic Web Knowledge Base Systems, co-located with the 17th International Semantic Web Conference. The domain of RDF versioning concerns itself with the storage of different versions of Linked Datasets. The ability of querying over these versions is an active area of research, and allows for basic insights to be discovered, such as tracking the evolution of certain things in datasets. Querying can however only get you so far. In order to derive logical consequences from existing knowledge, we need to be able to reason over this data, such as ontology-based inferencing. In order to achieve this, we explore fundamental concepts on semantic querying of versioned datasets using ontological knowledge. In this work, we present these concepts as a semantic extension of the existing RDF versioning concepts that focus on syntactical versioning. We remain general and assume that versions do not necessarily follow a purely linear temporal relation. This work lays a foundation for reasoning over RDF versions from a querying perspective, using which RDF versioning storage, query and reasoning systems can be designed.
  5. Conference On the Semantics of TPF-QS towards Publishing and Querying RDF Streams at Web-scale
    In Proceedings of the 14th International Conference on Semantic Systems. RDF Stream Processing (RSP) is a rapidly evolving area of research that focuses on extensions of the Semantic Web in order to model and process Web data streams. While state-of-the-art approaches concentrate on server-side processing of RDF streams, we investigate the Triple Pattern Fragments Query Streamer (TPF-QS) method for server-side publishing of RDF streams, which moves the workload of continuous querying to clients. We formalize TPF-QS in terms of the RSP-QL reference model in order to formally compare it with existing RSP query languages. We experimentally validate that, compared to the state of the art, the server load of TPF-QS scales better with increasing numbers of concurrent clients in case of simple queries, at the cost of increased bandwidth consumption. This shows that TPF-QS is an important first step towards a viable solution for Web-scale publication and continuous processing of RDF streams.
  6. Journal Triple Storage for Random-Access Versioned Querying of RDF Archives
    In Journal of Web Semantics. When publishing Linked Open Datasets on the Web, most attention is typically directed to their latest version. Nevertheless, useful information is present in or between previous versions. In order to exploit this historical information in dataset analysis, we can maintain history in RDF archives. Existing approaches either require much storage space, or they expose an insufficiently expressive or efficient interface with respect to querying demands. In this article, we introduce an RDF archive indexing technique that is able to store datasets with a low storage overhead, by compressing consecutive versions and adding metadata for reducing lookup times. We introduce algorithms based on this technique for efficiently evaluating queries at a certain version, between any two versions, and over all versions. Using the BEAR RDF archiving benchmark, we evaluate our implementation, called OSTRICH. Results show that OSTRICH introduces a new trade-off regarding storage space, ingestion time, and querying efficiency. By processing and storing more metadata during ingestion time, it significantly lowers the average lookup time for versioning queries. OSTRICH performs better for many smaller dataset versions than for few larger dataset versions. Furthermore, it enables efficient offsets in query result streams, which facilitates random access in results. Our storage technique reduces query evaluation time for versioned queries through a preprocessing step during ingestion, which only in some cases increases storage space when compared to other approaches. This allows data owners to store and query multiple versions of their dataset efficiently, lowering the barrier to historical dataset publication and analysis.
  7. Journal Generating Public Transport Data based on Population Distributions for RDF Benchmarking
    In Semantic Web Journal When benchmarking RDF data management systems such as public transport route planners, system evaluation needs to happen under various realistic circumstances, which requires a wide range of datasets with different properties. Real-world datasets are almost ideal, as they offer these realistic circumstances, but they are often hard to obtain and inflexible for testing. For these reasons, synthetic dataset generators are typically preferred over real-world datasets due to their intrinsic flexibility. Unfortunately, many synthetic datasets generated within benchmarks are insufficiently realistic, raising questions about the generalizability of benchmark results to real-world scenarios. In order to benchmark geospatial and temporal RDF data management systems such as route planners with sufficient external validity and depth, we designed PoDiGG, a highly configurable generation algorithm for synthetic public transport datasets with realistic geospatial and temporal characteristics comparable to those of their real-world variants. The algorithm is inspired by real-world public transit network design and scheduling methodologies. This article discusses the design and implementation of PoDiGG and validates the properties of its generated datasets. Our findings show that the generator achieves a sufficient level of realism, based on the existing coherence metric and new metrics we introduce specifically for the public transport domain. Thereby, PoDiGG provides a flexible foundation for benchmarking RDF data management systems with geospatial and temporal data. 2018 (A hypothetical generator configuration sketch follows this year's list.)
  8. Challenge Versioned Querying with OSTRICH and Comunica in MOCHA 2018
    In Proceedings of the 5th SemWebEval Challenge at ESWC 2018 In order to exploit the value of historical information in Linked Datasets, we need to be able to store and query different versions of such datasets efficiently. The 2018 edition of the Mighty Storage Challenge (MOCHA) is organized to discover the efficiency of such Linked Data stores and to detect their bottlenecks. One task in this challenge focuses on the storage and querying of versioned datasets, in which we participated by combining the OSTRICH triple store and the Comunica SPARQL engine. In this article, we briefly introduce our system for the versioning task of this challenge. We present evaluation results showing that our system achieves fast query times for the supported queries, although not all query types were supported by Comunica at the time of writing. The results of this challenge will serve as a guideline for further improvements to our system. 2018
  9. Conference Components.js: A Semantic Dependency Injection Framework
    In Proceedings of The Web Conference: Developers Track Components.js is a dependency injection framework for JavaScript applications that allows components to be instantiated and wired together declaratively using semantic configuration files. The advantage of these semantic configuration files is that software components can be uniquely and globally identified using URIs. As an example, this documentation has been made self-instantiatable using Components.js, which makes it possible to print the HTML version of any page to the console, or to serve it over HTTP on a local webserver. 2018 (A usage sketch follows this year's list.)
  10. Demo OSTRICH: Versioned Random-Access Triple Store
    In Proceedings of the 27th International Conference Companion on World Wide Web The Linked Open Data cloud is ever-growing and many datasets are frequently being updated. In order to fully exploit the potential of the information that is available in and over historical dataset versions, such as discovering the evolution of taxonomies or diseases in biomedical datasets, we need to be able to store and query the different versions of Linked Datasets efficiently. In this demonstration, we introduce OSTRICH, which is an efficient triple store with support for versioned query evaluation. We demonstrate the capabilities of OSTRICH using a Web-based graphical user interface in which a store can be opened or created. Using this interface, the user is able to query in, between, and over different versions, ingest new versions, and retrieve summarizing statistics. 2018
  11. Workshop A Preliminary Open Data Publishing Strategy for Live Data in Flanders
    In Proceedings of the 27th International Conference Companion on World Wide Web For smart decision making, user agents need live and historic access to open data from sensors installed in the public domain. In contrast to a closed environment, with Open Data and federated query processing algorithms the data publisher cannot anticipate specific questions in advance, nor can it afford a server interface whose cost-efficiency degrades as the number of data consumers grows. When publishing observations from sensors, different fragmentation strategies can be devised depending on how the historic data needs to be queried. Furthermore, both publish/subscribe and polling strategies exist for publishing live updates. Each of these strategies comes with its own trade-offs regarding cost-efficiency of the server interface, user-perceived performance, and CPU use. A polling strategy in which multiple observations are published in a paged collection was tested in a proof of concept for parking space availability. In order to understand the resource trade-offs between publish/subscribe and polling publication strategies, we devised a scalability experiment on two machines. The preliminary results were inconclusive, suggesting that larger-scale tests are needed to reveal a trend. While these large-scale tests will be performed in future work, the proof of concept helped to identify the technical Open Data principles for the 13 biggest cities in Flanders. 2018 (A polling sketch follows this year's list.)
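Item 4 above reasons over versioned RDF where versions need not be linearly ordered. As a rough illustration only, and not the paper's actual formalization, the following TypeScript sketch models a version DAG with per-version additions and applies one naive RDFS subclass inference step to a materialized version; all names and structures are hypothetical.

```ts
// Hypothetical sketch: a version DAG with per-version additions,
// plus one naive RDFS subclass inference step at a given version.

type Triple = { s: string; p: string; o: string };

// Versions form a directed acyclic graph rather than a linear history:
// v1 and v2 are parallel successors of v0.
const parents: Record<string, string[]> = {
  v0: [],
  v1: ['v0'],
  v2: ['v0'],
};

const added: Record<string, Triple[]> = {
  v0: [{ s: ':Cat', p: 'rdfs:subClassOf', o: ':Animal' }],
  v1: [{ s: ':felix', p: 'rdf:type', o: ':Cat' }],
  v2: [],
};

// Materialize a version by collecting additions along all its ancestors.
function materialize(version: string, seen = new Set<string>()): Triple[] {
  if (seen.has(version)) return [];
  seen.add(version);
  return [
    ...(parents[version] ?? []).flatMap((parent) => materialize(parent, seen)),
    ...(added[version] ?? []),
  ];
}

// One naive inference step: (x rdf:type C) + (C rdfs:subClassOf D)
// entails (x rdf:type D). A real reasoner would compute a fixpoint.
function inferTypes(triples: Triple[]): Triple[] {
  return triples
    .filter((t) => t.p === 'rdf:type')
    .flatMap((t) => triples
      .filter((sc) => sc.p === 'rdfs:subClassOf' && sc.s === t.o)
      .map((sc) => ({ s: t.s, p: 'rdf:type', o: sc.o })));
}

console.log(inferTypes(materialize('v1')));
// [{ s: ':felix', p: 'rdf:type', o: ':Animal' }]
```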
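Item 5's TPF-QS moves continuous query evaluation to the client. Below is a minimal sketch of that client-side loop, assuming the Comunica SPARQL engine as a TPF-capable client and a fixed refresh interval; the endpoint is hypothetical, and the real system derives its re-evaluation timing from server-published volatility annotations rather than a constant.

```ts
import { QueryEngine } from '@comunica/query-sparql';

// Client-side continuous evaluation: the server only answers cheap
// triple pattern requests; the client re-runs the full SPARQL query.
const engine = new QueryEngine();
const QUERY = `
  SELECT ?delay WHERE {
    <http://example.org/train/IC538> <http://example.org/ns#delay> ?delay.
  }`;
const SOURCE = 'http://example.org/ldf/trains'; // hypothetical TPF endpoint
const INTERVAL_MS = 10_000; // hypothetical; TPF-QS derives this from
                            // server-published volatility annotations

async function evaluateOnce(): Promise<void> {
  const bindingsStream = await engine.queryBindings(QUERY, { sources: [SOURCE] });
  for (const bindings of await bindingsStream.toArray()) {
    console.log('delay:', bindings.get('delay')?.value);
  }
}

setInterval(() => evaluateOnce().catch(console.error), INTERVAL_MS);
```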
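Item 6's OSTRICH answers queries at, between, and for versions over a snapshot plus change sets. The toy sketch below shows the snapshot-plus-delta idea and the three query types in their simplest form; OSTRICH itself aggregates deltas against the snapshot and keeps compressed indexes, which this sketch deliberately omits.

```ts
// Toy snapshot-plus-delta store. This applies consecutive deltas naively,
// purely to illustrate the three query types.

type Triple = string; // ':s :p :o', simplified to a plain string key

interface Delta { additions: Triple[]; deletions: Triple[]; }

const snapshot: Triple[] = [':a :knows :b', ':b :knows :c']; // version 0
const deltas: Delta[] = [
  { additions: [':c :knows :d'], deletions: [] },   // version 1
  { additions: [], deletions: [':a :knows :b'] },   // version 2
];

// Version Materialization (VM): the dataset at one version.
function materialize(version: number): Set<Triple> {
  const result = new Set(snapshot);
  for (const delta of deltas.slice(0, version)) {
    delta.additions.forEach((t) => result.add(t));
    delta.deletions.forEach((t) => result.delete(t));
  }
  return result;
}

// Delta Materialization (DM): the changes between two versions.
function diff(from: number, to: number): Delta {
  const a = materialize(from);
  const b = materialize(to);
  return {
    additions: [...b].filter((t) => !a.has(t)),
    deletions: [...a].filter((t) => !b.has(t)),
  };
}

// Version Query (VQ): in which versions does a triple occur?
function versionsOf(triple: Triple): number[] {
  return [...Array(deltas.length + 1).keys()]
    .filter((v) => materialize(v).has(triple));
}

console.log([...materialize(2)]); // [':b :knows :c', ':c :knows :d']
console.log(diff(0, 2));          // adds ':c :knows :d', deletes ':a :knows :b'
console.log(versionsOf(':a :knows :b')); // [0, 1]
```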
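Item 7's PoDiGG is driven by configuration. The following is a hypothetical configuration shape, purely to illustrate the kinds of knobs such a generator exposes; these are not PoDiGG's actual parameter names.

```ts
// Hypothetical configuration shape for a PoDiGG-style generator run.
interface TransitGeneratorConfig {
  seed: number;                   // fixed seed for reproducible datasets
  populationDistribution: string; // input steering where stops are placed
  stops: number;                  // size of the stop network
  routes: number;                 // routes laid out over the stop network
  connections: number;            // scheduled connections (temporal part)
}

const config: TransitGeneratorConfig = {
  seed: 1337,
  populationDistribution: 'belgium-population.tsv',
  stops: 600,
  routes: 50,
  connections: 30_000,
};

// The generation stages mirror real network design: place stops by
// population density, build edges and routes over them, then schedule trips.
console.log(`Generating ${config.stops} stops, ${config.routes} routes,`
  + ` ${config.connections} connections (seed ${config.seed})`);
```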
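Item 9's Components.js wires components together from semantic configuration files. A minimal usage sketch based on its ComponentsManager API; the module path, config file, and instance URI below are hypothetical.

```ts
import { ComponentsManager } from 'componentsjs';

// Declarative instantiation: the component's constructor arguments are
// wired from a semantic (JSON-LD) config file rather than from code.
async function main(): Promise<void> {
  const manager = await ComponentsManager.build({
    mainModulePath: __dirname, // where modules and components are discovered
  });
  await manager.configRegistry.register('config/my-app.jsonld');

  // Components are identified by global URIs, so configs stay unambiguous
  // across packages.
  const parser = await manager.instantiate('urn:my-app:parser');
  console.log(parser);
}

main().catch(console.error);
```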
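Item 11 compares publish/subscribe against polling over paged collections of observations. Here is a sketch of the polling side, with hypothetical URLs and fields: only the latest page changes, while historic pages stay immutable and cacheable, which is what keeps the server interface cheap.

```ts
// Polling over a paged collection of observations (hypothetical API).

interface ObservationPage {
  observations: { sensor: string; available: number; time: string }[];
  previous?: string; // link to the previous (historic, immutable) page
}

async function fetchPage(url: string): Promise<ObservationPage> {
  const response = await fetch(url);
  return await response.json() as ObservationPage;
}

// Re-poll the "latest" page on a fixed interval.
setInterval(async () => {
  const page = await fetchPage('https://example.org/parking/latest');
  for (const obs of page.observations) {
    console.log(`${obs.sensor}: ${obs.available} free spots at ${obs.time}`);
  }
}, 30_000);
```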

2017

  1. Conference Declaratively Describing Responses of Hypermedia-Driven Web APIs
    In Proceedings of the 9th International Conference on Knowledge Capture While humans browse the Web by following links, these hypermedia links can also be followed by machines. While efforts such as Hydra semantically describe the hypermedia controls on Web interfaces to enable smarter interface-agnostic clients, they are largely limited to the input parameters of interfaces, and clients therefore do not know what response to expect from these interfaces. In order to convey such expectations, interfaces need to declaratively describe the response structure of their parameterized hypermedia controls. We therefore explored techniques to represent this parameterized response structure in a generic but expressive way. In this work, we discuss four different approaches for declaring a response structure, and we compare them based on a model that we introduce. Based on this model, we conclude that a SHACL shape-based approach can be used for declaring such a parameterized response structure, as it conforms to the REST architectural style that has helped shape the Web into its current form. 2017 (An example shape follows this year's list.)
  2. Workshop Describing configurations of software experiments as Linked Data
    In Proceedings of the First Workshop on Enabling Open Semantic Science (SemSci) Within computer science engineering, research articles often rely on software experiments in order to evaluate contributions. Reproducing such experiments involves setting up software, benchmarks, and test data. Unfortunately, many articles ambiguously refer to software by name only, leaving out crucial details for reproducibility, such as module and dependency version numbers or the configuration of individual components in different setups. To address this, we created the Object-Oriented Components ontology for the semantic description of software components and their configuration. This article discusses the ontology and its application, and demonstrates with a use case how to publish experiments and their software configurations on the Web. In order to enable semantic interlinking between configurations and modules, we published the metadata of all 500,000+ JavaScript libraries on npm as 200,000,000+ RDF triples. Through our work, research articles can refer by URL to fine-grained descriptions of experimental setups. This brings us closer to accurate reproduction of experiments, and facilitates the evaluation of new research contributions with different software configurations. In the future, software could be instantiated automatically based on these descriptions and configurations, and reasoning and querying could be applied to software configurations for meta-research purposes. 2017
  3. Demo Live Storage and Querying of Versioned Datasets on the Web
    In Proceedings of the 14th Extended Semantic Web Conference: Posters and Demos Linked Datasets often evolve over time for a variety of reasons. While typical scenarios rely on the latest version only, useful knowledge may still be contained within or between older versions, such as the historical information of biomedical patient data. In order to make this historical information cost-efficiently available on the Web, a low-cost interface is required for providing access to versioned datasets. For our demonstration, we set up a live Triple Pattern Fragments interface for a versioned dataset with queryable access. We explain different version query types of this interface, and how it communicates with a storage solution that can handle these queries efficiently. 2017
  4. Workshop Versioned Triple Pattern Fragments: A Low-cost Linked Data Interface Feature for Web Archives
    In Proceedings of the 3rd Workshop on Managing the Evolution and Preservation of the Data Web Linked Datasets typically evolve over time because triples can be removed from or added to datasets, which results in different dataset versions. While most attention is typically given to the latest dataset version, a lot of useful information is still present in previous versions and its historical evolution. In order to make this historical information queryable at Web scale, a low-cost interface is required that provides access to different dataset versions. In this paper, we add a versioning feature to the existing Triple Pattern Fragments interface for queries at, between, and for versions, with an accompanying vocabulary for describing the results, metadata, and hypermedia controls. This interface feature is an important step in the direction of making versioned datasets queryable on the Web at low publication cost and effort. 2017 (An HTTP-level sketch follows this year's list.)
  5. Poster PoDiGG: A Public Transport RDF Dataset Generator
    In Proceedings of the 26th International Conference Companion on World Wide Web A large amount of public transport data is made available by many different providers, which makes RDF well suited for integrating these datasets. Furthermore, this type of data is a rich source of combined geospatial and temporal information. These aspects are currently undertested in RDF data management systems because of the limited availability of realistic input datasets. In order to bring public transport data to the world of benchmarking, we need to be able to create synthetic variants of this data. In this paper, we introduce a dataset generator with the capability to create realistic public transport data. This dataset generator, and the ability to configure it on different levels, makes it easier to use public transport data for benchmarking with great flexibility. 2017
  6. Tutorial Modeling, Generating, and Publishing Knowledge as Linked Data
    In Knowledge Engineering and Knowledge Management: EKAW 2016 Satellite Events, EKM and Drift-an-LOD, Bologna, Italy, November 19–23, 2016, Revised Selected Papers The process of extracting, structuring, and organizing knowledge from one or multiple data sources and preparing it for the Semantic Web requires a dedicated class of systems. They enable processing large and originally heterogeneous data sources and capturing new knowledge. Offering existing data as Linked Data increases its shareability, extensibility, and reusability. However, using Linked Data as a means to represent knowledge can be easier said than done. In this tutorial, we elaborate on the importance of semantically annotating data and how existing technologies facilitate their mapping to Linked Data. We introduce the [R2]RML languages to generate Linked Data derived from different heterogeneous data formats (e.g., databases, XML, or JSON) and from different interfaces (e.g., files or Web APIs). Those who are not Semantic Web experts can annotate their data with the RMLEditor, whose user interface hides all underlying Semantic Web technologies from data owners. Finally, we show how to easily publish Linked Data on the Web as Triple Pattern Fragments. As a result, participants, independently of their knowledge background, can model, annotate, and publish data on their own. 2017 (An example RML mapping follows this year's list.)
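Item 1 above concludes that SHACL shapes can declare the response structure of parameterized hypermedia controls. A hypothetical shape to that effect, embedded as a Turtle string in TypeScript; the target class and property are invented for illustration.

```ts
// A hypothetical SHACL shape declaring what a client may expect from one
// parameterized hypermedia control: every result node carries exactly
// one rdfs:label.
const responseShape = `
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/ns#> .

ex:SearchResponseShape a sh:NodeShape ;
  sh:targetClass ex:SearchResult ;
  sh:property [
    sh:path rdfs:label ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
  ] .
`;
```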
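Item 4 adds versioning to the TPF interface with queries at, between, and for versions. The sketch below shows what those three request types could look like at the HTTP level; the endpoint and parameter names are invented for illustration and are not the vocabulary the paper defines.

```ts
// Invented endpoint and parameter names, only to make the three
// versioned query types concrete at the HTTP level.
const BASE = 'https://example.org/vtpf/dataset';

async function demo(): Promise<void> {
  // Queries *at* a version (version materialization):
  await fetch(`${BASE}?predicate=rdf:type&version=2`);

  // Queries *between* versions (delta materialization):
  await fetch(`${BASE}?predicate=rdf:type&versionStart=0&versionEnd=2`);

  // Queries *for* versions: in which versions does the pattern hold?
  await fetch(`${BASE}?predicate=rdf:type&versions=all`);
}

demo().catch(console.error);
```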
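Item 6 introduces the [R2]RML languages for generating Linked Data from heterogeneous sources. Below is a small illustrative RML mapping over a hypothetical JSON source, embedded as a Turtle string: one triples map producing a subject per sensor and a temperature triple for each.

```ts
// An illustrative RML mapping over a hypothetical JSON source.
const rmlMapping = `
@prefix rr:  <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql:  <http://semweb.mmlab.be/ns/ql#> .
@prefix ex:  <http://example.org/ns#> .

<#SensorMapping> a rr:TriplesMap ;
  rml:logicalSource [
    rml:source "sensors.json" ;
    rml:referenceFormulation ql:JSONPath ;
    rml:iterator "$.sensors[*]"
  ] ;
  rr:subjectMap [ rr:template "http://example.org/sensor/{id}" ] ;
  rr:predicateObjectMap [
    rr:predicate ex:temperature ;
    rr:objectMap [ rml:reference "temp" ]
  ] .
`;
```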

2016

  1. Poster Exposing RDF Archives using Triple Pattern Fragments
    In Proceedings of the 20th International Conference on Knowledge Engineering and Knowledge Management: Posters and Demos Linked Datasets typically change over time, and knowledge of this historical information can be useful. This makes the storage and querying of Dynamic Linked Open Data an important area of research. With the current versioning solutions, publishing Dynamic Linked Open Data at Web scale is possible, but too expensive. We investigate the possibility of using the low-cost Triple Pattern Fragments (TPF) interface to publish versioned Linked Open Data. In this paper, we discuss requirements for supporting versioning in the TPF framework, on the level of the interface, storage, and client, and investigate which trade-offs exist. These requirements lay the foundations for further research in the area of low-cost, Web-scale dynamic Linked Open Data publication and querying. 2016
  2. Demo Querying Dynamic Datasources with Continuously Mapped Sensor Data
    In Proceedings of the 15th International Semantic Web Conference: Posters and Demos The world contains a large number of sensors that produce new data at a high frequency. It is currently very hard to find public services that expose these measurements as dynamic Linked Data. We investigate how sensor data can be published continuously on the Web at a low cost. This paper describes how the publication of various sensor data sources can be done by continuously mapping raw sensor data to RDF and inserting it into a live, low-cost server. This makes it possible for clients to continuously evaluate dynamic queries using public sensor data. For our demonstration, we will illustrate how this pipeline works for the publication of temperature and humidity data originating from a microcontroller, and how it can be queried. 2016 (A pipeline sketch follows this year's list.)
  3. Demo Linked Sensor Data Generation using Queryable RML Mappings
    In Proceedings of the 15th International Semantic Web Conference: Posters and Demos As the amount of generated sensor data is increasing, semantic interoperability becomes an important aspect in order to support efficient data distribution and communication. Therefore, the integration of (sensor) data is important, as this data is coming from different data sources and might be in different formats. Furthermore, reusable and extensible methods for this integration are required in order to be able to scale with the growing number of applications that generate semantic sensor data. Current research efforts allow mapping sensor data to Linked Data in order to provide semantic interoperability. However, they lack support for multiple data sources, which hampers integration. Furthermore, the used methods are not available for reuse or are not extensible, which hampers the development of applications. In this paper, we describe how the RDF Mapping Language (RML) and a Triple Pattern Fragments (TPF) server are used to address these shortcomings. The demonstration consists of a microcontroller that generates sensor data. The data is captured and mapped to RDF triples using module-specific RML mappings, which are queried from a TPF server. 2016
  4. Workshop Multidimensional Interfaces for Selecting Data within Ordinal Ranges
    In Proceedings of the 7th International Workshop on Consuming Linked Data Linked Data interfaces exist in many flavours, as evidenced by subject pages, SPARQL endpoints, triple pattern interfaces, and data dumps. These interfaces are mostly used to retrieve parts of a complete dataset; such parts can, for example, be defined by ranges in one or more dimensions. Filtering Linked Data by dimensions such as time range, geospatial area, or genomic location requires the lookup of data within ordinal ranges. To make retrieval by such ranges generic and cost-efficient, we propose a REST solution in between looking up data within ordinal ranges entirely on the server and entirely on the client. To this end, we introduce a method for extending any Linked Data interface with an n-dimensional interface-level index such that n-dimensional ordinal data can be selected using n-dimensional ranges. We formally define Range Gates and Range Fragments and theoretically evaluate the cost-efficiency of hosting such an interface. By adding a multidimensional index to a Linked Data interface for multidimensional ordinal data, we found that we can get the benefits of both worlds: the expressivity of the server increases, yet the interface remains more cost-efficient than one providing the full functionality server-side. Furthermore, the client now shares in the effort to filter the data. This makes query processing more flexible for the end user, because the engine can alter the query plan. In future work we hope to apply Range Gates and Range Fragments to real-world interfaces to give quicker access to data within ordinal ranges. 2016 (A range-selection sketch follows this year's list.)
  5. Conference Continuous Client-Side Query Evaluation over Dynamic Linked Data
    In The Semantic Web: ESWC 2016 Satellite Events, Heraklion, Crete, Greece, May 29 – June 2, 2016, Revised Selected Papers Existing solutions to query dynamic Linked Data sources extend the SPARQL language, and require continuous server processing for each query. Traditional SPARQL endpoints already accept highly expressive queries, so extending these endpoints for time-sensitive queries increases the server cost even further. To make continuous querying over dynamic Linked Data more affordable, we extend the low-cost Triple Pattern Fragments (TPF) interface with support for time-sensitive queries. In this paper, we introduce the TPF Query Streamer that allows clients to evaluate SPARQL queries with continuously updating results. Our experiments indicate that this extension significantly lowers the server complexity, at the expense of an increase in the execution time per query. We prove that by moving the complexity of continuously evaluating queries over dynamic Linked Data to the clients and thus increasing bandwidth usage, the cost at the server side is significantly reduced. Our results show that this solution makes real-time querying more scalable for a large number of concurrent clients when compared to the alternatives. 2016
  6. Poster Moving Real-Time Linked Data Query Evaluation to the Client
    In Proceedings of the 13th Extended Semantic Web Conference: Posters and Demos Traditional RDF stream processing engines work completely server-side, which contributes to a high server cost. For allowing a large number of concurrent clients to do continuous querying, we extend the low-cost Triple Pattern Fragments (TPF) interface with support for time-sensitive queries. In this poster, we give the overview of a client-side RDF stream processing engine on top of TPF. Our experiments show that our solution significantly lowers the server load while increasing the load on the clients. Preliminary results indicate that our solution moves the complexity of continuously evaluating real-time queries from the server to the client, which makes real-time querying much more scalable for a large number of concurrent clients when compared to the alternatives. 2016
  7. PhD Symposium Continuously Self-Updating Query Results over Dynamic Heterogeneous Linked Data
    In The Semantic Web. Latest Advances and New Domains: 13th International Conference, ESWC 2016, Heraklion, Crete, Greece, May 29 – June 2, 2016, Proceedings Our society is evolving towards massive data consumption from heterogeneous sources, which includes rapidly changing data like public transit delay information. Many applications that depend on dynamic data consumption require highly available server interfaces. Existing interfaces involve substantial costs to publish rapidly changing data with high availability, and are therefore only possible for organisations that can afford such an expensive infrastructure. In my doctoral research, I investigate how to publish and consume real-time and historical Linked Data on a large scale. To reduce server-side costs for making dynamic data publication affordable, I will examine different possibilities to divide query evaluation between servers and clients. This paper discusses the methods I aim to follow, together with preliminary results and the steps required to realise this solution. An initial prototype achieves significantly lower server processing cost per query, while maintaining reasonable query execution times and client costs. Given these promising results, I feel confident this research direction is a viable solution for offering low-cost dynamic Linked Data interfaces as opposed to the existing high-cost solutions. 2016
  8. Workshop Continuously Updating Query Results over Real-Time Linked Data
    In Proceedings of the 2nd Workshop on Managing the Evolution and Preservation of the Data Web Existing solutions to query dynamic Linked Data sources extend the SPARQL language, and require continuous server processing for each query. Traditional SPARQL endpoints accept highly expressive queries, contributing to high server cost. Extending these endpoints for time-sensitive queries increases the server cost even further. To make continuous querying over real-time Linked Data more affordable, we extend the low-cost Triple Pattern Fragments (TPF) interface with support for time-sensitive queries. In this paper, we discuss a framework on top of TPF that allows clients to execute SPARQL queries with continuously updating results. Our experiments indicate that this extension significantly lowers the server complexity. The trade-off is an increase in the execution time per query. We prove that by moving the complexity of continuously evaluating real-time queries over Linked Data to the clients and thus increasing the bandwidth usage, the cost of server-side interfaces is significantly reduced. Our results show that this solution makes real-time querying more scalable in terms of CPU usage for a large number of concurrent clients when compared to the alternatives. 2016
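Item 2 above continuously maps raw sensor readings to RDF and inserts them into a live, low-cost server. Below is a minimal sketch of that pipeline with a simulated sensor and a hypothetical ingest endpoint; the real demo maps readings via RML rather than by string templating.

```ts
// Hypothetical pipeline: read a (simulated) sensor value, map it to
// RDF (N-Triples), and POST it to an ingest endpoint.

async function publishReading(celsius: number): Promise<void> {
  const id = `http://example.org/obs/${Date.now()}`;
  const time = new Date().toISOString();
  const triples = [
    `<${id}> <http://example.org/ns#temperature> "${celsius.toFixed(1)}"^^<http://www.w3.org/2001/XMLSchema#decimal> .`,
    `<${id}> <http://example.org/ns#time> "${time}"^^<http://www.w3.org/2001/XMLSchema#dateTime> .`,
  ].join('\n');

  await fetch('https://example.org/ingest', {
    method: 'POST',
    headers: { 'content-type': 'application/n-triples' },
    body: triples,
  });
}

// Simulate a microcontroller emitting a reading every 5 seconds.
setInterval(() => publishReading(20 + Math.random() * 5).catch(console.error), 5_000);
```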
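Item 4 introduces Range Gates and Range Fragments for selecting data within n-dimensional ordinal ranges. The sketch below reduces the idea to its core: the server-side index only needs a cheap bounding-range intersection test per fragment, and the client refines the (possibly too broad) fragments it receives. All structures are hypothetical simplifications.

```ts
// A fragment is returned when its bounding ranges intersect the
// requested ranges in every dimension.

type Range = [low: number, high: number];

interface Fragment {
  url: string;
  bounds: Range[]; // one [low, high] pair per dimension, e.g. [time, latitude]
}

function intersects(bounds: Range[], query: Range[]): boolean {
  return bounds.every(([low, high], dim) =>
    low <= query[dim][1] && query[dim][0] <= high);
}

const index: Fragment[] = [
  { url: '/fragments/1', bounds: [[0, 10], [50.0, 51.0]] },
  { url: '/fragments/2', bounds: [[10, 20], [50.0, 51.0]] },
];

const hits = index.filter((f) => intersects(f.bounds, [[5, 12], [50.2, 50.4]]));
console.log(hits.map((f) => f.url)); // ['/fragments/1', '/fragments/2']
```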

2015

  1. Master’s Thesis Continuously Updating Queries over Real-Time Linked Data
    This dissertation investigates the possibilities of continuously updating queries over Linked Data, with a focus on server availability. This work builds upon the ideas of Linked Data Fragments to let clients do most of the work when executing a query. The server adds metadata to make clients aware of the data's volatility, ensuring that query results are always up to date. The implementation of the proposed framework is then tested and compared to alternative approaches. 2015