As a researcher, I write publications about my work, which are listed on this page. If you are interested in any of them, feel free to contact me to discuss further.

2024

  1. Demo Demonstration of Link Traversal SPARQL Query Processing over the Decentralized Solid Environment
    In Proceedings of the 27th International Conference on Extending Database Technology (EDBT). To tackle economic and societal problems originating from the centralization of Knowledge Graphs on the Web, there has been increasing interest in decentralizing Knowledge Graphs across a large number of small authoritative sources. In order to effectively build user-facing applications, there is a need for efficient query engines that abstract the complexities around accessing such massively Decentralized Knowledge Graphs (DKGs). As such, we have designed and implemented novel Link Traversal Query Processing algorithms into the Comunica query engine framework that are capable of efficiently evaluating SPARQL queries across DKGs provided by the Solid decentralization initiative. In this article, we demonstrate this query engine through a Web-based interface over which SPARQL queries can be executed over simulated and real-world Solid environments. Our demonstration shows the benefits of a traversal-based approach to querying DKGs, and uncovers opportunities for further optimization of both query execution and discovery algorithms in future work.
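    For a flavour of what traversal-based querying over Solid looks like from a developer's perspective, here is a minimal sketch using Comunica's link-traversal distribution; the package name reflects recent Comunica releases, and the pod URL is a placeholder.
    ```ts
    import { QueryEngine } from "@comunica/query-sparql-link-traversal-solid";

    const engine = new QueryEngine();

    // The engine starts from the seed document and discovers further
    // sources on the fly by following links in the retrieved data.
    const bindingsStream = await engine.queryBindings(`
      SELECT ?name WHERE {
        ?person <http://xmlns.com/foaf/0.1/name> ?name.
      }`, {
      sources: ["https://pod.example.org/profile/card"], // placeholder seed document
      lenient: true, // skip documents that fail to dereference or parse
    });

    bindingsStream.on("data", (bindings) => console.log(bindings.get("name")?.value));
    ```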
  2. Poster Personalized Medicine Through Personal Data Pods
    In Proceedings of the 15th International SWAT4HCLS Conference. Medical care is in the process of becoming increasingly personalized through the use of patient genetic information. Concurrently, privacy concerns regarding the collection and storage of sensitive personal genome sequence data have necessitated public debate and legal regulation. Here we identify two fundamental challenges associated with the privacy and shareability of genomic data storage, and propose the use of Solid pods to address these challenges. We establish that personal data pods using Solid specifications can enable decentralized storage, increased patient control over their data, and support for Linked Data formats, which, when combined, could offer solutions to challenges currently restricting personalized medicine in practice.

2023

  1. Poster Towards Algebraic Mapping Operators for Knowledge Graph Construction
    In Proceedings of the 22nd International Semantic Web Conference: Posters and Demos. Declarative knowledge graph construction has matured to the point where state-of-the-art techniques focus on optimizing the mapping processes. However, these optimization techniques use the syntax of the mapping language without considering the impact of the semantics. As a result, it is difficult to compare different engines fairly due to the obscurity of their semantic differences. In this poster paper, we propose an initial set of algebraic mapping operators to define the operational semantics of mapping processes, providing a first step towards a theoretical foundation for mapping languages. We translated a series of RML documents to algebraic mapping operators to show the feasibility of our approach. We believe that further pursuing these initial results will lead to greater interoperability of mapping engines and languages, intensify requirements analysis for the upcoming RML standardization work, and improve the developer experience for all current and future mapping engines.
  2. Workshop The Need for Better RDF Archiving Benchmarks
    In Proceedings of the 9th Workshop on Managing the Evolution and Preservation of the Data Web. The advancements and popularity of Semantic Web technologies in the last decades have led to an exponential adoption and availability of Web-accessible datasets. While most solutions consider such datasets to be static, they often evolve over time. Hence, efficient archiving solutions are needed to meet the users’ and maintainers’ needs. While some solutions to these challenges already exist, standardized benchmarks are needed to systematically test the different capabilities of existing solutions and identify their limitations. Unfortunately, the development of new benchmarks has not kept pace with the evolution of RDF archiving systems. In this paper, we therefore identify the current state of the art in RDF archiving benchmarks and discuss to what degree such benchmarks reflect the current needs of real-world use cases and their requirements. Through this empirical assessment, we highlight the need for the development of more advanced and comprehensive benchmarks that align with the evolving landscape of RDF archiving.
  3. Workshop How Does the Link Queue Evolve during Traversal-Based Query Processing?
    In Proceedings of the 7th International Workshop on Storing, Querying and Benchmarking Knowledge Graphs. Link Traversal-based Query Processing (LTQP) is an integrated querying approach that allows the query engine to start with zero knowledge of the data to query and discover data sources on the fly. The query engine starts with some seed documents and dynamically discovers new data sources by dereferencing hyperlinks in previously ingested data. Given the dynamic nature of source discovery, query processing tends to be relatively slow. Optimization techniques exist, such as exploiting existing structural information, but they depend on a deep understanding of the link queue during LTQP. To this end, we investigate the evolution of the types of link sources in the link queue and introduce metrics that describe key link queue characteristics. This paper analyses the link queue to guide future work on LTQP query optimization approaches that exploit structural information within a Solid environment. We find that queries exhibit two different execution patterns, one where the link queue is primarily empty and the other where the link queue fills faster than the engine can process. Our results show that the link queue is not functioning optimally and that our current approach to link discovery is not sufficiently selective.
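    To make the notion of link queue characteristics concrete, the following hypothetical sketch (not the instrumentation used in the paper) records the queue size at every push and pop, and derives one simple metric: how often the queue runs empty.
    ```ts
    // Hypothetical instrumentation of a link queue during traversal-based querying.
    type QueueEvent = { time: number; size: number };

    class InstrumentedLinkQueue {
      private queue: string[] = [];
      readonly events: QueueEvent[] = [];

      push(url: string): void {
        this.queue.push(url);
        this.events.push({ time: Date.now(), size: this.queue.length });
      }

      pop(): string | undefined {
        const url = this.queue.shift();
        this.events.push({ time: Date.now(), size: this.queue.length });
        return url;
      }
    }

    // Fraction of observations at which the queue was empty: high values suggest
    // a "starved" execution, low values a queue that fills faster than it drains.
    const emptyRatio = (q: InstrumentedLinkQueue): number =>
      q.events.filter((e) => e.size === 0).length / Math.max(q.events.length, 1);
    ```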
  4. Workshop In-Memory Dictionary-Based Indexing of Quoted RDF Triples
    In Proceedings of the 7th International Workshop on Storing, Querying and Benchmarking Knowledge Graphs. The upcoming RDF 1.2 recommendation is scheduled to introduce the concept of quoted triples, which allows statements to be made about other statements. Since quoted triples enable new forms of data access in SPARQL 1.2, in the form of quoted triple patterns, there is a need for new indexing strategies that can efficiently handle these data access patterns. As such, we explore and evaluate different in-memory indexing approaches for quoted triples. In this paper, we investigate four indexing approaches, and evaluate their performance over an artificial dataset with custom triple pattern queries. Our findings show that the so-called indexed quoted triples dictionary vastly outperforms other approaches in terms of query execution time at the cost of increased storage size and ingestion time. Our work shows that indexing quoted triples in a dictionary separate from non-quoted RDF terms achieves good performance, and can be implemented using well-known indexing techniques into existing systems. Therefore, we illustrate that the addition of quoted triples into the RDF stack can be achieved in a performant manner.
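    The core idea of the best-performing approach can be sketched as follows: quoted triples are interned in their own dictionary, so each quoted triple collapses to a single integer ID that regular triple indexes can store. This is an illustrative reconstruction, not the evaluated implementation.
    ```ts
    // Illustrative sketch of a quoted-triples dictionary, kept separate from the
    // dictionary for regular RDF terms.
    type TermId = number;

    class QuotedTripleDictionary {
      private byKey = new Map<string, TermId>();
      private byId: [TermId, TermId, TermId][] = [];

      // Encode a quoted triple (given as already-encoded term IDs) into one ID.
      encode(s: TermId, p: TermId, o: TermId): TermId {
        const key = `${s},${p},${o}`;
        let id = this.byKey.get(key);
        if (id === undefined) {
          id = this.byId.length;
          this.byId.push([s, p, o]);
          this.byKey.set(key, id);
        }
        return id;
      }

      // Decode an ID back into its constituent term IDs.
      decode(id: TermId): [TermId, TermId, TermId] {
        return this.byId[id];
      }
    }
    ```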
  5. Conference LDkit: Linked Data Object Graph Mapping toolkit for Web Applications
    In Proceedings of the 22nd International Semantic Web Conference. The adoption of Semantic Web and Linked Data technologies in web application development has been hindered by the complexity of numerous standards, such as RDF and SPARQL, as well as the challenges associated with querying data from distributed sources and a variety of interfaces. Understandably, web developers often prefer traditional solutions based on relational or document databases due to the higher level of data predictability and superior developer experience. To address these issues, we present LDkit, a novel Object Graph Mapping (OGM) framework for TypeScript designed to provide a model-based abstraction for RDF. LDkit facilitates the direct utilization of Linked Data in web applications, effectively working as the data access layer. It accomplishes this by querying and retrieving data, and transforming it into TypeScript primitives according to user-defined data schemas, while ensuring end-to-end data type safety. This paper describes the design and implementation of LDkit, highlighting its ability to simplify the integration of Semantic Web technologies into web applications, while adhering to both general web standards and Linked Data specific standards. Furthermore, we discuss how the LDkit framework has been designed to integrate seamlessly with popular web application technologies that developers are already familiar with. This approach promotes ease of adoption, allowing developers to harness the power of Linked Data without disrupting their current workflows. Through the provision of an efficient and intuitive toolkit, LDkit aims to enhance the web ecosystem by promoting the widespread adoption of Linked Data and Semantic Web technologies.
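    A minimal usage sketch, following the style of LDkit's documentation; treat the exact API surface and the source URL as indicative rather than authoritative.
    ```ts
    import { createLens } from "ldkit";
    import { schema, xsd } from "ldkit/namespaces";

    // A data schema maps RDF properties to typed TypeScript fields.
    const PersonSchema = {
      "@type": schema.Person,
      name: schema.name,
      birthDate: { "@id": schema.birthDate, "@type": xsd.date },
    } as const;

    // The lens handles SPARQL generation and result mapping underneath.
    const Persons = createLens(PersonSchema, {
      sources: ["https://example.org/sparql"], // placeholder data source
    });

    // Retrieved entities are plain typed objects.
    const people = await Persons.find();
    console.log(people.map((p) => p.name));
    ```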
  6. Conference Link Traversal Query Processing over Decentralized Environments with Structural Assumptions
    In Proceedings of the 22nd International Semantic Web Conference. To counter societal and economic problems caused by data silos on the Web, efforts such as Solid strive to reclaim private data by storing it in permissioned documents over a large number of personal vaults across the Web. Building applications on top of such a decentralized Knowledge Graph involves significant technical challenges: centralized aggregation prior to query processing is excluded for legal reasons, and current federated querying techniques cannot handle this large scale of distribution at the expected performance. We propose an extension to Link Traversal Query Processing (LTQP) that incorporates structural properties within decentralized environments to tackle their unprecedented scale. In this article, we analyze the structural properties of the Solid decentralization ecosystem that are relevant for query execution, we introduce novel LTQP algorithms leveraging these structural properties, and evaluate their effectiveness. Our experiments indicate that these new algorithms obtain accurate results in the order of seconds, which existing algorithms cannot achieve. This work reveals that a traversal-based querying method using structural assumptions can be effective for large-scale decentralization, but that advances are needed in the area of query planning for LTQP to handle more complex queries. These insights open the door to query-driven decentralized applications, in which declarative queries shield developers from the inherent complexity of a decentralized landscape.
  7. Poster Reinforcement Learning-based SPARQL Join Ordering Optimizer
    In Proceedings of the 20th Extended Semantic Web Conference: Posters and Demos. In recent years, relational databases successfully leverage reinforcement learning to optimize query plans. For graph databases and RDF quad stores, such research has been limited, so there is a need to understand the impact of reinforcement learning techniques. We explore a reinforcement learning-based join plan optimizer that we design specifically for optimizing join plans during SPARQL query planning. This paper presents key aspects of this method and highlights open research problems. We argue that while we can reuse aspects of relational database optimization, SPARQL query optimization presents unique challenges not encountered in relational databases. Nevertheless, initial benchmarks show promising results that warrant further exploration.
  8. Demo GLENDA: Querying RDF Archives with full SPARQL
    In Proceedings of the 20th Extended Semantic Web Conference: Posters and Demos. The dynamicity of semantic data has propelled the research on RDF archiving, i.e., the task of storing and making the full history of large RDF datasets accessible. However, existing archiving techniques fail to scale when confronted with very large RDF datasets and support only simple SPARQL queries. In this demonstration, we therefore showcase GLENDA, a system that can run full SPARQL 1.1 compliant queries over large RDF archives. We achieve this through a multi-snapshot change-based storage architecture that we interface using the Comunica query engine. Thanks to this integration, we demonstrate that fast SPARQL query processing over multiple versions of a knowledge graph is possible. Moreover, our demonstration provides various statistics about the history of RDF datasets, which can be useful for tasks beyond querying by providing insights into the evolution dynamics of the data.
  9. Workshop Distributed Social Benefit Allocation using Reasoning over Personal Data in Solid
    In Proceedings of the 1st International Workshop on Data Management for Knowledge Graphs. When interacting with government institutions, citizens may often be asked to provide a number of documents to various officials, due to the way the data is being processed by the government, and regulation or guidelines that restrict sharing of that data between institutions. Occasionally, documents from third parties, such as the private sector, are involved, as the data, rules, regulations and individual private data may be controlled by different parties. Facilitating efficient flow of information in such cases is therefore important, while still respecting the ownership and privacy of that data. Addressing these types of use cases in data storage and sharing, the Solid initiative allows individuals, organisations and the public sector to store their data in personal online datastores. Solid has previously been applied to data storage within government contexts, so we decided to extend that work by adding data processing services on top of such data and including multiple parties such as citizens and the private sector. However, introducing multiple parties within the data processing flow may impose new challenges, and implementing such data processing services in practice on top of Solid might present opportunities for improvement from the perspective of the implementer of the services. Within this work, together with the City of Antwerp in Belgium, we have produced a proof-of-concept service implementation operating at the described intersection of the public sector, citizens and the private sector, to manage social benefit allocation in a distributed environment. The service operates on distributed Linked Data stored in multiple Solid pods in RDF, using Notation3 rules to process that data and SPARQL queries to access and modify it. This way, our implementation seeks to respect the design principles of Solid, while taking advantage of the related technologies for representing, processing and modifying Linked Data. This document describes our chosen use case, service design and implementation, and our observations resulting from this experiment. Through the proof-of-concept implementation, we have established a preliminary understanding of the current challenges in implementing such a service using the chosen technologies. We have identified topics that should be addressed when using such an approach in practice, such as verification of data, assumptions related to data locations, and the tight coupling of our logic between the rules and program code. Addressing these topics in future work should help further the adoption of Linked Data as a means to solve challenges around data sharing, processing and ownership, such as in government processes involving multiple parties.
  10. Journal Distributed Subweb Specifications for Traversing the Web
    In Theory and Practice of Logic Programming. Link Traversal–based Query Processing (LTQP), in which a SPARQL query is evaluated over a web of documents rather than a single dataset, is often seen as a theoretically interesting yet impractical technique. However, in a time where the hypercentralization of data has increasingly come under scrutiny, a decentralized Web of Data with a simple document-based interface is appealing, as it enables data publishers to control their data and access rights. While LTQP allows evaluating complex queries over such webs, it suffers from performance issues (due to the high number of documents containing data) as well as information quality concerns (due to the many sources providing such documents). In existing LTQP approaches, the burden of finding sources to query is entirely in the hands of the data consumer. In this paper, we argue that to solve these issues, data publishers should also be able to suggest sources of interest and guide the data consumer towards relevant and trustworthy data. We introduce a theoretical framework that enables such guided link traversal and study its properties. We illustrate with a theoretic example that this can improve query results and reduce the number of network requests. We evaluate our proposal experimentally on a virtual linked web with specifications and indeed observe that not just the data quality but also the efficiency of querying improves.
  11. Conference Scaling Large RDF Archives To Very Long Histories
    In Proceedings of the 17th IEEE International Conference on Semantic Computing. In recent years, research in RDF archiving has gained traction due to the ever-growing nature of semantic data and the emergence of community-maintained knowledge bases. Several solutions have been proposed to manage the history of large RDF graphs, including approaches based on independent copies, time-based indexes, and change-based schemes. In particular, aggregated changesets have been shown to be relatively efficient at handling very large datasets. However, ingestion time can still become prohibitive as the revision history increases. To tackle this challenge, we propose a hybrid storage approach based on aggregated changesets, snapshots, and multiple delta chains. We evaluate different snapshot creation strategies on the BEAR benchmark for RDF archives, and show that our techniques can speed up ingestion time up to two orders of magnitude while keeping competitive performance for version materialization and delta queries. This allows us to support revision histories of lengths that are beyond reach with existing approaches.
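    As a hypothetical illustration of one such snapshot creation strategy, a change-ratio policy starts a new snapshot and delta chain once the accumulated changesets outgrow the last snapshot; the names and threshold below are illustrative, not the paper's implementation.
    ```ts
    // Hypothetical change-ratio snapshot policy for a hybrid RDF archive.
    interface ArchiveStats {
      snapshotSize: number;            // triples in the most recent snapshot
      aggregatedChangesetSize: number; // triples accumulated in changesets since
    }

    // With threshold = 1.0, a new snapshot is created once the aggregated
    // deltas grow as large as the snapshot they are based on.
    function shouldCreateSnapshot(stats: ArchiveStats, threshold = 1.0): boolean {
      return stats.aggregatedChangesetSize >= threshold * stats.snapshotSize;
    }
    ```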

2022

  1. Workshop A Prospective Analysis of Security Vulnerabilities within Link Traversal-Based Query Processing
    In Proceedings of the 6th International Workshop on Storing, Querying and Benchmarking Knowledge Graphs. The societal and economic consequences surrounding Big Data-driven platforms have increased the call for decentralized solutions. However, retrieving and querying data in more decentralized environments requires fundamentally different approaches, whose properties are not yet well understood. Link-Traversal-based Query Processing (LTQP) is a technique for querying over decentralized data networks, in which a client-side query engine discovers data by traversing links between documents. Since decentralized environments are potentially unsafe due to their non-centrally controlled nature, there is a need for client-side LTQP query engines to be resistant against security threats aimed at the query engine’s host machine or the query initiator’s personal data. As such, we have performed an analysis of potential security vulnerabilities of LTQP. This article provides an overview of security threats in related domains, which are used as inspiration for the identification of 10 LTQP security threats. This list of security threats forms a basis for future work in which mitigations for each of these threats need to be developed and tested for their effectiveness. With this work, we start filling the unknowns for enabling query execution over decentralized environments. Aside from future work on security, wider research will be needed to uncover missing building blocks for enabling true data decentralization.
  2. Demo Solid Web Monetization
    In the International Conference on Web Engineering. The Solid decentralization effort decouples data from services, so that users are in full control over their personal data. In this light, Web Monetization has been proposed as an alternative business model for web services that does not depend on data collection anymore. Integrating Web Monetization with Solid, however, remains difficult because of the heterogeneity of Interledger wallet implementations, the lack of mechanisms for securely paying on behalf of a user, and an inherent issue of trusting content providers to handle payments. We propose the Web Monetization Provider (WMP) as a solution to these challenges. The WMP acts as a third party, hiding the underlying complexity of transactions and acting as a source of trust in Web Monetization interactions. This demo shows a working end-to-end example including a website providing monetized content, a WMP, and a dashboard for configuring WMP into a Solid identity.
  3. Workshop A Policy-Oriented Architecture for Enforcing Consent in Solid
    In Proceedings of the 2nd International Workshop on Consent Management in Online Services, Networks and Things. The Solid project aims to restore end-users’ control over their data by decoupling services and applications from data storage. To realize data governance by the user, the Solid Protocol 0.9 relies on Web Access Control, which has limited expressivity and interpretability. In contrast, recent privacy and data protection regulations impose strict requirements on personal data processing applications and the scope of their operation. The Web Access Control mechanism lacks the granularity and contextual awareness needed to enforce these regulatory requirements. Therefore, we suggest a possible architecture for relating Solid’s low-level technical access control rules with higher-level concepts such as the legal basis and purpose for data processing, the abstract types of information being processed, and the data sharing preferences of the data subject. Our architecture combines recent technical efforts by the Solid community panels with prior proposals made by researchers on the use of ODRL and SPECIAL policies as an extension to Solid’s authorization mechanism. While our approach appears to avoid a number of pitfalls identified in previous research, further work is needed before it can be implemented and used in a practical setting.
  4. Journal Components.js: Semantic Dependency Injection
    In Semantic Web Journal. A common practice within object-oriented software is using composition to realize complex object behavior in a reusable way. Such compositions can be managed by Dependency Injection (DI), a popular technique in which components only depend on minimal interfaces and have their concrete dependencies passed into them. Instead of requiring program code, this separation enables describing the desired instantiations in declarative configuration files, such that objects can be wired together automatically at runtime. Configurations for existing DI frameworks typically only have local semantics, which limits their usage in other contexts. Yet some cases require configurations outside of their local scope, such as for the reproducibility of experiments, static program analysis, and semantic workflows. As such, there is a need for globally interoperable, addressable, and discoverable configurations, which can be achieved by leveraging Linked Data. We created Components.js as an open-source semantic DI framework for TypeScript and JavaScript applications, providing global semantics via Linked Data-based configuration files. In this article, we report on the Components.js framework by explaining its architecture and configuration, and discuss its impact by mentioning where and how applications use it. We show that Components.js is a stable framework that has seen significant uptake during the last couple of years. We recommend it for software projects that require high flexibility, configuration without code changes, sharing configurations with others, or applying these configurations in other contexts such as experimentation or static program analysis. We anticipate that Components.js will continue driving concrete research and development projects that require high degrees of customization to facilitate experimentation and testing, including the Comunica query engine and the Community Solid Server for decentralized data publication.
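    A minimal sketch of how an application typically uses Components.js, based on its documented API; the config file path and instance IRI are placeholders.
    ```ts
    import { ComponentsManager } from "componentsjs";

    // Build a manager that resolves components relative to this module.
    const manager = await ComponentsManager.build({
      mainModulePath: __dirname,
    });

    // A declarative JSON-LD config describes which class to instantiate with
    // which parameters; instances are addressable by a global IRI.
    await manager.configRegistry.register("./config/config.jsonld");
    const myInstance = await manager.instantiate("urn:example:myInstance");
    ```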

2021

  1. Conference Towards a personal data vault society: an interplay between technological and business perspectives
    In the 60th FITCE Communication Days Congress for ICT Professionals: Industrial Data–Cloud, Low Latency and Privacy (FITCE). The Web is evolving more and more into a small set of walled gardens, where a very small number of platforms determine the way that people get access to the Web (i.e. "log in via Facebook"). As a pushback against this ongoing centralization of data and services on the Web, decentralization efforts are taking place that move data into personal data vaults. This positioning paper discusses a potential personal data vault society and the research steps required to get there. Emphasis is placed on the needed interplay between technological and business research perspectives. The focus is on the situation of Flanders, based on a recent announcement by the Flemish government to set up a Data Utility Company. The concepts as well as the suggested path for adoption can, however, easily be extended to other situations and regions as well.
  2. Journal Optimizing Storage of RDF Archives using Bidirectional Delta Chains
    In Semantic Web Journal. Linked Open Datasets on the Web that are published as RDF can evolve over time. There is a need to be able to store such evolving RDF datasets, and query across their versions. Different storage strategies are available for managing such versioned datasets, each being efficient for specific types of versioned queries. In recent work, a hybrid storage strategy has been introduced that combines these different strategies to lead to more efficient query execution for all versioned query types, at the cost of increased ingestion time. While this trade-off is beneficial in the context of Web querying, it suffers from exponential ingestion times in terms of the number of versions, which becomes problematic for RDF datasets with many versions. As such, there is a need for an improved storage strategy that scales better in terms of ingestion time for many versions. We have designed, implemented, and evaluated a change to the hybrid storage strategy where we make use of a bidirectional delta chain instead of the default unidirectional delta chain. In this article, we introduce a concrete architecture for this change, together with accompanying ingestion and querying algorithms. Experimental results from our implementation show that the ingestion time is significantly reduced. As an additional benefit, this change also leads to lower total storage size and even improved query execution performance in some cases. This work shows that modifying the structure of delta chains within the hybrid storage strategy can be highly beneficial for RDF archives. In future work, other modifications to this delta chain structure deserve to be investigated, to further improve the scalability of ingestion and querying of datasets with many versions.
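    Conceptually, version materialization in a delta chain applies deltas outward from the nearest snapshot; a bidirectional chain places the snapshot in the middle so that no version is far from it. A simplified sketch of the materialization step (not the actual implementation):
    ```ts
    // Simplified version materialization over a delta chain.
    type Triple = string; // simplified triple representation
    interface Delta { additions: Triple[]; deletions: Triple[] }

    function materialize(snapshot: Set<Triple>, deltas: Delta[]): Set<Triple> {
      // Apply the deltas from the snapshot out to the requested version;
      // in a bidirectional chain, this path is at most half the chain length.
      const result = new Set(snapshot);
      for (const delta of deltas) {
        delta.deletions.forEach((t) => result.delete(t));
        delta.additions.forEach((t) => result.add(t));
      }
      return result;
    }
    ```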
  3. Conference Link Traversal with Distributed Subweb Specifications
    In Rules and Reasoning: 5th International Joint Conference, RuleML+RR 2021, Leuven, Belgium, September 8 – September 15, 2021, Proceedings. Link Traversal–based Query Processing (LTQP), in which a SPARQL query is evaluated over a web of documents rather than a single dataset, is often seen as a theoretically interesting yet impractical technique. However, in a time where the hypercentralization of data has increasingly come under scrutiny, a decentralized Web of Data with a simple document-based interface is appealing, as it enables data publishers to control their data and access rights. While LTQP allows evaluating complex queries over such webs, it suffers from performance issues (due to the high number of documents containing data) as well as information quality concerns (due to the many sources providing such documents). In existing LTQP approaches, the burden of finding sources to query is entirely in the hands of the data consumer. In this paper, we argue that to solve these issues, data publishers should also be able to suggest sources of interest and guide the data consumer towards relevant and trustworthy data. We introduce a theoretical framework that enables such guided link traversal and study its properties. We illustrate with a theoretic example that this can improve query results and reduce the number of network requests.
  4. Demo PROV4ITDaTa: Transparent and direct transfer of personal data to personal stores
    In WWW2021, the International World Wide Web Conference. Data is scattered across service providers, heterogeneously structured in various formats. Due to a lack of interoperability, data portability is hindered, and thus user control is inhibited. An interoperable data portability solution for transferring personal data is needed. We demo PROV4ITDaTa: a Web application that allows users to transfer personal data into an interoperable format to their personal data store. PROV4ITDaTa leverages the open-source solutions RML.io, Comunica, and Solid: (i) the RML.io toolset to describe how to access data from service providers and generate interoperable datasets; (ii) Comunica to query these and more flexibly generate enriched datasets; and (iii) Solid pods to store the generated data as Linked Data in personal data stores. As opposed to other (hard-coded) solutions, PROV4ITDaTa is fully transparent: each component of the pipeline is fully configurable and automatically generates detailed provenance trails. Furthermore, transforming the personal data into RDF allows for an interoperable solution. By maximizing the use of open-source tools and open standards, PROV4ITDaTa facilitates the shift towards a data ecosystem wherein users have control of their data, and providers can focus on their service instead of trying to adhere to interoperability requirements.
  5. Journal Semantic micro-contributions with decentralized nanopublication services
    In PeerJ Computer Science. While the publication of Linked Data has become increasingly common, the process tends to be a relatively complicated and heavy-weight one. Linked Data is typically published by centralized entities in the form of larger dataset releases, which has the downside that there is a central bottleneck in the form of the organization or individual responsible for the releases. Moreover, certain kinds of data entries, in particular those with subjective or original content, currently do not fit into any existing dataset and are therefore more difficult to publish. To address these problems, we present here an approach to use nanopublications and a decentralized network of services to allow users to directly publish small Linked Data statements through a simple and user-friendly interface, called Nanobench, powered by semantic templates that are themselves published as nanopublications. The published nanopublications are cryptographically verifiable and can be queried through a redundant and decentralized network of services, based on the grlc API generator and a new quad extension of Triple Pattern Fragments. We show here that these two kinds of services are complementary and together allow us to query nanopublications in a reliable and efficient manner. We also show that Nanobench makes it indeed very easy for users to publish Linked Data statements, even for those who have no prior experience in Linked Data publishing.

2020

  1. Workshop Optimizing Approximate Membership Metadata in Triple Pattern Fragments for Clients and Servers
    In Proceedings of the 13th International Workshop on Scalable Semantic Web Knowledge Base Systems. Depending on the HTTP interface used for publishing Linked Data, the effort of evaluating a SPARQL query can be redistributed differently between clients and servers. For instance, lower server-side CPU usage can be realized at the expense of higher bandwidth consumption. Previous work has shown that complementing lightweight interfaces such as Triple Pattern Fragments (TPF) with additional metadata can positively impact the performance of clients and servers. Specifically, Approximate Membership Filters (AMFs)—data structures that are small and probabilistic—in the context of TPF were shown to reduce the number of HTTP requests, at the expense of increasing query execution times. In order to mitigate this significant drawback, we have investigated unexplored aspects of AMFs as metadata on TPF interfaces. In this article, we introduce and evaluate alternative approaches for server-side publication and client-side consumption of AMFs within TPF to achieve faster query execution, while maintaining low server-side effort. Our alternative client-side algorithm and the proposed server configurations significantly reduce both the number of HTTP requests and query execution time, with only a small increase in server load, thereby mitigating the major bottleneck of AMFs within TPF. Compared to regular TPF, average query execution is more than 2 times faster and requires only 10% of the number of HTTP requests, at the cost of at most a 10% increase in server load. These findings translate into a set of concrete guidelines for data publishers on how to configure AMF metadata on their servers.
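    The client-side intuition can be sketched as follows: before requesting a fragment, consult the AMF attached to its metadata, and skip the HTTP request on a definite miss. This is a hypothetical sketch of the idea, not the evaluated algorithm.
    ```ts
    // An approximate membership filter (e.g. a Bloom filter) answers "maybe"
    // or "definitely not" for a given triple.
    interface ApproximateMembershipFilter {
      mightContain(tripleKey: string): boolean; // false => definitely absent
    }

    async function fetchIfMaybePresent(
      filter: ApproximateMembershipFilter,
      tripleKey: string,
      fragmentUrl: string,
    ): Promise<Response | undefined> {
      if (!filter.mightContain(tripleKey)) {
        return undefined; // skip the request: the triple cannot be in the fragment
      }
      return fetch(fragmentUrl); // may still be a false positive
    }
    ```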
  2. Workshop RDF Test Suite: Improving Developer Experience for Building Specification-Compliant Libraries
    In Proceedings of the 1st Workshop on The Semantic Web in Practice: Tools and Pedagogy. Guaranteeing compliance of software libraries to certain specifications is crucial for ensuring interoperability within the Semantic Web. While many specifications publish their own declarative test suite for testing compliance, the actual execution of these test suites can add a huge overhead for library developers. In order to remove this burden from these developers, we introduce a JavaScript tool called RDF Test Suite that takes care of all the required bookkeeping behind test suite execution. In this report, we discuss the design goals of RDF Test Suite, how it has been implemented, and how it can be used. In practice, this tool is being used in a variety of libraries, and it ensures their specification compliance with minimal overhead.
  3. Conference LDflex: a Read/Write Linked Data Abstraction for Front-End Web Developers
    In Proceedings of the 19th International Semantic Web Conference. Many Web developers nowadays are trained to build applications with a user-facing browser front-end that obtains predictable data structures from a single, well-known back-end. Linked Data invalidates such assumptions, since data can combine several ontologies and span multiple servers with different APIs. Front-end developers, who specialize in creating end-user experiences rather than back-ends, need an abstraction layer to the Web of Data that matches their existing mindset and workflow. We have developed LDflex, a domain-specific language that exposes common Linked Data access patterns as concise JavaScript expressions. In this article, we describe the design and embedding of the language, and discuss its daily usage within two companies. LDflex succeeds in eliminating a dedicated data layer for common and straightforward data access patterns, without striving to be a replacement for more complex cases. Our experiences reveal that designing a Linked Data developer experience—analogous to a user experience—is crucial for adoption by the target group, who can create Linked Data applications for end users. Crucially, simple abstractions require research to hide the underlying complexity.
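    A minimal sketch following LDflex's documented pattern; the profile URL is a placeholder, and package setup may differ per version.
    ```ts
    import { PathFactory } from "ldflex";
    import ComunicaEngine from "@ldflex/comunica";
    import { namedNode } from "@rdfjs/data-model";

    // A JSON-LD context maps JavaScript property names to RDF predicates.
    const context = {
      "@context": { name: "http://xmlns.com/foaf/0.1/name" },
    };
    const queryEngine = new ComunicaEngine("https://example.org/profile/card");
    const paths = new PathFactory({ context, queryEngine });

    // Each property access is resolved to a SPARQL query behind the scenes.
    const person = paths.create({
      subject: namedNode("https://example.org/profile/card#me"),
    });
    console.log(`Name: ${await person.name}`);
    ```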
  4. Conference Facilitating the Analysis of COVID-19 Literature Through a Knowledge Graph
    In Proceedings of the 19th International Semantic Web Conference. At the end of 2019, Chinese authorities alerted the World Health Organization (WHO) of the outbreak of a new strain of the coronavirus, called SARS-CoV-2, which confronted humanity with an unprecedented disaster a few months later. In response to this pandemic, a publicly available dataset was released on Kaggle containing information on over 63,000 papers. In order to facilitate the analysis of this large body of literature, we have created a knowledge graph based on this dataset. Within this knowledge graph, all information of the original dataset is linked together, which makes it easier to search for relevant information. The knowledge graph is also enriched with additional links to appropriate, already existing external resources. In this paper, we elaborate on the different steps performed to construct such a knowledge graph from structured documents. Moreover, we discuss, on a conceptual level, several possible applications and analyses that can be built on top of this knowledge graph. As such, we aim to provide a resource that allows people to more easily build applications that give more insights into the COVID-19 pandemic.
  5. Workshop Towards Querying in Decentralized Environments with Privacy-Preserving Aggregation
    In Proceedings of the 4th Workshop on Storing, Querying, and Benchmarking the Web of Data. The Web is a ubiquitous economic, educational, and collaborative space. However, it also serves as a haven for personal information harvesting. Existing decentralised Web-based ecosystems, such as Solid, aim to combat personal data exploitation on the Web by enabling individuals to manage their data in the personal data store of their choice. Since personal data in these decentralised ecosystems are distributed across many sources, there is a need for techniques to support efficient privacy-preserving query execution over personal data stores. Towards this end, in this position paper we present a framework for efficient privacy-preserving federated querying, and highlight open research challenges and opportunities. The overarching goal is to provide a means of positioning future research on privacy-preserving querying within decentralised environments.
  6. Conference Pattern-based Access Control in a Decentralised Collaboration Environment
    In Proceedings of LDAC2020 - 8th Linked Data in Architecture and Construction Workshop. As the building industry is rapidly catching up with digital advancements, and Web technologies grow in both maturity and security, a data- and Web-based construction practice comes within reach. In such an environment, private project information and open online data can be combined to allow cross-domain interoperability at the data level, using Semantic Web technologies. As construction projects often feature complex and temporary networks of stakeholder firms and their employees, a property-based access control mechanism is necessary to enable a flexible and automated management of distributed building projects. In this article, we propose a method to facilitate such a mechanism using existing Web technologies: RDF, SHACL, WebIDs, nanopublications and the Linked Data Platform. The proposed method will be illustrated with an extension of a custom Node.js Solid server. The potential of the Solid ecosystem has been put forward earlier as a basis for a Linked Data-based Common Data Environment: its decentralised setup, connection of both RDF and non-RDF resources, and fine-grained access control mechanisms are considered an apt foundation to manage distributed building data.
  7. Preprint Guided Link-Traversal-Based Query Processing
    Link-Traversal-Based Query Processing (LTBQP) is a technique for evaluating queries over a web of data by starting with a set of seed documents that is dynamically expanded through following hyperlinks. Compared to query evaluation over a static set of sources, LTBQP is significantly slower because of the number of needed network requests. Furthermore, there are concerns regarding relevance and trustworthiness of results, given that sources are selected dynamically. To address both issues, we propose guided LTBQP, a technique in which information about document linking structure and content policies is passed to a query processor. Thereby, the processor can prune the search tree of documents by only following relevant links, and restrict the result set to desired results by limiting which documents are considered for what kinds of content. In this exploratory paper, we describe the technique at a high level and sketch some of its applications. We argue that such guidance can make LTBQP a valuable query strategy in decentralized environments, where data is spread across documents with varying levels of user trust.
  8. PhD Thesis Storing and Querying Evolving Knowledge Graphs on the Web
    The Web has become our most valuable tool for sharing information. Currently, this Web is mainly targeted at humans, whereas machines typically have a hard time understanding information on the Web. Using knowledge graphs, this information can be linked in a structured way, so that intelligent agents can act upon this data autonomously. Current knowledge graphs remain however rather static. As there is a lot of value in acting upon evolving knowledge, there is a need for evolving knowledge graphs, and ways to manage them. As such, the goal of this PhD is to allow such evolving knowledge graphs to be stored and queried, taking into account the decentralized nature of the Web where anyone should be able to say anything about anything. Concretely, four challenges related to this goal are investigated: (1) generation of evolving data, (2) storage of evolving data, (3) querying over heterogeneous datasets, and (4) querying evolving data. For each of these challenges, techniques and algorithms have been developed, which prove to be valuable for storing and querying evolving knowledge graphs on the Web. This work therefore brings us closer towards a Web in which both human and machine can act upon evolving knowledge.

2019

  1. Conference Streamlining governmental processes by putting citizens in control of their personal data
    In Proceedings of the 6th International Conference on Electronic Governance and Open Society: Challenges in Eurasia. Governments typically store large amounts of personal information on their citizens, such as a home address, marital status, and occupation, to offer public services. Because governments consist of various governmental agencies, multiple copies of this data often exist. This raises concerns regarding data consistency, privacy, and access control, especially under recent legal frameworks such as GDPR. To solve these problems, and to give citizens true control over their data, we explore an approach using the decentralised Solid ecosystem, which enables citizens to maintain their data in personal data pods. We have applied this approach to two high-impact use cases, where citizen information is stored in personal data pods, and both public and private organisations are selectively granted access. Our findings indicate that Solid allows reshaping the relationship between citizens, their personal data, and the applications they use in the public and private sector. We strongly believe that the insights from this Flemish Solid Pilot can speed up the process for public administrations and private organisations that want to put the users in control of their data.
  2. Demo Discovering Data Sources in a Distributed Network of Heritage Information
    In Proceedings of the Posters and Demo Track of the 15th International Conference on Semantic Systems. The Netwerk Digitaal Erfgoed is a Dutch partnership that focuses on improving the visibility, usability and sustainability of digital collections in the cultural heritage sector. The vision is to improve the usability of the data by surmounting the borders between the separate collections of the cultural heritage institutions. Key concepts in this vision are the alignment of the data by using shared descriptions (e.g. thesauri), and the publication of the data as Linked Open Data. This demo paper describes a Proof of Concept to test this vision. It uses a register where only summaries of datasets are stored, instead of all the data. Based on these summaries, a portal can query the register to find which data sources might be of interest, and then query the data directly from the relevant data sources.
  3. Conference Reflections on: Triple Storage for Random-Access Versioned Querying of RDF Archives
    In Proceedings of the 18th International Semantic Web Conference. In addition to their latest version, Linked Open Datasets on the Web can also contain useful information in or between previous versions. In order to exploit this information, we can maintain history in RDF archives. Existing approaches either require much storage space, or they do not meet sufficiently expressive querying demands. In this extended abstract, we discuss an RDF archive indexing technique that has a low storage overhead, and adds metadata for reducing lookup times. We introduce algorithms based on this technique for efficiently evaluating versioned queries. Using the BEAR RDF archiving benchmark, we evaluate our implementation, called OSTRICH. Results show that OSTRICH introduces a new trade-off regarding storage space, ingestion time, and querying efficiency. By processing and storing more metadata during ingestion time, it significantly lowers the average lookup time for versioning queries. Our storage technique reduces query evaluation time through a preprocessing step during ingestion, which only in some cases increases storage space when compared to other approaches. This allows data owners to store and query multiple versions of their dataset efficiently, lowering the barrier to historical dataset publication and analysis.
  4. Demo Using an existing website as a queryable low-cost LOD publishing interface
    In Proceedings of the 16th Extended Semantic Web Conference: Posters and Demos. Maintaining an Open Dataset comes at an extra recurring cost when it is published in a dedicated Web interface. As there is often no direct financial return from publishing a dataset publicly, these extra costs need to be minimized. Therefore, we want to explore reusing existing infrastructure by enriching existing websites with Linked Data. In this demonstrator, we advised the data owner to annotate a digital heritage website with JSON-LD snippets, resulting in a dataset of more than three million triples that is now available and officially maintained. The website itself is paged, and thus Hydra partial collection view controls were added in the snippets. We then extended the modular query engine Comunica to support following page controls and extracting data from HTML documents while querying. This way, a SPARQL or GraphQL query over multiple heterogeneous data sources can power automated data reuse. While the query performance on such an interface is visibly poor, it becomes easy to create composite data dumps. As a result of implementing these building blocks in Comunica, any paged collection and enriched HTML page now becomes queryable by the query engine. This enables heterogeneous data interfaces to share functionality and become technically interoperable.
  5. Position Statement Bridges between GraphQL and RDF
    In W3C Workshop on Web Standardization for Graph Data. GraphQL is a highly popular query language for graphs, well known among Web developers. Each GraphQL data graph is scoped within an interface-specific schema, which makes it difficult to integrate data from multiple interfaces. The RDF graph model offers a solution to this integration problem. Different techniques can enable querying over RDF graphs using GraphQL queries. In this position statement, we provide a high-level overview of the existing techniques, and how they differ. We argue that each of these techniques has its merits, but standardization is essential to simplify the link between GraphQL and RDF.

2018

  1. Conference Comunica: a Modular SPARQL Query Engine for the Web
    In Proceedings of the 17th International Semantic Web Conference. Query evaluation over Linked Data sources has become a complex story, given the multitude of algorithms and techniques for single- and multi-source querying, as well as the heterogeneity of Web interfaces through which data is published online. Today’s query processors are insufficiently adaptable to test multiple query engine aspects in combination, such as evaluating the performance of a certain join algorithm over a federation of heterogeneous interfaces. The Semantic Web research community is in need of a flexible query engine that allows plugging in new components such as different algorithms, new or experimental SPARQL features, and support for new Web interfaces. We designed and developed a Web-friendly and modular meta query engine called Comunica that meets these specifications. In this article, we introduce this query engine and explain the architectural choices behind its design. We show how its modular nature makes it an ideal research platform for investigating new kinds of Linked Data interfaces and querying algorithms. Comunica facilitates the development, testing, and evaluation of new query processing capabilities, both in isolation and in combination with others.
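    For a flavour of how Comunica abstracts heterogeneous interfaces, here is a short sketch using its SPARQL engine over a Triple Pattern Fragments interface and a SPARQL endpoint simultaneously; package and API names follow current Comunica releases, which postdate this paper.
    ```ts
    import { QueryEngine } from "@comunica/query-sparql";

    const engine = new QueryEngine();

    // The engine detects each source's interface type and adapts its
    // query algorithms accordingly.
    const bindingsStream = await engine.queryBindings(`
      SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10`, {
      sources: [
        "https://fragments.dbpedia.org/2016-04/en", // Triple Pattern Fragments
        "https://query.wikidata.org/sparql",        // SPARQL endpoint
      ],
    });

    const bindings = await bindingsStream.toArray();
    console.log(bindings.map((b) => b.get("s")?.value));
    ```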
  2. Demo GraphQL-LD: Linked Data Querying with GraphQL
    In Proceedings of the 17th International Semantic Web Conference: Posters and Demos. The Linked Open Data cloud has the potential of significantly enhancing and transforming end-user applications. For example, the use of URIs to identify things allows data joining between separate data sources. Most popular (Web) application frameworks, such as React and Angular, have limited support for querying the Web of Linked Data, which leads to a high entry barrier for Web application developers. Instead, these developers increasingly use the highly popular GraphQL query language for retrieving data from GraphQL APIs, because GraphQL is tightly integrated into these frameworks. In order to lower the barrier for developers towards Linked Data consumption, the Linked Open Data cloud needs to be queryable with GraphQL as well. In this article, we introduce a method for transforming GraphQL queries coupled with a JSON-LD context to SPARQL, and a method for converting SPARQL results to a GraphQL query-compatible response. We demonstrate this method by implementing it in the Comunica framework. This approach brings us one step closer towards widespread Linked Data consumption for application development.
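    A sketch of the idea, following the graphql-ld project's documented usage; treat the package and class names as indicative, and the endpoint as an example.
    ```ts
    import { Client } from "graphql-ld";
    import { QueryEngineComunica } from "graphql-ld-comunica";

    // The JSON-LD context maps GraphQL fields to RDF predicates.
    const context = {
      "@context": {
        label: { "@id": "http://www.w3.org/2000/01/rdf-schema#label" },
      },
    };

    const client = new Client({
      context,
      queryEngine: new QueryEngineComunica({
        sources: ["https://dbpedia.org/sparql"],
      }),
    });

    // The GraphQL query is translated to SPARQL, and results are shaped
    // back into a GraphQL-compatible tree.
    const { data } = await client.query({ query: `{ label }` });
    console.log(data);
    ```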
  3. Demo Demonstration of Comunica, a Web framework for querying heterogeneous Linked Data interfaces
    In Proceedings of the 17th International Semantic Web Conference: Posters and Demos. Linked Data sources can appear in a variety of forms, going from SPARQL endpoints to Triple Pattern Fragments and data dumps. This heterogeneity among Linked Data sources creates an added layer of complexity when querying or combining results from those sources. To ease this problem, we created a modular engine, Comunica, that has modules for evaluating SPARQL queries and supports heterogeneous interfaces. Other modules for other query or source types can easily be added. In this paper we showcase a Web client that uses Comunica to evaluate federated SPARQL queries through automatic source type identification and interaction.
  4. Workshop The Fundamentals of Semantic Versioned Querying
    In Proceedings of the 12th International Workshop on Scalable Semantic Web Knowledge Base Systems, co-located with the 17th International Semantic Web Conference. The domain of RDF versioning concerns itself with the storage of different versions of Linked Datasets. The ability of querying over these versions is an active area of research, and allows for basic insights to be discovered, such as tracking the evolution of certain things in datasets. Querying can however only get you so far. In order to derive logical consequences from existing knowledge, we need to be able to reason over this data, such as ontology-based inferencing. In order to achieve this, we explore fundamental concepts on semantic querying of versioned datasets using ontological knowledge. In this work, we present these concepts as a semantic extension of the existing RDF versioning concepts that focus on syntactical versioning. We remain general and assume that versions do not necessarily follow a purely linear temporal relation. This work lays a foundation for reasoning over RDF versions from a querying perspective, using which RDF versioning storage, query and reasoning systems can be designed.
  5. Conference On the Semantics of TPF-QS towards Publishing and Querying RDF Streams at Web-scale
    In Proceedings of the 14th International Conference on Semantic Systems. RDF Stream Processing (RSP) is a rapidly evolving area of research that focuses on extensions of the Semantic Web in order to model and process Web data streams. While state-of-the-art approaches concentrate on server-side processing of RDF streams, we investigate the Triple Pattern Fragments Query Streamer (TPF-QS) method for server-side publishing of RDF streams, which moves the workload of continuous querying to clients. We formalize TPF-QS in terms of the RSP-QL reference model in order to formally compare it with existing RSP query languages. We experimentally validate that, compared to the state of the art, the server load of TPF-QS scales better with increasing numbers of concurrent clients in case of simple queries, at the cost of increased bandwidth consumption. This shows that TPF-QS is an important first step towards a viable solution for Web-scale publication and continuous processing of RDF streams.
  6. Journal Triple Storage for Random-Access Versioned Querying of RDF Archives
    In Journal of Web Semantics. When publishing Linked Open Datasets on the Web, most attention is typically directed to their latest version. Nevertheless, useful information is present in or between previous versions. In order to exploit this historical information in dataset analysis, we can maintain history in RDF archives. Existing approaches either require much storage space, or they expose an insufficiently expressive or efficient interface with respect to querying demands. In this article, we introduce an RDF archive indexing technique that is able to store datasets with a low storage overhead, by compressing consecutive versions and adding metadata for reducing lookup times. We introduce algorithms based on this technique for efficiently evaluating queries at a certain version, between any two versions, and over all versions. Using the BEAR RDF archiving benchmark, we evaluate our implementation, called OSTRICH. Results show that OSTRICH introduces a new trade-off regarding storage space, ingestion time, and querying efficiency. By processing and storing more metadata during ingestion time, it significantly lowers the average lookup time for versioning queries. OSTRICH performs better for many smaller dataset versions than for few larger dataset versions. Furthermore, it enables efficient offsets in query result streams, which facilitates random access in results. Our storage technique reduces query evaluation time for versioned queries through a preprocessing step during ingestion, which only in some cases increases storage space when compared to other approaches. This allows data owners to store and query multiple versions of their dataset efficiently, lowering the barrier to historical dataset publication and analysis.
  7. Journal Generating Public Transport Data based on Population Distributions for RDF Benchmarking
    In Semantic Web Journal When benchmarking RDF data management systems such as public transport route planners, system evaluation needs to happen under various realistic circumstances, which requires a wide range of datasets with different properties. Real-world datasets are almost ideal, as they offer these realistic circumstances, but they are often hard to obtain and inflexible for testing. For these reasons, synthetic dataset generators are typically preferred over real-world datasets due to their intrinsic flexibility. Unfortunately, many synthetic datasets generated within benchmarks are insufficiently realistic, raising questions about the generalizability of benchmark results to real-world scenarios. In order to benchmark geospatial and temporal RDF data management systems such as route planners with sufficient external validity and depth, we designed PoDiGG, a highly configurable generation algorithm for synthetic public transport datasets with realistic geospatial and temporal characteristics comparable to those of their real-world variants. The algorithm is inspired by real-world public transit network design and scheduling methodologies. This article discusses the design and implementation of PoDiGG and validates the properties of its generated datasets. Our findings show that the generator achieves a sufficient level of realism, based on the existing coherence metric and new metrics we introduce specifically for the public transport domain. Thereby, PoDiGG provides a flexible foundation for benchmarking RDF data management systems with geospatial and temporal data. 2018 (A hypothetical generator configuration sketch follows this year's list.)
  8. Challenge Versioned Querying with OSTRICH and Comunica in MOCHA 2018
    In Proceedings of the 5th SemWebEval Challenge at ESWC 2018 In order to exploit the value of historical information in Linked Datasets, we need to be able to store and query different versions of such datasets efficiently. The 2018 edition of the Mighty Storage Challenge (MOCHA) is organized to discover the efficiency of such Linked Data stores and to detect their bottlenecks. One task in this challenge focuses on the storage and querying of versioned datasets, in which we participated by combining the OSTRICH triple store and the Comunica SPARQL engine. In this article, we briefly introduce our system for the versioning task of this challenge. We present evaluation results showing that our system achieves fast query times for the supported queries, although not all query types were supported by Comunica at the time of writing. The results of this challenge will serve as a guideline for further improvements to our system. 2018
  9. Conference Components.js: A Semantic Dependency Injection Framework
    In Proceedings of The Web Conference: Developers Track Components.js is a dependency injection framework for JavaScript applications that allows components to be instantiated and wired together declaratively using semantic configuration files. The advantage of these semantic configuration files is that software components can be uniquely and globally identified using URIs. As an example, this documentation has been made self-instantiatable using Components.js, which makes it possible to print the HTML version of any page to the console, or to serve it over HTTP on a local webserver. 2018 (A usage sketch follows this year's list.)
  10. Demo OSTRICH: Versioned Random-Access Triple Store
    In Proceedings of the 27th International Conference Companion on World Wide Web The Linked Open Data cloud is ever-growing and many datasets are frequently being updated. In order to fully exploit the potential of the information that is available in and over historical dataset versions, such as discovering the evolution of taxonomies or diseases in biomedical datasets, we need to be able to store and query the different versions of Linked Datasets efficiently. In this demonstration, we introduce OSTRICH, which is an efficient triple store with support for versioned query evaluation. We demonstrate the capabilities of OSTRICH using a Web-based graphical user interface in which a store can be opened or created. Using this interface, the user is able to query in, between, and over different versions, ingest new versions, and retrieve summarizing statistics. 2018
  11. Workshop A Preliminary Open Data Publishing Strategy for Live Data in Flanders
    In Proceedings of the 27th International Conference Companion on World Wide Web For smart decision making, user agents need live and historic access to open data from sensors installed in the public domain. In contrast to a closed environment, with Open Data and federated query processing algorithms the data publisher cannot anticipate specific questions in advance, nor can it afford a server interface whose cost-efficiency degrades as the number of data consumers grows. When publishing observations from sensors, different fragmentation strategies can be devised depending on how the historic data needs to be queried. Furthermore, both publish/subscribe and polling strategies exist for publishing live updates. Each of these strategies comes with its own trade-offs regarding cost-efficiency of the server interface, user-perceived performance, and CPU use. A polling strategy in which multiple observations are published in a paged collection was tested in a proof of concept for parking space availability. In order to understand the resource trade-offs between publish/subscribe and polling publication strategies, we devised a scalability experiment on two machines. The preliminary results were inconclusive, suggesting that larger-scale tests are needed to reveal a trend. While these large-scale tests will be performed in future work, the proof of concept helped to identify the technical Open Data principles for the 13 biggest cities in Flanders. 2018 (A polling sketch follows this year's list.)
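Item 4 above reasons over versioned RDF where versions need not be linearly ordered. As a rough illustration only, and not the paper's actual formalization, the following TypeScript sketch models a version DAG with per-version additions and applies one naive RDFS subclass inference step to a materialized version; all names and structures are hypothetical.

```ts
// Hypothetical sketch: a version DAG with per-version additions,
// plus one naive RDFS subclass inference step at a given version.

type Triple = { s: string; p: string; o: string };

// Versions form a directed acyclic graph rather than a linear history:
// v1 and v2 are parallel successors of v0.
const parents: Record<string, string[]> = {
  v0: [],
  v1: ['v0'],
  v2: ['v0'],
};

const added: Record<string, Triple[]> = {
  v0: [{ s: ':Cat', p: 'rdfs:subClassOf', o: ':Animal' }],
  v1: [{ s: ':felix', p: 'rdf:type', o: ':Cat' }],
  v2: [],
};

// Materialize a version by collecting additions along all its ancestors.
function materialize(version: string, seen = new Set<string>()): Triple[] {
  if (seen.has(version)) return [];
  seen.add(version);
  return [
    ...(parents[version] ?? []).flatMap((parent) => materialize(parent, seen)),
    ...(added[version] ?? []),
  ];
}

// One naive inference step: (x rdf:type C) + (C rdfs:subClassOf D)
// entails (x rdf:type D). A real reasoner would compute a fixpoint.
function inferTypes(triples: Triple[]): Triple[] {
  return triples
    .filter((t) => t.p === 'rdf:type')
    .flatMap((t) => triples
      .filter((sc) => sc.p === 'rdfs:subClassOf' && sc.s === t.o)
      .map((sc) => ({ s: t.s, p: 'rdf:type', o: sc.o })));
}

console.log(inferTypes(materialize('v1')));
// [{ s: ':felix', p: 'rdf:type', o: ':Animal' }]
```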
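Item 5's TPF-QS moves continuous query evaluation to the client. Below is a minimal sketch of that client-side loop, assuming the Comunica SPARQL engine as a TPF-capable client and a fixed refresh interval; the endpoint is hypothetical, and the real system derives its re-evaluation timing from server-published volatility annotations rather than a constant.

```ts
import { QueryEngine } from '@comunica/query-sparql';

// Client-side continuous evaluation: the server only answers cheap
// triple pattern requests; the client re-runs the full SPARQL query.
const engine = new QueryEngine();
const QUERY = `
  SELECT ?delay WHERE {
    <http://example.org/train/IC538> <http://example.org/ns#delay> ?delay.
  }`;
const SOURCE = 'http://example.org/ldf/trains'; // hypothetical TPF endpoint
const INTERVAL_MS = 10_000; // hypothetical; TPF-QS derives this from
                            // server-published volatility annotations

async function evaluateOnce(): Promise<void> {
  const bindingsStream = await engine.queryBindings(QUERY, { sources: [SOURCE] });
  for (const bindings of await bindingsStream.toArray()) {
    console.log('delay:', bindings.get('delay')?.value);
  }
}

setInterval(() => evaluateOnce().catch(console.error), INTERVAL_MS);
```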
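Item 6's OSTRICH answers queries at, between, and for versions over a snapshot plus change sets. The toy sketch below shows the snapshot-plus-delta idea and the three query types in their simplest form; OSTRICH itself aggregates deltas against the snapshot and keeps compressed indexes, which this sketch deliberately omits.

```ts
// Toy snapshot-plus-delta store. This applies consecutive deltas naively,
// purely to illustrate the three query types.

type Triple = string; // ':s :p :o', simplified to a plain string key

interface Delta { additions: Triple[]; deletions: Triple[]; }

const snapshot: Triple[] = [':a :knows :b', ':b :knows :c']; // version 0
const deltas: Delta[] = [
  { additions: [':c :knows :d'], deletions: [] },   // version 1
  { additions: [], deletions: [':a :knows :b'] },   // version 2
];

// Version Materialization (VM): the dataset at one version.
function materialize(version: number): Set<Triple> {
  const result = new Set(snapshot);
  for (const delta of deltas.slice(0, version)) {
    delta.additions.forEach((t) => result.add(t));
    delta.deletions.forEach((t) => result.delete(t));
  }
  return result;
}

// Delta Materialization (DM): the changes between two versions.
function diff(from: number, to: number): Delta {
  const a = materialize(from);
  const b = materialize(to);
  return {
    additions: [...b].filter((t) => !a.has(t)),
    deletions: [...a].filter((t) => !b.has(t)),
  };
}

// Version Query (VQ): in which versions does a triple occur?
function versionsOf(triple: Triple): number[] {
  return [...Array(deltas.length + 1).keys()]
    .filter((v) => materialize(v).has(triple));
}

console.log([...materialize(2)]); // [':b :knows :c', ':c :knows :d']
console.log(diff(0, 2));          // adds ':c :knows :d', deletes ':a :knows :b'
console.log(versionsOf(':a :knows :b')); // [0, 1]
```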
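Item 7's PoDiGG is driven by configuration. The following is a hypothetical configuration shape, purely to illustrate the kinds of knobs such a generator exposes; these are not PoDiGG's actual parameter names.

```ts
// Hypothetical configuration shape for a PoDiGG-style generator run.
interface TransitGeneratorConfig {
  seed: number;                   // fixed seed for reproducible datasets
  populationDistribution: string; // input steering where stops are placed
  stops: number;                  // size of the stop network
  routes: number;                 // routes laid out over the stop network
  connections: number;            // scheduled connections (temporal part)
}

const config: TransitGeneratorConfig = {
  seed: 1337,
  populationDistribution: 'belgium-population.tsv',
  stops: 600,
  routes: 50,
  connections: 30_000,
};

// The generation stages mirror real network design: place stops by
// population density, build edges and routes over them, then schedule trips.
console.log(`Generating ${config.stops} stops, ${config.routes} routes,`
  + ` ${config.connections} connections (seed ${config.seed})`);
```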
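Item 9's Components.js wires components together from semantic configuration files. A minimal usage sketch based on its ComponentsManager API; the module path, config file, and instance URI below are hypothetical.

```ts
import { ComponentsManager } from 'componentsjs';

// Declarative instantiation: the component's constructor arguments are
// wired from a semantic (JSON-LD) config file rather than from code.
async function main(): Promise<void> {
  const manager = await ComponentsManager.build({
    mainModulePath: __dirname, // where modules and components are discovered
  });
  await manager.configRegistry.register('config/my-app.jsonld');

  // Components are identified by global URIs, so configs stay unambiguous
  // across packages.
  const parser = await manager.instantiate('urn:my-app:parser');
  console.log(parser);
}

main().catch(console.error);
```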
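Item 11 compares publish/subscribe against polling over paged collections of observations. Here is a sketch of the polling side, with hypothetical URLs and fields: only the latest page changes, while historic pages stay immutable and cacheable, which is what keeps the server interface cheap.

```ts
// Polling over a paged collection of observations (hypothetical API).

interface ObservationPage {
  observations: { sensor: string; available: number; time: string }[];
  previous?: string; // link to the previous (historic, immutable) page
}

async function fetchPage(url: string): Promise<ObservationPage> {
  const response = await fetch(url);
  return await response.json() as ObservationPage;
}

// Re-poll the "latest" page on a fixed interval.
setInterval(async () => {
  const page = await fetchPage('https://example.org/parking/latest');
  for (const obs of page.observations) {
    console.log(`${obs.sensor}: ${obs.available} free spots at ${obs.time}`);
  }
}, 30_000);
```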

2017

  1. Conference Declaratively Describing Responses of Hypermedia-Driven Web APIs
    In Proceedings of the 9th International Conference on Knowledge Capture While humans browse the Web by following links, these hypermedia links can also be followed by machines. While efforts such as Hydra semantically describe the hypermedia controls on Web interfaces to enable smarter interface-agnostic clients, they are largely limited to the input parameters of interfaces, and clients therefore do not know what response to expect from these interfaces. In order to convey such expectations, interfaces need to declaratively describe the response structure of their parameterized hypermedia controls. We therefore explored techniques to represent this parameterized response structure in a generic but expressive way. In this work, we discuss four different approaches for declaring a response structure, and we compare them based on a model that we introduce. Based on this model, we conclude that a SHACL shape-based approach can be used for declaring such a parameterized response structure, as it conforms to the REST architectural style that has helped shape the Web into its current form. 2017 (An example shape follows this year's list.)
  2. Workshop Describing configurations of software experiments as Linked Data
    In Proceedings of the First Workshop on Enabling Open Semantic Science (SemSci) Within computer science engineering, research articles often rely on software experiments in order to evaluate contributions. Reproducing such experiments involves setting up software, benchmarks, and test data. Unfortunately, many articles ambiguously refer to software by name only, leaving out crucial details for reproducibility, such as module and dependency version numbers or the configuration of individual components in different setups. To address this, we created the Object-Oriented Components ontology for the semantic description of software components and their configuration. This article discusses the ontology and its application, and demonstrates with a use case how to publish experiments and their software configurations on the Web. In order to enable semantic interlinking between configurations and modules, we published the metadata of all 500,000+ JavaScript libraries on npm as 200,000,000+ RDF triples. Through our work, research articles can refer by URL to fine-grained descriptions of experimental setups. This brings us closer to accurate reproduction of experiments, and facilitates the evaluation of new research contributions with different software configurations. In the future, software could be instantiated automatically based on these descriptions and configurations, and reasoning and querying could be applied to software configurations for meta-research purposes. 2017
  3. Demo Live Storage and Querying of Versioned Datasets on the Web
    In Proceedings of the 14th Extended Semantic Web Conference: Posters and Demos Linked Datasets often evolve over time for a variety of reasons. While typical scenarios rely on the latest version only, useful knowledge may still be contained within or between older versions, such as the historical information of biomedical patient data. In order to make this historical information cost-efficiently available on the Web, a low-cost interface is required for providing access to versioned datasets. For our demonstration, we set up a live Triple Pattern Fragments interface for a versioned dataset with queryable access. We explain different version query types of this interface, and how it communicates with a storage solution that can handle these queries efficiently. 2017
  4. Workshop Versioned Triple Pattern Fragments: A Low-cost Linked Data Interface Feature for Web Archives
    In Proceedings of the 3rd Workshop on Managing the Evolution and Preservation of the Data Web Linked Datasets typically evolve over time because triples can be removed from or added to datasets, which results in different dataset versions. While most attention is typically given to the latest dataset version, a lot of useful information is still present in previous versions and its historical evolution. In order to make this historical information queryable at Web scale, a low-cost interface is required that provides access to different dataset versions. In this paper, we add a versioning feature to the existing Triple Pattern Fragments interface for queries at, between, and for versions, with an accompanying vocabulary for describing the results, metadata, and hypermedia controls. This interface feature is an important step in the direction of making versioned datasets queryable on the Web at low publication cost and effort. 2017 (An HTTP-level sketch follows this year's list.)
  5. Poster PoDiGG: A Public Transport RDF Dataset Generator
    In Proceedings of the 26th International Conference Companion on World Wide Web A large amount of public transport data is made available by many different providers, which makes RDF well suited for integrating these datasets. Furthermore, this type of data is a rich source of combined geospatial and temporal information. These aspects are currently undertested in RDF data management systems because of the limited availability of realistic input datasets. In order to bring public transport data to the world of benchmarking, we need to be able to create synthetic variants of this data. In this paper, we introduce a dataset generator with the capability to create realistic public transport data. This dataset generator, and the ability to configure it on different levels, makes it easier to use public transport data for benchmarking with great flexibility. 2017
  6. Tutorial Modeling, Generating, and Publishing Knowledge as Linked Data
    In Knowledge Engineering and Knowledge Management: EKAW 2016 Satellite Events, EKM and Drift-an-LOD, Bologna, Italy, November 19–23, 2016, Revised Selected Papers The process of extracting, structuring, and organizing knowledge from one or multiple data sources and preparing it for the Semantic Web requires a dedicated class of systems. They enable processing large and originally heterogeneous data sources and capturing new knowledge. Offering existing data as Linked Data increases its shareability, extensibility, and reusability. However, using Linked Data as a means to represent knowledge can be easier said than done. In this tutorial, we elaborate on the importance of semantically annotating data and how existing technologies facilitate their mapping to Linked Data. We introduce the [R2]RML languages to generate Linked Data derived from different heterogeneous data formats (e.g., databases, XML, or JSON) and from different interfaces (e.g., files or Web APIs). Those who are not Semantic Web experts can annotate their data with the RMLEditor, whose user interface hides all underlying Semantic Web technologies from data owners. Finally, we show how to easily publish Linked Data on the Web as Triple Pattern Fragments. As a result, participants, independently of their knowledge background, can model, annotate, and publish data on their own. 2017 (An example RML mapping follows this year's list.)
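Item 1 above concludes that SHACL shapes can declare the response structure of parameterized hypermedia controls. A hypothetical shape to that effect, embedded as a Turtle string in TypeScript; the target class and property are invented for illustration.

```ts
// A hypothetical SHACL shape declaring what a client may expect from one
// parameterized hypermedia control: every result node carries exactly
// one rdfs:label.
const responseShape = `
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/ns#> .

ex:SearchResponseShape a sh:NodeShape ;
  sh:targetClass ex:SearchResult ;
  sh:property [
    sh:path rdfs:label ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
  ] .
`;
```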
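Item 4 adds versioning to the TPF interface with queries at, between, and for versions. The sketch below shows what those three request types could look like at the HTTP level; the endpoint and parameter names are invented for illustration and are not the vocabulary the paper defines.

```ts
// Invented endpoint and parameter names, only to make the three
// versioned query types concrete at the HTTP level.
const BASE = 'https://example.org/vtpf/dataset';

async function demo(): Promise<void> {
  // Queries *at* a version (version materialization):
  await fetch(`${BASE}?predicate=rdf:type&version=2`);

  // Queries *between* versions (delta materialization):
  await fetch(`${BASE}?predicate=rdf:type&versionStart=0&versionEnd=2`);

  // Queries *for* versions: in which versions does the pattern hold?
  await fetch(`${BASE}?predicate=rdf:type&versions=all`);
}

demo().catch(console.error);
```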
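Item 6 introduces the [R2]RML languages for generating Linked Data from heterogeneous sources. Below is a small illustrative RML mapping over a hypothetical JSON source, embedded as a Turtle string: one triples map producing a subject per sensor and a temperature triple for each.

```ts
// An illustrative RML mapping over a hypothetical JSON source.
const rmlMapping = `
@prefix rr:  <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql:  <http://semweb.mmlab.be/ns/ql#> .
@prefix ex:  <http://example.org/ns#> .

<#SensorMapping> a rr:TriplesMap ;
  rml:logicalSource [
    rml:source "sensors.json" ;
    rml:referenceFormulation ql:JSONPath ;
    rml:iterator "$.sensors[*]"
  ] ;
  rr:subjectMap [ rr:template "http://example.org/sensor/{id}" ] ;
  rr:predicateObjectMap [
    rr:predicate ex:temperature ;
    rr:objectMap [ rml:reference "temp" ]
  ] .
`;
```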

2016

  1. Poster Exposing RDF Archives using Triple Pattern Fragments
    In Proceedings of the 20th International Conference on Knowledge Engineering and Knowledge Management: Posters and Demos Linked Datasets typically change over time, and knowledge of this historical information can be useful. This makes the storage and querying of Dynamic Linked Open Data an important area of research. With the current versioning solutions, publishing Dynamic Linked Open Data at Web scale is possible, but too expensive. We investigate the possibility of using the low-cost Triple Pattern Fragments (TPF) interface to publish versioned Linked Open Data. In this paper, we discuss requirements for supporting versioning in the TPF framework, on the level of the interface, storage, and client, and investigate which trade-offs exist. These requirements lay the foundations for further research in the area of low-cost, Web-scale dynamic Linked Open Data publication and querying. 2016
  2. Demo Querying Dynamic Datasources with Continuously Mapped Sensor Data
    In Proceedings of the 15th International Semantic Web Conference: Posters and Demos The world contains a large number of sensors that produce new data at a high frequency. It is currently very hard to find public services that expose these measurements as dynamic Linked Data. We investigate how sensor data can be published continuously on the Web at a low cost. This paper describes how the publication of various sensor data sources can be done by continuously mapping raw sensor data to RDF and inserting it into a live, low-cost server. This makes it possible for clients to continuously evaluate dynamic queries using public sensor data. For our demonstration, we will illustrate how this pipeline works for the publication of temperature and humidity data originating from a microcontroller, and how it can be queried. 2016 (A pipeline sketch follows this year's list.)
  3. Demo Linked Sensor Data Generation using Queryable RML Mappings
    In Proceedings of the 15th International Semantic Web Conference: Posters and Demos As the amount of generated sensor data is increasing, semantic interoperability becomes an important aspect in order to support efficient data distribution and communication. Therefore, the integration of (sensor) data is important, as this data is coming from different data sources and might be in different formats. Furthermore, reusable and extensible methods for this integration are required in order to be able to scale with the growing number of applications that generate semantic sensor data. Current research efforts allow mapping sensor data to Linked Data in order to provide semantic interoperability. However, they lack support for multiple data sources, which hampers integration. Furthermore, the used methods are not available for reuse or are not extensible, which hampers the development of applications. In this paper, we describe how the RDF Mapping Language (RML) and a Triple Pattern Fragments (TPF) server are used to address these shortcomings. The demonstration consists of a microcontroller that generates sensor data. The data is captured and mapped to RDF triples using module-specific RML mappings, which are queried from a TPF server. 2016
  4. Workshop Multidimensional Interfaces for Selecting Data within Ordinal Ranges
    In Proceedings of the 7th International Workshop on Consuming Linked Data Linked Data interfaces exist in many flavours, as evidenced by subject pages, SPARQL endpoints, triple pattern interfaces, and data dumps. These interfaces are mostly used to retrieve parts of a complete dataset; such parts can, for example, be defined by ranges in one or more dimensions. Filtering Linked Data by dimensions such as time range, geospatial area, or genomic location requires the lookup of data within ordinal ranges. To make retrieval by such ranges generic and cost-efficient, we propose a REST solution in between looking up data within ordinal ranges entirely on the server and entirely on the client. To this end, we introduce a method for extending any Linked Data interface with an n-dimensional interface-level index such that n-dimensional ordinal data can be selected using n-dimensional ranges. We formally define Range Gates and Range Fragments and theoretically evaluate the cost-efficiency of hosting such an interface. By adding a multidimensional index to a Linked Data interface for multidimensional ordinal data, we found that we can get the benefits of both worlds: the expressivity of the server increases, yet the interface remains more cost-efficient than one providing the full functionality server-side. Furthermore, the client now shares in the effort to filter the data. This makes query processing more flexible for the end user, because the engine can alter the query plan. In future work we hope to apply Range Gates and Range Fragments to real-world interfaces to give quicker access to data within ordinal ranges. 2016 (A range-selection sketch follows this year's list.)
  5. Conference Continuous Client-Side Query Evaluation over Dynamic Linked Data
    In The Semantic Web: ESWC 2016 Satellite Events, Heraklion, Crete, Greece, May 29 – June 2, 2016, Revised Selected Papers Existing solutions to query dynamic Linked Data sources extend the SPARQL language, and require continuous server processing for each query. Traditional SPARQL endpoints already accept highly expressive queries, so extending these endpoints for time-sensitive queries increases the server cost even further. To make continuous querying over dynamic Linked Data more affordable, we extend the low-cost Triple Pattern Fragments (TPF) interface with support for time-sensitive queries. In this paper, we introduce the TPF Query Streamer that allows clients to evaluate SPARQL queries with continuously updating results. Our experiments indicate that this extension significantly lowers the server complexity, at the expense of an increase in the execution time per query. We prove that by moving the complexity of continuously evaluating queries over dynamic Linked Data to the clients and thus increasing bandwidth usage, the cost at the server side is significantly reduced. Our results show that this solution makes real-time querying more scalable for a large number of concurrent clients when compared to the alternatives. 2016
  6. Poster Moving Real-Time Linked Data Query Evaluation to the Client
    In Proceedings of the 13th Extended Semantic Web Conference: Posters and Demos Traditional RDF stream processing engines work completely server-side, which contributes to a high server cost. For allowing a large number of concurrent clients to do continuous querying, we extend the low-cost Triple Pattern Fragments (TPF) interface with support for time-sensitive queries. In this poster, we give the overview of a client-side RDF stream processing engine on top of TPF. Our experiments show that our solution significantly lowers the server load while increasing the load on the clients. Preliminary results indicate that our solution moves the complexity of continuously evaluating real-time queries from the server to the client, which makes real-time querying much more scalable for a large number of concurrent clients when compared to the alternatives. 2016
  7. PhD Symposium Continuously Self-Updating Query Results over Dynamic Heterogeneous Linked Data
    In The Semantic Web. Latest Advances and New Domains: 13th International Conference, ESWC 2016, Heraklion, Crete, Greece, May 29 – June 2, 2016, Proceedings Our society is evolving towards massive data consumption from heterogeneous sources, which includes rapidly changing data like public transit delay information. Many applications that depend on dynamic data consumption require highly available server interfaces. Existing interfaces involve substantial costs to publish rapidly changing data with high availability, and are therefore only possible for organisations that can afford such an expensive infrastructure. In my doctoral research, I investigate how to publish and consume real-time and historical Linked Data on a large scale. To reduce server-side costs for making dynamic data publication affordable, I will examine different possibilities to divide query evaluation between servers and clients. This paper discusses the methods I aim to follow, together with preliminary results and the steps required to realise this solution. An initial prototype achieves significantly lower server processing cost per query, while maintaining reasonable query execution times and client costs. Given these promising results, I feel confident this research direction is a viable solution for offering low-cost dynamic Linked Data interfaces as opposed to the existing high-cost solutions. 2016
  8. Workshop Continuously Updating Query Results over Real-Time Linked Data
    In Proceedings of the 2nd Workshop on Managing the Evolution and Preservation of the Data Web Existing solutions to query dynamic Linked Data sources extend the SPARQL language, and require continuous server processing for each query. Traditional SPARQL endpoints accept highly expressive queries, contributing to high server cost. Extending these endpoints for time-sensitive queries increases the server cost even further. To make continuous querying over real-time Linked Data more affordable, we extend the low-cost Triple Pattern Fragments (TPF) interface with support for time-sensitive queries. In this paper, we discuss a framework on top of TPF that allows clients to execute SPARQL queries with continuously updating results. Our experiments indicate that this extension significantly lowers the server complexity. The trade-off is an increase in the execution time per query. We prove that by moving the complexity of continuously evaluating real-time queries over Linked Data to the clients and thus increasing the bandwidth usage, the cost of server-side interfaces is significantly reduced. Our results show that this solution makes real-time querying more scalable in terms of CPU usage for a large number of concurrent clients when compared to the alternatives. 2016
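Item 2 above continuously maps raw sensor readings to RDF and inserts them into a live, low-cost server. Below is a minimal sketch of that pipeline with a simulated sensor and a hypothetical ingest endpoint; the real demo maps readings via RML rather than by string templating.

```ts
// Hypothetical pipeline: read a (simulated) sensor value, map it to
// RDF (N-Triples), and POST it to an ingest endpoint.

async function publishReading(celsius: number): Promise<void> {
  const id = `http://example.org/obs/${Date.now()}`;
  const time = new Date().toISOString();
  const triples = [
    `<${id}> <http://example.org/ns#temperature> "${celsius.toFixed(1)}"^^<http://www.w3.org/2001/XMLSchema#decimal> .`,
    `<${id}> <http://example.org/ns#time> "${time}"^^<http://www.w3.org/2001/XMLSchema#dateTime> .`,
  ].join('\n');

  await fetch('https://example.org/ingest', {
    method: 'POST',
    headers: { 'content-type': 'application/n-triples' },
    body: triples,
  });
}

// Simulate a microcontroller emitting a reading every 5 seconds.
setInterval(() => publishReading(20 + Math.random() * 5).catch(console.error), 5_000);
```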
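Item 4 introduces Range Gates and Range Fragments for selecting data within n-dimensional ordinal ranges. The sketch below reduces the idea to its core: the server-side index only needs a cheap bounding-range intersection test per fragment, and the client refines the (possibly too broad) fragments it receives. All structures are hypothetical simplifications.

```ts
// A fragment is returned when its bounding ranges intersect the
// requested ranges in every dimension.

type Range = [low: number, high: number];

interface Fragment {
  url: string;
  bounds: Range[]; // one [low, high] pair per dimension, e.g. [time, latitude]
}

function intersects(bounds: Range[], query: Range[]): boolean {
  return bounds.every(([low, high], dim) =>
    low <= query[dim][1] && query[dim][0] <= high);
}

const index: Fragment[] = [
  { url: '/fragments/1', bounds: [[0, 10], [50.0, 51.0]] },
  { url: '/fragments/2', bounds: [[10, 20], [50.0, 51.0]] },
];

const hits = index.filter((f) => intersects(f.bounds, [[5, 12], [50.2, 50.4]]));
console.log(hits.map((f) => f.url)); // ['/fragments/1', '/fragments/2']
```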

2015

  1. Master’s Thesis Continuously Updating Queries over Real-Time Linked Data
    This dissertation investigates the possibilities of continuously updating queries over Linked Data, with a focus on server availability. This work builds upon the ideas of Linked Data Fragments to let clients do most of the work when executing a query. The server adds metadata to make clients aware of the data's volatility, ensuring that query results are always up to date. The implementation of the proposed framework is then tested and compared to alternative approaches. 2015