Biological Knowledge Graphs

Ruben Taelman

Biological Databases, UGent, 22 April 2025

Ghent University – imec – IDLab, Belgium

Storing data as Knowledge Graphs

1989, CERN, Switzerland

Tim Berners-Lee
inventor of the World Wide Web

The Web's foundational ideas

1990-... World-wide adoption

Not just for researchers anymore

The Web is a global information space

a.k.a. The World Wide Web (WWW)

News
Social
Media

Web is focused on humans

Achieving tasks requires manual effort

What if intelligent agents
could do these tasks for us?

I Have a Dream

A Web where intelligent
agents are able to achieve
our day-to-day tasks.
2001

The Semantic Web will bring structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users.

The dream: Getting there step by step

2015: Agents are being introduced that can perform simple tasks

Google Assistant, Alexa, Siri, ...

How do these agents work?

→ Execute Queries over Knowledge Graphs

Query = Structured Question
Knowledge Graph = Structured information

→ Structured questions over structured information

A Knowledge Graph is
a collection of structured information

Semantic Web technologies

Generative AI models

2022: LLMs offer human-like conversations based on crawled Web data

Differences to Knowledge Graphs:

ChatGPT, Copilot, DeepSeek

RDF enables data exchange on the Web

Facts are represented as RDF triples

RDF triple

Multiple RDF triples form RDF datasets

:Alice :knows :Bob .
:Alice :knows :Carol .
:Alice :name "Alice" .
:Bob :name "Bob" .
:Bob :knows :Carol .
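
As a minimal sketch of the idea above, the five triples can be held in memory as (subject, predicate, object) tuples. The helper name `objects` and the use of plain strings for terms are assumptions made for illustration, not part of RDF itself.

```python
# Each RDF triple is a (subject, predicate, object) tuple.
# Strings starting with ":" stand in for IRIs; the other strings are literals.
dataset = {
    (":Alice", ":knows", ":Bob"),
    (":Alice", ":knows", ":Carol"),
    (":Alice", ":name", "Alice"),
    (":Bob", ":name", "Bob"),
    (":Bob", ":knows", ":Carol"),
}


def objects(triples, subject, predicate):
    """Return, sorted, every object o with (subject, predicate, o) in triples."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


# Who does Alice know?
alice_knows = objects(dataset, ":Alice", ":knows")
```

Because the collection is a set, adding the same triple twice has no effect, which mirrors how a graph of triples behaves.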

RDF Resources are identified by IRIs

https://example.org/Alice

Multiple syntaxes exist for RDF

Turtle

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

dbr:Alice foaf:knows dbr:Bob .
dbr:Alice foaf:name "Alice" .

JSON-LD

{
    "@context": {
        "dbr": "http://dbpedia.org/resource/",
        "foaf": "http://xmlns.com/foaf/0.1/"
    },
    "@id": "dbr:Alice",
    "foaf:knows": { "@id": "dbr:Bob" },
    "foaf:name": "Alice"
}
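
To make the link between the two syntaxes concrete, the sketch below reads a small JSON-LD document and lists the triples it describes. It is a simplified illustration, not a JSON-LD processor: the helpers `expand` and `to_triples` are hypothetical, and they only handle a flat document with prefix definitions in `@context`. In JSON-LD, an object written as `{"@id": ...}` denotes an IRI, while a plain string denotes a literal.

```python
import json

doc = json.loads("""
{
    "@context": {
        "dbr": "http://dbpedia.org/resource/",
        "foaf": "http://xmlns.com/foaf/0.1/"
    },
    "@id": "dbr:Alice",
    "foaf:knows": { "@id": "dbr:Bob" },
    "foaf:name": "Alice"
}
""")


def expand(term, context):
    """Expand a prefixed name such as dbr:Alice into a full IRI."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + local
    return term


def to_triples(document):
    """List (subject, predicate, object) triples of a flat JSON-LD document."""
    context = document["@context"]
    subject = expand(document["@id"], context)
    triples = []
    for key, value in document.items():
        if key.startswith("@"):
            continue  # skip keywords such as @context and @id
        predicate = expand(key, context)
        if isinstance(value, dict) and "@id" in value:
            obj = ("iri", expand(value["@id"], context))
        else:
            obj = ("literal", value)
        triples.append((subject, predicate, obj))
    return triples


triples = to_triples(doc)
```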

SPARQL querying over RDF datasets

Specification: https://www.w3.org/TR/sparql-query/

Find all artists born in York

name                     deathDate
Albert Joseph Moore
Charles Francis Hansom   1888
David Reed (comedian)
Dustin Gee
E Ridsdale Tate          1922

How do query engines process a query?

RDF dataset + SPARQL query

SPARQL query processing

query results

Basic Graph Patterns enable graph pattern matching
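
Graph pattern matching can be sketched in a few lines: a Basic Graph Pattern is a list of triple patterns in which some terms are variables, and evaluation finds every assignment of variables that makes all patterns match triples in the data. The code below is a naive nested-loop illustration under that assumption; the function names are hypothetical, and real query engines rely on indexes and join planning instead.

```python
triples = [
    (":Alice", ":knows", ":Bob"),
    (":Alice", ":knows", ":Carol"),
    (":Alice", ":name", "Alice"),
    (":Bob", ":name", "Bob"),
    (":Bob", ":knows", ":Carol"),
]


def is_var(term):
    """Variables are written with a leading question mark, as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern, triple, mapping):
    """Extend mapping so that pattern matches triple, or return None."""
    extended = dict(mapping)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or check it against an earlier binding.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def evaluate_bgp(patterns, data):
    """Return all solution mappings for a list of triple patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for mapping in solutions:
            for triple in data:
                extended = match_pattern(pattern, triple, mapping)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# "Whom does Alice know, and what is their name?"
bgp = [(":Alice", ":knows", "?friend"), ("?friend", ":name", "?name")]
solutions = evaluate_bgp(bgp, triples)
```

Carol does not appear in the result because the data contains no `:name` triple for her, so the second pattern cannot be matched.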

Query results representation

1 solution sequence with 3 solution mappings:
name                birthplace
Bob Brockmann       http://dbpedia.org/resource/Louisiana
Bennie Nawahi       http://dbpedia.org/resource/Honolulu
Weird Al Yankovic   http://dbpedia.org/resource/Downey,_California
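
On the wire, a solution sequence like this is commonly serialized in the SPARQL Query Results JSON format. The sketch below parses a hand-written document in that format, mirroring the three mappings above, into a list of Python dictionaries. The helper `to_mappings` is a hypothetical name, and the JSON text is an illustration rather than output captured from an endpoint.

```python
import json

results_json = """
{
  "head": { "vars": ["name", "birthplace"] },
  "results": { "bindings": [
    { "name": { "type": "literal", "value": "Bob Brockmann" },
      "birthplace": { "type": "uri", "value": "http://dbpedia.org/resource/Louisiana" } },
    { "name": { "type": "literal", "value": "Bennie Nawahi" },
      "birthplace": { "type": "uri", "value": "http://dbpedia.org/resource/Honolulu" } },
    { "name": { "type": "literal", "value": "Weird Al Yankovic" },
      "birthplace": { "type": "uri", "value": "http://dbpedia.org/resource/Downey,_California" } }
  ] }
}
"""


def to_mappings(document):
    """Turn a SPARQL JSON results document into a list of {variable: value}."""
    return [
        {variable: term["value"] for variable, term in binding.items()}
        for binding in document["results"]["bindings"]
    ]


mappings = to_mappings(json.loads(results_json))
for mapping in mappings:
    print(mapping["name"], mapping["birthplace"])
```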

Steps in SPARQL query processing

Publishing Knowledge Graphs as SPARQL Endpoints

SPARQL endpoint: API that accepts SPARQL queries, and replies with results.
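
As a rough sketch of what talking to such an API looks like, the SPARQL Protocol allows a query to be sent as the `query` parameter of an HTTP GET request. The code below only builds the request and does not send it. The endpoint URL is a placeholder, and the query (including the DBpedia property IRIs it uses) is an assumption chosen to resemble the earlier York example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Placeholder endpoint: replace with the URL of a real SPARQL endpoint.
ENDPOINT = "https://example.org/sparql"

# Assumed query: names of people born in York (property IRIs are assumptions).
QUERY = """
SELECT ?name WHERE {
  ?person <http://dbpedia.org/ontology/birthPlace> <http://dbpedia.org/resource/York> .
  ?person <http://xmlns.com/foaf/0.1/name> ?name .
}
LIMIT 5
"""


def build_request(endpoint, query):
    """Build an HTTP GET request carrying the query, asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)

# Decoding the URL again shows that the query text survives the encoding.
decoded = parse_qs(urlsplit(request.full_url).query)["query"][0]
```

Passing `request` to `urllib.request.urlopen` would send it. The reply would then be a results document that can be parsed with the `json` module.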

Popular biological SPARQL endpoints

Federation over multiple SPARQL endpoints

Data across multiple datasources can be joined in a single query
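
In SPARQL, federation is expressed with the SERVICE keyword. Conceptually, the engine joins the solution mappings that each source returns. The sketch below illustrates that join step with a simple hash join on the shared variables. The two input sequences are invented placeholder data with example.org IRIs, and the code assumes every mapping in a sequence binds the same variables.

```python
def join(left, right):
    """Hash join of two solution sequences on the variables they share."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    # Index the right-hand mappings by their values for the shared variables.
    index = {}
    for mapping in right:
        key = tuple(mapping[v] for v in shared)
        index.setdefault(key, []).append(mapping)
    # Probe the index with each left-hand mapping and merge compatible pairs.
    joined = []
    for mapping in left:
        key = tuple(mapping[v] for v in shared)
        for match in index.get(key, []):
            joined.append({**mapping, **match})
    return joined


# Hypothetical results from a first source: proteins and their genes.
from_source_a = [
    {"protein": "https://example.org/protein/P1", "gene": "https://example.org/gene/G1"},
    {"protein": "https://example.org/protein/P2", "gene": "https://example.org/gene/G2"},
]
# Hypothetical results from a second source: genes and associated diseases.
from_source_b = [
    {"gene": "https://example.org/gene/G1", "disease": "https://example.org/disease/D1"},
    {"gene": "https://example.org/gene/G3", "disease": "https://example.org/disease/D2"},
]

joined = join(from_source_a, from_source_b)
```

Only the mappings that agree on `gene` are combined, so a single joined mapping remains.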

Conclusions