Improper neutralization of data within XPath expressions ('XPath Injection')

ID

python.xpath_injection

Severity

high

Resource

Injection

Language

Python

Tags

CWE:643, NIST.SP.800-53, OWASP:2021:A3, PCI-DSS:6.5.1

Description

Improper neutralization of data within XPath expressions ('XPath Injection').

XPath is a query language used to select nodes from XML documents. As with SQL injection, XPath Injection manipulates the structure of a query through user-supplied input. When the software fails to sanitize input properly, attackers can insert characters or expressions that change the intended query, potentially exposing sensitive information or bypassing access checks.

Rationale

If the software uses untrusted input to dynamically construct the XPath expression used to retrieve data from an XML source, but it does not neutralize or incorrectly neutralizes that input, it is vulnerable to XPath Injection. This allows an attacker to control the semantics of the query.

An attacker can then control which data is selected from the XML document and, depending on how the result is used, alter application flow, modify logic, retrieve unauthorized data, or bypass important checks (e.g. authentication).

Consider the following insecure example using Python’s lxml library to query an XML document:

from lxml import etree

def get_user_info(username):
    xml_data = '''
    <users>
        <user>
            <username>admin</username>
            <password>adminpass</password>
        </user>
        <user>
            <username>john</username>
            <password>johnpass</password>
        </user>
    </users>
    '''
    root = etree.fromstring(xml_data)

    # Vulnerable to XPath Injection
    xpath_query = f"//user[username='{username}']/password"
    result = root.xpath(xpath_query)

    return result[0].text if result else None

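If an attacker supplies admin' or '1'='1 as the username, the expression becomes //user[username='admin' or '1'='1']/password. Because '1'='1' is always true, the predicate matches every user and the query selects every password in the document, not just the one the caller asked for.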
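To make the effect concrete, the following sketch reruns the same string interpolation with lxml against the sample data used by get_user_info(). It assumes lxml is installed; the payload value is illustrative, and it shows that the attacker does not even need a valid username.

```python
from lxml import etree

# Same sample data as get_user_info(), written compactly.
XML_DATA = """<users>
    <user><username>admin</username><password>adminpass</password></user>
    <user><username>john</username><password>johnpass</password></user>
</users>"""

root = etree.fromstring(XML_DATA)

# Attacker-controlled value; "nobody" is not a real user.
payload = "nobody' or '1'='1"

# The same interpolation used by the vulnerable function.
xpath_query = f"//user[username='{payload}']/password"
print(xpath_query)  # //user[username='nobody' or '1'='1']/password

# The predicate is now true for every <user>, so every password node is selected.
passwords = [node.text for node in root.xpath(xpath_query)]
print(passwords)  # ['adminpass', 'johnpass']
```

Because get_user_info() returns only result[0].text, the same input makes it return adminpass, the administrator's password.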

Remediation

To protect applications from XPath Injection vulnerabilities, apply the following remediation strategies:

  • Input validation and canonicalization: Canonicalize all user input and validate it against an allowlist of expected characters and lengths before using it in an XPath expression, to reduce the risk of injection attacks.

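  • Parameterized XPath queries: Where possible, use libraries that support parameterized XPath queries, in which user input is bound to an XPath variable at evaluation time instead of being pasted into the expression (lxml, for example, accepts variable values as keyword arguments). Support is less widespread than for parameterized SQL queries, but where available it keeps user input out of the query syntax entirely.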

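  • Escaping user input: Properly escape special characters in user input used within XPath expressions so they cannot alter the query. XPath 1.0 string literals have no escape sequence for the quote that delimits them, so a value is usually wrapped in the quote character it does not contain, or assembled with concat() when it contains both. XML entities such as &apos; are decoded only when the expression is itself embedded in XML (for example, an XSLT attribute), not in an expression passed directly to an XPath API. Escaping alone is not recommended and should be combined with the other measures.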
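The first two measures can be combined in a safer version of get_user_info(). This is a minimal sketch that assumes lxml's support for XPath variables ($name bound through a keyword argument to a compiled etree.XPath); USERNAME_PATTERN, FIND_PASSWORD and the allowed character set are hypothetical choices for illustration, not requirements of this detector.

```python
import re

from lxml import etree

XML_DATA = """<users>
    <user><username>admin</username><password>adminpass</password></user>
    <user><username>john</username><password>johnpass</password></user>
</users>"""
ROOT = etree.fromstring(XML_DATA)

# Hypothetical allowlist: adjust to the application's real username rules.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_.-]{1,32}")

# Compiled once; $name is an XPath variable, not string interpolation.
FIND_PASSWORD = etree.XPath("//user[username=$name]/password")


def get_user_info(username):
    # 1. Validation: reject anything outside the expected character set.
    if not USERNAME_PATTERN.fullmatch(username):
        raise ValueError("invalid username")

    # 2. Parameterization: lxml binds the value as an XPath string, so quotes
    #    and operators inside it are compared literally, never parsed as syntax.
    result = FIND_PASSWORD(ROOT, name=username)
    return result[0].text if result else None
```

Even without the allowlist check, FIND_PASSWORD(ROOT, name="nobody' or '1'='1") returns an empty list, because the whole payload is compared as one literal string. The validation step is defense in depth, not the primary fix.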

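When a library offers no XPath variables, escaping has to happen while building the expression. Because XPath 1.0 string literals have no escape sequence, one common approach is to pick a quote character the value does not contain and fall back to concat() when it contains both. The helper below is a hypothetical, stdlib-only sketch of that approach; it should still be paired with input validation.

```python
def xpath_string_literal(value: str) -> str:
    """Return an XPath 1.0 expression that evaluates to exactly `value`."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # The value contains both quote kinds: split on apostrophes and rejoin
    # the pieces with a double-quoted apostrophe inside concat().
    pieces = []
    for i, part in enumerate(value.split("'")):
        if i > 0:
            pieces.append('"\'"')
        if part:
            pieces.append(f"'{part}'")
    return "concat(" + ", ".join(pieces) + ")"


# The earlier payload becomes a single string literal instead of extra syntax.
username = "nobody' or '1'='1"
query = f"//user[username={xpath_string_literal(username)}]/password"
print(query)  # //user[username="nobody' or '1'='1"]/password
```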
Configuration

The detector has the following configurable parameters:

  • sources, which indicates the source kinds to check.

  • neutralizations, which indicates the neutralization kinds to check.

Unless you need to change the default behavior, you typically do not need to configure this detector.

References

  • CWE-643 : Improper Neutralization of Data within XPath Expressions ('XPath Injection').

  • OWASP Top 10 2021 - A03 : Injection.