Regex Injection

ID

python.regex_injection

Severity

critical

Resource

Injection

Language

Python

Tags

CWE:1333, CWE:625, NIST.SP.800-53, OWASP:2004:A9, PCI-DSS:6.5.1

Description

Improper neutralization of external input used in a regular expression ('Regex Injection').

Regular Expression (Regex) Injection arises when input from users is directly concatenated into a regex pattern without validation or sanitization. This can allow an attacker to manipulate the pattern used for searching or matching, potentially bypassing validation or security checks.

Rationale

Regular expressions are a powerful but complex tool in programming languages like Python. When user input is interpolated directly into regex patterns, it opens the door to regex injection. This can result in ReDoS (Regular Expression Denial of Service), where a specially crafted pattern or input causes excessive backtracking and consumes significant CPU time. It can also produce unintended matches that bypass application logic.

Python’s re module allows compiling regex patterns from strings, and if any portion of this string is built using unsanitized user input, the risk of regex injection arises.

For example:

import re

# BAD: direct user input in regex pattern
user_input = input("Enter a pattern: ")
pattern = re.compile(user_input)
match = pattern.match("some test string")

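If a malicious user enters a pattern such as ^(a+)+$, matching can trigger catastrophic backtracking when the pattern is applied to a long run of 'a' characters followed by a character that makes the match fail, such as '!'. A string consisting only of 'a' characters matches quickly; it is the failed match that forces the engine to try every way of splitting the run, so the running time grows exponentially with its length. This is a classic ReDoS vector.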
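The growth in matching time can be observed with a small script. This is a rough sketch rather than a benchmark: time_failed_match is a hypothetical helper name, the input sizes are deliberately small so the script finishes quickly, and absolute timings depend on the machine and Python version.

```python
import re
import time

# Pattern with nested quantifiers, as an attacker might supply it.
EVIL_PATTERN = re.compile(r"^(a+)+$")


def time_failed_match(n):
    """Time EVIL_PATTERN against n 'a' characters plus a trailing '!'.

    The trailing '!' guarantees that the overall match fails, which forces
    the backtracking engine to try many ways of splitting the run of 'a's.
    Returns (match_result, elapsed_seconds).
    """
    subject = "a" * n + "!"
    start = time.perf_counter()
    result = EVIL_PATTERN.match(subject)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # Small sizes only: each extra 'a' roughly doubles the work.
    for n in (10, 14, 18):
        result, elapsed = time_failed_match(n)
        print(f"n={n:2d} matched={result is not None} elapsed={elapsed:.4f}s")
```

Increasing n by a few characters at a time should show the elapsed time climbing steeply, while the match result stays negative.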

Similarly, constructing patterns with string interpolation is risky:

import re

search_term = input("Enter search term: ")
regex = re.compile(f"^{search_term}$")  # BAD: vulnerable to injection

Here, sequences such as .*, (), or [a-z] are interpreted as regex syntax rather than as literal text, which may change the intended matching logic or the performance characteristics of the expression.
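The change in logic can be illustrated with a short sketch. The helper is_exact_match is hypothetical and mirrors the interpolated pattern above: it is meant to test whether a candidate equals the search term, but injected metacharacters make it accept unrelated strings.

```python
import re


def is_exact_match(search_term, candidate):
    """Hypothetical helper mirroring the vulnerable interpolated pattern."""
    # VULNERABLE: search_term is interpreted as regex syntax, not literal text.
    regex = re.compile(f"^{search_term}$")
    return regex.match(candidate) is not None


# Intended use: literal comparison.
print(is_exact_match("report", "report"))          # True
print(is_exact_match("report", "secret-file"))     # False

# Injected metacharacters widen the match to unrelated strings.
print(is_exact_match(".*", "secret-file"))         # True
print(is_exact_match("report|.*", "secret-file"))  # True
```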

Remediation

To remediate Regular Expression Injection vulnerabilities, you can use the following strategies:

  1. Escape User Input: Always escape user input before including it in a regex pattern. In Python, re.escape() returns a copy of the string with regex metacharacters backslash-escaped, so the input is matched as literal text.

  2. Use Strict Validation: Implement strict validation on input prior to its inclusion in a regex pattern. Ensuring that input only contains expected characters significantly lowers the risk of injection.

  3. Predefine Regular Expressions: Avoid building regex patterns dynamically from user input. Instead, use fixed, predefined patterns and pass user input only as the string being matched, never as part of the pattern.

  4. Input Filtering: Apply input filtering to remove known harmful characters or sequences from user input before processing them in regex operations.

  5. Security Tools and Static Analysis: Utilize SAST tools to identify regex injection vulnerabilities during the development process. These tools can provide early warnings about unsafe patterns or input concatenations.
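The first three strategies might look like the following in Python. This is a minimal sketch under stated assumptions: the function names, the allow-list of characters, and the username format are all illustrative choices, not part of the detector or of any particular application.

```python
import re

# Strategy 2: allow-list of characters expected in a search term
# (an assumed policy; adjust to what the application actually accepts).
ALLOWED_TERM = re.compile(r"[A-Za-z0-9 _-]{1,64}")

# Strategy 3: a fixed pattern; user data is only ever the subject string.
# The username format here is illustrative.
USERNAME_PATTERN = re.compile(r"[a-z][a-z0-9_]{2,31}")


def build_literal_pattern(search_term):
    """Strategy 1: escape the input so it is matched as literal text."""
    return re.compile(f"^{re.escape(search_term)}$")


def validate_search_term(search_term):
    """Strategy 2: reject input outside the allow-list before using it."""
    if ALLOWED_TERM.fullmatch(search_term) is None:
        raise ValueError("search term contains unexpected characters")
    return search_term


def is_valid_username(candidate):
    """Strategy 3: apply a predefined pattern to the user-supplied string."""
    return USERNAME_PATTERN.fullmatch(candidate) is not None
```

With escaping in place, a search term of .* matches only the literal two-character string, and a term such as ^(a+)+$ no longer introduces nested quantifiers into the compiled pattern. Escaping and validation can be combined when the input must appear inside a larger pattern.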

Following these measures reduces the risk of Regular Expression Injection and makes the application more robust against input-based attacks in general.

Configuration

The detector has the following configurable parameters:

  • sources: the source kinds to check.

  • neutralizations: the neutralization kinds to check.

Unless you need to change the default behavior, you typically do not need to configure this detector.

References