Regular Expression DoS

ID

php.redos

Severity

critical

Resource

Resource Management

Language

Php

Tags

CAPEC:492, CWE:1333, NIST.SP.800-53, PCI-DSS:6.5.6

Description

Potential denial-of-service attack through malicious regular expression ('ReDoS').

Most regular expression engines use non-deterministic Finite Automaton (NFA) with backtracking, to try all possible execution paths of the regular expression when evaluating an input. In some cases this can lead to a situation called catastrophic backtracking. In the worst case, the complexity of the regular expression is exponential in the size of the input, meaning that a small crafted input (such as 20 characters) can trigger catastrophic backtracking and cause a denial of service to the application.

While these regular expression engines can quickly confirm a positive match, confirming a negative match can take much longer.

Rationale

This detector analyzes the runtime complexity of a regular expression, checking if catastrophic backtracking is possible.

Obviously, an attacker with total or partial control over a regular expression may carry out a denial-of-service (DoS) attack, by adding vulnerable patterns and providing the crafted input to be matched by the regular expression. This results in a Denial of Service attack.

But even if the attacker does not have control over the regular expression, vulnerable regular expressions can also be exploited using malicious input to launch a catastrophic backtracking attack.

Consider the following PHP example:

<?php
// User-controlled input
$input = $_GET['input'];

// Vulnerable regex pattern
if (preg_match('/.*.*=.*;/', $input)) {
    echo "Match found.";
} else {
    echo "No match.";
}
?>

Remediation

The following ideas can be used when facing a ReDoS vulnerability, to reduce the risk of a DoS attack exploiting catastrophic backtracking:

  • Do not make the regular expression pattern dependent on untrusted input. If an attacker can choose the regexp at will, he can probably launch a DoS attack. Use quoting for including user-controlled input if needed.

  • Even with literal regexp patterns, an attacker may control the string to match, and care should be taken with nested quantifiers such as \* and +. Follow the hints in "catastrophic backtracking" about how to rewrite a vulnerable regexp pattern to make it safe.

    • Avoid, if possible, nested quantifiers (repeated or alternated tokens inside a group that is itself repeated/alternated), like (T1+T2+)+T3. It is better to split into groups and then process each group in a second pass, than to use overly complex regular expressions with nested quantifiers.

    • When available, possessive quantifiers (*+, ++ and {n, m}+) significantly improve performance when the regex match fails, and no backtracking is needed.

    • The (?>subexp) element (atomic grouping) disables backtracking.

    • If possible, define a maximum number of expected repetitions using the bounded quantifiers {n, m}, like {1,3} instead of +.

    • Use negated character classes excluding separators, instead of .* to match tokens with separators. For example, the pattern .*_.* can be improved by changing it to [^_]*_.*

  • When creating a regex follow the advice: "if it’s going to fail, make it fail as quickly as possible". You can use features like atomic groups and possessive quantifiers for that, but if you get the structure of the regex right, you rarely need them. Backtracking is not inevitable.

Configuration

The detector has the following configurable parameters:

  • sources, that indicates the source kinds to check.

  • neutralizations, that indicates the neutralization kinds to check.

Unless you need to change the default behavior, you typically do not need to configure this detector.

References