Improper neutralization of input during web page generation ('Cross-site Scripting' aka 'XSS')

python.cross_site_scripting

Severity

critical

Resource

Injection

Language

Python

Description


Cross-Site Scripting is a prevalent web application vulnerability that allows attackers to inject malicious scripts into content delivered to other users. These scripts can hijack user sessions, deface websites, or redirect users to malicious sites. XSS commonly arises when an application takes user input, incorporates it into dynamic content served to clients, and fails to sanitize or escape this input correctly.

There are different kinds of XSS. The kind relevant for this check is Reflected XSS, where the attacker causes the victim to supply malicious content to a vulnerable web application, which then renders HTML content embedding a malicious script that is executed in the victim’s browser. A variant is DOM-based XSS, where the vulnerable software does not generate content depending on user input but includes script code that uses user-controlled input.

Rationale

An XSS vulnerability happens when untrusted input ends up, without proper sanitization, in a place where it is evaluated as HTML or JavaScript code. This means that input chosen by the attacker can be executed by the victim’s browser when it renders the page produced by the vulnerable application.

Consider a typical Flask application vulnerable to reflected XSS:

from flask import Flask, request

app = Flask(__name__)

@app.route('/greet')
def greet_user():
    user_name = request.args.get('name')
    return f"<h1>Hello, {user_name}!</h1>" # FLAW

if __name__ == "__main__":
    app.run()

In this example, user input is directly inserted into HTML without sanitization, leading to potential XSS attacks if malicious scripts are passed through the name parameter.
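To make the risk concrete, the following minimal sketch reproduces the string formatting used by greet_user outside of Flask and shows what the browser would receive for a malicious name value. The helper name render_greeting and the payload are assumptions chosen for illustration only.

```python
def render_greeting(user_name: str) -> str:
    """Mimic the vulnerable handler: interpolate the input into HTML unescaped."""
    return f"<h1>Hello, {user_name}!</h1>"


# Hypothetical attacker-supplied value for the `name` query parameter.
payload = "<script>alert(document.cookie)</script>"

html_page = render_greeting(payload)
print(html_page)
# The <script> element reaches the response intact, so a browser
# rendering this page would execute the attacker's code.
```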

Remediation

Follow the recommendations given by OWASP in the Cross Site Scripting Prevention Cheat Sheet.

Output Encoding:

The best technique to protect against XSS is contextual output encoding: encode untrusted data before writing it into an HTML document. Because XSS exploitation techniques vary by HTML context, each context requires its own encoding to prevent the data from being interpreted as JavaScript code. In essence, treat the HTML page to be rendered as a template with 'slots' where a developer is allowed to put untrusted data, and escape that data according to the context-specific rules of each 'slot' before placing it there.

The following table summarizes the encoding to apply for each HTML context:


Context	Encoding	Example
Text nested in tags	HTML entity encoding: & → &amp;, < → &lt;, > → &gt;, " → &quot;, ' → &#x27;	`<div>{DATA}</div>`
In attribute, but not holding URL or JavaScript.	HTML entity encoding, as above, using the &#xHH; form. Always put the attribute value between quotes!	`<div class="{DATA}"> … </div>`
In a URL attribute	Validate (whitelist) the URL. Use HTML attribute encoding, as above. For the query string use URL encoding before HTML attribute encoding.	`<a href="{DATA}"> … </a>`
In a JavaScript attribute	DANGEROUS. Avoid if possible.	`<script src="{DATA}"> … </script>` `<div onclick="{DATA}"> … </div>`
In inline CSS	Safe only with dynamic CSS property value	`<div style="prop: {DATA}"> … </div>` `<style> selector { prop: "{DATA}" } </style>`
In script block	DANGEROUS. Avoid if possible. Use predefined JavaScript functions and choose functions to render based on user input. Use `data-*` attributes to pass data to the script: `<script data-x="{DATA}">`	`<script>{DATA}</script>`
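The encodings for the safer contexts in the table can be sketched with the Python standard library alone. This is an illustrative sketch, not a complete encoder: the function names and the allowed scheme set are assumptions, and the attribute encoding shown is only adequate for attribute values that are always enclosed in quotes.

```python
import html
from urllib.parse import quote, urlsplit

# Assumption for this sketch: only absolute http(s) links are acceptable.
ALLOWED_SCHEMES = {"http", "https"}


def encode_html(data: str) -> str:
    """Entity-encode &, <, >, " and ' for text nodes and quoted attributes."""
    return html.escape(data, quote=True)


def encode_query_value(data: str) -> str:
    """URL-encode a query string value first, then HTML-encode the result."""
    return html.escape(quote(data, safe=""), quote=True)


def is_allowed_url(url: str) -> bool:
    """Validate the URL scheme against the allowlist before using it in href."""
    return urlsplit(url).scheme.lower() in ALLOWED_SCHEMES


# Text nested in tags
print(f"<div>{encode_html('<b>hi</b>')}</div>")

# Quoted attribute value
print(f'<div class="{encode_html(chr(34) + " onmouseover=alert(1)")}">x</div>')

# URL attribute: validate the base URL, then encode the query value
base = "https://example.com/search"
if is_allowed_url(base):
    link = f"{base}?q={encode_query_value('a b&c')}"
    print(f'<a href="{encode_html(link)}">search</a>')
```

A value such as javascript:alert(1) is rejected by is_allowed_url because its scheme is not in the allowlist; relative URLs are also rejected under this sketch's assumption.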

Some contexts are inherently dangerous because output encoding is too complex and error-prone there: dynamic JavaScript code, as within a <script> body; dynamic CSS code, as within a <style> body; JavaScript event handlers (onXYZ attributes); and HTML comments. Do not attempt output encoding in these contexts; rely only on strict input validation that allows a limited set of values.

If templates are used for content rendering, be aware that the chosen template engine might not correctly escape untrusted data, and so might not prevent XSS.

Input Validation:

Input validation is best understood here as a complementary defense-in-depth strategy, particularly when the input type or format is known. A whitelist approach is recommended. For dangerous contexts, strict input validation is the only option.
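A whitelist check of this kind might look like the sketch below. The allowed theme values, the user name pattern, and the function names are hypothetical; a real application would define them according to its own input formats.

```python
import re

# Hypothetical closed set of values accepted for a CSS-related parameter.
ALLOWED_THEMES = frozenset({"light", "dark", "high-contrast"})

# Hypothetical format for user names: 1 to 32 letters, digits or underscores.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,32}")


def validate_theme(value: str) -> str:
    """Accept only values from the fixed set; reject everything else."""
    if value not in ALLOWED_THEMES:
        raise ValueError("unsupported theme")
    return value


def validate_username(value: str) -> str:
    """Accept only input that matches the expected format in full."""
    if USERNAME_PATTERN.fullmatch(value) is None:
        raise ValueError("invalid user name")
    return value


print(validate_theme("dark"))
print(validate_username("alice_01"))

try:
    validate_theme("red;}</style><script>alert(1)</script>")
except ValueError as exc:
    print(f"rejected: {exc}")
```

Rejecting invalid input outright, rather than trying to repair it, keeps the set of values that can reach a dangerous context small and predictable.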

Content Security Policy (CSP):

CSP is an allowlist that tells the browser which content it may load, and prevents everything else from being loaded. It is easy to make mistakes in its implementation, so it should not be your primary defense mechanism.

Most browsers can limit the damage through security restrictions (e.g. the 'same origin policy'), but users generally leave scripting languages (e.g. JavaScript) enabled in their browsers, because disabling JavaScript severely limits a web site.
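As a rough illustration of what such an allowlist looks like, the sketch below assembles a Content-Security-Policy header value from a dictionary of directives. The helper build_csp_header and the chosen directive values are assumptions for this example; an actual policy must be tailored to the resources the application really loads.

```python
def build_csp_header(directives: dict[str, list[str]]) -> str:
    """Join directives into a single Content-Security-Policy header value."""
    return "; ".join(
        f"{name} {' '.join(sources)}" for name, sources in directives.items()
    )


# Example policy (an assumption): same-origin resources only, no plugins.
policy = build_csp_header(
    {
        "default-src": ["'self'"],
        "script-src": ["'self'"],
        "object-src": ["'none'"],
    }
)

print(policy)
# The resulting string would be sent as the value of the
# Content-Security-Policy response header on each HTML response.
```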

Web Application Firewalls (WAF):

While WAFs provide some XSS protection, they should be considered a complementary layer to output encoding and input validation within the application itself.

The sanitized version of the previous code snippet looks like this:

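from flask import Flask, request
from markupsafe import escape  # flask.escape was removed in Flask 3.0

app = Flask(__name__)

@app.route('/greet')
def greet_user():
    # Default to an empty string so a missing parameter is not rendered as "None"
    user_name = request.args.get('name', '')
    # HTML-encode the untrusted value before placing it in the page
    safe_user_name = escape(user_name)
    return f"<h1>Hello, {safe_user_name}!</h1>"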

if __name__ == "__main__":
    app.run()

Configuration

The detector has the following configurable parameters:

sources, which indicates the source kinds to check.
neutralizations, which indicates the neutralization kinds to check.

Unless you need to change the default behavior, you typically do not need to configure this detector.

References

CWE-79 : Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting').
OWASP Top 10 2021 - A03 : Injection.
OWASP Cross Site Scripting Prevention Cheat Sheet