Improper neutralization of input during web page generation ('Cross-site Scripting' aka 'XSS')
ID | python.cross_site_scripting |
Severity | critical |
Resource | Injection |
Language | Python |
Tags | CWE:79, NIST.SP.800-53, OWASP:2021:A3, PCI-DSS:6.5.7 |
Description
Improper neutralization of input during web page generation ('Cross-site Scripting' aka 'XSS').
Cross-Site Scripting is a prevalent web application vulnerability that allows attackers to inject malicious scripts into content delivered to other users. These scripts can hijack user sessions, deface websites, or redirect users to malicious sites. XSS commonly arises when an application takes user input, incorporates it into dynamic content served to clients, and fails to sanitize or escape this input correctly.
There are different kinds of XSS. The kind relevant to this check is reflected XSS, where the attacker causes the victim to supply malicious content to a vulnerable web application, which then renders HTML content embedding a malicious script that is executed in the victim’s browser. A variant is DOM-based XSS, where the vulnerable software does not generate content depending on user input but includes script code that uses user-controlled input.
Rationale
An XSS vulnerability occurs when untrusted input ends up in a place where it is evaluated as HTML or JavaScript code without proper sanitization. The attacker’s chosen input is then executed by the victim’s browser when the page generated by the vulnerable application is rendered.
Consider a typical Flask application vulnerable to reflected XSS:
from flask import Flask, request

app = Flask(__name__)

@app.route('/greet')
def greet_user():
    user_name = request.args.get('name')
    return f"<h1>Hello, {user_name}!</h1>"  # FLAW

if __name__ == "__main__":
    app.run()
In this example, user input is inserted directly into HTML without sanitization, so a malicious script passed through the `name` parameter is reflected back to the browser and executed.
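To make the flaw concrete, the following sketch reproduces the handler's string formatting outside Flask. The `render_greeting` helper and the payload are illustrative assumptions, not part of the application above.

```python
# Minimal sketch: mimics the f-string used by the vulnerable greet_user handler.
# render_greeting is a hypothetical stand-in for the Flask view.
def render_greeting(user_name: str) -> str:
    # Untrusted input is interpolated into HTML without any encoding.
    return f"<h1>Hello, {user_name}!</h1>"


# Value an attacker could place in the `name` query parameter.
payload = "<script>alert(document.cookie)</script>"
response_body = render_greeting(payload)

# The script element survives intact in the generated markup.
print(response_body)
```

Because the `<script>` element reaches the response unchanged, a browser rendering this body would run the attacker's code in the context of the vulnerable site.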
Remediation
Follow the recommendations given by OWASP in the Cross-Site Scripting Prevention Cheat Sheet.
Output Encoding:
The best technique to protect against XSS is contextual output encoding, that is, encoding data before it is written to an HTML document. Because XSS exploitation techniques vary by HTML context, each context requires its own encoding to prevent JavaScript code from being interpreted. In essence, treat the HTML page to be rendered as a template with 'slots' where a developer is allowed to put untrusted data, and escape that data according to the context-specific rules before placing it in each 'slot'.
The following table summarizes the encoding to apply for each HTML context:
Context | Encoding | Example |
---|---|---|
Text nested in tags | HTML entity encoding: `&` → `&amp;`, `<` → `&lt;`, `>` → `&gt;`, `"` → `&quot;`, `'` → `&#x27;` | |
In an attribute that does not hold a URL or JavaScript | HTML entity encoding, as above, using `&#xHH;`. Always put the attribute value between quotes! | |
In a URL attribute | Validate (whitelist) the URL. Use HTML attribute encoding, as above. For the query string, use URL encoding before HTML attribute encoding. | |
In a JavaScript attribute | DANGEROUS. Avoid if possible. | |
In inline CSS | Safe only with dynamic CSS property value | |
In script block | DANGEROUS. Avoid if possible. Use predefined JavaScript functions and choose functions to render based on user input. Use | |
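The first three rows of the table can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions: the helper names and the `ALLOWED_SCHEMES` set are hypothetical, and `html.escape` encodes only the five characters listed in the first row (it does not apply `&#xHH;` encoding to every special character), which is one more reason to always quote attribute values.

```python
import html
from urllib.parse import quote, urlsplit

# Assumption: only absolute web links are acceptable in this application.
ALLOWED_SCHEMES = {"http", "https"}


def encode_html_text(value: str) -> str:
    """Encode data placed between tags or inside a quoted attribute value."""
    return html.escape(value, quote=True)


def encode_query_value(value: str) -> str:
    """URL-encode a query-string value, then HTML-encode it for the attribute."""
    return html.escape(quote(value, safe=""), quote=True)


def encode_url_attribute(url: str) -> str:
    """Validate the URL scheme against the allowlist, then HTML-encode the URL."""
    # Relative URLs have an empty scheme and are rejected in this sketch.
    if urlsplit(url).scheme.lower() not in ALLOWED_SCHEMES:
        raise ValueError("URL scheme not allowed")
    return html.escape(url, quote=True)


name = '<b onmouseover="x()">Tom & Jerry</b>'
text_slot = f"<p>{encode_html_text(name)}</p>"
attr_slot = f'<input value="{encode_html_text(name)}">'
link_slot = f'<a href="/search?q={encode_query_value(name)}">search</a>'
```

Each helper corresponds to one 'slot' type, so the choice of encoding is made where the data is placed in the page rather than where it is received.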
There are inherently dangerous contexts where output encoding is too complex and error-prone: JavaScript dynamic code, as within a `<script>` body; CSS dynamic code, as within a `<style>` body; JavaScript event handlers (`onXYX`); and HTML comments. Do not attempt output encoding in these contexts. The only viable defense is strict input validation against a limited set of allowed values.
If templates are used for content rendering, be aware that the chosen template engine might not escape untrusted data correctly, so it should not be assumed to prevent XSS.
Input Validation:
Input validation is best understood as a complementary, defense-in-depth strategy, particularly when the input type or format is known. A whitelist approach is recommended. For dangerous contexts, strict input validation is the only option.
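A whitelist check might look like the sketch below. The pattern, the theme names, and both helper functions are hypothetical; an application would substitute the formats and values it actually expects.

```python
import re

# Assumption: user names are 1-32 letters, digits, underscores or hyphens.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,32}")

# Assumption: the only theme values the page can ever render.
ALLOWED_THEMES = {"light": "theme-light", "dark": "theme-dark"}


def validate_username(value: str) -> str:
    """Accept the value only if the whole string matches the expected format."""
    if not USERNAME_PATTERN.fullmatch(value):
        raise ValueError("invalid user name")
    return value


def css_class_for_theme(value: str) -> str:
    """Map user input to a server-defined constant, falling back to a default."""
    # The user-supplied string selects a value but never reaches the page itself.
    return ALLOWED_THEMES.get(value, ALLOWED_THEMES["light"])
```

The second helper shows the pattern suited to dangerous contexts such as CSS or script blocks: user input only selects among predefined values, so nothing attacker-controlled is ever written into the output.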
Content Security Policy (CSP):
A browser-enforced allowlist of the sources from which content, such as scripts, may be loaded; anything not on the list is blocked. It is easy to make mistakes when implementing a policy, so CSP should not be your primary defense mechanism.
Most browsers can limit the damage through security restrictions (e.g. the 'same origin policy'), but users generally allow scripting languages (e.g. JavaScript) in their browsers, since disabling JavaScript severely limits what a web site can do.
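As an illustration, a restrictive policy could be assembled as shown below. The directive values and the `build_csp_header` helper are assumptions chosen for the example; a real policy depends on which origins the application legitimately loads content from.

```python
from typing import Mapping, Sequence, Tuple

# Hypothetical policy: load resources only from the page's own origin, forbid plugins.
CSP_DIRECTIVES = {
    "default-src": ["'self'"],
    "script-src": ["'self'"],
    "object-src": ["'none'"],
    "base-uri": ["'self'"],
}


def build_csp_header(directives: Mapping[str, Sequence[str]]) -> Tuple[str, str]:
    """Return the (name, value) pair for a Content-Security-Policy response header."""
    value = "; ".join(
        f"{name} {' '.join(sources)}" for name, sources in directives.items()
    )
    return "Content-Security-Policy", value


header_name, header_value = build_csp_header(CSP_DIRECTIVES)
```

In a Flask application, one way to apply such a header would be to set it on `response.headers` inside a function registered with `app.after_request`, so that every response carries the policy.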
Web Application Firewalls (WAF):
While WAFs provide some XSS protection, they should be considered a complementary layer to output encoding and input validation within the application itself.
The sanitized version of the previous code snippet looks like this:
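from flask import Flask, request
from markupsafe import escape  # flask.escape was removed in Flask 3.0

app = Flask(__name__)

@app.route('/greet')
def greet_user():
    user_name = request.args.get('name', '')
    # HTML-encode the untrusted value before placing it in the response
    safe_user_name = escape(user_name)
    return f"<h1>Hello, {safe_user_name}!</h1>"

if __name__ == "__main__":
    app.run()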
Configuration
The detector has the following configurable parameters:
- `sources`, which indicates the source kinds to check.
- `neutralizations`, which indicates the neutralization kinds to check.
Unless you need to change the default behavior, you typically do not need to configure this detector.
References
- CWE-79: Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting').
- OWASP Top 10 2021 - A03: Injection.