Improper neutralization of input during web page generation ('Cross-site Scripting' aka 'XSS')

javascript.cross_site_scripting

Severity

critical

Resource

Injection

Language

JavaScript

Description

Improper neutralization of input during web page generation ('Cross-site Scripting' aka 'XSS').

Cross-Site Scripting is a prevalent web application vulnerability that allows attackers to inject malicious scripts into content delivered to other users. These scripts can hijack user sessions, deface websites, or redirect users to malicious sites. XSS commonly arises when an application takes user input, incorporates it into dynamic content served to clients, and fails to sanitize or escape this input correctly.

There are different kinds of XSS. The kind relevant for this check is Reflected XSS. In Reflected XSS, the attacker causes the victim to send malicious content to a vulnerable web application, which then renders HTML content embedding a malicious script that is executed in the victim’s browser. A variant is DOM-based XSS, where the server-generated content does not depend on user input, but client-side script code in the page processes user-controlled input unsafely.

Rationale

An XSS vulnerability occurs when untrusted input ends up in a place where it is evaluated as HTML or JavaScript code without proper neutralization. The attacker’s chosen input is then executed by the victim’s browser when the page generated by the vulnerable application is rendered.

The following code is a trivial example of a client-side (DOM-based) XSS vulnerability:

let userInput = document.getElementById('my_text').value;
// ...
document.write("<div>" + userInput + "</div>"); // FLAW

The following code is an example of a server-side (reflected) XSS vulnerability in a Node.js Express application:

const express = require('express');
const app = express();
// ...
app.get('/user', (req, res) => {
  // req.param() is deprecated in Express 4 and removed in Express 5; read the query string instead
  const uname = req.query.user;
  // ...
  res.send('<p>Profile updated, user ' + uname + '</p>'); // FLAW
});
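To see what the flawed handler actually sends back, the same concatenation can be run in plain Node.js, without Express. The helper name `buildProfileMessage` and the payload are hypothetical; the point is that the attacker's input reaches the response unchanged, so the victim's browser parses it as markup and runs the injected event handler.

```javascript
// Hypothetical stand-in for the vulnerable handler body: untrusted input is
// concatenated straight into an HTML fragment without any encoding.
function buildProfileMessage(uname) {
  return '<p>Profile updated, user ' + uname + '</p>'; // FLAW
}

// Attacker-chosen value, e.g. delivered in the `user` query parameter of a
// crafted link that the victim is tricked into opening.
const payload = '<img src=x onerror="alert(document.cookie)">';

const html = buildProfileMessage(payload);
console.log(html);
// <p>Profile updated, user <img src=x onerror="alert(document.cookie)"></p>
```

In a browser, this response creates a real `<img>` element whose `onerror` handler runs the attacker's code in the context of the vulnerable site.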

Remediation

For the client-side XSS vulnerability, you may create the DOM element first and then assign the user input to its textContent property, which treats the value as plain text instead of parsing it as HTML:

let userInput = document.getElementById('my_text').value;
// ...
let container = document.createElement('div');
// textContent stores the value as a text node: the browser never parses it as HTML
container.textContent = userInput;
document.body.appendChild(container);

For the server-side XSS vulnerability, you may HTML-encode the input using a proven library such as escape-html:

const express = require('express');
const app = express();
const escapeHtml = require('escape-html'); // your mileage may vary

// ...
app.get('/user', (req, res) => {
  const uname = req.query.user; // req.param() is deprecated in Express 4
  const safeUname = escapeHtml(uname); // Encode the user-provided data
  // ...
  res.send('<p>Profile updated, user ' + safeUname + '</p>'); // Safe
});
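To see what HTML encoding does to a payload, here is a minimal, dependency-free sketch of an encoder for text placed between tags. It is an illustration only, not the escape-html source code, and it is not meant to replace a maintained library; the name `escapeHtmlText` is hypothetical. It replaces the five characters that are significant in HTML (&, <, >, ", ') with character references.

```javascript
// Hypothetical illustration of HTML entity encoding (not the escape-html source).
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

function escapeHtmlText(value) {
  // String(...) guards against non-string inputs such as numbers or undefined.
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

const uname = '<script>alert(1)</script>';
const fragment = '<p>Profile updated, user ' + escapeHtmlText(uname) + '</p>';
console.log(fragment);
// <p>Profile updated, user &lt;script&gt;alert(1)&lt;/script&gt;</p>
```

Because the replacement runs in a single pass over the original string, an ampersand introduced by one substitution is never encoded a second time.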

Xygeni does not endorse any particular library for sanitization. A JavaScript anti-XSS sanitizer can be chosen from libraries such as the following:

DOMPurify - XSS sanitizer for HTML, MathML and SVG.
xss - Sanitize untrusted HTML (to prevent XSS) with a configuration specified by a Whitelist.
xss-filters - Secure XSS Filters.
escape-html - Escape string for use in HTML.

Follow the recommendations given by OWASP in the Cross Site Scripting Prevention Cheat Sheet.

Output Encoding:

The best technique to protect against XSS is contextual output encoding of the data written to HTML documents. Because XSS exploitation techniques vary with the HTML context in which the data lands, each context requires a specific encoding to prevent the data from being interpreted as JavaScript code. In essence, treat the HTML page to be rendered as a template with 'slots' where a developer is allowed to put untrusted data, and escape that data according to the context-specific rules of each slot before placing it there.
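The 'slots' idea can be sketched as a small set of encoders, one per context, applied when each slot is filled. This is a minimal sketch under stated assumptions: the function names are hypothetical, the attribute encoder follows the OWASP rule of encoding every non-alphanumeric character as `&#xHH;`, and it is only valid for quoted attribute values. Production code should rely on a vetted library or a template engine with context-aware escaping.

```javascript
// Encoder for the "text nested in tags" slot: <div>{DATA}</div>
function encodeForHtmlText(value) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#x27;' };
  return String(value).replace(/[&<>"']/g, (ch) => map[ch]);
}

// Encoder for a quoted attribute slot that holds neither a URL nor JavaScript:
// <div class="{DATA}">. Every character that is not an ASCII letter or digit
// becomes a hexadecimal character reference (&#xHH;).
function encodeForHtmlAttribute(value) {
  return String(value).replace(/[^A-Za-z0-9]/gu, (ch) =>
    '&#x' + ch.codePointAt(0).toString(16).toUpperCase() + ';'
  );
}

// Hypothetical template: each slot is filled through the encoder for its context.
function renderCard(title, cssClass) {
  return '<div class="' + encodeForHtmlAttribute(cssClass) + '">' +
    encodeForHtmlText(title) + '</div>';
}

// The injected quote is encoded, so it cannot close the class attribute.
console.log(renderCard('<b>Hi</b>', 'x" onmouseover="alert(1)'));
```

Using the text encoder in the attribute slot would be weaker (for example, it leaves spaces and `=` intact, which matters if the attribute is ever left unquoted), which is why each slot gets the encoder for its own context.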

The following table summarizes the encoding to apply for each HTML context:

Context	Encoding	Example
Text nested in tags	HTML entity encoding: & → &amp;, < → &lt;, > → &gt;, " → &quot;, ' → &#x27;	`<div>{DATA}</div>`
In an attribute not holding a URL or JavaScript	HTML attribute encoding: encode every non-alphanumeric character as &#xHH;. Always put the attribute value between quotes!	`<div class="{DATA}"> … </div>`
In a URL attribute	Validate the URL against an allowlist, then apply HTML attribute encoding as above. For values in the query string, apply URL encoding before HTML attribute encoding.	`<a href="{DATA}"> … </a>`
In a JavaScript attribute	DANGEROUS. Avoid if possible.	`<script src="{DATA}"> … </script>` `<div onclick="{DATA}"> … </div>`
In inline CSS	Safe only when the data is placed in a CSS property value (and CSS-encoded)	`<div style="prop: {DATA}"> … </div>` `<style> selector { prop: "{DATA}" } </style>`
In a script block	DANGEROUS. Avoid if possible. Use predefined JavaScript functions and select which one to run based on user input. Use `data-*` attributes to pass data to the script: `<script data-x="{DATA}">`	`<script>{DATA}</script>`
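For the URL-attribute row, the three steps (validate the URL against an allowlist, URL-encode query-string values, then HTML-attribute-encode the result) might look like the sketch below. The allowed schemes, the base URL `https://example.com/search`, and the helper names are assumptions for illustration. It uses the WHATWG `URL` and `URLSearchParams` classes, which are available globally in Node.js and in browsers.

```javascript
// Allowlist of URL schemes accepted in href/src attributes (an assumption;
// adjust to the application). javascript:, data:, vbscript: etc. are rejected.
const ALLOWED_PROTOCOLS = new Set(['http:', 'https:']);

function encodeForHtmlAttribute(value) {
  // Encode every character that is not an ASCII letter or digit as &#xHH;
  return String(value).replace(/[^A-Za-z0-9]/gu, (ch) =>
    '&#x' + ch.codePointAt(0).toString(16).toUpperCase() + ';'
  );
}

// Returns a value safe for href="..." or null if the URL is not allowed.
function safeHref(untrustedUrl) {
  let url;
  try {
    url = new URL(untrustedUrl); // throws on malformed or relative input
  } catch {
    return null;
  }
  if (!ALLOWED_PROTOCOLS.has(url.protocol)) return null;
  return encodeForHtmlAttribute(url.href);
}

// A link whose query string carries untrusted data: searchParams.set()
// percent-encodes the value before the HTML attribute encoding step.
function searchLink(term) {
  const url = new URL('https://example.com/search');
  url.searchParams.set('q', term);
  return '<a href="' + encodeForHtmlAttribute(url.href) + '">Search</a>';
}

console.log(safeHref('javascript:alert(1)')); // null
console.log(searchLink('"><script>alert(1)</script>'));
```

Browsers decode the character references in the attribute before parsing the URL, so the encoded href still points to the intended address.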

There are inherently dangerous contexts where output encoding is too complex and error-prone: dynamic JavaScript code (as within a `<script>` body), dynamic CSS code (as within a `<style>` body), JavaScript event handlers (`onclick`, `onerror`, and other `on*` attributes), and HTML comments. In these contexts no output encoding should be attempted; only strict input validation that allows a limited set of values should be used.
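When user input has to influence script behavior, the approach described above (no encoding, only a small set of permitted values) can be sketched as a lookup table of predefined functions: the input only selects a key, and anything outside the table is rejected. The option names and sorting logic are hypothetical.

```javascript
// Predefined behaviors: untrusted input can only select one of these keys.
// The option names and sorting logic are hypothetical examples.
const SORT_ACTIONS = Object.freeze({
  name: (items) => [...items].sort((a, b) => a.name.localeCompare(b.name)),
  date: (items) => [...items].sort((a, b) => a.date - b.date),
});

function applySort(userChoice, items) {
  // hasOwnProperty rejects inherited keys such as "constructor" or "__proto__".
  const allowed =
    typeof userChoice === 'string' &&
    Object.prototype.hasOwnProperty.call(SORT_ACTIONS, userChoice);
  if (!allowed) {
    throw new Error('Unsupported sort option');
  }
  return SORT_ACTIONS[userChoice](items);
}

const posts = [
  { name: 'beta', date: 2 },
  { name: 'alpha', date: 1 },
];
console.log(applySort('name', posts).map((p) => p.name)); // [ 'alpha', 'beta' ]
```

The user value is never concatenated into code; it is only compared against fixed keys, so there is nothing to encode.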

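If templates are used for content rendering, keep in mind that the chosen template engine may not escape untrusted data correctly in every context; verify which contexts its automatic escaping covers before relying on it to prevent XSS.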
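To illustrate what "escaping by default" means, and why its coverage matters, here is a tagged template literal that HTML-encodes every interpolated value. The `html` tag is a hypothetical, minimal sketch, not a real template engine: it only handles text between tags, so it would still be unsafe for attribute, URL, script, or style slots.

```javascript
function escapeForText(value) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#x27;' };
  return String(value).replace(/[&<>"']/g, (ch) => map[ch]);
}

// Tagged template: the literal parts are trusted (written by the developer),
// while every interpolated value is treated as untrusted and HTML-encoded.
function html(strings, ...values) {
  let out = strings[0];
  values.forEach((value, i) => {
    out += escapeForText(value) + strings[i + 1];
  });
  return out;
}

const comment = '<script>steal()</script>';
console.log(html`<p>Latest comment: ${comment}</p>`);
// <p>Latest comment: &lt;script&gt;steal()&lt;/script&gt;</p>
```

Writing ``html`<a href="${url}">` `` with this tag would encode quotes but would not block a `javascript:` URL, which is exactly the kind of context gap to look for in a real engine.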

Input Validation:

Input validation is best understood here as a complementary, defense-in-depth measure, particularly when the type or format of the input is known. An allowlist approach is recommended. For the dangerous contexts above, strict input validation is the only option.
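For inputs with a known format, allowlist validation can be as simple as an anchored regular expression checked before the value is used. The username rule below (3 to 32 ASCII letters, digits, or underscores) is an assumed example format, and `validateUsername` is a hypothetical helper; validation of this kind complements output encoding rather than replacing it.

```javascript
// Assumed format for a username: 3-32 ASCII letters, digits or underscores.
// ^ and $ anchor the match to the whole string (no multiline flag).
const USERNAME_PATTERN = /^[A-Za-z0-9_]{3,32}$/;

function validateUsername(input) {
  if (typeof input !== 'string' || !USERNAME_PATTERN.test(input)) {
    return { ok: false, error: 'Invalid username' };
  }
  return { ok: true, value: input };
}

console.log(validateUsername('alice_01')); // { ok: true, value: 'alice_01' }
console.log(validateUsername('<script>')); // { ok: false, error: 'Invalid username' }
```

In the Express handler shown earlier, such a check would run on `req.query.user` before the value is used, rejecting the request otherwise.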

Content Security Policy (CSP):

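A CSP is an allowlist, sent by the server (typically in the Content-Security-Policy HTTP response header), of the sources from which the browser may load and execute scripts and other content. Because it is easy to make mistakes when implementing it, it should not be your primary defense mechanism.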
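As a starting point, a restrictive policy can be sent with Node's built-in `http` module as sketched below. The directive set is an assumption suited to a site that serves all of its scripts from its own origin; real applications usually need to tune it, and a new policy can be trialled first with the `Content-Security-Policy-Report-Only` header.

```javascript
const http = require('http');

// Assumed baseline: same-origin content only, no plugins, no <base> hijacking.
const CSP_DIRECTIVES = {
  'default-src': ["'self'"],
  'script-src': ["'self'"],
  'object-src': ["'none'"],
  'base-uri': ["'self'"],
};

// Serializes { directive: [sources] } into a header value.
function buildCspHeader(directives) {
  return Object.entries(directives)
    .map(([name, sources]) => `${name} ${sources.join(' ')}`)
    .join('; ');
}

const server = http.createServer((req, res) => {
  res.setHeader('Content-Security-Policy', buildCspHeader(CSP_DIRECTIVES));
  res.setHeader('Content-Type', 'text/html; charset=utf-8');
  res.end('<p>Hello</p>');
});

// server.listen(3000); // uncomment to run locally
```

Because `script-src 'self'` does not include `'unsafe-inline'`, inline `<script>` blocks and `onclick`-style attributes are blocked, which limits what an injected payload can do even if encoding is missed somewhere.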

Browsers can limit some of the damage through security restrictions such as the same-origin policy, but an injected script runs within the vulnerable page’s own origin. Moreover, users generally keep scripting languages such as JavaScript enabled in their browsers, because disabling them severely limits most web sites.

Web Application Firewalls (WAF):

While WAFs provide some XSS protection, they should be considered a complementary layer to output encoding and input validation within the application itself.

Configuration

The detector has the following configurable parameters:

sources, which indicates the kinds of sources to check.
neutralizations, which indicates the kinds of neutralizations to check.

Unless you need to change the default behavior, you typically do not need to configure this detector.

References

CWE-79: Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting').
OWASP Top 10 2021 - A03: Injection.
OWASP Cross Site Scripting Prevention Cheat Sheet.