Sanitize Input for Reflected/Stored XSS

There’s a reason why XSS vulnerabilities are so common in the wild: they’re difficult to get rid of. Sanitizing sounds simple in principle, but escaping and disallowing characters can get complicated quickly. Let’s look at various rules from the OWASP XSS Prevention Cheat Sheet,[68] which you should keep in mind when building your site.

But first, a small test: in the following code example there’s an HTML document—actually, an Embedded JavaScript[69] (EJS) template. Do you know where you could in theory put unsafe content and where you should never put unsafe content?

 <!DOCTYPE html>
 <html>
 <head lang=​"en"​>
  <meta charset=​"UTF-8"​>
  <title>My XSS</title>
 <!--<%- 1 %>-->
  <style>
  body {
  color: #000077;
  font-size: ​<%​- 2 ​%>​;
  }
  <%- 3 %>
  </style>
 </head>
 <body>
  <nav>
 <<%- 4 %> href="/second">Second page</<%- 4 %>>
  <a href=​"/third?x=<%- 5 %>"​>Third page</a>
  </nav>
  <div>
  <div>​<​%- 6 %></div>
  <input ​<%​- 7 ​%​>="nice" value="​<​%- 8 %>" />
 
  <button onclick=​"<%- 9 %>"​>Touch me</button>
  </div>
  <script>
 var​ x = ​'<%- 10 %>'​;
  <%- 11 %>
  </script>
 </body>
 </html>

Did you find all of them? Are you confident? If not, then keep reading.

It turns out there are some locations in an HTML document where sanitizing is so difficult that you’d be better off avoiding them entirely. Unless, of course, you want attackers to target your customers.

 <!DOCTYPE html>
 <html>
 <head lang=​"en"​>
  <meta charset=​"UTF-8"​>
  <title>My XSS</title>
 <!--<%- 1 %>-->
  <style>
  body {
  color: #000077;
  font-size: large;
  }
  <%- 3 %>
  </style>
 </head>
 <body>
  <nav>
 <<%- 4 %> href="/second">Second page</<%- 4 %>>
  <a href=​"/third?x=1"​>Third page</a>
  </nav>
  <div>
  <div>Labeling</div>
  <input <%- 7 %>="nice" />
 
  <button onclick=​"alert('why')"​>Touch me</button>
  </div>
  <script>
 var​ x = ​'y'​;
  <%- 11 %>
  </script>
 </body>
 </html>

Inside HTML comments (<%- 1 %>)

Directly inside a style element (<%- 3 %>)

As a tag name (<%- 4 %>)

As an attribute name (<%- 7 %>)

Directly inside a script element (<%- 11 %>)

By avoiding these locations you give your website a fighting chance against XSS. Now let’s look at where you potentially can put unsafe data without causing too much harm:

 <!DOCTYPE html>
 <html>
 <head lang=​"en"​>
  <meta charset=​"UTF-8"​>
  <title>My XSS</title>
 <!--Comment-->
  <style>
  body {
  color: #000077;
  font-size: <%- 2 %>;
  }
  </style>
 </head>
 <body>
  <nav>
  <a href=​"/second"​>Second page</a>
  <a href="/third?x=<%- 5 %>">Third page</a>
  </nav>
  <div>
  <div><%- 6 %></div>
  <input value="<%- 8 %>" />
 
  <button onclick="<%- 9 %>">Touch me</button>
  </div>
  <script>
 var x = '<%- 10 %>';
  </script>
 </body>
 </html>

As CSS values (<%- 2 %>)

As URL parameters (<%- 5 %>)

Inside HTML elements (<%- 6 %>)

Inside common quoted HTML attributes (<%- 8 %>)

Inside JavaScript data values in attributes (<%- 9 %>)

Inside JavaScript data values in script elements (<%- 10 %>)

All of these locations require their own specific form of sanitizing, so we’ll go over them one by one. And to be safe you should avoid any other locations not mentioned here unless you do thorough research first and confirm it’s okay.

Let’s start sanitizing!

node-esapi

Because I couldn’t find a context-specific escaping library for Node, I ported the encoder module of ESAPI4JS (Enterprise Security API for JavaScript). The port is called node-esapi.[70] ESAPI4JS was developed by OWASP and implements the escape rules described in this chapter, so we’ll use it as the sanitizing library in the examples.

Rule 1: Escape untrusted data inserted into HTML element content.

When you insert data into an HTML body, you have to HTML escape it. This includes normal tags as well, such as div, p, b, and section. Some template engines like jade[71] do this automatically. However, this is absolutely not sufficient for other HTML contexts, and you have to be certain of your template engine if you want to rely on it to handle encoding automatically:

 <body>...CAN PUT HTML ESCAPED DATA HERE...</body>
 <div>...CAN PUT HTML ESCAPED DATA HERE...</div>
 etc…

You can do this with the ESAPI library:

 ESAPI.encoder().encodeForHTML(untrustedData);

HTML escaping means that you escape the five characters important for XML[72] (&, <, >, ", and ') and also the forward slash, /, because it helps end HTML elements. You can use the following conversion table:

 &​ → &amp;
 <​ → &lt;
 > → &gt;
 " → &quot;
 ' → &#x27; (&apos; is not recommended because it's not in the HTML spec)
 / → &#x2F;
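To make the conversion table concrete, here is a minimal sketch of an HTML-content escaper. The name escapeHtml and the lookup object are hypothetical and only illustrate the table above; this is not node-esapi's implementation, which you should prefer in real code.

```javascript
// Illustrative stand-in for an HTML-content encoder (hypothetical helper,
// not the node-esapi implementation).
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#x27;',
  '/': '&#x2F;'
};

function escapeHtml(untrusted) {
  // Replace all six characters in a single pass so that the ampersands
  // produced by the replacements are never escaped a second time.
  return String(untrusted).replace(/[&<>"'\/]/g, (ch) => HTML_ESCAPES[ch]);
}

console.log(escapeHtml('<script>alert("xss")</script>'));
// prints: &lt;script&gt;alert(&quot;xss&quot;)&lt;&#x2F;script&gt;
```

The single-pass replacement matters: escaping & in a separate later step would turn &lt; into &amp;lt;.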

Rule 1.1: Sanitize HTML markup with a library designed for the job.

When your application lets users enter HTML content, you can’t just trust it, but you can’t simply use encoding because it would break the HTML. Use the library designed for the task.

Several different modules in Node.js were written specifically for this purpose; I will highlight two of them:

  • Bleach[73]: designed for easy HTML sanitizing. It supports both whitelist and blacklist sanitizing and has other options as well. Unfortunately this module hasn’t been updated for over a year.
  • Sanitizer[74]: a port of the Caja-HTML-Sanitizer.[75] It’s a thorough HTML sanitizer developed by Google that also supports various options.

Rule 2: Escape untrusted data inserted into HTML attributes.

When you insert untrusted data into common HTML attributes like value, width, and name, you have to encode accordingly. Surround the attribute value with either single or double quotes:

 <!-- inside single quoted attribute -->
 <div attr=​'...CAN PUT ATTRIBUTE ESCAPED DATA HERE...'​>content</div>
 
 <!-- inside double quoted attribute -->
 <div attr=​"...CAN PUT ATTRIBUTE ESCAPED DATA HERE..."​>content</div>

The following shows how to apply the rule with the ESAPI library:

 ESAPI.encoder().encodeForHTMLAttributes(untrustedData);

When escaping for HTML attributes, escape every character with an ASCII value below 256, except alphanumeric characters, using the &#xHH; format (or a named entity if available) to prevent switching out of the attribute.

The reason this rule is so broad is that developers frequently leave attributes unquoted. A properly quoted attribute can only be broken out of with the corresponding quote. Unquoted attributes, however, can be broken out of with many characters, including [space] % * + , - / ; < = > ^ and |.

This rule does not cover complex attributes like href, src, style or any event handler like onclick. Event handler attributes follow rule 3.
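The broad attribute rule can be sketched in a few lines. The function name escapeHtmlAttribute is hypothetical, and the sketch assumes that characters with values of 256 and above are passed through unchanged and that control characters need no special treatment. A production encoder such as the one in node-esapi may handle those cases differently.

```javascript
// Illustrative attribute encoder (hypothetical helper, not node-esapi's code).
function escapeHtmlAttribute(untrusted) {
  let out = '';
  for (const ch of String(untrusted)) {
    const code = ch.codePointAt(0);
    if (/[A-Za-z0-9]/.test(ch) || code >= 256) {
      // Alphanumerics and characters outside the 0-255 range pass through.
      out += ch;
    } else {
      // Everything else below 256 becomes a &#xHH; entity.
      out += '&#x' + code.toString(16).toUpperCase().padStart(2, '0') + ';';
    }
  }
  return out;
}

console.log(escapeHtmlAttribute('" onmouseover="alert(1)'));
// prints: &#x22;&#x20;onmouseover&#x3D;&#x22;alert&#x28;1&#x29;
```

Because the space and the equals sign are encoded as well, the value stays inert even if the attribute was accidentally left unquoted.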

Rule 3: Escape untrusted data inserted into JavaScript data values.

This rule applies to dynamically created JavaScript code—both script blocks and event handlers. The only place to put data in this case is in the quoted data values. Any other JavaScript context is dangerous—it’s easy to switch execution context, because there are many characters that allow the attacker to do so:

 <!-- We might expect a string -->
 <script>alert(<%- userValue %>)</script>
 
 <!-- And instead get -->
 <script>alert(confirm(​'have you been xssd?'​))</script>

Always quote data values because it drastically limits the possible context escape values attackers could use:

 <!-- inside a quoted string -->
 <script>alert(​'...CAN PUT JAVASCRIPT ESCAPED DATA HERE...'​)</script>
 
 
 
 <!-- one side of a quoted expression -->
 <script>x=​'...CAN PUT JAVASCRIPT ESCAPED DATA HERE...'​</script>
 
 <!-- inside quoted event handler -->
 <div onclick="x='...CAN PUT JAVASCRIPT ESCAPED DATA HERE...'"></div>

Some JavaScript functions can never safely use untrusted data as input, as shown here:

 window.setInterval(​'...EVEN IF YOU ESCAPE UNTRUSTED DATA YOU ARE XSSED HERE...'​);

You can encode for JavaScript using the ESAPI library:

 ESAPI.encoder().encodeForJS(untrustedData);
 //or
 ESAPI.encoder().encodeForJavaScript(untrustedData);
 //or
 ESAPI.encoder().encodeForJavascript(untrustedData);

When escaping for JavaScript, escape every character with a value below 256, except alphanumeric characters, using the \xHH format to prevent switching out of the data value into the script context or into another attribute.

Do not use any escaping shortcuts like \" because the quote character will wind up being matched by the HTML attribute parser, which runs first. Escaping shortcuts are also susceptible to escape-the-escape attacks, where the attacker sends \" and the vulnerable code turns that into \\" to enable the quote.

If an event handler is properly quoted, breaking out requires you to have the corresponding quote. This rule is intentionally broad because event handler attributes are often left unquoted. Unquoted attributes can be broken out of with many characters, including [space] % * + , - / ; < = > ^ and |.

Also, a </script> closing tag will close a script block even when it appears inside a quoted string, because the HTML parser runs before the JavaScript parser.
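As a rough illustration of escaping for a quoted JavaScript data value, the sketch below hex-escapes every non-alphanumeric character. The name escapeJs is hypothetical, and escaping characters at 256 and above as \uHHHH is my assumption rather than something the rule requires. It is not the node-esapi implementation.

```javascript
// Illustrative JavaScript string-value encoder (hypothetical helper).
function escapeJs(untrusted) {
  const text = String(untrusted);
  let out = '';
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    const code = text.charCodeAt(i);
    if (/[A-Za-z0-9]/.test(ch)) {
      out += ch;
    } else if (code < 256) {
      // No shortcuts such as \" here: always use the two-digit hex form.
      out += '\\x' + code.toString(16).toUpperCase().padStart(2, '0');
    } else {
      // Assumption: escape the remaining UTF-16 code units as \uHHHH.
      out += '\\u' + code.toString(16).toUpperCase().padStart(4, '0');
    }
  }
  return out;
}

console.log(escapeJs("';alert(1)//"));
// prints: \x27\x3Balert\x281\x29\x2F\x2F
console.log(escapeJs('</script>'));
// prints: \x3C\x2Fscript\x3E
```

The second call shows why hex escaping also defuses a closing script tag: the HTML parser never sees a literal < or / in the output.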

Rule 3.1: Escape JSON values in an HTML context and read the data with JSON.parse.

In Web 2.0 applications the server often generates data and transfers it to the client as JSON. The client can fetch the data with AJAX calls, but that’s not always efficient, so you often embed an initial block of JSON in the page to act as the base data. Let’s look at how you can do this securely.

First of all, when asking for JSON data from the server, ensure that the HTTP Content-Type header is set correctly to application/json so that the browser doesn’t accidentally try to interpret the content as HTML.

With express, this is handled automatically when you send a response with res.json instead of res.send, because it sets the header internally:

 app.get(​'/json'​, ​function​ (req, res) {
  res.json({my:​'awesome JSON'​});
 });

A common anti-pattern when serving JSON as part of the original HTML looks like the following:

 <script>
 var​ initData = <%- JSON.stringify(data) %>;
 // WARNING! This is not a recommended approach as it
 // is vulnerable without proper escaping
 </script>

The problem with this approach is that it’s possible to change the execution context, because the HTML interpreter runs before the JavaScript interpreter. Instead, I recommend that you separate the server-side data without breaching context barriers. Place JSON into HTML as a normal element and then use JavaScript to parse the contents:

 <script id=​"init_data"​ type=​"application/json"​>
  <%- ESAPI.encoder().encodeForHTML(JSON.stringify(data)) %>
 </script>
 <script>
 var​ dataElement = document.getElementById(​'init_data'​);
 var jsonText = dataElement.textContent || dataElement.innerText;
 
 // Always use JSON.parse instead of eval
 var​ initData = JSON.parse(jsonText);
 </script>
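To see why the anti-pattern is risky, the following sketch shows that JSON.stringify leaves a closing script tag intact, and that HTML-encoding the serialized text removes it. The helper encodeForHtmlSketch is a hypothetical stand-in for ESAPI.encoder().encodeForHTML, and the payload is made up for illustration.

```javascript
// Hypothetical stand-in for an HTML encoder, used only for this demonstration.
function encodeForHtmlSketch(text) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;',
                '"': '&quot;', "'": '&#x27;', '/': '&#x2F;' };
  return String(text).replace(/[&<>"'\/]/g, (ch) => map[ch]);
}

// Made-up untrusted value that tries to close the surrounding script block.
const data = { name: '</script><script>alert(1)</script>' };

// JSON.stringify escapes quotes and backslashes, but not "<" or "/",
// so the closing tag reaches the HTML parser unchanged.
const raw = JSON.stringify(data);
console.log(raw.includes('</script>')); // prints: true

// After HTML encoding, no literal "<" is left for the HTML parser to act on.
const encoded = encodeForHtmlSketch(raw);
console.log(encoded.includes('<')); // prints: false
```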

Rule 4: Escape and validate untrusted data inserted into CSS property values.

Although it might not seem like it, CSS (Cascading Style Sheets) can be used as an XSS attack vector because CSS can execute and include scripts. Here’s an example of how CSS is used in an attack:

 { background-url : ​"javascript:alert(1)"​; } // and all other URLs
 { text-size: ​"expression(alert('XSS'))"​; } // only in IE

When you use untrusted data to construct CSS or set style properties on elements, make sure you perform proper validation checks. Don’t use untrusted data for anything other than property values. Don’t put untrusted data into complex property values such as url and behavior. I suggest also avoiding the Internet Explorer–specific expression property since it allows JavaScript.

 <style>selector { property : ...CAN PUT CSS ESCAPED DATA HERE...; } </style>
 
 <style>selector { property : ​"...CAN PUT CSS ESCAPED DATA HERE..."​; } </style>
 
 <span style=​"property : ...CAN PUT CSS ESCAPED DATA HERE..."​>text</span>

Even if you escape CSS, you still have to ensure all URLs start with http: or https: and not with javascript:. Property values should never start with expression.

Here’s the same example, using the ESAPI library:

 ESAPI.encoder().encodeForCSS(untrustedData);

When escaping for CSS, escape every character with an ASCII value below 256, except alphanumeric characters, using the \HH escaping format. As mentioned in a previous rule, do not use any escaping shortcuts like \" because the quote character may be matched by the HTML attribute parser instead. These shortcuts are also susceptible to escape-the-escape attacks where \" turns into \\".

If an attribute is quoted, breaking out requires the corresponding quote. All attributes should be quoted, but your encoding should be strong enough to prevent XSS when untrusted data is placed in unquoted contexts.

Unquoted attributes can be broken out of with many characters, including [space] % * + , - / ; < = > ^ and |. Also, the </style> tag will close the style block even when it appears inside a quoted string, because the HTML parser runs before the CSS parser.

Please note that aggressive CSS encoding and validation are recommended to prevent XSS attacks for both quoted and unquoted attributes.
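Escaping plus validation for CSS values could look roughly like the sketch below. Both function names are hypothetical. The trailing space after each hex escape is my assumption: CSS allows it as a terminator so that a following hex digit is not absorbed into the escape. The validation is a deliberately simple deny-list check, not a complete CSS validator.

```javascript
// Illustrative CSS value encoder (hypothetical helper, not node-esapi's code).
function escapeCss(untrusted) {
  let out = '';
  for (const ch of String(untrusted)) {
    const code = ch.codePointAt(0);
    if (/[A-Za-z0-9]/.test(ch) || code >= 256) {
      out += ch;
    } else {
      // Backslash, hex value, then a space that terminates the escape.
      out += '\\' + code.toString(16).toUpperCase() + ' ';
    }
  }
  return out;
}

// Simple validation sketch: reject values that start with "expression"
// or that contain a javascript: URL.
function isAcceptableCssValue(value) {
  const v = String(value).trim().toLowerCase();
  return !v.startsWith('expression') && !v.includes('javascript:');
}

console.log(escapeCss('red;}'));                            // prints: red\3B \7D 
console.log(isAcceptableCssValue('expression(alert(1))'));  // prints: false
console.log(isAcceptableCssValue('12px'));                  // prints: true
```

Run the validation on the raw value first and escape only values that pass, so the check sees exactly what the user sent.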

Rule 5: Escape untrusted data inserted into HTML URL parameter values.

This is one of the easiest rules to apply. When you want to put data into HTTP GET parameters, URL escape it!

 <a href=​"http://www.somesite.com?test=...CAN PUT URL ESCAPED DATA HERE..."​>link</a>

You can do this easily with the ESAPI library:

 ESAPI.encoder().encodeForURL(untrustedData);

When escaping for a URL, escape every character with an ASCII value below 256, except alphanumeric characters, using the %HH escaping format. Don’t include untrusted data in data: URLs because there’s no good way to disable those attacks with escaping.

All attributes should be quoted. Unquoted attributes can be broken out of with many characters, including [space] % * + , - / ; < = > ^ and |. Note that entity encoding is useless in this context.
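The built-in encodeURIComponent comes close to this rule but leaves a handful of punctuation characters (- _ . ! ~ * ' ( )) untouched. The sketch below percent-encodes those as well. The name escapeUrlParam is hypothetical and this is not how node-esapi is implemented.

```javascript
// Illustrative URL parameter encoder (hypothetical helper).
function escapeUrlParam(untrusted) {
  // encodeURIComponent percent-encodes everything except
  // A-Z a-z 0-9 - _ . ! ~ * ' ( ), so the second step covers the rest.
  // Note: encodeURIComponent throws a URIError on lone surrogates.
  return encodeURIComponent(String(untrusted)).replace(
    /[!'()*\-._~]/g,
    (ch) => '%' + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(escapeUrlParam("a b&c='x'"));
// prints: a%20b%26c%3D%27x%27
```

Encoding the apostrophe matters when the surrounding attribute is single-quoted, which plain encodeURIComponent would not protect.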

Be careful with URL encoding and relative URLs. If the user input is meant to be placed into href or src or other URL-based attributes, then it should be validated beforehand to make sure it doesn’t point to an unexpected protocol or script file. After that, encode URLs based on context, like all other data.

For example, when inserting into a href attribute, you attribute encode it.
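The protocol check described above can be sketched with the WHATWG URL class that Node provides as a global. The function name isAllowedUrl and the base address https://example.com/ are assumptions for illustration. The check looks only at the scheme, so restricting hosts or paths would need additional rules.

```javascript
// Illustrative protocol check for user-supplied link targets
// (hypothetical helper; the base URL is a placeholder).
function isAllowedUrl(candidate, base = 'https://example.com/') {
  let parsed;
  try {
    // Relative URLs are resolved against the base, just as a browser would.
    parsed = new URL(String(candidate), base);
  } catch (err) {
    return false; // unparsable input is rejected
  }
  return parsed.protocol === 'http:' || parsed.protocol === 'https:';
}

console.log(isAllowedUrl('/second?x=1'));          // prints: true
console.log(isAllowedUrl('javascript:alert(1)'));  // prints: false
console.log(isAllowedUrl('data:text/html,hello')); // prints: false
```

A URL that passes this check still has to be attribute encoded before it is written into href or src.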

How All the Rules Come Together

To avoid XSS attacks when rendering templates on the server side, you should always be careful with unsafe content. You must encode depending on the location where the content is being inserted. Using the wrong encoding format doesn’t help you, and there’s no one-size-fits-all rule that you can apply. Look at the following example to see how you can combine all the methods in one place:

 <!DOCTYPE html>
 <html>
 <head lang=​"en"​>
  <meta charset=​"UTF-8"​>
  <title>My XSS</title>
 <!--This is going to be great-->
 <​% var E = ESAPI.encoder() %>
  <style>
  body {
  color: #000077;
  font-size: ​<%​- E.encodeForCSS(unsafe); ​%>;
  }
  </style>
 </head>
 <body>
  <nav>
  <a href=​"/second?x=<%- E.encodeForURL(unsafe); %>"​>Second page</a>
  </nav>
  <div>
  <h1>​<​%- E.encodeForHTML(unsafe); %></h1>
  <input value=​"<%- E.encodeForHTMLAttributes(unsafe); %>"​ />
 
  <button onclick=​"<%- E.encodeForJS(unsafe); %>"​>Touch me</button>
  </div>
  <script id=​"json"​ type=​"application/json"​>
  <%- E.encodeForHTML(JSON.stringify(data)) %>
  </script>
 
  <script>
 var​ x = ​'<%- E.encodeForJS(unsafe); %>'​;
 var​ json = JSON.parse(document.getElementById(​'json'​).innerHTML);
  </script>
 </body>
 </html>
