Cross-Site Scripting (XSS)

We’ve seen the knock-knock joke principle applied to SQL (SQL injection). Let’s take a look at attacks using that same principle when applied to the HTML and JavaScript in a web page. We call this attack cross-site scripting (or XSS for short) if the attack injects JavaScript. We call it DOM injection if it injects regular HTML.

Let’s continue with the example from earlier in the chapter of a blogging site. One of the most basic requirements is for anyone using the site to be able to read posts written by other users. Suppose a user writes a blog post such as this:

Dear Diary, Today I read the most wonderful book, Practical Security.

The reader would expect to be able to see this blog post in their browser. But what if instead of a heartwarming blog post like the one above, an attacker wrote this:

Dear Diary, <script>alert('Look! A pop-up!')</script>

In a naive web application, the contents of this blog post would be concatenated directly into the HTML that makes up the page. So when another user loads this page, part of the HTML that will be loaded by the browser will include this script tag and the browser will dutifully execute this JavaScript. This means that anyone who can author blog posts can author JavaScript that will execute in the browser of anyone else who visits the page. The example we’ve seen is harmless. But with a little imagination, we can think of more malicious payloads. Recall that JavaScript has the full ability to interact with all browser UI widgets such as buttons, links, text boxes, and radio buttons. Batching a few of these interactions together means that JavaScript can be written to do anything that the logged-in user can do. This includes things like authoring new blog posts, changing the password of the logged-in user, deleting posts, adding comments to other posts—anything that a logged-in user can do.
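To make the naive concatenation concrete, here is a minimal sketch in plain JavaScript. The function name renderPostNaive and the wrapping div are assumptions made for illustration; they don't come from any particular framework.

```javascript
// Hypothetical naive renderer: the post body is concatenated into the page
// with no encoding, so any markup in the body becomes part of the page.
function renderPostNaive(body) {
  return '<div class="post">' + body + '</div>';
}

const friendlyPost =
  'Dear Diary, Today I read the most wonderful book, Practical Security.';
const hostilePost = "Dear Diary, <script>alert('Look! A pop-up!')</script>";

console.log(renderPostNaive(friendlyPost));
// The script tag survives intact, so a browser loading this markup would run it.
console.log(renderPostNaive(hostilePost));
```

Both posts go through exactly the same code path. Nothing in the renderer distinguishes text the author meant to display from markup the browser will interpret.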

Dynamic data can also be inserted into HTML element attributes like this:

 <img src="picture.jpg" alt="This alt text is supplied by the user." />

Here we have an img tag with alt text that’s supplied by the user. The alt text could be supplied in the query string of the page that loads the img, or it could be read out of the database. In a naive web application, an alt text of this:

blah" onload="alert('Hello from alt text!');

would turn into this:

 <img src="picture.jpg" alt="blah" onload="alert('Hello from alt text!');" />

Note that this payload contains the opening double quote for the onload attribute, not the closing one. It relies on the double quote that was intended to close the alt text attribute. This keeps the double quotes balanced and results in valid markup.

The most interesting thing about dynamic data in HTML attributes like alt is that it can lead to XSS without using < or > characters. This is another reason that the primary defense against XSS is HTML encoding, not stripping out suspicious characters.
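The attribute case can be sketched the same way. The function name renderImageNaive is a hypothetical stand-in for whatever template code builds the img tag in a naive application.

```javascript
// Hypothetical naive template: user-supplied alt text is dropped straight
// between the double quotes of the alt attribute.
function renderImageNaive(altText) {
  return '<img src="picture.jpg" alt="' + altText + '" />';
}

const altPayload = 'blah" onload="alert(\'Hello from alt text!\');';
const imgMarkup = renderImageNaive(altPayload);

// The payload's own double quote closes alt, and the template's closing
// quote ends up closing the injected onload attribute.
console.log(imgMarkup);

// The payload contains neither < nor >.
console.log(/[<>]/.test(altPayload));
```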

To illustrate how this vulnerability can be exploited, let’s look at what would happen with alt text like this:

blah" onload="document.getElementById('submitbutton').click();

If that were loaded naively into the alt text above, we’d have HTML like this:

 <img src="picture.jpg" alt="blah"
 onload="document.getElementById('submitbutton').click();" />

If this were placed into a page containing a button with the ID submitbutton, then this JavaScript would click that button when the image loads. From here, you can see how this approach could be extended to script arbitrary interactions with a web page.

For an interesting case study of what XSS can do, consider the case of the Samy worm.[34] Samy Kamkar, the author of the worm, introduced a little bit of JavaScript onto his home page on Myspace. When a logged-in victim visited Samy’s page, Samy’s JavaScript would execute in the victim’s browser. This JavaScript would programmatically click all the buttons that were required to add Samy as a friend and copy itself onto the victim’s home page. Then, when yet another victim visited the first victim’s home page, they too would add Samy as a friend and copy the JavaScript onto their home page. This worm quickly went viral and in less than a day more than one million friends had been added to Samy’s account.

The beauty of the XSS attack is that all the malicious code executes in the victim’s web browser. Every click and key press originates from the victim’s machine, so network logs and access logs all show traffic from the victim’s logged-in machine.

Now let’s consider how we can defend against this. A frequently suggested defense that doesn’t work is to strip out < and > characters. One problem with this defense is that sometimes people need to discuss dangerous inputs. Readers of this book, for example, may want to discuss XSS payloads on a web-based forum. Attempts to strip out < and > would stop these conversations. Also, as the alt text examples showed, not every XSS attack needs < or >.
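Both problems with stripping can be seen in a short sketch. The filter below is a hypothetical implementation of the "strip < and >" idea, written only to show where it falls down.

```javascript
// Hypothetical filter implementing the "strip < and >" idea.
function stripAngleBrackets(input) {
  return input.replace(/[<>]/g, '');
}

// Problem 1: a legitimate comment that discusses markup gets mangled.
const forumComment = 'Use the <b> tag for bold text.';
console.log(stripAngleBrackets(forumComment));

// Problem 2: the attribute payload has no angle brackets to strip,
// so it passes through the filter unchanged.
const attrPayload =
  'blah" onload="document.getElementById(\'submitbutton\').click();';
console.log(stripAngleBrackets(attrPayload) === attrPayload);
```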

HTML Encoding

Before we look at its application for defense, let’s take a look at how HTML encoding works. In the previous paragraph, we touched on an interesting problem in HTML. We use < and > to make HTML tags in our web pages. But what if HTML tags are what we want to talk about in the content of our web pages? At first glance, it would seem that we can’t do that because writing about tags would insert tags into our HTML documents and the tags themselves wouldn’t be displayed. Fortunately, HTML’s authors thought of this and provided a mechanism for allowing discussions of HTML itself in HTML.

Most of the time, the content of an HTML document will consist of literal characters, which get rendered into exactly the characters that make up the source. So HTML markup like this:

<div>abcdefg</div>

gets rendered like this:

abcdefg

Each character inside the div gets rendered just as it appears in the source.

But there is another kind of character in HTML called a character reference.[35] Character references are rendered differently than they appear in source. Character references play two roles in HTML. One role is that they allow you to create content in non-Western languages even if you’re using a Western keyboard. The second role is that they allow you to create content that displays key HTML characters like &, <, >, and " when rendered by a browser. This second role is exactly what we need to defend ourselves from HTML injection and XSS attacks.

HTML has two kinds of character references: named character references and numeric character references. All HTML character references start with an ampersand and end with a semicolon. Named character references have a mnemonic in the middle. Numeric character references have a Unicode code point in the middle, written in either decimal or hex. Named character references only exist for a set of the most commonly used characters. Numeric character references exist for every Unicode character.

Any character can be encoded this way. Let’s take a look at four examples. In this table, each row shows a rendered character in the leftmost column followed by three different ways of writing the character in the source of an HTML page.

Rendered Character   Named Character   Decimal Numeric Character   Hex Numeric Character
&                    &amp;             &#38;                       &#x26;
<                    &lt;              &#60;                       &#x3C;
>                    &gt;              &#62;                       &#x3E;
"                    &quot;            &#34;                       &#x22;
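The numeric forms in the table can be derived mechanically from each character’s Unicode code point. Here is a small sketch; the function name numericReferences is our own invention for illustration.

```javascript
// Build the decimal and hex numeric character references for one character.
function numericReferences(ch) {
  const codePoint = ch.codePointAt(0);
  return {
    decimal: '&#' + codePoint + ';',
    hex: '&#x' + codePoint.toString(16).toUpperCase() + ';',
  };
}

// Print the numeric references for the four characters in the table.
for (const ch of ['&', '<', '>', '"']) {
  const refs = numericReferences(ch);
  console.log(ch, refs.decimal, refs.hex);
}
```

Named references such as &amp; can’t be computed this way. They come from a fixed list defined by the HTML specification.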

HTML Encoding as Defense

Now that we see how HTML encoding works, we can see how we can use this as a defense against HTML injection and XSS. Whenever we’re building up HTML as part of our response to a web browser, if we ever concatenate in user-controlled data, we need to HTML-encode it first. That way, even if an attacker tries to sneak JavaScript into one of our responses, we’ll encode it first and the browser will just display JavaScript source code to the user instead of executing attacker-controlled JavaScript.

The preferred defense is to use the encoding libraries that come with your web framework. That is, most web frameworks have built-in libraries that will HTML-encode user-supplied data like this:

 <script>alert('Ha Ha!');</script>

into this:

 &lt;script&gt;alert('Ha Ha!');&lt;/script&gt;

or this:

 &#x3C;script&#x3E;alert(&#x27;Ha Ha!&#x27;);&#x3C;/script&#x3E;

The same defense applies to the alt text attack we saw earlier. As with the previous example, the solution is to use our web framework’s HTML encoding library. Proper encoding would result in markup like this:

 <img src="picture.jpg" alt="blah&#x22;
 onload=&#x22;document.getElementById(&#x27;submitbutton&#x27;).click();" />

The double quotes in the payload are replaced by &#x22;, so the onload is just part of the alt text instead of a new attribute. The HTML encoding prevents the attack.
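To show what such an encoding library does under the hood, here is a hand-rolled sketch. The names HTML_ESCAPES and escapeHtml are assumptions made for illustration. In real code, prefer the encoder that ships with your web framework over something like this.

```javascript
// Characters that matter in HTML text and in quoted attribute values,
// mapped to hex numeric character references.
const HTML_ESCAPES = {
  '&': '&#x26;',
  '<': '&#x3C;',
  '>': '&#x3E;',
  '"': '&#x22;',
  "'": '&#x27;',
};

// Illustrative encoder: a single pass, so already-replaced text is never
// encoded a second time.
function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Between tags: the script tag becomes inert text.
console.log(escapeHtml("<script>alert('Ha Ha!');</script>"));

// Inside an attribute: the payload's quotes can no longer close alt.
const altText =
  'blah" onload="document.getElementById(\'submitbutton\').click();';
const safeImg = '<img src="picture.jpg" alt="' + escapeHtml(altText) + '" />';
console.log(safeImg);
```

The only double quotes left in the final markup are the ones the template itself wrote, so the browser sees a single alt attribute and no onload.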

Handling Attacker-Controlled Data in Other Contexts

When an application is built on top of a JavaScript framework like AngularJS, XSS payloads may not look much like textbook XSS payloads. For more details on Angular-specific attacks, see the excellent article “XSS without HTML: Client-Side Template Injection with AngularJS” by Gareth Heyes.[36] XSS by way of AngularJS expression injection doesn’t need < or >, so traditional web framework escaping doesn’t help. In general, you shouldn’t need to allow dynamic content inside of a DOM element that’s decorated with the ng-app attribute. But if for some strange reason you do, be sure to encode the {{ and }} so that attackers can’t inject an AngularJS expression.

In summary, the way to prevent XSS is to restrict user-controlled data to as few kinds of places as possible in a web page. Keep user-controlled input out of DOM elements decorated with the ng-app attribute, which marks the start of an AngularJS application. And keep user-controlled data out of JavaScript. If you can do this and keep user-controlled data between HTML tags, then you can reliably prevent XSS by making sure to HTML-encode all user-controlled data.

If you really can’t get away without including dynamic data in other kinds of places in your markup (such as inside JavaScript), consult the OWASP XSS prevention cheat sheet.[37] There are a lot of surprising gotchas to allowing dynamic data throughout your markup.
