This chapter is about a language that is easy to learn on the surface, but takes years of intense study to really understand. We are talking about HTML (HyperText Markup Language), the markup language for structuring Web pages. As you will see in the examples in this chapter, mastering HTML from a security point of view—in terms of both attack and defense—is complicated and requires almost encyclopedic knowledge.
This chapter attempts to provide you with that knowledge. In addition to discussing the HTML family and its hidden gems for attackers and trapdoors for defenders, this chapter sheds some light on the differences between the different HTML standards and their actual implementations. So, if you like angle brackets, this chapter is for you. Let us dive in and look at the history and basic elements of HTML and markup languages to get a better understanding of how and where to obfuscate.
History and overview
The idea behind the creation of HTML was to find a platform-independent way to structure and output text and similar data for the Web. Strings can be tricky, and complex data types can generate problems regarding platform independence and interoperability, so there was a need for something in between.
The conceptual roots of HTML lead back to Charles Goldfarb, who co-created IBM GML, the Generalized Markup Language used in IBM's Document Composition Facility (DCF), which was later extended, renamed, and standardized in 1986 as SGML, the Standard Generalized Markup Language. The basic elements of this language approach, which were documented in the ISO 8879 standard, comprise six major columns. The following six sections describe these columns.
The document type definition
Document Type Definitions (DTDs) define a document's elements, along with their relationships and properties. We will look more closely at doctypes later in this chapter, and discuss what attackers can do to hide vectors and enable the creation of more vectors in an HTML document.
Table 2.1 provides an overview of the most common doctypes for HTML and HTML-like documents.
As you can see, there are several DTDs for different revisions and subsets of HTML and Extensible Hypertext Markup Language (XHTML). That is because the HTML family had to develop over the years to fit the requirements of the growing World Wide Web (WWW) and other areas of the Internet and document types. One of the major differences between the older HTML standards and the XHTML standards is a reduced limitation regarding the output medium: as we will discuss shortly, HTML is geared toward print output, whereas XHTML was designed to be more open and to deal with almost arbitrary output media.
Table 2.2 highlights the major HTML and XHTML variations we have used in the past and work with today.
Table 2.2 Major HTML and XHTML Standards
| Standard | Published | Description |
| --- | --- | --- |
| HTML | November 1992 | The first version; provided some basic text formatting |
| HTML+ | November 1993 | Never officially published, but added image support and more HTML extensions |
| HTML 2.0 | November 1995 | Provided support for forms and included most of HTML+ |
| HTML 3.2 | January 1997 | Supported tables, applets, and text flow around images |
| HTML 4.0 | December 1997 | Introduced stylesheets, frames, and scripts; represented major progress toward clean document structuring |
| HTML 4.01 | December 1999 | Introduced several corrections and extensions for HTML 4.0 |
| HTML 5 | April 2009 | The long-awaited successor of HTML 4.01 and XHTML 1.0; added new vocabulary, interfaces, and methodologies |
| XHTML 1.0 | January 2000 | More XML-oriented; a redesigned and “cleaner” version of HTML 4.01 |
| XHTML 1.1 | May 2001 | Separated the standard into several modules; the frameset and transitional subsets were removed |
| XHTML 2.0 | July 2006 | An attempt to introduce new structural elements and enhance XHTML 1.1, but was discontinued in favor of HTML 5 |
Table 2.2 clearly indicates the two branches of development that the revisions and subsets of HTML have taken. This led to a major implementation effort among user agent vendors—and introduced the numerous vectors and security problems we are still facing today, almost two decades after the first HTML implementations were announced.
The doctype declaration
The doctype declaration is usually the first element in the document, appearing before the actual root element of the markup document. The structure of an HTML or comparable document usually looks like this:
• Doctype declaration: <!DOCTYPE…>
• Opening root element: <HTML>
• Header area: <HEAD>…</HEAD>
• Body area: <BODY>…</BODY>
• Closing root element: </HTML>
The doctype declaration does nothing more than link the DTD with the element to allow the parser or the validator to determine how to deal with the document or to assess its validity. A typical doctype declaration looks like this:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 4.0//EN">
As you can see, the element starts with an exclamation point and the element name—here it is DOCTYPE. It continues with the root element—here it is HTML—and then tells us something about the visibility of the DTD; in our example, the DTD is public and is not an internal DTD. The last part of the doctype declaration is a unique identifier that the parser uses to either request and access the DTD or just create an internal reference to it. These are only a few of the elements a doctype declaration can contain and we will discuss this more fully in the section “XML.”
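Besides referencing a public DTD, a doctype declaration can also carry an internal DTD subset between square brackets. The following is a minimal sketch (the entity name and replacement text are purely illustrative) of what such an internal subset can look like:
<!DOCTYPE html [
<!ENTITY greeting "Hello, world">
]>
A parser encountering &greeting; in the document body would then substitute the replacement text—the same mechanism we will make use of later in the section “XML.”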
Tags
Tags are the major structural elements of a markup-based document. The available range of tags is specified via the DTD. HTML 4, for example, provides about 90 tags for authors to use to structure a document. Since the HTML languages are oriented toward structuring text for print and comparable output media, a lot of the tags have references in the world of books and paper-based publications. For example, there is a cloud of tags for headlines (<h1>, <h2>, etc.), paragraphs (<p>), line breaks (<br>), and other elements you might find in printed documents.
Realizing that the Internet was not geared toward paper output, the standardization for the markup language that would succeed HTML took a slightly different path. XHTML is more aimed at output device independence and does not have a strong focus on print. Whereas bold text in HTML is introduced by a <b> tag, in XHTML it is introduced by a <strong> tag. Similarly, the tag for italic text in HTML is <i>, whereas in XHTML it is <em> for emphasis. Although <b> (for bold) and <i> (for italic) clearly indicate how the information enclosed within the tags will look, <strong> and <em> can mean anything depending on the output medium: bold and italic, or loud and a bit louder, or something completely different.
There are two basic types of tags: enclosing tags and self-closing tags. The text within enclosing tags usually wraps around smaller or larger text snippets and the formatting specified by the tags is applied to all the text enclosed within the tags. Of course, the enclosed text can contain other tags, so, for example, a text snippet can be both bold and italic. Self-closing tags do not need to enclose anything; they stand for themselves. You would use a self-closing tag for images or special meta information used in an HTML page header. Self-closing tags usually utilize attributes to extend themselves with actual information. An example would be the image tag utilizing the source attribute to determine from where to request the image, as in
<img src="/folder/an_image_file.gif"/>. Enclosing tags can utilize attributes too—and there are more exotic ways to create self-closing tags. We will learn about this in the sections “Closing Tags” and “Style Attributes.”
A lot of very good tag and attribute references are available. One of the most comprehensive is the Aptana HTML reference (http://aptana.com/reference/html/api/HTML.index.html), which provides detailed information about almost every supported HTML and XHTML tag and attribute available.
The following code snippet shows some lines from the XHTML 1.0 DTD to illustrate how a DTD defines the tags that can be used in a document.
<!--========== Document Structure =============-->
<!-- the namespace URI designates the document profile -->
<!ELEMENT html (head, body)>
<!ATTLIST html
  id     ID    #IMPLIED
  xmlns  %URI; #FIXED 'http://www.w3.org/1999/xhtml'
>
As you can see in this block, the <html> tag is being introduced and specified. The DTD tells the parser that the <html> tag must have exactly two children, <head> followed by <body>, and can have two attributes, id and xmlns. If a validator found an HTML element using a class attribute, it would probably throw a warning and tell us about a mismatch between the DTD specifications and the actual document.
Entities
Entities are very important elements used in markup documents, as they represent the reference to an actual object in its specified form. The entity is not the object itself, but rather contains information about and points to it, thus representing it. Entities in markup languages usually begin with an ampersand and end with a semicolon. In between is either a name, a decimal number, or a hexadecimal number representation.
Let us look at an example. The HTML standard specifies a vast array of entities that can be used and probably will be understood and processed correctly by the parser or user agent. For example, if the author of an HTML document wants to use the € character to express the price of an item in euros, he can do so in two ways. First, he could just type the character on his keyboard, but only if his keyboard has a key for this character. Also, the euro character is not within the ASCII range, since this special symbol was created and standardized decades after the ASCII table was created in the early 1960s. That means that not every transport and output medium will be capable of displaying the character correctly. If this is the case, the original information might be lost, or some other character might be chosen for display by the system, therefore messing up the original information.
ASCII stands for American Standard Code for Information Interchange. The goal of this standard, which was first developed in the early 1960s, was to create a fixed set of characters for use by teleprinters. The characters in the ASCII table use a seven-bit encoding; thus, 128 characters are available.
There are two main groups of characters in the ASCII table: printable characters and nonprintable or control characters. A look at an old typewriter explains the purpose of both character classes. Whereas the printable characters are visible on the paper being typed on, the nonprintable “characters” are meant to interact with the typewriter itself. These include the carriage return, the newline, the bell, and characters such as the Backspace and Delete keys. Even though decades have passed since most people have used an old-style typewriter, these control characters still play a major role in modern Web technologies and can cause a lot of trouble from a security point of view.
RFC 20 contains good information on ASCII; visit http://tools.ietf.org/html/rfc20 to view the ASCII table as well as a list of the 33 control characters and the 95 printable characters, including the letters A through Z, and others.
This is where the entity comes in. User agents usually understand the representation &euro;. No characters outside the ASCII range are being used, so there is low to no risk that problems will occur while the document is being transported and parsed. The parser either knows the entity and displays the matching representation, or just shows the entity as is. Another possibility is to look at the matching character set being used for the document. Assuming the document is being encoded with the ISO/IEC 8859-15 character set, there are 256 characters to choose from, since eight bits are being used for the table index and the table contains language-specific characters for European texts. The € character is in this character table; in fact, it is located at the 164th decimal table index.
So, if we are not sure whether the parser or user agent actually understands and translates the named entity &euro;, we can use the numerical entity &#164; or the hexadecimal representation &#xA4;. Note that decimal entities are introduced by an ampersand (&) and a hash mark (#), whereas hexadecimal HTML entities are introduced by an ampersand, a hash mark, and an x. Another possibility is if the document is being encoded in the UTF-8 character set. This encoding uses up to 32 bits per character, and thus contains far more indexes—up to 2^21 (2,097,152) to be precise. We would usually work with the first 65,536 to save some time—this is the so-called Basic Multilingual Plane (BMP).
Not all of those BMP code points actually contain a usable character, though; only 54,364 of them are defined (we will discuss why later in this chapter). This table also contains an index pointing to the € character; this time it is the decimal index 8364. Thus, the entity would look like &#8364; in decimal form and &#x20AC; in hexadecimal form.
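A quick way to verify this is to render all the variants side by side; assuming the page itself is served with a suitable encoding such as UTF-8 or ISO/IEC 8859-15, the literal character can be added for comparison:

<p>&euro; &#8364; &#x20AC; €</p>

A user agent that understands all three entities will display the same € character four times.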
So, to summarize, there are a few types of entities that you can use in markup languages such as HTML. The first type is named entities, which are specified by the markup standard or DTD being used, or are provided by the parser or user agent. The second type is numerical entities, which use decimal or hexadecimal notation pointing to an index of the character table defined by the document's encoding. Another type, which will be discussed in the section “XML,” is external entities. We can define those in the doctype declaration part of the document or even in our own doctypes to represent arbitrary characters and character sequences in the document.
CDATA sections
Character Data (CDATA) sections in XML tell the parser that the content that follows is not structural markup, but regular text, until the CDATA section ends. Since the basic principle of markup languages is based on predefined character sequences doing predefined things, such as <h1> marking the beginning of a headline and </h1> marking the end of the headline, it is mandatory to have sections in which the given text data is not interpreted for any syntactical purpose.
This basically means that after introducing a CDATA section an author can add any kind of content—even tags and attributes without worrying about breaking the structure of the document—until the closing delimiter for the CDATA section is given and the structural part of the document continues. Let us look at a small example:
<![CDATA[
Here you can do almost anything you want
without breaking the document structure
]]>
In the preceding code, the CDATA section begins with the string <![CDATA[ and ends with ]]>. This kind of formatting is heavyweight, hard to remember, and, of course, easy to break: an attacker would just have to use ]]> to break out of the CDATA section and interfere with the document structure to invalidate or even manipulate it. CDATA sections were first used in the original SGML standard and live on in many of today's XML subsets; today it is pretty hard to find actual HTML pages that use this heavyweight delimiter. Although the HTML 4.0 specification clearly defines how user agents should deal with CDATA sections in HTML documents (www.w3.org/TR/html4/types.html), how they actually do so varies.
Testing with the current major browsers shows that almost each user agent reacts differently to CDATA sections.
Table 2.3 shows what happens when we play with the following markup:
<h1><![CDATA[<img src="x" onerror="alert(1)">]]></h1>
Table 2.3 User Agents and CDATA Behavior
| User Agent | Resultant Markup | Script Execution? |
| --- | --- | --- |
| Opera 10.10 | <h1><![CDATA[<img src="x" onerror="alert(1)">]]></h1> | No; the data inside the CDATA section are converted into entities |
| Firefox 3.5.7 | <h1><!--[CDATA[<img src="x" onerror="alert(1)">]]--></h1> | No; the CDATA section is considered to be an HTML comment |
| Chrome 5.0 | <h1><img src="x" onerror="alert(1)">]]></h1> | Yes; Chrome renders the embedded markup and seems to strip the opening CDATA section |
| Internet Explorer 6 | <h1>]]></h1> | No; only the closing part of the CDATA section is being shown, but it is formatted as <h1> |
| Internet Explorer 8 | Same as Internet Explorer 6 | Same as Internet Explorer 6 |
So, as you can see, CDATA sections and HTML are not a good match. Still, we have a good reason to discuss them: We have found a way to generate unpredictable results, and therefore we have a good first base on which to build our discussion of obfuscated markup and hard-to-read code. Since even user agents are not really sure how to deal with CDATA sections, we can assume that it is the same for filter libraries, whether they are homegrown and proprietary or open source and well known.
Modifying the markup a bit shows even more surprising results. By just adding one more character, we can easily convince all tested versions of Internet Explorer to completely ignore the CDATA section and render the markup, and thus execute the JavaScript. The modified string looks like this:
<img src="x" onerror="alert(1)">
We can confuse Opera (as well as Firefox 3.5.7 and all other relevant Gecko-based browsers) into thinking the CDATA section has ended by using ]> instead of just >. (Chromium would have executed the JavaScript with the first version of the string.) So, as you can see, none of the user agents actually follow the specified way when dealing with CDATA sections, even though they are considered one of the most ancient structural SGML and XML elements, having been around since the standard was first specified. This proves a point that is important for you to understand. Although a standard exists, there is no actual standard to rely on. The practical implementation of a lot of tools rarely follows the actual specification or specification drafts. There are countless derivations and quirks we can find when dealing with “simple markup.” The same is true for JavaScript, PHP, databases, and multiple other layers being used in modern Web applications.
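If you want to reproduce the observations from Table 2.3 yourself, a minimal test document is enough; the following sketch (the surrounding structure is illustrative) can be loaded in each browser and inspected via its view-source feature:

<html>
<body>
<h1><![CDATA[<img src="x" onerror="alert(1)">]]></h1>
</body>
</html>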
Comments
XML-based languages and the HTML family support comments to indicate that certain parts of the document should not be rendered and made visible to the reader. Comments begin with the character sequence <!-- and are supposed to end with the character sequence -->. Everything between these elements should be parsed, but not evaluated and displayed. So, text between comment elements is not visible to the reader unless he looks at the document's source code; scripts as well as stylesheets and other interactive elements inside comments are not interpreted by the user agent.
Some user agents, such as the Internet Explorer family, provide an extension to the usual comment scheme, called Conditional Comments, which allow the user to target a specific version of Internet Explorer and introduce a new conditional syntax. We will discuss this further in the section “Conditional Comments.”
You may not be surprised to learn that the user agents behave differently when dealing with comments, especially slightly invalid comments that are missing one or two of the necessary characters. Let us look at a practical example:
A<!--<img src="x" onerror="alert(1)">-->B
The preceding code displays the expected information in all tested user agents (those listed in Table 2.3). All we can see is an uppercase A followed by an uppercase B. But as soon as we start messing around with the string, the results start to get strange. Look at what happens if we add one more character to the mix:
A<!--><img src="x" onerror="alert(1)">-->B
Now most of the user agents consider the comment to be closed and render the image, thus executing the JavaScript inside the onerror attribute. This means a comment can also be closed with a single > and not only with the expected character sequence -->. This might be an interesting way to find a markup injection vulnerability on a tested or attacked Web site, since a lot of real-life filter solutions just encode or otherwise treat the < character but not the > character. One rather famous URL shortening service utilized this half-baked technique at the time of this writing. Only Chromium 5.0 managed to parse the half-closed comments correctly and did not execute the embedded payload. Using the “View Selection Source” feature available in Firefox demonstrates why most user agents stumble in this example. The problem is the attempt to auto-complete or auto-validate the parsed markup. Firefox, for example, realizes that a half-closed comment is present, and automatically closes it by adding the missing dashes. The rendered result thus looks like this:
A<!----><img src="x" onerror="alert(1)">-->B
Firefox 3.5.7 actually executes the JavaScript in the “View Selection Source” mode, although this represents something more akin to a weird bug than an actual security issue. But what happens if the string to close the comment is part of the content of an HTML attribute? The following example ensures that the comment is being closed and the payload will be parsed, rendered, and executed.
A<!--<img src="--><img src=x" onerror="alert(1)">-->B
This works on all tested user agents. The comment is being closed inside the source attribute of the image tag. A new image tag with the source x" is being created, and since this image source is probably not available, the event handler is being called and fires the JavaScript alert() method. So, as you can see, parsing HTML comments correctly is not very easy, and a lot of developers are not aware of the potential that comments and injections inside or around comments can have.
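As with the CDATA quirks, the three comment variants discussed above are easy to re-test; since an unclosed comment can swallow everything that follows it, each variant belongs in its own throwaway file:

A<!--<img src="x" onerror="alert(1)">-->B
A<!--><img src="x" onerror="alert(1)">-->B
A<!--<img src="--><img src=x" onerror="alert(1)">-->B

Only the first line should stay inert in every browser; the other two execute the JavaScript in most of the tested user agents, as described above.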
Markup today
Thus far, we have discussed the history of markup and the basic structural elements of XML and similar dialects. One conclusion that you might have reached is that user agents do not necessarily behave the same way as soon as they parse mildly invalid or unstructured markup. This is, of course, due to the fact that each browser vendor usually uses its own render engine, and that valid markup might be parsed in almost the same way, but since there are no real standards for handling erroneous markup, the methods might differ a lot. However, this is not entirely true.
At the time of this writing, four major rendering engines for markup exist and are being used in various user agents and browsers. They are often also referred to as the Layout engine, and they include:
• Trident
• Used in the Internet Explorer family. Currently available as Version 4.0 and used in Internet Explorer 8. Proprietary.
• Gecko
• Used by many Mozilla browsers such as Firefox, SeaMonkey, and Songbird. Currently available as Version 1.9.3. Open source.
• Presto
• Used by Opera-based browsers. Currently available as Version 2.6 and used by Opera 10.62. Proprietary.
• WebKit
• Used by Safari, Google Chrome, and other browsers. Open source.
Web developers today are confronted with an array of extremely error-tolerant user agents. Even if the markup has severe structural damage, such as a missing closing tag for one of the root elements or accidentally added whitespace inside tags and attributes, the user agent still tries to make the best of it and auto-fix the structure to enable correct rendering of the visible output. It does this for a specific reason.
Back in the days when Netscape was dominating the browser market with Netscape Navigator, users had to pay for this product, as only a few mature user agents were available for free at that time. The WWW gained popularity, though, and with the release of Microsoft Plus! for Windows 95 a Netscape Navigator competitor was freely available for all Windows users: Internet Explorer 1.0. Over the following months, Microsoft tried to reach a point of feature parity to be able to compete with Netscape and reduce its market share. That finally happened in 1996 with the release of Microsoft Internet Explorer 3.0, which was the first browser to support scripting, CSS, and similar technologies that were poised to change the face of the WWW. But the major breakthrough came with Internet Explorer 4.0, which came preinstalled on Windows 98, and the monstrous feature-loaded Internet Explorer 5.5. Microsoft attempted to create heavy interaction between Web sites and the actual operating system, providing the infamous ActiveX API. Internet Explorer 5.0 shipped with the equivalent of the XMLHttpRequest object, which was used for Outlook WebAccess and is now enjoying a renaissance in Web 2.0.
In reaction to Microsoft's attempt to dominate Netscape's market space, Netscape incorporated numerous new features into Netscape Navigator, and along with Microsoft ensured that Web site development was as easy as possible, even for inexperienced developers and complete beginners. This is one of the reasons today's parsers are highly tolerant of faulty markup and utilize complex algorithms to guess at what the developer might have meant, even if the code is broken and the markup structure is destroyed. Netscape enhanced the scripting support in Navigator and implemented a lot of technologies we still use today in current JavaScript implementations, while Microsoft tried to brew its own mix of scripting languages, implementing Visual Basic Script (VBScript) support and a slightly different version of client-side scripting called JScript.
This resulted in not only a struggle between the two competitors but also an array of buggy features leading to severe security problems for users, a lot of Web sites using code that was free of semantics and structure, and an interpretation of what markup should be and is capable of. It is rumored that while Internet Explorer 5.5 was in development, more than 1000 people were working on the project. Internet Explorer 5.5 is still considered to be a milestone in browser development, and it offers so many features that some of them are more or less undiscovered in the MSDN, waiting for their time to shine, most likely in a filter circumvention or exploit scenario. We will see many examples of these in the “Style Attributes” section.
In a way, Microsoft won the first browser war: AOL acquired Netscape in late 1998. Unfortunately for Microsoft, the U.S. Department of Justice filed an antitrust case against Microsoft in May 1998. The plaintiffs argued that Microsoft combining its operating system and its Web browser would create a monopoly affecting the OS and browser markets. Also, optimizing the operating system interfaces to better communicate with an integrated Web browser would remove any possibility for third-party browser vendors to provide a comparable array of features, or could, in the worst case, lead to an inability to build and sell a full-featured Web browser at all.
After releasing Internet Explorer 5.5, some sources state that Microsoft drastically reduced the size of the Internet Explorer development team. Some say that during and after the release of Internet Explorer 6, only a handful of developers maintained the code and more or less spent their time fixing bugs rather than adding new features. And there were plenty of bugs and serious security issues to fix, ranging from remote code execution flaws and cross-domain XHR problems to drive-by downloads and badly hardened APIs for communicating with user settings. Even cookies could be read cross-domain with some simple tricks—and at the time of this writing, this is still an issue. Additionally, Internet Explorer 6 ignored a lot of existing Web standards, and the lack of feature updates did not change that for many years, causing Web developers to put a lot of effort into either creating two versions of a Web application, or finding ways to make it work on all browsers using the aforementioned conditional comments, several branches of JavaScript, or an array of available browser hacks utilizing parser errors to address a specific model. It was not until October 2006 that Microsoft finally released a major new version of Internet Explorer, namely Internet Explorer 7. At that time, Internet Explorer was the default browser for all Microsoft Windows-based operating systems, and it occupied a huge share of the market. Internet Explorer has maintained such a strong foothold on the market that even at the time of this writing, IE6 must still be supported by a lot of Web sites and applications.
In the meantime, Netscape opened the source code of its old Netscape browser, which led to the creation of the Mozilla Foundation, which spawned the open source browser Firefox (initially called Phoenix and then Firebird). Some sources refer to that as the second browser war. Firefox 2.0 was released more than 18 months after IE7, but because IE7 was only deployed as a high-priority update for genuine Windows users, the market share for IE6 was still frighteningly high, and on many Web sites IE7 never managed to get a greater share than its older sibling.
The actual second renaissance of Web standards was the collaboration between the Mozilla Foundation and Opera Software in early 2004, resulting in the WHATWG, which provides a forum and platform for quick and effective standard specifications and proposal submissions to the W3C (www.whatwg.org/). Meanwhile, Microsoft started to put serious effort into following Web standards again during development of IE8 (although the company stated similar goals for IE7 some years before).
At the time of this writing, the major competitors in the browser market are Firefox 3.5, Opera 10, Chrome 4, and IE8. Making Web development a rather rocky road for both Web developers and Internet users is the fact that almost all user agents still exhibit a lot of interesting parser behavior, legacy features, and features that most Web developers, IDS and Web Application Firewall (WAF) vendors, and authors of filtering and markup sanitization libraries and products are not even aware of. We will cover all of this, as well as discuss some interesting artifacts that make HTML 5 usable in attack scenarios throughout this chapter.
Why markup obfuscation?
You may be wondering why we are devoting an entire chapter to the subject of markup obfuscation. The following example may help to explain the reason:
<x:div/style=`b\65h&#x5c;061vio&#114;:url(#default#time2)`/onbegin=&#x5c;u0061lert(1&#41//>
The preceding code is a vector executing the JavaScript code alert(1) by making use of the HTML+TIME API integrated in Internet Explorer since Version 5.5 (and currently available in IE8).
This snippet of not-really-valid-but-still-working markup executes the JavaScript without any user interaction. Furthermore, it uses almost every available possibility to obfuscate markup. Here is a short list of the techniques being used:
• Fake invalid namespaces
• Invalid but working attribute separators
• Decimal and hexadecimal entities inside HTML attributes
• CSS entities inside the style attribute
• Double encoded entities inside the style attribute
• Backticks as attribute value delimiters
• Invalid but working escapings
• JavaScript Unicode entities in the onbegin event handler
• Crippled decimal entities inside the onbegin event handler
• Invalid garbage before the ending tag
Bypassing Web application input filters
As you may have guessed by looking at the preceding code and the preceding list, one of the reasons it is important to learn about obfuscating markup concerns the ability to bypass Web application input filters. In a real-life exploit scenario, an attacker has a good chance of getting this vector past any blacklist-based filter mechanism. It is not even real HTML we are using here, but something close to HTML or XML. Classic filters look out for known dangerous tags; this is not even a real tag.
A lot of filter libraries out there claim they can filter markup effectively and are fast and secure at the same time. A vector such as this proves many of them wrong, maybe even the one you are using for your own applications.
Slowing down forensics
Another reason obfuscating markup is important is that code such as this makes forensic work extremely difficult. The example uses entities and encodings on several layers, as well as inside the attributes, and it uses the ability to double-encode depending on the exact attribute type and language running inside the attributes. Before the possible victim can even start any forensic work to determine what this vector's payload did, the victim must learn and understand all the basics of encoding and obfuscation. We are just working with a short alert(1) in this example, but imagine how the whole construct would look if we had a larger payload.
Fun
The third and final reason to learn about obfuscating markup is that it is just plain fun. Finding a new way to fool user agents into rendering invalid markup and maybe even executing JavaScript in impossible situations might be another component of making your own applications a bit more secure. Or it may be a way for you to identify an exploit against your customer's Web site. Or perhaps it is just a cool snippet of code you can brag about on Twitter.
By the time you finish reading this chapter, the vector example shown earlier should be almost as readable as plain text, and you should understand all the techniques used in the code in terms of what they do and how they work. Hopefully, this will help you to harden your filter software, sharpen your IDS skills, and help you when you audit your or your customers' Web sites and applications. In the next section, we will discuss the basic obfuscation techniques, starting with how valid markup is structured and how it is meant to work, and then showing how, step by step, we can leave the path of valid markup while still having it parsed by the user agents.
Basic markup obfuscation
This section demonstrates basic markup obfuscation (meaning taking what is already there and changing it). We discuss the structure of valid markup so that you will better understand where valid tags are located, and learn how to automate this task to attain results as quickly as possible. The only technical requirements are the targeted browser and an editor for testing the examples—or in the best case a running Web server with PHP to actually use the examples where characters are being generated in a loop.
The examples were created and tested on the Ubuntu 9.10 platform. Following is a list of software you require for the full experience:
• An up-to-date Flash player
In addition, here are some Web sites you might want to visit while working through this chapter:
You should also be able to work through the chapter's examples on a Microsoft Windows system, but we cannot guarantee that all the examples and scripts will run fine in all situations. Also, several of the listings shown in the following sections may crash your browser, so make sure that no important tabs or instances of the same browser are open while you play with the snippets.
Structure of valid markup
The structure with which valid markup is built is easy to explain. To illustrate the blueprint of a valid and working HTML tag, we can simply look at an example. Let us take something rather basic to start with, and use a simple link pointing to a harmless HTTP URL.
<a href="http://www.google.com/">Click me</a>
The < introduces the tag and is immediately followed by the tag name, a, which denotes an anchor tag. A space separates the tag name and the first attribute, and next comes the attribute name href followed by =" to introduce the attribute value. After this value, we have "> to close the first part of the tag. Next is the text Click me, followed by </ indicating that we want to close the tag, then the tag name a, and finally >.
Table 2.4 describes the components of this valid piece of markup and where we may be able to change it and still have it work.
Table 2.4 Various Points for Enumeration in Markup
| Position | Code | Possibilities |
| --- | --- | --- |
| Right after the opening < | <[here]a href="…"> | Trying control characters, white space, and other nonprintables |
| Right after the tag name | <a[here]href="…"> | Again, control and special characters |
| Inside the attribute name | <a hr[here]ef="…"> | Control characters and nullbytes; maybe whitespace |
| Before or after the equals sign | <a href[here]=[and/or here]"…"> | Additional equals signs or other arbitrary characters |
| Replacing the equals sign | <a href[here]"…"> | Unicode representations for the equals sign |
| Replacing the double quotes | <a href=[here]…[and/or here]> | Other types of quotes, no quotes, or whitespace |
| Between the last attribute and the closing > | <a href="…"[here]> | Probably arbitrary padding |
| Before the slash in the closing tag | <[here]/a> | Whitespace, more slashes or control characters, and other nonprintables |
| After the slash in the closing tag | </[here]a> | Maybe nullbytes or control characters |
| Between the closing tag name and the closing > | </a[here]> | Probably arbitrary garbage |
Playing with the markup
To achieve working results and not just assume that we can inject characters at the listed positions and start obfuscating the markup, it is best to use a small application written in PHP to help us generate a predefined range and number of characters at the desired position inside the markup. Let us look at an actual listing we can work with:
for($i = 0; $i <= 255; $i++) {
  $character = chr($i);
  # Right after the opening <
  echo '<div><'.$character.'a href="http://www.google.com/">'.$i.'</a></div>';
}
This small loop does nothing more than create 256 links encapsulated in a block element, the <div>, and echoes the HTML data. What is interesting about this loop is what the user agents do with it. Thus, we have to use our small lab to look at the generated data with each browser we want to test against. Also, we will want to echo the tested index enclosed by the link to know instantly which character worked and which did not.
Alternatively, you might want to create bigger loops, maybe even ranging over the entire UTF-8 table and creating 65,536 links to test possibilities with Unicode. Needless to say, this would take a bit of time and might crash your browser, but there is something else to keep in mind. PHP works with ISO-8859-1 as its default character encoding. This character set knows 256 characters, so using a loop with table indexes up to 65,535 might produce garbage. Thus, we have to change our loop slightly to provide valuable results, and tell PHP exactly what character set to use. Then we need to manually set the user agent to UTF-8 or whatever character set we chose.
for($i = 0; $i <= 65535; $i++) {
  $character = html_entity_decode('&#'.$i.';', ENT_QUOTES, 'UTF-8');
  # Right after the opening <
  echo '<div><'.$character.'a href="http://www.google.com/">'.$i.'</a></div>';
}
By running the loop and having a look at the output afterward, we can see that the majority of the output is rather uninteresting. Most browsers start to behave somewhat strangely when they reach index 33, pointing to the exclamation point. The user agents just receive the combination of < and ! and automatically assume it is a comment. The comment then automatically closes and the user agents omit the closing </a> tag; weird, but hard to use in an actual exploit scenario. The rendered result Firefox presents looks like this:
<div><!--a href="http://www.google.com/"-->33</div>
Similar things happen when reaching index 47, or the slash. Again, the user agents apply a lot of auto-magic to the received markup and change it internally. It is good to keep in mind that ! and / force the browser to improvise, but as mentioned, in the field this is rarely exploitable—or is it? Here, we were mainly talking about Opera, Firefox, and Chromium. What about IE 6 and IE 8? Well, they give us the perfect reason to move on to the section “Obfuscating tag names,” because the output from our first loop is a bit disturbing.
Obfuscating tag names
If you look at the output of the aforementioned loops, you can see that for IE 6 and IE 8 something is completely different. The first fragment of HTML actually works, and a link is being displayed with the enclosed text 0. That means Internet Explorer and older versions of other browsers seamlessly swallow the nullbyte (which is the first character in the ASCII table and is sometimes called the null character).
Let us look at this character in more detail. In the old days of punch-card computing, the word nullbyte referred to the absence of a hole in the card. Later, when languages such as C became popular, nullbyte was used to indicate termination of a string; so, when a nullbyte appeared in a string, parsers assumed that signified the end of the string, and either continued with the next line of the string or stopped the parsing process. That does not happen in our code; otherwise, we would not see the output in its entirety, or at least the very first line. Internet Explorer does something else. Since the developers of the Trident layout engine were probably aware of all the security problems that improper handling of nullbytes can cause, the engine just strips them out seamlessly.
Of course, this is not a great thing to do, because it leads to the problem of distributing the attack over multiple layers. Imagine a server-side HTML filter following the standards and detecting HTML fragments in strings based on the assumption that incoming markup must consist of a < and at least one or more printable non-numeric characters, such as any character a through Z, or even a printable character from the non-ASCII range, such as µ. Most user agents do not accept non-ASCII characters as the first character after the <, but they do accept them thereafter. So, code such as the following works perfectly on Firefox 3.5.7 and Chromium 5.0:
<Lµ onclick=alert(1)>click me</Lµ>
Extending the code with fake namespaces makes it work on Internet Explorer too; only Opera keeps refusing to execute the JavaScript onclick event.
<L:µ onclick=alert(1)>click me</L:µ>
But back to the nullbyte issue. If a filter is assuming that incoming markup must at least match the pattern <\w+, or in more thorough cases <[?!]*\w+, to also catch comments and processing instructions, the filter would fail terribly. The decision to strip characters in the client is bad, since invalid markup is invalid markup. Even if we are talking about nullbytes, there should be no client-side post-validation before the actual data is being rendered. Therefore, this is a serious problem, but it is not known to all vendors of filter solutions. PHP, for example, provides the function strip_tags() (http://php.net/manual/en/function.strip-tags.php) to clean strings from surrounding and embedded markup. This method is aware of the nullbyte issue and acts accordingly. But many other libraries and filter solutions do not behave this way. Let us look at some PHP code to help us test this issue via chr() (http://php.net/manual/en/function.chr.php):
echo '<im'.chr(0).'g sr'.chr(0).'c=x onerror=ale'.chr(0).'rt(1)>';
As we can see, there is a nullbyte right in the middle of the tag name, inside the attribute name, and in the middle of the JavaScript alert(), so we can assume that nullbytes are stripped globally, independent of the layer the user agent is processing. Now let us move a step ahead and look at the source code of the generated Web site on IE 8. The result is frightening: we can only see <im; everything after the nullbyte is hidden. Creating a slight variation such as that shown in the following code can ensure that the entire vector, including the tag and payload, is invisible on Internet Explorer:
echo chr(0).'<im'.chr(0).'g sr'.chr(0).'c=x onerror=ale'.chr(0).'rt(1)>';
You may be wondering if there are other ways to inject strange characters inside the tag name and still have the user agent execute the entire string.
In fact, there are two additional ways in which we can obfuscate the tag name. The first method involves attacking the application using a character set which has design issues in combination with a specific user agent. The second method involves attacking a PHP-based application making use of the function utf8_decode() before any filtering takes place. Since the second method is PHP-specific, we focus on the first method involving the broken character set and user agent combination. (Note, however, that you can use the PHP-based method with invalid UTF-8 character combinations, and that you can easily scan the Internet to find vulnerable applications and Web sites.)
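Although we will not cover the second method in depth here, a minimal sketch illustrates its core problem; the input string and filter placement are hypothetical, but the decoding behavior is real, as utf8_decode() also maps invalid, overlong two-byte sequences onto ASCII characters:

# "\xC0\xBC" and "\xC0\xBE" are overlong encodings of "<" and ">",
# so a markup filter running before the decode sees no angle brackets.
$input = "\xC0\xBCscript\xC0\xBE";
# ...filtering happens here and finds nothing suspicious...
echo utf8_decode($input); # outputs <script>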
Let us start with a small example to illustrate what this is all about:
header('Content-Type: text/html;charset=Shift_JIS');
for($i = 1; $i <= 255; $i++) {
  $character = html_entity_decode('&#'.$i.';', ENT_QUOTES, 'UTF-8');
  $character = utf8_decode($character);
  echo $character.'123456 '.$i."<br>\n";
}
The code we are using is quite easy to explain. We create a loop generating 255 characters starting with ASCII table index 1. This time we omit the nullbyte because we might want to look at the page source, and we know what the nullbyte does with several user agents; Internet Explorer is not the only browser that ignores data following a nullbyte.
We echo the actual character after making sure we set the charset header correctly, and convert the character from UTF-8 to the necessary character set. In the first example, we use Shift_JIS, a Japanese character set. The code might look a bit overly complicated, but it proved to be the most stable way to generate the test scenario we need here. The generated character is being echoed directly before the number sequence 123456, for easier readability later on. After that, we echo the character table index to determine what character might be causing trouble. Let us run the script on Firefox 3.5.7, Chromium 5.0, IE 8, and Opera 10.100 and look at the output.
Starting with the character at table position 129 and ending with the character at table position 159, we can see that the “1” in the number sequence 123456 is missing. This happens again from table position 224 through table position 252.
It seems that the user agents are unable to deal with this character set correctly, and they assume that the characters at those positions are actually part of a multibyte character, with the “1” being the second part of the character. Thus, the character and the “1” form a new character, and the “1” gets swallowed.
Of all of the tested user agents, only Chrome was able to get around the broken charset issue we are discussing in this section. No characters were “swallowed” on this browser, so Google apparently patched the charset internally. Opera produced the worst results and introduced several more broken characters. Keep in mind that this kind of low-level vulnerability might render Web sites prone to XSS attacks even if the developers used proper encoding and filtering.
Either the character set Shift_JIS is buggy or the user agents do not handle it correctly. Other character sets, among them EUC-JP and BIG5, show similar results.
Table 2.5 shows which user agents have problems with which character ranges in which character sets.
Table 2.5 Affected Characters (Decimal ASCII Table Index)
| User Agent | EUC-JP | Shift_JIS | BIG5 |
| --- | --- | --- | --- |
| Chrome 4.0 | None | None | None |
| IE 6 | 129-141, 143-159, 161-254 | 129-159, 224-252 | 129-254 |
| IE 8 | None | 129-159, 224-252 | 129-254 |
| Firefox 3.5.7 | 143 | 129-159, 224-252 | None |
| Opera 10.100 | 142-143, 161-254 | 129-159, 224-252 | 161-254 |
This issue enables an attacker to swallow characters that might, in some situations, be mandatory to secure an application against XSS attacks or even SQL injection. For instance, the following scenario can inject characters into a closed and quoted attribute:
<a title="My Homepage" href="http://[user input]">My Homepage</a>
The Web site developers were smart and made sure that all incoming quotes and < and > characters were encoded to entities to ensure that they would not cause any damage. All an attacker has to do now is to make sure the swallowing character lands at the end of the user input, thus swallowing the closing double quote for the attribute, and therefore enabling him to introduce event handlers such as onclick or style attributes to get some JavaScript executed. If you are saying to yourself, “But that won't work, we still have the opening double quote and we need a closing double quote to make the attack happen,” you'd be right: Opera, Internet Explorer, and Chrome do handle this correctly. So, this is not a real vector, and is nothing to worry about.
Or is it? Due to a reported Firefox parser bug, the following code actually executes an alert() on all relevant Firefox versions:
<img src="foobar onerror=alert(1)//
In the preceding code, we have an opening double quote, but no closing double quote. What is important is that we do not have any more double quotes in the entire Web site. Therefore, an injection in the footer area of a Web site will likely succeed, maybe with some help from a nullbyte. Still, the problem is that if there is no closing double quote after the last opening double quote, no closing double quote is necessary, and Firefox just ignores the markup error. To get back to our character set issue and the swallowed characters: if the attacker is lucky, it might be enough to swallow a closing quote to perform an XSS attack against a well-protected Web site. The only conditions are to either stop the content from being displayed after the injection, or have no more quotes from the point of injection until the response body ends. When you think about footer links and other common injection points, this is not unlikely. The complete injection would look like this:
<a title="My Homepage" href="http://foobarŃ onclick=alert()>My Homepage</a>
Obfuscating separators
Thus far, we have seen what we can do regarding markup obfuscation with the tag name. But what about the whitespace right after the tag name? A lot of filters and parsers that detect and treat incoming markup rely on the assumption that browsers only render a tag if the tag name is directly followed by a whitespace, or a closing >. So, officially, such a tag has to look like this, <tag attribute="">, or this, <tag>. But that is not always going to be the case, and again, it strongly depends on the user agent what we can do here.
One of the older tricks that has been published by many sources is to just use the slash instead of the whitespace, or any form of ASCII whitespace such as new lines, carriage returns, horizontal tabs, vertical tabs, and even form feeds. Let us just ask our little loop what can be done here:
for($i = 0; $i <= 255; $i++) {
  $character = chr($i);
  # Right after the tag name
  echo '<div><a'.$character.'href="http://www.google.com/">'.$i.'</a></div>';
}
The result of this test is not very spectacular, as Table 2.6 shows.
Table 2.6 Characters to Separate Tag Name and Attribute
| User Agent | Characters (Decimal Table Index) |
| --- | --- |
| IE 6 | 9, 10, 11, 12, 13, 32, 47 |
| IE 8 | 9, 10, 11, 12, 13, 32, 47 |
| Opera 10.100 | 9, 10, 12, 13, 32 |
| Chromium 5.0 | 9, 10, 11, 12, 13, 32 |
| Firefox 3.5.7 | 9, 10, 13, 32, 47 |
It seems that the user agents are a bit stuck up here and do not allow too many variations. Opera and Chromium in particular do not accept the slash directly behind the tag name. This is especially tedious in cases where the filter of a targeted Web site denies usage of the available forms of spaces. Also, the character class \s in Perl Compatible Regular Expressions (PCRE) detects all of the mentioned ASCII spaces. So, it seems that the user agent vendors have done a pretty good job in terms of restricting the layout engines from accepting irritating characters between the tag name and the first attribute name.
Even if we exceed the range from ASCII to the full UTF-8 range, nothing exciting happens. But it gets interesting if we add a space to the mix, like this:
for($i = 0; $i <= 65535; $i++) {
  $character = html_entity_decode('&#'.$i.';', ENT_QUOTES, 'UTF-8');
  # Right after the tag name
  echo '<div><iframe'.$character.$character.' onload="document.getElementById(\'test\')'
     . '.innerHTML+=\''.$i.', \'"></iframe></div>';
}
Running the preceding code proves that Chromium and Opera allow slashes after the tag name. Additionally, nullbytes appear in the mix again, for Chromium and Internet Explorer (that they appear in Internet Explorer is not surprising, though). So, we can form vectors that look like this (in the following code,