Chapter 6. Crimeware in the Browser

Dan Boneh, Mona Gandhi, Collin Jackson, Markus Jakobsson, John Mitchell, Zulfikar Ramzan, Jacob Ratkiewicz, and Sid Stamm

This chapter considers crimeware that executes solely in the user’s web browser, but that does not exploit a vulnerability on that browser. The chapter starts with a discussion of transaction generators, which can be used to modify online transactions (performed using a web browser) in real time. Next, the concept of a drive-by pharming attack is studied—this attack shows how home router DNS settings can be changed when the victim simply views an external web page. Finally, this chapter considers badvertisements, which demonstrate how JavaScript can be used to simulate fraudulent clicks on online advertisements.

6.1 Transaction Generators: Rootkits for the Web*

Current phishing attacks steal user credentials, either by directing users to a spoofed web page that fools them into revealing a password, or by installing keylogging malware that records user passwords and sends them to the phisher. In response, web sites are deploying a variety of back-end analytic tools [74, 301, 325] that use past user behavior to determine transaction risk, such as the time of day when the user is typically active and the user’s IP address and location. Some sites are moving to stronger authentication using one-time password tokens such as RSA SecurID [359]. These methods, as well as many other anti-phishing proposals [86, 163, 206, 300, 357, 486], focus primarily on reducing the value that phishers derive from stolen passwords.

Fortunately for thieves, and unfortunately for the rest of us, a new form of attack using a transaction generator (TG) allows criminals to manipulate user accounts directly without stealing user credentials or subverting authentication mechanisms. TG attacks generate fraudulent transactions from the user’s computer, through malicious browser extensions, after the user has authenticated to the site. A TG quietly sits on the user’s machine and waits for the user to log in to a banking or retail site. Once the authentication completes, web sites typically issue a session cookie used to authenticate subsequent messages from the browser. These session cookies reside in the browser and are fully accessible to malware. A TG can thus wait for the user to securely log in to the site and then use the session cookie to issue transactions on behalf of the user, transferring funds out of the user’s account or purchasing goods and mailing them off as “gifts.” To the web site, a transaction issued by a TG looks identical to a legitimate transaction issued by the user—it originates from the user’s normal IP address at the usual time of day—making it hard for analytic tools to detect.

Because TGs typically live inside the user’s browser as a browser extension, SSL provides no defense against a TG. Moreover, a clever TG can hide its transactions using the stealth techniques discussed in the next section. To date, we have seen only a few reports of TGs in the wild [374], but we anticipate seeing many more as adoption of stronger authentication becomes widespread.

In Section 6.1.3, we explore a number of mitigation techniques, including transaction confirmation. A transaction confirmation system consists of isolated client-side software and a trusted path to the user that enables web sites to request confirmation for transactions that the site deems risky.

Cross-Site Request Forgery. At a first glance, a TG may appear to be related to cross-site request forgery (CSRF) [72]. A CSRF vulnerability is caused by an incorrect implementation of user authentication at the web site. To prevent CSRF attacks, the web site need only implement a small change to its user authentication system; this modification is transparent to the user. In contrast, a TG running inside a client browser is much harder to block, and defenses against it require changes to the user experience at the site.

6.1.1 Building a Transaction Generator

TGs can lead to many types of illegal activity:

Pump-and-dump stock schemes [324]. The TG buys prespecified stock on a prespecified date to artificially increase the value of penny stock.

Purchasing goods. The TG purchases goods and has them shipped to a forwarding address acquired earlier by the phisher.

Election system fraud. For voting-at-home systems, such as those used for collecting shareholder votes, a TG can be used to alter votes in one way or another.

Financial theft. A TG can use a bill-pay service to transfer funds out of a victim account.

Example. Building a TG is trivial, as shown in the following hypothetical example. This Firefox extension waits for the user to land on the www.retailer.com/loggedin page, which is reached once the user has properly logged in at the retailer. The TG then issues a purchase request to www.retailer.com/buy and orders 10 blenders to be sent to some address in Kansas. Presumably the phisher hired the person in Kansas to ship the blenders to an offshore address. The person in Kansas (the “mule”) may have no idea that he or she is involved in illegal activity.

<?xml version="1.0"?>
<overlay xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">
<script>
document.getElementById("appcontent")
        .addEventListener("load", function() {
  var curLocation  =
    getBrowser().selectedBrowser.contentDocument.location;
  if(curLocation.href.indexOf("www.retailer.com/loggedin") > 0)
  {
    var xhr = new XMLHttpRequest();
    xhr.open("POST", "https://www.retailer.com/buy");
    xhr.send("item=blender&amp;quantity=10&amp;address=Kansas");
  }
}, true);
</script></overlay>

6.1.2 Stealthy Transaction Generators

Transactions generated by a TG will show up on any transaction report page (e.g., an “items purchased” page) at the web site. A clever TG in the user’s browser can intercept report pages and erase its own transactions from the report. As a result, the user cannot tell that fraud occurred just by looking at pages at the site. For example, the following single JavaScript line removes all table rows on a transaction history page that refer to a blender:

document.body.innerHTML =
    document.body.innerHTML.replace(
        /<tr>.*?blender.*?<\/tr>/gi, "");

We have tested this code on several major retailer web sites.

Moreover, suppose a user pays her credit card bills online. The TG can wait for the user to log in to her credit card provider site and then erase the fraudulent transactions from the provider’s report page, using the same line of JavaScript shown previously. The sum total amount remains unchanged, but the fraudulent transaction does not appear in the list of transactions. Because most consumers do not bother to check the arithmetic on report pages from their bank, the consumer will pay her credit card bill in full and remain unaware that the bill includes a stealthy fraudulent transaction. This behavior is analogous to how rootkits hide themselves by hiding their footprints on the infected system.

The net result of stealth techniques is that the consumer will never know that her machine issued a nonconfirmed transaction and will never know that she paid for the transaction.

6.1.3 Countermeasures

Mitigation Techniques

We discuss three potential mitigation techniques against the stealthy TGs discussed in the previous section. The first two are easy to deploy, but can be defeated. The third approach is the one we advocate.

  1. CAPTCHA. A CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) on the retailer’s checkout page will make it harder for a TG to issue transactions automatically. Retailers, however, balk at this idea because the CAPTCHA complicates the checkout procedure and can reduce conversion rates. There are also security concerns because phishers can hire real people to solve CAPTCHAs. After all, if one can obtain a $50 blender for free, it is worth paying $0.10 to have someone solve the CAPTCHA challenge manually. Alternatively, the malware may try to fool the authenticated user into solving the CAPTCHA for a malicious transaction, while the user thinks he or she is solving the CAPTCHA for some other purpose. Overall, we believe CAPTCHAs cannot defeat a clever TG.

  2. Randomized Transaction Pages. As mentioned earlier, a stealthy TG can remove its transactions from an online credit card bill, thus hiding its tracks. Credit card providers can make this a little more difficult by presenting the bill as an image or by randomizing the structure of the bill. As a result, it is more difficult for a TG to make surgical changes to the bill.

  3. Transaction Confirmation: A Robust Defense. An online merchant can protect itself from TGs by using a confirmation system that enables users to confirm every transaction. The confirmation system should be unobtrusive and easy to use.

Here we propose a simple web-based confirmation system that can be deployed with minimal changes to the web site. The system combines confirmation with the checkout process. On the client side, the system consists of two components:

• A confirmation agent that is isolated from malware infecting the browser. In our prototype implementation (called SpyBlock), the browser runs in a virtual machine (VM) while the agent runs outside the VM. Alternatively, the confirmation agent might live on a separate hardware device such as a USB token or a Bluetooth cell phone.

• A browser extension that functions as an untrusted relay between the confirmation agent and the remote web site.

We briefly describe the confirmation process here. The confirmation agent and remote web site share an ephemeral secret key generated by an identity system such as CardSpace during user login. During checkout, the remote web site can request transaction confirmation by embedding the following simple JavaScript on the checkout page:

if (window.spyblock) {
  spyblock.confirm(document.form1.transaction, {
    observe: function(subject, topic, data) {
      document.form1.transactionMAC.value = data;
    }
  });
}

This script interacts with the untrusted browser extension that relays the transaction details to the confirmation agent. The confirmation agent displays the details to the user and asks the user to confirm the transaction. If the user confirms, the agent sends back a MAC of the transaction details to the browser extension, which then forwards the MAC to the remote web site. The web site verifies that the MAC is valid: if it is valid, the web site fulfills the transaction.
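The MAC can be computed as, for example, an HMAC over a canonical encoding of the transaction details, keyed with the ephemeral secret shared at login. The following is a minimal sketch of that idea; the field names, helper functions, and key-handling details are hypothetical and are not taken from the SpyBlock prototype.

// Hypothetical sketch of transaction-MAC computation and verification.
// sessionKey stands in for the ephemeral secret that the confirmation agent
// and the web site share after login; the browser and its extensions never see it.
const crypto = require("crypto");
const sessionKey = crypto.randomBytes(32);

function transactionMAC(key, tx) {
  // Canonicalize the transaction details so agent and site MAC the same bytes.
  const details = [tx.item, tx.quantity, tx.amount, tx.shipTo].join("|");
  return crypto.createHmac("sha256", key).update(details).digest("hex");
}

// Agent side: the user confirmed the dialog, so compute the MAC and hand it
// back to the (untrusted) browser extension for forwarding.
const mac = transactionMAC(sessionKey, { item: "blender", quantity: 1,
                                         amount: "49.99", shipTo: "home" });

// Site side: recompute the MAC over the submitted transaction and compare.
function verify(key, tx, receivedMac) {
  const expected = transactionMAC(key, tx);
  if (receivedMac.length !== expected.length) return false;
  return crypto.timingSafeEqual(Buffer.from(expected), Buffer.from(receivedMac));
}

Because the TG in the browser never learns the session key, it cannot produce a valid MAC for a transaction the user did not confirm.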

Security relies on two properties. First, the agent’s secret key must be isolated from malware. Second, the confirmation dialog must not be obscured by a malware pop-up to ensure that the user confirms the correct transaction details. Similarly, malware must be prevented from injecting mouse clicks into the agent’s dialog. Note that simply spoofing the confirmation dialog is of no use to the TG, because it cannot generate the necessary MAC itself.

A Nonsolution

Clearly, a potential solution to the TG problem is to prevent malware from getting into the browser in the first place. However, the widespread penetration of end-user machines by spyware and bot networks [263] underscores the vulnerability of many of today’s machines to malware attacks. We do not expect this situation to change any time soon.

Conclusion

Transaction generators are a source of concern for enterprises engaged in online commerce [88]. As stronger authentication systems are deployed, we expect transaction generators to pose an increasing threat. This emerging form of malware hijacks legitimate sessions and generates fraudulent transactions using legitimate credentials, instead of stealing authentication credentials. By operating within the browser, TGs can potentially hide their effects by altering the user’s view of information provided by any site. Consequently, it is necessary to extend identity systems, such as CardSpace, to include a transaction confirmation component.

6.2 Drive-By Pharming*

This section describes an attack concept termed drive-by pharming, where an attacker sets up a web page that, when simply viewed by the victim, attempts to change the settings on the victim’s home broadband router. In particular, the attacker can change the router’s DNS (Domain Name System) server settings. Future DNS queries are then resolved by a DNS server of the attacker’s choice. The attacker can direct the victim’s Internet traffic and point the victim to the attacker’s own web sites regardless of which domain the victim thinks he or she is actually visiting.

For example, the attacker can set up web sites that mimic the victim’s bank. More generally, the attacker can set up a man-in-the-middle proxy that will record all traffic packets generated by the victim, including those that contain sensitive credentials such as usernames, passwords, credit card numbers, bank account numbers, and the like. To the victim, everything appears normal because the address bar on the browser displays the correct URL.

This type of attack could lead to the user’s credentials being compromised and identity stolen. The same attack methodology could, in principle, be used to make other changes to the router, such as replacing its firmware. Routers could then host malicious web pages or engage in click fraud. Fortunately, drive-by pharming works only when the management password on the router is easily guessed. The remainder of this section describes this kind of attack, its implications, and overall defense strategies.

6.2.1 The Drive-By Pharming Attack Flow

The flow in a drive-by pharming attack is as follows:

  1. The attacker sets up a web page that contains malicious JavaScript code.

  2. The attacker uses some ruse to get the victim to visit the web page. For example, the web page could offer news stories, salacious gossip, or illicit content. Alternatively, the attacker could purchase inexpensive banner advertisements for selective keywords.

  3. The victim views the malicious page.

  4. The page, now running in the context of the user’s browser, uses a cross-site request forgery (CSRF) to attempt to log in to the victim’s home broadband router.

  5. If the login is successful, the attacker changes the victim’s DNS settings. This approach works in part because home broadband routers offer a web-based interface for configuration management and, therefore, can be configured through an HTTP request, which can be generated through appropriate JavaScript code.

  6. Once the victim’s DNS settings are altered, future DNS requests are resolved by the attacker. The next time the victim attempts to go to a legitimate site, such as one corresponding to his or her bank or credit card company, the attacker can instead trick the user into going to an illegitimate site that the attacker set up. The victim will have difficulty knowing that he or she is at an illegitimate site because the address bar on the web browser will show the URL of the legitimate site (not to mention that the content on the illegitimate site can be a direct replica of the content on the legitimate site).

This flow is depicted graphically in Figure 6.1. More details on drive-by pharming can be obtained from the original technical report on the subject [389].

Figure 6.1. The flow of a drive-by pharming attack. (1) The attacker creates a web page and then (2) lures a victim into viewing it. When the web page is (3) viewed, the attacker’s page attempts a (4) cross-site request forgery on the victim’s home broadband router and (5) attempts to change its settings. If successful, the victim’s future DNS requests will (6) be resolved by the attacker.

image

6.2.2 Prior Related Work

At Black Hat 2006, Jeremiah Grossman presented mechanisms for using JavaScript to scan and fingerprint hosts on an internal network [157]. He also suggested that an attacker could mount a CSRF against these devices.

A few months earlier, Tsow [425] described how routers could be used to perpetrate pharming attacks [4]. Subsequently, Tsow et al. [426] described how wardriving could be combined with malicious firmware upgrades, giving rise to the recent term warkitting. These last two works assume some physical proximity to the router being manipulated. For the case of wireless routers, the attacker has to be located within the router’s signal range. For the case of wired routers, the attacker must have physical access to the device.

The drive-by pharming attack described in this section observes that JavaScript host scanning and CSRFs can be used to eliminate the need for physical proximity in these router attacks. It is sufficient that the victim navigates to a corrupted site for the router to be at risk. It is a mistake to believe that only “reckless” web surfers are at risk for the attack described here; consider, for example, an attacker who places banner advertisements (for any type of product or service) and probes everyone that is referred to the “advertisement site.” The underlying details of the attack from a purely technical perspective are, in our opinion, not nearly as interesting as the potential implications of drive-by pharming.

6.2.3 Attack Details

This subsection describes the drive-by pharming attack in more detail. First, we begin from the bottom up by describing how a CSRF can be used to log in to and change DNS settings on a specific home broadband router, assuming that the attacker knows which router the victim uses, and assuming the attacker knows where on the network the router is located. Second, we explain techniques that allow an attacker to determine which router the victim’s network is actually using. Third, we describe how JavaScript host scanning can be used to identify where the router is located on the internal network, assuming the attacker knows the internal IP address space of the network. Fourth, we describe how the internal IP address of the network can be determined using a Java applet.

Most home broadband routers are located at a default IP address, so the first two steps will likely be sufficient for carrying out the attack. Because these steps require only some basic JavaScript, the entire attack can be carried out using JavaScript. To maximize the likelihood that the attacker succeeds, the attack can be augmented using a Java applet, although this step—while helpful—may not be necessary.

Configuring Home Routers Through Cross-Site Request Forgeries

Most home routers use a web-based interface for configuration management. When the user wants to change the router’s settings, he or she simply types in the router’s internal IP address into the web browser—for example, http://192.168.1.1. The user is then typically prompted for the username and password for the router. (Note that this username and password are separate from the credentials associated with the user’s Internet service provider; for example, if the user were using DSL, these credentials would be separate from his or her PPPoE credentials.) At this point, the user is typically presented with a form. Changes can be entered into this form, and the changes are passed in a query string to a CGI script, which in turn modifies the settings and updates the router.

For a specific router, JavaScript code of the following form can automate the process:

<SCRIPT SRC = "http://username:password@192.168.1.1/foo.cgi?dns=1.2.3.4">
</SCRIPT>

Taken verbatim, this script will not actually work on any of the well-known routers—but modified appropriately it will. To be successful, this script assumes that the attacker has five pieces of information:

• The username and password for the router

• The IP address of the router

• The actual name of the CGI script the router uses

• The actual name of the parameter for changing the DNS settings

• The correct format for submitting updated values

Given that many people use the default username and password for their home broadband router, the first assumption is reasonable. Similarly, the second assumption is reasonable because these routers use a default IP address. Also, the third, fourth, and fifth assumptions are reasonable because the script and parameter names as well as the query string format are uniform for a given router model. To extend the attack to different types of routers, the attacker must determine which router brand is being used. We will describe how that can be done.

Device Fingerprinting

A simple technique (which can be implemented using basic JavaScript and HTML) allows one to determine the brand of a user’s home router. Basically, each router hosts a set of web pages that are used as a configuration management interface. These web pages display, among other things, images such as the router manufacturer’s logo. An attacker can determine the path names of these logo images ahead of time, and then use an HTML <IMG> tag in conjunction with the OnLoad() handler to perform attack steps for specific routers as follows:

<IMG SRC = "http://router-IP-address/path-to-logo/specific-logo"
     OnLoad = "Do-Bad-Stuff()">

If the specific logo file exists at the specified IP address, then the image will be loaded. In that case, the OnLoad() handler will execute the malicious code. Again, the code assumes that the attacker knows the IP address of the router. As remarked previously, there are well-known defaults for this IP address, and using one of them will likely yield a successful result.
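The same idea extends naturally to several candidate models at once. The sketch below probes a short list of routers; the logo paths, default addresses, and handler name are illustrative placeholders rather than a verified fingerprint database.

// Hypothetical fingerprinting sketch; paths and addresses are placeholders.
var candidates = [
  { model: "Brand A", url: "http://192.168.1.1/images/brandA_logo.gif" },
  { model: "Brand B", url: "http://192.168.0.1/img/brandB_logo.gif" },
  { model: "Brand C", url: "http://192.168.2.1/ui/brandC_logo.gif" }
];

for (var i = 0; i < candidates.length; i++) {
  (function (candidate) {
    var probe = new Image();
    // onload fires only if the logo exists at that address, identifying
    // both the router's location and its brand in a single step.
    probe.onload = function () { doBadStuff(candidate.model); };
    probe.src = candidate.url;
  })(candidates[i]);
}

function doBadStuff(model) {
  // Model-specific CSRF against the configuration CGI would be issued here.
}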

It turns out, though, that there are techniques for determining the router’s IP address even if it is not at the default location. One way to do so is by

  1. Determining the internal IP address of the victim’s machine; and

  2. Scanning the local address space for the router.

It turns out that (1) can be done using a simple Java applet and (2) can be done in JavaScript. We describe these techniques next.

Internal IP Address Determination and JavaScript Host Scanning

Using a Java applet developed by Lars Kinderman [214], one can determine the internal IP address of the machine. Note that the applet finds the internal address. Therefore, being behind a firewall or proxy server will not protect the end user. An alternative approach was described by Petkov [311].

Once the local IP address is known, one can loop through all addresses on the same subnet (for example, by trying different values for the last octet of the IP address). At each address, the attacker can use the previously mentioned fingerprinting techniques. Given the internal IP address, one can usually determine which address to start with when doing such a scan. Because most home networks are small, finding the IP address of the home broadband router will not take many tries. The general technique of JavaScript host scanning was described by Grossman [157].
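As a rough sketch, and assuming the internal address has already been learned (for example, from the applet), a scan of a /24 network might look as follows; the internal IP value and the logo path are again illustrative placeholders.

// Sketch of JavaScript host scanning over a /24, given the victim's internal IP.
var internalIP = "192.168.1.37";                      // obtained elsewhere (applet)
var prefix = internalIP.substring(0, internalIP.lastIndexOf(".") + 1);

function probe(addr) {
  var img = new Image();
  img.onload = function () {
    // A known logo answered from this address, so addr is likely the router.
    attackRouter(addr);
  };
  img.src = "http://" + addr + "/images/logo.gif";    // placeholder logo path
}

for (var octet = 1; octet < 255; octet++) {
  probe(prefix + octet);
}

function attackRouter(addr) {
  // Fingerprinting and the model-specific CSRF would follow here.
}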

6.2.4 Additional Comments

Visiting the Attacker’s Web Page

To become victimized, all a user has to do is examine a web page containing the offending JavaScript code. No traditional download and installation of malicious software is necessary; simply looking at the web page is enough.

One approach that an attacker can take is to set up the malicious web page and then drive traffic to it—for example, by making the site contain attractive content (such as salacious celebrity gossip, or illicit photos). The attacker can also try to manipulate search engine rankings by including popular content. Finally, the attacker can drive traffic by purchasing banner ads and strategic keywords.

Alternatively, an attacker can try to break into an existing web server and host malicious code on its web pages. Then, all visitors to those web pages will be victimized.

A third possibility is for an attacker to find a cross-site scripting vulnerability on an existing web site. Such a vulnerability can, for example, allow an attacker to inject JavaScript code into a web page. One common case is for the code to be injected through the query string on a URL that points to a server-side script. Then, anyone who visits the URL with that particular query string will run the attacker’s code in the context of the user’s web browser. The attacker can advertise the URL in an email message or as a hyperlink off another web page (or even using a banner advertisement). Victims might be more inclined to click on the hyperlink, especially if the attacker found a cross-site scripting vulnerability on a well-known site.

Drive-By Pharming in the Wild

As of October 2007, no examples of drive-by pharming were actually known in the wild. Nevertheless, this kind of attack is fairly simple and involves piecing together well-known building blocks. Also, the attack has potentially devastating consequences, because the victim merely needs to view the offending web page for the attacker to effectively control the victim’s Internet connection. Given that the attack affects potentially anyone with a home broadband router, the implications are staggering. Therefore, it would not be a stretch to imagine that attackers will employ this attack (if they have not done so already). We believe that anyone sufficiently familiar with the technical details of concepts such as JavaScript host scanning and CSRFs could put the pieces of the drive-by pharming attack together. The underlying technical details are, in many ways, less interesting than the overall implications.

Fortunately, there are many simple countermeasures to drive-by pharming, which we discuss later. Given the simplicity of the attack and the simplicity of the defense, we thought it prudent to warn people of the threat before they became victims.

Proof of Concept

We implemented proof-of-concept systems to test the drive-by pharming attack methodology, in which we changed a router’s DNS server settings and verified that the new server was being used to resolve requests. We successfully carried out the attack on the following routers: the Linksys WRT54GS, the Netgear WGR614, and the D-Link DI-524. Given that Linksys, Netgear, and D-Link represent three of the largest home broadband router manufacturers, the number of affected routers is potentially very large.

6.2.5 Countermeasures

A number of simple countermeasures may be deployed to defend against drive-by pharming attacks. We describe some of these measures here.

Changing the Router Password

The simplest way to defend against this kind of attack is to change the administrative password on the home broadband router to something less easily guessed by the attacker. (Note that this password is separate from the one used to actually gain Internet access.) The Symantec Security Response web log contains links to the password change pages for Linksys, Netgear, and D-Link routers [327]. If the attacker guesses the wrong password, then a dialog box will pop up alerting the user of an incorrect guess, which might arouse suspicion. Therefore, the attacker has to guess the password correctly the first time (which implies that even a mildly difficult-to-guess password defends against this attack). However, we still recommend that users choose a strong password.

Local Firewall Rules

If a user is employing a PC-level firewall, such as the one found in the Norton Internet Security software suite, then the user can add rules to the firewall that alert users of connections to the router’s management interface. (Note that many PC-level firewall programs might not allow fine-grained user-defined rules.)

CSRF Protection of Routers

There are numerous techniques to make one’s web site secure against CSRFs, such as the one described for the drive-by pharming attack. For example, the web site can require a hidden (and unpredictable) variable to be submitted as part of any form input—perhaps the session ID. This input can be validated by the web server before it honors any client requests. If a user is legitimately changing the router settings, then this variable will be passed along with the form input, and the changes will be processed. An attacker, by contrast, will not know the value of the variable and, therefore, cannot include it in his or her request. The validation attempt by the web server will fail, and the request will not be completed. Note that DNS rebinding attacks, which are covered in Chapter 7, can be used to overcome CSRF protection in routers.
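A minimal sketch of this defense is shown below, in the same JavaScript style as the earlier examples. It is illustrative only and not taken from any actual router firmware; all names and paths are hypothetical, and a real implementation would live in the router’s embedded web server.

// Illustrative CSRF protection for a settings form (hypothetical names).
const crypto = require("crypto");
const tokens = new Map();                      // session ID -> expected token

function renderSettingsForm(sessionId) {
  const token = crypto.randomBytes(16).toString("hex");
  tokens.set(sessionId, token);
  // The unpredictable value rides along as a hidden form field.
  return '<form method="POST" action="/apply.cgi">' +
         '<input type="hidden" name="csrf" value="' + token + '">' +
         '<input type="text" name="dns">' +
         '<input type="submit" value="Apply"></form>';
}

function applySettings(sessionId, fields) {
  // A forged request cannot know the token, so it is rejected here.
  if (fields.csrf !== tokens.get(sessionId)) {
    return "rejected";
  }
  // ...update the DNS setting here...
  return "updated";
}

// Example: a forged request that omits the token fails.
renderSettingsForm("session-42");
console.log(applySettings("session-42", { dns: "1.2.3.4" }));   // "rejected"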

A number of other approaches can also be used to defend against drive-by pharming attacks. A full treatment of such approaches is beyond the scope of this section.

Conclusion

This section described drive-by pharming, a type of attack that allows an attacker to change the DNS server settings on a user’s home broadband router. Once these settings are changed, future DNS queries will be resolved by the attacker’s DNS server. The result is that the attacker effectively controls the victim’s Internet connection and can use that control to obtain any sensitive information the victim enters as part of an Internet transaction (e.g., passwords, credit card numbers, bank account numbers). The attack requires only some simple native JavaScript code, and the victim merely has to view the malicious page.

In addition to pharming, this attack methodology can be used to make other router configuration changes, such as malicious firmware upgrades. We believe drive-by pharming has serious widespread implications because it could potentially affect any home broadband user. Fortunately, there are a number of simple defenses, including changing the password on the home broadband router. A more thorough technical report that goes into the attack details is available online [389].

6.3 Using JavaScript to Commit Click Fraud*

This section introduces a new and dangerous technique for turning web-site visitors into unwitting click-fraudsters. Many recent click fraud attacks have been based on traditional malware, which installs itself on a user’s machine and simulates the clicking of advertisements by the user [150, 230, 296, 483]. The attack presented here is certainly easier to accomplish than infecting a machine with malware, as all it requires is that a user visit a web site in a JavaScript-enabled browser. This attack, which is referred to as a badvertisement, has been experimentally verified on several prominent advertisement schemes.

In brief, the attack allows a fraudster to force unwitting accessories to perform automated click-throughs on ads hosted by the fraudster, resulting in revenue generation for the fraudster at the expense of advertisers. The result is a higher number of click-throughs registered for a sponsored ad, and hence higher revenue for the publisher, who receives a cut of each click. While it may at first appear that this attack should be easily detected by inspection of the click-through rates from the domain in question, this is unfortunately not so. The fraudster can cause both click-throughs and non-click-throughs (in any desired proportion) to be generated by the traffic accessing the corrupted page, while keeping end users corresponding to both of these classes unaware of the advertisement. A fraudster can even generate click-fraud revenue from traffic to a site that is not allowed to display advertisements from a given provider. A typical example of such a site would be a pornographic site.

Owners of sites that are used to generate revenue for a “badvertiser” might be unaware of the attack they are part of. Given the invisibility of the attack, domain owners may remain unaware of the existence of an attack mounted by a corrupt web master of theirs, or a person who is able to impersonate the web master. The latter ties the problem back into phishing once again.

This section starts by defining some important terms. It then gives an overview of the basic building blocks of the attack before diving into the implementation details of making a badvertisement in Section 6.3.3. Section 6.3.4 explains in detail the techniques used to cover the attacker’s tracks and prevent discovery by reverse spidering. We then describe scenarios that can be used as potential media for deploying such attacks and explore the economic losses associated with their implementation in Section 6.3.7. Finally, we outline some potential countermeasures.

6.3.1 Terms and Definitions

The following are definitions of terms as used in the context of click fraud.

Phishing. Attempting to fraudulently acquire a person’s credentials, usually for financial gain.

JavaScript. A simple programming language that is interpreted by web browsers. It enables web site designers to embed programs in web pages, making them potentially more interactive. Despite its simplicity, JavaScript is quite powerful.

REFERER. When a web browser visits a site, it transmits to the site the URL of the page it was linked from, if any. That is, if a user is at site B and clicks a link to site A, when the web browser visits A, it tells A that B is its REFERER. The REFERER information need not always be provided, however. Note that this word is not spelled in the same way as the English word “referrer.”

Spidering. The process of surfing the web, storing URLs, and indexing keywords, links, and text. It is commonly used by search engines in their efforts to index web pages [385].

robots.txt. A file that may be included at the top level of a web site, specifying which pages the web master does not wish web spiders to crawl. Compliance is completely voluntary on the part of web spiders, but is considered good etiquette [353].

Reverse spidering. When an ad is clicked, the ad provider can track the page responsible for serving that ad. That page is known as the REFERER of the ad. We use the term “reverse spidering” to refer to the ad provider’s spidering of the REFERER page [140].

Dual-personality page. A page that appears differently when viewed by different agents, or depending on other criteria. Typically one “personality” of the page may be termed “good” and the other “evil.”

6.3.2 Building Blocks

The badvertisement attack consists of two components: delivery and execution. The first component either brings users to the corrupt information or brings corrupt information to the users. The second component causes the automated but invisible display of an advertisement to a targeted user.

The Delivery Component

Bringing users to the corrupt information may rely either on sites that users visit voluntarily and intentionally due to their content or on sites that may contain information of no particular value, but which users are tricked to visit.

Users may be tricked to visit sites using known techniques for page rank manipulation. Fraudsters may also use spam as a delivery approach and try to entice recipients to click on a link, causing a visit to a site with badvertisements. Sophisticated spam-driven attempts may rely on user-specific context to improve the probabilities of visits. For example, a fraudster may send out email that appears to be an electronic postcard sent by a friend of the recipient; when the recipient clicks on the link, he or she will activate the badvertisement. An 80% visit rate was achieved in recent experiments by Jagatic et al. [197] by using relational knowledge collected from social network sites. Bringing corrupt information to users may simply mean sending emails with content causing an advertisement to be clicked. This is possible whenever the targeted user has a mail reader in which JavaScript is enabled.

The Execution Component

For all situations described here, successful execution can be achieved when the fraudster can cause the spam email in question to be delivered and viewed by the targeted user. The increasing sophistication of spammers’ obfuscation techniques means that one cannot count on a spam filter as the only line of defense against badvertisement attacks delivered through email. Although this delivery component is similar to that used by phishing attacks, here the message can be much more general and, therefore, more difficult to identify and filter. The JavaScript code actually containing the attack cannot reliably be filtered or identified, owing to the vast array of ways in which this code can be obfuscated.

Here’s an example of how eval may be used to obfuscate JavaScript code. The statement

for( i = 1; i <= 10; i++ ) {
     document.write ( i );
}

is functionally equivalent to the statement

code = "f@o"+"#r(i@"+"#="+1+";@"+"i#<=1%0"+";i"
       +"+@ #+){@"+"d#oc"+"%um#"+"en%t."+"@w#r"
       +"i#te"+"(@i + ";<b" +"r >");}";
eval(code.replace(/[@#%]/g, ""));

Each statement prints the numbers 1–10 when executed by a JavaScript interpreter.

The badvertisement attack does rely on clients having JavaScript enabled. This is not a real limitation, however, both because 90% of web browsers have it enabled and because the advertising services themselves count on users having JavaScript enabled. Thus the execution component of the attack relies simply on a JavaScript trick that causes an ad to be automatically clicked and processed by a client’s web browser. This causes the click to be counted by the server side, which in turn triggers a transfer of funds from the advertiser to the domain hosting the advertisement (i.e., the fraudster).

The key feature of a successful attack is to display effectively different text to the user than to any agent that audits the advertiser for policy compliance. This is not as simple as serving a different page to any agent identifying itself as a web spider; it is, for example, known that Google performs some amount of spidering without identifying itself as Google [43].

6.3.3 The Making of a Badvertisement

A successful badvertisement is one that is able to silently generate automatic click-throughs on advertisement banners when users visit the site, but remains undetected by auditing agents of the ad provider. Further, while the ad provider may place restrictions on the type of content that may be shown on a page containing its ads, a fraudster would like to be able to show prohibited content (e.g., pornography) on the badvertisement pages that may draw higher traffic. To this end, we introduce two concepts—a façade page and a dual-personality page. A dual-personality page changes its personality (behavior) based on the kind of visitor. For an oblivious visitor, the dual-personality page takes on its “evil” form to commit auto-clicking in the background. To avoid being caught, the dual-personality page takes on its “good” form for a suspicious auditor. Thus, when a spider crawls through pages to check for suspicious activities, it sees only the “good” side of the dual-personality page.

To monetize prohibited content, the fraudster can use the façade page. It acts as an interface for the visitor to the dual-personality page and exists on a different domain that is not registered with the ad provider.

Figure 6.2 gives a graphical representation of these two pages. In the following explanations, we suppose the existence of two web sites: www.verynastyporn.com, which contains the façade page, and www.veryniceflorist.com, which is registered with the ad provider and contains the dual-personality page.

Figure 6.2. Pages to hide badvertisements from users and from auditors. (a) A dual-personality page. To an oblivious user it behaves in an “evil” manner and includes badvertisements (because it “knows” that it will not be detected). To a suspicious auditor, it appears good (because it has detected that it is being audited and has disabled any evil portion). (b) A dual-personality page included in a façade page.

image

Next, we give a brief overview of how JavaScript is used in these techniques. We then describe the implementation of both the façade page and the dual-personality page.

JavaScript for Click Fraud

Typical online advertisement services (such as AdBrite [3] and the others we examine) work by providing web masters with a snippet of JavaScript code to add to their pages. This code, which is executed by the web browser of a visitor to the site, downloads ads from the advertiser’s server at that time. The ad download triggers a rewrite of the frame in which the JavaScript appears, replacing it with the HTML code necessary to display the ads. When a user clicks an advertisement link, he or she “clicks through” the ad provider’s server, giving the ad provider the opportunity to bill the client for the click. The user is then taken to the ad client’s home page. Figure 6.3 illustrates this scenario.

Figure 6.3. Normal use of the advertising programs we examine. A web master who wishes to serve sponsored advertisements places a snippet of JavaScript in his or her page that downloads ads from the ad provider when the page is loaded. If a user clicks a link for one of those ads, it is forwarded to the ad client. On the way, the click is registered so that the ad provider can bill the advertiser and pay the web master a share.

image

Badvertisements, then, contain additional JavaScript that simulates automatic clicking. This attack is accomplished as follows: After the advertiser’s JavaScript code runs and modifies the page to contain the ads, the badvertisement JavaScript parses the resulting HTML and compiles a list of all hyperlinks. It then modifies the page to contain a hidden frame that loads an advertiser’s site, creating the impression that the user has clicked an advertisement link (Figure 6.4).

Figure 6.4. Auto-clicking in a hidden badvertisement. Compared to Figure 6.3, the ad banner presented here is hidden. JavaScript code extracts links from the hidden ad banner and causes them to be displayed in another hidden iframe, creating the impression that the user has clicked these links.

image

JavaScript is used both by the legitimate advertiser and by the badvertiser. The latter uses JavaScript to parse and edit the JavaScript of the legitimate advertiser.
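In skeleton form, the auto-clicking step might look like the following. The element id, delay, and overall structure are illustrative assumptions; the point is simply that the badvertiser’s script reads the links the ad script has just written and “clicks” one of them in a hidden frame.

// Skeleton of the auto-clicking step (ids and timing are illustrative).
function autoClick() {
  // The ad provider's script has already rewritten this hidden element
  // with the ad markup; harvest its outbound click-through links.
  var banner = document.getElementById("hiddenAdBanner");
  if (!banner) return;
  var links = banner.getElementsByTagName("a");
  if (links.length === 0) return;

  // Load one link in an invisible iframe, so the ad provider's server
  // registers a click-through that the user never sees.
  var target = links[Math.floor(Math.random() * links.length)].href;
  var frame = document.createElement("iframe");
  frame.style.display = "none";
  frame.src = target;
  document.body.appendChild(frame);
}

// Give the legitimate ad script time to finish rewriting the page.
window.setTimeout(autoClick, 3000);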

As an aside, note that it is possible for JavaScripts to be obfuscated to the point that their effect cannot be determined without actually executing them. This obfuscation allows the fraudster to hide the fact that an attack is taking place from bots that search for JavaScripts containing particular patterns that are known to be associated with badvertisements. One such obfuscation method is use of the eval keyword, which allows JavaScript code to programmatically evaluate arbitrarily complex JavaScript code represented as string constants. In essence, this technique allows the inclusion of arbitrary noise in the script, to make searching for tell-tale signs of click fraud fruitless.

Caveat. The attack we present here works on a large variety of platforms but, notably, not on Google AdSense. Google’s approach is different in two ways from the vulnerable schemes that we show how to attack.

One difference lies in the code that is downloaded from the Google server to fetch the ads (a script called show_ads.js). Whereas the attacked schemes allow this script’s functions to be accessed by other JavaScript components, AdSense prevents this by defining all functions inside a single anonymous function.

To illustrate this difference, here is pseudocode for the general strategy used by most providers to print ads:

function print_ads() {
  for(each ad) {
    document.write(text of ad);
  }
}

Note that each ad is inserted directly in the current document.

The following pseudocode shows AdSense’s more secure strategy (in greatly simplified form):

(function() {
  function print_ads() {
    document.write("<iframe src=url of ad server>");
  }

  print_ads();
})()

Here, an anonymous function wraps all declarations to keep them out of the global namespace. Further, the downloaded script does not print the ads itself; it prints an internal frame (iframe) that draws its source from yet another script. This allows browser policies to protect the contents of the iframe from being read by other scripts in the page (such as badvertisement scripts).

A second distinction is that the AdSense code does not rewrite the current page to include the ads; it rewrites the current page to include an internal frame with its source set to the output of yet another script. It is this latter script that actually generates the ads. Given that the source of this generated frame is a Google server, JavaScripts served by other sites (such as any served by a fraudster) are prevented by a browser from accessing it. This would prevent a fraudster from fetching a list of links for auto-clicking.

6.3.4 Hiding the Attack

Client Evasion

As mentioned in the previous section, hidden frames can be used to create an impression of a click for the advertisement URL identified using JavaScript. Repeating this process several times will create the impression on the server side that the user has clicked several of the advertisement links, even though he or she may not have. The behavior described here is implemented in a dual-personality page placed on veryniceflorist.com. This entire site would then be included in a hidden frame in the façade page registered as verynastyporn.com. Thus it will invisibly generate artificial clicks that appear to come from veryniceflorist.com whenever a user visits verynastyporn.com.

Ad-Provider Evasion

Concealing the badvertisement from the legitimate ad provider is somewhat more difficult than hiding it from users. The auditing method commonly used by ad providers is a web spider that follows the REFERER links on clicked advertisements—that is, when a client clicks an ad on a page, an auditing spider may choose to follow the REFERER link back to the page that served the clicked ad. The goal of the fraudster must then be to detect when the page is being viewed by an auditing spider and to serve a harmless page for that instance. The fraudster does so with the dual-personality page; the entire process is as follows (also summarized in Figure 6.5):

  1. Create a personalization CGI script that runs when a user visits the façade. This script simply assigns a unique ID to the visitor and redirects the visitor to a second CGI script, passing the ID as a GET parameter so that it appears in the URL.

  2. The second CGI script outputs the façade, complete with its script placeholder. This placeholder is configured to run a third CGI script, also passing the ID along.

  3. The third CGI script may now choose which JavaScript to generate. If the ID passed has never been visited before, the CGI script will generate an “evil” JavaScript; if the ID has been previously visited, the CGI script will generate some “good” JavaScript. (This script might also use criteria other than an ID to determine which personality to show—for instance, REFERER, browser type, or even time of day.)

Figure 6.5. Assignment of IDs prevents any agent not entering through the façade from seeing the evil JavaScript. The dual-personality page at veryniceflorist.com reveals its evil side only when passed an as-yet-unvisited ID. When it is given no ID or a visited ID, it shows its good side. When included from the façade page, veryniceflorist.com has an unvisited ID and thus contains badvertisements. However, when a user visits veryniceflorist.com normally, or when a reverse spider from the ad provider audits it, it will show its good side, because these requests contain no ID and a visited ID, respectively. Thus the only way to get veryniceflorist.com to expose its evil personality is to visit it through verynastyporn.com.

image

The overall effect of this rather strange configuration is a “one-way door” to the badvertisement. Users accessing the site through the façade load badvertisements; because their IDs are thereby marked as visited (by the client), the “one-way door” prevents the evil personality from becoming active on any later request with the same ID. The result is that the user, when visiting a site with badvertisements, will see only the original content of the site (without the badvertisements). The advertiser will be given the impression that its ads (as served by the ad provider) are very successful (i.e., they are viewed often). Of course, the advertiser is not really getting its money’s worth, as the ads are not really being viewed by users.
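A minimal sketch of the ID-gating logic behind this one-way door is given below, written as a small Node.js handler purely for illustration; the attack described above used CGI scripts, and every name, path, and port here is hypothetical.

// Illustrative "one-way door": serve the evil script only for fresh IDs.
const http = require("http");
const url = require("url");
const visited = new Set();

const GOOD_SCRIPT = "/* harmless page-enhancement code */";
const EVIL_SCRIPT = "/* auto-clicking badvertisement code */";

http.createServer(function (req, res) {
  const id = url.parse(req.url, true).query.id;
  res.setHeader("Content-Type", "application/javascript");
  if (id && !visited.has(id)) {
    visited.add(id);            // the door closes behind this visitor
    res.end(EVIL_SCRIPT);       // first, façade-issued visit: evil personality
  } else {
    res.end(GOOD_SCRIPT);       // no ID, or an already-used ID: good personality
  }
}).listen(8080);

Because a reverse spider always arrives with an ID that has already been consumed, it can only ever receive the good script.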

As an example, consider the sequence of events that will occur when a user visits verynastyporn.com for the first time:

  1. The visitor will first be served the façade page, which will invisibly include www.veryniceflorist.com. The user will also be assigned a unique ID, which will be passed to veryniceflorist.com.

  2. A CGI script at veryniceflorist.com will output a completely legitimate page (including visible advertisements and no auto-clicking), but will also include a placeholder for another JavaScript.

  3. The visitor’s browser will request the JavaScript, passing the unique ID. The server will check whether the ID has already been registered; given that it will not be, the server will return the badvertisement JavaScript and register the ID as visited.

  4. The user’s browser will interpret the JavaScript, causing the ads in the (invisible) veryniceflorist.com site to be auto-clicked.

  5. To the ad provider, the clicks will appear to come from veryniceflorist.com.

The fraudster may choose to pass on a large proportion of chances to serve badvertisements, so as to be more stealthy. This would be accomplished by modifying step 1 to not include veryniceflorist.com some proportion of the time, or by changing step 4 to not generate auto-clicks sometimes. The fraudster might also randomly choose which ad will be auto-clicked from the served list. This will ensure that the same IP address does not generate an excessive number of clicks for a single advertisement, as a different IP will initiate each.

Figure 6.5 illustrates how the web site will change its appearance only when safe to do so. Consider its behavior when visited by various agents:

Human visitor (at verynastyporn.com). The human visitor will not navigate directly to the dual-personality page; it will likely be invisibly included in the façade page. Thus the user will have an unvisited ID, and auto-clicks may be generated by the fraudster. In addition to considering only whether the ID has been visited, many other policies could be used to determine whether to serve the badvertisement. For instance, the evil web master might choose to serve only good scripts to users coming from a particular IP range or users visiting at a certain time of the day. Using techniques such as those described in [199], a web master might even choose to serve a particular script only to those users with certain other sites in their history.

Human visitor or standard spider (at veryniceflorist.com). Here, the human visitor (or spider) will not have a unique ID, so veryniceflorist.com will show its good personality and not include auto-clicking.

Standard (forward) spider (at verynastyporn.com). Typical web spiders do not evaluate JavaScript in pages. Indeed, it is in general not useful for a web spider to interpret JavaScript, as this is typically used to enhance interaction with users and not to generate text that the web spider might index. Here, however, this fact gives the fraudster an opportunity to hide illicit text from the web spider by causing that text to become available only through the interpretation of JavaScript. Thus, while a spider may visit the façade page and receive an unvisited unique ID, the fraudster may make the badvertisement page visible only after interpretation of JavaScript and, therefore, cause it to remain invisible to the spider.

Reverse spider (at veryniceflorist.com). The reverse spider does not enter the site by way of the façade; rather, it follows a REFERER link on a clicked ad. Any REFERER link will contain the unique ID issued by the façade, and veryniceflorist.com will show its good personality to an already-visited ID. Because any ID in a REFERER link must already be visited (by the user who unwittingly generated the auto-click), the reverse spider will always see the good veryniceflorist.com page (even if it evaluates JavaScript).

In any case, the advertiser’s detection of an offending site does not necessarily prevent its owner from earning any illicit revenue. A fraudster might, for example, register many veryniceflorist-like domains under different assumed names, and parcel traffic over them so that the advertiser’s detection of any one of them eliminates only a portion of the total profits.

The knowledge that this could be the case would prevent an ad provider from immediately blacklisting a host upon detection of one (or a small number of) auto-clicks for a given domain. This behavior is of particular relevance in situations where the badvertisement is sent as spam to a user who is likely to use webmail for displaying his or her email, and where the ad provider may have a small number of email honeypots to detect such attacks.

6.3.5 Why Will Users Visit the Site?

The previous discussion stated that the fraudster will create a legitimate-appearing site that will be approved by the ad provider, and then create a badvertisement JavaScript function that will be loaded from an offsite server for execution only when innocent users visit the site. However, there are other ways the web master might lure users to visit the site, including the use of mass email and hosting the ads on pornographic websites (which itself is prohibited).

Email as a Lure. Multiple variations on the medium through which an attack can be mounted are possible, including through email messages. Email messages can either be spammed indiscriminately to many addresses or spoofed to users so as to appear to be from their acquaintances (cf. [197]). Each email might contain a “hook” line to encourage users to visit the site and a link to the portal page. This also reduces the possibility of a scenario in which a search engine locates the badvertisement, as the abuser would no longer need to list the portal page on any search engine. Thus an ad provider must trap such an email in an email honeypot to locate the site hosting the portal.

Popular Content as a Lure. Unscrupulous web masters with popular sites could easily employ this technique to increase their revenue. In particular, sites that host pornography can take advantage of the high traffic these sites typically generate by hosting hidden badvertisements. The scale of this threat is suggested by a small experiment conducted at Berkeley, which found that approximately 28% of websites on the Internet contain pornographic content [37].

Although hosting ads with pornography is prohibited by the terms of service of almost all advertising programs, the techniques set forth here could enable a web master to hide all illicit content from spiders, showing it only to users. This would be done by rewriting the page on the client side using JavaScript, as sketched below. We consider this type of content to be distinct from viral content (described next) in terms of the effort that is required by the web master to promote it.
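As a simple sketch of such client-side rewriting, the prohibited markup below is inserted only when JavaScript executes, so a spider that merely fetches and indexes the raw HTML never encounters it; the element id and placeholder markup are made up for illustration.

// Illustrative client-side rewriting: the content exists only after JS runs,
// so spiders that do not interpret JavaScript never see it.
window.onload = function () {
  var slot = document.getElementById("contentSlot");   // an empty div in the HTML
  if (!slot) return;
  // In a real attack, this markup might itself be assembled from obfuscated
  // strings or fetched from another server controlled by the fraudster.
  slot.innerHTML = "<p>material withheld from spiders would be injected here</p>";
};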

Viral Content as a Lure. A phenomenon associated with the social use of the Internet is the propagation of so-called viral content. This content is typically something that is considered interesting or amusing by a recipient and, therefore, propagates quickly from person to person [201]. A fraudster could host content likely to promote such distribution on his or her site (even cloning or stealing it from another site) and then distribute a link to this content through email, hoping that it is passed along. Such an attack could be perpetrated with minimal bootstrap effort compared to the previous scenario; while the scenario involving popular content as a lure assumes an already-established web site, here a fraudster need only steal a humorous video from another site (for instance) and spoof emails to a few thousand people, hoping they will be sent along.

6.3.6 Detecting and Preventing Abuse

There is an ongoing industry-wide effort to develop tools that will effectively detect and block many common click-fraud attacks. Most of the attacks discovered and reported so far have been malware-based attacks that rely on automated scripts [151], individuals hired by competitors [186], or proxy servers used to generate clicks for paid advertisements.

Companies such as AdWatcher [4] and ClickProtector [57] have initiated efforts to counter these attacks. The essence of their approach is to track the IP addresses of machines generating the clicks, and to identify the domain from which the clicks are registered. By collecting large logs and performing expert analysis, irregularities such as a repeated number of clicks for a certain advertisement from a particular IP address or a particular domain or abnormal spikes in traffic for a specific web site may be identified.

Unfortunately, the stealthy attack described in this section will go undetected by any of these tools. It is, therefore, of particular importance to determine other unique mechanisms of detecting and preventing attacks of this nature. These can be divided into two classes: active and passive. Our proposal for the former is intended to detect click-fraud attempts housed on web pages that users intentionally navigate to (whether they wanted to go there or were deceived to think so), whereas the proposal for the latter is suited for detection of email-instigated click fraud.

An active client-side approach interacts with search engines, performs popular searches, and visits the resulting sites. It also spiders through sites in much the same manner as a user might. To hide its identity, such an agent would not abide by the robots.txt conventions and so would appear to the servers it interacts with as an actual user. The agent would mimic a user as closely as possible, including occasionally requesting some advertisements, and it would always verify that the number of ad calls actually made corresponds to the number of requests a human user would perceive as having been made. (The latter criterion is meant to detect click-fraud attempts in which a large number of ad requests are made after a user initiates a smaller number of actual requests.)
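The core bookkeeping of such an agent can be sketched as follows. This is a hypothetical outline in Python: fetch_and_render and the fields it returns stand in for whatever crawler or instrumented browser the agent would actually use, and are not part of any existing tool.

    def audit_page(url, fetch_and_render):
        # fetch_and_render is a hypothetical callable that behaves like a real
        # user's browser (deliberately ignoring robots.txt) and reports two
        # counts for the rendered page.
        result = fetch_and_render(url)
        observed_ad_calls = result["ad_calls_issued"]          # ad requests seen on the wire
        perceived_requests = result["ads_visibly_requested"]   # requests a human would perceive

        if observed_ad_calls > perceived_requests:
            # More ad (or click) calls went out than the user could have caused:
            # flag the page for review by the ad provider.
            return ("suspicious", url, observed_ad_calls, perceived_requests)
        return ("ok", url, observed_ad_calls, perceived_requests)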

A passive client-side approach observes the actions performed on the machine of the person appearing to perform the click. This may be done by running all JavaScript components in a virtual machine (appearing to be a browser) and trapping the requests for advertisements that are made. Any web page that causes a call of a type that should occur only after a click can be determined to be fraudulent. While this takes care of the type of automated click fraud described in this section, it would not defend against a variant that first imposes a long (and potentially random) delay and only then commits click fraud, unless the virtual machine lets randomly selected scripts run for significant amounts of time in the hope of trapping a delayed call. Excessive delays, however, are not in the fraudster’s interest, because the target may close the browser window and thereby interrupt the session before the click is made. Passive client-side solutions may be housed in security toolbars or in antivirus software.
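A minimal sketch of this trapping logic is given below, in Python. The sandbox object and its hooks (on_user_click, on_ad_request, load, run_scripts) are hypothetical placeholders for an instrumented browser-like environment; the URL heuristic is likewise only an assumption for illustration.

    import time

    def looks_like_click_url(url):
        # Heuristic placeholder: a real deployment would match the ad
        # provider's known click-reporting endpoints.
        return "click" in url

    def detect_hidden_clicks(page, sandbox, max_wait_seconds=30):
        # `sandbox` is a hypothetical instrumented environment that looks like
        # a browser and exposes callbacks for user clicks and ad requests.
        events = []  # (timestamp, kind, url)
        sandbox.on_user_click(lambda url: events.append((time.time(), "click", url)))
        sandbox.on_ad_request(lambda url: events.append((time.time(), "ad", url)))

        sandbox.load(page)
        # Let scripts run for a while so that moderately delayed fraud is trapped too.
        sandbox.run_scripts(duration=max_wait_seconds)

        clicks_seen = 0
        for _ts, kind, url in events:
            if kind == "click":
                clicks_seen += 1
            elif kind == "ad" and looks_like_click_url(url) and clicks_seen == 0:
                return True  # an ad "click" fired with no preceding user click
        return False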

Another type of passive solution is an infrastructure component. It would sift traffic, identify candidate traffic, and emulate the client machine that would have received the packets in question, with the intent of identifying click fraud. A suitable place for such a component might be an ISP-level spam filter or a mail transfer agent (MTA): before emails are delivered to recipients, they could first be rendered in virtual machines that reside on the infrastructure component but mimic the client machines.

For all of these solutions, it should be clear that it is not necessary to trap all abuse. Even if only a rather small percentage of abuse is detected, it would betray the locations that house click fraud, with a likelihood that increases with the number of users who are taken to the same fraud-inducing domain. (Of course, a fraudster would not want to limit the number of victims too much, because the profit would then become rather marginal.) If only a small percentage of users had such a client-side detection tool installed, attacks would become almost entirely unprofitable, given reasonable assumptions about the per-domain cost associated with this type of click fraud; this topic is discussed in depth in Section 6.3.7.

6.3.7 Brief Economic Analysis

This particular type of click fraud can be very profitable, as shown in the following paragraphs, which makes effective countermeasures essential. The discussion hinges on the probability that the fraudster will be detected, which we call p. Let us further make the simplifying assumption that the billing cycle of all pay-per-click advertising schemes is one month, and adopt that timeframe as the attack cycle.

A particular attack-month would look like this:

• Day 1: Send out a number of attacks, each with a probability p of being instantly caught (thus invalidating the whole attack). Sending the attacks necessarily costs the fraudster some small amount of money (perhaps to rent time on a botnet used to send the spam emails or to register appropriate domains).

• Days 2–30: Some number of tricked users visit the fraudster’s site. With some probability, the fraudster causes these users to (unwittingly and invisibly) commit click fraud in the course of their browsing.

• Day 31: If the fraudster was not caught, he or she receives a check from the ad provider. If he or she was caught, no payment arrives (and the fraudster records a loss because of the initial costs).

Note that the fraudster can distribute the attack over multiple domains to amortize the risk of being caught (as detection might mean the loss of profits from only a single domain).

Thus the goal of the countermeasures designer should be to raise the probability p of being caught high enough that the fraudster’s expected profits are close to zero. Next, we examine a few countermeasure schemes and their potential impact on the fraudster.

No Countermeasures. With no defense mechanism in place, there are only two ways for the ad provider to find out about the offending site: (1) by having it reported by someone who stumbles on it or (2) by having a human at the ad provider visit the site after noticing its unusual popularity. Let us (reasonably) assume that the probability of this happening is very low, and consider “very low” to mean p = 2⁻²⁰. The fraudster’s revenue then grows essentially as a linear function of the number of users targeted (Figure 6.6).

Figure 6.6. Benefit for fraudster, in dollars earned, given several probabilities p of detection by ad providers. Note that when p = 1/1,000, the profit hardly goes above zero; when p = 1/10,000, profit tops out at around $500 per domain; after that, the risks quickly outweigh the rewards. Once p shrinks below 1/10,000, however, the fraudster fares much better; this is the current threat situation as far as we can tell. (a) shows the benefit when the reward per click is $0.25; (b) considers a reward per click of $1.00.


Active or Passive Countermeasures. Suppose that the variables here combine to yield p ∈ {1/1,000, 1/10,000, 1/100,000}. The analysis for these hypothetical detection probabilities is shown in Figure 6.6. Note that even a modest detection probability, such as p = 1/10,000, limits potential profits considerably.
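The shape of these curves can be reproduced with a rough back-of-the-envelope model of the attack-month described above. The Python sketch below assumes, for illustration only, that each targeted user yields one fraudulent click, that each targeted user represents an independent chance of detection with probability p, that detection at any point invalidates the entire month’s payment, and that the up-front cost is a fixed $50 per domain. Because these parameters are our own assumptions rather than those behind Figure 6.6, the absolute numbers will not match the figure, but the qualitative behavior (near-linear growth for tiny p, a capped and quickly vanishing profit for larger p) does.

    def expected_profit(num_users, p, reward_per_click=0.25, upfront_cost=50.0):
        # Probability that none of the attacks is caught during the month.
        prob_not_caught = (1.0 - p) ** num_users
        gross_revenue = num_users * reward_per_click  # one fraudulent click per user (assumption)
        return prob_not_caught * gross_revenue - upfront_cost

    for p in (2 ** -20, 1 / 100_000, 1 / 10_000, 1 / 1_000):
        best = max(expected_profit(n, p) for n in range(0, 200_000, 1_000))
        print(f"p = {p:.3g}: best expected monthly profit ≈ ${best:,.0f}")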

6.3.8 Implications

This type of click fraud is a serious attack with significant revenue potential for its perpetrators. Phishers might find attacks of this nature more profitable than identity theft. Even if these attacks do not have more revenue potential than phishing attacks, they are certainly more convenient to perform: the perpetrator collects cash directly, rather than coming into possession of credit card numbers that must be used to buy merchandise that is then converted into cash. Executing this type of click fraud does not require significant technical knowledge (assuming the development of a page preprocessor that inserts the malicious code), so it could be carried out by almost any unscrupulous web master.

We believe that this attack generalizes to any system of user-site-oriented advertising similar in operation to AdBrite. This idea was verified by testing a few other advertising programs, such as Miva’s AdRevenue Xpress [261], BannerBox [30], and Clicksor [58]. The code proposed to automatically parse and click advertisements can be adapted to parse any advertisement that is textually represented, so long as it can also be parsed by the client browser (which must be the case for it to be displayed at all). Likewise, our techniques for avoiding detection by spiders do not rely on any particular feature of these ad providers and would apply equally well to any type of reverse-auditing mechanism.
