Chapter 2. Writing Rules

ModSecurity is an extremely powerful and versatile web application firewall. However, to be able to utilize its power you need to learn how to tell ModSecurity what you want it to do. That is what this chapter is for—it will teach you all about writing ModSecurity rules, including some interesting uses for ModSecurity that extend beyond just blocking malicious requests (you will for example learn how to redirect requests for files to the closest server depending on where in the world a visitor is located, and you'll learn how to count the number of downloads of a binary file and store the resulting statistics in a MySQL database).

To give you a brief outline of the chapter, here is the order in which we will be looking at the business of writing ModSecurity rules:

  • The syntax of SecRule

  • What variables are available and how to use them

  • Operators, and how they relate to variables

  • Regular expressions—what they are, and why they're important when writing rules

  • Actions—denying, allowing, redirecting, and all the other things we can do when a rule matches

  • Practical examples of writing rules

The first half of the chapter contains basic information on how to write rules so you may find it a bit theoretical, but hang in there because the second half is full of useful examples of how to put what you've learned to use.

SecRule syntax

SecRule is the directive that is used to create ModSecurity rules. Its syntax is simple, but don't let that fool you. For almost any scenario you can imagine where you want to process a request in a certain way (whether by denying it, forwarding it or doing some more advanced processing), there is a way to use SecRule to solve the problem.

The basic syntax of SecRule is as follows:

SecRule Target Operator [Actions]

Target specifies what part of the request or response you want to examine. In the basic example given in the previous chapter, we used the variable named REQUEST_URI, which contains the requested URI on the server, to identify and block any attempts to access the location /secret.html. There are over 70 variables that can be used to create rules, meaning there is likely to be a way to match a rule in almost any circumstance where you need to create a rule.

There is also a special kind of variable called a collection that can hold several values. An example of a collection is ARGS, which contains all of the arguments passed in a query string or via a POST request.

The Operator part of the rule specifies the method and comparison data to use when matching against the specified variable or variables. The default operator, if none other is specified, is @rx, which means that the rule engine will interpret the string that follows as a regular expression to be matched against the specified variable.

Finally, Actions is an optional list of actions to be taken if a rule matches. These can include things such as allowing or denying the request and specifying which status codes to return. If no actions are specified, then the default list of actions, as set by using SecDefaultAction, is used.

Let's take a look at an example to make things a little more clear. Imagine the following scenario: You are a small business owner selling cookbooks in the PDF file format on your web site. To entice prospective customers, you offer a sample chapter containing the most delicious recipes in the book, which they can download free of charge to see if they want to spend their hard-earned money on your book.

Everything is running along nicely—or so you think—and then suddenly you get a complaint via email saying that your site has become very slow. A bit worried, you fire up your web browser and find that your site is indeed painfully slow. When looking at the output of the web server log files you notice that one particular IP is literally flooding your web server with requests for the sample chapter. The user-agent string of the evildoer is set to "Red Bullet Downloader", which you gather is some sort of download manager that is misbehaving badly.

You start worrying about how long the user will hammer away at the server, but then you remember that the site runs ModSecurity, and your hope is restored. Logging into your account via SSH, you put the following line in your ModSecurity configuration file and then restart Apache:

SecRule REQUEST_HEADERS:User-Agent "Red Bullet" "deny,nolog"

When you next try to access your site, it is once again working smoothly and peace is restored.

In this example, REQUEST_HEADERS is a collection (we'll learn more about these shortly), containing all of the headers sent by the client. Since the particular header our brave web site owner was interested in is called User-Agent, he accessed it using REQUEST_HEADERS:User-Agent, which is the syntax to use when you want to get hold of a field in a collection. The next part, enclosed in double quotes, is a regular expression (we will learn more about these soon as well). Since the offending user agent string is "Red Bullet Downloader", the regular expression "Red Bullet" will match it, triggering the rule. The final part of the rule, deny,nolog, is the action list to apply when the rule matches. In this case, it specifies that the request should be denied (and kept out of the log files), and ModSecurity is happy to do so to ensure that the hero in our story doesn't lose any sleep over the misbehaving download manager.
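To make the three parts of the rule easier to spot, here is a sketch of the same rule with the implied @rx operator written out, followed by a variant that also sets the response status. The 403 status code is an assumption chosen for illustration and is not part of the story above.

```apache
# Target:   REQUEST_HEADERS:User-Agent
# Operator: @rx Red Bullet   (regular expression match, written out explicitly)
# Actions:  deny,nolog
SecRule REQUEST_HEADERS:User-Agent "@rx Red Bullet" "deny,nolog"

# Variant with an assumed status code: deny and answer with 403 Forbidden.
SecRule REQUEST_HEADERS:User-Agent "@rx Red Bullet" "deny,nolog,status:403"
```

You would use one form or the other, not both; the second simply makes the returned status code explicit instead of leaving it to the defaults.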

Variables and collections

Take a moment to have a closer look at just which variables are available for use. There are a lot, so I have placed them in Appendix A.

ModSecurity uses two types of variables: standard variables, which simply contain a single value, and collections, which can contain more than one value. One example of a collection is REQUEST_HEADERS, which contains all the headers sent by the client, such as User-Agent or Referer.

To access a field in a collection, you give the collection name followed by a colon and then the name of the item you want to access. So if for example we wanted to look at the referrer in a particular request we would use the following:

SecRule REQUEST_HEADERS:Referer "bad-referer\.com"

Note

As a side note, yes, the header name is actually (mis-)spelled referer and not referrer. The original HTTP specification contained this error, and the "referer" spelling has stuck—so if you're obsessive about spelling and are ever writing an Internet protocol specification, make sure you run the spell checker over the document before submitting the final draft, or you could well be kicking yourself for years to come.

Most collections can also be used on their own, without specifying a field, in which case they refer to the whole of the data in the collection. So for example, if you wanted to check all argument values for the presence of the string script you could use the following:

SecRule ARGS "script"

In practice, if the query string submitted was ?username=john&login=yes then the above would expand to this when the rule is evaluated:

SecRule ARGS:username|ARGS:login "script"

The following collections are available in ModSecurity 2.5:

  • ARGS

  • ENV

  • FILES

  • FILES_NAMES

  • FILES_SIZES

  • FILES_TMPNAMES

  • GEO

  • IP

  • REQUEST_COOKIES

  • REQUEST_COOKIES_NAMES

  • REQUEST_HEADERS

  • REQUEST_HEADERS_NAMES

  • RESPONSE_HEADERS

  • RESPONSE_HEADERS_NAMES

  • SESSION

  • TX

  • USER

Some collections have fixed fields, such as the GEO collection, which contains fields such as COUNTRY_NAME and CITY. Other collections, such as REQUEST_HEADERS have variable field names—in the case of REQUEST_HEADERS it depends on which headers were sent by the client.

It is never an error to specify a field name that doesn't exist or doesn't have a value set, so a rule that refers to REQUEST_HEADERS:Rubber-Ducky is always valid. If the client hasn't sent a Rubber-Ducky header, there is simply no value to test the operator against.

The transaction collection

The TX collection is also known as the transaction collection. You can use it to create your own variables if you need to store data during a transaction:

SecRule REQUEST_URI "passwd" "pass,setvar:tx.hackscore=+5"
SecRule REQUEST_URI "<script" "pass,setvar:tx.hackscore=+10"
SecRule TX:HACKSCORE "@gt 10" deny

In the first two rules we use the setvar action to set the collection variables. You use this action whenever you want to create or update a variable. (You can also remove variables by using the syntax setvar:!tx.hackscore as prefixing the variable with an exclamation mark removes it.)

The TX collection also contains the built-in fields TX:0 and TX:1 through TX:9. TX:0 is the value that matched when using the @rx or @pm operators (we will learn more about the latter operator later). TX:1 through TX:9 contain the captured regular expression values when evaluating a regular expression together with the capture action.
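As a minimal sketch of how the capture action fills these fields, the rule below assumes a hypothetical URL layout of /download/<name>.pdf. The path, the message text, and the use of logdata to print the captured value are all illustrative choices rather than anything prescribed by this chapter.

```apache
# Hypothetical URL layout: /download/<name>.pdf
# The parentheses form capture group 1; because the capture action is present,
# the text matched by the group is copied into TX:1 (TX:0 holds the whole match).
SecRule REQUEST_URI "^/download/([a-z0-9_-]+)\.pdf$" "capture,pass,log,msg:'PDF download requested',logdata:'%{TX.1}'"
```

Without the capture action the rule would still match, but TX:1 would not be populated.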

Storing data between requests

We have already seen that it is possible to use setvar to create a variable and assign a value to it. However, a variable stored in the TX collection expires and is no longer available once the current request has been handled. In some situations you will want to store data and access it in later requests.

There are three collections that can be used for this purpose:

  • IP

  • SESSION

  • USER

The IP collection is used to store information about a user from a specific IP address. It can be used to store such things as the number of failed access attempts to a resource, or the number of requests made by a user.

Before we can use one of these collections, we need to initialize it. This is done by using the initcol action:

SecAction initcol:ip=%{REMOTE_ADDR},nolog,pass

We also need to make sure that we have configured a data directory for ModSecurity to use:

SecDataDir /var/log/httpd/modsec_data

Make sure that the directory is writable by the Apache user or the initcol action will not work properly. Now that this is done we can use the IP collection in conjunction with setvar to store user-specific data.
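Putting these pieces together, here is a rough sketch of per-address counting. The file name /sample-chapter.pdf, the variable name downloads, and the thresholds are assumptions made up for this example. The expirevar action, which has not been introduced yet, makes a collection variable expire after the given number of seconds.

```apache
SecDataDir /var/log/httpd/modsec_data

# Load (or create) the persistent IP collection for the client's address.
SecAction initcol:ip=%{REMOTE_ADDR},nolog,pass

# Count requests for the hypothetical sample chapter. The counter expires
# 60 seconds after the most recent matching request.
SecRule REQUEST_URI "^/sample-chapter\.pdf" "pass,nolog,setvar:ip.downloads=+1,expirevar:ip.downloads=60"

# Deny once this address has made more than 10 such requests (assumed limit).
SecRule IP:DOWNLOADS "@gt 10" "deny,log,msg:'Too many sample chapter downloads'"
```

The order matters here: the initcol rule has to come first, since the IP collection is not available to the rules that follow until it has been initialized.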

Examining several variables

It is possible to look in several variables at once to see if a matching string can be found. If for example we wanted to examine both the request headers and the request arguments passed for the string park and deny any matching requests we could use the following rule:

SecRule ARGS|REQUEST_HEADERS "park" deny

As can be seen the pipe character (|) is used to separate the variable names, and it functions a lot like the logical or you might be familiar with if you've done any programming.
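The same separator also works when the target list mixes whole collections with single fields. In the following sketch the argument name comment is hypothetical:

```apache
# Examines one named argument, one named header, and every cookie value.
# The rule matches as soon as any one of them contains "park".
SecRule ARGS:comment|REQUEST_HEADERS:Referer|REQUEST_COOKIES "park" deny
```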

Quotes: Sometimes you need them and sometimes you don't

You may be wondering what the difference is between the following:

SecRule REQUEST_URI "secret" "deny"

and this:

SecRule REQUEST_URI secret deny

In this case there is no difference. If both the operator expression and action list don't contain any whitespace then they don't need to be enclosed in quotes. However, if the rule was modified to match the string secret place then we would need to enclose this string in quotes:

SecRule REQUEST_URI "secret place" deny

The essence of quotes as they apply to ModSecurity is that anything enclosed in quotes is considered as "one part", meaning that the "secret place" string is considered to be part of the operator expression of the rule.

What if we need to specify a string within the operator or action list, and it is already enclosed in quotes? This happens if for example we use the msg: action to write a log message. In this case we would use single quote characters to enclose the string we want to log:

SecRule REQUEST_URI "secret place" "deny,log,msg:'Someone tried to access the secret place!'"

What if even the quoted message needed to include quotes? Let's say that you wanted to log the message "Someone's trying to hack us!". In that case you would need to escape the innermost quote (the one in "Someone's") with a backslash. The rule would now look like this:

SecRule REQUEST_URI "secret place" "deny,log,msg:'Someone\'s trying to hack us!'"

In general throughout this book I tend to enclose operators and action lists in quotes even when not strictly necessary. It makes it easier to later expand on rules without forgetting the quotes.

Remember that you must restart Apache to reload the ModSecurity ruleset. If you forget to restart after editing the rules, or are distracted by another task, a broken ModSecurity configuration file (resulting, for example, from forgetting to wrap an action list in quotes) goes unnoticed until the next time Apache starts, at which point Apache refuses to start. This might not be a big deal so long as the server is running along nicely, but if anything such as log rotation were to cause the server to restart then the restart would fail and your web server would be down (and yes, the reason I mention this is because it happened to me in exactly the manner described, and I wouldn't want you making the same mistake).

Creating chained rules

Sometimes you want a match to trigger only if several conditions apply. Say for example that our web site owner from the previous example wanted to block the troublesome downloader, but this downloader was also used by other clients where it wasn't misbehaving by downloading the same file over and over. Also, for the sake of argument let's assume that it wouldn't be possible to just block the client's IP address as he was on DSL and frequently appeared with a new address.

What we'd want in this case is a rule that denies the request if the user-agent string contains "Red Bullet" and the IP address of the client belongs to the subnet range of a particular ISP.

Enter the chain action. Using this, we can create a chain of rules that only matches if all of the individual rules in the chain match. If you're familiar with programming, you can think of chained rules as rules with a logical and operator between them: if a single one of them doesn't match then the rule chain fails to match and no action in the action list is taken.

In the example we're looking at, the first rule in the chain would be the same as previously:

SecRule REQUEST_HEADERS:User-Agent "Red Bullet" "deny"

The second rule should trigger only if the client had an IP address within a particular range, say 192.168.1.0 to 192.168.1.255:

SecRule REMOTE_ADDR "^192\.168\.1\."

This rule triggers for any clients whose IP address starts with 192.168.1. As you can see we don't include any action list in the above rule. This is because in rule chains, only the first rule can contain disruptive actions such as deny, so we could not have placed the deny action in this rule. Instead, make sure you always place any disruptive actions in the first rule of a rule chain. In addition, metadata actions such as log, msg, id, rev, tag, severity, and logdata can also only appear in the first rule of a chain. If you try to put such an action anywhere but in a chain start rule, you'll get an error message when ModSecurity attempts to reload its rules.

Now all we need to do is specify the chain action in the first rule to chain the rules together. Putting it all together, this is what the rule chain looks like:

SecRule REQUEST_HEADERS:User-Agent "Red Bullet" "chain,deny"
SecRule REMOTE_ADDR "^192\.168\.1\."

You can chain an arbitrary number of rules together. If we had also wanted to add the condition that the rule should only be active before 6 PM, we would add another rule at the end of the chain, and make sure that the second rule also contained the chain action:

SecRule REQUEST_HEADERS:User-Agent "Red Bullet" "chain,deny"
SecRule REMOTE_ADDR "^192\.168\.1\." "chain"
SecRule TIME_HOUR "@lt 18"

The operator in the last rule, @lt, stands for "less than" and is one of the operators that can be used to compare numbers. We'll learn about all of the number comparison operators in a little while.

Rule IDs

You can assign an ID number to each rule by using the id action:

SecRule ARGS "login" "deny,id:1000"

This allows the rule to be identified for use with:

  • SecRuleRemoveById (removes the rule from the current context)

  • SecRuleUpdateActionById (updates a rule's action list)

  • skipAfter:nn (an action that jumps to just after the rule with the specified ID)
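As a small sketch of the first item in this list, the following removes the rule with ID 1000 for a single location only. The path /admin/help is a made-up example of a page that legitimately receives the word login in its arguments.

```apache
SecRule ARGS "login" "deny,id:1000"

# Hypothetical location where rule 1000 would cause false positives.
# The rule stays active everywhere else on the server.
<Location /admin/help>
    SecRuleRemoveById 1000
</Location>
```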

The SecMarker directive should be mentioned here. Its purpose is to create a marker, which is essentially a rule with just an ID number, for use with the action skipAfter.

The following example checks to see if the ModSecurity version is at least 2.5, and skips over a set of rules in case an older version that may not support them is installed:

SecRule MODSEC_BUILD "@lt 020500000" "skipAfter:1024"
...
Rules requiring version >= 2.5
...
SecMarker 1024

An introduction to regular expressions

Regular expressions are an important part of writing ModSecurity rules. That is why this section contains a short introduction to them and why the book also has an appendix that describes them in more detail.

Regular expressions are a very powerful tool when it comes to string matching. They are used to identify a string of interest, and are useful for many different tasks, such as searching through large text files for a given pattern, or, as used in ModSecurity, to define patterns which should trigger a rule match.

Programming languages such as Perl come with regular expression support built right into the syntax of the language (in fact, the PCRE library that was mentioned in the previous chapter that is used by Apache and ModSecurity is a re-implementation of the regular expression engine used in Perl). Even Java's String class has the matches() method which returns true if the string matches the given regular expression.

Regular expressions are so ubiquitous today that they are often referred to by the shorthand name regexp or regex. In this book, regular expression and regex are used interchangeably.

When ModSecurity uses regular expressions to match rules, it looks within the targeted text string (or strings) to see if the specified regex can be matched within. For example, the following rule will match any request protocol line that contains the string HTTP, such as HTTP/1.0 or HTTP/1.1:

SecRule REQUEST_PROTOCOL "HTTP"

In this way, the regular expressions you provide when writing ModSecurity rules function much like the Linux utility grep, searching for matching patterns in the given variables, and triggering a match if the pattern was found.

As we learned previously, @rx (the regular expression operator) is implied if no other operator is specified (and hence doesn't even need to be specified), so when ModSecurity encounters a rule that doesn't have an operator it will assume you want to match the target against a regular expression.

Examples of regular expressions

I'm not going to provide any formal specification of how regular expressions work here, but instead I will give a few short examples to allow you to get a better understanding of how they work. For a more complete overview, please see Appendix B which contains a primer on regular expressions.

The following examples cover some of the most common forms of regular expressions:

Regular expression   Matches
joy                  Any string containing the character j, followed by an o and a y. It thus matches joy and enjoy, among many others. Joyful, however, does not match as it contains an uppercase J.
[Jj]oy               Any string that contains an upper-case J or a lower-case j followed by o and y. Matches for example the strings Joy, joy, enjoy, and enJoy.
[0-9]                Any single digit from 0 to 9.
[a-zA-Z]             Any single letter in the range a-z, whether upper- or lower-case.
^                    Start of a string.
^Host                Host when it is found at the start of a string.
$                    End of a string.
^Host$               A string containing only the word Host.
. (dot)              Any character.
p.t                  pat, pet, and pzt, among others.

Regular expressions can contain metacharacters. We have already seen an example of these in the table above: ^, $, and "dot" don't match any one character but have other meaning within regular expressions (start of string, end of string and match any character in this case). The following table lists some additional metacharacters that are frequently used in regexes:

Metacharacter   Meaning
*               Match the preceding character or sequence 0 or more times.
?               Match the preceding character or sequence 0 or 1 times.
+               Match the preceding character or sequence 1 or more times.

So for example if we wanted to match favorite or favourite, we could use the regex favou?rite. Similarly, if we wanted to match either previous or previously we could use the regex previous(ly)?. The parentheses, (), group the ly characters and then apply the ? operator to the group to match it 0 or 1 times (therefore making it optional).

So what if we really do want to match a dot literally, and not have it interpreted as any character? In that case we need to escape the dot with a backslash. Referring back to our previous example, if we really did want to match the string p.t literally, we would use the regex p\.t to ensure that the dot is interpreted like a literal character and not a metacharacter by the regex engine.
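To show how these building blocks look inside actual rules, here are three small sketches. The targets and the decision to deny are arbitrary choices for illustration; only the regular expressions matter here.

```apache
# Optional character: matches arguments containing favorite or favourite.
SecRule ARGS "favou?rite" "deny"

# Optional group: matches arguments containing previous or previously.
SecRule ARGS "previous(ly)?" "deny"

# Escaped dot plus end-of-string anchor: matches only when the requested URI
# ends in the literal characters .bak (for example /index.php.bak).
SecRule REQUEST_URI "\.bak$" "deny"
```

Note that the second rule would match even without the (ly)? part, since previously already contains previous. Making the group optional matters mostly when the pattern is anchored or when the matched text is captured.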

More about regular expressions

If you are already familiar with regular expressions, the preceding examples probably didn't teach you anything new. If, however, you feel that you need to learn more about regexes before you feel comfortable with them, I encourage you to read Appendix B for a more in-depth look at how regexes work.

As we will see, there are several ways to match strings using operators other than @rx, but in many situations, regular expressions are the only tool that will get the job done, so it definitely pays to learn as much as you can about them.

Should you find that the appendix tickles your fancy and that you really want to learn about regexes then I can heartily recommend that you get a hold of "Mastering Regular Expressions" by Jeffrey E. F. Friedl (published by O'Reilly), which is the definitive guide to the subject.

Using @rx to block a remote host

To get a better understanding of the default regular expression mode (@rx) of matching, consider the following two rules, which are both equivalent:

# Rule 1
SecRule REMOTE_HOST "@rx \.microsoft\.com$" deny
# Rule 2
SecRule REMOTE_HOST "\.microsoft\.com$" deny

Both of the above rules do exactly the same thing: block any access attempts from users at Microsoft Corporation. The @rx operator is omitted in the second rule, but since the ModSecurity engine interprets the provided string as a regular expression if no other operator is specified, the rules will both match any domain name ending in .microsoft.com.

As we just learned, the reason there is a backslash before the dots (".") in the above rules is that the dot is a special character in regular expressions. On its own, a dot will match any character, which means that the regular expression .microsoft.com would match hostnames ending in .microsoft.com as well as xmicrosoft.com and others. To avoid this, we escape the dot with a backslash, which instructs the regular expression engine that we really do want it to match a dot, and not just any character.

You may also wonder why there is a $ sign at the end of the regular expressions above. Good question! The $ sign matches the end of the string. If we had not specified it, the regular expression would have matched other hostnames such as www.microsoft.com.mysite.com as well, which is probably not what we want.

Simple string matching

You may ask if there isn't a way to match a string that ends with a certain other sub-string, without having to bother with escaping dots and putting dollar signs at the ends of lines. There is actually an operator that does just that, called @endsWith. This operator returns true if the targeted value ends with the specified string. If, as in the example above, we wanted to block remote hosts from microsoft.com, we could do it by using @endsWith in the following manner:

SecRule REMOTE_HOST "@endsWith .microsoft.com" deny

If we wanted to negate the above rule, and instead block any domain that is not from Microsoft, we could have done it in the following way:

SecRule REMOTE_HOST "!@endsWith .microsoft.com" deny

It is good practice to use simple string matching whenever you don't need the power of regular expressions, as it is much easier to get a regular expression wrong than it is to get unexpected results with simple string matching.

The following lists the simple string operations that are available in the ModSecurity engine:

Operator

Description

@beginsWith

Matches strings that begin with the specified string.

Example:

SecRule REMOTE_HOST "@beginsWith host37.evilhacker"

@contains

Matches strings that contain the specified string anywhere.

Example:

SecRule REMOTE_HOST "@contains evilhacker"

@containsWord

Matches if the string contains the specified word. Words are understood to be separated by one or more non-alphanumeric characters, meaning that @containsWord secret will match "secret place" and "secret%&_place", but not "secretplace".

Example:

SecRule REQUEST_URI "@containsWord secret"

@endsWith

Matches strings that end with the specified string.

Example:

SecRule REMOTE_HOST "@endsWith evilhacker.com"

@streq

Matches strings that are exactly equal to the specified string.

Example:

SecRule REMOTE_HOST "@streq host37.evilhacker.com"

@within

This is deceptively similar to the @contains operator. However, the @within operator matches if the value contained in the variable we are matching against is found within the parameter supplied to the @within operator. An example will go a long way towards clearing up any confusion:

Example:

SecRule REMOTE_USER "@within tim,john,alice"

The above rule matches if the authenticated remote user is either tim, john, or alice.

All of the simple string comparison functions are case sensitive. This means that @streq apple will not match the string Apple, since the latter has a capital "A". To match strings regardless of their case, you can use a transformation function to transform the string to be compared into all-lowercase characters. We examine transformation functions in more detail in a later section of this chapter.

On a similar note, the actual operators are not case sensitive, so writing @StrEq works just as well as @streq.
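As a quick sketch of the case-insensitive approach just described, the rule below lowercases the User-Agent header before comparing it. Reusing the download manager from earlier in the chapter is my own choice of example.

```apache
# t:lowercase converts the header value to lowercase before the comparison,
# so the string given to @contains must itself be written in lowercase.
# Matches "Red Bullet Downloader", "RED BULLET", "red bullet", and so on.
SecRule REQUEST_HEADERS:User-Agent "@contains red bullet" "deny,t:lowercase"
```

Had the rule been written with "@contains Red Bullet" it could never match, because no uppercase characters are left in the value after the transformation.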

Matching numbers

Both regular expressions and the simple string matching operators work on character strings. As we saw in a previous example, using a regex to match against numbers can be error-prone, and regular expressions can often be cumbersome when you want to match against numbers. ModSecurity solves this problem by providing us with operators that can be used to compare numbers when we know that the arguments we are examining are numeric.

The following are the numerical operators ModSecurity provides:

Operator

Description

@eq

Matches if the variable contains a number that is equal to the specified value.

Example:

SecRule RESPONSE_STATUS "@eq 200"

This rule matches if the response code is 200.

@ge

Matches if the variable contains a number that is greater than or equal to the specified value.

Example:

SecRule RESPONSE_STATUS "@ge 400"

This rule matches if the response code is greater than or equal to 400. Since error codes are defined as having an HTTP status code of 400 or above, this rule can be used to detect HTTP error conditions, such as 404 (page not found).

@gt

Matches if the variable contains a number that is greater than the specified value.

Example:

SecRule RESPONSE_STATUS "@gt 399"

This rule will match the same HTTP response status codes as the one used above, with the difference being that this uses 399 as the argument since we are using the "greater than" operator.

@le

Matches if the variable contains a number that is less than or equal to the specified value.

Example:

SecRule RESPONSE_STATUS "@le 199"

This rule matches if the response code is 199 or below.

@lt

Matches if the variable contains a number that is less than the specified value.

Example:

SecRule RESPONSE_STATUS "@lt 200"

This rule also matches if the response code is 199 or below.
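The numerical operators work with any variable that holds a number, not just RESPONSE_STATUS. The following sketch shows two such uses. The one-megabyte limit and the maintenance window are assumptions picked purely for illustration.

```apache
# Deny requests that announce a body larger than 1 MB (1048576 bytes).
SecRule REQUEST_HEADERS:Content-Length "@gt 1048576" "deny"

# Range check built from two comparisons in a chain: deny requests that
# arrive during an assumed maintenance window, from 02:00 up to 03:59.
SecRule TIME_HOUR "@ge 2" "chain,deny"
SecRule TIME_HOUR "@lt 4"
```

Since a chain only matches when all of its rules match, combining @ge and @lt in this way is a simple means of testing whether a number lies within a range.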

More about collections

Let's look at some more things that can be done with collections, such as counting the number of items in a collection or filtering out collection variables using a regex.

Counting items in collections

You can count the number of items in a collection by prefixing it with an ampersand (&). For example, the following rule matches if the client does not send any cookies with the request:

SecRule &REQUEST_COOKIES "@eq 0"

You can also use the count operator to make sure that a certain field in a collection is present. If we wanted a rule to match if the User-Agent header was missing from a request we could use the following:

SecRule &REQUEST_HEADERS:User-Agent "@eq 0"

The above will match if the header is missing. If instead there is a User-Agent header but it is empty the count operator would return 1, so it is important to be aware that there is a difference between a missing field and an empty one.
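To cover both situations, a missing header and an empty one, you could pair the count with a test of the value itself. This is only a sketch, and denying such requests outright is an assumption about what you would want to do with them.

```apache
# Header absent altogether: the count is 0.
SecRule &REQUEST_HEADERS:User-Agent "@eq 0" "deny"

# Header present but empty: the count is 1, so test the value instead.
# The regex ^$ matches only the empty string.
SecRule REQUEST_HEADERS:User-Agent "^$" "deny"
```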

Note

It is perfectly valid for a query string or POST request to contain several arguments with the same name, as in the following example:

GET /buy/?product=widget&product=gizmo

If we counted the number of arguments named product by using &ARGS:product in a rule, the result would evaluate to two.
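If your application never expects an argument to be repeated, the count can be used to reject such requests. In this sketch the argument name product is taken from the example above, while the decision to deny is an assumption:

```apache
# Matches when the argument named product appears more than once,
# as in /buy/?product=widget&product=gizmo
SecRule &ARGS:product "@gt 1" "deny"
```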

Filtering collection fields using a regular expression

You can also use a regular expression to select only certain fields in a collection. For example, to select all arguments whose names contain the string arg, use the following construct:

SecRule ARGS:/arg/ "secret" deny

The regular expression selects any arguments whose name contains arg, so the above rule will match query strings such as arg1=secret phrase which contain the value secret, but it will not match if no argument name contains the string arg, since in that case the regular expression construct doesn't select any arguments at all from the collection.

You'll notice that the syntax used to select arguments differs from a normal collection declaration by the slashes surrounding the regular expression. We use the forward slashes to tell the rule engine to treat the string within the slashes as a regular expression. Had we omitted the slashes, only parameters with the exact name arg would have been selected and matched against.
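Field selection by regular expression becomes more useful when combined with anchors and a negated operator. The sketch below assumes a hypothetical naming convention in which every argument whose name starts with id_ must be purely numeric.

```apache
# /^id_/ selects arguments such as id_user and id_order (names starting with id_).
# The leading ! negates the operator, so the rule matches when a selected
# value does NOT consist solely of digits.
SecRule ARGS:/^id_/ "!@rx ^[0-9]+$" "deny"
```

With this rule, id_user=42 passes while id_user=42abc is denied. Arguments whose names don't start with id_ are never examined.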

Built-in fields

The collections IP, SESSION, and USER contain a number of built-in fields that can be used to get statistics about the creation time and update rate of each collection:

Built-in field     Description
CREATE_TIME        Date/time the collection was created.
IS_NEW             Set to 1 if the collection is new.
KEY                The value stored in the collection variable.
LAST_UPDATE_TIME   Date/time the collection was last updated.
TIMEOUT            Seconds until collection will be written to disk.
UPDATE_COUNTER     Number of times the collection has been updated since it was created.
UPDATE_RATE        Average number of updates to the collection per minute.

The CREATE_TIME and LAST_UPDATE_TIME fields contain a UNIX timestamp (number of seconds since January 1st, 1970), so keep that in mind if you ever need to convert these values to a human-readable format.

The KEY field contains the value stored in the collection variable when the collection was first initialized with initcol. The IP.KEY field would for example contain the IP address of the client.
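The built-in fields are read like any other collection field. As a rough sketch, the following logs the first request seen from an address by testing IS_NEW. The message text is made up, and the rule relies on the IP collection having been initialized earlier in the same request.

```apache
SecAction initcol:ip=%{REMOTE_ADDR},nolog,pass

# IS_NEW is 1 only while the collection has just been created, that is, when
# no stored record existed yet for this address.
SecRule IP:IS_NEW "@eq 1" "pass,log,msg:'First request seen from this address',logdata:'%{REMOTE_ADDR}'"
```

Keep in mind that "first" here means first since the stored record for that address was created; once a record expires, the next request from the same address counts as new again.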

Transformation functions

ModSecurity provides a number of transformation functions that you can apply to variables and collections. These transformations are done on a copy of the data being examined, meaning that the original HTTP request or response is never modified. The transformations are done before any rule matching is attempted against the data.

Transformation functions are useful for a variety of purposes. If you want to detect cross-site scripting attacks (see Chapter 6 for more on this), you would want to detect injected JavaScript code regardless of the case it was written in. To do this the transformation function lowercase can be applied and the comparison can then be done against a lowercase string.

To apply a transformation function, you specify t: followed by the name of the function and then put this in the action list for the rule. For example, to convert the request arguments to all-lowercase, you would use t:lowercase, like so:

SecRule ARGS "<script" "deny,t:lowercase"

This denies any request in which an argument contains the string <script, regardless of which case the string is in (for example, <Script, <ScrIPt, and <SCRIPT would all be blocked).
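To make the "transform a copy, then match" idea concrete, here is a small Python sketch. It only imitates the behavior described above and is not how ModSecurity is implemented; the helper names lowercase, compress_whitespace, and rule_matches are invented for the example.

```python
import re


def lowercase(value: str) -> str:
    """Imitates t:lowercase."""
    return value.lower()


def compress_whitespace(value: str) -> str:
    """Imitates t:compressWhitespace as described in the table of functions."""
    value = re.sub(r"[\t\n\r\f]", " ", value)   # whitespace characters become spaces
    return re.sub(r" {2,}", " ", value)          # runs of spaces collapse to one


def rule_matches(value: str, pattern: str, transformations) -> bool:
    """Apply each transformation to a working copy, then run the regex on the copy."""
    working_copy = value
    for transform in transformations:
        working_copy = transform(working_copy)
    return re.search(pattern, working_copy) is not None


original = "<ScrIPt>alert(1)</ScrIPt>"
print(rule_matches(original, "<script", [lowercase]))  # True
print(rule_matches(original, "<script", []))           # False
print(original)                                        # the input itself is unchanged
```

Note that the pattern is written in lowercase because it is compared against the transformed copy, not against the original input.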

These are the transformation functions available:

Transformation function  Description
base64Encode             Encodes the string using Base64 encoding.
base64Decode             Decodes a Base64-encoded string.
compressWhitespace       Converts tab, newline, carriage return, and form feed characters to spaces (ASCII 32), and then converts multiple consecutive spaces to a single space character.
cssDecode                Decode CSS-encoded characters.
escapeSeqDecode          Decode ANSI C escape sequences (\n, \r, \\, \?, \", and so on).
hexEncode                Encode a string using hex encoding (for example, encode A to 41).
hexDecode                Decode a hex encoded string.
htmlEntityDecode         Decode HTML-encoded entities (for example, convert &lt; to <).
jsDecode                 Decode JavaScript escape sequences (for example, decode \' to ').
length                   Convert a string to its numeric length.
lowercase                Convert a string to all-lowercase characters.
md5                      Convert the input to its MD5 cryptographic hash sum.
none                     Remove all transformation functions associated with the current rule.
normalisePath            Replaces multiple forward slashes with a single forward slash and removes directory self-references.
normalisePathWin         Same as normalisePath but also converts backslashes to forward slashes when run on a Windows platform.
parityEven7bit           Calculates an even parity bit for 7-bit data and replaces the eighth bit of each target byte with the calculated parity bit.
parityOdd7bit            Calculates an odd parity bit for 7-bit data and replaces the eighth bit of each target byte with the calculated parity bit.
parityZero7bit           Calculates a zero parity bit for 7-bit data and replaces the eighth bit of each target byte with the calculated parity bit.
removeNulls              Remove null bytes from the string.
removeWhitespace         Remove all whitespace characters from the string.
replaceComments          Replace C-style comments (/* ... */) with a single space character. Opened comments (/*) that have not been terminated will also be replaced with a space character.
replaceNulls             Replace null bytes in the string with space characters.
urlDecode                Decodes a URL-encoded string.
urlDecodeUni             Same as urlDecode, but also handles encoded Unicode characters (%uXXXX).
urlEncode                URL encodes the string.
sha1                     Convert the input string to its SHA1 cryptographic hash sum.
trimLeft                 Remove any whitespace at the beginning of the string.
trimRight                Remove any whitespace at the end of the string.
trim                     Remove whitespace from both the beginning and end of the string.

Other operators

Let's look at some additional operators that can be used to operate on data. We have already seen the regular expression, simple string comparison, and numerical comparison operators, and here we take a look at some further ones that are available for use.

Set-based pattern matching with @pm and @pmFromFile

We have seen how to write regular expressions that match one of several alternative words. For example, to match red, green, or blue we would use the regex (red|green|blue). ModSecurity has two "phrase matching" operators that can be used to match a set of words: @pm and @pmFromFile.

The @pm version of our color-matching example would look like this:

SecRule ARGS "@pm red green blue" deny

This will trigger if an argument contains any of the strings red, green, or blue. As with the regex operator, a partial match is enough, so a query string of the form ?color=cobaltblue would trigger a match since the argument value contains the string blue.

Set-based pattern matching has several advantages:

  • It is slightly easier to read and write rules using the @pm operator than the equivalent regex syntax (...|...|...). Also, as we will shortly see, the @pmFromFile operator allows us to externalize the list of phrases to match against so that it is contained in a separate file.

  • Another advantage is that set-based pattern matching is faster than utilizing regular expressions. This is because the @pm and @pmFromFile operators use an algorithm known as the Aho-Corasick algorithm. This algorithm is guaranteed to run in linear time (meaning that as the size of the string and phrases increases, the time required to look for matches goes up only in a linear fashion). So for applications where you need to look for a large number of strings (such as known bad URLs in the Referer header, for example), using @pm or @pmFromFile would guarantee the best performance.
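To get a feel for why this kind of matching scales well, here is a minimal Python sketch of the Aho-Corasick idea: the phrases are compiled once into an automaton, and the input is then scanned a single time no matter how many phrases there are. This is an illustrative sketch only, not ModSecurity's actual implementation, and the function names build_automaton and find_phrases are made up for the example.

```python
from collections import deque


def build_automaton(phrases):
    """Build trie transitions, failure links, and output sets for the phrases."""
    goto = [{}]     # goto[state][char] -> next state
    fail = [0]      # fail[state] -> state to fall back to on a mismatch
    out = [set()]   # out[state] -> phrases that end when this state is reached
    for phrase in phrases:
        state = 0
        for ch in phrase:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(phrase)

    # Breadth-first pass: compute failure links level by level.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]   # inherit phrases that end at the fallback state
    return goto, fail, out


def find_phrases(text, automaton):
    """Scan the text once and return every phrase found anywhere inside it."""
    goto, fail, out = automaton
    state, found = 0, set()
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        found |= out[state]
    return found


colors = build_automaton(["red", "green", "blue"])
print(find_phrases("cobaltblue", colors))  # {'blue'}
print(find_phrases("yellow", colors))      # set()
```

The automaton is built once, which loosely mirrors how the phrase list is read when the rules are loaded; each request value is then checked with a single pass over its characters.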

@pmFromFile

If you have a long list of words to match, it can be inconvenient to list all of them in your ModSecurity configuration file. For example, imagine you had a long list of disallowed colors:

red green blue yellow magenta cyan orange maroon pink black white gray grey violet purple brown tan olive

Instead of putting all of these in a rule, we can put the entire list of words in a separate file and then refer to it using the @pmFromFile operator. To do so, create the file you want to save the words in (we'll use /usr/local/colors.txt in this example), and then enter the words in the file, one per line. The file colors.txt starts out as follows:

red
green
blue
...

And this is the rule that utilizes the file together with the @pmFromFile operator:

SecRule ARGS "@pmFromFile /usr/local/colors.txt" deny

What this does is read the list of words from the file /usr/local/colors.txt and then execute the phrase-matching against the word list in the same way as if we'd used the @pm operator.

One subtle difference between @pm and @pmFromFile is that the latter also works with phrases. So if we substituted red apple for red in our colors.txt file, the rule would match any argument whose value contained red apple, but not one where the value contained only red.

The phrases in an external file are incorporated into the ModSecurity ruleset when the rules are read (that is, when Apache is restarted), so if you modify the list you will need to restart the web server before the changes take effect.

Performance of the phrase matching operators

How much faster are the phrase matching operators when compared to a regular expression? Let's look at the above rule and see how long it takes to execute when we use the regex version. This is from the ModSecurity debug log when utilizing the regex version of the rule:

Executing operator "rx" with param "(?:red|green|blue)" against ARGS:x.
Target value: "red"
Operator completed in 11 usec.

The regular expression we used for this rule is slightly different than the first version at the start of this section. Instead of using just parentheses it uses what is called non-capturing parentheses. Non-capturing parentheses are the unsightly (?: ) construct you see above. As the name implies, non-capturing parentheses don't capture any backreferences for later use. The reason to use these in this example is that we don't want the regex engine to do any extra work to capture and store a reference to the matched value since that would slow it down and skew the comparison results.

Here is the debug log output when using the rule that utilizes the @pm operator:

Executing operator "pm" with param "red green blue" against ARGS:x.
Target value: "red"
Operator completed in 6 usec.

This time the operation completed in 6 microseconds instead of 11, which means we've shaved roughly half the processing time off by using @pm instead of the regex. You may think that this is a contrived example and that it's hard to draw any conclusions from using such a short list of words to match against. However, for even larger lists of words (where there might be thousands or even tens of thousands of words), the reduction in processing time will be even more dramatic than in this example, so keep that in mind when writing rules.

Validating character ranges

In later chapters we will learn more about using a positive security model. A positive security model means that instead of trying to detect malicious data, we assume that all data is malicious and then only allow through exactly that which we determine to be valid requests. The operator @validateByteRange is useful for this purpose: you can use it to make sure that every character in a value falls within an allowed range. For example, you would probably want an argument that contains a username to only contain the letters a-z and A-Z and the digits 0-9. Ensuring this is easy using @validateByteRange:

# Only allow reasonable characters in usernames
SecRule ARGS:username "@validateByteRange 48-57, 65-90, 97-122, 45, 95"

The range 48-57 corresponds to ASCII characters 0..9, 65-90 is A..Z, and 97-122 is a..z. The ASCII codes for dash (45) and underscore (95) are also included so that these characters can be used in a username.

The above rule will block any attempt to provide a username argument that contains any characters except those allowed. Consult an ASCII chart to find out which ranges you need to allow. Separate ranges and numbers using commas and make sure that all numbers are input in decimal notation.
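If it helps to see the check spelled out, the following Python sketch parses a range specification written in the same comma-separated, decimal style and reports whether a value contains any byte outside the allowed set. It is only an approximation of the idea for illustration; the names parse_byte_ranges and has_byte_outside_range are assumptions and do not exist in ModSecurity.

```python
def parse_byte_ranges(spec: str) -> set:
    """Turn a spec such as "48-57, 65-90, 45" into a set of allowed byte values."""
    allowed = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            allowed.update(range(int(low), int(high) + 1))
        else:
            allowed.add(int(part))
    return allowed


def has_byte_outside_range(value: bytes, spec: str) -> bool:
    """Return True when at least one byte of value is not in the allowed set."""
    allowed = parse_byte_ranges(spec)
    return any(byte not in allowed for byte in value)


# The same ranges as in the username rule: 0-9, A-Z, a-z, dash, underscore.
USERNAME_RANGES = "48-57, 65-90, 97-122, 45, 95"

print(has_byte_outside_range(b"neo_42-A", USERNAME_RANGES))  # False: every byte is allowed
print(has_byte_outside_range(b"john doe", USERNAME_RANGES))  # True: the space (32) is not allowed
```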

Phases and rule ordering

It is important to understand in which order ModSecurity evaluates rules. This makes you more comfortable when creating your own rules and avoids situations where things are unexpectedly blocked or allowed even though you expect the opposite to happen.

We learned in Chapter 1 that the rule engine divides requests into five phases:

  1. REQUEST_HEADERS (phase 1)

  2. REQUEST_BODY (phase 2)

  3. RESPONSE_HEADERS (phase 3)

  4. RESPONSE_BODY (phase 4)

  5. LOGGING (phase 5)

Rules are executed strictly in a phase-by-phase order. This means that ModSecurity first evaluates all rules in phase 1 ("request headers") for a match. It then proceeds with phases 2 through 5 (unless a rule match causes processing to stop).

Within phases, rules are processed in the order in which they appear in the configuration files. You can think of the ModSecurity engine as going through the configuration files five times; one time for each processing phase. During each pass, the engine considers only rules belonging to the phase it is currently processing, and those rules are applied in the order they appear in the files.
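The "five passes over the configuration" mental model can be sketched in a few lines of Python. This is purely a way to visualize the ordering and makes no claim about how the engine is built internally; the Rule type, the rule names, and evaluation_order are all hypothetical.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    name: str    # hypothetical label for the rule
    phase: int   # processing phase, 1 through 5


def evaluation_order(rules_in_config_order):
    """One pass per phase; within a pass, keep the order from the configuration."""
    ordered = []
    for phase in range(1, 6):
        ordered.extend(rule.name for rule in rules_in_config_order if rule.phase == phase)
    return ordered


config = [
    Rule("log-summary", 5),
    Rule("check-body", 2),
    Rule("check-headers", 1),
    Rule("check-args", 2),
]
print(evaluation_order(config))
# ['check-headers', 'check-body', 'check-args', 'log-summary']
```

Even though log-summary appears first in this imaginary configuration, it is evaluated last because it belongs to phase 5, while the two phase 2 rules keep their relative order.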

The logging phase is special in that it will always be executed even if a request has been allowed or denied in one of the previous phases. Also, once the logging phase has started, you cannot perform any disruptive actions as the response has already been sent to the client. This means that you must be careful not to let any default disruptive action specified by SecDefaultAction be inherited into any phase 5 rules; doing so is a configuration error, and you will be unable to restart Apache while it is present. If you place the following directive before any phase 5 rules (but after rules for earlier phases), that will prevent this error from occurring:

SecDefaultAction "phase:5,pass"

Actions: what to do when a rule matches

When a rule matches you have several options: You can allow the request, you can deny it or you can opt to take no action at the moment but continue processing with the next rule. There are also several other things you can do like redirect or proxy requests. In this section you'll learn in more detail about the options that are available.

Allowing requests

The way the allow action works differs depending on how a rule is written. An allow action can be configured to work in one of three ways:

  1. Allow access immediately and skip remaining phases (except for logging).

    This is the case if allow is specified on its own, as in SecAction allow.

  2. Allow access to the current phase only.

    Specify allow:phase to allow in the current phase. Rule processing then continues immediately with the next phase, and rules in this and subsequent phases may then override the allow with a deny action.

  3. Allow access to the request phases only.

    Specify allow:request to allow in the request phases (1 and 2) only. Rule processing then continues immediately with phase 3 (response headers), and rules in this and subsequent phases may then override the allow with a deny action.

Blocking requests

To block a request you use the deny action. This has the effect of immediately stopping any further rule processing and denying the request with the HTTP error code that was specified in the rule or inherited as the default action.

There is another action called block, which sounds like it would be similar to deny. This action is deceptive however, as in its current form it can be used to both deny and allow requests, depending on what the default action specifies. One way this would help is not having to modify every rule if you wanted to change the disruptive action from deny to allow, for example. The intention of the ModSecurity authors is to expand the capabilities of block in the future, but I do not recommend using it at the current time.

Taking no action but continuing rule processing

Sometimes we want rule processing to continue even when a rule matches. In this case, we use the pass action to tell the rule engine to continue processing the next rule even if this one matches, like so:

SecRule REMOTE_ADDR "^192\." "pass,log,logdata:'Suspicious IP address'"

Any rule that is in place to perform an action (such as executing a script) but where you don't want to block the request should have a pass action in its action list.

Dropping requests

Using the drop action results in the active TCP connection to the client immediately being closed by sending a TCP FIN packet. This action is useful when responding to Denial of Service attacks since it will preserve server resources such as limited-size connection tables to the greatest extent possible.

Redirecting and proxying requests

Requests that you think should be handled by another server can be redirected. This is done using the redirect action, and the effect is that the rule engine immediately stops any further processing and sends an HTTP status 302 redirect response to the client. The following rule redirects any matching requests to Google:

SecRule REQUEST_BASENAME "search.php"  "redirect:http://www.google.com"

It is important that you specify http:// before the server hostname if you want to redirect a request to a different server. If, in the example above, we had written the redirect string as redirect:www.google.com, the visitor would have ended up being redirected to http://www.ourserver.com/www.google.com, which is not what we intended.
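The outcome described above is what you would expect from ordinary relative-URL resolution, where a target without a scheme is interpreted relative to the URL that was requested. The Python sketch below demonstrates that general behavior with the standard library; it does not show ModSecurity's own code, and the request URL (the hostname from the text combined with the search.php path from the rule) and the function name resolve_redirect are assumptions for the example.

```python
from urllib.parse import urljoin


def resolve_redirect(request_url: str, target: str) -> str:
    """Resolve a redirect target the way a relative URL reference is resolved."""
    return urljoin(request_url, target)


# Assumed URL of the original request.
request_url = "http://www.ourserver.com/search.php"

print(resolve_redirect(request_url, "www.google.com"))
# http://www.ourserver.com/www.google.com
print(resolve_redirect(request_url, "http://www.google.com"))
# http://www.google.com
```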

You can also proxy requests. In web server terms, proxying means forwarding a request to another server and letting it deal with it. After the forwarded request has been handled the result is returned to the client via the original server. This means that to the client, it looks like the original server handled the request all along.

To proxy a request using ModSecurity we use the proxy action:

SecRule IP:Attacker "1" proxy:http://10.10.10.101/

Proxying allows you to do sneaky things like redirect any requests that you consider to be attacks to a honeypot web server and let it deal with it. A honeypot, when the term is used in information security, is a dedicated server that is allowed to attract hackers. The goal is to have the honeypot lure the hackers in, all the while making them think that they are trying to hack into a legitimate server. This serves the purpose of deflecting any attacks away from your real servers, and can also be used to create excellent deceptive effects by planting plausible-looking but completely false data on the honeypot server.

To proxy requests, your Apache server must have the mod_proxy module dynamically loaded or statically compiled in.

SecAction

Using SecAction, you can execute any number of actions unconditionally. The syntax is:

SecAction Actions

As an example, the following SecAction logs a message to the HTTP error log file:

SecAction "pass,log,logdata:'Testing SecAction'"

It is important to specify pass in a SecAction directive, as the default action will be prepended to the action list just as with a normal SecRule. If the default action is to deny the request then the SecAction above would have denied all requests if pass was missing.

Using the ctl action to control the rule engine

The ctl action allows for fine-grained control of the rule engine. Using this action you can configure the engine on a per-transaction basis. The following parameters are supported:

Parameter             Corresponding directive  Description
auditEngine           SecAuditEngine           Turn the audit engine on or off.
auditLogParts         SecAuditLogParts         Define what data to include in audit logs.
debugLogLevel         SecDebugLogLevel         Change the debug log level.
ruleRemoveById        SecRuleRemoveById        Remove a rule or a range of rules.
requestBodyAccess     SecRequestBodyAccess     Turn request body access on or off.
requestBodyLimit      SecRequestBodyLimit      Set the request body limit.
requestBodyProcessor  N/A                      Configure the request body processor.
responseBodyAccess    SecResponseBodyAccess    Turn response body access on or off.
responseBodyLimit     SecResponseBodyLimit     Set the response body limit.
ruleEngine            SecRuleEngine            Turn the rule engine on or off, or configure it for detection only.

As can be seen from the table, almost all of the parameters to ctl correspond to one of the ModSecurity configuration directives. The exception is the requestBodyProcessor parameter, which we will discuss in more detail shortly.

How to use the ctl action

As an example, if during a request you notice that you don't want the rule engine to process any further rules you can use ctl:ruleEngine=off in the action list of a rule to stop the engine for the remainder of the request.

The ctl:requestBodyProcessor action doesn't correspond to any directive. Instead, this action can be used to set the module used to parse request bodies. Currently, this is used to allow the parsing of XML request bodies. If you need to be able to parse XML data submitted by clients, you should use the following rule to enable the XML processor and instruct it to parse XML request bodies:

SecRule REQUEST_HEADERS:Content-Type "^text/xml$" "phase:1,nolog,pass,ctl:requestBodyProcessor=XML,ctl:requestBodyAccess=On"

With the above rule in place, any POST request with the content type text/xml will be parsed by the XML parser. You can then access the data by specifying the XML collection together with an XPath expression:

SecRule XML:/persons/person/name/text() "Neo"

XPath expressions are a way to get to specific data in an XML document. The above rule would evaluate all of the <name> nodes contained in the following XML document and trigger a match since one of the nodes contains the string Neo:

<persons>
  <person>
    <name>John</name>
  </person>
  <person>
    <name>Neo</name>
  </person>
</persons>
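To see what selecting the name nodes from such a document looks like in practice, here is a short Python sketch using the standard library's ElementTree module. ElementTree supports only a subset of XPath and evaluates paths relative to the root element, so the path used here is ./person/name rather than the expression from the rule. The function name name_values is invented for the example.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """
<persons>
  <person>
    <name>John</name>
  </person>
  <person>
    <name>Neo</name>
  </person>
</persons>
"""


def name_values(xml_text: str):
    """Return the text of every <name> node found under a <person> element."""
    root = ET.fromstring(xml_text.strip())   # root is the <persons> element
    return [node.text for node in root.findall("./person/name")]


names = name_values(DOCUMENT)
print(names)                             # ['John', 'Neo']
print(any("Neo" in n for n in names))    # True: one of the nodes contains "Neo"
```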

Macro expansion

You can include data from variables or collections in log messages or when you initialize other variables. This is called macro expansion, and is done by writing a percent sign followed by the variable or collection name enclosed in curly braces:

SecAction "pass,setenv:ADDR=%{REMOTE_ADDR}"

It is important to note that when specifying a collection field in an action list you need to separate the collection name from the field name with a dot and not a colon:

SecRule ARGS "test" "log,capture,msg:'%{TX.0}'"

Macro expansion is not currently supported in all circumstances (for example, the append and prepend actions currently don't support macro expansion).
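Conceptually, macro expansion is a search-and-replace over the text of an action parameter. The following Python sketch mimics that idea for the %{NAME} syntax; it is an illustration rather than ModSecurity's implementation. The function name expand_macros is hypothetical, and replacing an unknown name with an empty string is an assumption made for this sketch.

```python
import re

# Matches %{NAME}, where NAME may contain a dot, as in TX.0.
MACRO = re.compile(r"%\{([^}]+)\}")


def expand_macros(text: str, variables: dict) -> str:
    """Replace each %{NAME} with its value (assumption: unknown names become empty)."""
    def replace(match):
        return str(variables.get(match.group(1), ""))
    return MACRO.sub(replace, text)


# Hypothetical values for one transaction.
variables = {"REMOTE_ADDR": "192.0.2.10", "TX.0": "test"}

print(expand_macros("ADDR=%{REMOTE_ADDR}", variables))   # ADDR=192.0.2.10
print(expand_macros("Matched: %{TX.0}", variables))      # Matched: test
```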

SecRule in practice

Alright, now that we have had a look at the theory of writing rules, let's start doing some real work by writing rules for more real-life situations. In this section we will look at several examples of how to write rules and rule chains to accomplish a given task.

Blocking uncommon request methods

The three most commonly used HTTP request methods are GET, POST and HEAD. You might be surprised to learn that the HTTP specification actually defines many more methods; if a web server supports the WebDAV (Web-based Distributed Authoring and Versioning) extensions, the total number of methods becomes almost 30. As an example, here are the request methods implemented by the latest version of Apache:

GET

PUT

POST

CONNECT

OPTIONS

TRACE

PROPFIND

PROPPATCH

MKCOL

MOVE

LOCK

UNLOCK

CHECKOUT

UNCHECKOUT

DELETE

PATCH

COPY

VERSION_CONTROL

CHECKIN

UPDATE

LABEL

REPORT

MKWORKSPACE

MKACTIVITY

BASELINE_CONTROL

MERGE

INVALID

Unless we had good reason to allow any of the less common methods, it would be good practice to block any but the commonly used ones. This instantly blocks any potential vulnerability that might be present in the Apache source code for the handling of non-standard methods.

This rule blocks all HTTP methods except for GET, POST, and HEAD:

SecRule REQUEST_METHOD "!^(GET|POST|HEAD)$" "deny,status:405"

We use the HTTP error code 405 (Method Not Allowed) to block any such non-standard method access attempts.

Restricting access to certain times of day

Suppose we wanted to restrict access to our web site so that it was only available during business hours (don't laugh, a certain UK government web site actually closes its company information search service at night). To accomplish this, we can use the variable TIME_HOUR together with a regular expression so that our site can only be accessed between 8 AM and 6 PM (that is, during the hours 8 through 17):

SecRule TIME_HOUR "!^(8|9|10|11|12|13|14|15|16|17)$" deny

This rule contains a list of the "allowed" hours during which users can access the site. The hour format is based on the 24-hour clock, so 1 means 1 AM and 13 is used to represent 1 PM. The pipe character (|) is a feature of regular expressions that specifies that any of the hours can match, in effect making it an "or" operator.

There are three additional important characters in the above regex that we'd do well to explore a bit more. They are the exclamation mark (!), the caret (^) and the dollar sign ($). Let's start with the caret and dollar sign, since they are related. The caret matches the beginning of a string. If we hadn't used it in this example then the 8 would have matched both the hour 8 as well as the hour 18, which would have given users access to the site during the hour starting at 6 PM even though that wasn't our intention.

Similarly, the dollar sign ($) matches the end of a string. By preceding the list of allowed hours with a caret, and terminating it with a dollar sign, we make sure that only the listed hours will match, and so avoid any unpleasant surprises.

You may notice that preceding the list of allowed hours is an exclamation mark. This is used to negate the result of the operator. Since we want the rule to match when the hour is outside the list of allowable hours (and thus block the request), we use the exclamation mark to trigger the rule whenever the hour is outside the given range.

We could of course also have done away with the negation operator and simply specified the "forbidden" hours 0-7 and 18-23, but this would have created a slightly longer regular expression where we would have had to specify 14 separate hours instead of just the 10 in the example above.

An important point to consider is that the negation applies to the whole regular expression. Thus, the exclamation mark above does not apply solely to the first number (8) above, but to the entire regular expression that follows, namely the list of all the hours between 8 and 17.
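The effect of the anchors and of the negation is easy to verify with a few lines of Python, whose regular expressions behave the same way for this simple pattern. The sketch assumes the hour is presented to the regex as a plain number without a leading zero; the names ALLOWED, UNANCHORED, and is_blocked are made up for the example.

```python
import re

ALLOWED = re.compile(r"^(8|9|10|11|12|13|14|15|16|17)$")
UNANCHORED = re.compile(r"(8|9|10|11|12|13|14|15|16|17)")


def is_blocked(hour: int) -> bool:
    """Mimic the leading "!": the rule fires when the whole regex does NOT match."""
    # Assumption: the hour is rendered without a leading zero, e.g. "8" rather than "08".
    return ALLOWED.search(str(hour)) is None


blocked_hours = [hour for hour in range(24) if is_blocked(hour)]
print(blocked_hours)
# [0, 1, 2, 3, 4, 5, 6, 7, 18, 19, 20, 21, 22, 23]

# Without the anchors, the 8 inside 18 is enough for a match.
print(UNANCHORED.search("18") is not None)   # True
print(ALLOWED.search("18") is not None)      # False
```

The list of blocked hours contains exactly the 14 "forbidden" hours, which confirms that the negation applies to the regular expression as a whole.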

Detecting credit card leaks

Suppose you had a database on your network that contains customer information such as the names and addresses of customers as well as the credit card numbers they used when making purchases. It would be a bad thing if a hacker was able to get a hold of the credit card numbers stored in the database, so of course you would want to use best practices such as encrypted database files and a separate database server to store the card numbers.

However, suppose that in spite of all this, a hacker was able to leverage a programming error in your web site's administrative interface to get a hold of database records. If this were to happen, he could simply use a web browser to access credit card numbers, perhaps by using some clever SQL injection techniques. Fortunately, we can use ModSecurity as a last line of defense against this kind of disaster!

We do this by examining the HTTP response data that is sent back to clients. ModSecurity contains an operator called @verifyCC, which takes a regular expression as its argument. When this regular expression matches, the matched text is passed to another algorithm to validate it as a credit card number. If the algorithm returns true we can block the response from being sent back, because it likely contains a credit card number. This is the way to write a rule to do that:

SecRule RESPONSE_BODY "@verifyCC \d{13,16}" "phase:4,deny,t:removeWhitespace,log,msg:'Possible credit card number leak detected'"

Detecting credit card numbers

All the common credit cards in use today (Visa, MasterCard, American Express and others) have card numbers that are between 13 and 16 digits in length. We therefore use a regex to detect sequences of numbers of this length.

It is very important that we have set SecResponseBodyAccess to On, or ModSecurity will be unable to examine the response body for card numbers.

In the example above, the response body is examined to detect any such likely credit card number of the correct length. We use the t:removeWhitespace transformation function to enable us to detect card numbers even if the digits are separated by whitespace.

If found, the number is singled out for further inspection by the @verifyCC operator. If the number passes the credit card validation algorithm the request is denied and we log a message about the event.
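The first stage of this process, removing whitespace and then picking out digit sequences of a plausible length, can be imitated in Python as follows. This is a sketch of the idea only and says nothing about how ModSecurity implements it; the function name candidate_card_numbers and the sample response text are assumptions.

```python
import re


def candidate_card_numbers(body: str):
    """Strip all whitespace, then collect every run of 13 to 16 digits."""
    compact = re.sub(r"\s+", "", body)        # comparable in spirit to t:removeWhitespace
    return re.findall(r"\d{13,16}", compact)  # the same pattern as in the rule


# Hypothetical response text containing a well-known test card number.
print(candidate_card_numbers("Card: 4012 8888 8888 1881 exp 12"))
# ['4012888888881881']
print(candidate_card_numbers("Order 12345 shipped"))
# []
```

Each candidate found this way would still have to pass the checksum validation before being treated as a likely card number.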

The Luhn algorithm and false positives

The matched regular expression in our rule above is passed by @verifyCC to an algorithm called the Luhn algorithm. All credit card numbers in use today have numbers that are verifiable by this algorithm. If, for example, you were to take a valid credit card number and change a single digit, the Luhn algorithm would no longer validate it as a card number.

The Luhn algorithm uses a fairly simple checksumming method to verify card numbers: It starts by multiplying the last digit of the card number by one and the next-to-last digit by two. It then continues through the remaining digits, alternating between the factors one and two. The digits of the resulting products are then added together, so that a product such as 16 contributes 1 + 6. If the sum that results from this addition is divisible by 10, the credit card number passes the validation.

As an example, let's take a look at the commonly used test card number 4012888888881881. Multiplying the last number with 1, the next to last number with 2, and so on, we get the following result:

     1   8   8   1   8   8   8   8   8   8   8   8   2   1   0   4
x    1   2   1   2   1   2   1   2   1   2   1   2   1   2   1   2
------------------------------------------------------------------
     1  16   8   2   8  16   8  16   8  16   8  16   2   2   0   8

Now, we sum up all the digits in the string "116828168168168162208", and get the following result:

1+1+6+8+2+8+1+6+8+1+6+8+1+6+8+1+6+2+2+0+8 = 90

Since 90 is evenly divisible by 10, this card number passes the validation check.
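The calculation above translates directly into code. Here is a small Python version of the checksum, written for illustration; it follows the steps just described and is not taken from the ModSecurity source. The function names luhn_checksum and luhn_valid are our own.

```python
def luhn_checksum(number: str) -> int:
    """Sum the digits of each product, working from the last digit backwards."""
    total = 0
    for position, char in enumerate(reversed(number)):
        factor = 2 if position % 2 else 1     # factors alternate 1, 2, 1, 2, ...
        product = int(char) * factor
        total += product // 10 + product % 10  # 16 contributes 1 + 6
    return total


def luhn_valid(number: str) -> bool:
    """A number passes when its checksum is evenly divisible by 10."""
    return luhn_checksum(number) % 10 == 0


print(luhn_checksum("4012888888881881"))  # 90
print(luhn_valid("4012888888881881"))     # True
print(luhn_valid("4012888888881882"))     # False: one digit changed
```

Changing the last digit from 1 to 2 raises the sum to 91, which is no longer divisible by 10, so the altered number fails the check.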

False positive matches are possible with the Luhn algorithm (meaning it could validate a number as a credit card number even though it is in fact some other, non-credit card number). Since the algorithm uses a digit between zero and nine for the checksum, it has a 10% false positive ratio. However, this is only for the numbers of the correct length that we have singled out. If you should encounter a scenario where you get false positive detections you can always add a rule to exclude the page in question from credit card validation checks.

Tracking the geographical location of your visitors

An IP address or a hostname by itself doesn't give much information about where in the world a visitor to your web site is located. Sure, a hostname such as host-18-327.92.broadband.comcast.net might give you a hint that the visitor is from the USA, but that only works with some hostnames and it's not very specific.

Enter geographical lookup databases. These map IP addresses to their geographical location. A company called MaxMind (http://www.maxmind.com) has such a database available, both in a free version and in a paid version. Their free database, GeoLite Country, is accurate enough for most applications (certainly when all you want to do is find out which country a visitor is from), and should you require greater accuracy you can purchase a license for their paid version, GeoIP Country.

So what does this have to do with ModSecurity? As of version 2.5, ModSecurity supports geographical location (or geolocation) of your visitors by referencing a geographical database such as the one published by MaxMind. This means that you can write rules that take into account where in the world your visitor is located. This is useful for many applications. If for example you processed credit card payments you could match the geographical location of the IP to the country in which the credit card was issued. If an American credit card is suddenly used in Taiwan, that should raise suspicions and potentially cause you to decline processing the order.

ModSecurity uses the GEO collection to store geographical information. Let's take a closer look at this collection and the fields it contains.

GEO collection fields

The GEO collection contains the following fields:

  • COUNTRY_CODE

  • COUNTRY_CODE3

  • COUNTRY_NAME

  • COUNTRY_CONTINENT

  • REGION

  • CITY

  • POSTAL_CODE

  • LATITUDE

  • LONGITUDE

  • DMA_CODE (US only)

  • AREA_CODE (US only)

The COUNTRY_CODE field is a two-letter country identifier, as defined by the ISO 3166 standard. For example, the country code for the United States is US and for the United Kingdom it is GB. COUNTRY_CODE3 contains the three-letter country code, for example, USA or GBR. The COUNTRY_CONTINENT field contains the geographical continent where the user resides. Examples include EU for users from Europe and AS for Asia.

Blocking users from specific countries

Let's say that you run a small software company. Business is good, but you notice in your log file that a significant number of users located in China download the trial version of your software. This is quite odd if at the same time you never have any legitimate sales that come from Chinese users. The explanation is usually that these users from certain countries download the trial version and then run a crack, which is a small program that patches a trial version and converts it to the fully licensed version without the user having to pay.

You could of course allow these downloads and see them as potential future sales if you translate your software into Chinese. But perhaps bandwidth costs are going up and you would rather block these downloads. Here's how to do it using ModSecurity:

First, we need to download the geographical database. Follow these steps:

  1. Go to http://www.maxmind.com and click on GeoLocation Technology.

  2. Click on GeoLite Country, which is the free version of the database.

  3. Copy the link to the binary version of the GeoLite database file.

Once you have the link to the file you can download it to your server using wget, extract it using gunzip, and move it to its own directory, like so:

$ wget http://geolite.maxmind.com/download/geoip/database/GeoLiteCountry/GeoIP.dat.gz
$ gunzip GeoIP.dat.gz
$ mkdir /usr/local/geoip
$ mv GeoIP.dat /usr/local/geoip

Now we need to configure ModSecurity so that it knows where to find the GeoIP database file. To do this, we use the SecGeoLookupDb directive. Place the following line in your modsec.conf file:

SecGeoLookupDb "/usr/local/geoip/GeoIP.dat"

Now we are ready to start writing rules that can take into account where visitors are located. To instruct ModSecurity that we want to look up the geographical location of an IP address, we use the @geoLookup operator. This operator takes the supplied IP address and performs a geographical lookup of it in the database file specified using SecGeoLookupDb. After a successful lookup, the GEO collection is populated with the fields listed in the previous section. In our case the only fields available will be COUNTRY_CODE, COUNTRY_CODE3, COUNTRY_NAME, and COUNTRY_CONTINENT since we are using the free database. This is however quite enough for our purposes as all we require is the country information.

Now, to block users from specific countries, we use the following two chained rules:

# Block users from China
SecRule REMOTE_ADDR "@geoLookup" "deny,nolog,chain"
SecRule GEO:COUNTRY_CODE "@streq CN"

The country code for China is CN as can be seen by referring to http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2, which contains a list of all available two-letter country codes.

If we wanted to block some additional countries, say Russia (RU) and Pakistan (PK), we could modify the rule chain as follows:

# Block users from China, Russia and Pakistan
SecRule REMOTE_ADDR "@geoLookup" "deny,nolog,chain"
SecRule GEO:COUNTRY_CODE "@pm CN RU PK"

As you can see we used the phrase matching operator @pm to simplify the matching of the country codes. If we wanted to block a long list of countries we would do well to add the country codes to a separate file, one per line, and then use the @pmFromFile operator in the last rule.
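As a rough sketch of that approach, the following shell function writes the country codes to a file, one per line. The function name write_country_list and the path /usr/local/geoip/blocked-countries.txt are assumptions made for illustration. The rule chain in the comment is simply the chain above with @pm swapped for @pmFromFile.

```sh
#!/bin/sh
# write_country_list FILE CODE...
# Hypothetical helper: writes each ISO 3166 two-letter code on its own line.
write_country_list() {
    out=$1
    shift
    printf '%s\n' "$@" > "$out"
}

# Example call (the path is an assumption, pick your own):
#   write_country_list /usr/local/geoip/blocked-countries.txt CN RU PK
#
# The chain from above would then reference the file:
#   SecRule REMOTE_ADDR "@geoLookup" "deny,nolog,chain"
#   SecRule GEO:COUNTRY_CODE "@pmFromFile /usr/local/geoip/blocked-countries.txt"
```

Keeping the list in its own file means countries can be added or removed without touching the rules themselves.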

Load balancing requests between servers on different continents

If you're serving any sort of large binary files to your visitors you would want them to get the best download speed possible. Suppose that you have one server in the USA and one server in Europe. By using the ModSecurity @geoLookup operator it is possible to determine where your visitor is located and send him to the nearest server, which will give the best download speeds.

The following rules (which you would place in the ModSecurity configuration file on your US server) redirect access to any file in the /download/ directory to the European server when any European visitor requests a download:

# Redirect European visitors to EU server
SecRule REQUEST_URI "^/download/(.*)$" "phase:1,capture,chain,redirect:http://europe.example.com/download/%{TX.1}"
SecRule REMOTE_ADDR "@geoLookup" "chain"
SecRule GEO:COUNTRY_CONTINENT "@streq EU"

The first rule in the rule chain matches against any requested file in the directory /download/. It uses the capturing parentheses together with the dot and star regex operators to capture the filename requested in the download directory. Since we specify the capture action, ModSecurity will capture this value and store it in the variable TX:1. We then redirect the request to the server europe.example.com and use the macro %{TX.1} to specify which file to redirect to.

Note that there is a subtle difference when specifying the captured variable in the macro expansion as opposed to using it as a variable: you must write it as %{TX.1} with a dot, or the macro will fail to expand properly.
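To get a feel for what the capture does, here is a small sketch that imitates it outside of ModSecurity using sed. This is only an analogy: the function name redirect_target is hypothetical, sed's \(.*\) stands in for the capturing parentheses, and \1 plays the role of %{TX.1}.

```sh
#!/bin/sh
# redirect_target URI
# Prints the European URL for a /download/ request, or nothing otherwise.
# sed's \(.*\) and \1 play the roles of the capture group and %{TX.1}.
redirect_target() {
    printf '%s\n' "$1" |
        sed -n 's#^/download/\(.*\)$#http://europe.example.com/download/\1#p'
}

redirect_target /download/File.zip
redirect_target /index.html
```

The first call prints http://europe.example.com/download/File.zip, while the second prints nothing because the URI does not start with /download/.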

Pausing requests for a specified amount of time

ModSecurity allows you to pause requests for a specified period of time. This is done via the pause action. This can be useful if for example you have detected suspicious behavior, such as a potential spammer submitting POST requests to a comment form at a rate far higher than normal.

To pause a request you specify the time, in milliseconds, that you want to delay it. If we wanted to pause any request after a user has submitted more than five POST requests within the last minute we could use the following rules:

SecAction "initcol:ip=%{REMOTE_ADDR},pass,nolog"
SecRule REQUEST_METHOD "@streq POST"  "pass,setvar:ip.posts_made=+1,expirevar:ip.posts_made=12"
SecRule IP:POSTS_MADE "@gt 5" "pause:5000"

This will pause POST requests for 5 seconds if a user has submitted more than five of them within a one-minute interval (and after the pause the request will be denied if that is what SecDefaultAction specifies). The expirevar action instructs the ModSecurity rule engine to expire the variable ip.posts_made 12 seconds after it was last set. Because every POST request resets that timer, the counter can only climb above five while POST requests keep arriving less than 12 seconds apart, which works out to more than five POST requests in a minute.

Take care when using pause, as it will cause an Apache child process to sleep for the specified amount of time. If you were under a Denial of Service attack, and the attacker submitted requests that caused a pause to occur, this could make your site go down more quickly than if the pause action had not been in place.

Executing shell scripts

ModSecurity can execute an external shell script when a rule matches. This is done via the exec action. This is a very powerful technique that allows you to invoke the full power of your favorite scripting language to take further action when a rule match occurs. You can in fact also invoke a binary program file, though most of the time a shell script will be more convenient to execute.

The invoked file must be executable by the Apache process, so make sure that you set the permissions on the file correctly. One catch when invoking a script is that the script must write something to stdout. If your script doesn't do this, ModSecurity will assume the execution has failed, and you will get the error message Execution failed while reading output in the Apache error log file.

Sending alert emails

As an example, suppose that we wanted to execute a script to email us an alert message whenever an attempted SQL injection exploit was detected. To do this, we need two things:

  1. A script file that has the ability to email an alert to a specified email address.

  2. A rule that will invoke the email script when a rule match is detected.

For the script, we will use a standard shell script that invokes /bin/sh, though we could have easily used Perl or any other scripting language. We will email the alert to [email protected].

Create a file named email.sh in the directory /usr/local/bin and type the following in it:

#!/bin/sh
echo "An SQL injection attempt was blocked" | mail -s "ModSecurity Alert" [email protected]
echo Done.

The script invokes the mail binary to send an email with the subject ModSecurity Alert to [email protected]. The last line of the script writes the string Done. to stdout. This is so that ModSecurity will recognize that the script has executed successfully.

We now have to make the script executable so that it can be invoked when a rule matches:

$ chmod a+rx /usr/local/bin/email.sh

Now all that is left is to create a rule that will trigger the alert script:

SecRule ARGS "drop table" "deny,exec:/usr/local/bin/email.sh"

You can now test out this rule by attempting to access http://yourserver/?test=drop%20table. If you've substituted your own email address in the example above you should get an email telling you that an SQL injection attempt has just been blocked.

Note

The %20 character string in the web address is an example of a URL-encoded string. URL encoding is a method used to convert a URL containing non-standard characters to a known character set. A URL-encoded character consists of a percent sign followed by a two-digit hexadecimal number. In this example, %20 represents a space character. A space has the decimal (base 10, i.e. what we normally use when describing numbers) character code 32, and its hexadecimal equivalent is 20, so the final URL-encoded result is %20.
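The arithmetic in this note can be tried out with a couple of lines of shell. This is only a sketch for single ASCII characters, and the function name encode_char is made up for the example.

```sh
#!/bin/sh
# encode_char CHAR
# Prints the percent-encoded form of a single ASCII character.
encode_char() {
    # A leading quote makes printf return the character's numeric code.
    code=$(printf '%d' "'$1")
    # Format that code as a percent sign plus two hexadecimal digits.
    printf '%%%02X\n' "$code"
}

encode_char ' '
```

Running the script prints %20, matching the conversion from decimal 32 to hexadecimal 20 described above.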

Receiving such an email can be useful to quickly be alerted of any ongoing attacks. However, what if we wanted the email to contain a little more information on the attempted exploit; would that be possible? Yes, it's not only possible, it's also a very good idea, since more information about an alert can allow us to decide whether it is something to investigate more in-depth (such as when we detect that it's not just an automated vulnerability scanner pounding away at our server but actually a hacker probing for weaknesses with manually crafted exploit URLs).

Sending more detailed alert emails

ModSecurity allows us to set environment variables via the setenv action. By populating environment variables with suitable data we can record more information about the request that was blocked.

Suppose we would like to gather the following data when an attempted SQL injection is detected:

  • The hostname of the server where the alert occurred

  • The remote user's IP address and hostname

  • The full request URI

  • The values of all arguments, whether they were sent using the GET or POST method

  • The unique ID for the request, so we can find this alert in the log files

We will place this information in six separate environment variables, which we will call HOSTNAME, REMOTEIP, REMOTEHOST, REQUESTURI, ARGS, and UNIQUEID. Our modified rule now looks like this:

SecRule ARGS "drop table" "deny,t:lowercase,setenv:HOSTNAME=%{SERVER_NAME},setenv:REMOTEIP=%{REMOTE_ADDR},setenv:REMOTEHOST=%{REMOTE_HOST},setenv:REQUESTURI=%{REQUEST_URI},setenv:ARGS=%{ARGS},setenv:UNIQUEID=%{UNIQUE_ID},exec:/usr/local/bin/email.sh"

Now all we have to do is modify the email script so that it places the environment variables in the email body:

#!/bin/sh
echo "
An SQL injection attempt was blocked:
Server: $HOSTNAME
Attacking IP: $REMOTEIP
Attacking host: $REMOTEHOST
Request URI: $REQUESTURI
Arguments: $ARGS
Unique ID: $UNIQUEID
Time: `date '+%D %H:%M'`
" | mail -s 'ModSecurity Alert' [email protected]
echo Done.

As you can see, we use a multi-line echo statement to get all the information nicely formatted. Since this is a shell script, it will replace $HOSTNAME and the other environment variables with the value we set the variables to in our ModSecurity rule. The last line of the echo statement also adds a timestamp with today's date and the current time by invoking the date command and placing backticks (`) around it, which causes the shell to execute the command and substitute the command's output for it. Finally, the data is piped into the mail binary, which sends an email with the subject line ModSecurity Alert to the specified email address.

Again, at the end of the script we make sure to echo a dummy text to stdout to make ModSecurity happy. If you test this script you should get a nicely formatted email with all of the attacker's details.
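If you would rather check the formatting before any mail is sent, one option is to build the message body in a function and run it by hand with made-up values. The sketch below does that. The function name build_alert_body and all of the sample values are assumptions for illustration, and the timestamp line is left out to keep the output predictable.

```sh
#!/bin/sh
# build_alert_body
# Hypothetical helper: prints the alert text using the environment
# variables that the rule populates via setenv.
build_alert_body() {
    cat <<EOF
An SQL injection attempt was blocked:
Server: $HOSTNAME
Attacking IP: $REMOTEIP
Attacking host: $REMOTEHOST
Request URI: $REQUESTURI
Arguments: $ARGS
Unique ID: $UNIQUEID
EOF
}

# Dry run with made-up values; nothing is emailed.
HOSTNAME=www.example.com
REMOTEIP=192.0.2.10
REMOTEHOST=client.example.net
REQUESTURI='/?test=drop%20table'
ARGS='drop table'
UNIQUEID=example-id
build_alert_body
```

Once the output looks right, the function's output can be piped into mail in the same way as the echo statement in the script above.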

Counting file downloads

ModSecurity makes it possible to solve problems that you thought were hard or impossible to solve using your standard web application. And often in a very elegant way, too.

A common problem webmasters face is counting the number of downloads of a binary file, such as an executable file. If the resource on the web server had been a normal web page, we could easily just add a server-side script to the page to update the download counter in a database. However, being binary, the file can be accessed and linked to directly, with no chance for any server-side script to log the download or otherwise take note that a download is taking place.

We will see how to create a ModSecurity rule that will invoke a shell script when a binary file is downloaded. This shell script contains some simple code to increment a download counter field in a MySQL database.

First, let's start by creating a new SQL database named stats and add a simple table to it that contains the columns day and downloads:

mysql> CREATE DATABASE stats;
Query OK, 1 row affected (0.00 sec)
mysql> USE stats;
Database changed
mysql> CREATE TABLE download (day DATE PRIMARY KEY, downloads INT);
Query OK, 0 rows affected (0.05 sec)

The day column holds a date and the downloads column is the number of downloads in that day. So far so good. Let's move on to the code that will update the database.

To get this right, we need to know a little about how modern web browsers and servers handle file downloads. The HTTP/1.1 protocol allows a client to specify a partial content range when performing a GET request. This range can be used to specify a byte interval of the file that the client wants to download. If successful, the server responds with HTTP status code 206 (Partial Content) and sends only the data in the requested range to the client. For large files the web browser may perform tens or hundreds of GET requests with a partial content range specified before it has downloaded the entire file. This is useful because it allows a web browser or download manager to re-download the missing parts of an interrupted download without having to resort to downloading the file in its entirety again.

If we were to create a rule that triggers on any GET request for a particular file then a browser that uses partial content GET requests would increase the download counter many times for a single file download. This is obviously not what we want, and therefore we will write our rule to trigger only on requests that result in a standard HTTP 200 (OK) response code.

We will name the shell script that gets invoked /usr/local/bin/newdownload.sh. The shell script in newdownload.sh is a simple statement that invokes MySQL, passing it an SQL statement for updating the table:

#!/bin/sh
mysql -uweb -ppassword -e "INSERT INTO download (day, downloads) VALUES (CURRENT_DATE, 1) ON DUPLICATE KEY UPDATE downloads = downloads + 1;" stats

The ON DUPLICATE KEY statement is a construct special to MySQL. It instructs the database to ignore the INSERT statement and instead update the database field if the primary key already exists. In this way a row with today's date will get inserted into the database if it's the first download of the day (setting the download counter to 1), or updated with downloads = downloads + 1 if a row with today's date already exists. For this to work we must make the field day a primary key, which we did above when the table was created.

Note

The ON DUPLICATE KEY syntax was introduced in MySQL version 4.1, so check to make sure that you're not using an old version if things don't seem to be working.

After creating the shell script, we need to make sure it is marked as executable:

# chmod a+rx /usr/local/bin/newdownload.sh

We will put this rule in phase 5, since this is the most certain phase to read response codes and we don't need to take any disruptive action. We use two chained rules since there are two conditions that need to be fulfilled: that the request URI is the path to our file, and that the response code is 200. We do a little smart optimization here by specifying the rule that matches the filename as the first rule in the chain. Had we instead looked at the response code first, both rules would have to be invoked every time the server generated a response code of 200. By doing it the other way around, the second rule in the chain is only considered if the request URI matches our filename.

The rules look like this:

SecRule REQUEST_URI "^/File.zip$"  "phase:5,chain,pass,nolog"
SecRule RESPONSE_STATUS 200  "exec:/usr/local/bin/newdownload.sh"

As we have learned previously, the ^ and $ characters are regular expression markers that match the start of, and end of, a line, respectively. Using them in this way ensures that only the File.zip found at the exact location /File.zip matches, and not any other file such as /temp/File.zip.
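The effect of the two anchors is easy to check from the command line. The sketch below uses grep -E as a stand-in for the ModSecurity regex engine, which is an assumption that holds for a simple pattern like this one. The function name uri_matches is hypothetical.

```sh
#!/bin/sh
# uri_matches URI
# Succeeds only if URI matches the same anchored pattern as the rule.
uri_matches() {
    printf '%s\n' "$1" | grep -Eq '^/File.zip$'
}

for uri in /File.zip /temp/File.zip /File.zip.bak; do
    if uri_matches "$uri"; then
        echo "$uri: match"
    else
        echo "$uri: no match"
    fi
done
```

Only /File.zip is reported as a match. The ^ anchor rejects /temp/File.zip and the $ anchor rejects /File.zip.bak.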

We use the pass action since we want to allow the request to go through even though the rule chain matches. Even though disruptive actions such as deny or drop cannot be taken in phase 5 (logging), we need to specify pass, or we would get an error message about the inherited SecDefaultAction not being allowed in phase 5 when we tried to reload the rules.

Now let's test our download counter. Upload any ZIP file to the web site root directory and name it File.zip. Then before we download the file for the first time let's make sure that the download table is empty:

mysql> USE stats;
Database changed
mysql> SELECT * FROM download;
Empty set (0.00 sec)

Alright, now for the real test: we will download File.zip and see if the download counter is set to 1:

# wget http://localhost/File.zip
--2009-01-29 16:47:11-- http://localhost/File.zip
Resolving localhost... 127.0.0.1
Connecting to localhost|127.0.0.1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 100476 (98K) [application/zip]
Saving to: `File.zip'
2009-01-29 16:59:03 (142 MB/s) - `File.zip' saved [100476/100476]
mysql> SELECT * FROM download;
+------------+-----------+
| day        | downloads |
+------------+-----------+
| 2009-01-29 |         1 |
+------------+-----------+
1 row in set (0.00 sec)

Success! Now try downloading the file one more time and verify that the counter goes up to 2 and you will be sure that the shell script is working as intended.

I hope this example has shown you the power of ModSecurity and its exec action. The shell script could easily be expanded to add additional functionality such as sending an email to you when the number of daily downloads of the file reaches a new all-time high.

Blocking brute-force password guessing

Now let's see how we can use the IP collection to block a user from trying to brute-force a protected page on our server. To do this we need to keep track of how many times the user unsuccessfully tries to authenticate.

One attempt would be the following:

# This looks good, but doesn't work
SecAction initcol:ip=%{REMOTE_ADDR},pass
SecRule REQUEST_URI "^/protected/" "pass,chain,phase:2"
SecRule RESPONSE_STATUS "^401$" "setvar:ip.attempts=+1"
SecRule IP:ATTEMPTS "@gt 5" deny

The intention of the above rules is that if someone tries an unsuccessful username/password combination more than 5 times for any resource under /protected, he will be denied access. We use the setvar:ip.attempts=+1 syntax to increase the counter each time an access attempt fails.

This looks good, but if you try it out you will find that it does not work. The reason is that when Apache notices a Require directive (which is what is used to password-protect a resource), it generates a 401 (Authentication Required) response and immediately sends it back to the client. This happens right after ModSecurity phase 1 (request headers) and causes the rule engine to immediately jump to phase 5 (logging). This is a caveat that applies to certain internal Apache redirects and also applies to 404 (Not Found) responses, so we need to work around it.

The solution is to also keep track of accesses to the resource where the response code is in the 200-299 range (meaning the response was successful). When we detect such a response on a protected resource we know that the client has authenticated successfully, and can set the counter to 0 so that he will not be blocked.

This is how the rules look with our new try:

# Initialize IP collection
SecAction "initcol:ip=%{REMOTE_ADDR},pass,phase:1"
# Track accesses to the protected resource
SecRule REQUEST_URI "^/protected/" "pass,phase:1,setvar:ip.attempts=+1"
# Was this an authenticated access? (Chained rule)
SecRule REQUEST_URI "^/protected/" "chain,pass,phase:3"
# Yes, user is logged in, set counter to 0
SecRule RESPONSE_STATUS "^2..$" "setvar:ip.attempts=0"
# Block if more than 5 non-authenticated access attempts
SecRule IP:ATTEMPTS "@gt 5" "phase:1,deny"

We put all of the rules that need to trigger on a 401 (Authentication Required) response in phase 1 so that the rule engine is able to process them. The above now works, but suffers from a shortcoming: if someone legitimately doesn't remember his password and tries various combinations more than five times, he will be locked out of the server for good. To solve this, we modify our previous rule so that in addition to increasing the counter, it also contains an expirevar action to expire the variable after a certain number of seconds:

SecRule REQUEST_URI "^/protected/" "pass,phase:1,setvar:ip.attempts=+1,expirevar:ip.attempts=600"

We set the expiration time in seconds to 600, which equals ten minutes. This means that after five failed access attempts, any further requests will be blocked for ten minutes. If the attacker should return nine minutes after being blocked and try another password, the expirevar action will trigger again and reset the timer back to ten minutes. Any legitimate user who accidentally forgot his password would have to wait the full ten minutes before he would be given a further five attempts to remember his password.

The full rule listing to block access after five failed attempts with the reset on the block after ten minutes now looks like this:

# Initialize IP collection
SecAction "initcol:ip=%{REMOTE_ADDR},pass,phase:1"
# Track accesses to the protected resource
SecRule REQUEST_URI "^/protected/" "pass,phase:1,setvar:ip.attempts=+1,expirevar:ip.attempts=600"
# Was this an authenticated access? (Chained rule)
SecRule REQUEST_URI "^/protected/" "chain,pass,phase:3"
# Yes, user is logged in, set counter to 0
SecRule RESPONSE_STATUS "^2..$" "setvar:ip.attempts=0"
# Block if more than 5 non-authenticated access attempts
SecRule IP:ATTEMPTS "@gt 5" "phase:1,deny"

If you think that the above solution looks like a bit of a hack then I agree. However, you need to be aware of and know how to work around problems like the one with the 401 (Authentication Required) response in the rule engine.

Injecting data into responses

ModSecurity allows us to inject data into the response sent back to the client if the directive SecContentInjection is set to On. This is possible because the rule engine buffers the response body and gives us the opportunity to either put data in front of the response (prepending) or append it to the end of the response. The actions to use are appropriately named prepend and append.

Content injection allows us to do some really cool things. One trivial example just to show how the technique works would be to inject JavaScript code that displays the message "Stop trying to hack our site!" whenever we detected a condition that wasn't severe enough to block the request, but where we did want to issue a warning to any would-be hackers:

SecRule ARGS:username "%" "phase:1,allow,t:urlDecode,append:'<script type=text/javascript>alert(\"Stop trying to hack our site!\");</script>',log,msg:'Potential intrusion detected'"

The above detects when someone tries to supply a username with a % character in it. In the SQL database query language, which is what many login pages use when they look up username and password information, the % character is a "wildcard" character that can match any string. So if the username contained that character (and we use the transformation urlDecode to make sure that it doesn't contain a % because it's URL-encoded), that would be cause for concern, so we log the attempt while still letting the request through. We also display a nice JavaScript message to the potential intruder to let him know that we're keeping an eye on him.

Inspecting uploaded files

Another very useful ModSecurity feature is the ability to inspect files that have been uploaded via a POST request. So long as we have set SecRequestBodyAccess to On we can then intercept the uploaded files and inspect them by using the @inspectFile operator.

To show how this works we will write a script that intercepts uploaded files and scans them with the virus scanner Clam AntiVirus. Clam AntiVirus is an open source virus scanner which you can obtain at http://www.clamav.net. Once you have installed it you can use the command clamscan <filename> to scan a file for viruses.

To intercept uploaded files we need to apply a few ModSecurity directives:

SecUploadDir /tmp/modsecurity
SecTmpDir /tmp/modsecurity

This specifies where ModSecurity stores the files it extracts from the request body. We need to make sure we create the temporary directory and that the Apache user has read and write access to it.

When using @inspectFile, ModSecurity treats the script output as follows:

  • If the script returns no output, the file is determined to have passed inspection and ModSecurity will let the request through

  • If the script writes any output to stdout, ModSecurity will consider the intercepted file to be "bad" and will block the request

The script we invoke for intercepted files will simply execute clamscan and write a text string to stdout if a virus is detected.

We create the following simple shell script to perform the virus scan, and save it in /usr/local/bin/filescan.sh:

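#!/bin/sh
# Scan the uploaded file and discard clamscan's own output
/usr/bin/clamscan "$1" > /dev/null 2>&1
# Exit status 1 means that a virus was found
if [ "$?" -eq "1" ]; then
    echo "An infected file was found!"
fi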

The script first executes clamscan, passing the argument provided to the script ($1) on to clamscan. The output is redirected to /dev/null to prevent ModSecurity from reading the standard clamscan output and thinking a match has been found. The funny-looking 2>&1 construct at the end of the line tells the shell to redirect both the stdout and stderr output from clamscan to /dev/null.

The next statement checks the return value of the last executed command, which is stored in the variable $?. Clam AntiVirus returns 1 if a virus was found during scanning, 0 if the file is clean, and 2 if an error occurred. If we find a 1 being returned we echo the string An infected file was found! to stdout, which tells ModSecurity that the upload should be blocked.
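One variation you might consider is to treat a scanner error as a reason to reject the upload as well. The sketch below is not the script used in this chapter. It is an alternative under two stated assumptions: the SCANNER variable is a hypothetical override that lets you try the logic without Clam AntiVirus installed, and rejecting files on a scanner error is a design choice you may or may not want.

```sh
#!/bin/sh
# SCANNER is a hypothetical override; by default the real clamscan is used.
SCANNER="${SCANNER:-/usr/bin/clamscan}"

# inspect FILE
# Stays silent for a clean file and writes to stdout otherwise.
inspect() {
    "$SCANNER" "$1" > /dev/null 2>&1
    case "$?" in
        0) ;;                                    # clean: no output
        1) echo "An infected file was found!" ;;
        *) echo "Scanner error, file rejected" ;; # assumption: fail closed
    esac
}

# Only scan when a filename was passed on the command line.
if [ "$#" -gt 0 ]; then
    inspect "$1"
fi
```

With this version an upload is also blocked when clamscan is missing or cannot read the file, which is stricter than the original script.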

The ModSecurity rule we use to intercept uploaded files and call our shell script to examine each file is a simple one-line affair:

SecRule FILES_TMPNAMES "@inspectFile /usr/local/bin/filescan.sh" "phase:2,deny,status:418"

To make sure that the file interception and scanning really works we deny the request with HTTP code 418 to differentiate it from any other rules which might also block the request. You can change this later once you've verified that the interception works.

Note

HTTP code 418 is defined in RFC 2324 ("Hyper Text Coffee Pot Control Protocol") as:

418 I'm a teapot

Any attempt to brew coffee with a teapot should result in the error code '418 I'm a teapot'. The resulting entity body MAY be short and stout.

An RFC, or Request for Comments, is a text document that describes a proposed Internet standard. This particular RFC was published on April 1st, 1998.

To test our script, we will use something called the EICAR standard anti-virus test file. This is a small executable file in the old 16-bit DOS format which is completely harmless. When run in a Command Prompt on Windows, it prints the string EICAR-STANDARD-ANTIVIRUS-TEST-FILE! and then exits. The file is convenient because it can be represented entirely in readable ASCII characters, so no encoding is required when we want to upload it to the server.

The EICAR test file looks like this:

X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*

Now we need a way to upload the file to the server so that ModSecurity can intercept it. To do this, we will construct an HTTP POST request by hand (just because we can!) and submit it to the server using the netcat (nc) utility.

We simply create a file named postdata and put the following in it:

POST / HTTP/1.1
Host: localhost
Content-Length: 193
Content-Type: multipart/form-data; boundary=delim

--delim
Content-Disposition: form-data; name="file"; filename="eicar.com"
Content-Type: application/octet-stream

X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*
--delim--
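The Content-Length header has to equal the number of bytes in the request body, which is easy to get wrong when typing the request by hand. As a sketch, the function below assembles the same kind of request from a payload file and lets wc count the bytes. The function name build_post is hypothetical, and the sketch assumes plain LF line endings like the hand-written file (strict HTTP calls for CRLF, which would change the count).

```sh
#!/bin/sh
# build_post PAYLOAD_FILE
# Prints a multipart/form-data POST request with a computed Content-Length.
build_post() {
    body=$(mktemp)
    {
        printf '%s\n' '--delim'
        printf '%s\n' 'Content-Disposition: form-data; name="file"; filename="eicar.com"'
        printf '%s\n' 'Content-Type: application/octet-stream'
        printf '\n'                  # blank line ends the part headers
        cat "$1"
        printf '\n%s\n' '--delim--'  # closing boundary
    } > "$body"
    # Count the body bytes; tr strips the padding some wc versions add.
    length=$(wc -c < "$body" | tr -d ' ')
    printf 'POST / HTTP/1.1\nHost: localhost\nContent-Length: %s\n' "$length"
    printf 'Content-Type: multipart/form-data; boundary=delim\n\n'
    cat "$body"
    rm -f "$body"
}

# Example (assumes the test file was saved as eicar.com, no trailing newline):
#   build_post eicar.com > postdata
```

For a 68-byte payload such as the EICAR test file, the computed length is 193.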

Now let's upload the file to our server and see what happens:

$ nc localhost 80 < postdata
HTTP/1.1 418 unused
Date: Wed, 25 Feb 2009 19:45:18 GMT
Server: Test 1.0
Content-Length: 565
Connection: close
Content-Type: text/html; charset=iso-8859-1
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>418 unused</title>
</head><body>
...

Success! The file was rejected with an HTTP error code 418, which is exactly the code we specified for any intercepted files that our Clam AntiVirus script determined to be viruses.

Summary

This chapter contained a lot of information, and you will no doubt want to refer back to it when writing rules. It will take a while to get used to the ModSecurity syntax if you haven't written rules before, so make sure you try out as many examples as possible and write rules of your own to get the hang of the process of creating new rules.

In this chapter we first looked at the basic SecRule syntax, and then learned how to match strings using either regular expressions or simple string comparison operators. We learned in which order the rule engine executes rules and why it's important to know about this to be able to write rules properly. We also learned about all the other things we need to know to successfully write rules such as transformation functions, macro expansion and the actions that can be taken when a rule matches.

In the second half of the chapter we looked at practical examples of using ModSecurity, including how to use a geographical database to locate visitors and how to execute shell scripts when a rule matches. We also saw how to intercept uploaded files and how to use Clam AntiVirus in conjunction with a shell script to scan uploaded files for viruses.

In the next chapter we look at the performance of ModSecurity and how to write rules so as to minimize any performance impact on our web applications.
