Chapter 11. Message and Content Filters


In this chapter, you will learn the following:

• What the message filters and content filters features are used for and the differences between the two engines

• When to use message filters over content filters

• Low-level information on filter conditions and actions and performance considerations

Filtering Email Messages with Custom Rules

The Email Security Appliance (ESA) platform and its AsyncOS software have a huge number of features for almost any email task that you might encounter in production, from connection controls, through message routing and address rewriting, to control over final delivery. Tables like HAT, RAT, SMTPRoutes, and Destination Controls offer extensive control. Security filtering is handled automatically by IPAS and the antivirus engines.

Inevitably, however, a certain problem will need a custom solution that can’t be solved using a table or setting. You’ll need to compose and use your own custom filter rules that can act on individual messages and recipients, and the ESA provides two points in the pipeline where you can use them, as shown in Figure 11-1.

Figure 11-1. Location of the Message Filter and Content Filter Steps in the ESA Pipeline

Message filters are only configured in the command-line interface (CLI) using the filters command. Message filters were the first such feature on the ESA, implemented early in the product development for tasks like removing, modifying, or adding message headers, and for selecting delivery source IPs. Content filters are available in both the CLI and the web user interface (WUI), but are fairly cumbersome to use in the CLI, so we’ll stick to the WUI.

Both message and content filters test one or more conditions, and based on the logical truth or falsehood of the condition statement, apply one or more actions. Both types of filters have conditions to test almost every message attribute: source, sender, recipients, size, headers, body, and attachments. Both types of filters offer flexible actions that can be combined together, including dropping, bouncing, redirecting, quarantining, or modifying messages. In most cases, we can do the same tasks in a message or a content filter, so why choose one over the other? Should we stick to message filters, because they’re “more powerful?” Is it simply easier to always use content filters because they’re in the WUI? The answer is that you need both, and at times when the task at hand can be done in either engine, there will be a critical difference between the two that will make the decision for you. Other times, it’s purely your preference.

Message Filters Versus Content Filters

The most obvious difference between the two engines (or, more correctly, the two interfaces into the same engine) is the CLI versus WUI. Message filters get typed into or copy-and-pasted into the CLI, and content filters are built step by step in the WUI.

Example 11-1 is a simple message filter.

Example 11-1. Simple Message Filter

simple_sender_check_and_drop:
if mail-from == "sender@somewhere\\.com" {
drop();
}

Example 11-2 shows a CLI session to add that filter to the configuration.

Example 11-2. Adding a Message Filter

esa02.cisco.com> filters

Choose the operation you want to perform:
- NEW - Create a new filter.
- DELETE - Remove a filter.
- IMPORT - Import a filter script from a file.
- EXPORT - Export filters to a file
- MOVE - Move a filter to a different position.
- SET - Set a filter attribute.
- LIST - List the filters.
- DETAIL - Get detailed information on the filters.
- LOGCONFIG - Configure log subscriptions used by filters.
- ROLLOVERNOW - Roll over a filter log file.
[]> new

Enter filter script. Enter '.' on its own line to end.
simple_sender_check_and_drop:
if mail-from == "sender@somewhere\\.com" {
drop();
}
.
1 filters added.

First, we see the CLI filters command, which is where we create and examine message filters on the system. You can also enable or disable individual filters and manage filter logging. One filter action, archive, writes data to log files, and that logging is managed here. List, of course, provides all the filter names in a list, and detail shows you the body of one or more filters. Note that there is no edit command; there is no way to modify a filter that’s already in the list. To change a filter, you must delete it and reenter it. For this reason, and because the space provided to enter your filter doesn’t even come close to being considered an editor, I recommend that you create and edit your filters elsewhere, and then paste only the final result into the filters → new option. If you can, use an editor that’s designed for editing source code or has a “programmers” or source-code mode. I prefer vi, but anything that can give you a line count and some ability to match pairs of parentheses or brackets makes for easier filter-mangling.

Note

This simple example includes a regular expression as the value being tested against. The double backslash is an escape sequence. In English, the three-character sequence \\. tells the ESA to match a single period (or “dot”) character, and nothing else. We examine regular expressions in filters later in this chapter.

This brings us to a good question: Is filter writing programming? If you’re familiar with Perl, C, PHP, or Java programming, the syntax here might look familiar. Filters support constructs that look like branching, subroutines, and combinatorial logic. But, there are some important differences, most of which are limitations on filters:

• Filters have no state: Every filter is evaluated in the context only of the current message being evaluated. No storage of values or results across messages is possible. No counters can be incremented or compared against. We can cheat a bit and use SMTP message headers as signals or temporary storage.

• Filters operate independently of each other: It’s not possible to call one filter from another, or pass values between filters. They’re not subroutines, and they have no return values.

• Filters have limited access: With a couple of minor exceptions, filters have no access to files, memory, or any other local or remote services. You can only evaluate conditions about the message and its sender, and the results of lookups already performed.

• Filters get only one shot at a message: A message passes through each filter at most once, unless we intentionally loop messages back into the ESA. This technique, called reinjection, can be useful but has some undesirable side effects. We discuss this later in this chapter.

Despite the limitations, there is still a great deal of flexibility, power, and creative ways to get around some of the limitations. We dig deeply into all of that.

Every message filter must have a few things. First, it must have a unique name. Any name works, provided that it doesn’t have whitespace or other nonalphanumeric characters, but descriptive is better. A colon follows the name. The filter body is one if statement, although this can have nested ifs below it. An else statement is optional, but supported. The filter must have at least one condition to test and at least one action to perform. We go into more depth on writing message filters after we examine the same exact logic in a content filter in the WUI. Figure 11-2 shows the content filter.

Figure 11-2. Content Filter Example

To create this filter, I started by clicking Add Filter... in the WUI page Incoming Content Filters under the Mail Policies tab. The example would be the same if I were working on Outgoing. On the new filter page, we must provide a unique name with no whitespace. A description is optional, but recommended, especially for complex or dangerous filters. Below that, you can see that I’ve added one condition, Envelope Sender (numbered 1, oddly enough), and one action, Drop (listed as Final). To add the conditions and actions, you naturally just click the corresponding button and, to edit a rule, click the name of the condition or action. This editability makes it immediately evident why content filters are easier to work with than message filters. Also notice that the value in the Rule column looks much like the filter condition shown in Example 11-1. This is not a coincidence, because both types of filters are going into the same rule-parsing engine.

Processing Order

Another fundamental difference between the two filter interfaces is their placement in the pipeline. As we saw in Chapter 3, “ESA Email Pipeline,” message filters are the first stage in the work queue portion of the pipeline. They are processed after message acceptance and after any sender or recipient address manipulation, like aliases or masquerading. Most importantly, message filters are evaluated before any of the security engines, including anti-spam, antivirus, Virus Outbreak Filters (VOF), and DLP. They’re also before the content filters, and messages that are acted on here are evaluated again in content filters. That leads to some interesting interactions that we’ll look at.

The early position of message filters means that we can take certain actions, like skipping anti-spam or antivirus engines, that aren’t possible to do in content filters. It also means that message filters are usually the wrong place to take other actions, like encryption, message body scanning, or disclaimers, because we don’t want to take these actions on messages that turn out to be spam or viruses. There’s also little point in running expensive filter tests against messages that those engines will drop anyway; doing so only hurts performance.

Enabling Filters

Message filters are enabled by default and apply to all incoming and outgoing messages. To apply a message filter to your message traffic, you simply have to submit it and commit the change.

Content filters are not applied to any policy by default and are specific either to incoming or outgoing messages. To apply a content filter to traffic, you must submit it and then add it to one or more policies.

Message filters can also be set inactive through a toggle control in the filters command. An inactive filter is still present in the ESA configuration, but is not applied to messages until it is toggled back to the active state. Content filters don’t have an inactive status, but a content filter that is created but not applied to any policy is effectively inactive.

Some filter conditions and actions refer to other named entities on the system, like dictionaries, notifications, interfaces, or listeners. Content filters require that these entities exist or they won’t be available in the form for that condition or action. For example, you cannot specify a notification template that doesn’t already exist on the ESA; the notify action requires you to select an existing notification from a drop-down list. Message filters are different, however, because they allow for arbitrary names to be provided as arguments. If you specify an entity that does not exist, the message filter will be accepted but marked as invalid. An invalid message filter is not applied to messages. You can correct this by deleting the filter and reentering it with a valid name, or by creating the named entity.

Combinatorial Logic

Both message and content filters can combine multiple test conditions using AND and OR Boolean logic. In message filters, combinations of AND and OR clauses can be used to test complex conditions. Message filters can also nest conditions, such as in this example:

complex_logic:
if (rcpt-to == "chriport@cisco\\.com" OR header("Received") == "chrisporter\\.com$") {
    if (body-size > 1024k) {
          bcc ("[email protected]");
    }
}

The filter in this example sends a BCC copy of the message if it’s larger than 1 MB in size and has either a recipient of chriport@cisco.com or a Received header whose value ends with the string chrisporter.com. Note that, in both the recipient and header conditions, I’m using a regular expression.

By comparison, content filters have more limited logic. When multiple conditions are added to a filter, you have a choice of matching when all conditions are true (i.e., all conditions are ANDed together) or matching when any condition is true (i.e., all conditions are ORed together). You cannot create content filter condition logic that includes both AND and OR. It’s possible, in some cases, to get around this limitation by using multiple content filters, one after another, to effect more complex decision making.
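The gap between the two logic models can be sketched in Python. This is an illustration only, not ESA code: the message dictionaries, the three condition functions, and content_filter_matches are hypothetical names invented for this sketch. It shows that neither the all-conditions mode nor the any-condition mode reproduces a mixed (A OR B) AND C expression.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]
Condition = Callable[[Message], bool]

# Hypothetical stand-ins for three filter conditions.
def is_rcpt(msg: Message) -> bool:
    return "chriport@cisco.com" in msg["recipients"]

def from_host(msg: Message) -> bool:
    return msg["received"].endswith("chrisporter.com")

def is_large(msg: Message) -> bool:
    return msg["size"] > 1024 * 1024

def content_filter_matches(msg: Message, conditions: List[Condition], mode: str) -> bool:
    """Content-filter style: every condition ANDed, or every condition ORed."""
    if mode == "all":
        return all(cond(msg) for cond in conditions)
    if mode == "any":
        return any(cond(msg) for cond in conditions)
    raise ValueError("mode must be 'all' or 'any'")

def message_filter_matches(msg: Message) -> bool:
    """Message-filter style: AND and OR mixed freely, here (A OR B) AND C."""
    return (is_rcpt(msg) or from_host(msg)) and is_large(msg)

CONDITIONS = [is_rcpt, from_host, is_large]

# Made-up sample messages.
large_to_rcpt = {"recipients": ["chriport@cisco.com"],
                 "received": "from mail.example.net",
                 "size": 2 * 1024 * 1024}
small_to_rcpt = {"recipients": ["chriport@cisco.com"],
                 "received": "from mail.example.net",
                 "size": 10 * 1024}

for name, msg in (("large", large_to_rcpt), ("small", small_to_rcpt)):
    print(name,
          message_filter_matches(msg),
          content_filter_matches(msg, CONDITIONS, "all"),
          content_filter_matches(msg, CONDITIONS, "any"))
```

The "all" mode disagrees with the mixed expression on the large message, and the "any" mode disagrees on the small one, which is why a chain of content filters is sometimes needed.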

Message filters can also create nested rules that content filters don’t support, with the if/else conditional, as in this example:

message_size_src_host:
if (body-size > 128k) {
    if (body-size > 256k) {
          alt-src-host ("LargeMessage");
    }
    else {
          alt-src-host ("MediumMessage");
    }
}
else {
    alt-src-host ("Default");
}

This filter sorts messages by size and selects the appropriate delivery interface. First, messages larger than 128 KB are selected in the first condition. The second, nested, condition picks the LargeMessage interface for messages over 256 KB, and the else clause applies to messages between 128 KB and 256 KB and uses the MediumMessage interface. All other messages, those smaller than 128 KB, use the Default interface for delivery.
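To check the boundary cases of that nested rule, the same decision can be restated as a small Python function. This is only a sketch of the logic, not something the ESA runs; pick_interface is a hypothetical name, and the sketch assumes the filter's k suffix means 1,024 bytes (consistent with 1024k being described as 1 MB earlier).

```python
KB = 1024  # assumption: "k" in body-size comparisons means 1,024 bytes

def pick_interface(body_size: int) -> str:
    """Mirror the nested if/else of the message_size_src_host filter."""
    if body_size > 128 * KB:
        if body_size > 256 * KB:
            return "LargeMessage"
        return "MediumMessage"
    return "Default"

for size in (64 * KB, 128 * KB, 128 * KB + 1, 256 * KB, 300 * KB):
    print(size, pick_interface(size))
```

Because both comparisons are strict, a message of exactly 128 KB falls through to Default and one of exactly 256 KB is treated as medium.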

Scope of Message Filters

Message filters apply to all messages, incoming and outgoing, for all policies and senders. If you want to limit the scope of a filter, you must explicitly do so using the available conditions. Content filters must be created as either incoming or outgoing, and are assigned to mail policies and can thus be restricted to certain policies that affect only some senders or recipients.

To limit the scope of message filters, you must use qualifying conditions, such as recv-listener, sendergroup, rcpt-to, or mail-from, in addition to the other conditions you want to test.

Handling Multirecipient Messages

To write effective filters, there’s an important filter logic issue that you should understand. It occurs when a filter that has recipient conditions evaluates a message containing more than one recipient. Take this message filter as an example:

if (rcpt-to == "chriport@cisco\\.com") {
drop();
}

You might be tempted to describe this filter as “drops messages where the recipient is chriport@cisco.com,” but that’s slightly inaccurate. More accurate would be “drops messages where any recipient is chriport@cisco.com.” With a multirecipient message, if any of the recipients match the condition, the message is dropped for all recipients.

Filter actions affect entire messages, but recipient conditions can match any recipient. When writing a filter that uses recipient conditions like rcpt-to or rcpt-to-group, you need to keep this fact in mind.
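A short Python sketch makes the any-recipient behavior concrete. It models the semantics described above rather than the ESA's implementation; rcpt_to_matches and apply_drop_filter are hypothetical helper names, and other@example.com is a made-up second recipient.

```python
import re
from typing import List

def rcpt_to_matches(recipients: List[str], pattern: str) -> bool:
    """A recipient condition is true if ANY recipient matches the pattern."""
    return any(re.search(pattern, rcpt) for rcpt in recipients)

def apply_drop_filter(recipients: List[str], pattern: str) -> List[str]:
    """Return the recipients who still receive the message after the filter."""
    if rcpt_to_matches(recipients, pattern):
        return []  # drop acts on the whole message, so nobody receives it
    return list(recipients)

both = ["chriport@cisco.com", "other@example.com"]
print(apply_drop_filter(both, r"chriport@cisco\.com"))
print(apply_drop_filter(["other@example.com"], r"chriport@cisco\.com"))
```

In the first call the second recipient loses the message too, even though only the first address matched the condition.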

With most filter actions, this is not a problem and, in fact, is the desirable effect. Take this example, which sends a copy of mail, silently, to another address:

if (rcpt-to == "[email protected]") {
bcc ("[email protected]");
}

You might write such a filter to monitor a particular user’s mail traffic. You would certainly want this to apply to messages where [email protected] is the only recipient or is one of many recipients.

If you do need actions to apply only to a single recipient, the better approach is to use incoming or outgoing mail policies to splinter messages and use content filters to take action. We describe that later in this chapter.

Availability of Conditions and Actions

The last major difference between content and message filters is that some conditions and actions are simply not available in content filters in the WUI. Some, like skip-spamcheck, are missing because they’re not relevant to content filters; spam checking occurs before content filters in the ESA pipeline. But, others just aren’t included in the list of conditions and actions available to content filters. For the most part, these advanced features aren’t used often, but if you find that you need them, obviously the choice is made for you—you must write a message filter.

Filter Conditions

Filter conditions are the tests performed against a message by the filters engine. Conditions vary in what they test and in the arguments they take. Some take no arguments and are essentially simple true/false tests, such as encrypted and valid. Each message filter or content filter can incorporate multiple conditions, combined with AND and OR into a Boolean expression. When the entire condition expression evaluates as true, the actions listed in the filter are taken.

Content filters do not necessarily need conditions; a filter with no condition is considered always true, and the actions will always be taken. In message filters, the true condition can be used to the same effect:

always_add_originator:
if (true) {
strip-header ("X-OriginatingIP");
insert-header ("X-OriginatingIP", "$RemoteIP");
}

Filter conditions often have multiple forms: a unary test, which is a simple true or false result, and a binary, or comparison, form that tests against a supplied value. For example, the condition header("CC") by itself tests whether the CC: header is present in a message. The condition header("CC") == "cisco" tests whether the CC header exists and contains the string “cisco” anywhere in it.
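The two forms of the header condition can be imitated with Python's standard email and re modules. This is a rough analogy under stated assumptions, not how the ESA evaluates headers; header_present and header_matches are hypothetical helpers, and the sample message is made up.

```python
import re
from email import message_from_string
from email.message import Message

RAW = (
    "From: a@example.com\n"
    "To: b@example.com\n"
    "CC: team@cisco.com\n"
    "Subject: hello\n"
    "\n"
    "body text\n"
)
msg = message_from_string(RAW)

def header_present(message: Message, name: str) -> bool:
    """Unary form: is the header there at all?"""
    return message[name] is not None

def header_matches(message: Message, name: str, pattern: str) -> bool:
    """Binary form: the header exists and some value matches the pattern."""
    values = message.get_all(name) or []
    return any(re.search(pattern, value) for value in values)

print(header_present(msg, "CC"))
print(header_matches(msg, "CC", "cisco"))
print(header_matches(msg, "Reply-To", "cisco"))
```

The binary form can never be true when the unary form is false, because a missing header has no value to match.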

Conditions That Test Message Data

The most commonly used filter conditions are those that test something about the message itself: sender, recipients, headers, and body. Attachment conditions are discussed later in this chapter. Table 11-1 lists these filter conditions and gives examples of their use.

Table 11-1. Conditions for Testing Message Data

Operating on Message Metadata

Many filter conditions are provided for testing message metadata—that is, data about the message rather than the actual message content. This category includes the conditions that show the results of message processing earlier in the pipeline. Table 11-2 describes metadata conditions.

Table 11-2. Conditions for Testing Metadata and Scanning Results

Attachment Conditions

The ESA has thorough capabilities to open and examine text content within binary attachments, and supports more than 400 attachment types. When this conversion is done, it makes the text content of attached documents available for filters to test. It supports archive formats, like TAR and ZIP, providing scanning results against the files contained in a compressed archive. The ESA also treats some document types, like Microsoft Office documents, as containers and can unpack and examine content embedded within. For example, the ESA can find content in a spreadsheet object embedded in a Word document.

When using a content or message filter condition that examines the body of the message, attachments are automatically unpacked, if needed, and converted to plain text. The filter comparisons are then performed against this plain text. Because this process can be resource intensive, the ESA only does the decoding and conversion when the message in question will actually hit a filter condition that requires it. Any filter condition that examines message body content, like body-contains, or attachment data or metadata, like attachment-type, will cause the message to be converted. The process is performed only once, regardless of the number of separate filters that examine the data.
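The convert-only-when-needed, convert-only-once behavior is a common lazy-evaluation pattern, and a minimal Python sketch of that pattern looks like this. The ScannedMessage class and its fields are hypothetical; the decoding step is a trivial stand-in for the ESA's real attachment conversion, whose internals are not described here.

```python
from functools import cached_property
from typing import List

class ScannedMessage:
    """Hypothetical message wrapper that converts its parts to text lazily."""

    def __init__(self, raw_parts: List[bytes]) -> None:
        self.raw_parts = raw_parts
        self.conversions = 0  # counts how often the expensive step runs

    @cached_property
    def plain_text(self) -> str:
        # Stand-in for the expensive unpack-and-convert step.
        self.conversions += 1
        return "\n".join(p.decode("utf-8", errors="replace") for p in self.raw_parts)

def body_contains(message: ScannedMessage, phrase: str) -> bool:
    return phrase in message.plain_text

scanned = ScannedMessage([b"first part", b"sensitive and confidential"])
untouched = ScannedMessage([b"never examined"])

# Two separate "filters" look at the same message body.
body_contains(scanned, "sensitive")
body_contains(scanned, "first")
print(scanned.conversions, untouched.conversions)
```

The message that no body condition touches is never converted, and the one examined twice is converted a single time.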

Table 11-3 describes the attachment filtering conditions.

Table 11-3. Attachment Conditions

Attachment filtering on the ESA has a subtle problem in some cases when combining multiple attachment rules. Like the issue with multiple recipients to messages described previously, this problem arises when messages have multiple attachments.

Suppose, for example, that you want to quarantine all messages that have a PDF attachment with the phrase “sensitive and confidential” in it. You might write this message filter or the equivalent in content filters:

detect_sensitive_pdf:
if (attachment-type == "pdf" and attachment-contains "sensitive and confidential") {
quarantine ("Policy");
}

This seems to cover the requirement, and it will indeed stop messages that contain a PDF with such a phrase in it. But, what happens when you have more than one attachment? If a message arrived at the filter that had a PDF attachment without the phrase, that would meet the first condition; if the message also has a Word document with the phrase “sensitive and confidential,” that would meet the second condition. This two-attachment message would trigger the quarantine action, even though the phrase doesn’t appear in the PDF.

A better description of the detect_sensitive_pdf filter in the previous example would be “if any attachment to the message is a PDF, and any attachment contains this phrase, then quarantine.” That’s a slightly different case than the original business requirement. Unfortunately, there’s no way to guarantee that two separate attachment conditions are matching on the same attachment.
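The mismatch is easy to see when the two readings are written out in Python. This is a model of the behavior described above, not ESA code; the attachment dictionaries and both function names are hypothetical, and the second function describes the business requirement, which the text above says the filter cannot express.

```python
from typing import Dict, List

Attachment = Dict[str, str]
PHRASE = "sensitive and confidential"

def filter_as_evaluated(attachments: List[Attachment]) -> bool:
    """Each condition may be satisfied by a DIFFERENT attachment."""
    any_pdf = any(a["type"] == "pdf" for a in attachments)
    any_phrase = any(PHRASE in a["text"] for a in attachments)
    return any_pdf and any_phrase

def business_requirement(attachments: List[Attachment]) -> bool:
    """What was actually wanted: one attachment that is a PDF AND has the phrase."""
    return any(a["type"] == "pdf" and PHRASE in a["text"] for a in attachments)

# Made-up two-attachment message: a harmless PDF plus a sensitive Word document.
mixed = [
    {"type": "pdf", "text": "Quarterly newsletter"},
    {"type": "doc", "text": "This report is sensitive and confidential."},
]

print(filter_as_evaluated(mixed), business_requirement(mixed))
```

The message is quarantined under the first reading even though no PDF contains the phrase.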

System State Conditions

A few convenience conditions are available that test various system states or external data. All are available only in message filters and have no content filter equivalent. These are listed in Table 11-4.

Table 11-4. System State Conditions

Miscellaneous Filter Conditions

The remaining filter conditions don’t easily fit into a single category, so they are grouped in Table 11-5 for completeness.

Table 11-5. Other Filter Conditions

Filter Actions

Actions are the operations that are performed when the conditional expression of a filter evaluates to true. You can have more than one action on a filter, and they will typically run in the order specified. Final actions, such as drop, bounce, or skip-filters, really are final and no further processing is taken on a message that hits them. In content filters, the filter-building WUI will always put final actions as the last action in your filter.

Changing Message Data

Table 11-6 lists the available actions that change basic message data or metadata. Certain attributes, like sender, message size, reputation, remote IP, and others, are not modifiable.

Table 11-6. Message Metadata Actions

Altering Message Body

ESA provides filter actions to modify the actual message data. You can add disclaimers to messages either at the top (header) or bottom (footer) of a message. You can drop attachments based on the same conditions used to identify attachments. In one case, you can actually modify the body of plain text messages, but it is not possible to modify binary attachments.

Table 11-7 lists the actions that modify messages in some way.

Table 11-7. Message Modification Actions

Affecting Message Delivery

Table 11-8 lists the actions that modify message delivery in some way, by altering delivery destination or adding or changing recipients.

Table 11-8. Filter Actions That Affect Message Delivery

Altering Message Processing

Table 11-9 lists the filter actions that alter the processing steps that a message goes through, skipping the upcoming pipeline stages. These actions take no arguments.

Table 11-9. Message-Processing Actions

Miscellaneous Filter Actions

The remaining filter actions are listed in Table 11-10.

Table 11-10. Miscellaneous Filter Actions

Action Variables

In most filters that perform notifications, whether to sender, recipient, or another party, the notification should include data about the message triggering the filter. This is especially true of notifications about dropped or modified messages, or messages that match sensitive data rules. You could create notifications with custom text for each possible matching condition, but it’s far better to include data from the message in question. Users who know exactly why a message was modified or undeliverable will open fewer support tickets and have a clear idea of the rules being enforced. The filters engine on the ESA provides action variables that allow you to include this data.

Action variables start with a $ and are limited to a specific list of supplied variables. You’ve already seen several examples of action variables in this chapter. I used $RemoteIP to add the sender’s IP address to the headers of a message.

Other useful action variables include $EnvelopeFrom, $EnvelopeRecipients, $Subject, and $Filenames. Variable names are not case sensitive. The action variables available to you in notifications and other filter actions are described in Table 11-11. There are additional action variables available for anti-virus notifications and those are described in Chapter 8, “Security Filtering,” in the section “AV Notifications.”

Table 11-11. ESA Action Variables

Regular Expressions in Filters

Almost every filter condition that looks for matching content in a message expects a regular expression (regex) as the argument. Regular expressions are a mathematical language for expressing character, word, and string matches. The characters between the quotes of a message filter condition, or entered in the text box in the WUI, are always treated as a regular expression. The ESA generally follows the Python regular expression syntax, and most regex metacharacters, character sets, and compilation flags are supported. The filter conditions only support regex matching expressions and do not support modifications like substrings, splitting, or replacement.

There are many great online and text references to regular expressions, so I do not make an attempt to be an authoritative guide. However, some regular expression metacharacters and character classes repeatedly come up in ESA filters.

An important topic in filters is the use of the escape character, \, the backslash. In regular expressions, the backslash serves two purposes. The first is to indicate a character class, such as \s or \d, which refer to whitespace characters and digits, respectively. The second is to undo the special nature of characters like +, *, and ^. If you want to match an actual asterisk, you must use the sequence \*, and if you want to match an actual backslash character, you must use \\.
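Because the ESA generally follows Python regular expression syntax, both uses of the backslash can be tried out with Python's re module. The patterns below are shown as the regex engine itself sees them, without any extra doubling a filter might require; the sample strings are made up.

```python
import re

def matches(pattern: str, text: str) -> bool:
    return re.search(pattern, text) is not None

checks = {
    # Backslash introducing a character class.
    "two digits": matches(r"\d\d", "ID:42"),
    "whitespace": matches(r"\s", "two words"),
    # Backslash removing a metacharacter's special meaning.
    "literal asterisk": matches(r"\*", "5 * 3"),
    "literal backslash": matches(r"\\", r"C:\temp"),
    "literal dot": matches(r"a\.b", "a.b"),
}

# An unescaped dot matches any character; an escaped dot matches only a dot.
unescaped_dot_matches_x = matches(r"a.b", "axb")
escaped_dot_matches_x = matches(r"a\.b", "axb")

print(checks)
print(unescaped_dot_matches_x, escaped_dot_matches_x)
```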

Table 11-12 lists some of the important regex entities commonly used in ESA filters.

Table 11-12. Important ESA Regular Expression Entities

There are some other items to note about regular expressions in the ESA:

• All regular expressions have an implicit .* sequence at the beginning and end. Effectively, this means that unless you anchor your regex, you are asking for a substring match. For example, when a regex such as bit is written, the pattern matches the word bit, but also arbitrary and gambit. This is why the pattern ID:\d\d matches ID:001 and SID:23A.

• Some filter conditions, like remote-ip, support ranges of matching values but not proper regular expressions. remote-ip allows for matches against inclusive IP ranges, such as 192.168.1.34-37, or with CIDR notation, such as 10.1.0.0/16 or 10/8.

• Regular expressions are case-sensitive by default. On the ESA, you can make a regular expression case-insensitive by using the (?i) directive at the start of your regex.

• The use of the backslash character as an escape differs between content filters and message filters. Because the backslash is first parsed by the ESA filters engine, and then passed to the Python regex engine, all backslash characters must be doubled. In message filters, you must type both backslashes in your filter code. In content filters, the ESA automatically doubles each backslash that you enter, so you type only one backslash in your regular expressions.

As an example of this, take this message filter from earlier in this chapter:

simple_sender_check_and_drop:
if mail-from == "sender@somewhere\\.com" {
drop();
}

The equivalent content filter condition, as entered by the ESA administrator, is shown in Figure 11-3.

Figure 11-3. Content Filter Condition, as Entered by the Admin; Note the Single Backslash

The resulting content filter, with the single backslash converted into two, is shown in Figure 11-4.

Figure 11-4. Content Filter Results
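Several of the regular expression notes in this section can be checked with Python's re module, whose search function behaves like a pattern with an implicit .* on each side. The sketch below is an approximation under stated assumptions: esa_style_match, as_stored, and as_seen_by_regex_engine are hypothetical names, and the two backslash helpers are only a simplified model of the two-stage parsing described above, not the ESA's parser.

```python
import re

def esa_style_match(pattern: str, text: str) -> bool:
    """Unanchored search: the pattern may match anywhere in the text."""
    return re.search(pattern, text) is not None

# Substring matching unless the pattern is anchored.
print(esa_style_match("bit", "arbitrary"), esa_style_match("^bit$", "arbitrary"))
print(esa_style_match(r"ID:\d\d", "SID:23A"))

# Case-sensitive by default; (?i) at the start switches that off.
print(esa_style_match("cisco", "Cisco"), esa_style_match("(?i)cisco", "Cisco"))

def as_stored(entered_in_wui: str) -> str:
    """Model of the WUI doubling every backslash the admin types."""
    return entered_in_wui.replace("\\", "\\\\")

def as_seen_by_regex_engine(stored: str) -> str:
    """Model of the filters engine consuming one level of escaping."""
    return stored.replace("\\\\", "\\")

entered = r"sender@somewhere\.com"   # what the admin types in a content filter
stored = as_stored(entered)          # what a message filter author types by hand
final = as_seen_by_regex_engine(stored)
print(stored, final)
```

Under this model the content filter author types one backslash, the stored rule carries two, and the regex engine finally receives one.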

Dictionaries

Many of the filter conditions I describe refer to dictionaries on the ESA. Dictionaries are collections of words, phrases, and regexes that can be used to test messages for matches. In general, the conditions that refer to a dictionary will evaluate to true if the message matches at least one term in the dictionary that you specify.

Dictionaries also support weighting of terms, where each word, phrase, or pattern is assigned a numeric value. Each match adds the corresponding score to a running total for that message. Every message scanned by the filter has a total score calculated and compared to the threshold you provide.

For example, assume that a dictionary contains these terms and scores:

Confidential, 1
Copyright, 1
Secret, 5
Do Not Distribute, 10

A message with the sentences, “This material is confidential. Do not distribute under any circumstances,” compared against this dictionary would score 11 points. You can create filter conditions that trigger after a particular score threshold is reached, as in this example:

sensitive_information:
if (dictionary-match ("Sensitive", 10)) {
quarantine ("Sensitive");
}

In this example, a single instance of the phrase Do Not Distribute is enough to trigger the quarantine action, but the other terms would need multiple matches to reach the threshold. Any repeated matching terms are each added to the score, so three instances of the word Confidential would score 3 points for that message.
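The weighted scoring arithmetic can be reproduced with a few lines of Python. This is a sketch of the calculation only, not the ESA's matcher: dictionary_score and triggers are hypothetical names, and the sketch assumes case-insensitive term matching and a greater-than-or-equal threshold, which is what the worked example above implies.

```python
import re
from typing import Dict

SENSITIVE: Dict[str, int] = {
    "Confidential": 1,
    "Copyright": 1,
    "Secret": 5,
    "Do Not Distribute": 10,
}

def dictionary_score(text: str, terms: Dict[str, int]) -> int:
    """Add each term's weight once per occurrence in the text."""
    score = 0
    for term, weight in terms.items():
        # Assumption: terms are matched without regard to case.
        hits = re.findall(re.escape(term), text, flags=re.IGNORECASE)
        score += weight * len(hits)
    return score

def triggers(text: str, terms: Dict[str, int], threshold: int) -> bool:
    return dictionary_score(text, terms) >= threshold

sample = "This material is confidential. Do not distribute under any circumstances."
print(dictionary_score(sample, SENSITIVE), triggers(sample, SENSITIVE, 10))
print(dictionary_score("Confidential, confidential, CONFIDENTIAL", SENSITIVE))
```

The sample sentence scores 1 for confidential plus 10 for do not distribute, which is the 11 points quoted above, and three repeats of a 1-point term score 3.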

Dictionaries are managed on the ESA through the dictionaryconfig command or in the WUI on the Mail Policies → Dictionaries page.

Notification Templates

Another topic closely related to filters is that of notification templates, which are a type of text resource on the ESA. Text resources are customizable versions of the various email text that can go to end users, like bounce messages. Although the ESA provides defaults, you can customize them in the WUI by going to Mail Policies → Text Resources or in the CLI with the textconfig command.

Notification templates are used by the filter action notify. You have the option of specifying a notification template that you’ve created or using the default system notification. Because the notification action in content filters only allows you to choose templates already created, you should create your notification templates before creating the filter. Message filter notify actions that don’t refer to a valid template name will be marked invalid and will not be run against messages.

The distinct advantage of using a custom notification template is that it will contain exactly the information you want, including data from the message that triggered the filter. Notifications support a wide range of action variables that are written with real data from the message at the time the notification is generated in the filter.

Here’s an example notification as created in the ESA interface, complete with action variables:

I'm sorry, your message to $enveloperecipients sent at $timestamp with Subject
'$Subject' was not delivered because the message violated the company's sensitive
information disclosure rules.
Specifically, this content:
$MatchedContent
Was sent in plain text over an unencrypted channel. To send this material, please
encrypt the message contents before resending.

Here’s what the resulting text would be, if triggered by a message or content filter:

I'm sorry, your message to [email protected] sent at 25 Apr 2011 14:23:36 -0500
with Subject 'Legal documents' was not delivered because the message violated the
company's sensitive information disclosure rules.
Specifically, this content:
Copyright 2011 Sensitive and Confidential.

Was sent in plain text over an unencrypted channel. To send this material, please
encrypt the message contents before resending.
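
To make the substitution step concrete, here is a minimal Python sketch that fills the same template text using the standard library's string.Template, which happens to use the same $name placeholder style. This only illustrates the idea and is not how the ESA implements action variables; the recipient address and the other values in message_data are hypothetical stand-ins for data the ESA would take from the triggering message.

```python
from string import Template

# Template text from the example above; the $name placeholders play the role
# of the ESA's action variables.
NOTIFICATION = Template(
    "I'm sorry, your message to $enveloperecipients sent at $timestamp with Subject\n"
    "'$Subject' was not delivered because the message violated the company's sensitive\n"
    "information disclosure rules.\n"
    "Specifically, this content:\n"
    "$MatchedContent\n"
    "Was sent in plain text over an unencrypted channel. To send this material, please\n"
    "encrypt the message contents before resending.\n"
)

# Hypothetical message data (assumption): on the ESA these values come from
# the message that triggered the filter.
message_data = {
    "enveloperecipients": "recipient@example.com",
    "timestamp": "25 Apr 2011 14:23:36 -0500",
    "Subject": "Legal documents",
    "MatchedContent": "Copyright 2011 Sensitive and Confidential.",
}


def render_notification(template, data):
    """Replace every $variable in the template with the matching value."""
    return template.substitute(data)


rendered = render_notification(NOTIFICATION, message_data)
print(rendered)
```

Note that substitute raises a KeyError when a variable has no value, which is a choice made for this sketch only.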

Smart Identifiers

In addition to the rich regular expression syntax, the ESA filters support the comparison of message data with smart identifiers. Smart identifiers are like a predefined regex pattern, in that they match sequences of characters, but have additional smarts to recognize particular data entities.

The four smart identifiers provided in the ESA are

• ABA routing numbers: This identifies 9-digit numerical sequences that uniquely identify a financial organization in the United States, also known as a routing transit number (RTN). This code appears on checks and other payment forms and consists of a federal reserve routing portion, an ABA institution identifier, and a check digit.

• Credit card number: Identifies 14-, 15-, or 16-digit payment card numbers issued by U.S. banks or credit unions. This smart identifier calculates the check digit by using the Luhn algorithm to verify that the sequence is actually a valid credit card number.

• CUSIP: A 9-digit alphanumeric code that uniquely identifies U.S. securities, like stocks or mutual funds, maintained and published by S&P. Like the first two smart identifiers, it has a check digit algorithm.

• U.S. Social Security Numbers (SSN): The 9-digit number issued to U.S. citizens and permanent residents working in the U.S. by the Social Security Administration. Although SSNs do not have a checksum, there are requirements for the area and group numbers that are part of every SSN. This smart identifier applies those rules to sequences that are potentially SSNs.

Smart identifiers can be used in message or content filter conditions that examine the body of messages. Furthermore, a smart identifier can be used as an element in a dictionary. Compared with writing regexes for possible combinations and formats of this data, smart identifiers save a lot of time and trouble.
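
To show what a check digit buys you over a plain 9-digit pattern, here is a short Python sketch of the publicly documented ABA routing number checksum: the digits are weighted 3, 7, 1 in repeating order, and the weighted sum must be a multiple of 10. This chapter does not describe how the ESA implements its ABA smart identifier, so treat this as an illustration of the general technique; the function name is my own.

```python
def is_valid_aba(candidate: str) -> bool:
    """Return True if a 9-digit string passes the ABA routing number checksum."""
    if len(candidate) != 9 or not (candidate.isascii() and candidate.isdigit()):
        return False
    d = [int(ch) for ch in candidate]
    # Weights 3, 7, 1 repeat across the nine digits.
    total = (
        3 * (d[0] + d[3] + d[6])
        + 7 * (d[1] + d[4] + d[7])
        + (d[2] + d[5] + d[8])
    )
    return total % 10 == 0


for number in ("021000021", "021000022", "123456789"):
    print(number, is_valid_aba(number))
```

Only the first of the three sample values passes. A bare 9-digit regex would have matched all of them.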

Using smart identifiers in place of a regular expression helps to improve matching accuracy by reducing the number of false positives. For example, if you want a filter to match 16-digit credit card numbers, with groups of 4 separated by dashes, you could use this regex:

\d{4}-\d{4}-\d{4}-\d{4}

Applied to these sequences, both would be a match for this regex:

5105-1051-0510-4902
5105-1051-0510-5100

However, only the second is actually a valid credit card number that passes the Luhn checksum algorithm. The smart identifier would correctly match on the second, but not the first.
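
The difference is easy to demonstrate outside the ESA. The following Python sketch applies the regex and then the Luhn checksum to both sequences. It is a sketch of the general technique, not the ESA's implementation, and the names CARD_PATTERN and luhn_valid are my own.

```python
import re

# The format-only test: four groups of four digits separated by dashes.
CARD_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{4}")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in the string pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch in "0123456789"]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


for candidate in ("5105-1051-0510-4902", "5105-1051-0510-5100"):
    matches_format = CARD_PATTERN.fullmatch(candidate) is not None
    print(candidate, matches_format, luhn_valid(candidate))
```

Both candidates pass the format test, but only the second passes the checksum.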

Using Smart Identifiers

Using smart identifiers in content filters is easy. When adding a content filter condition that tests a message body, like Message Body or Attachment, Message Body, or Attachment Content, one of the selectable options is Contains smart identifier. A drop-down box allows you to select the type, as shown in Figure 11-5.

Figure 11-5. Using Smart Identifier in Content Filters

Using smart identifiers in message filters requires using a specialized keyword that starts with an asterisk. This keyword must be used alone and cannot be combined with another smart identifier or a regular expression. For example:

check_ssn:
if (body-contains ("*ssn")) {
quarantine ("SSN");
}

If you’re using smart identifiers as an entry in a dictionary, just apply the dictionary match to the filter condition as you would any other dictionary.

Smart Identifier Best Practices

Smart identifiers can be used anywhere there’s a requirement to examine email messages and attachments for the entities that they match. It is common for organizations to have a policy in place to prevent the transmission of personally identifiable information (PII) or sensitive financial information via email. This may be driven by corporate data-protection practices, regulatory compliance, or industry requirements.

However, it’s rarely a good idea to use regular expressions or even smart identifiers alone to take action on messages. This is because smart identifiers lack any context within the message and are looking only for alphanumeric or numeric sequences that follow a particular format. Not every 16-digit number, even one that passes the Luhn checksum, is actually an example of payment card data. Using smart identifiers alone will likely result in many false positive hits, and actions being taken on messages that do not actually represent data loss.

Best practice for using smart identifiers for the purpose of identifying data loss is to provide context along with the smart identifier match. You should, at a minimum, combine your smart identifier rules with other corroborating data—for example, credit card issuer names, expiration date regexes, and the phrases “credit card,” “debit,” and so on. When initially deploying such policies, I recommend a nonintrusive action, like quarantine-copy, so that you can review the matches and modify the filters based on what is being caught.

U.S. SSNs represent a particular challenge. Many 9-digit numbers fall into valid SSN ranges, and the numbers do not have a checksum or other verification. Many common numeric sequences, like shipper tracking numbers and international phone numbers, include a 3-2-4 pattern of digits that fully meets the SSN format requirements. If you have the need to filter U.S. SSNs, it’s imperative that you combine the smart identifiers with other contextual information, like the phrases “Social Security,” “SSN,” “SS#,” and others.
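
Here is a rough Python sketch of that advice: a 3-2-4 candidate only counts if it passes some basic range rules and the surrounding text also contains a corroborating phrase. The range rules used here (area not 000, 666, or 900 to 999; group not 00; serial not 0000) are the generally published ones and are an assumption about what an SSN test would check. The ESA's exact rules aren't listed in this chapter, and all names in the code are my own.

```python
import re

# A 3-2-4 digit pattern, the typical written form of an SSN.
SSN_CANDIDATE = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")

# Corroborating phrases that give the number some context.
CONTEXT = re.compile(r"(?i)social security|\bSSN\b|SS#")


def plausible_ssn(area: str, group: str, serial: str) -> bool:
    """Apply basic range rules to the three parts of a candidate."""
    if area in ("000", "666") or area.startswith("9"):
        return False
    return group != "00" and serial != "0000"


def likely_ssn_disclosure(text: str) -> bool:
    """True only when a plausible candidate AND a context phrase both appear."""
    has_candidate = any(
        plausible_ssn(*match.groups()) for match in SSN_CANDIDATE.finditer(text)
    )
    return has_candidate and CONTEXT.search(text) is not None


print(likely_ssn_disclosure("Tracking number 123-45-6789 has shipped"))
print(likely_ssn_disclosure("My SSN is 123-45-6789"))
```

The tracking-number message is ignored because nothing around the number suggests an SSN, while the second message trips the check.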

Crafting data loss-prevention policies using smart identifiers is more of an art than a science. To make this process easier, the ESA recently incorporated dedicated data loss prevention features. We examine these capabilities in Chapter 15, “Advanced Topics.”

Content Filter and Mail Policy Interaction

Earlier, this chapter mentioned a subtle detail with message or content filters that have conditions that test recipients. In message filters, this is the rcpt-to condition, and in content filters, it’s the envelope recipient condition. The issue occurs when messages have multiple recipients; the condition tests if any recipient matches the condition, whether a regular expression, LDAP lookup, or dictionary test.

In the first example, we had a single rcpt-to condition and a simple action, drop. Let’s look at a more real-world example, involving intellectual property and encryption. Suppose that your organization is involved in a confidential project, such as an acquisition of another organization that also happens to be a partner and supplier. During a normal day, hundreds of messages pass to and from this organization. Naturally, email will also be used as part of the communication between your organization and the target about the acquisition, but also with a third party, an outside law firm handling the acquisition negotiations.

Suppose that protecting confidentiality in this communication includes the following requirements:

• Outgoing messages to users @targetacq.com and @lawfirm.com, and sent by a group of users in the legal department in your organization, must be encrypted before delivery.

• Outgoing messages to users @targetacq.com and @lawfirm.com, sent by anyone else in your organization, must be dropped and a notification sent back to the sender informing them of the policy.

• Incoming messages sent from users @targetacq.com into your organization are delivered if they are addressed to the Legal department.

Assume that the users in the Legal department have an LDAP attribute identifying them and that there is a working LDAP query on the ESA. What’s the best way to approach this problem? We could write filters that check for the recipient domain and encrypt messages, as shown in Example 11-3.

Example 11-3. Filters for Sensitive Messages

encrypt_sensitive_outgoing:
if ((rcpt-to == "@targetacq\.com" OR
     rcpt-to == "@lawfirm\.com") AND
    mail-from-group == "Legal") {
        encrypt();
        skip-filters();
}

block_sensitive_outgoing:
if (rcpt-to == "@targetacq\.com" OR
rcpt-to == "@lawfirm\.com") {
      drop();
      notify ("$EnvelopeFrom", "Your message was dropped", "[email protected]",
   "SensitiveTemplate");
}

restrict_sensitive_incoming:
if (mail-from == "@targetacq\.com") {
      if (rcpt-to-group == "Legal") {
             skip-filters();
      }
      else {
             drop();
      }
}

There are some subtle issues here. For outgoing mail, this solution has the side effect of encrypting all messages where any recipient in the message matches the rule. For example, if an attorney at your company sends a message to both [email protected] and [email protected], the message to Cisco would be encrypted along with the message to the target company. This may not be a terrible problem; the other recipients can simply open the encrypted message.

A similar problem occurs with the second filter. Messages going to more than one recipient will be dropped if at least one of the recipients is in the target or law firm domain.

The far more serious logic issue is in the third filter. For all messages coming from targetacq.com, the rcpt-to-group condition will be true whenever any recipient on the message is in the legal department. A message from targetacq.com to local recipients will get through your filter as long as at least one of the recipients is in Legal. Someone can effectively bypass your restriction with a BCC.

The answer to this dilemma is to combine content filters with mail policies. As you saw earlier in this chapter and in Chapter 3, content filters are processed later in the pipeline and are created and run on a per-policy basis. When a single message has multiple recipients, and those recipients match different policies, the ESA splinters the message into multiple copies and passes each copy down the pipeline independently. Because content filters are processed after the splintering, we need to switch our approach to using content filters.

The solution to the problem presented in the restrict_sensitive_incoming filter is to make sure that our test only applies to the right recipients. First, navigate to Mail Policies → Incoming Mail Policies and click Add Policy to add a new policy. Name it Legal Department and leave the order at number 1. Use the drop-down at the bottom for LDAP group and type the department name Legal, and then click Add>>. The result should look like Figure 11-6. Click Submit to add the policy.

Figure 11-6. Incoming Mail Policy for Legal Department

This policy only applies to recipients in the Legal department. Now, we can recreate the restrict_sensitive_incoming filter. Create a new incoming content filter from Mail Policies → Incoming Content Filters. Your new filter should look like Figure 11-7. We do not use filter logic to allow messages to Legal; you’ll see what I mean in the next step.

Figure 11-7. Content Filter to Restrict Incoming Messages

The last step is the key. Click the Content Filters column in the Default Policy row to apply the newly created filter to the Default, but not to the Legal Department, as shown in Figure 11-8.

Figure 11-8. Result of Applying Content Filter to Default Policy

What happens to incoming messages from targetacq.com now? For single-recipient messages going to a user not in the Legal department, the message follows the default policy and is dropped. For single-recipient messages going to a user in the Legal department, the check isn’t even performed, so no messages are dropped. For multirecipient messages with users in both policies, ESA splinters the message into two messages: one with the Legal recipients and one with the other recipients. Each message individually progresses through the pipeline. The Legal recipients receive their copy and the other recipients do not.
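
The contrast between the two approaches can be modeled in a few lines of Python. This is a toy simulation, not ESA code: the set LEGAL_GROUP stands in for the LDAP group lookup, and all addresses are hypothetical. The first function behaves like the restrict_sensitive_incoming message filter, where one verdict applies to the whole message. The second behaves like the mail policy arrangement, where recipients are split by policy before the content filter runs.

```python
# Hypothetical members of the Legal LDAP group (assumption for this sketch).
LEGAL_GROUP = {"alice@example.com", "bob@example.com"}


def message_filter_verdict(sender, recipients):
    """Whole-message evaluation: one verdict covers every recipient."""
    if sender.endswith("@targetacq.com"):
        # rcpt-to-group is true if ANY recipient is in the group.
        if any(rcpt in LEGAL_GROUP for rcpt in recipients):
            return {rcpt: "deliver" for rcpt in recipients}
        return {rcpt: "drop" for rcpt in recipients}
    return {rcpt: "deliver" for rcpt in recipients}


def mail_policy_verdict(sender, recipients):
    """Per-policy evaluation: recipients are splintered by policy first."""
    splinters = {"Legal Department": [], "Default": []}
    for rcpt in recipients:
        name = "Legal Department" if rcpt in LEGAL_GROUP else "Default"
        splinters[name].append(rcpt)
    verdict = {}
    for name, members in splinters.items():
        # The drop filter is applied to the Default policy only.
        drop = name == "Default" and sender.endswith("@targetacq.com")
        for rcpt in members:
            verdict[rcpt] = "drop" if drop else "deliver"
    return verdict


rcpts = ["alice@example.com", "carol@example.com"]
print(message_filter_verdict("ceo@targetacq.com", rcpts))
print(mail_policy_verdict("ceo@targetacq.com", rcpts))
```

With the message filter model, carol@example.com receives the message because alice@example.com is on it. With the mail policy model, only the Legal recipient gets a copy.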

Here’s another example where you would certainly want to use mail policies to apply per-recipient policies. For protecting corporate data, suppose that you want to prevent your users from sending documents to common webmail domains, like yahoo.com, hotmail.com, and gmail.com. Any message with documents attached should be dropped and a notification sent to the sender. You could do that with this filter, which sends the notification with a customized subject and template and a return address of [email protected]:

drop_webmail_doc_messages:
if (rcpt-to == "(yahoo|gmail|hotmail)\.com$") {
    drop();
    notify ("$EnvelopeFrom", "Your Message Was Not Sent",
            "postmaster@example.com", "WebMailNotify");
}

You’re again in the same boat as before if someone sends a message to two or more recipients, with at least one recipient at a webmail domain and one recipient somewhere else. The non-webmail recipient is caught in the filter and the message dropped.

One nonfilter solution to this issue is to simply map the webmail domains to the destination /dev/null in SMTPRoutes; that would force messages to those domains to be dropped. However, you would not get the sender notification that you need. The right solution is to create an outgoing mail policy that matches these domains, and apply a content filter that drops and notifies the sender.

Using mail policies to separate recipients or senders is so effective, and so convenient, that whenever a filter seems to call for a rcpt-to or mail-from condition, I opt for mail policies instead. It generally does the right thing with multirecipient messages, it’s high performance, and frankly, it’s much easier to administer. The approach has my highest recommendation.

Filter Performance Considerations

Filter conditions and actions, when applied to a large number of messages, can add significant scanning effort to any ESA. The good news is that in most ESA configurations, the IPAS scanning process is typically the most resource intensive, because it runs a lot of anti-spam rules against a lot of messages. Typically, filtering is performed on the smaller set of clean incoming and outgoing mail only. This doesn’t mean that filters can’t cause performance problems, though; at some point, the added work of filters will have a meaningful impact on maximum throughput.

The first performance consideration is the process of decoding, unpacking, and converting MIME attachments. When an ESA receives a message, the system makes no attempt to parse the MIME headers or parts, and simply stores the transmitted message bodies as received. This blob of data remains unparsed until a filter condition or action is executed that requires decoding.

The decoding process starts with parsing the MIME headers to discover the MIME boundaries that separate the message parts. Binary attachments encoded with binary-to-ASCII schemes, like Base64 or uuencode, are decoded to the original binary form. The file-typing process runs on the binary to determine the actual file type. If the resulting binary is an archive format, such as ZIP, TAR, or GZIP, the archive is unpacked and decompressed. At this point, the ESA now has the original binary data, the true file type, and the reported filename and MIME type of each attachment to the message. The files are then passed to the file conversion engine for rendering the file contents into plain text for known file types, like PDF, MS Word, and Excel.
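
To give a feel for how much work hides in that paragraph, here is a small Python sketch of the same sequence using only the standard library: parse the MIME structure, undo the Base64 transfer encoding, identify an archive by its content rather than its name, and unpack it. It is a simplified stand-in for the ESA's engine, which handles far more formats. The sample message, addresses, and filenames are invented for the example.

```python
import io
import zipfile
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_sample_message() -> bytes:
    """Create a test message whose attachment is a ZIP with a misleading name."""
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("notes.txt", "Copyright 2011 Sensitive and Confidential.")
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "recipient@example.com"
    msg["Subject"] = "Legal documents"
    msg.set_content("See attached.")
    msg.add_attachment(
        archive.getvalue(),
        maintype="application",
        subtype="octet-stream",
        filename="report.dat",
    )
    return msg.as_bytes()


def extract_files(raw: bytes):
    """Return (name, bytes) for every attached file, unpacking ZIP archives."""
    parsed = message_from_bytes(raw, policy=policy.default)  # parse MIME parts
    files = []
    for part in parsed.iter_attachments():
        data = part.get_payload(decode=True)  # undo the Base64 encoding
        name = part.get_filename()
        # Type the file by its content, not by the reported filename.
        if zipfile.is_zipfile(io.BytesIO(data)):
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                files.extend(
                    (f"{name}/{inner}", zf.read(inner)) for inner in zf.namelist()
                )
        else:
            files.append((name, data))
    return files


files = extract_files(build_sample_message())
for name, content in files:
    print(name, content.decode("utf-8"))
```

Even in this tiny example, the scanner has to parse, decode, type, and decompress before it can look at a single byte of the text inside the archive.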

As you might guess, this decoding process can be expensive for large or complex messages. The performance hit on large messages is clear, and the larger the message, the more time it takes to go through the process. Message complexity is more difficult to quantify. Messages with many attachments are more complex. Archive files require decompression and are thus more complex.

Filter conditions and actions that operate on message attributes other than the body, such as sender and recipient rules, header rules, and metadata rules like body-size or remote-ip, are all very fast; the data is stored with the message in memory and the filter test is simply a value comparison. Because these individual message attributes are fairly small, even regular expression matching is not resource intensive.

Filter actions that modify the message body in some way are the most expensive. The ESA uses a copy-on-write optimization scheme; this means that, to change the message body, a new copy is created in memory and a new MID is assigned to the message.

Table 11-13 categorizes the various filter conditions and actions into performance buckets.

Table 11-13. Performance Categories of Filter Conditions and Actions

These performance figures do not mean that body scanning suddenly causes all messages to take twice as long, or even that the capacity of the system will decrease by 50%. They indicate that the peak throughput of an ESA, compared with one running no filters whatsoever, decreases by approximately that value. In general, if you are running anti-spam and antivirus, the performance of those engines dictates the peak throughput of the appliance. Adding a body-scanning filter to a fully burdened system likely results in no loss of peak throughput and no increase in latency.

Improving Filter Performance

Improving filter performance requires minimizing the use of expensive filter conditions and keeping regular expressions as specific as possible. Ordering of filters can help in some cases, but because of the optimizations that the ESA performs under the hood, using logical conditions to exempt messages from expensive scans usually won’t create a noticeable difference in performance. To avoid calling the regular expression engine multiple times, the ESA checks all filter expressions against each message with a single call and records the results of each pattern, whether that pattern will be used or not.

For example, consider these two filters:

one:
if (body-contains ("simple")) {
    alt-mailhost ("10.17.23.2");
    skip-filters();
}
two:
if (body-contains ("Copyright \d\d\d\d")) {
    alt-mailhost ("10.17.23.4");
}

You might expect that these filters, run in this order, would result in messages containing the string simple not having their content examined for the Copyright \d\d\d\d regex. However, the ESA evaluates both of these expressions, because a single scan of the message bytes during a single call to the regex engine is more efficient than doing them separately. It’s faster to run all the regular expression tests at once, even if the results won’t necessarily be used.

The ESA has other built-in optimizations for filtering messages. The message decoding, decompression, and parsing process happens only once for messages that must be body scanned, to avoid duplicating that expensive process. The best optimization we can do is avoid the decoding step altogether.

Here are some general tips for improving filter performance:

• If you plan to drop messages, such as those with dangerous attachments, like exe, pif, bat, or com, you should do so as early as possible to avoid any unnecessary scanning. Content filters that drop messages should be the first ones in the list. If you can do this work in message filters, even better.

• The fewer the messages that pass through filters, the better the performance will be. Use LDAP recipient validation to ensure the ESAs aren’t accepting messages for nonexistent recipients. Set message size limits appropriately in the HAT to avoid accepting large messages that will bounce. If spam messages are to be delivered to the end user or to the ISQ, be aware that these will pass through the content filters engine; we’ll address this in a recipe later in this chapter.

• Try to avoid modifying messages, because this is an expensive operation. The most common forms of message modification are disclaimers (footers and headers) and stripping attachments. Disclaimers are best handled by each user’s mail user agent (MUA), both for performance and to avoid trouble with character set mismatches. However, many companies mandate the use of a legal disclaimer at the bottom of messages, and this may override concerns about ESA performance. Instead of stripping attachments, drop or quarantine the messages and send notifications to the recipient.

• Reduce the number of calls to the regular expression engine. If you’re looking for one of many possible patterns, use a single pattern with | (the pipe character) to define or terms within the regex. This is an extremely good approach to attachment filtering, for example. Suppose you want to filter attachments by filename and stop all files with file extensions like exe, com, bat, vbs, and pif. Instead of using attachment-filename == "\.exe$" OR attachment-filename == "\.com$", and so on for each extension, use a single condition like attachment-filename == "\.(exe|com|bat|vbs|pif)$".

• Avoid regular expression wildcards, like .*, especially in body scans. If you need to search for two separate conditions in the same file, try to use a limited wildcard. For example, if you’re searching for the terms “sensitive” and “confidential” when they appear in the same file, you’d be tempted to use a regex of sensitive.*confidential. What happens, though, when a file contains the word “sensitive” near the beginning, and “confidential” near the end, or not in the file at all? The regex pattern matching continues all the way until the end of the file, examining all bytes of the message. If you use a couple of these kinds of regexes in your filters, you end up asking the scanning engine to search the content multiple times. Because the words “sensitive” and “confidential” will likely appear near each other in the file, use a regex like sensitive.{1,100}confidential, which matches on the words sensitive and confidential when they are separated by no more than 100 characters. Limiting the scope of the regex match in this way considerably improves performance.
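
The effect of the bounded wildcard in the last tip is easy to see with Python's re module. This sketch only shows the matching behavior, not timings, and Python's engine is not the ESA's; the re.DOTALL flag is my assumption so that the wildcard can cross line breaks, as it would need to in a multi-line document.

```python
import re

# Unbounded: after finding "sensitive", the engine may scan to the end of the text.
UNBOUNDED = re.compile(r"sensitive.*confidential", re.DOTALL)

# Bounded: the engine looks at most 100 characters past "sensitive".
BOUNDED = re.compile(r"sensitive.{1,100}confidential", re.DOTALL)

near = "This sensitive and confidential memo is for internal use."
far = "sensitive" + " filler" * 1000 + " confidential"

print(BOUNDED.search(near) is not None)    # terms close together
print(BOUNDED.search(far) is not None)     # terms about 7,000 characters apart
print(UNBOUNDED.search(far) is not None)   # unbounded pattern spans the whole text
```

The bounded pattern still catches the case you care about, where the two words appear near each other, while giving up quickly on text where they don't.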

Another tactic for improving performance is to move some of the load off the system, especially for large messages. The ESA can alter delivery for large messages to avoid processing them on-box, using the body-size condition and alt-mailhost action. We discuss two-tier architectures for handling large messages in Chapter 13, “Multiple Device Deployments.” Unfortunately, no such simple test exists for message complexity: The complexity of a message isn’t known until the decoding is complete.

Filter Recipes

We spent much of this chapter on the details of filter conditions and actions with a few examples to illustrate some of the art of crafting effective filters. Here, we devote time to specific filter recipes that solve particular problems. For most of these scenarios, there’s more than one way to compose such filters.

Dropping Messages

It may be obvious, but a word of warning is in order about the drop action in either message or content filters. Dropping a message is dangerous: The system immediately discards the content and provides no means of recovery.

Even the best filter writers in the world make mistakes or write overly broad regular expression matches. Take this message filter, for example:

drop_virus_messages:
if (subject == "[VIRUS]") {
drop();
}

Seems like a great idea, no? Drop messages that either the ESA or another system has detected as a virus? Do you see the problem? This is a bad regular expression because the [ ] brackets indicate a choice of characters. That means that this filter will drop any message that contains a single V, I, R, U, or S character anywhere in the subject. Escaping the brackets, as in \[VIRUS\], would match the literal tag instead. As written, the filter makes a lot of legitimate messages permanently unrecoverable.
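
If you want to convince yourself, the same pattern behaves the same way in Python's re module, which shares the bracket syntax. This is a quick sketch with made-up subject lines, comparing the pattern from the filter with a version that escapes the brackets.

```python
import re

BROKEN = re.compile(r"[VIRUS]")    # character class: any one of V, I, R, U, S
FIXED = re.compile(r"\[VIRUS\]")   # escaped brackets: the literal text [VIRUS]

# Hypothetical subject lines for illustration.
subjects = [
    "[VIRUS] Infected attachment removed",
    "Quarterly Sales Report",
    "lunch today?",
]

for subject in subjects:
    broken_hit = BROKEN.search(subject) is not None
    fixed_hit = FIXED.search(subject) is not None
    print(f"{subject!r}: broken={broken_hit} fixed={fixed_hit}")
```

The broken pattern hits the sales report because of its capital S and R. Only the all-lowercase subject escapes it.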

For these reasons, if you intend to drop messages to prevent them from reaching the end recipient, I recommend using the quarantine action to hold the messages, giving you a safety net should anything untoward occur. At the very least, use the archive action in a message filter to store a copy-of-last-resort. Once a filter has been running for some time with a nondestructive action, you can consider changing it to a drop action. Even then, I would limit the use of drop to situations where discarding the messages is a matter of life and death, such as traffic storms or other situations that risk the entire email infrastructure.

Basic Message Attribute Filters

This section covers some scenarios that can be addressed with filters that examine basic message data. We’ve already seen filters that examine sender and recipient to look for specific email addresses, but filters have access to almost all parts of the message body and to metadata like the source IP of the connecting client.

For example, filters can examine message headers, acting if a particular header exists or not, or if it contains some value. Suppose employees in your environment are instructed to mark messages as company confidential when sending sensitive information. The ESA can act on the sensitivity header, like this:

encrypt_confidential:
if (header ("Sensitivity") == "Company-Confidential") {
encrypt();
}

Filters can also be used to add information to messages that can make troubleshooting easier:

markup_originating_IP:
if (true) {
strip-header ("X-Originating-IP");
insert-header ("X-Originating-IP", "$remoteip");
}

Attachment filtering is a common solution, and it can be done either through message filters or content filters; the regular expressions are what matters.

First, a basic filter that identifies messages with executable attachments. This identifies executables by file type, not relying on the filename reported in the MIME headers for identification. This will find win32 executables even if they’ve been renamed, and it will even find them within zip or tar archives:

stop_exe_attachment:
if (attachment-filetype == "exe") {
quarantine ("Policy");
}

If your security policy requires dropping other kinds of executable software, like installers, dynamic link libraries, screensavers, and Java-based applications, you can do that with an entire category of file type identifications:

stop_exe_attachment:
if (attachment-filetype == "Executable") {
quarantine ("Policy");
}

Some types of scripting content, like VBScript or DOS batch files, cannot be identified by type and are not categorized in the executable file type. We must resort to attachment filenames to filter these messages. It’s tempting to write individual conditions for these in a filter, like this:

stop_script_and_batch:
if (attachment-filename == "\.bat$" or attachment-filename == "\.vbs$") {
quarantine ("Policy");
}

Although that filter certainly gets the job done, it’s not as efficient as it could be—and if you’re stopping a few dozen different file extensions, the overhead of invoking the regex engine many times can be high. A better approach is to invoke the regex engine just once:

stop_script_and_batch:
if (attachment-filename == "(?i)\.(bat|vbs)$") {
quarantine ("Policy");
}

For long lists of file extensions, the regex grouping can be easily extended:

attachment-filename == "(?i)\.(bat|vbs|cmd|wmf)$"

Note that I’m using the regex preprocessor flag (?i) that tells the engine that the pattern following it is case-insensitive. This allows the regex to match on filenames regardless of the case the names were created in.
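
The grouped, case-insensitive pattern can be tried out in Python, whose re module accepts the same (?i) prefix. This is just a sketch to show which names match; the filenames are invented.

```python
import re

# Same pattern as the filter condition: one alternation, case-insensitive.
BLOCKED = re.compile(r"(?i)\.(bat|vbs|cmd|wmf)$")

# Hypothetical attachment filenames.
filenames = ["setup.BAT", "macro.Vbs", "report.pdf", "batch.bat.txt"]

results = [BLOCKED.search(name) is not None for name in filenames]
for name, blocked in zip(filenames, results):
    print(name, blocked)
```

The $ anchor matters here: batch.bat.txt is not matched, because the pattern only looks at the final extension.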

Body and Attachment Scanning

Almost every organization will have a need, at some point, to use filters to scan the bodies of messages, whether for acceptable use, security filtering, intellectual property protection, or regulatory compliance. Here are some examples of how to solve common tasks that involve content in the body of messages, including any attachments.

For matching single terms or regular expressions in filters, we’ve already seen filter examples that use the body-contains condition. Most content filtering needs require matching more than one term, and the best way to search for lists of content is to use a dictionary. For example, if your organization prohibits the use of profanity in email messages, you can use a filter like this:

stop_profanity:
if (dictionary-match ("Profanity")) {
quarantine ("Policy");
}

The advantage of using a dictionary is that modifying the list of search terms is as simple as modifying the dictionary. You don’t need to change the message filter to add or remove terms. Composing a dictionary of profanity terms is left as an exercise for you.

The next example is a common combination of tasks: Stop local users from sending any kind of executable attachment, knowingly or otherwise. Aside from being good policy, this kind of filter can catch local users with infected PCs that are sending malware-laden messages. In this case, we focus on executables, quarantine the message for administrative review, and notify the sender about the action:

quarantine_and_notify_exe:
if (sendergroup == "RELAYLIST" AND attachment-filetype == "Executable") {
quarantine ("Policy");
notify ("$EnvelopeFrom", "Your message contains a restricted attachment",
"[email protected]", "AttachmentNotifyTemplate");
}

I’m using the sendergroup condition to apply this to outgoing mail only, because I don’t want to send notifications to Internet senders of incoming mail. The AttachmentNotifyTemplate uses action variables to describe the message and the policy violation to the sender:

Your message contained a dangerous attachment type and was not delivered:

Subject: $Subject
Recipients: $EnvelopeRecipients
Filenames: $Filenames

Corporate policy prohibits sending potentially dangerous attachment types. If you
believe this message is in error, please contact the helpdesk.

Another common use of filters is to attach a footer disclaimer to the bottom of each outgoing message. However, in the case of a reply, you likely only want to do that if the message doesn’t already contain the disclaimer. If you blindly add the disclaimer, a back-and-forth reply conversation will end up having the disclaimer added with each outgoing reply, resulting in messages that grow larger. A simple way to prevent this is to look for the disclaimer text in the message already, as in this filter:

add_disclaimer:
if ((sendergroup == "RELAYLIST") AND (NOT (body-contains ("This message contains sensitive information for the intended recipient only")))) {
add-footer ("LegalDisclaimer");
}

As in the attachment notification filter, I’m using the sendergroup condition to apply this to outgoing mail only.

Complex Combinatorial Logic with Content Filters

Near the beginning of this chapter, we saw this message filter that includes both AND and OR statements:

complex_logic:
if (rcpt-to == "chriport@cisco\.com" OR
    header("Received") == "chrisporter\.com$") {
   if (body-size > 1024k) {
          bcc ("[email protected]");
   }
}

This filter’s logic is “if the recipient is Chris Porter, or the Received header indicates it passed through a host called chrisporter.com, and the message is larger than 1MB, send a copy to [email protected].” More formally, the logic of this is if (A OR B) AND C.

I stated that this kind of complex logic isn’t possible with content filters, because they are limited to all the conditions being ANDed or ORed together. It’s not possible, directly, but you can implement this logic with two filters, using a header to pass values between. Here’s how to do that.

The first content filter, shown in Figure 11-9, tests condition (A or B). Instead of directly acting on the message, it adds a header to the message, indicating that the first compound statement is true.

Figure 11-9. First of a Two-Filter Combination for Some Complex Logic

The second content filter immediately follows in the processing order. If the header from the first filter is present, and the body is larger than 1MB, it takes action. This filter is shown in Figure 11-10.

Figure 11-10. Second Filter in the Two-Filter Combination

The second filter also removes the header added in Figure 11-9, so this little trick doesn’t add unnecessary information to the message.
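
The flow of the two-filter trick can be modeled in a few lines of Python. This is a toy model, not ESA configuration: messages are plain dictionaries, and the header name, addresses, and hostnames are all hypothetical. Each function stands for one content filter, and its actions run only when its own conditions are true.

```python
FLAG_HEADER = "X-Complex-Logic"  # hypothetical marker header name
ONE_MB = 1024 * 1024


def make_message(recipients, received, size):
    """Build a toy message record for the simulation."""
    return {"recipients": recipients, "received": received,
            "size": size, "headers": {}, "bcc": []}


def first_filter(msg):
    """Conditions ORed together (A OR B); the only action is to add the marker."""
    a = "owner@example.com" in msg["recipients"]
    b = any(host.endswith("relay.example.org") for host in msg["received"])
    if a or b:
        msg["headers"][FLAG_HEADER] = "true"


def second_filter(msg):
    """Conditions ANDed together (marker AND C); act, then remove the marker."""
    if FLAG_HEADER in msg["headers"] and msg["size"] > ONE_MB:
        msg["bcc"].append("archive@example.com")
        del msg["headers"][FLAG_HEADER]


def run(msg):
    first_filter(msg)
    second_filter(msg)
    return msg


big = run(make_message(["owner@example.com"], [], 2 * ONE_MB))
small = run(make_message(["owner@example.com"], [], 10))
other = run(make_message(["x@example.com"], ["mx.example.net"], 2 * ONE_MB))
print(big["bcc"], small["bcc"], other["bcc"])
```

One thing this model makes visible: a message that satisfies (A OR B) but is smaller than 1MB keeps the marker header, because the second filter's actions never run for it. If that matters in your environment, a third filter that only strips the header would clean it up.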

Routing Messages Using Filters

The filter action alt-mailhost, or “Send to Alternate Destination Host” in content filters, can be used to alter the normal delivery path of a given message. Because of the granularity of filters and mail policies, you can implement some complex routing using filters. If you combine this with information in an LDAP directory, such as Active Directory (AD), you can even route based on user attributes in that directory.

As an example, suppose that your environment has an external server specifically for encrypting email messages, and you want to send messages there when they are confidential or if the sender puts a tag in the Subject. You can use a filter to redirect messages, like this:

redirect_confidential:
if (subject == "CONFIDENTIAL" or header("Sensitivity") == "Company-Confidential")
{
alt-mailhost ("encrypt.company.com");
}

Be aware that routing messages does just that—reroutes the entire message, including all recipients. So, a filter that tests for a specific recipient, and reroutes the message, like this, does so for all recipients:

redirect_faculty_staff:
if (rcpt-to == "@facstaff\.university\.edu") {
alt-mailhost ("factstaff-mail.university.edu");
}

This approach risks sending all recipients of a multirecipient message to the same server. This is an artificial example; the right thing to do in this case is to use SMTPRoutes. If you’re going to do some sort of per-recipient routing, it’s best to stick with the features on the ESA that are designed for it: aliases, LDAP routing, and SMTPRoutes, or some combination.
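To make the per-message nature of the redirect concrete, here is a minimal Python sketch. It is a model only: the function name deliveries and the "normal routing" label are invented for illustration, and the hostname is copied from the filter above.

```python
def deliveries(recipients):
    """Map each recipient to the host the message would be delivered to."""
    mailhost = None
    # The condition matches if ANY recipient is in the faculty/staff domain...
    if any(r.endswith("@facstaff.university.edu") for r in recipients):
        # ...but the redirect then applies to the message as a whole.
        mailhost = "factstaff-mail.university.edu"
    return {r: mailhost or "normal routing" for r in recipients}
```

With one faculty recipient and one outside recipient on the same message, both entries in the result point at the redirect host.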

Integration with External SMTP Systems

Routing with filters can be used to direct messages to other local systems using SMTP as transport. For example, you might need to deliver certain messages, or all messages, to an external server that scans attachments for regulatory compliance, or one that encrypts or archives a copy of email messages before they’re sent out.

For some external systems, like email archiving, all that’s needed is a copy of the message. If you don’t need to reroute the original message, but just save a copy of the original’s content, use the Send Copy (BCC) action or the Notify action with the option to include a copy of the original message.

Warning

The BCC action creates a new message (and new MID) with the original content and the BCC recipient as the sole recipient. For archiving purposes, this may not be suitable, because the list of original recipients will not be retained. Although the original To, From, and CC headers are preserved, any recipients that were BCC on the original are lost.

To retain the original recipient list, you might choose to add a header that lists all the recipients, like this:

if (true) {
   insert-header ("X-rcpt-to", "$EnvelopeRecipients");
   bcc ("[email protected]");
}

But, that adds another problem: Anyone who receives the message can see all of the recipient addresses, defeating the “blind” of the BCC. There’s a simple solution: Delete the header:

if (true) {
   insert-header ("X-rcpt-to", "$EnvelopeRecipients");
   bcc ("[email protected]");
   strip-header ("X-rcpt-to");
}

On the face of it, this seems like it shouldn’t work. But, it does, because of a subtle pipeline processing fact: The BCC action creates a new message that is not subject to any further processing. So, in this filter, the header is added to the message, and BCC creates a copy of that message, including the new header. The original continues through the actions, and the header is stripped. The BCC copy does not encounter the strip-header action; in fact, it skips all filters and scanning and heads straight to delivery.

In cases where this behavior is not desirable, there is the bcc-scan action.
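The ordering of the plain BCC case can be sketched in Python. This is a toy model under stated assumptions: the message is a dictionary, the copy is taken at the moment the BCC action runs, and archive@example.com is a placeholder address.

```python
import copy

def archive_filter(msg):
    """Model the sequence: add the header, copy via BCC, strip from the original."""
    msg["headers"]["X-rcpt-to"] = ", ".join(msg["recipients"])  # header added
    bcc_copy = copy.deepcopy(msg)               # BCC snapshots the message now
    bcc_copy["recipients"] = ["archive@example.com"]  # sole recipient of the copy
    del msg["headers"]["X-rcpt-to"]             # only the original sees this step
    return msg, bcc_copy
```

The archive copy keeps the full recipient list in X-rcpt-to, while the original that goes on to the real recipients no longer carries it.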

For other systems, like encryption or large-message storage servers, the original message needs to be rerouted in its entirety. The external system will potentially modify the message and either deliver it directly or return it to the ESA for final delivery.

Cul-de-Sac Architecture

There are several approaches to the problem, depending on the specifics of your environment. The first approach is what I call a cul-de-sac model, because messages will be delivered from the ESA to the external system where they’re being processed, and the results delivered back to the ESA. This model is conceptually pictured in Figure 11-11.

Figure 11-11. Integration with an External SMTP Server, Such as Encryption or Archiving

The advantages of this approach are

• The ESA can do some prefiltering and selectively send messages to the external system, instead of all messages having to be sent. This decreases the volume of mail making a double-trip, increasing performance. This is especially important if the rerouting is based on sender or recipient and is limited to small groups.

• The ESA is built for performance and reliability, and won’t be impacted if messages cannot be delivered to the external system. Messages destined for the external system will queue until that system is available again, while other messages will be delivered straight through as normal. This limits the impact of any failure of the external system.

• When messages are routed out, and back through the ESA, the ESA continues to act as the final delivery host in your network. This fits in with the recommended last hop out placement of ESAs in the email architecture, allowing for consolidated delivery reports, consistent TLS usage, and the use of email authentication features, like DKIM signing.

• The ESA can deliver to multiple external systems for redundancy. This can even be done with filters, if you follow my suggestion on delivering to stand-in hostnames detailed later in the section, “Delivering to Multiple External Hosts.”

The problems with this cul-de-sac approach are

• If messages are delivered through the ESA a second time, there will be an additional set of log entries, reporting hits, and tracking entries, because the ESA treats it as a brand-new message.

• When messages are returned, they are for all intents and purposes new messages. The external system may modify the message slightly, or modify it completely, or possibly not even return it at all. Unless the external system has an absolutely predictable behavior, you cannot write filters that depend on message data like attachments, headers or header values, and so on. It’s best not to count on message data being preserved.

• You must create the logic to prevent mail loops. A single filter that redirects messages based on an attribute, like Subject, or a header value, risks a hit if the message is returned to the ESA with that attribute intact. The ESA discards messages that loop more than 100 times.

In the redirect_confidential filter example, using that filter by itself results in a mail loop if the external encryption server delivers its results back to the ESA, and the message retains the CONFIDENTIAL subject or the Sensitivity header.

To avoid a mail loop, use a filter that short-circuits the process, and it must be above the redirect in the filtering order:

short_circuit_encryption:
if (remote-ip == "10.1.17.23-29") {
skip-filters();
}

That creates a maintenance headache; you need to list the IPs for the encryption servers and maintain the filter if it changes. A better approach is to use the HAT to create a sender group for your encryption servers. If you name the sender group ENCRYPTION, you can use this filter to perform the short circuit:

short_circuit_encryption:
if (sendergroup == "ENCRYPTION") {
skip-filters();
}

With this filter, maintenance is a bit easier: Simply list the IP addresses of the encryption servers in the HAT sendergroup, and don’t touch the filter. The HAT is easier to see and maintain than message or content filters. Furthermore, having a separate HAT group and policy allows you to customize connection control, rate limits, and other SMTP parameters for email originating from your external servers.

The same issue exists with content filters, but the sendergroup condition is not available. You must use the remote IP rule.
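The effect of the short-circuit filter on a cul-de-sac loop can be illustrated with a small Python simulation. It is a sketch, not ESA behavior in detail: it assumes the external server returns the message with its subject intact and adds one Received header of its own, and the function names are invented for the example.

```python
MAX_HOPS = 100  # per the text, messages that loop more than 100 times are discarded

def esa_pass(msg, from_encryption, short_circuit):
    """Return what the modeled ESA does with the message on one pass."""
    msg["received"] += 1                      # the ESA adds a Received header
    if msg["received"] > MAX_HOPS:
        return "dropped"
    if short_circuit and from_encryption:     # short-circuit filter runs first
        return "deliver"
    if "CONFIDENTIAL" in msg["subject"]:      # redirect filter
        return "encrypt"
    return "deliver"

def simulate(subject, short_circuit):
    """Follow one message until it is delivered or dropped."""
    msg = {"subject": subject, "received": 0}
    from_encryption = False
    while True:
        outcome = esa_pass(msg, from_encryption, short_circuit)
        if outcome != "encrypt":
            return outcome, msg["received"]
        msg["received"] += 1                  # assumed: external server adds a hop
        from_encryption = True
```

With the short circuit in place, a confidential message is delivered on its second pass; without it, the same message circulates until the hop limit discards it.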

Inline Architecture

Alternately, you can use an inline model, as shown in Figure 11-12. This model is simpler, because all mail passes through both layers. The ESA configuration is simpler. Messages traverse each layer only once, so message tracking and reporting don’t count duplicate data.

Figure 11-12. Inline Architecture for ESA with an External SMTP Server

The problems with the inline approach are

• Failure of either layer stops all mail, reducing effective mean time between failures (MTBF).

• Many email systems, such as DLP, archiving, and encryption, are not intended to handle the kind of volume and throughput that the ESA is designed for. The performance of the entire environment is limited by the slowest component in the architecture, so performance will almost certainly be lower with an inline model.

• When the ESA receives all outgoing mail from the external system, you lose the visibility of internal hosts and the ability to create per-host HAT policies.

In general, inline models with ESA should place the ESA as the first hop of incoming mail, delivering messages to the external server after processing. This keeps the ESA security filters in their ideal configuration. For outgoing mail, the external servers should finish their processing and deliver all mail to the ESAs as the last hop of outgoing mail.

Delivering to Multiple External Hosts

When using filters to route messages, be aware of a limitation in the alt-mailhost action: You get only one. If you have multiple alt-mailhost actions, the last one in the filter takes precedence. If multiple filters are hit, each with an alt-mailhost action, only the last one takes effect and the other redirects are lost.
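The "last one wins" rule can be pictured as a single destination slot that each matching filter overwrites. The Python below is only a model of that rule; the filter functions and hostnames are hypothetical.

```python
def apply_filters(msg, filters):
    """Run filters in order; each returns a redirect host or None."""
    destination = None
    for f in filters:
        host = f(msg)
        if host is not None:
            destination = host  # overwrites any earlier alt-mailhost
    return destination

# Two hypothetical filters that both match the same message.
def to_dlp(msg):
    return "dlp.example.com" if msg["outgoing"] else None

def to_encryption(msg):
    return "encrypt.example.com" if "CONFIDENTIAL" in msg["subject"] else None
```

For an outgoing message with CONFIDENTIAL in the subject, apply_filters(msg, [to_dlp, to_encryption]) yields only the encryption host; the DLP redirect is silently lost.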

A cul-de-sac model with two external SMTP hosts—one for DLP scanning and another for encryption—looks like Figure 11-13.

Figure 11-13. Email Cul-de-Sac Architecture with Multiple External Hosts

To handle this, we just need to plan our logic carefully and avoid mail loops. Let’s assume that messages will traverse this architecture from right to left with ESA acting as the “traffic cop” directing the flow: groupware, ESA, DLP, ESA, encryption, and finally, ESA for delivery.

First, you need short circuits for the external servers. Let’s start with the last hop in the architecture, assuming that if the encryption server is sending it, it’s ready for delivery:

short_circuit_encr:
if (remote-ip == "10.1.17.23-29") {
skip-filters();
}

Second, you need a filter to handle mail that’s coming from the DLP servers. You need to short-circuit that process, too, but the message can’t go straight to Internet delivery because it must next go to the encryption server. This filter does both:

short_circuit_dlp:
if (remote-ip == "10.1.109.63-70") {
alt-mailhost ("encryption.yourco.com");
skip-filters();
}

The third and final step is to redirect everything else that’s outgoing. We don’t want to redirect our incoming mail, so we add an additional condition to test the sendergroup and only redirect outgoing relayed mail. We are not skipping any filters, assuming that the first pass through the ESA is the place you want to run any other filters:

outgoing_to_dlp:
if (sendergroup == "RELAYLIST") {
alt-mailhost ("dlp.yourco.com");
}
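The three filters together can be traced with a short Python model. It is a sketch under assumptions: the IP ranges and hostnames come from the filters above, but the helper names, the representative server addresses in SERVER_IP, and the idea that the external servers connect back as RELAYLIST hosts are illustrative guesses.

```python
def in_range(ip, prefix, low, high):
    """True if ip is prefix.N with low <= N <= high (models "10.1.17.23-29")."""
    head, _, last = ip.rpartition(".")
    return head == prefix and low <= int(last) <= high

def esa_filters(remote_ip, sendergroup):
    """Return the redirect host chosen by the filters, or None for delivery."""
    if in_range(remote_ip, "10.1.17", 23, 29):      # short_circuit_encr
        return None
    if in_range(remote_ip, "10.1.109", 63, 70):     # short_circuit_dlp
        return "encryption.yourco.com"
    if sendergroup == "RELAYLIST":                  # outgoing_to_dlp
        return "dlp.yourco.com"
    return None

# Assumed source addresses of the external servers when they connect back.
SERVER_IP = {"dlp.yourco.com": "10.1.109.63",
             "encryption.yourco.com": "10.1.17.23"}

def trace(remote_ip, sendergroup):
    """List every hop a message takes until final delivery."""
    hops = ["ESA"]
    nxt = esa_filters(remote_ip, sendergroup)
    while nxt is not None:
        hops += [nxt, "ESA"]
        nxt = esa_filters(SERVER_IP[nxt], "RELAYLIST")
    return hops + ["delivery"]
```

An outgoing message from an internal host in RELAYLIST visits the ESA three times, while incoming mail from any other sender group goes straight to delivery. The model also shows why the two short-circuit filters have to come before outgoing_to_dlp: if the external servers match RELAYLIST, the last filter would otherwise send their mail back to DLP.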

Warning

I don’t recommend that you use the SMTPRoutes default entry of ALL for this style of integration: The default route affects all mail, incoming and outgoing, and can’t be overridden by filter actions. See the discussion in the next section.

Another common reason for multiple external hosts is for redundancy or load balancing. If you have multiple instances of encryption software servers, you must either use a load balancer or account for multiple hosts in the ESA configuration. The problem here is that the alt-mailhost action can only have one destination. The solution to this problem is to combine alt-mailhost with the SMTPRoutes tables. First, we direct the message to a nonexistent hostname; you can use anything you like, as long as it looks like an FQDN:

redirect_confidential:
if (subject == "CONFIDENTIAL" or header("Sensitivity") == "Company-Confidential") {
alt-mailhost ("encryption.server");
}

Then, in the SMTPRoutes table, create a mapping of that host to the destination servers:

encryption.server: 10.1.17.23, 10.1.17.24, 10.1.17.25

In AsyncOS 7.1 and later, you can even apply a weight to individual destinations in SMTPRoutes, allowing the ESA to fail over or load-balance across multiple hosts. SMTPRoutes are easier to maintain than a filter. Mapping alt-mailhost to an SMTPRoutes entry also avoids a problem where a default SMTPRoute (for the ALL route) can override the alt-mailhost action. For these reasons, I recommend that you always set your alt-mailhost action to dummy hostnames and use SMTPRoutes to map to the proper destination.
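The indirection through a stand-in hostname can be sketched in Python. This is a simplified model, not the ESA's actual selection algorithm: it assumes destinations are tried in the order listed and ignores weights, and the is_up callback is a hypothetical stand-in for a reachability check.

```python
# Route table entry copied from the SMTPRoutes example above.
SMTP_ROUTES = {
    "encryption.server": ["10.1.17.23", "10.1.17.24", "10.1.17.25"],
}

def pick_destination(alt_mailhost, is_up):
    """Resolve a stand-in hostname and return the first reachable destination."""
    # Hosts without a route entry are used as given.
    for ip in SMTP_ROUTES.get(alt_mailhost, [alt_mailhost]):
        if is_up(ip):
            return ip
    return None  # nothing reachable: the message would stay queued
```

The filter only ever names encryption.server; adding or removing a server is a change to the route table alone.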

Interacting with Security Filters

Because content filters run after the anti-spam and antivirus engines, you can create rules that interact with the results of the engines. For the most part, the security engines give you the actions that you need: drop, deliver, and notify.

Messages dropped as spam or viruses will not be processed by the content filters engine; drop is an immediate action that discards the message. If you’re not dropping the messages, but instead are delivering to quarantine or to the end user, the messages continue through the pipeline, including filter processing. This may not be desirable: Body-scanning known spam messages adds unnecessary overhead and, if you’re scanning for terms that might appear in spam messages, you’ll end up with false positives.

For example, take a simple content filter that uses the condition “Body or Attachment Contains” and the pattern “Confidential” to send a notification about matching messages. You might apply this filter to inbound mail going to certain groups within your organization. Whatever the reason, applying this filter to known spam messages is counterproductive. Spammers will at least occasionally use the term Confidential in their messages.

Unfortunately, there’s no “is-spam” verdict test to bypass content filters automatically for spam messages. The same logic can be accomplished by using the anti-spam engine’s advanced markup capabilities to add a header to the message, which a content filter can then test to short-circuit further processing. The anti-spam actions for a policy include an advanced section that allows you to add a header to messages found to be spam or suspected to be spam. An example of this setting is shown in Figure 11-14.

Figure 11-14. Anti-Spam: Configuring a Header for Spam Positive Verdict

The header is then used in a simple content filter to short circuit all other processing. An example of this filter is shown in Figure 11-15. I’ve taken the step of removing the header from the message so that my trick goes unnoticed. Note that the name of the header and its value are arbitrary; you don’t even necessarily need to test the value, because the header presence alone will indicate the verdict.

Figure 11-15. Content Filter to Act on Spam Positive Messages
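The interaction between the anti-spam header and the short-circuit content filter can be modeled in a few lines of Python. The header name X-Spam-Verdict, its value, and the notification label are arbitrary choices for this sketch, in keeping with the point that the name and value do not matter.

```python
SPAM_HEADER = "X-Spam-Verdict"  # hypothetical header name

def antispam(msg, is_spam):
    """Model the anti-spam step adding a header on a positive verdict."""
    if is_spam:
        msg["headers"][SPAM_HEADER] = "positive"

def content_filters(msg):
    """Return the notifications generated by the content filter stage."""
    notifications = []
    if SPAM_HEADER in msg["headers"]:       # presence alone signals the verdict
        del msg["headers"][SPAM_HEADER]     # hide the marker from the recipient
        return notifications                # skip all remaining content filters
    if "confidential" in msg["body"].lower():
        notifications.append("confidential-term")
    return notifications
```

A spam message containing the word Confidential generates no notification, while a clean message with the same body does.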

Note

The IronPort AntiSpam engine and other security filters add some headers to messages by default to indicate that the message was processed and to record the results in an encoded string. Because the string is encoded, you cannot use this header or its value to determine the verdict.

The antivirus engines also write their results into headers. You could possibly act on these headers if they indicate a viral message, but the question then is this: Why do you need to do further processing on viral messages? It’s best to drop them and forget it.

Reinjection of Messages

Reinjection refers to changing the delivery destination of the ESA to itself, so that as the message is delivered, the ESA is acting as both SMTP client and server.

Why would you want to reinject messages? There are a few cases where some group of tasks cannot be performed in a single pass and the messages need to be processed a second time. This is because the ordering of the pipeline in the ESA is fixed, so some tasks cannot be performed in the order you need on a single pass. These are usually complex situations; I don’t consider reinjection to be a general-purpose approach to any problem, but a solid and reliable solution when nothing else will suit.

In reinjection, the filters that you’re writing aren’t solving the issue directly, but are routing the messages so that other features can get to the solution.

Suppose that your organization runs two separate email domains, but you’re in the process of retiring the old domain and reassigning users to addresses in the new domain. It would be nice if [email protected] could just be domain-mapped to [email protected], but username overlaps prevent this. The ideal solution is to use LDAP routing to look up each user’s old and new address, and rewrite the addresses on the fly. This ideal solution allows the migration to proceed as needed.

Problem solved, right? Well, further suppose that at the end of the migration, there are several dozen email addresses at the old domain that haven’t been migrated. They don’t appear in the directory at all. Either the users have been lost in the shuffle, or these are addresses without real people behind them, and don’t have directory information. At some point, perhaps the directory will be updated to include them, but you’re under pressure to finalize the migration, and it’s decided to simply use aliases or domain-map to rewrite these addresses to addresses in the new domain. In effect, you’re adding a little bit of manual rewriting to the automated LDAP rewrites, but if the directory is ever updated with proper addresses, you want that to work the way it should.

The problem with this configuration on the ESA is twofold: First, aliases and domain-map are early stages in the pipeline, before LDAP or filters. Second, filters have no means of testing whether LDAP routing queries succeeded. You cannot tell which addresses failed the lookup, other than by the fact that they still have the recipient form @olddomain.com.

This is a case where reinjection can help. (I did say that it only comes up in complex situations.)

The solution that we implement here is simple to describe: Any recipients still being sent to @olddomain.com, because they were not found in LDAP, are reinjected into the ESA so that the domain-map can transform the addresses on a second pass. To ensure that domain-map does not affect any other messages, we create another listener on another port to handle this. This new listener, named Reinject, has a simple configuration with no LDAP and only the domain-map entries we need.

The filters we need are just for doing the reinjection and for short-circuiting the process. The first, the short-circuit filter, assumes that anything that comes through the new listener is now on its second pass and should go directly to delivery:

short_circuit_reinject:
if (recv-listener == "Reinject") {
skip-filters();
}

The other filter actually performs the reinjection:

reinject_old_domain:
if (rcpt-to == "@olddomain\.com") {
alt-mailhost ("esa02.cisco.com:2525");
}

Astute observers will realize that the multirecipient problem is going to rear its head here, but it’s not much to worry about. The other recipients will be reinjected even if they don’t need the second pass, but that’s not going to cause any harm. The domain-map will not match on them, and the LDAP query won’t even be performed. Other than a tiny bit more overhead, there’s no harm in reinjecting the other recipients.

The real danger of a reinjection configuration is message loops, where reinjection results in messages being delivered back to the ESA ad infinitum. To avoid loops, the ESA will drop messages that have more than 100 hops, as indicated by the count of Received headers in the message. This drop is permanent and final, and you need filter logic to avoid it. The short_circuit_reinject filter, which must be earlier in the process than the reinjection, accomplishes this.
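The two-pass flow can be sketched in Python. This is a toy model under assumptions: the directory contents, the target domain newdomain.com, and the listener name Main are invented for illustration, and only the Reinject listener applies the domain map, as in the configuration described above.

```python
LDAP = {"alice@olddomain.com": "alice.smith@newdomain.com"}  # hypothetical directory
DOMAIN_MAP = {"olddomain.com": "newdomain.com"}              # hypothetical mapping

def domain_map(addr):
    """Rewrite the domain part of an address if it has a mapping."""
    user, _, domain = addr.partition("@")
    return f"{user}@{DOMAIN_MAP.get(domain, domain)}"

def process(recipients, listener):
    """One pass through the modeled ESA; returns (next step, recipients)."""
    if listener == "Reinject":
        # Second pass: domain-map applies, then short_circuit_reinject delivers.
        return "deliver", [domain_map(r) for r in recipients]
    rewritten = [LDAP.get(r, r) for r in recipients]   # LDAP routing rewrite
    if any(r.endswith("@olddomain.com") for r in rewritten):
        return "reinject", rewritten                   # reinject_old_domain
    return "deliver", rewritten

def deliver(recipients):
    """Return the final recipient list and the number of passes taken."""
    step, rcpts = process(recipients, "Main")
    passes = 1
    if step == "reinject":
        step, rcpts = process(rcpts, "Reinject")
        passes = 2
    return rcpts, passes
```

A message to one migrated and one unmigrated address takes two passes and ends with both addresses in the new domain; a message whose recipients are all found in the directory is delivered after one pass. Because the second pass always ends in delivery, the model cannot loop, which is the role of short_circuit_reinject.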

Summary

Message and content filters are the most versatile feature of the ESA, able to act on almost any part of messages. This chapter covered all the available conditions and actions and demonstrated some of those with real examples. Filters bear a resemblance to programming scripts in that they use branch logic for decision making, but there are important differences between filters and a scripting language.

Understanding the difference between message and content filters is critical for administering an ESA environment. Although many organizations will only have a need for content filtering through the WUI, knowledge of the CLI message filters is important, because it can solve numerous issues that can’t be addressed anywhere else in the product.

In the next chapter, we revisit some of the networking topics we only briefly touched on in early chapters, and then dive into more advanced network configurations.
