Objectives
How to use SpamAssassin to identify spam email using a set of scoring rules
How to create and modify SpamAssassin rules
How to hack MIMEDefang to classify spam depending upon its spam score
How to use Procmail to sort spam and other emails into mail folders
Introduction
Email is a powerful and useful tool and – despite other, newer types of communication such as texts and social media – email retains a prime place in the communications strategies of most organizations and individuals. Email is not the oldest form of digital communication, having been preceded by tools such as the telegraph and teletype, but it has been around since the early years of Unix. Email is a well-defined tool and is available not just on computers but on nearly every connected device, such as mobile phones and tablets.
Email is also used as a tool by spammers and the distributors of malware. Spammers use email to defraud recipients of huge amounts of money each year, to steal IDs, and to peddle knock-off or nonexistent wares. Some crackers send emails with attached malware that they try to entice users into installing to their own detriment.
All of this criminal and disruptive spam requires some method for dealing with it. I use three open source programs to do this, and together they have reduced my need to read, or even glance at, offensive or undesirable material in order to identify and classify incoming email.
The problem
I like to sort incoming email into a couple of folders besides the inbox. Spam is always filed into the spam folder, and I leave it there for a couple of days so I can look at it later in case someone sends something that I want to receive but that was marked as spam because I have not whitelisted the sender. Some of the incoming ham (good) email from a couple of other sources is also sorted into other folders. The rest is filed into the inbox by default.
So, a quick word about terminology before going any further. Sorting is the process of classifying email and storing it in an appropriate folder. Filters like SpamAssassin classify the email. MIMEDefang uses that classification to mark spam by adding a text string to the subject line. That marker allows other software to file the email into the designated folders. It is this last piece of software, the one that does the filing, that I was looking for.
I had several email filters set up in Thunderbird, my client of choice and the best GUI client I have found for my personal needs. I also had set up some email filters for my wife on her computer. When we travel, or use our handheld devices, those filters would not always work because Thunderbird – or any other email client with filters – must be running in order to perform its filtering tasks. If I have my laptop with me, I can set that up to do the filtering, but that means I have to maintain multiple sets of filters.
I also ran into a technical problem that I wanted to fix. Client-side email filtering relies on scanning messages after they are deposited in the inbox. For some unknown reason, this has resulted in situations where the client does not always delete (expunge) the moved messages from the Inbox. This may be an issue with Thunderbird or it may be a problem with my configuration of Thunderbird. I have worked on this problem for years with no success, even through multiple complete reinstallations of Fedora and Thunderbird.
I have my own email server, and spam is a major problem for me. I use several email addresses, some of which I have had for a couple of decades, so they have become major spam magnets. In fact, I get a minimum of 300 spam emails per day. The record was just over 2,500 spam emails in a single day. I currently get between 800 and 1,200 spam emails per day, and the numbers keep increasing.
So I needed a server-based rather than client-based method for filing email, that is, sorting it into the appropriate folders. This would solve a number of issues. I would no longer need to leave an email client running on my home workstation just to perform filtering. It would eliminate the need to delete or expunge moved messages, especially the spam, from our inboxes. And it would require filter configuration in just one location, the server.
But why?
By now, after two full chapters on email and just starting another one, you are probably asking yourself, “Why do I want to put myself through all of this aggravation just to have an email server? Why not just use Gmail or the email service provided by my ISP?” This is an excellent question because I ask it of myself on occasion.
When I decided that I wanted to become a Unix and Linux SysAdmin, I understood that I needed to learn about all aspects of system administration. I needed to deal with clients, but especially with servers of all types. It takes a lot of work to set up, configure, and maintain a series of servers like the ones we cover in this volume of Using and Administering Linux – Zero to SysAdmin, but I learn best through hands-on experience. Working with these servers and the clients that use them on a daily basis has enabled me to learn much more than I would have otherwise.
I believe that most of us who are truly well-suited for the role of SysAdmin are the same way. Not everyone, but many of us.
But even if we learn best in other ways, we always need a laboratory in which to perform our experiments and to learn to use and support hardware and software of all kinds. I have learned enough by doing this in my own home network that I have landed some amazing jobs and currently write prolifically about Linux.
Oh – and because it is fun!1
My email server
Having grown up with SendMail as the de facto email server in more than one of my jobs, I started using it for my own email server as soon as I switched permanently from OS/2 to Red Hat Linux 5 in about 1997. I have used it as my mail transfer agent (MTA) since then for both business and personal use.
Note
I am not sure why Wikipedia refers to SendMail as a “message” transfer agent. All my other references use “mail” transfer agent. The Talk tab of the Wikipedia page has a bit of discussion about this which generated even more confusion for me.
I was already using SpamAssassin and MIMEDefang together to score and mark incoming emails as spam, placing a known string in the subject, “###SPAM###”, so that I can identify and sort spam both as a human and with software. I use UW IMAP for client access to emails, but that is not a factor in server-side filtering and sorting.
Yes, I use a lot of old-school software for the server side of email, but it is well known, well documented, it works well, and I understand how to make it do the things I need it to do. Understanding this old but still extensively used software is the key to understanding many of the more recent incarnations of email software. This software enables us to understand the protocols and requirements for any software that is used to perform these tasks. Current versions of Fedora provide all of these tools as packages in their standard repositories.
Project requirements
1. Sort incoming spam emails into the spam folder on the server side using the identifying text that is already being added to the subject line by MIMEDefang.
2. Sort other incoming emails into designated folders.
3. Circumvent problems with moved messages not being deleted or expunged from the Inbox.
4. Keep the SpamAssassin and MIMEDefang software that I was already using.
5. Any new software would have to be easy to install and configure.
This set of requirements meant that I needed a sorting program that would integrate well with the parts I already had.
Procmail
After extensive research, I settled on the venerable Procmail.2 I know – more old stuff – and allegedly unsupported these days, too. But it does what I need it to do and is known to work well with the software I am already using. It is stable and has no known serious bugs. It can be configured for use at the system level as well as at the individual user level.
Red Hat and RH-based distributions such as CentOS and Fedora use Procmail as the default mail delivery agent (MDA) for SendMail, so it does not even need to be installed because it is already there. The MDA delivers email to users’ mailboxes on the local host, so it is also known as the LDA, or Local Delivery Agent.
My email server runs Fedora, so this is a real no-brainer. I will use Procmail. Besides, Red Hat is now supporting Procmail no matter what else you might read on the Internet, and several recent patches have been included in the most recent version. We can check the change log for Procmail to verify this.
Experiment 9-1
I have shortened the output data, but you can see that there are several issues that have been fixed since 2017 including one security patch.
The results of Experiment 9-1 also show that we should not always believe everything we read on the Internet, including Wikipedia. We should also explore the sources of statements we encounter online and look for ourselves at the original data – in this case the Procmail RPM package.
In addition to delivering email, Procmail can be used to filter and sort it. Procmail rules, known as recipes, can be used to identify spam and delete it or sort it into a designated mail folder. Other recipes can identify and sort other mail as well, such as sorting emails from specific email accounts or organizations into particular folders. Procmail can also be used for many other tasks, such as automated forwarding, duplication, and much more. In this chapter, we will confine our use of it to identifying spam and sorting it into the Spam folder.
How it works
A complete discussion of the configuration of SpamAssassin, MIMEDefang, and Procmail is beyond the scope of this chapter, in part because there are so many ways of implementing anti-spam solutions using these three programs. This chapter is limited to the configuration I used to integrate these three packages into my own solution.
Processing of incoming email begins with SendMail, which calls MIMEDefang as part of its normal email processing. MIMEDefang uses SpamAssassin as a subroutine: it passes each email to SpamAssassin and receives the spam score in return.
SpamAssassin uses its default set of rules and scores, as well as any located in the local.cf file, to evaluate each email and generate a total score. We can modify the scores for existing rules, add our own rules, and create white- and blacklists that help us adapt the rules and scoring to the needs of our own installation. The /etc/mail/spamassassin/local.cf file is used for all of this, and it can grow quite large; mine is just over 70KB at this writing and still growing.
It is important to understand that when SpamAssassin scans an email, it checks every rule, both its default rules and local rule sets that are created and maintained by the SysAdmin or email administrator. For each rule that matches, the score defined for that rule is added to the total score for that email. This is not a “one and done” type of scan; the email is checked against every rule.
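The additive scoring just described can be modeled in a few lines. The Python sketch below is not SpamAssassin's implementation; the rule names, patterns, and scores are hypothetical, every rule is treated as case-insensitive for simplicity, and the 5.0 threshold assumes SpamAssassin's default required_score. What it does show is that every rule is evaluated, with no early exit, and that the scores of all matching rules are summed.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    name: str     # rule identifier, conventionally uppercase
    target: str   # "subject" or "body" in this simplified model
    pattern: str  # regular expression to search for
    score: float  # points added when the rule matches


# Hypothetical rules; a real installation evaluates hundreds of them.
RULES = [
    Rule("HYPO_FREE_MONEY", "subject", r"free money", 2.0),
    Rule("HYPO_BACK_TAXES", "body", r"back taxes", 2.5),
    Rule("HYPO_URGENT", "subject", r"urgent", 1.0),
]

REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score


def score_message(subject: str, body: str) -> tuple[float, list[str]]:
    """Check every rule and add up the scores of all rules that match."""
    total = 0.0
    matched = []
    for rule in RULES:
        text = subject if rule.target == "subject" else body
        # Case-insensitive, like a rule written with the /i modifier.
        if re.search(rule.pattern, text, re.IGNORECASE):
            total += rule.score
            matched.append(rule.name)
    return total, matched


if __name__ == "__main__":
    total, matched = score_message("URGENT: free money inside", "Pay your back taxes now.")
    print(total, matched, total >= REQUIRED_SCORE)
```

No single rule here is strong enough to cross the threshold; only the aggregate of three matches does, which is the point of SpamAssassin's approach.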
SpamAssassin can be run as standalone software in some applications; however, in this environment SpamAssassin is not run as a daemon but is called by MIMEDefang. After the spam score for the email is returned to it, MIMEDefang calls the /etc/mail/mimedefang-filter program, which can perform any of several actions on the email. This program can add headers to the email, modify the subject, or just discard the email.
MIMEDefang is programmed in Perl, so it is easy to hack. I have hacked the last major portion of the code in /etc/mail/mimedefang-filter to provide a filtering breakdown with a little more granularity than it does by default. This code adds specified text to the subject line of the emails as a means to identify how likely this particular email is to be spam.
Preparation
Although I had already installed MIMEDefang and SpamAssassin on my email server prior to using Procmail for email sorting, our server, StudentVM2, does not have those tools installed. So we need to install them.
Experiment 9-2
Note that despite the fact that Perl is already installed on our VMs, this command results in the installation of about 30 additional Perl packages that are required for MIMEDefang.
Verify that there are now some mimedefang* files and a spamassassin directory in /etc/mail.
Configuration
We need to configure MIMEDefang in order to have it add the text we need to the subject line and we also need to set up some SpamAssassin rules that we can intentionally trigger with our test emails so that they will be marked as spam. We also need to configure SendMail to call MIMEDefang as part of its normal mail processing tasks.
Configuring SendMail
SendMail must call MIMEDefang in order to start the spam-filtering process. It does this by calling the MIMEDefang mail filter. The term “mail filter” is generally shortened to “milter.”
We enable the MIMEDefang mail filter by inserting one line into our sendmail.cf configuration file.
Experiment 9-3
Test to verify that we have not broken anything. Use tail -f to follow the maillog file on StudentVM2 and send an email from the student user on StudentVM1 to the [email protected] account and your external email account. Ensure that there are no errors in the maillog file and that the email is delivered to the addressees.
Refer to Chapter 8 of the SpamAssassin3 book for more information about using MIMEDefang with SendMail and SpamAssassin.
Hacking mimedefang-filter
Let’s hack mimedefang-filter and have it add the text “####SPAM####” to emails with high enough spam scores to be considered spam. This is easy even if you don’t know Perl4 because I will show you exactly what to do.
Experiment 9-4
Perform this experiment as the root user on StudentVM2. We are going to modify the mimedefang-filter Perl program, but first we make a backup copy and then examine the code we want to change.
The first two non-comment lines begin with “my” and are used to create local copies of certain variables that may be used in this code segment. The $hits variable is a numeric value that represents the spam score of the email.
The first if-else structure uses the Perl “x” operator to create a string that consists of a number of asterisks (*) equal to the integer part of the spam score. For example, a spam score of 7 would result in a string of 7 asterisks, “*******”, which acts as a bar graph of the spam score. The first part of this if statement does this so long as the value of the $hits variable is less than 40. The “else” part of the logic simply creates a string of 40 asterisks if the value of $hits is 40 or more.
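For readers who do not know Perl, the same if-else logic is sketched below in Python. This illustrates the structure described above rather than reproducing code from mimedefang-filter, and the function name stars_for is hypothetical. Python's string repetition with * plays the role of Perl's x operator.

```python
def stars_for(hits: float, cap: int = 40) -> str:
    """Return a bar graph of asterisks: one per whole point of spam score,
    capped at `cap` asterisks."""
    if hits < cap:
        # int() truncates toward zero, so 7.9 gives 7 stars; repeating a
        # string a negative number of times yields "", so ham scores such
        # as -2.5 produce an empty bar.
        return "*" * int(hits)
    return "*" * cap


print(stars_for(7.3))    # seven asterisks
print(stars_for(123.0))  # capped at forty asterisks
```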
The line md_graphdefang_log('spam', $hits, $RelayAddr); adds an entry to a log file in the /var/log directory if we uncomment a line earlier in this file.
The final statement in this “if” section appends a SpamAssassin report to the email as an inline attachment. I find this report makes it easy to do problem determination when issues arise with SpamAssassin and its scoring.
Over years of working with MIMEDefang and SpamAssassin, I have decided that I do not like the default actions taken to mark a message as spam. The bar graph is not visible to the end user, and although it could be used by Procmail to determine how to sort the spam, I wanted something in the subject line where the recipient could see it and decide what to do with the message. I created a set of actions that works better for me and enables me to see spam information more quickly.
This revised code adds the X-Spam-Status header, prepends the “####SPAM####” string and the number of hits to the subject line, and attaches the SpamAssassin report to the end of the email message. It also does this for non-spam emails except that the message prepended to the subject is a bit different and says “####NOT SPAM####.” We do it this way in this experiment so that we can see that our spam detector is working even if the emails are not spam.
In a real-world environment, I do add the X-Spam-Status line to the headers on non-spam messages (ham), but I do not normally add anything to the subject line or append the SpamAssassin report to the message.
Note that this revision of the code does not delete existing headers.
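The revised filter itself is Perl code inside mimedefang-filter. The Python sketch below only models its behavior so the logic is easy to follow. The function name tag_message, the 5.0 threshold, and the exact formats of the subject and the X-Spam-Status value are assumptions, not the author's code, and attaching the SpamAssassin report is not modeled. Assigning a header with msg[...] = ... appends a new header rather than replacing an existing one, which matches the behavior noted above.

```python
from email.message import EmailMessage


def tag_message(msg: EmailMessage, hits: float, required: float = 5.0) -> EmailMessage:
    """Add an X-Spam-Status header and prepend a spam tag plus the score
    to the subject. Tag strings follow the text; formats are hypothetical."""
    is_spam = hits >= required
    tag = "####SPAM####" if is_spam else "####NOT SPAM####"
    # Appends a header; any X-Spam-Status headers already present are kept.
    msg["X-Spam-Status"] = f"{'Yes' if is_spam else 'No'}, hits={hits:.1f} required={required:.1f}"
    subject = msg.get("Subject", "")
    new_subject = f"{tag} {hits:.1f} {subject}".rstrip()
    if "Subject" in msg:
        # Subject may appear only once, so replace it instead of appending.
        msg.replace_header("Subject", new_subject)
    else:
        msg["Subject"] = new_subject
    return msg
```

For example, a message with the subject "Cheap watches" and a score of 7.3 would leave this function with the subject "####SPAM#### 7.3 Cheap watches".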
Many users tend to freak out when they see the SpamAssassin report and the subject line with “####SPAM####” in it. As a result, I only add the report when I am trying to determine the source of a problem, such as a rule that is not working. The report shows me much of what is in the headers but also includes more information, such as the exact score added by each rule. Also, if a user forwards an email to me, the report stays attached to the email but the original headers are lost, so the headers alone would be useless at that point.
Open the email as the student user on StudentVM2 in the mailx client. Examine the email and view the added headers and the attached SpamAssassin report.
We can see that our anti-spam configuration is working as it should.
The subject line of the email now contains the string “####SPAM####” or “####NOT SPAM####”, without the quotes, followed by the spam score, that is, the value of the variable $hits. Having a known string in the subject line of spam makes further filtering easy.
The modified email is returned to SendMail for further processing.
Setting up a mail folder
We have not yet set up a folder on the server to contain the email folders like Spam and others we might want to create for the student user. We want to store all of our email folders in a subdirectory of the account’s home directory to prevent our email client from accessing other files and folders in the account’s home directory.
Whether a login or non-login account, the email client will see all files in the home directory as a possible email folder. We want to prevent that so we create a folder called “Mail” to use as the main email location in which the folders created by the email client, such as Thunderbird, are located.
Experiment 9-5
Start this experiment as the root user on StudentVM2. We do this because most email accounts are nologin accounts, so root normally performs this task.
Ensure that the home directory for the student user is the PWD. Create a directory, /home/student/Mail, and set the user and group ownership both to student.
Move the Sent and Trash files into the Mail directory.
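The two steps above can also be expressed as a small Python helper, with the home directory passed in as a parameter so it can be tried on a scratch directory first. The function setup_mail_dir is hypothetical and not part of the experiment; at the shell prompt the same result takes a mkdir, a chown, and a mv.

```python
import shutil
from pathlib import Path
from typing import Optional


def setup_mail_dir(home: Path, owner: Optional[str] = None,
                   folders: tuple = ("Sent", "Trash")) -> Path:
    """Create <home>/Mail, optionally set owner:owner on it, and move the
    existing mail folder files into it."""
    mail = home / "Mail"
    mail.mkdir(exist_ok=True)
    if owner is not None:
        # Equivalent of chown owner:owner; requires root and an existing account.
        shutil.chown(mail, user=owner, group=owner)
    for name in folders:
        src = home / name
        if src.exists():
            shutil.move(str(src), str(mail / name))
    return mail
```

Run as root, setup_mail_dir(Path("/home/student"), owner="student") would act on the real student account.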
Now, as the student user on StudentVM1, open Edit ➤ Account Settings in Thunderbird and select Server Settings. Click the Advanced button to open the Advanced Account Settings dialog. In the IMAP server directory field, type Mail. Click OK to close the dialog and OK again to close Account Settings.
On the main Thunderbird window, right-click the [email protected] account – the top-level folder where it says [email protected] and not any of the sub-folders – and choose New Folder. Type “Spam” (without the quotes) in the Name field and click the Create Folder button. The new folder should appear in the list of folders along with Sent and Trash.
Configuring Procmail
The last thing that SendMail does is call Procmail to act as the MDA. Procmail then checks the home directory of the user to whom the email is addressed for a ~/.procmailrc file. If one does not exist, Procmail deposits the email into the user’s inbox in /var/spool/mail. What happens when the ~/.procmailrc file does exist is the topic of this section.
We need Procmail to examine each email before it is placed in the Inbox and to use the text now added to the subject line to route spam to a different folder, which we will call, naturally enough, “Spam.”
Procmail uses global and user-level configuration files. Neither the global /etc/procmailrc file nor the individual user ~/.procmailrc files exist by default, so any that we want to use must be created. The structure of the files is the same, but the global file operates on all incoming email, while the local files can be configured for each individual user. I do not use a global file, so all of the sorting is done at the user level. My .procmailrc file is shown in Experiment 9-6; it is simple.
Note that the ~/.procmailrc file must be located in the home directory of the email account on the email server. It does not go in the home directory on individual client workstations. Because most email accounts are not login accounts, they use the nologin program as the default shell. Therefore, the admin will need to create and maintain these files. The other option is to change to a login shell such as BASH and set passwords so that knowledgeable users can log in to their email accounts on the server and maintain their ~/.procmailrc files.
In the simple recipes used here, each recipe starts with :0 (yes, that is a zero) on the first line and contains a total of three lines. The second line starts with * and contains a condition consisting of a regular expression (regex) that Procmail compares, by default, with the header of the incoming email. If there is a match, Procmail delivers the email to the folder specified by the third line. The ^ symbol anchors the match to the beginning of a line.
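To make the three-line structure concrete, here is a hypothetical recipe of the kind described, along with a small Python model of how its condition is applied. The folder name Mail/Spam is an assumption, and Python's re module stands in for Procmail's own egrep-style matcher; the two behave the same for a simple pattern like this one, but not for every regex.

```python
import re

# Hypothetical three-line recipe: flag line, condition line, destination.
RECIPE = """\
:0
* ^Subject:.*####SPAM####
Mail/Spam
"""


def parse_recipe(text: str) -> tuple[str, str]:
    """Split a simple three-line recipe into (condition regex, destination)."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    if len(lines) != 3 or not lines[0].startswith(":0") or not lines[1].startswith("*"):
        raise ValueError("expected ':0', one '*' condition line, and an action line")
    return lines[1][1:].strip(), lines[2]


def matches_header(condition: str, header: str) -> bool:
    """Search the header case-insensitively; ^ anchors to the start of a line
    and .* does not cross into the next line."""
    pattern = re.compile(condition, re.IGNORECASE | re.MULTILINE)
    return pattern.search(header) is not None
```

Because ^ and .* are confined to a single line, the tag must appear on the Subject: line itself; the same string in some other header line does not satisfy this condition.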
Experiment 9-6
Perform this experiment as the root user on StudentVM2. We are going to create a .procmailrc file for the student user.
The first recipe in my .procmailrc file sorts the spam identified in the subject line by MIMEDefang into my spam folder. Procmail conditions are not case-sensitive by default, so there is no need to create recipes that look for various combinations of upper- and lowercase. The second and last recipe sorts all email that does not match an earlier recipe into the default folder, usually the Inbox.
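Order matters here: Procmail tries recipes from top to bottom, and the first delivering recipe that matches ends processing. A minimal Python model of this two-recipe arrangement follows. The folder names are hypothetical, and the catch-all second recipe is represented by the fallback value DEFAULT_FOLDER. The model also shows that the “####NOT SPAM####” tag from Experiment 9-4 does not match the spam pattern, so ham still reaches the inbox.

```python
import re

# Hypothetical recipes as (condition regex, destination folder), in file order.
RECIPES = [
    (r"^Subject:.*####SPAM####", "Mail/Spam"),
]
DEFAULT_FOLDER = "INBOX"  # stands in for the catch-all last recipe


def choose_folder(header: str) -> str:
    """Return the destination of the first recipe whose condition matches."""
    for condition, folder in RECIPES:
        # Case-insensitive, like Procmail's default matching.
        if re.search(condition, header, re.IGNORECASE | re.MULTILINE):
            return folder  # a delivering recipe ends processing
    return DEFAULT_FOLDER
```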
Ensure that both of these new files have ownership of student.student. It is not necessary to restart either SendMail or MIMEDefang when creating or modifying the Procmail configuration files.
The problem here is that, as I did on my own mail server, I missed one step. It is easy to miss because I found the true answer in only one place.
We need to add a symbolic link in the /etc/smrsh directory. Smrsh stands for “SendMail restricted shell,” a reasonably secure shell in which SendMail can run scripts and which helps prevent crackers from exploiting SendMail for their own purposes.
Now perform the test again by sending the spam and non-spam email messages. This time they should go into the correct folders.
We could have created both of these files as the student user on StudentVM2 but, in most environments, regular users will not have login access to the server.
Reports of Procmail’s demise
Having done many Internet searches while researching this chapter, I found a number of results dating from 2001 through about 2013 that declare Procmail to be dead. As evidence, they point to web pages that no longer work, missing source code, and a short article on Wikipedia that does no more than declare Procmail to be dead and provide links to more recent replacements.
However, all Red Hat, Fedora, and CentOS distributions install Procmail as the MDA for SendMail. The Red Hat, Fedora, and CentOS repositories all have the source RPMs for Procmail, and the source code is also on GitHub. Red Hat's documentation for CentOS also includes some decent material on Procmail.5
Considering the continued use of Procmail by Red Hat, I have no problem with using this mature software that does its job silently and without fanfare.
Creating SpamAssassin rules
Now that we have a working solution, what happens when we start getting spam that does not match any rules or for which the matched rules do not add up to a high enough score to make the cut as spam? We can adjust the default scores and write new rules using the /etc/mail/spamassassin/local.cf file.
The files located in /usr/share/spamassassin that begin with two-digit numbers are configuration files that define rules for specific types of spam. When SpamAssassin matches a rule in one of these files, it then searches for a score in the 72_scores.cf file. These files should not be altered.
There are also two files in /usr/share/spamassassin that we can use as templates or starting points for local configuration. These files make it easy for us to configure SpamAssassin by adding rules and changing scores without changing the default configuration files. This matters because the default files can be replaced during an update, which would overwrite any changes we had made to them.
The local.cf file, of which there is already a copy in /etc/mail/spamassassin, is used to create local rules, alter the scores of existing default rules, and set whitelist and blacklist entries.
The user_prefs.template file can be used by individual users to override the default preferences. It would need to be copied into the .spamassassin directory in the user’s home directory and renamed to user_prefs. For example, a user might wish to specify a higher required_score so that some emails with spam scores somewhat above the default of 5 are allowed through as ham. This would also be the file in which users would add whitelist and blacklist entries, create their own rules, and change scores. In most modern installations, end users will not be knowledgeable enough to perform these tasks, or will not have login access to the email server, so it falls to the SysAdmin to make those changes for them.
Before we make any changes, we need to look at the default rule set which should never be changed in any way.
Experiment 9-7
In another root terminal session, make /usr/share/spamassassin the PWD. List the files in that directory. The files you see there are used for local configuration or, as in the case of the files that begin with “v”, are version-specific configuration. We need only concern ourselves with the local.cf file to specify our local configuration changes.
As the student user on StudentVM1, open Thunderbird if it is not already open, and look in the Spam folder for the new email. Select the spam email that was just received and scroll down to the SpamAssassin attachment. You will see that this email matched the GTUBE6 rule, which gave it a score of 1000. That is high enough that even the strongest non-spam rules, such as ALL_TRUSTED, could not overcome it and make the email look like non-spam.
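One way to build such a test message programmatically is sketched below in Python. GTUBE (the Generic Test for Unsolicited Bulk Email) is a fixed string published by the SpamAssassin project for exactly this kind of test. The function name, addresses, and subject are placeholders, not part of the experiment.

```python
from email.message import EmailMessage

# The GTUBE string must appear in the message body exactly as shown.
GTUBE = "XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X"


def make_gtube_message(sender: str, recipient: str) -> EmailMessage:
    """Build a test message that SpamAssassin's GTUBE rule should flag."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "GTUBE test"
    msg.set_content(f"This is a spam filter test.\n\n{GTUBE}\n")
    return msg
```

The resulting message could then be handed to the mail server with the standard library's smtplib.SMTP(host).send_message(msg), where host is your server's name.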
I wrote a little script to do this on my own mail server and you can too, if you like. I experiment with this a lot and make changes to the local.cf so a script with a short name can save a lot of typing. At any rate, here are the commands you need. It does not matter in which order you stop the services.
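Those shell commands are not reproduced in this copy of the text. A hypothetical Python equivalent is sketched below. It assumes the services are managed by systemd under the unit names sendmail and mimedefang, and it only prints the commands unless dry_run is set to False. The start order reflects the text: MIMEDefang is started before SendMail.

```python
import subprocess

# Assumed systemd unit names on the server.
STOP_ORDER = ["sendmail", "mimedefang"]   # stop order does not matter
START_ORDER = ["mimedefang", "sendmail"]  # MIMEDefang is started first


def restart_commands() -> list[list[str]]:
    """Return the systemctl commands, in order, for a full restart."""
    cmds = [["systemctl", "stop", unit] for unit in STOP_ORDER]
    cmds += [["systemctl", "start", unit] for unit in START_ORDER]
    return cmds


def restart_mail_services(dry_run: bool = True) -> None:
    """Print the commands, or run them when dry_run is False (needs root)."""
    for cmd in restart_commands():
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failure
```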
Be sure to check the log file messages and then look at the email using Thunderbird on StudentVM1. If necessary, scroll down and look at the SpamAssassin report which shows the scores. The score for GTUBE should be 600.
Now let’s add a new rule to local.cf. It takes three lines to create a new rule. The first line defines where to search, such as the header or body of the message. For a header rule, it also names the specific header, such as Subject, and gives a Perl regular expression for the pattern to be matched. Each of the three lines also contains the rule's name, an identifier that is typically written in uppercase.
The second line is a description of the rule that will be printed in the SpamAssassin report. And the third line contains the score that is applied to the message when the rule is matched. I also add a comment as a bit of explanation and a separator to make a long list of rules easier to read. And I have a very long list.
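To illustrate the three-line form, the hypothetical rule below is written the way it would appear in local.cf, preceded by a comment. Its name, pattern, and score are invented for this example. The surrounding Python parses those lines only to show which part plays which role; it is not how SpamAssassin itself reads its configuration.

```python
import re

# A hypothetical local.cf rule in SpamAssassin's three-line form.
RULE_TEXT = """\
# Subject lines claiming the reader has won something
header   LOCAL_PRIZE_SUBJ  Subject =~ /you have won/i
describe LOCAL_PRIZE_SUBJ  Subject claims the recipient won a prize
score    LOCAL_PRIZE_SUBJ  2.5
"""


def parse_rule(text: str) -> dict:
    """Collect the header test, description, and score for one rule."""
    rule = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        keyword, name, rest = line.split(None, 2)
        rule["name"] = name
        if keyword == "header":
            # e.g. "Subject =~ /you have won/i"
            m = re.fullmatch(r"(\S+)\s+=~\s+/(.*)/(i?)", rest)
            if m is None:
                raise ValueError(f"unsupported header test: {rest}")
            field, pattern, flag = m.groups()
            rule["field"] = field
            rule["regex"] = re.compile(pattern, re.IGNORECASE if flag else 0)
        elif keyword == "describe":
            rule["description"] = rest
        elif keyword == "score":
            rule["score"] = float(rest)
    return rule
```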
I get a lot of spam that has something about “back taxes” in the body of the message. Scanning the body of a message can take a long time, especially if the body is large, so I try to keep the number of rules that scan the body as small as possible.
The regular expression /back taxes/i looks for the text “back taxes” and the trailing “i” tells Perl to ignore the case so that any combination of upper- and lowercase will match.
Send an email that does not contain “back taxes” in the body to verify that this rule would not match. Also, send some emails with various upper- and lowercase combinations of “back taxes” to ensure that they do match.
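The matching behavior these tests check can also be tried offline before touching the mail server. The Python sketch below uses re.IGNORECASE as the equivalent of Perl's trailing i modifier, and the sample strings are made up. It exercises only the regular expression, not SpamAssassin's processing of message bodies.

```python
import re

# Python equivalent of the Perl pattern /back taxes/i
BACK_TAXES = re.compile(r"back taxes", re.IGNORECASE)

# Sample text -> whether the pattern should match.
samples = {
    "You owe BACK TAXES to the agency": True,
    "Settle your Back Taxes today": True,
    "bAcK tAxEs notice enclosed": True,
    "The taxes are back on the agenda": False,
    "Backtaxes are a myth": False,
}

for text, expected in samples.items():
    found = BACK_TAXES.search(text) is not None
    print(f"{'match' if found else 'no match':8} {text!r}")
    assert found == expected
```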
Restart the SendMail and MIMEDefang services in the proper sequence and test this new rule.
We want to ensure that email from certain domains, such as example.com, both.org, and opensource.com, is allowed through regardless of its other spam scores. We can also blacklist a domain like spammer.com.
The * character is a metacharacter that matches any sequence of characters; placed to the left of the @ sign, as we have done here, it matches every email account in the specified domain. I have specified a whitelist entry for only my own email address in my personal both.org domain, so other accounts from both.org will not be whitelisted. That does not mean that they will automatically be considered spam; they would still need a score of at least 5 to be marked as spam.
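The effect of the * wildcard can be checked with a small Python model. The standard library's fnmatchcase performs the same kind of shell-style glob matching, and lowercasing both sides reflects the fact that SpamAssassin compares addresses without regard to case. The list entries, including someone@both.org, are hypothetical stand-ins for the configuration described above. The sketch decides only whether an address is listed, not what score is applied.

```python
from fnmatch import fnmatchcase

# Hypothetical entries mirroring the kinds of lines described above.
WHITELIST = ["*@example.com", "someone@both.org", "*@opensource.com"]
BLACKLIST = ["*@spammer.com"]


def listed(address: str, patterns: list[str]) -> bool:
    """True if the whole address matches any glob pattern. * matches any run
    of characters and ? matches exactly one; case is ignored."""
    addr = address.lower()
    return any(fnmatchcase(addr, p.lower()) for p in patterns)
```

Because the pattern must match the entire address, *@spammer.com does not catch a look-alike domain such as spammer.com.example.net.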
Additional resources
There are few really good resources for someone who needs to create an email system from nothing. My intent in the email chapters of this course was to at least partially fill that gap. Chapters 7, 8, and 9 of this volume of the course provide enough information to get started with a reasonably well-constructed email server that can grow to absorb the workloads of a small to medium-sized organization.
As part of my research for this course, and for these chapters dealing with email, spam, and malware in particular, I discovered the book Pro Open Source Mail – Building an Enterprise Mail Solution7 by Curtis Smith. That book is the one I wish I had had when I first started my own email server. In many ways, Smith takes the same path as I did and ends up with most of the same software. The only significant difference is his choice of Dovecot as his IMAP server, whereas we use UW IMAP. Smith also goes into much more detail than I have in this course. I highly recommend Pro Open Source Mail despite its age, because it presents a complete, integrated solution rather than just one part, as most books do.
Chapter summary
Although we could have used Procmail by itself for spam filtering and sorting, I think SpamAssassin does a better job of scoring because it does not rely on a single rule to match, but rather the aggregate score from all of the rules, as well as scores from Bayesian filtering.
Procmail works very well when matches can be made very explicit with known strings such as the ones that I have configured MIMEDefang to place in the subject line. I think Procmail works better as a final sorting stage in the spam-filtering process than as a complete solution all by itself. Of course, I know that many admins have made complete spam-filtering solutions using nothing more than Procmail.
Now that I have server-side filtering, I am somewhat less limited in my choice of email clients because I no longer need a client that performs filtering and sorting. Nor do I have a need to leave an email client running all the time to perform that filtering and sorting.
Exercises
1. Add and test a SpamAssassin rule that adds two points when it matches the text “free money” in the subject line. Name the rule FREE_MONEY_1. Send an email to [email protected] from StudentVM1 that contains that phrase. View the SpamAssassin report to verify that the new rule is working.
2. Use the Thunderbird email client to add a new folder and name it FreeMoney. Then add a new recipe to the .procmailrc file that matches any emails with the string FREE_MONEY_1 in the X-Spam-Status header and sorts them into the new folder. Test.
3. Locate the file in which the default scores for whitelists and blacklists are stored. What rule name is used when a user account is whitelisted?
4. What score is added when a user is whitelisted?
5. Why must MIMEDefang be started (or restarted) before SendMail?