© David Both 2020
D. Both, Using and Administering Linux: Volume 3, https://doi.org/10.1007/978-1-4842-5485-1_9

9. Combating Spam

David Both, Raleigh, NC, USA

Objectives

In this chapter, you will learn
  • How to use SpamAssassin to identify spam email using a set of scoring rules

  • How to create and modify SpamAssassin rules

  • How to hack MIMEDefang to classify spam depending upon its spam score

  • How to use Procmail to sort spam and other emails into mail folders

Introduction

Email is a powerful and useful tool and – despite other, newer types of communication such as texts and social media – email retains a prime place in the communications strategies of most organizations and individuals. Email is not the oldest form of digital communication, having been preceded by tools such as the telegraph and teletype, but it has been around since the early years of Unix. Email is a well-defined tool and is available not just on computers but on nearly every connected device, including mobile phones and tablets.

Email is also used as a tool by spammers and the distributors of malware. Spammers use email to defraud recipients of huge amounts of money each year, to steal IDs, and to peddle knock-off or nonexistent wares. Some crackers send emails with attached malware that they try to entice users into installing to their own detriment.

All of this criminal and disruptive spam requires some method for dealing with it. I use three open source programs to do this, and they have reduced my need to read or even to glance at offensive or undesirable material in order to identify and classify incoming email.

The problem

I like to sort incoming email into a couple folders besides the inbox. Spam is always filed into the spam folder, and I leave it there for a couple days so I can look at it later in case someone sends something that I want to receive but that got marked as spam because I have not whitelisted them. Some of the incoming ham (good) email from a couple other sources is also sorted into other folders. The rest does get filed into the inbox by default.

So a quick word about terminology before going any further. Sorting is the process of classifying email and storing it in an appropriate folder. Filters like SpamAssassin classify the email. MIMEDefang uses that classification to mark it as spam by adding a text string to the subject line. That classification allows other software to file the email into the designated folders. It is this last bit of software that I was looking for – the one that does the filing.

I had several email filters set up in Thunderbird, my client of choice and the best GUI client I have found for my personal needs. I also had set up some email filters for my wife on her computer. When we travel, or use our handheld devices, those filters would not always work because Thunderbird – or any other email client with filters – must be running in order to perform its filtering tasks. If I have my laptop with me, I can set that up to do the filtering, but that means I have to maintain multiple sets of filters.

I also ran into a technical problem that I wanted to fix. Client-side email filtering relies on scanning messages after they are deposited in the inbox. For some unknown reason, this has resulted in situations where the client does not always delete (expunge) the moved messages from the Inbox. This may be an issue with Thunderbird or it may be a problem with my configuration of Thunderbird. I have worked on this problem for years with no success, even through multiple complete reinstallations of Fedora and Thunderbird.

I have my own email server and spam is a major problem for me. I have several email addresses I use, some of which I have had for a couple decades so they have become major spam magnets. In fact, I get a minimum of 300 spam emails per day. The record was just over 2,500 spam emails in a single day. I currently get between 800 and 1,200 spam emails per day, and the numbers keep increasing.

So I needed a method for filing emails, that is, sorting them into appropriate folders, that is server-based rather than client-based. This would solve a number of issues. I would no longer need to leave an email client running on my home workstation just to perform filtering. It would prevent the need to delete or expunge messages – especially the spam – from our inboxes. And it would require filter configuration in just one location, the server.

But why?

By now, after two full chapters on email and just starting another one, you are probably asking yourself, “Why do I want to put myself through all of this aggravation just to have an email server? Why not just use Gmail or the email service provided by my ISP?” This is an excellent question because I ask it of myself on occasion.

When I decided that I wanted to become a Unix and Linux SysAdmin, I understood that I needed to learn about all aspects of system administration. I needed to deal with clients, but especially with servers of all types. Despite the fact that it takes a lot of work to set up, configure, and maintain a series of servers, like the ones we cover in this volume of Using and Administering Linux – Zero to SysAdmin, I learn best with hands-on experience. Working with these servers and the clients that use them on a daily basis enabled me to learn so much more than I would have otherwise.

I believe that most of us who are truly well-suited for the role of SysAdmin are the same way. Not everyone but many of us.

But even if we learn best in other ways, we always need a laboratory in which to perform our experiments and to learn to use and support hardware and software of all kinds. I have learned enough by doing this in my own home network that I have landed some amazing jobs and currently write prolifically about Linux.

Oh – and because it is fun!1

My email server

Having grown up with SendMail as the de facto email server in more than one of my jobs, I started using it for my own email server as soon as I switched permanently from OS/2 to Red Hat Linux 5 in about 1997. I have used it as my mail transfer agent (MTA) since then for both business and personal use.

Note

I am not sure why Wikipedia refers to SendMail as a “message” transfer agent. All my other references use “mail” transfer agent. The Talk tab of the Wikipedia page has a bit of discussion about this which generated even more confusion for me.

I was already using SpamAssassin and MIMEDefang together to score and mark incoming emails as spam, placing a known string in the subject, “####SPAM####”, so that I can identify and sort spam both as a human and with software. I use UW IMAP for client access to emails, but that is not a factor in server-side filtering and sorting.

Yes, I use a lot of old-school software for the server side of email, but it is well known, well documented, it works well, and I understand how to make it do the things I need it to do. Understanding this old but still extensively used software is the key to understanding many of the more recent incarnations of email software. This software enables us to understand the protocols and requirements for any software that is used to perform these tasks. Current versions of Fedora provide all of these tools as packages in their standard repositories.

Project requirements

Having a well-defined set of requirements before starting a project is imperative, so based on the description of the problem, I created five simple requirements for this project.
  1. Sort incoming spam emails into the spam folder on the server side using the identifying text that is already being added to the subject line by MIMEDefang.

  2. Sort other incoming emails into designated folders.

  3. Circumvent problems with moved messages not being deleted or expunged from the Inbox.

  4. Keep the SpamAssassin and MIMEDefang software that I was already using.

  5. Any new software would have to be easy to install and configure.

These requirements meant that I would need a sorting program that integrates well with the parts I already had.

Procmail

After extensive research, I settled on the venerable Procmail.2 I know – more old stuff – and allegedly unsupported these days, too. But it does what I need it to do and is known to work well with the software I am already using. It is stable and has no known serious bugs. It can be configured for use at the system level as well as at the individual user level.

Red Hat and RH-based distributions such as CentOS and Fedora use Procmail as the default mail delivery agent (MDA) for SendMail, so it does not even need to be installed because it is already there. The MDA delivers email to users’ mailboxes on the local host so it can also be known as the LDA or Local Delivery Agent.

My email server runs Fedora, so this is a real no-brainer. I will use Procmail. Besides, Red Hat is now supporting Procmail no matter what else you might read on the Internet, and several recent patches have been included in the most recent version. We can check the change log for Procmail to verify this.

Experiment 9-1

Perform this experiment as the root user on StudentVM2.
[root@yorktown mail]# dnf list procmail
Last metadata expiration check: 0:05:37 ago on Sun 07 Jul 2019 09:39:50 PM EDT.
Installed Packages
procmail.x86_64                   3.22-50.fc30                   @fedora
[root@yorktown mail]# rpm -q --changelog procmail | less
* Sat Feb 02 2019 Fedora Release Engineering <[email protected]> - 3.22-50
- Rebuilt for https://fedoraproject.org/wiki/Fedora_30_Mass_Rebuild
* Thu Dec 06 2018 Jaroslav Škarvada <[email protected]> - 3.22-49
- Fixed issues found by Coverity Scan
* Fri Jul 20 2018 Jaroslav Škarvada <[email protected]> - 3.22-48
- Fixed FTBFS by adding gcc requirement
  Resolves: rhbz#1606850
<snip>

I have shortened the output data, but you can see that there are several issues that have been fixed since 2017 including one security patch.

The results of Experiment 9-1 also show that we should not always believe everything we read on the Internet, including Wikipedia. We should also explore the sources of statements we encounter online and look for ourselves at the original data – in this case the Procmail RPM package.

In addition to delivering email, Procmail can be used to filter and sort it. Procmail rules – known as recipes – can be used to identify spam and delete or sort it into a designated mail folder. Other recipes can identify and sort other mail as well such as sorting emails from specific email accounts or organizations into particular folders. Procmail can be used for many other things besides sorting email into designated folders, such as automated forwarding, duplication, and much more. In this chapter, we will confine our use of it to identifying spam and sorting it into the Spam folder.

How it works

A complete discussion of the configuration of SpamAssassin, MIMEDefang, and Procmail is beyond the scope of this chapter, in part because there are so many ways of implementing anti-spam solutions using these three programs. This chapter will be limited to the configuration I used to integrate these three packages to implement my own solution.

Processing of incoming email begins with SendMail. SendMail calls MIMEDefang as part of the normal email processing. MIMEDefang uses SpamAssassin as a subroutine: it passes each email to SpamAssassin and receives the spam score in return, along with the names of the rules that matched.

SpamAssassin uses its default set of rules and scores, as well as any located in the local.cf file, to evaluate each email and generate a total score. You can modify the scores for existing rules, add your own rules, and create white- and blacklists that can assist you in adapting the rules and scoring to the needs of your own installation. The /etc/mail/spamassassin/local.cf file is used for all of this and it can grow quite large; mine is just over 70KB at this writing and still growing.
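As an illustration of the kinds of entries that go into local.cf, here is a minimal sketch that writes a few example directives to a file of your choosing. The function name write_sample_rules, the rule name LOCAL_DEMO_PHRASE, the phrase, the scores, and the addresses are all hypothetical; only the directive keywords (score, body, describe, whitelist_from, blacklist_from) are standard SpamAssassin configuration.

```shell
#!/bin/sh
# Write a few example SpamAssassin directives to the file named in $1.
# Every rule name, phrase, score, and address below is a made-up example.
write_sample_rules() {
    cat > "$1" <<'EOF'
# Change the score of an existing rule
score    ALL_TRUSTED        -2.0

# A local rule: match a phrase in the message body, ignoring case
body     LOCAL_DEMO_PHRASE  /cheap replica watches/i
describe LOCAL_DEMO_PHRASE  Body contains a phrase common in my spam
score    LOCAL_DEMO_PHRASE  6.0

# Strongly favor mail from one sender; strongly penalize another domain
whitelist_from  friend@example.com
blacklist_from  *@spammer.example
EOF
}
```

Writing to a scratch file first, for example write_sample_rules /tmp/sample-rules.cf, lets you review the entries before copying any of them into /etc/mail/spamassassin/local.cf. After changing local.cf, running spamassassin --lint is a reasonable way to check the syntax.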

It is important to understand that when SpamAssassin scans an email, it checks every rule, both its default rules and local rule sets that are created and maintained by the SysAdmin or email administrator. For each rule that matches, the score defined for that rule is added to the total score for that email. This is not a “one and done” type of scan; the email is checked against every rule.
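The additive scoring described above can be sketched in a few lines of shell. This is only an illustration of the arithmetic, not how SpamAssassin is implemented; the function names total_score and classify and the rule names in the usage example are hypothetical.

```shell
#!/bin/sh
# Sum the scores of every rule that matched a message.
# Input: one "RULE_NAME score" pair per line on stdin.
total_score() {
    awk '{ sum += $2 } END { printf "%.1f\n", sum }'
}

# Print "spam" when the total meets or exceeds the required score,
# otherwise "ham".  $1 = total score, $2 = required score.
classify() {
    awk -v hits="$1" -v req="$2" \
        'BEGIN { if (hits + 0 >= req + 0) print "spam"; else print "ham" }'
}
```

For example, printf 'RULE_A 2.5\nRULE_B 3.1\nRULE_C -1.0\n' | total_score adds the three matching rules together, and classify compares the result with the required score. Note that a rule with a negative score lowers the total, so a single match never decides the outcome on its own.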

SpamAssassin can be run as standalone software in some applications. In this environment, however, SpamAssassin is not run as a daemon; it is called by MIMEDefang. After the spam score for the email is returned to it, MIMEDefang calls the /etc/mail/mimedefang-filter program which can perform any of several actions on the email. This program can add headers to the email, modify the subject, or just discard the email.

MIMEDefang is programmed in Perl, so it is easy to hack. I have hacked the last major portion of the code in /etc/mail/mimedefang-filter to provide a filtering breakdown with a little more granularity than it does by default. This code adds specified text to the subject line of the emails as a means to identify how likely this particular email is to be spam.

Preparation

Although I had already installed MIMEDefang and SpamAssassin on my email server prior to using Procmail for email sorting, our server, StudentVM2, does not have those tools installed. So we need to install them.

Experiment 9-2

Perform this experiment as the root user on StudentVM2. We will install MIMEDefang and SpamAssassin.
[root@studentvm2 ~]# dnf -y install mimedefang spamassassin

Note that despite the fact that Perl is already installed on our VMs, this command results in the installation of about 30 additional Perl packages that are required for MIMEDefang.

Verify that there are now some mimedefang* files and a spamassassin directory in /etc/mail.

Configuration

We need to configure MIMEDefang in order to have it add the text we need to the subject line and we also need to set up some SpamAssassin rules that we can intentionally trigger with our test emails so that they will be marked as spam. We also need to configure SendMail to call MIMEDefang as part of its normal mail processing tasks.

Configuring SendMail

SendMail must call MIMEDefang in order to start the spam-filtering process. It does this by calling the MIMEDefang mail filter. The term “mail filter” is generally shortened to “milter.”

We enable the MIMEDefang mail filter by inserting one line into our sendmail.mc configuration file; the make command then regenerates sendmail.cf from it.

Experiment 9-3

Perform this experiment as the root user on StudentVM2. Edit the sendmail.mc file and insert the following lines. I placed them just after the EXPOSED_USER line.
dnl #######################################################################dnl
dnl # The following line causes sendmail to use the MIMEdefang milter. dnl
INPUT_MAIL_FILTER(`mimedefang', `S=unix:/var/spool/MIMEDefang/mimedefang.sock, T=S:5m;R:5m')dnl
dnl #######################################################################dnl
Ensure that /etc/mail is the PWD and run the make command.
[root@studentvm2 mail]# make

Test to verify that we have not broken anything. Use tail -f to follow the maillog file on StudentVM2 and send an email from the student user on StudentVM1 to the [email protected] account and your external email account. Ensure that there are no errors in the maillog file and that the email is delivered to the addressees.
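One additional sanity check after running make is to confirm that the milter definition reached the generated sendmail.cf. The following is a minimal sketch, assuming the generated file defines the filter on a line that begins with Xmimedefang, which is the form SendMail uses for mail filter definitions. The function name milter_enabled is hypothetical.

```shell
#!/bin/sh
# Succeed if the sendmail.cf file named in $1 defines the mimedefang milter.
# Assumption: INPUT_MAIL_FILTER produces a line starting with "Xmimedefang,".
milter_enabled() {
    grep -q '^Xmimedefang,' "$1"
}
```

It could be used as milter_enabled /etc/mail/sendmail.cf && echo "milter configured". A successful check only shows that the definition is present; the maillog is still the place to look for connection errors.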

Refer to Chapter 8 of the SpamAssassin3 book for more information about using MIMEDefang with SendMail and SpamAssassin.

Hacking mimedefang-filter

Let’s hack mimedefang-filter and have it add the text “####SPAM####” to emails with high enough spam scores to be considered spam. This is easy even if you don’t know Perl4 because I will show you exactly what to do.

Experiment 9-4

Perform this experiment as the root user on StudentVM2. In a later experiment in this chapter, we are going to modify one line of the mimedefang-filter Perl program but make a backup copy first and then examine the code we want to change.

After making a backup of the mimedefang-filter program, open it with the Vim editor. The code reproduced below is near the end of the mimedefang-filter file, starting around line 263 for version 2.84.
    # Spam checks if SpamAssassin is installed
    if ($Features{"SpamAssassin"}) {
        if (-s "./INPUTMSG" < 100*1024) {
            # Only scan messages smaller than 100kB.  Larger messages
            # are extremely unlikely to be spam, and SpamAssassin is
            # dreadfully slow on very large messages.
            my($hits, $req, $names, $report) = spam_assassin_check();
            my($score);
            if ($hits < 40) {
                $score = "*" x int($hits);
            } else {
                $score = "*" x 40;
            }
            # We add a header which looks like this:
            # X-Spam-Score: 6.8 (******) NAME_OF_TEST,NAME_OF_TEST
            # The number of asterisks in parens is the integer part
            # of the spam score clamped to a maximum of 40.
            # MUA filters can easily be written to trigger on a
            # minimum number of asterisks...
            if ($hits >= $req) {
                action_change_header("X-Spam-Score", "$hits ($score) $names");
                md_graphdefang_log('spam', $hits, $RelayAddr);
                # If you find the SA report useful, add it, I guess...
                action_add_part($entity, "text/plain", "-suggest",
                                "$report ",
                                "SpamAssassinReport.txt", "inline");
            } else {
                # Delete any existing X-Spam-Score header?
                action_delete_header("X-Spam-Score");
            }
        }
    }

The two lines that begin with “my” declare variables that are local to this code segment. The first of them also fills $hits, $req, $names, and $report with the values returned by spam_assassin_check(). The $hits variable is a numeric value that represents the spam score of the email.

The first if-else structure uses the Perl “x” operator to create a string that consists of a number of asterisks (*) equal to the integer number of the spam score. For example, a spam score of 7 would result in a string of 7 asterisks “*******” which results in a bar graph of the spam score. The first part of this if statement does this so long as the value of the $hits variable is less than 40. The “else” part of the logic simply creates a string of 40 asterisks if the value of $hits is 40 or more.
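The same clamping logic can be tried outside of Perl. This shell sketch mirrors the if-else structure just described, assuming the score is passed as a decimal number; the function name score_bar is hypothetical and is not part of MIMEDefang.

```shell
#!/bin/sh
# Print one asterisk per whole point of the spam score, capped at 40.
# $1 = spam score, for example 7.3
score_bar() {
    n=${1%%.*}                  # integer part: drop the first "." and the rest
    case $n in
        ''|-*) n=0 ;;           # empty or negative scores give an empty bar
    esac
    if [ "$n" -gt 40 ]; then
        n=40                    # clamp to a maximum of 40 asterisks
    fi
    bar=''
    i=0
    while [ "$i" -lt "$n" ]; do
        bar="$bar*"
        i=$((i + 1))
    done
    printf '%s\n' "$bar"
}
```

Running score_bar 7.3 prints seven asterisks, and score_bar 1000.0 prints forty, which matches the behavior of the Perl code for the GTUBE test message used later in this chapter.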

The second if-else statement takes some defined actions. If $hits is greater than or equal to the $req (required) variable, a header named X-Spam-Score is added with the following structure:
numeric spam score ($hits), the string of asterisks, test names (hits) that comprise the score

The line md_graphdefang_log('spam', $hits, $RelayAddr); adds an entry to a log file in the /var/log directory if we uncomment a line earlier in this file.

The final statement in this “if” section appends a SpamAssassin report to the email as an inline attachment. I find this report makes it easy to do problem determination when issues arise with SpamAssassin and its scoring.

If the $hits variable is less than $req, any existing spam score headers are deleted. Since emails may be scanned by multiple mail servers, this prevents spam scores from other servers from looking like we think this is spam. The $req variable defines the score at or above which an email is considered to be spam. The default is 5. To change this value, you must change the following entry in the /etc/mail/sa-mimedefang.cf configuration file.
required_hits           5

Over years of working with MIMEDefang and SpamAssassin, I have decided that I do not like the default actions taken to mark email as spam. The bar graph is not visible to the end user, and although it could be used by Procmail to determine how to sort the spam, I wanted something in the subject line where the recipient could see it and decide what to do with the message. I created a set of actions that works better for me and enables me to see spam info more quickly.

In this experiment, we change both sets of actions – the actions taken when an email is determined to be spam and those taken when it is not. That completely revised section of code is shown here. Replace the second if-else structure in your mimedefang-filter, the one that begins with if ($hits >= $req), with the following code.
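if ($hits >= $req) {
    # Spam: record the score in a header and flag the subject line
    action_add_header("X-Spam-Status", "Spam, score=$hits required=$req tests=$names");
    action_change_header("Subject", "####SPAM#### ($hits) $Subject");
    # Attach the SpamAssassin report for problem determination
    action_add_part($entity, "text/plain", "-suggest", "$report ", "SpamAssassinReport.txt", "inline");
    # Uncomment the next line to discard spam instead of delivering it
    # action_discard();
} else {
    # Not spam: same header text and report, different subject marker.
    # The header value still begins with "Spam,"; only the score differs.
    action_add_header("X-Spam-Status", "Spam, score=$hits required=$req tests=$names");
    action_change_header("Subject", "####NOT SPAM#### ($hits) $Subject");
    action_add_part($entity, "text/plain", "-suggest", "$report ", "SpamAssassinReport.txt", "inline");
    # Delete any existing X-Spam-Score header?
    # action_delete_header("X-Spam-Score");
}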

This revised code adds the X-Spam-Status header, prepends the “####SPAM####” string and the number of hits to the subject line, and attaches the SpamAssassin report to the end of the email message. It also does this for non-spam emails except that the message prepended to the subject is a bit different and says “####NOT SPAM####.” We do it this way in this experiment so that we can see that our spam detector is working even if the emails are not spam.

In a real-world environment, I do add the X-Spam-Status line to the headers on non-spam messages (ham), but I do not normally add anything to the subject line or append the SpamAssassin report to the message.

Note that this revision of the code does not delete existing headers.

Many users tend to freak out when they see that SpamAssassin report and the subject line with “####SPAM####” in it. As a result, I only add the report when I am trying to determine the source of a problem, such as a rule that is not working. The report allows me to easily see what is in the headers, but includes more information such as the exact score added by each rule. Also, if a user forwards an email to me, the report stays attached to the email but the original headers are deleted so they would be useless at that point.

Now we can start and enable MIMEDefang and restart SendMail. Note that SendMail must always be restarted after starting or restarting MIMEDefang. I wrote a little shell script to stop both services and then restart them in the correct sequence because I automate everything. It does not matter in which order they are stopped, but they must be started in the correct order: MIMEDefang first and then SendMail. This is because MIMEDefang opens a socket that SendMail must find and then connect to. The socket is their communication channel.
[root@studentvm2 ~]# systemctl start mimedefang ; systemctl enable mimedefang
Created symlink /etc/systemd/system/multi-user.target.wants/mimedefang.service → /usr/lib/systemd/system/mimedefang.service.
Created symlink /etc/systemd/system/multi-user.target.wants/mimedefang-multiplexor.service → /usr/lib/systemd/system/mimedefang-multiplexor.service.
[root@studentvm2 ~]#
Restart SendMail.
[root@studentvm2 ~]# systemctl restart sendmail
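The restart script mentioned above is not reproduced in this chapter. A minimal sketch of what such a script might look like follows, assuming the systemd unit names sendmail and mimedefang used in the commands above. The function name restart_mail_stack and the SYSTEMCTL override variable are hypothetical; the variable exists only so that the sequence can be dry-run with echo.

```shell
#!/bin/sh
# Restart the mail stack in the required order: MIMEDefang, then SendMail.
# SYSTEMCTL is an assumed override hook for dry runs; the real systemctl
# command is used when it is not set.
restart_mail_stack() {
    ctl=${SYSTEMCTL:-systemctl}
    # The order in which the services are stopped does not matter.
    $ctl stop sendmail    || return 1
    $ctl stop mimedefang  || return 1
    # Start order matters: MIMEDefang creates the socket SendMail connects to.
    $ctl start mimedefang || return 1
    $ctl start sendmail
}
```

To preview the sequence without touching the services, set SYSTEMCTL=echo before calling restart_mail_stack; each systemctl action is then printed instead of executed. Run as root without the override, the function performs the actual restart.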
Test this as the student user on StudentVM1, and send an email to [email protected]. When the email shows up in the Thunderbird inbox, view the email source. You could also use mailx as the student user on StudentVM2 to view the email which always shows the headers.
Return-Path: <[email protected]>
Received: from studentvm1.example.com ([192.168.56.21])
    by studentvm2.example.com (8.15.2/8.15.2) with ESMTP id x691SGdO016816;
    Mon, 8 Jul 2019 21:28:16 -0400
From: Student User <[email protected]>
Subject: ####NOT SPAM#### (-1) Test of SpamAssassin and MIMEDefang
Message-ID: <[email protected]>
Date: Mon, 8 Jul 2019 21:28:16 -0400
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101
 Thunderbird/60.7.0
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----------=_1562635701-16772-0"
Content-Language: en-US
X-Spam-Status: Spam, score=-1 required=5 tests=ALL_TRUSTED
X-Scanned-By: MIMEDefang 2.84 on 192.168.56.1
This is a multi-part message in MIME format...
------------=_1562635701-16772-0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Content-Language: en-US
Hello world!
------------=_1562635701-16772-0
Content-Type: text/plain; name="SpamAssassinReport.txt"
Content-Disposition: inline; filename="SpamAssassinReport.txt"
Content-Transfer-Encoding: 7bit
Spam detection software, running on the system "studentvm2.example.com",
has NOT identified this incoming email as spam.  The original
message has been attached to this so you can view it or label
similar future email.  If you have any questions, see
@@CONTACT_ADDRESS@@ for details.
Content preview:  Hello world!
Content analysis details:   (-1.0 points, 5.0 required)
 pts rule name              description
---- ---------------------- -------------------------------------------------
-1.0 ALL_TRUSTED            Passed through trusted hosts only via SMTP
------------=_1562635701-16772-0--
Now we need to find a way to test for true spam. SpamAssassin has provisions for this. In a terminal session as root on StudentVM2, make /usr/share/doc/spamassassin the PWD and list the contents. You will find, among other files, two text files that we can use to test with, sample-nonspam.txt, and sample-spam.txt. Use the test mode of the spamassassin command to test this.
[root@studentvm2 spamassassin]# spamassassin --test-mode < sample-spam.txt
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
        studentvm2.example.com
X-Spam-Flag: YES
X-Spam-Level: **************************************************
X-Spam-Status: Yes, score=1000.0 required=5.0 tests=GTUBE,NO_RECEIVED,
        NO_RELAYS autolearn=no autolearn_force=no version=3.4.2
X-Spam-Report:
        * -0.0 NO_RELAYS Informational: message was not relayed via SMTP
        * 1000 GTUBE BODY: Generic Test for Unsolicited Bulk Email
        * -0.0 NO_RECEIVED Informational: message has no Received headers
Subject: [SPAM] Test spam mail (GTUBE)
Message-ID: <[email protected]>
Date: Wed, 23 Jul 2003 23:30:00 +0200
From: Sender <[email protected]>
To: Recipient <[email protected]>
Precedence: junk
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Spam-Prev-Subject: Test spam mail (GTUBE)
This is the GTUBE, the
        Generic
        Test for
        Unsolicited
        Bulk
        Email
If your spam filter supports it, the GTUBE provides a test by which you
can verify that the filter is installed correctly and is detecting incoming
spam. You can send yourself a test mail containing the following string of
characters (in upper case and with no white spaces and line breaks):
XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X
You should send this test mail from an account outside of your network.
Spam detection software, running on the system "studentvm2.example.com",
has identified this incoming email as possible spam.  The original
message has been attached to this so you can view it or label
similar future email.  If you have any questions, see
@@CONTACT_ADDRESS@@ for details.
Content preview:  This is the GTUBE, the Generic Test for Unsolicited Bulk Email
   If your spam filter supports it, the GTUBE provides a test by which you can
   verify that the filter is installed correctly and is detecting incoming spam.
   You can send yourself a test mail containing t [...]
Content analysis details:   (1000.0 points, 5.0 required)
 pts rule name              description
---- ---------------------- -------------------------------------------------
-0.0 NO_RELAYS              Informational: message was not relayed via SMTP
1000 GTUBE                  BODY: Generic Test for Unsolicited Bulk Email
-0.0 NO_RECEIVED            Informational: message has no Received headers
(END)
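When only the verdict matters, the long report can be reduced to the score alone. This is a minimal sketch, assuming the X-Spam-Status header has the form shown in the output above; the function name sa_score is hypothetical.

```shell
#!/bin/sh
# Print the score from the first X-Spam-Status header read on stdin.
# Assumes the form "X-Spam-Status: Yes, score=1000.0 required=5.0 ...".
sa_score() {
    sed -n '/^X-Spam-Status: /{
s/^X-Spam-Status: [A-Za-z]*, score=\([-0-9.]*\).*/\1/p
q
}'
}
```

For example, spamassassin --test-mode < sample-spam.txt | sa_score prints just the score for the GTUBE sample, and the same pipeline with sample-nonspam.txt shows the score for a message that should not be flagged.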
This method tests SpamAssassin by itself. It does not exercise MIMEDefang or the full path a real email would take through the MTAs, and the email never appears in our inbox. So we can also test using the mailx command so that the email goes to our inbox.
[root@studentvm2 spamassassin]# cat sample-spam.txt | mailx -s "Test spam" [email protected]

Open the email as the student user on StudentVM2 in the mailx client. Examine the email and view the added headers and the attached SpamAssassin report.

We can see that our anti-spam configuration is working as it should.

The subject line of the email now contains the string “####SPAM####” or “####NOT SPAM####” but without the quotes, and the spam score, that is, the variable $hits. Having a known string in the subject line of spam makes further filtering easy.

The modified email is returned to SendMail for further processing.

Setting up a mail folder

We have not yet set up a folder on the server to contain the email folders like Spam and others we might want to create for the student user. We want to store all of our email folders in a subdirectory of the account’s home directory to prevent our email client from accessing other files and folders in the account’s home directory.

Whether a login or non-login account, the email client will see all files in the home directory as a possible email folder. We want to prevent that so we create a folder called “Mail” to use as the main email location in which the folders created by the email client, such as Thunderbird, are located.

Experiment 9-5

Start this experiment as the root user on StudentVM2. We do this because most email accounts will be no-login accounts so root normally does this.

Ensure that the home directory for the student user is the PWD. Create a directory, /home/student/Mail, and set the user and group ownership both to student.

Move the Sent and Trash files into the Mail directory.
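The two steps above can be sketched as a small shell function. This is only one way to do it; the function name setup_mail_dir is hypothetical, and it assumes that Sent and Trash, if they exist, are located directly in the account's home directory.

```shell
#!/bin/sh
# Create the Mail directory in a home directory and move existing
# mail folder files into it.
# $1 = home directory, $2 = owning user, $3 = owning group
setup_mail_dir() {
    home_dir=$1
    mkdir -p "$home_dir/Mail" || return 1
    chown "$2:$3" "$home_dir/Mail" || return 1
    # Move the folder files only if they are present.
    for f in Sent Trash; do
        if [ -e "$home_dir/$f" ]; then
            mv "$home_dir/$f" "$home_dir/Mail/" || return 1
        fi
    done
}
```

For the student account this would be called as root with setup_mail_dir /home/student student student. Because mv preserves the ownership of the files it moves, Sent and Trash remain owned by the account that created them.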

Now, as the student user on StudentVM1, open the Thunderbird Edit > Account Settings dialog, and select Server Settings. Click the Advanced button to open the Advanced Account Settings dialog. In the field for IMAP server directory, type Mail. Click OK to close the dialog and OK to close the Account Settings.

On the main Thunderbird window, right-click the [email protected] account – the top-level folder where it says [email protected] and not any of the sub-folders – and choose New Folder. Type “Spam” (without the quotes) in the Name field and click the Create Folder button. The new folder should appear in the list of folders along with Sent and Trash.

Configuring Procmail

The last thing that SendMail does is call Procmail to act as the MDA. Procmail then checks the home directory of the user to which the email is addressed for the existence of a ~/.procmailrc file. If one does not exist, Procmail deposits the email into the user’s inbox in /var/spool/mail. What happens when the ~/.procmailrc file does exist is the topic of this section.

What we need Procmail to do is to use the text now added to the subject line to look at the email before it gets placed in the Inbox and to route it to a different folder which we will call, naturally enough, “Spam.”

Procmail uses global and user-level configuration files. The global /etc/procmailrc file and individual user ~/.procmailrc files must be created. The structure of the files is the same, but the global file operates on all incoming email while the local files can be configured for each individual user. I do not use a global file so all of the sorting is done on the user level. My .procmailrc file is shown in Experiment 9-6 and is simple.

Note that the ~/.procmailrc file must be located in the home directory of the email account on the email server. It does not go in the home directory on individual client workstations. Because most email accounts are not login accounts, they use the nologin program as the default shell. Therefore, the admin will need to create and maintain these files. The other option is to change to a login shell such as BASH and set passwords so that knowledgeable users can login to their email accounts on the server and maintain their ~/.procmailrc files.

Each recipe starts with :0 (yes, that is a zero) on the first line and contains a total of three lines. The second line starts with * and contains a conditional statement consisting of a regular expression (regex) that Procmail compares to each line of the header of the incoming email; the body is searched only when a recipe requests it with the B flag. If there is a match, Procmail sorts the email into the folder specified by the third line. The use of the ^ symbol denotes the beginning of the line when making the comparison.
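As a rough sketch of what a condition line does, the following shell function mimics it with grep: it looks only at the header of a message (everything before the first blank line) and tests each line against the regex, ignoring case. The function name and the two sample messages are made up for illustration, and grep -E only approximates Procmail's own matching.

```shell
# Hypothetical helper: succeeds when a header line of the message on stdin
# matches the regex in $1. Illustration only; Procmail does its own matching.
header_matches() {
    # sed stops at the first blank line, which marks the end of the header.
    sed '/^$/q' | grep -Eiq -e "$1"
}

# Two made-up messages: one tagged in the subject, one only in the body.
tagged=$(printf 'From: a@example.com\nSubject: ####SPAM#### (12) Cheap pills\n\nBody')
untagged=$(printf 'From: a@example.com\nSubject: Lunch on Friday\n\n####SPAM#### in body')

for msg in "$tagged" "$untagged"; do
    if printf '%s\n' "$msg" | header_matches '^Subject:.*####SPAM####'; then
        echo "match: file in \$MAILDIR/Spam"
    else
        echo "no match: fall through to the next recipe"
    fi
done
```

A check like this can be a quick way to try a regex against a saved message before putting it into a recipe.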

Experiment 9-6

Perform this experiment as the root user on StudentVM2. We are going to create a .procmailrc file for the student user.

Use a text editor to create a new /home/student/.procmailrc file and add the following content.
################################################################################
# .procmailrc file for student@example.com                                     #
#                                                                              #
# Rules are run sequentially - first match wins                                #
# It is not necessary to reboot or to restart email. Changes take place as     #
# soon as the file is saved.                                                   #
#                                                                              #
################################################################################
# Set the environment
PATH=/usr/sbin:/usr/bin
MAILDIR=$HOME/Mail  #location of your mailboxes
DEFAULT=/var/spool/mail/student
# Send Spam to the spam mailbox
:0
* ^Subject:.*####SPAM####
$MAILDIR/Spam
# sorts all remaining messages into the default inbox
:0
* .*
$DEFAULT
############################################################################

The first recipe in my .procmailrc file sorts the spam identified in the subject line by MIMEDefang into my spam folder. Procmail ignores case, so there is no need to create recipes that look for various combinations of upper- and lowercase. The second and last recipe sorts all email that does not match another recipe into the default folder, usually the Inbox.

Having the .procmailrc file in my home directory does not, by itself, cause Procmail to filter my mail. I have to add one more file, the ~/.forward file, which tells SendMail to pipe all of my incoming email through Procmail. Create the /home/student/.forward file and add the following content.
# .forward file
# process all incoming mail through procmail - see .procmailrc for the filter rules.
|/usr/bin/procmail

Ensure that both of these new files have ownership of student.student. It is not necessary to restart either SendMail or MIMEDefang when creating or modifying the Procmail configuration files.
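Ownership and permissions can be set and checked from the shell. The sketch below assumes GNU stat, as found on Fedora, and wraps the check in a hypothetical helper. The chown and chmod lines are shown as comments because they must be run as root on the server. Mode 600 is an assumption about a sensible setting, chosen because Procmail can reject an rc file that other users are able to write to.

```shell
# On the server, as root (commented out here; adjust the user as needed):
#   chown student:student /home/student/.procmailrc /home/student/.forward
#   chmod 600 /home/student/.procmailrc

# Hypothetical helper: prints "owner:group mode" for a file using GNU stat.
owner_and_mode() {
    stat -c '%U:%G %a' "$1"
}

# Demonstrate on a scratch file so nothing in /home is touched.
scratch=$(mktemp)
chmod 600 "$scratch"
owner_and_mode "$scratch"
rm -f "$scratch"
```

On the server, owner_and_mode /home/student/.procmailrc should report student:student if the ownership was set as described.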

To test all of these changes, return to StudentVM2 as the root user and from a root terminal session send some test emails, both ham and spam. Make sure that /usr/share/doc/spamassassin is the PWD and then issue these commands.
[root@studentvm2 spamassassin]# cat sample-nonspam.txt | mailx -s "Test nonspam" student@example.com
[root@studentvm2 spamassassin]# cat sample-spam.txt | mailx -s "Test spam" student@example.com
The non-spam email should be sorted to the Inbox and the spam email sorted to the Spam folder. Except that did not happen. So I looked in /var/log/maillog and found the entries below.
Jul 10 07:10:33 studentvm2 sendmail[3930]: x6ABAU7d003928: x6ABAX7c003930: DSN: Service unavailable
Jul 10 07:10:33 studentvm2 smrsh[3932]: uid 1000: attempt to use "procmail" (stat failed)

The problem here is that, as I did on my own mail server, I missed one step. It is easy to miss because I found the true answer in only one place.

We need to add a symbolic link in the /etc/smrsh directory. Smrsh stands for “SendMail restricted shell,” a reasonably secure shell in which SendMail can run programs and scripts. SendMail can run only the programs that are linked in /etc/smrsh, which helps prevent crackers from exploiting SendMail for their own purposes.

Create the link, /etc/smrsh/procmail, using the following commands.
[root@studentvm2 ~]# cd /etc/smrsh ; ln -s /usr/bin/procmail procmail ; ll
total 0
lrwxrwxrwx. 1 root root 17 Jul 10 07:15 procmail -> /usr/bin/procmail

Now repeat the test by sending the spam and non-spam email messages again. This time they should go into the correct folders.
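If the messages still do not arrive, it can help to confirm that the link really resolves to an executable program. The helper below is a minimal sketch with a made-up name; it takes the smrsh directory as an optional second argument so that it can be tried against a scratch directory.

```shell
# Hypothetical helper: succeeds when the program named in $1 is reachable
# through the smrsh directory (second argument, default /etc/smrsh).
smrsh_allows() {
    dir=${2:-/etc/smrsh}
    # -x follows symbolic links, so a dangling link fails the test.
    [ -x "$dir/$1" ]
}

if smrsh_allows procmail; then
    echo "procmail is available to smrsh"
else
    echo "procmail is missing from /etc/smrsh; create the symbolic link"
fi
```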

We could have created both of these files as the student user on StudentVM2 but, in most environments, regular users will not have login access to the server.

Reports of Procmail’s demise

Having done many Internet searches while researching this chapter, I found a number of results dating from 2001 through about 2013 that declare Procmail to be dead. As evidence, they point to web pages that no longer work, missing source code, and a short article on Wikipedia that does no more than declare Procmail to be dead and provide links to more recent replacements.

However, all Red Hat, Fedora, and CentOS distributions install Procmail as the MDA for SendMail. The Red Hat, Fedora, and CentOS repositories all have the source RPMs for Procmail, and the source code is also on GitHub. The Red Hat documentation for CentOS includes some decent coverage of Procmail.[5]

Considering the continued use of Procmail by Red Hat, I have no problem with using this mature software that does its job silently and without fanfare.

Creating SpamAssassin rules

Now that we have a working solution, what happens when we start getting spam that does not match any rules or for which the matched rules do not add up to a high enough score to make the cut as spam? We can adjust the default scores and write new rules using the /etc/mail/spamassassin/local.cf file.

The files located in /usr/share/spamassassin that begin with two-digit numbers are configuration files that define rules for specific types of spam. When SpamAssassin matches a rule in one of these files, it then searches for a score in the 72_scores.cf file. These files should not be altered.
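To see which file sets the score for a given rule, a search of the rule files is usually enough. The helper below is a sketch: the function name is made up, and it assumes that score lines have the form "score RULE_NAME value", as in the local.cf examples later in this chapter.

```shell
# Hypothetical helper: print every "score RULE ..." line, prefixed with the
# file name, from the .cf files in a directory.
find_score() {
    rule=$1
    dir=${2:-/usr/share/spamassassin}
    # /dev/null forces grep to show the file name even for a single file.
    grep -E "^[[:space:]]*score[[:space:]]+$rule([[:space:]]|\$)" \
        "$dir"/*.cf /dev/null 2>/dev/null
}

# Example: look up GTUBE in the default rules directory.
find_score GTUBE || echo "no score line found for GTUBE"
```

The search only reads the files, which fits the advice that the default rule files should not be altered.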

There are also two files in /usr/share/spamassassin that we can use as templates or starting points for local configuration. These files make it easy for us to configure SpamAssassin by adding rules and changing scores so that we don’t need to change the default configuration files. The default files can be replaced during an update and cause our changes to them to be overwritten.

The local.cf file, of which there is already a copy in /etc/mail/spamassassin, is used to create local rules, alter the scores of existing default rules, and set whitelist and blacklist entries.

The user_prefs.template file can be used by individual users to override the default preferences. This file would need to be copied to the .spamassassin directory in the user’s home directory and renamed to user_prefs. For example, a user might wish to specify a higher required_score to ensure that some emails with somewhat higher spam scores than the default of 5 are allowed through as ham. This would also be the file in which users would add whitelist and blacklist entries, create their own rules, and change scores. In most modern installations, end users will not be knowledgeable enough, or will not have login access to the email server, to perform these tasks, so it would fall to the SysAdmin to make those changes for them.

Before we make any changes, we need to look at the default rule set which should never be changed in any way.

Experiment 9-7

Begin this experiment as the root user on StudentVM2. If you do not already have a root terminal session open on the desktop and following /var/log/maillog, do so now with this command.
[root@studentvm2 ~]# tail -f /var/log/maillog

In another root terminal session, make /etc/mail/spamassassin the PWD. List the files in that directory. The files you see there are used for local configuration or, as in the case of the files that begin with “v”, are version-specific configuration. We need only concern ourselves with the local.cf file to specify our local configuration changes.

We start by changing the score for a rule that we know the spam test email already matches. As the student user on StudentVM2, make /usr/share/doc/spamassassin the PWD and send this email.
[student@studentvm2 spamassassin]$ cat sample-spam.txt | mailx -s "Test email" student@example.com

As the student user on StudentVM1, open Thunderbird if it is not already open, and look in the Spam folder for the new email. Select the spam email that was just received and scroll down to the SpamAssassin attachment. You will see that this email matched the GTUBE[6] rule, which gave the email a score of 1000. That is high enough that even the best non-spam rules, such as ALL_TRUSTED and many more, could not overcome it and make the message look like non-spam.

Let’s change this number just to see how a score change works. As root on StudentVM2, edit /etc/mail/spamassassin/local.cf and add the following line.
score           GTUBE   600
Save the local.cf file but do not exit the editor because we will be making some additional changes to it. Stop both SendMail and MIMEDefang and then start them again, MIMEDefang first and then SendMail.
[root@studentvm2 ~]# systemctl stop sendmail ; systemctl stop mimedefang ; systemctl start mimedefang ; systemctl start sendmail

I wrote a little script to do this on my own mail server and you can too, if you like. I experiment with this a lot and make changes to local.cf, so a script with a short name can save a lot of typing. At any rate, the command above is all you need. It does not matter in which order you stop the services, but MIMEDefang must be started before SendMail.
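A script of that kind might look something like the sketch below. This is not the script from my server; the function name and the SYSTEMCTL override are assumptions added so that the command sequence can be previewed without touching the running services.

```shell
# Hypothetical helper script. SYSTEMCTL can be overridden (for example with
# "echo") to preview the commands instead of running them.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

restart_mail_filter() {
    # The stop order does not matter.
    $SYSTEMCTL stop sendmail
    $SYSTEMCTL stop mimedefang
    # The start order does: MIMEDefang first, then SendMail.
    $SYSTEMCTL start mimedefang &&
    $SYSTEMCTL start sendmail
}

# Dry run in a subshell: print the four commands rather than executing them.
(SYSTEMCTL=echo; restart_mail_filter)
```

To use it for real, replace the dry-run line with a plain call to restart_mail_filter and run the script as root.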

Now send the following email message as the student user on StudentVM2, where the number in the subject is in the form YYYYMMDDHHMM for easy identification.
[student@studentvm2 spamassassin]$ cat sample-spam.txt | mailx -s "Test email 201907220828" student@example.com
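Rather than typing the timestamp by hand, the date command can generate it. This is a small sketch; the variable names are arbitrary, and the mailx line is left as a comment so the snippet does not send anything.

```shell
# Build a subject stamped with the current time as YYYYMMDDHHMM.
stamp=$(date +%Y%m%d%H%M)
subject="Test email $stamp"
echo "$subject"

# The message would then be sent with something like:
#   cat sample-spam.txt | mailx -s "$subject" <recipient>
```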

Be sure to check the log file messages and then look at the email using Thunderbird on StudentVM1. If necessary, scroll down and look at the SpamAssassin report which shows the scores. The score for GTUBE should be 600.

Now let’s add a new rule to local.cf. It takes three lines to create a new rule. The first line defines the location of the search, such as the header or body of the message, and a Perl regex that defines the pattern to be matched; for header rules it also names the specific header, such as the subject. Each of the three lines also contains the rule name, an identifier that is typically in uppercase.

The second line is a description of the rule that will be printed in the SpamAssassin report. And the third line contains the score that is applied to the message when the rule is matched. I also add a comment as a bit of explanation and a separator to make a long list of rules easier to read. And I have a very long list.

I get a lot of spam email that has something about “back taxes” in the body of the message. It can take a long time to scan the body of a message especially if the body is large so I try to have as few rules that scan the body as possible.

Add the following comment and three-line rule below the score modification that we previously added to local.cf.
# Back Taxes
body            BACK_TAXES              /back taxes/i
describe        BACK_TAXES              Contains "back taxes" in the body
score           BACK_TAXES              6.0

The regular expression /back taxes/i looks for the text “back taxes” and the trailing “i” tells Perl to ignore the case so that any combination of upper- and lowercase will match.
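The effect of ignoring case can be tried out in the shell before a rule goes into local.cf. In this sketch grep -i stands in for Perl's trailing "i", which is close enough for a plain phrase like this one; the function name and sample sentences are made up.

```shell
# Hypothetical check: does a line of text contain "back taxes" in any
# combination of upper- and lowercase? grep -i stands in for Perl's /i here.
mentions_back_taxes() {
    printf '%s\n' "$1" | grep -qi 'back taxes'
}

for text in "Let us save your back taxes." "Pay your BACK TAXES now" "Tax refund news"; do
    if mentions_back_taxes "$text"; then
        echo "match:    $text"
    else
        echo "no match: $text"
    fi
done
```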

Restart the MIMEDefang and SendMail services in the correct order and send the following email as the student user on StudentVM2. Be careful because it is different and now has our trigger text in the body.
[student@studentvm2 spamassassin]$ echo "Let us save your back taxes." | mailx -s "Test email 201907220910" student@example.com
View the message and its source using Thunderbird. The message source looks like this on my host. It should be very similar on yours. Notice the X-Spam-Status line and the SpamAssassin report.
Received: from studentvm2.example.com (localhost [127.0.0.1])
    by studentvm2.example.com (8.15.2/8.15.2) with ESMTPS id x6MDANK6003406
    (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT)
    for <[email protected]>; Mon, 22 Jul 2019 09:10:23 -0400
Received: (from student@localhost)
    by studentvm2.example.com (8.15.2/8.15.2/Submit) id x6MDAMSG003405
    for [email protected]; Mon, 22 Jul 2019 09:10:22 -0400
From: Student User <[email protected]>
Message-Id: <[email protected]>
Date: Mon, 22 Jul 2019 09:10:22 -0400
Subject: ####SPAM#### (5) Test email 201907220910
User-Agent: Heirloom mailx 12.5 7/5/10
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----------=_1563801023-3364-0"
X-Spam-Status: Spam, score=5 required=5 tests=ALL_TRUSTED,BACK_TAXES
X-Scanned-By: MIMEDefang 2.84 on 192.168.56.1
This is a multi-part message in MIME format...
------------=_1563801023-3364-0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
“Let us save your back taxes.”
------------=_1563801023-3364-0
Content-Type: text/plain; name="SpamAssassinReport.txt"
Content-Disposition: inline; filename="SpamAssassinReport.txt"
Content-Transfer-Encoding: quoted-printable
Spam detection software, running on the system "studentvm2.example.com",
has identified this incoming email as possible spam.  The original
message has been attached to this so you can view it or label
similar future email.  If you have any questions, see
@@CONTACT_ADDRESS@@ for details.
Content preview:  =E2=80=9CLet us save your back taxes.=E2=80=9D=20
Content analysis details:   (5.0 points, 5.0 required)
 pts rule name              description
---- ---------------------- -----------------------------------------------=
---
-1.0 ALL_TRUSTED            Passed through trusted hosts only via SMTP
 6.0 BACK_TAXES             BODY: Contains "back taxes" in the body
------------=_1563801023-3364-0--

Send an email that does not contain “back taxes” in the body to verify that this rule would not match. Also, send some emails with various upper- and lowercase combinations of “back taxes” to ensure that they do match.
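When checking many test messages, it can be quicker to pull the numbers out of the X-Spam-Status header than to read each report. The sketch below works on the header line from the message source shown above; the variable names are arbitrary, and the field layout is assumed to match that example.

```shell
# Header line as it appears in the message source shown above.
status='X-Spam-Status: Spam, score=5 required=5 tests=ALL_TRUSTED,BACK_TAXES'

# Pull out the individual fields with sed.
score=$(printf '%s\n' "$status" | sed -n 's/.*score=\([^ ]*\).*/\1/p')
required=$(printf '%s\n' "$status" | sed -n 's/.*required=\([^ ]*\).*/\1/p')
tests=$(printf '%s\n' "$status" | sed -n 's/.*tests=\([^ ]*\).*/\1/p')

# A message counts as spam when its score reaches the required value.
verdict=$(awk -v s="$score" -v r="$required" \
    'BEGIN { if (s >= r) print "spam"; else print "ham" }')

echo "score=$score required=$required verdict=$verdict"
# List the matched rules one per line.
printf '%s\n' "$tests" | tr ',' '\n'
```

Here the -1.0 from ALL_TRUSTED and the 6.0 from BACK_TAXES add up to 5, which just reaches the required score.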

Now add a rule that checks for the text string “XXX” in the subject line and adds 15 points to ensure that it gets counted as spam. The Perl regular expression uses =~ to specify that the subject “contains” the search pattern. So “I have XXX for you” would be a match.
# XXX
header          XXX              Subject =~ /XXX/i
describe        XXX              Contains "XXX" in the subject line
score           XXX              15.0

Restart the SendMail and MIMEDefang services in the proper sequence and test this new rule.

We want to ensure that email from certain domains such as example.com, both.org, and opensource.com is allowed through regardless of its other spam scores. We can also blacklist a domain like spammer.com.

Add the following lines to the local.cf file and restart the services.
whitelist_from  *@example.com
whitelist_from  [email protected]
whitelist_from  *@opensource.com
blacklist_from  *@spammer.com          # Misc spammer

The * character is a metacharacter that matches all characters to the left of the @ sign in the email address. Entries using this metacharacter as we have here match all email accounts from the specified domain. I have specified a whitelist for only my own email from my personal both.org domain. Other accounts from both.org will not be whitelisted. That does not mean that they will be automatically considered spam because they would still need a score of at least +5.
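The way these patterns match can be illustrated with the shell's own glob matching, which treats * in much the same way. This is only an analogy: SpamAssassin does its own matching internally, and the helper name and the sample addresses are made up.

```shell
# Hypothetical illustration of glob-style address matching using the shell's
# own pattern matching; SpamAssassin does its own matching internally.
address_matches() {
    # $1 = address, $2 = pattern such as *@example.com
    case "$1" in
        $2) return 0 ;;
        *)  return 1 ;;
    esac
}

for addr in student@example.com editor@opensource.com offers@spammer.com; do
    if address_matches "$addr" '*@example.com' ||
       address_matches "$addr" '*@opensource.com'; then
        echo "$addr: whitelisted"
    elif address_matches "$addr" '*@spammer.com'; then
        echo "$addr: blacklisted"
    else
        echo "$addr: scored normally"
    fi
done
```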

We can only test this with the example.com domain by sending another email to ourselves. Send the following email as the student user on StudentVM1.
[student@studentvm1 ~]$ echo "This is a test email" | mailx -s "Test email" student@example.com

Additional resources

There are few really good resources for someone who needs to create an email system from nothing. My intent in the chapters about and pertaining to email in this course was to at least partially fill that gap. Chapters 7, 8, and 9 of this volume of the course provide enough information to get started with a reasonably well-constructed email server that can grow to absorb the workloads of a small to medium-sized organization.

As part of my research for this course – these chapters dealing with email, spam, and malware in particular – I discovered the book Pro Open Source Mail – Building an Enterprise Mail Solution[7] by Curtis Smith. That book is the one I wish I had when I first started my own email server. In many ways, Smith takes the same path as I did and ends up with most of the same software. The only significant difference is his choice of Dovecot as his IMAP server whereas we use UW-IMAP. The author of that book also goes into much more detail than I have in this course. I highly recommend Pro Open Source Mail despite the fact that it is somewhat older because it presents a complete, integrated solution rather than just one part as do most books.

Chapter summary

Although we could have used Procmail by itself for spam filtering and sorting, I think SpamAssassin does a better job of scoring because it does not rely on a single rule to match, but rather on the aggregate score from all of the rules, as well as scores from Bayesian filtering.

Procmail works very well when matches can be made very explicit with known strings such as the ones that I have configured MIMEDefang to place in the subject line. I think Procmail works better as a final sorting stage in the spam-filtering process than as a complete solution all by itself. Of course, I know that many admins have made complete spam-filtering solutions using nothing more than Procmail.

Now that I have server-side filtering, I am somewhat less limited in my choice of email clients because I no longer need a client that performs filtering and sorting. Nor do I have a need to leave an email client running all the time to perform that filtering and sorting.

Exercises

Perform the following exercises to complete this chapter.
  1. Add and test a SpamAssassin rule that adds two points when it matches the text “free money” in the subject line. Name the rule FREE_MONEY_1. Send an email to student@example.com from StudentVM1 that contains that phrase. View the SpamAssassin report to verify that the new rule is working.

  2. Use the Thunderbird email client to add a new folder and name it FreeMoney. Then add a new rule to the Procmail file that matches any emails with the string FREE_MONEY_1 in the X-Spam-Status header and sorts them into the new folder. Test.

  3. Locate the file in which the default scores for whitelists and blacklists are stored. What rule name is used when a user account is whitelisted?

  4. What is the score added to a user that is whitelisted?

  5. Why must MIMEDefang be started (or restarted) before SendMail?