Spam Filters
Spam filters are programs that screen spam out of your email. There
are several filters and several ways to run them. This article
explains the concepts behind two Linux filters and gives
guidance on how to use them with the email client KMail.
How filters work
Email software that collects email from the internet is called
an email client.
Email that you did not request, sent by people with whom you have
no sort of relationship, is referred to as
"spam". For those who want to rid themselves
of spam, there is more than one way to do so. Different
methods have varying levels of success.
Simple filtering procedures eliminate emails that contain certain
known "bad" words (such as "free offer" or "viagra"), but they miss
disguised spellings (such as "f.r.e.e offers" or "v i a g r a") and
words not identified as bad. Systems that list a huge
number of bad words also run the risk of eliminating valid
emails that happen to contain one of the words on the
bad word list.
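The weakness of a plain word list can be seen in a minimal sketch. The word list and the sample messages below are made up for illustration and are not taken from any real filter.

```python
# Hypothetical word list for illustration; real lists are far longer.
BAD_PHRASES = ("free offer", "viagra")


def is_spam_by_keyword(text: str) -> bool:
    """Return True if the text contains any known bad phrase (case-insensitive)."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in BAD_PHRASES)


if __name__ == "__main__":
    samples = [
        "Claim your FREE OFFER today",        # caught by the list
        "Claim your f.r.e.e offers today",    # missed: disguised spelling
        "The leaflet on viagra side effects you asked for",  # valid mail, yet flagged
    ]
    for sample in samples:
        print(is_spam_by_keyword(sample), sample)
```

The second sample slips through and the third is wrongly caught, which is the trade-off described above.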
The more intelligent systems identify spam in a variety of ways,
often referred to as "rules", and have a programmed ability to
learn from experience. Both SpamAssassin and Quick Spam Filter (qsf),
featured in this article, are examples of software with this
intelligent capability.
When an email is collected by your email software (the email
client) from the internet (the email server), the email is
"passed" to the filter software, which checks it against
its known and learnt spam rules. There are
several ways this software can be used. The one we recommend
marks emails as spam by adding some text to the
email header. In this way, the email message itself remains
intact, which is useful where the filter software has incorrectly
identified a valid email as spam. Once an email has been
marked as spam, in a way that is largely invisible to
you, it can be spirited away to its own folder,
leaving you to read your "genuine" emails in peace.
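The header-marking approach can be sketched with Python's standard email module. This is only an illustration of the idea, not how either filter is implemented: the function names are hypothetical, and the header name is the one used in the SpamAssassin section of this article.

```python
from email import message_from_string
from email.message import Message

SPAM_HEADER = "X-Spam-Status"  # header name used in the SpamAssassin section


def tag_message(raw: str, is_spam: bool) -> Message:
    """Add a verdict header; the body and the existing headers are left alone."""
    msg = message_from_string(raw)
    msg[SPAM_HEADER] = "YES" if is_spam else "NO"
    return msg


def choose_folder(msg: Message) -> str:
    """Route on the header only, as a mail-client filter rule would."""
    verdict = msg.get(SPAM_HEADER, "").strip().upper()
    return "FilteredSpam" if verdict.startswith("YES") else "inbox"


if __name__ == "__main__":
    raw = "From: someone@example.com\nSubject: Hello\n\nOriginal body text.\n"
    tagged = tag_message(raw, is_spam=True)
    print(choose_folder(tagged))
    print(tagged.get_payload())
```

Because only a header is added, a wrongly marked email can simply be moved back to the inbox with its content unchanged.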
SpamAssassin
SpamAssassin can be found at spamassassin.rediris.es.
Firstly, launch KMail and set up folders into which you can
separate out spam email. In KMail, set up three
email folders, called something like "FilteredSpam", "MissedSpam" and
"NonSpam". You can achieve this in KMail by right-clicking
any folder (that you want to contain the new
folders) and selecting the "Create Child Folder" option.
Copy (or move) as many spam emails as you can to the "MissedSpam"
folder, which will soon become the basis on which SpamAssassin will
learn which emails you consider to be spam.
Next, set up the system that will call SpamAssassin for each
new email collected from the server. This is achieved by using
the KMail filter system - two filters are required.
In KMail,
click "Settings", "Configure Filters". To create a new filter,
click on the funny looking "new" button just above the "help"
button. Rename the filter to something like "SpamAssassin". In this
filter, set the following options.
In the Filter Criteria,
click "Match all of the following". Set the first filter
rule so that "<size>" "is less than" "250000". (This tells
KMail to apply the following action to any message that is
smaller than 250KB in size. The larger the
email, the longer SpamAssassin takes to vet it. This size limit
is a suggestion that you may want to play with once
you have the system working.) Set the first "Filter Action" to
"pipe through" and in the text box, type either "spamc" or
"spamassassin". (This tells KMail to pipe each message that
meets the filter rule above through the program using
either the spamassassin or spamc command; see below for
guidance about which is appropriate for you. The program
will add the line "X-Spam-Status: YES" to the email header
where it identifies the email as spam, or "X-Spam-Status: NO"
otherwise.) Finally, uncheck the option "if this filter matches,
stop processing". (If left checked, the emails will not
flow through to the next stage, which would make the entire
process largely redundant.) Click the "apply" button when ready.
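What this first filter asks KMail to do can be sketched as follows. This is a rough model, not KMail's actual code: the function name is hypothetical, and it assumes the spam program reads a message on standard input and writes the marked copy to standard output.

```python
import subprocess

SIZE_LIMIT = 250_000  # bytes; the suggested limit from the filter rule above


def apply_first_filter(raw: bytes, command: list[str]) -> bytes:
    """Pipe a message through the spam program if it is under the size limit."""
    if len(raw) >= SIZE_LIMIT:
        return raw  # too large: skip the (slow) spam check and leave it untouched
    # "pipe through": the message goes in on stdin, the marked copy comes back on stdout.
    result = subprocess.run(command, input=raw, capture_output=True, check=True)
    return result.stdout
```

Passing ["spamassassin"] or ["spamc"] as the command corresponds to the two choices for the text box. The message that comes back is then handed on to the next filter, which is why "stop processing" must be unchecked.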
Add a second filter. This time, rename it to something like
"SpamFilter". Set the Criteria settings to "match any of the
following", "any header" "contains" "X-Spam-Status: YES". Set
the Actions settings to "move to folder" "FilteredSpam". Uncheck
the "if this filter matches, stop processing" option. Again, click the
"apply" button when ready.
KMail is now ready to apply SpamAssassin. Here are a couple of
notes about SpamAssassin itself.
- The installation process is well documented at the SpamAssassin
web site (spamassassin.rediris.es).
When installed, run through the software's learning process using the
command-line command "sa-learn --mbox --spam /home/UserName/Mail/.spam.directory/MissedSpam/*".
This assumes you have set up your folders as an "mbox" folder. If
your folder is of type "maildir", use instead the command
"sa-learn --spam --dir /home/UserName/Mail/.spam.directory/MissedSpam/*".
(Note the KMail-specific organisation of its folders. If you have
a folder in KMail called "spam", which contains child folders,
they can be found, physically, below the "Mail" directory
within a directory named ".spam.directory". The word "spam" will
be replaced with whatever the folder is called. You
can tell which format the folder is in by right-clicking the folder
in KMail and looking at the properties via the "properties" option.) You
will probably want to run this command frequently, to continue
SpamAssassin's training well into the future, so you may want to keep it handy.
- You also want to teach SpamAssassin which emails you do like to receive.
Make sure none of the folders used in this training contain spam. When
you are ready, run on each of the folders the command-line command
"sa-learn --mbox --ham /home/UserName/Mail/.spam.directory/NonSpam"
(or use the --dir format above if not using mbox format). Of course,
you can use existing folders instead of the "/Spam/NonSpam" folder
in this example. SpamAssassin
recommends that several hundred emails are included in the training
directories to provide a solid base for filtering.
- SpamAssassin comes with a base set of rules. According
to people who have used the software without any of the
"training" above, the software still works very well. So
training is by no means mandatory.
- Filter software is run against every email that comes in. Needless
to say, the filtering can take some time. To accommodate those of
us for whom this is an issue, SpamAssassin can run in either of
two modes. The straightforward mode launches the program
every time an email is checked. In this mode, just use the command
"spamassassin" in the relevant filter above.
- The other mode is the "daemon" mode. In this mode, SpamAssassin
is running all the time, as a daemon, waiting for the next
call. During installation, a directory called "spamd"
appears below the installation directory.
This directory contains a file called "spamc". This is a
pre-compiled program. It is the client program that
hooks up with a running daemon. Move (or copy) this executable
file somewhere your email program can see it. This
program will only work when the daemon is running. If this
is for you, use the command "spamc" in the filter action above.
To run the daemon, copy the file "suse-rc-script.sh" in the
spamd directory (or the Red Hat version if appropriate) to
somewhere like the /etc/init.d directory and follow the instructions
in the script. If a script is not preconfigured for your system,
it should not be too difficult to adapt one of the existing
scripts. To get the daemon running, use the
command-line command "/etc/init.d/spamd start" (adapted to suit
the directory and filename you copied the script to).
- To check out the software, just collect your email in the normal way.
If you are still unsure whether things are working as expected,
you can always click on a known spam and run the filter manually.
To do so, make sure the spam is highlighted in KMail and click
"Message", "Apply filters". This runs the filters on the selected
message(s). Look at the headers (click on the message and click
"View", "All headers") to see if there is a header of
"X-Spam-Status: YES" (or "NO"). If so, you know for certain that
the software is properly running.
- When running the software, one tip is to move all spam
email that was "passed" as non-spam to a folder called
"MissedSpam", to run the "sa-learn --spam..." command on this
folder, and to delete its contents when finished. Similarly, a
quick scan of the "FilteredSpam" folder will help you
pick up whether legitimate emails are getting filtered out, in
which case you need to use the "sa-learn --ham ..." command
on them to prevent the same errors in future.
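Since the training commands are run again and again, it can help to build them from the KMail folder names rather than retyping the paths. The sketch below follows the folder layout and the sa-learn options quoted in the notes above. The helper names are hypothetical, and the folder path is passed as it appears in the "--ham" command above.

```python
from pathlib import Path


def kmail_folder_path(mail_root: Path, parent: str, child: str) -> Path:
    """Map a KMail child folder to its on-disk location below the Mail directory."""
    return mail_root / f".{parent}.directory" / child


def sa_learn_command(folder: Path, kind: str, mbox: bool = True) -> list[str]:
    """Build the sa-learn argument list for a spam or ham (non-spam) folder."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    fmt = "--mbox" if mbox else "--dir"  # the two folder formats named above
    return ["sa-learn", fmt, f"--{kind}", str(folder)]


if __name__ == "__main__":
    root = Path("/home/UserName/Mail")
    for child, kind in (("MissedSpam", "spam"), ("NonSpam", "ham")):
        command = sa_learn_command(kmail_folder_path(root, "spam", child), kind)
        print(" ".join(command))
```

The script only prints the commands, so they can be checked by eye before being run in a terminal.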
Quick Spam Filter (qsf)
qsf can be found at www.ivarch.com.
Firstly, launch KMail and set up folders into which you can
separate out spam email. In KMail, set up three
email folders, called something like "FilteredSpam", "MissedSpam" and
"NonSpam". You can achieve this in KMail by right-clicking
any folder (that you want to contain the new
folders) and selecting the "Create Child Folder" option.
For qsf, the folders must have a MailBox format of type "mbox".
This is one of the options available when creating a new folder. If you
have missed this point or have MailBox format "maildir", create
new directories with the "mbox" format, move emails from the old
to the new directories and delete the old directories.
Copy (or move) as many spam emails as you can to the "MissedSpam"
folder, which will soon become the basis on which qsf will
learn which emails you consider to be spam.
Next, set up the system that will call qsf for each
new email collected from the server. This is achieved by using
the KMail filter system - two filters are required.
In KMail,
click "Settings", "Configure Filters". To create a new filter,
click on the funny looking "new" button just above the "help"
button. Rename the filter to something like "QuickSpamFilter". In this
filter, set the following options.
In the Filter Criteria,
click "Match all of the following". Set the first filter
rule so that "<size>" "is less than" "250000". (This tells
KMail to apply the following rule to any message that is
smaller than 250KB in size. The larger the size of the
email, the longer qsf takes to vet it. This size limit
is a suggestion that you may want to play with once
you have the system working, up to the system's upper limit
of 512KB). Set the first "Filter Action" to
"pipe through" and in the text box, type "qsf". The program
will add the line "X-Spam: YES" to the email header
where it identifies the email as spam, or "X-Spam: NO"
otherwise. (Note: the header may be "X-Spam-Flag: YES" in
some versions of the software.) Finally, uncheck the option
"if this filter matches, stop processing". (If left checked,
the emails will not
flow through to the next stage, which may make the entire
process largely redundant). Click the "apply" button when ready.
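Because the header name can differ between versions, a quick way to see what your copy of qsf produces is to inspect a filtered message for either name. The sketch below is an illustration only: the function name is hypothetical, and comparing the value without regard to case is an assumption made here.

```python
from email import message_from_string

# Header names mentioned above; which one appears depends on the qsf version.
QSF_HEADERS = ("X-Spam", "X-Spam-Flag")


def qsf_marked_spam(raw: str) -> bool:
    """Return True if either verdict header is present with the value YES."""
    msg = message_from_string(raw)
    return any(
        value.strip().upper() == "YES"
        for name in QSF_HEADERS
        for value in msg.get_all(name, [])
    )
```

Whichever header name turns up is the one to type into the "contains" rule of the next filter.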
Add a second filter. This time, rename it to something like
"SpamFilter". Set the Criteria settings to "match any of the
following", "any header" "contains" "X-Spam: YES". Set
the Actions settings to "move to folder" "FilteredSpam". Uncheck
the "if this filter matches, stop processing" option. Again, click the
"apply" button when ready.
KMail can now be used to filter out spam. There are two other filters
that you may find helpful. One updates qsf, letting it add
the missed spam to its database. The other lets qsf know
to remove email it had incorrectly added to its "spam" database.
Add a third filter, renaming it to something like "Mark as Non-Spam". This filter
removes an email incorrectly identified as spam. Set the Criteria with "match any of
the following", "any header", "contains", "X-Spam: YES". Set the Actions setting to
"pipe through", with a value of "qsf -M -a". You want
to add another Action, which you can do by clicking the "More" button, just at the
end of the Action section. This adds a new line which should contain something
like "move to folder", "inbox" (if you want to move the incorrectly identified spam
back to your inbox). Under the Advanced option, uncheck the "apply to incoming
messages" and "apply to sent messages" and check the "on manual filtering" option.
See the notes after the next paragraph for instruction on how to apply this filter.
Add the fourth filter, renaming it to something like "Mark as Spam". This filter
adds a spam email that was missed by qsf. Set the Criteria with "match any of
the following", "any header", "contains", "X-Spam: NO". Set the Actions setting to
"pipe through", with a value of "qsf -m -a". You want
to add another Action, which you can do by clicking the "More" button, just at the
end of the Action section. This adds a new line which should contain something
like "move to folder", "trash" (if you want to move the spam email to trash). Under
the Advanced option, uncheck the "apply to incoming messages" and "apply to
sent messages" and check the "on manual filtering" option.
Filters are applied by highlighting the message(s) you want to be filtered,
then clicking the menu bar item "Message", then the "Apply filters" option (or Control-J for
short). Say, for example, qsf has marked some emails as spam that are not
really spam. Suppose you keep a
folder called "IncorrectlyMarked" for these. You would
typically run through the "FilteredSpam" folder, moving the incorrectly marked
emails to the "IncorrectlyMarked" folder. At the end of the process, you would
highlight every message in this folder and click the "Apply filters" option. Similarly,
for spam in your inbox that was not identified as spam, you might move
the messages to a "MissedSpam" folder. Highlight all these messages and repeat the "Apply filters"
instruction.
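The two manual filters never compete for the same message, because one matches "X-Spam: YES" and the other "X-Spam: NO". The sketch below models that choice. The qsf arguments and destination folders are copied from the filter descriptions above, while the function name and the dictionary keys are hypothetical.

```python
# Argument lists and destinations copied from the two manual filters above.
CORRECTIONS = {
    "not_spam": (["qsf", "-M", "-a"], "inbox"),     # "Mark as Non-Spam" filter
    "missed_spam": (["qsf", "-m", "-a"], "trash"),  # "Mark as Spam" filter
}


def plan_correction(current_header: str) -> tuple[list[str], str]:
    """Pick the correcting command and destination from the current X-Spam value."""
    verdict = current_header.strip().upper()
    if verdict == "YES":
        return CORRECTIONS["not_spam"]     # marked as spam, but you disagree
    if verdict == "NO":
        return CORRECTIONS["missed_spam"]  # passed as clean, but you disagree
    raise ValueError(f"unexpected X-Spam value: {current_header!r}")
```

This is why "Apply filters" can be used on both folders without telling KMail which correction is wanted.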
KMail is now ready to apply qsf. Here are a couple of
notes about qsf itself.
- The installation process is well documented at the qsf
web site (www.ivarch.com).
When installed, run through the software's learning process using the
command-line command "qsf -T /home/UserName/Mail/.spam.directory/MissedSpam /home/UserName/Mail/.spam.directory/NonSpam".
This assumes you have set up your folders as an "mbox" folder. If
your directory is of type "maildir", see above for a fix.
(Note the KMail-specific organisation of its folders. If you have
a folder in KMail called "spam", which contains child folders,
they can be found, physically, below the "Mail" directory
within a directory named ".spam.directory". The word "spam" will
be replaced with whatever the folder is called. You
can tell which format the folder is in by right-clicking the folder
in KMail and looking at the properties via the "properties" option.) You
will probably want to run this command frequently, to continue
qsf's training well into the future, so you may want to keep it handy.
- The one command-line command teaches qsf both the emails you
do not like to receive and the ones you do. To be effective, qsf
recommends that you copy a minimum of 75 spam emails and 300 non-spam
emails to the relevant directories.
- To check out the software, just collect your email in the normal way.
If you are still unsure whether things are working as expected,
you can always click on a known spam and run the filter manually.
To do so, make sure the spam is highlighted in KMail and click
"Message", "Apply filters". This runs the filters on the selected
message(s). Look at the headers (click on the message and click
"View", "All headers") to see if there is a header of
"X-Spam: YES" (or "NO"). If so, you know for certain that
the software is properly running.
- When running the software, one tip is to move all spam
email that was "passed" as non-spam to a folder called
"MissedSpam", to run the "qsf -T ... " command on this
folder, and to delete its contents when finished. Similarly, a
quick scan of the "FilteredSpam" folder will help you
pick up whether legitimate emails are getting filtered out, in
which case the qsf documentation provides guidance
on how to "unteach" invalid rules.
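Before the first training run, it may be worth checking that the two mbox folders hold enough messages. The sketch below counts them with Python's standard mailbox module and builds the "qsf -T" command quoted in the notes above. The function names are hypothetical, and refusing to build the command when a folder is too small is a choice made for this example, not something qsf itself enforces.

```python
import mailbox
from pathlib import Path

MIN_SPAM, MIN_NON_SPAM = 75, 300  # minimums quoted in the notes above


def count_messages(mbox_path: Path) -> int:
    """Count the messages stored in a single mbox file."""
    box = mailbox.mbox(str(mbox_path), create=False)
    try:
        return len(box)
    finally:
        box.close()


def training_command(spam_box: Path, non_spam_box: Path) -> list[str]:
    """Return the qsf -T argument list, refusing if either folder is too small."""
    spam_count = count_messages(spam_box)
    non_spam_count = count_messages(non_spam_box)
    if spam_count < MIN_SPAM or non_spam_count < MIN_NON_SPAM:
        raise ValueError(
            f"need at least {MIN_SPAM} spam and {MIN_NON_SPAM} non-spam messages, "
            f"found {spam_count} and {non_spam_count}"
        )
    return ["qsf", "-T", str(spam_box), str(non_spam_box)]
```

The counting only works on folders in "mbox" format, which is the format qsf requires in any case.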