Why I Don't Trust Spam Filters
This is an attempt to explain why I don't trust spam filters, especially filters that aren't based on a particular user's preferences or reading habits. This is a work in progress.
Spam is in the eye of the beholder.
There's no universal and objective definition of spam that can be
evaluated without knowing whether the user wanted to see such
messages. A message might be offensive to 99% of email users in
general, but the 1% that want to receive such messages have a right to
receive them. Advertisements for Fireman's Pump might be spam to most
people, but they aren't spam to people who have requested information
about such products. You cannot reliably determine whether a
message is spam simply by looking at its content.
Spam filters are often based on dubious criteria.
Many spam filters employ criteria that have nothing to do with whether
the message is spam. For example:
To: xx-digest-list: ;
This header is perfectly valid syntax (an RFC 822 group address with an empty member list), and it's not an indicator of spam. The space between the colon and the semicolon wasn't there when the message was generated; it was added as a result of sendmail's header munging. But the header is still valid.
Some of these criteria might correlate well with spam, but that doesn't make them good indicators of spam.
The more criteria that are employed by a spam filter, the
harder it is to analyze that spam filter for reasonableness, and the
more likely it is that one or more buggy criteria could result in loss
of email.
Some spam filters employ lots of criteria, with varying degrees of
accuracy, in an attempt to increase their effectiveness. But this can
also increase the false positive rate, especially if the criteria are
weighted only according to how well they correlate with spam.
Criteria that do not have a very low false positive rate are at best
useless. At worst they raise the false positive rate of the filter.
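To make the arithmetic concrete, here is a small Python sketch. It assumes a simplified filter that rejects a message if any criterion fires, and it assumes the criteria misfire independently on legitimate mail; real filters usually add up weighted scores instead, and real criteria are often correlated. The per-criterion rates are made-up illustrative numbers, not measurements of any actual filter.

```python
from math import prod


def combined_false_positive_rate(rates: list[float]) -> float:
    """Probability that a legitimate message trips at least one criterion.

    Assumes (hypothetically) that the filter rejects a message if ANY
    criterion fires, and that criteria misfire independently.
    """
    # Chance a legitimate message passes every criterion is the product of
    # (1 - p) over all criteria; the false positive rate is its complement.
    return 1.0 - prod(1.0 - p for p in rates)


# Hypothetical per-criterion false positive rates on legitimate mail.
reliable = [0.0001, 0.0001]        # two criteria that almost never misfire
sloppy = reliable + [0.01, 0.02]   # plus two that misfire 1-2% of the time

print(f"reliable only: {combined_false_positive_rate(reliable):.4%}")
print(f"with sloppy:   {combined_false_positive_rate(sloppy):.4%}")
```

With these hypothetical numbers, the two reliable criteria together misfire on roughly 0.02% of legitimate messages, while adding the two sloppy ones pushes that to roughly 3%.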
It's easy to misinterpret effectiveness claims.
Say that a filter claims a false positive rate of 1%, where the false
positive rate is defined as (number of messages mistakenly labeled as
spam / total number of messages). That might sound fairly good to some
people. But I could get an FP rate of better than 1% simply by deleting
all of my incoming mail, since I get well over 100 spams for every
legitimate message. Somehow that doesn't seem like a useful filter.
A somewhat better metric would be (number of messages mistakenly labeled as spam / total number of legitimate messages). If this were in the 0.01% range or better for every user, it might be considered acceptable, as the rate of delivery failures due to spam filtering would then be marginal compared to other causes of delivery failure. And some of the better spam filters are approaching this when evaluated against a corpus of test messages. However, this should still be taken with a grain of salt, because what is or isn't spam varies from one user to another, and because how well a filter works with a predetermined corpus of messages is not a very good indicator of how well it will work with real messages.
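The difference between the two definitions can be shown with a short Python sketch. The mailbox is hypothetical: 10 legitimate messages and 1500 spams, a ratio of 150 to 1. The "filter" simply deletes everything, so every legitimate message is mislabeled.

```python
def fp_rate_over_all(mislabeled_legit: int, total: int) -> float:
    """First definition: mistakes divided by ALL messages (spam included)."""
    return mislabeled_legit / total


def fp_rate_over_legit(mislabeled_legit: int, legit: int) -> float:
    """Second definition: mistakes divided by legitimate messages only."""
    return mislabeled_legit / legit


# Hypothetical mailbox: 150 spams for every legitimate message.
legit, spam = 10, 1500
total = legit + spam

# A "filter" that deletes everything mislabels every legitimate message.
mislabeled = legit

print(f"over all messages:        {fp_rate_over_all(mislabeled, total):.2%}")
print(f"over legitimate messages: {fp_rate_over_legit(mislabeled, legit):.2%}")
```

By the first definition the delete-everything filter scores under 1%; by the second it scores 100%, which is what the user actually experiences.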
What works for one person does not work well for everyone.
Say you are a sysadmin and you install a spam filter for yourself, and
you find that it filters out 90% of your incoming spam and gives you
very few false positives. Would it make sense to impose that same
filter on several thousand users? No, because unless your user
community is small, it's probably the case that neither your mail
reading preferences nor your set of regular correspondents are
representative of that user community.
Using dubious criteria to filter spam is bad. On the other hand, merely
tagging email as spam is nearly useless.
There are people who don't mind losing incoming email, basically
because they don't expect to ever receive anything important
over email. Those people are quite happy to have most of their spam
disappear even if it means losing an occasional legitimate message.
But that's certainly not true of everyone. Do we really want an email
system that's so unreliable it cannot be trusted to send important
messages? For someone who actually needs to receive his or her mail,
spam filters that aren't tuned for that user are nearly useless.
Yes, you can use a spam filter to simply tag incoming messages. But with or without the tagging, the routine is the same:
Rarely can the spam filter aid the user in making a decision about which messages to unselect.
I have a few filters that identify messages that cannot possibly be for me, usually because they are in a language that I cannot possibly understand. Those messages get silently deleted. But the messages tagged by our local SpamAssassin can't get deleted because too many valid messages get falsely identified as spam. I still have to peruse them.
A filter that can tag X% of my mail as spam with 99.99% reliability saves me the effort required to review X% of my incoming mail. But a filter that only gives me 99% reliability is almost useless. I am not willing to use a spam filter that loses 1 in 100 messages that were meant for me.
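The gap between 99% and 99.99% is easier to see as a count of lost messages. This sketch reads "reliability" as the fraction of legitimate messages that are not wrongly tagged, which is one reasonable interpretation of the text, and assumes tagged mail is discarded unread. The 50-messages-a-day volume is a hypothetical figure.

```python
def expected_lost(legit_messages: int, reliability: float) -> float:
    """Expected number of legitimate messages wrongly tagged as spam.

    Assumes (hypothetically) that tagged mail is discarded unread, so every
    wrongly tagged legitimate message is effectively lost.
    """
    return legit_messages * (1.0 - reliability)


# Hypothetical volume: 50 legitimate messages a day for a year.
yearly_legit = 50 * 365

for reliability in (0.99, 0.9999):
    lost = expected_lost(yearly_legit, reliability)
    print(f"{reliability:.2%} reliable: about {lost:.1f} lost per year")
```

For that hypothetical volume, 99% reliability means about 182 lost messages a year, while 99.99% means fewer than two.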
Even if I set SpamAssassin's threshold very high, all it takes is a couple of bogus tests to misidentify something as spam. (And yes, it would be possible for me to assign my own weights to SpamAssassin's tests and trust only the ones that I consider reliable, but there are too many tests to peruse, and the only way to get good information on most of them is to read the source code. Basically it's too much trouble.)
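The problem with additive scoring can be sketched in a few lines of Python. SpamAssassin does sum the scores of the tests a message hits and compare the total against a threshold, but the test names, weights, and threshold here are invented for illustration and are not real SpamAssassin rules. The last part shows per-user re-weighting, reduced to zeroing out tests the user doesn't trust.

```python
# Hypothetical test names and weights; NOT real SpamAssassin rules.
DEFAULT_WEIGHTS: dict[str, float] = {
    "EMPTY_GROUP_TO": 6.0,     # fires on the valid header "To: xx-digest-list: ;"
    "RARE_MIME_FEATURE": 4.5,  # fires on a seldom-used but legal construct
    "SAYS_FREE": 2.0,
}


def score(tests_hit: set[str], weights: dict[str, float]) -> float:
    """Sum the weights of every test that fired; unknown tests count as 0."""
    return sum(weights.get(name, 0.0) for name in tests_hit)


def is_spam(tests_hit: set[str], weights: dict[str, float], threshold: float) -> bool:
    return score(tests_hit, weights) >= threshold


# A legitimate digest message that trips two bogus tests.
legit = {"EMPTY_GROUP_TO", "RARE_MIME_FEATURE"}
threshold = 10.0  # a deliberately "very high" hypothetical threshold

print(is_spam(legit, DEFAULT_WEIGHTS, threshold))

# Per-user trust: zero out the tests this user considers unreliable.
my_weights = {**DEFAULT_WEIGHTS, "EMPTY_GROUP_TO": 0.0, "RARE_MIME_FEATURE": 0.0}
print(is_spam(legit, my_weights, threshold))
```

Two bogus tests are enough to push the legitimate message over even a high threshold; only per-user re-weighting saves it, and that requires knowing which tests to distrust.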
A filter that works well for a large aggregate group of users does
not necessarily work well for individual users in that group, or in a
different group.
Messages are not uniformly distributed among users. The set of people
a user corresponds with, and the kinds of mail a user wants to receive
can vary considerably from one user to another. A filter that works
well for some users (say, those who don't care if they lose mail) but
which causes even a small percentage of users to lose important mail,
is not acceptable.
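A small Python sketch with invented per-user counts shows how an aggregate figure can hide a user who is losing a lot of mail. The user names and numbers are hypothetical.

```python
# Hypothetical per-user counts: (legitimate messages received, legitimate messages lost).
users = {
    "alice": (5000, 1),
    "bob": (4000, 0),
    "carol": (3000, 1),
    "dave": (200, 20),  # a user whose correspondents happen to trip the filter
}

# The aggregate rate pools every user's mail together.
total_legit = sum(legit for legit, _ in users.values())
total_lost = sum(lost for _, lost in users.values())
aggregate = total_lost / total_legit
print(f"aggregate loss rate: {aggregate:.3%}")

# The per-user rates show who actually pays for the filter's mistakes.
for name, (legit, lost) in users.items():
    print(f"{name}: {lost / legit:.3%}")
```

Here the aggregate loss rate is under 0.2%, yet one of the four users loses 10% of the legitimate mail sent to him.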
Evaluating the effectiveness of a spam filter by how well it selects messages from a large corpus of test messages is of dubious value, for several reasons:
Other people's spam filters block mail that I send without any good
reason.
Since my local mailer doesn't actually filter spam, but only tags it,
I don't have any problem receiving legitimate mail that people send to
me. However, I frequently (say, twice a month) have to deal with people
whose incoming mail was filtered using poor criteria. Sometimes this
involves mail that I sent directly; other times it involves mail that
was sent or relayed by software that I wrote. Which brings me
to...
The failures of spam filters are detected on the wrong end, if
they are detected at all.
Like many operational failures in email, failures of spam filters are
rarely detectable by the people who install them. Since widespread
practice is simply to drop messages suspected to be spam, frequently
the only way that spam filter bugs are detected is when the sender
doesn't get a reply to a message that he thinks should have elicited
one. Occasionally a recipient finds out out-of-band that someone sent
him a message that never arrived.
When this happens, it's very difficult to track down the problems,
because spam filters rarely log their actions, and even when they do,
the sender isn't in a position to peruse the logs or trace the
messages that were sent.
When someone else's spam filter blocks my mail, it becomes
my problem even though it's someone else's fault.
Recipients have so much misplaced faith in spam filters that rather
than fixing them, they will insist that senders alter their messages
to circumvent them. This has the effect of discouraging seldom-used
(but useful) features of email: because those features appear
infrequently in ordinary mail, messages that contain them look
suspect.
Summary of my beliefs about spam filtering: