
Better spam control

May 26, 2012

A fellow I used to work with was telling me that their new e-mail spam control system was so much better than the old one that they were getting no spam at all.  Could this really be true?  The new one is a commercial appliance.

The key question is how to measure the effectiveness of a spam control system.  Each user of this system will have a different impression, of course.  How much spam it detects seems to be a reasonable measure.  A good one should detect 100% of spam.  That seems simple enough.  In general, a spam control system will be based on a series of rules that attempt to discriminate between spam and legitimate e-mail messages.  Setting all of these to the maximum will result in the highest detection rate.  Of course, it will also catch some legitimate messages along with the spam.  Perfect discrimination, catching every spam message and no legitimate ones, is impossible.
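To make the trade-off concrete, here is a minimal sketch that measures a system two ways at once: the share of spam it catches (detection rate) and the share of legitimate mail it wrongly flags (false-positive rate). It assumes each message has already been given a numeric spam score and a true label; the function name and the example scores are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def rates(scores: Sequence[float], is_spam: Sequence[bool],
          threshold: float) -> tuple[float, float]:
    """Return (detection_rate, false_positive_rate) when every message
    whose score is at or above `threshold` is flagged as spam."""
    flagged = [s >= threshold for s in scores]
    spam_total = sum(is_spam)
    legit_total = len(is_spam) - spam_total
    # Spam messages that were correctly flagged.
    caught = sum(f and s for f, s in zip(flagged, is_spam))
    # Legitimate messages that were flagged by mistake.
    wrongly_flagged = sum(f and not s for f, s in zip(flagged, is_spam))
    detection = caught / spam_total if spam_total else 0.0
    false_positive = wrongly_flagged / legit_total if legit_total else 0.0
    return detection, false_positive


# Hypothetical scores: the first three messages are spam, the rest are not.
scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
labels = [True, True, True, False, False, False]
for t in (0.8, 0.3, 0.0):
    print(t, rates(scores, labels, t))
```

Lowering the threshold raises the detection rate, but in this toy data it reaches 100% only by also flagging a legitimate message, and at a threshold of zero everything is flagged. A claim of 100% detection means little without the matching false-positive rate.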

Maybe we should stand back a bit and ask ourselves what exactly e-mail spam is.  That’s easy.  We all know what spam looks like.  It tells us many times a day that we’ve won millions of dollars in the lottery.  Other spam offers us enhancement products of various kinds.  Those are the easy ones, the ones that everybody recognizes as spam.  They are also easy for an automated system to recognize.

To the recipient, however, spam is really any unwanted e-mail message.  SMTP, the Internet transport for e-mail, allows strangers to send us e-mail.  This is e-mail’s greatest advantage and also its greatest disadvantage.  It allows spam senders to accumulate thousands of e-mail addresses and to send e-mail messages to all of them.  This is the classic type of spam, the one that everybody recognizes because they see it so often.  It’s also possible for someone to add your e-mail address to a mailing list without your consent and start sending to the mailing list.  Is this spam?  Many people would say that it is, because the messages are unsolicited and sent in bulk.  On the other hand, they might be exactly what you wanted, and you wouldn’t consider them to be spam at all.  What about a mailing list to which you subscribed?  If you no longer want the messages, has it become spam?  Some people consider it so.  Ultimately, each individual person has their own definition of what is spam and what is not.  A good spam control system has to take individual preference into account.

What sorts of spam detection can an automated system use?  Content analysis is the first one that comes to mind.  Certain topics and even certain words are obvious indicators of spam, aren’t they?  It depends.  For most people this is true, but not for everyone.  Bulkiness, or mass sending, is another good indicator of spam.  When millions of copies of the same message are received all over the place, it’s bound to be spam, isn’t it?  Unfortunately, there are lots of exceptions to this rule.  Legitimate mailing lists, to which people have actually subscribed, are one example.  Another is service messages, which are substantially identical because they convey the same information to many people.  Those shouldn’t be detected as spam.  What about e-mail that originates from a computer that’s not an e-mail server?  Spam gangs have compromised millions of home computers and formed them into remote-controlled networks, or botnets.  Spam senders use these computers to spew millions of e-mail messages.  Many ISPs have now put a stop to this by blocking the SMTP port (port 25), forcing their clients to authenticate before sending e-mail.  Still, messages that originate from an IP address known to belong to a home computer are a good indication of spam.  But again there are exceptions.  Networks change constantly, and ISPs don’t announce which IP address ranges are used for their clients.
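The bulkiness indicator, and the mailing-list exception to it, can be sketched in a few lines. This is a minimal illustration, assuming that identical text means identical message: it hashes a lightly normalized body (lowercased, whitespace collapsed) and counts how often each fingerprint appears. The class name, the `limit` parameter, and the allow list of subscribed senders are all hypothetical; real systems typically use fuzzier matching, since spammers vary their text to defeat exact hashes.

```python
import hashlib
import re
from collections import Counter


def fingerprint(body: str) -> str:
    """Hash of the body after lowercasing and collapsing whitespace, so
    trivially reformatted copies of the same text share one key."""
    normalized = re.sub(r"\s+", " ", body.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class BulkDetector:
    """Flags a message as bulk once its fingerprint has been seen at least
    `limit` times, unless the sender is on an allow list (for example,
    mailing lists that recipients actually subscribed to)."""

    def __init__(self, limit: int, allowed_senders: set[str]):
        self.limit = limit
        self.allowed_senders = allowed_senders
        self.seen: Counter[str] = Counter()

    def is_bulk(self, sender: str, body: str) -> bool:
        key = fingerprint(body)
        self.seen[key] += 1
        # Subscribed lists send identical copies too; don't penalize them.
        if sender in self.allowed_senders:
            return False
        return self.seen[key] >= self.limit


detector = BulkDetector(limit=3, allowed_senders={"list@example.org"})
print(detector.is_bulk("a@example.com", "You have WON the lottery"))
print(detector.is_bulk("b@example.com", "you have won   the lottery"))
print(detector.is_bulk("c@example.com", "You have won the lottery\n"))
```

The third copy crosses the limit and is flagged, even though each copy was formatted slightly differently. The allow list is where the exceptions described above would have to be recorded.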

There is no rule that works perfectly.  A combination of rules is better than a single one, but even a combination isn’t perfect.  Any rule will give the wrong answer for some messages or some people.  Some rules require knowledge of the recipient’s preferences.  As well, some rules are not appropriate, depending on the organization.  It’s easy for a spam control system to block all messages that are not in English, for example.  This may be fine for individuals who only speak English, or for organizations that only deal in English.  It can’t be used for others that have people who communicate in many languages.  A similar argument holds for blocking all messages that originate from China, to take another example.  Some organizations will have to loosen their spam rules to accommodate their actual e-mail activity.
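One common way to combine rules is to give each a weight, add up the weights of the rules a message triggers, and compare the total against a threshold. The sketch below assumes that scheme and lets an organization switch off rules that don't fit its mail, such as an English-only rule. The rule names, weights, message fields, and threshold are hypothetical, picked only to show the mechanism.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Message:
    sender_ip_is_residential: bool  # e.g. looked up from an IP list
    language: str                   # e.g. "en", "fr"
    bulk_count: int                 # copies of this message seen so far


@dataclass
class Rule:
    name: str
    weight: float
    test: Callable[[Message], bool]


# Hypothetical rules and weights.
RULES = [
    Rule("residential_ip", 2.0, lambda m: m.sender_ip_is_residential),
    Rule("not_english", 1.5, lambda m: m.language != "en"),
    Rule("bulk", 2.5, lambda m: m.bulk_count > 1000),
]


def spam_score(msg: Message, rules: Sequence[Rule],
               disabled: frozenset[str] = frozenset()) -> float:
    """Sum the weights of every enabled rule that the message triggers."""
    return sum(r.weight for r in rules
               if r.name not in disabled and r.test(msg))


def is_spam(msg: Message, rules: Sequence[Rule], threshold: float,
            disabled: frozenset[str] = frozenset()) -> bool:
    return spam_score(msg, rules, disabled) >= threshold


msg = Message(sender_ip_is_residential=True, language="fr", bulk_count=1)
# English-only organization: both rules fire, 2.0 + 1.5 = 3.5.
print(is_spam(msg, RULES, threshold=3.0))
# Multilingual organization turns the language rule off: score is 2.0.
print(is_spam(msg, RULES, threshold=3.0,
              disabled=frozenset({"not_english"})))
```

The same message is spam for one organization and legitimate for the other, which is exactly why no fixed set of rules can be right for everyone.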

It’s unreasonable to expect 100% spam detection from any spam control system.  It’s also unreasonable for a vendor to advertise 100% spam detection.  If they do, you should suspect that they are adjusting their definition of spam to coincide with whatever their system detects.
