From: Aron Roberts (aron_at_socrates.berkeley.edu)
Date: Mon Jul 14 2003 - 15:55:40 PDT
In the message "Re: [Micronet] spam", dated 2003-07-14, Ford Chiang wrote:
>My solution to spam these days is to use the new Mozilla 1.4 for
>email. It's got one of those much talked about Bayesian spam filters
>that "learns" your preferences. It starts off with a clean slate,
>and then when spam comes in, you tag it as spam. You also tell it
>what's NOT spam (ham). After it's been trained for a while, it works
>pretty well. ... I've had very few false positives (which it has
>correctly learned subsequently) and it's much better than just
>regular filter rules (filters still exist if you want to use them).
Netscape 7.1, which is based on Mozilla 1.4, includes this same
filtering capability in its mail component. Eudora 6, currently in
late beta, will also include a Bayesian-based spam filter. In
addition, third-party plug-ins or mail proxies using similar
techniques are available for other e-mail clients, such as Microsoft
Outlook.
The spam filtering components in these products require that you
manually identify spam messages, either during an initial training
period or on an ongoing basis. Over time, the product essentially
"learns" which words and other message characteristics appear more
often in your "good" messages and which appear more often in your
spam, and it uses these frequencies to detect incoming spam
automatically.
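To make the "learning" step concrete, here is a minimal sketch of a
word-frequency (naive Bayes) classifier in Python. It is an
illustration only, not the code used by Mozilla, Netscape, or Eudora;
the class name, the tokenizer, and the add-one smoothing are all
assumptions chosen for brevity.

```python
import math
import re
from collections import Counter


class TinyBayesFilter:
    """Toy word-frequency classifier (hypothetical; not any mail client's code)."""

    def __init__(self):
        # Per-label word counts and per-label message counts.
        self.words = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase runs of letters, digits, and apostrophes.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        """Record one message that the user has tagged as 'spam' or 'ham'."""
        self.words[label].update(self.tokenize(text))
        self.messages[label] += 1

    def spam_probability(self, text):
        """Return an estimate between 0 and 1 that the text is spam."""
        vocab = set(self.words["spam"]) | set(self.words["ham"])
        total = sum(self.messages.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Prior: smoothed share of training messages with this label.
            log_p = math.log((self.messages[label] + 1) / (total + 2))
            n = sum(self.words[label].values())
            for tok in self.tokenize(text):
                # Add-one smoothing so unseen words do not zero the product.
                count = self.words[label][tok]
                log_p += math.log((count + 1) / (n + len(vocab) + 1))
            log_score[label] = log_p
        # Normalize the two log scores into a probability for 'spam'.
        top = max(log_score.values())
        weight = {k: math.exp(v - top) for k, v in log_score.items()}
        return weight["spam"] / (weight["spam"] + weight["ham"])


f = TinyBayesFilter()
f.train("cheap pills buy now", "spam")
f.train("win money now click here", "spam")
f.train("meeting agenda for the micronet group", "ham")
f.train("lunch on thursday with the support group", "ham")
print(f.spam_probability("buy cheap pills"))
print(f.spam_probability("agenda for thursday meeting"))
```

Each call to train() corresponds to tagging one message by hand, and
the estimates shift as more messages are tagged.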
This training is specific to your unique set of incoming messages:
the types of messages sent to you by your campus correspondents,
colleagues at other institutions, friends and family, newsletters and
other mailing lists, and the like, as well as the types of spam you
receive. Also, because these frequencies may change over time -- for
instance, as spammers change how they craft their messages -- you may
need to continue this training.
In contrast, the server-side product used to tag probable spam
messages on Socrates (and in the near future, UCLink), ActiveState's
PureMessage 3.0.2 with the Anti-Spam plugin,
<http://www.activestate.com/Products/PureMessage/>, is currently
using rule-based filtering.
Messages are identified as spam or non-spam at the server side,
without requiring any effort on the part of individual users.
PureMessage currently uses a set of approximately 750 filter rules,
which are updated periodically, plus a small number of rules we've
added here at UC Berkeley. Each rule contributes a weighted value
toward scores that are used to calculate an overall spam probability.
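As a rough illustration of weighted rule scoring, the Python sketch
below sums the weights of every rule that matches a message. The four
rules, their weights, and the logistic mapping from score to
probability are all hypothetical; PureMessage's actual rules and its
scoring formula are not described here.

```python
import math
import re

# Hypothetical rules: (name, pattern, weight). Positive weights point
# toward spam; negative weights point toward legitimate mail.
RULES = [
    ("SUBJECT_ALL_CAPS", re.compile(r"^Subject: [A-Z0-9 !$]{10,}$", re.M), 1.5),
    ("MENTIONS_FREE_MONEY", re.compile(r"free money", re.I), 2.0),
    ("CLICK_HERE", re.compile(r"click here", re.I), 1.0),
    ("FROM_BERKELEY", re.compile(r"^From: .*@.*berkeley\.edu", re.M | re.I), -1.5),
]


def score_message(raw):
    """Return (total score, names of matching rules) for a raw message."""
    hits = [(name, weight) for name, pattern, weight in RULES if pattern.search(raw)]
    return sum(weight for _, weight in hits), [name for name, _ in hits]


def spam_probability(score, midpoint=2.5):
    """Map a score to a value between 0 and 1.

    Assumption: a logistic curve centered on `midpoint`. This is one
    plausible mapping, not the product's documented formula.
    """
    return 1.0 / (1.0 + math.exp(-(score - midpoint)))


sample = (
    "From: promo@example.com\n"
    "Subject: FREE MONEY NOW!!!\n"
    "\n"
    "Click here for free money.\n"
)
score, matched = score_message(sample)
print(score, matched, spam_probability(score))
```

Because the rules are fixed and shared, every user's mail is scored
the same way, with no per-user training.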
Both of these techniques have their place, and the most effective
approach is likely to combine them: to have probable spam identified
in a first pass at your e-mail server using tools such as PureMessage
or SpamAssassin (which is used by a number of campus departments'
e-mail systems), in combination with static rule-based and/or
Bayesian-based spam filters that you customize in your desktop e-mail
client.
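A two-pass arrangement of this kind might look like the Python sketch
below. It assumes the server marks probable spam with an
"X-Spam-Flag: YES" header, as SpamAssassin does by default; the header
your server adds may differ. The classify() function, its folder
names, and the 0.9 threshold are hypothetical, and
local_spam_probability stands in for whatever filter your desktop
client provides.

```python
from email import message_from_string


def server_flagged(msg):
    """First pass: did the mail server already tag this message as spam?"""
    return msg.get("X-Spam-Flag", "").strip().upper() == "YES"


def classify(raw, local_spam_probability, threshold=0.9):
    """Return 'junk' or 'inbox' for a raw message.

    local_spam_probability is any callable that takes the message body
    and returns a value between 0 and 1 (for example, a trained
    Bayesian filter in the desktop client).
    """
    msg = message_from_string(raw)
    if server_flagged(msg):
        return "junk"
    # Second pass: the client-side filter sees only what the server let by.
    # Simplification: multipart bodies are ignored in this sketch.
    body = "" if msg.is_multipart() else msg.get_payload()
    if local_spam_probability(body) >= threshold:
        return "junk"
    return "inbox"


tagged = "From: a@example.com\nX-Spam-Flag: YES\nSubject: hi\n\nbody text\n"
print(classify(tagged, lambda body: 0.0))
```

Here the server's verdict is trusted outright; a more cautious policy
could route messages on which the two passes disagree to a folder for
review.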
Aron Roberts
Workstation Software Support Group
P.S. The 4.0.1 release of PureMessage, which just came out in
mid-June, includes for the first time a Bayes engine that the
product's administrator can train manually by feeding it archives of
messages that were incorrectly identified as spam or non-spam. This
generates a set of Bayesian-based rules that supplement -- and
presumably fine-tune -- the standard filter rules.
It remains to be seen how effective this approach may be, and a
campus implementation, if any, will likely occur at some future,
unspecified date.