james-server-dev mailing list archives

From "Chris Means" <cme...@intfar.com>
Subject SPAM filter based on Bayesian principles
Date Fri, 13 Sep 2002 04:41:28 GMT
Hi,

I've finally finished my first pass at a set of SPAM "blocking" routines,
based on Paul Graham's "A Plan for Spam", which can be found at
http://www.paulgraham.com/spam.html.
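
For readers who haven't seen the essay, here is a minimal sketch of the scoring rules it describes. This is not the attached code: the class and method names are hypothetical, it assumes Java 8 or later (the attached source targets JDK 1.3), and it assumes both corpora contain at least one message. The constants (doubled ham counts, a minimum of 5 occurrences, 0.4 for unknown words, the 15 most interesting tokens, a 0.9 cutoff) are the ones given in the essay.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of the rules in "A Plan for Spam"; not the attached code. */
class GrahamScoreSketch {

    /** Per-token spam probability. Assumes hamMessages > 0 and spamMessages > 0. */
    static double tokenProbability(int hamCount, int spamCount,
                                   int hamMessages, int spamMessages) {
        int g = 2 * hamCount;   // ham counts are doubled to bias against false positives
        int b = spamCount;
        if (g + b < 5) {
            return 0.4;         // too rare to trust, so treat it like an unknown word
        }
        double hamFreq = Math.min(1.0, (double) g / hamMessages);
        double spamFreq = Math.min(1.0, (double) b / spamMessages);
        double p = spamFreq / (hamFreq + spamFreq);
        return Math.max(0.01, Math.min(0.99, p));   // clamp away from 0 and 1
    }

    /** Combines the 15 probabilities furthest from 0.5 into one message score. */
    static double combine(List<Double> tokenProbabilities) {
        List<Double> sorted = new ArrayList<>(tokenProbabilities);
        // Most "interesting" first: largest distance from the neutral value 0.5.
        sorted.sort(Comparator.comparingDouble((Double p) -> -Math.abs(p - 0.5)));
        double product = 1.0;
        double inverse = 1.0;
        for (double p : sorted.subList(0, Math.min(15, sorted.size()))) {
            product *= p;
            inverse *= 1.0 - p;
        }
        return product / (product + inverse);
    }

    /** The essay treats a combined score above 0.9 as spam. */
    static boolean isSpam(List<Double> tokenProbabilities) {
        return combine(tokenProbabilities) > 0.9;
    }
}
```

A message with no usable tokens scores exactly 0.5 here, so it is not flagged.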

Included are:

A mailet that sets a message header based upon whether the message appears
to be spam or not.  (Just a simple Yes or No for now.)

A mailet that feeds ham/spam to the "corpus" when a message is sent to a
specific email address (one for spam, one for ham).

The core classes, which can be used outside of James to perform bulk updates
of the corpus, or with other systems.

I've (hopefully) attached all the source necessary to try it out on your own
system (no, it doesn't have the official formatting, and I haven't put in all
the nice documentation, but it's a start).

This version is built with MySQL in mind as the backend, but it's designed
so that a different mechanism for persisting the statistics can be swapped
in easily.  Let me know if you need help implementing this...
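
To give a rough idea of what a swappable persistence layer can look like, here is a minimal sketch. The interface and class names are hypothetical and do not match the attached source, and it assumes Java 8 or later; a JDBC-backed implementation would implement the same two methods against a table instead of a map.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical storage abstraction for per-token statistics. */
interface TokenStatsStore {
    /** Adds count occurrences of token to the spam or ham corpus. */
    void addOccurrences(String token, boolean spam, int count);

    /** Returns how often token has been seen in the spam or ham corpus. */
    int occurrences(String token, boolean spam);
}

/** In-memory stand-in, useful for trying things out without a database. */
class InMemoryTokenStatsStore implements TokenStatsStore {
    // Index 0 holds the ham count, index 1 the spam count.
    private final Map<String, int[]> counts = new HashMap<>();

    @Override
    public void addOccurrences(String token, boolean spam, int count) {
        int[] c = counts.computeIfAbsent(token, k -> new int[2]);
        c[spam ? 1 : 0] += count;
    }

    @Override
    public int occurrences(String token, boolean spam) {
        int[] c = counts.get(token);
        return c == null ? 0 : c[spam ? 1 : 0];
    }
}
```

Code that scores or feeds messages would depend only on the interface, so switching backends would not touch it.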

Note, this hasn't been tested under much load (consider it Alpha code), and
there's probably a big bottleneck in the JDBCBayesianAnalysisFeeder mailet,
as it immediately updates the SQL backend with new statistics...which may
not be a good idea...
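
One possible way around that bottleneck would be to buffer token counts in memory and write them out in batches. The sketch below is only an illustration of that idea, not how JDBCBayesianAnalysisFeeder works: the class name, the threshold parameter, and the sink callback are all hypothetical, and it assumes Java 8 or later. One buffer per corpus (spam, ham) would be needed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical buffer that batches token-count updates before persisting them. */
class BufferedCorpusUpdates {
    private final Map<String, Integer> pending = new HashMap<>();
    private final int flushThreshold;
    private final Consumer<Map<String, Integer>> sink;

    /**
     * @param flushThreshold number of distinct pending tokens that triggers a flush
     * @param sink           receives each batch, e.g. code that runs one SQL batch update
     */
    BufferedCorpusUpdates(int flushThreshold, Consumer<Map<String, Integer>> sink) {
        this.flushThreshold = flushThreshold;
        this.sink = sink;
    }

    /** Records one occurrence of a token; flushes once enough distinct tokens are pending. */
    synchronized void record(String token) {
        pending.merge(token, 1, Integer::sum);
        if (pending.size() >= flushThreshold) {
            flush();
        }
    }

    /** Hands a copy of the pending counts to the sink and clears the buffer. */
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new HashMap<>(pending));
        pending.clear();
    }
}
```

The trade-off is that counts still in the buffer are lost if the server stops before a flush, so flush() should also be called on shutdown.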

See the readme.txt in the ZIP for a bit more info...

Please let me know if you try this out, and what problems (if any) you run
into...especially if you find any bugs <g> as I may have translated the
Bayesian routines incorrectly.

This was designed and developed against James v2.0a3, and JDK 1.3.

I'll let this sit for a while and provide updates once I get some feedback.

Thanks.

-Chris
