Diffstat (limited to 'mail/spamprobe/files/spamprobe.1')
-rw-r--r-- mail/spamprobe/files/spamprobe.1 321
1 files changed, 0 insertions, 321 deletions
diff --git a/mail/spamprobe/files/spamprobe.1 b/mail/spamprobe/files/spamprobe.1
deleted file mode 100644
index 18a1884d41d7..000000000000
--- a/mail/spamprobe/files/spamprobe.1
+++ /dev/null
@@ -1,321 +0,0 @@
-.\"
-.\" $Id$
-.\"
-.\" Note: The date here should be updated whenever a non-trivial
-.\" change is made to the manual page.
-.Dd September 5, 2002
-.Dt SPAMPROBE 1
-.Os
-.Sh NAME
-.Nm spamprobe
-.Nd "spam detector using Bayesian analysis of word counts"
-.Sh SYNOPSIS
-.Nm
-.Op Fl a Ar char
-.Op Fl c
-.Op Fl d Ar directory
-.Op Fl h
-.Op Fl H Ar option
-.Op Fl m
-.Op Fl n Ar number
-.Op Fl r Ar number
-.Op Fl s Ar number
-.Op Fl v
-.Op Fl V
-.Op Fl Y
-.Op Fl 7
-.Op Fl 8
-.Ar command Op ...
-.Nm
-.Ar receive Op filename ...
-.Nm
-.Ar score Op filename ...
-.Nm
-.Ar find-spam Op filename ...
-.Nm
-.Ar find-good Op filename ...
-.Nm
-.Ar good Op filename ...
-.Nm
-.Ar spam Op filename ...
-.Nm
-.Ar remove Op filename ...
-.Nm
-.Ar dump
-.Nm
-.Ar export
-.Nm
-.Ar import Op filename ...
-.Sh DESCRIPTION
-Welcome to
-.Nm SpamProbe !
-Are you tired of the constant bombardment of your inbox by unwanted
-email pushing everything from porn to get-rich-quick schemes? Have you
-tried other spam filters but become disenchanted with them when you
-realized that their manually generated rule sets weren't updated fast
-enough to keep up with spammers' wording changes? Or that they generated
-unwanted false positive scores?
-.Pp
-.Nm SpamProbe
-operates on a different basis entirely. Instead of using pattern matching
-and a set of human-generated rules,
-.Nm SpamProbe
-relies on a Bayesian analysis
-of the frequency of words used in spam and non-spam emails received by an
-individual person. The process is completely automatic and tailors itself
-to the kinds of emails that each person receives.
-.Ss FEATURES
-.Bl -bullet -offset indent -compact
-.It
-Spam detection using Bayesian analysis of terms contained in each email.
-Words used often in spams but not in good email tend to indicate that a
-message is spam.
-.It
-Written in C++ for good performance. Database access using GDBM for quick
-startup and fast term count retrieval.
-.It
-Recognition and decoding of MIME attachments in quoted-printable and
-base64 encoding. Automatically skips non-text attachments.
-.It
-Counts two-word phrases as well as single words for higher precision.
-.It
-Ignores HTML tags in emails for scoring purposes unless the -h command
-line option is used. Many spams use HTML and few humans do, so HTML tends
-to become a powerful recognizer of spams. However, in the author's opinion
-this also substantially increases the likelihood of false positives if
-someone does send a non-spam email containing HTML tags.
-.Nm SpamProbe
-does, however, pull URLs from inside HTML tags, since those tend to be
-spammer specific.
-.It
-Locks mboxes and databases using fcntl file locking to avoid problems when
-multiple emails arrive simultaneously.
-.It
-Scores only the Received, Subject, To, From, and Cc headers. All other
-headers are ignored to make it hard for spammers to hide non-spammy words
-in X- headers to fool the filter. The
-.Fl H
-command line option can be used to override this.
-.El
-.Ss OPTIONS
-.Bl -tag -width ".Fl d Ar directory"
-.It Fl a Ar char
-By default
-.Nm
-converts non-ASCII characters (characters with the most significant bit
-set to 1) into the letter 'z'. This is useful for lumping all Asian
-characters into a single word for easy recognition. The
-.Fl a
-option allows you to change the character to something else if you don't
-like the letter 'z' for some reason.
-.It Fl c
-Create the database directory if it does not already exist. Normally
-.Nm
-exits with a usage error if the database directory does not already exist.
-.It Fl d Ar directory
-By default
-.Nm
-stores its database in a directory named .spamprobe under your home
-directory. The
-.Fl d
-option allows you to specify a different directory to use. This is
-necessary if your home directory is NFS mounted, for example.
-.It Fl h
-By default
-.Nm
-removes HTML markup from the text in emails to help avoid false positives.
-The
-.Fl h
-option allows you to override this behavior and force
-.Nm
-to include words from within HTML tags in its word counts. Note that
-.Nm
-always counts any URLs in hrefs within tags whether
-.Fl h
-is used or not. Use of this option is discouraged. It can increase the
-rate of spam detection slightly, but unless the user receives a significant
-number of HTML emails, it also tends to increase the number of false
-positives.
-.It Fl H Ar option
-By default
-.Nm
-only scans a meaningful subset of headers from the email message when
-searching for words to score. The
-.Fl H
-option allows the user to specify additional headers to scan. Legal values
-are "all", "nox", or "normal". "all" scans all headers, "nox" scans all
-headers except those starting with X-, and "normal" scans the normal set
-of headers.
-.It Fl m
-Use mbox format for reading emails in receive mode. Normally
-.Nm
-assumes that the input to receive mode contains a single message so it
-doesn't look for message breaks.
-.It Fl n Ar number
-Changes the number of most significant words/phrases used by
-.Nm
-to calculate the score for each message. Generally this is changed only
-for optimization purposes.
-.It Fl r Ar number
-Changes the number of times that a single word/phrase can occur in the
-top words array used to calculate the score for each message. Allowing
-repeats reduces the number of words overall (since a single word occupies
-more than one slot) but allows words which occur frequently in the message
-to have a higher weight. Generally this is changed only for optimization
-purposes.
-.It Fl s Ar number
-.Nm
-maintains an in-memory cache of the words it has seen in previous messages
-to reduce disk I/O and improve performance. By default the cache is
-flushed and cleared every 250 messages. This number can be changed using
-the
-.Fl s
-option. A value of zero causes
-.Nm
-to use 100,000 as the limit which effectively means that the cache will
-only be flushed at program exit (unless you have really enormous mailbox
-files). The cache doesn't affect receive, dump, or export but has a
-significant impact on the others.
-.It Fl v
-Write debugging information to stderr. This can be useful for debugging
-or for seeing which terms
-.Nm
-used to score each email.
-.It Fl V
-Prints version and copyright information and then exits.
-.It Fl Y
-Assume traditional Berkeley mailbox format, ignoring any Content-Length:
-fields.
-.It Fl 7
-Ignore any characters with the most significant bit set to 1 instead of
-mapping them to the letter 'z'.
-.It Fl 8
-Store all characters even if their most significant bit is set to 1.
-.El
-.Pp
-.Ss COMMANDS
-.Bl -tag -width ".Ar find-spam Op filename ..."
-.It Ar receive Op filename ...
-Tells
-.Nm
-to read its standard input (or a file specified after the receive command)
-and score it using the current databases. Once the message has been
-scored the message is classified as either spam or non-spam and its word
-counts are written to the appropriate database. The message's score is
-written to stdout along with a single word. For example:
-.Pp
-.Dl "SPAM 0.99"
-.Pp
-or
-.Pp
-.Dl "GOOD 0.02"
-.It Ar score Op filename ...
-Similar to receive except that the databases are not modified in any way
-and only the score is printed to stdout.
-.It Ar find-spam Op filename ...
-Similar to score except that it prints a short summary and score for each
-message that is determined to be spam. This can be useful when testing.
-.It Ar find-good Op filename ...
-Similar to score except that it prints a short summary and score for each
-message that is determined to be good. This can be useful when testing.
-.It Ar good Op filename ...
-Scans each file (or stdin if no file is specified) and reclassifies every
-email in the file as non-spam. The databases are updated appropriately.
-Previously processed messages (recognized using their message ids) are
-ignored.
-.It Ar spam Op filename ...
-Scans each file (or stdin if no file is specified) and reclassifies every
-email in the file as spam. The databases are updated appropriately.
-Previously processed messages (recognized using their message ids) are
-ignored.
-.It Ar remove Op filename ...
-Scans each file (or stdin if no file is specified) and removes its term
-counts from the database. Messages which are not in the database
-(recognized using their message ids) are ignored.
-.It Ar dump
-Prints the contents of the word counts database one word per line in human
-readable format with good count, spam count, and word in columns separated
-by whitespace. Note that when using GDBM for the database, the words are
-printed in the order they are hashed, so the results will need to be sorted
-to be most useful. The standard Unix sort command can do this. For
-example, to list all words from "most good" to "least good" use this
-command:
-.Pp
-.Dl "spamprobe dump | sort -k 1 -n -r"
-.Pp
-To list all words from "most spammy" to "least spammy" use this command:
-.Pp
-.Dl "spamprobe dump | sort -k 2 -n -r"
-.It Ar export
-Similar to the dump command but prints the counts and words in a
-comma-separated format with the words surrounded by double quotes. This can be
-more useful for importing into some databases.
-.It Ar import Op filename ...
-Reads the specified files which must contain export data written by the
-export command. The terms and counts from this file are added to the
-database. This can be used to convert a database from a prior version.
-.El
-.Sh ENVIRONMENT
-The
-.Nm
-command looks for the database directory in the user's home directory
-specified by the
-.Ev HOME
-environment variable. Use the
-.Fl d
-flag to specify a different database directory.
-.Sh FILES
-.Bl -tag -width ".Pa $HOME/. Ns Nm" -compact
-.It Pa $HOME/. Ns Nm
-The default database directory.
-.El
-.Sh EXAMPLES
-Typically one would use
-.Nm
-with
-.Nm procmail
-and
-.Nm formail
-to flag and filter incoming email.
-.Pp
-.Dl "# SpamProbe rule."
-.Dl ":0"
-.Dl "{"
-.Dl " # Generate a score for the message."
-.Dl " SCORE=`spamprobe receive`"
-.Dl " # Add an X-SpamProbe header to the message."
-.Dl " :0 fhW"
-.Dl " | formail -I ""X-SpamProbe: $SCORE"""
-.Dl "}"
-.Pp
-.Dl "# Filter matching messages to their own mailbox."
-.Dl ":0:"
-.Dl "*^X-SpamProbe: SPAM"
-.Dl "spamprobe"
-.Sh DIAGNOSTICS
-Exit status is 0 on success, and 1 if
-.Nm
-encounters an invalid command.
-.Sh COMPATIBILITY
-Versions of
-.Nm
-previous to 0.7 use a different database format. To convert your existing
-database to the new format use the following command:
-.Pp
-.Dl "spamprobe-export_0.6 | spamprobe import"
-.Sh SEE ALSO
-.Xr formail 1 ,
-.Xr procmail 1 ,
-.Rs
-.%A "Paul Graham"
-.%T "A Plan for Spam"
-.%O http://www.paulgraham.com/spam.html
-.%D "August 2002"
-.Re
-.Sh AUTHORS
-This
-manual page was written by
-.An Matthew N. Dodd Aq mdodd@FreeBSD.org .
-.Nm
-was written by
-.An Brian Burton Aq bburton@users.sourceforge.net
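
The scoring that the deleted manual page describes (Bayesian analysis of word counts, the -n limit on the number of significant terms, and the -r limit on repeats) can be sketched in Python. This is a minimal sketch based on the formulas in Paul Graham's "A Plan for Spam", which the page cites. It is not SpamProbe's actual implementation: the function names, the 0.4 value for unseen words, the clamping to [0.01, 0.99], and the default of 15 terms are assumptions taken from the essay, and the handling of repeats is only one plausible reading of the -r description.

```python
from collections import Counter
from math import prod


def word_probability(good, spam, n_good, n_spam):
    """Graham-style spam probability for one term, clamped to [0.01, 0.99].

    good/spam are the term's counts in each corpus; n_good/n_spam are the
    number of messages in each corpus (assumed inputs, not SpamProbe's API).
    """
    g = min(1.0, 2.0 * good / max(n_good, 1))  # good counts are doubled
    b = min(1.0, spam / max(n_spam, 1))
    if g + b == 0:
        return 0.4  # assumed default for a term never seen before
    return max(0.01, min(0.99, b / (g + b)))


def score(words, counts, n_good, n_spam, top_n=15, max_repeats=1):
    """Combine the top_n most significant terms into one score in [0, 1].

    top_n plays the role of -n and max_repeats the role of -r: a term that
    occurs k times in the message may fill up to min(k, max_repeats) slots.
    """
    slots = []
    for word, seen in Counter(words).items():
        good, spam = counts.get(word, (0, 0))
        p = word_probability(good, spam, n_good, n_spam)
        slots.extend([p] * min(seen, max_repeats))
    # The most significant terms are those farthest from the neutral 0.5.
    slots.sort(key=lambda p: abs(p - 0.5), reverse=True)
    top = slots[:top_n]
    spammy = prod(top)
    goodish = prod(1.0 - p for p in top)
    return spammy / (spammy + goodish)
```

With a hypothetical table such as counts = {"viagra": (0, 20), "meeting": (30, 0)} and 100 messages in each corpus, a message containing only "viagra" scores close to 1 and one containing only "meeting" scores close to 0. Raising max_repeats lets a frequently repeated term push the score further toward either extreme.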