author Aaron Dalton <aaron@FreeBSD.org> 2006-05-24 17:47:11 +0000
committer Aaron Dalton <aaron@FreeBSD.org> 2006-05-24 17:47:11 +0000
commit fb3f2c89204e07f4e0778551ca6ad5b7fccfb3c1
tree ff25ebd851f98010a86fec15d179b12ccf18894a /textproc/p5-Algorithm-RabinKarp/pkg-descr
parent bd57d8de936cbee750045cc3a9199525df26130f
Diffstat (limited to 'textproc/p5-Algorithm-RabinKarp/pkg-descr')
-rw-r--r-- textproc/p5-Algorithm-RabinKarp/pkg-descr | 18
1 files changed, 18 insertions, 0 deletions
diff --git a/textproc/p5-Algorithm-RabinKarp/pkg-descr b/textproc/p5-Algorithm-RabinKarp/pkg-descr
new file mode 100644
index 000000000000..40d892395f70
--- /dev/null
+++ b/textproc/p5-Algorithm-RabinKarp/pkg-descr
@@ -0,0 +1,18 @@
+This is an implementation of Rabin and Karp's streaming hash, as described
+in "Winnowing: Local Algorithms for Document Fingerprinting" by Schleimer,
+Wilkerson, and Aiken. Following the suggestion of Schleimer, I am using
+their second equation:
+
+ $H[ $c[2..$k + 1] ] = (( $H[ $c[1..$k] ] - $c[1] ** $k ) + $c[$k+1] ) * $k
+
+The result of this hash encodes information about the next k values in
+the stream (hence k-gram). This means that for any given stream of n
+integer values (or characters), you will get back n - k + 1 hash values.
+
+For best results, you will want to create a code generator that filters
+your data to remove all unnecessary information. For example, in a large
+English document, you should probably remove all white space and convert
+all letters to lower case.
+
+WWW: http://search.cpan.org/dist/Algorithm-RabinKarp
+Author: Norman Nunley <nnunley@gmail.com>
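
The description above can be illustrated with a small sketch. It is written in Python rather than Perl and is not the module's API: the names filter_text and kgram_hashes, the base of 257, and the modulus are all assumptions made for this example. It also assumes the recurrence as the Winnowing paper states it, with a fixed base b as the multiplier where the description writes $k, that is H' = ((H - c1 * b**k) + c_next) * b.

```python
def filter_text(text):
    """Hypothetical code generator: drop white space, fold case, emit integers."""
    return [ord(ch) for ch in text.lower() if not ch.isspace()]


def kgram_hashes(values, k, base=257, modulus=(1 << 61) - 1):
    """Return one rolling hash per k-gram of `values` (n - k + 1 in total).

    `base` and `modulus` are illustrative choices, not taken from the module.
    """
    values = list(values)
    if k <= 0 or len(values) < k:
        return []

    # Weight carried by the oldest value in the window: base ** k.
    top = pow(base, k, modulus)

    # Hash of the first window: c1*b^k + c2*b^(k-1) + ... + ck*b.
    h = 0
    for c in values[:k]:
        h = ((h + c) * base) % modulus
    hashes = [h]

    # Slide the window: remove the oldest value, add the newest, then scale.
    for i in range(k, len(values)):
        h = (((h - values[i - k] * top) + values[i]) * base) % modulus
        hashes.append(h)
    return hashes


if __name__ == "__main__":
    stream = filter_text("A do run run run")
    print(len(stream), len(kgram_hashes(stream, 5)))
```

Because each hash depends only on the k values in its window, identical k-grams at different positions in the filtered stream produce identical hash values.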