author | Erwin Lansing <erwin@FreeBSD.org> | 2005-07-18 11:37:53 +0000 |
---|---|---|
committer | Erwin Lansing <erwin@FreeBSD.org> | 2005-07-18 11:37:53 +0000 |
commit | 16c825a5ea161487437650256316aa1d6584f21c | |
tree | 2825d2695d5e6bf796029c895ec79ec8b2fdda6b /textproc/p5-Search-VectorSpace/pkg-descr | |
parent | 8e5bb0a052360963090a5e73ff6f40c5a3081736 | |
Diffstat (limited to 'textproc/p5-Search-VectorSpace/pkg-descr')
-rw-r--r-- | textproc/p5-Search-VectorSpace/pkg-descr | 12 |
1 file changed, 12 insertions, 0 deletions
diff --git a/textproc/p5-Search-VectorSpace/pkg-descr b/textproc/p5-Search-VectorSpace/pkg-descr
new file mode 100644
index 000000000000..0d30dae3d834
--- /dev/null
+++ b/textproc/p5-Search-VectorSpace/pkg-descr
@@ -0,0 +1,12 @@
+This module takes a list of documents (in English) and
+builds a simple in-memory search engine using a vector
+space model. Documents are stored as PDL objects, and
+after the initial indexing phase, the search should be
+very fast. This implementation applies a rudimentary
+stop list to filter out very common words, and uses a
+cosine measure to calculate document similarity.
+All documents above a user-configurable similarity
+threshold are returned.
+
+Author: Maciej Ceglowski <maciej AT ceglowski.com>
+WWW: http://search.cpan.org/dist/Search-VectorSpace/
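The port description names a standard retrieval technique: a term-vector space model with a stop list, cosine similarity, and a similarity threshold. Below is a minimal Python sketch of that technique. It rests on several assumptions. It uses raw term counts as weights and a small illustrative stop list. The default threshold of 0.1 is arbitrary. The names `VectorSpaceIndex`, `tokenize`, and `unit_vector` are hypothetical. The sketch does not reproduce the Perl module's PDL-based API.

```python
import math
import re
from collections import Counter

# Illustrative stop list (an assumption): the Perl module ships its own
# list, which is not reproduced here.
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "be", "by", "for",
              "in", "is", "it", "of", "on", "or", "that", "the", "to"}


def tokenize(text):
    """Lowercase the text, keep alphabetic runs, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def unit_vector(counts, vocab):
    """Map term counts onto the vocabulary axes and scale to length 1.

    Terms outside the vocabulary are ignored; an all-zero vector stays zero.
    """
    vec = [float(counts.get(term, 0)) for term in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


class VectorSpaceIndex:
    """Hypothetical in-memory index; not the Search::VectorSpace API."""

    def __init__(self, documents):
        # Indexing phase: build the vocabulary and one unit vector per document.
        token_lists = [tokenize(doc) for doc in documents]
        self.vocab = sorted({t for tokens in token_lists for t in tokens})
        self.doc_vectors = [unit_vector(Counter(tokens), self.vocab)
                            for tokens in token_lists]

    def search(self, query, threshold=0.1):
        """Return (doc_index, cosine) pairs at or above threshold, best first."""
        q = unit_vector(Counter(tokenize(query)), self.vocab)
        # Both vectors have unit length, so their dot product is the cosine.
        scores = [(i, sum(a * b for a, b in zip(q, d)))
                  for i, d in enumerate(self.doc_vectors)]
        hits = [(i, s) for i, s in scores if s >= threshold]
        return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

Consider an index built from `"The cat sat on the mat"`, `"Dogs chase cats"`, and `"Stock markets fell sharply"`. A search for `"cat mat"` returns only document 0, with a cosine of about 0.82. Because the sketch does no stemming, `"cats"` does not match `"cat"`. The document vectors are normalized once at indexing time, so each query costs one dot product per document. This loosely mirrors the description's point that search is fast after the indexing phase.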