author | David W. Chapman Jr. <dwcjr@FreeBSD.org> | 2001-06-23 16:09:53 +0000 |
---|---|---|
committer | David W. Chapman Jr. <dwcjr@FreeBSD.org> | 2001-06-23 16:09:53 +0000 |
commit | 302ddeb73e4485411c6ccb4dbc8d317879222dba (patch) | |
tree | 821fd4b1cec4d66d83d4ac4c891d8fc73127f184 /www/crawl/pkg-descr | |
parent | cb1d95a5762937e7d8a54e4d881ec7d453e843c0 (diff) |
Diffstat (limited to 'www/crawl/pkg-descr')
-rw-r--r-- | www/crawl/pkg-descr | 23 |
1 file changed, 23 insertions, 0 deletions
diff --git a/www/crawl/pkg-descr b/www/crawl/pkg-descr
new file mode 100644
index 000000000000..96361c6087a2
--- /dev/null
+++ b/www/crawl/pkg-descr
@@ -0,0 +1,23 @@
+The crawl utility starts a depth-first traversal of the web at the
+specified URLs. It stores all JPEG images that match the configured
+constraints. Crawl is fairly fast and allows for graceful termination.
+After terminating crawl, it is possible to restart it at exactly
+the same spot where it was terminated. Crawl keeps a persistent
+database that allows multiple crawls without revisiting sites.
+
+The main reason for writing crawl was the lack of simple open source
+web crawlers. Crawl is only a few thousand lines of code and fairly
+easy to debug and customize.
+
+Some of the main features:
+ - Saves encountered JPEG images
+ - Image selection based on regular expressions and size constraints
+ - Resume previous crawl after graceful termination
+ - Persistent database of visited URLs
+ - Very small and efficient code
+ - Supports robots.txt
+
+WWW: http://www.monkey.org/~provos/crawl/
+
+- Pete
+petef@databits.net
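
The pkg-descr above names the core idea: a depth-first traversal with a persistent database of visited URLs, saving only images that match a pattern. The sketch below is not taken from the crawl sources (which are written in C and also handle size constraints and robots.txt); it is a minimal, hypothetical Python illustration of that idea, and every name in it (visited.txt, MAX_DEPTH, the JPEG regex) is an assumption made for the example only.

```python
#!/usr/bin/env python3
# Minimal sketch (NOT the crawl source) of a depth-first crawl with a
# persistent visited-URL database, so a later run can resume without
# revisiting sites, saving JPEG URLs that match a pattern.
import os
import re
import sys
import urllib.parse
import urllib.request
from html.parser import HTMLParser

VISITED_DB = "visited.txt"                 # hypothetical persistent database
IMAGE_RE = re.compile(r"\.jpe?g$", re.I)   # constraint: JPEG URLs only
MAX_DEPTH = 3

class LinkParser(HTMLParser):
    """Collects href/src attribute values from a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)

def load_visited():
    """Read the persistent database left behind by a previous run."""
    if os.path.exists(VISITED_DB):
        with open(VISITED_DB) as f:
            return {line.strip() for line in f if line.strip()}
    return set()

def mark_visited(url):
    """Append the URL so the next run skips it (this enables resuming)."""
    with open(VISITED_DB, "a") as f:
        f.write(url + "\n")

def save_image(url, data):
    name = os.path.basename(urllib.parse.urlparse(url).path) or "image.jpg"
    with open(name, "wb") as f:
        f.write(data)

def crawl(url, depth, visited):
    """Depth-first traversal: recurse into each link before its siblings."""
    if depth > MAX_DEPTH or url in visited:
        return
    if urllib.parse.urlparse(url).scheme not in ("http", "https"):
        return
    visited.add(url)
    mark_visited(url)
    try:
        with urllib.request.urlopen(url) as resp:
            body = resp.read()
    except OSError:
        return
    if IMAGE_RE.search(url):
        save_image(url, body)
        return
    parser = LinkParser()
    parser.feed(body.decode("utf-8", errors="replace"))
    for link in parser.links:
        crawl(urllib.parse.urljoin(url, link), depth + 1, visited)

if __name__ == "__main__":
    visited = load_visited()          # resume from an earlier, terminated run
    for start in sys.argv[1:]:
        crawl(start, 0, visited)
```

Appending each visited URL to a flat file is the simplest way to make the visited set persistent across runs, which is what lets the crawl restart where it was terminated; the real tool keeps a proper database and is far more careful about termination and politeness than this sketch.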