author | David W. Chapman Jr. <dwcjr@FreeBSD.org> | 2001-06-23 16:09:53 +0000 |
---|---|---|
committer | David W. Chapman Jr. <dwcjr@FreeBSD.org> | 2001-06-23 16:09:53 +0000 |
commit | 302ddeb73e4485411c6ccb4dbc8d317879222dba (patch) | |
tree | 821fd4b1cec4d66d83d4ac4c891d8fc73127f184 /www/crawl/pkg-descr | |
parent | cb1d95a5762937e7d8a54e4d881ec7d453e843c0 (diff) |
Diffstat (limited to 'www/crawl/pkg-descr')
-rw-r--r-- | www/crawl/pkg-descr | 23 |
1 file changed, 23 insertions, 0 deletions
diff --git a/www/crawl/pkg-descr b/www/crawl/pkg-descr
new file mode 100644
index 000000000000..96361c6087a2
--- /dev/null
+++ b/www/crawl/pkg-descr
@@ -0,0 +1,23 @@
+The crawl utility starts a depth-first traversal of the web at the
+specified URLs. It stores all JPEG images that match the configured
+constraints. Crawl is fairly fast and allows for graceful termination.
+After terminating crawl, it is possible to restart it at exactly
+the same spot where it was terminated. Crawl keeps a persistent
+database that allows multiple crawls without revisiting sites.
+
+The main reason for writing crawl was the lack of simple open source
+web crawlers. Crawl is only a few thousand lines of code and fairly
+easy to debug and customize.
+
+Some of the main features:
+ - Saves encountered JPEG images
+ - Image selection based on regular expressions and size constraints
+ - Resume previous crawl after graceful termination
+ - Persistent database of visited URLs
+ - Very small and efficient code
+ - Supports robots.txt
+
+WWW: http://www.monkey.org/~provos/crawl/
+
+- Pete
+petef@databits.net
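
The pkg-descr above names the core idea: a depth-first traversal with a persistent database of visited URLs, saving only images that match a pattern. The sketch below is not taken from the crawl sources (which are written in C and also handle size constraints and robots.txt); it is a minimal, hypothetical Python illustration of that idea, and every name in it (visited.txt, MAX_DEPTH, the JPEG regex) is an assumption made for the example only.

```python
#!/usr/bin/env python3
# Minimal sketch (NOT the crawl source) of a depth-first crawl with a
# persistent visited-URL database, so a later run can resume without
# revisiting sites, saving JPEG URLs that match a pattern.
import os
import re
import sys
import urllib.parse
import urllib.request
from html.parser import HTMLParser

VISITED_DB = "visited.txt"                 # hypothetical persistent database
IMAGE_RE = re.compile(r"\.jpe?g$", re.I)   # constraint: JPEG URLs only
MAX_DEPTH = 3

class LinkParser(HTMLParser):
    """Collects href/src attribute values from a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)

def load_visited():
    """Read the persistent database left behind by a previous run."""
    if os.path.exists(VISITED_DB):
        with open(VISITED_DB) as f:
            return {line.strip() for line in f if line.strip()}
    return set()

def mark_visited(url):
    """Append the URL so the next run skips it (this enables resuming)."""
    with open(VISITED_DB, "a") as f:
        f.write(url + "\n")

def save_image(url, data):
    name = os.path.basename(urllib.parse.urlparse(url).path) or "image.jpg"
    with open(name, "wb") as f:
        f.write(data)

def crawl(url, depth, visited):
    """Depth-first traversal: recurse into each link before its siblings."""
    if depth > MAX_DEPTH or url in visited:
        return
    if urllib.parse.urlparse(url).scheme not in ("http", "https"):
        return
    visited.add(url)
    mark_visited(url)
    try:
        with urllib.request.urlopen(url) as resp:
            body = resp.read()
    except OSError:
        return
    if IMAGE_RE.search(url):
        save_image(url, body)
        return
    parser = LinkParser()
    parser.feed(body.decode("utf-8", errors="replace"))
    for link in parser.links:
        crawl(urllib.parse.urljoin(url, link), depth + 1, visited)

if __name__ == "__main__":
    visited = load_visited()          # resume from an earlier, terminated run
    for start in sys.argv[1:]:
        crawl(start, 0, visited)
```

Appending each visited URL to a flat file is the simplest way to make the visited set persistent across runs, which is what lets the crawl restart where it was terminated; the real tool keeps a proper database and is far more careful about termination and politeness than this sketch.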