Class bdd.search.spider.Crawler
java.lang.Object
|
+----java.lang.Thread
|
+----bdd.search.spider.Crawler
public class Crawler
extends Thread
Written by Tim Macinta 1997
Distributed under the GNU General Public License
(a copy of which is enclosed with the source).
Calling the Crawler's start() method causes the Crawler to index all
of the sites in its queue and, once it has finished, to replace the
main index with the updated index. The Crawler's queue should be
filled with the starting URLs (using addURL(URL)) before start() is
called.
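The page does not say how the main index is replaced. One common way to make such a swap safe for concurrent readers is to write the new index to a temporary file beside the old one and then move it into place. The sketch below shows that technique under that assumption only; IndexSwap and replaceIndex are hypothetical names, and the index is modeled as a plain string rather than the engine's real index format.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of bdd.search.spider: swaps in an updated index file.
final class IndexSwap {
    private IndexSwap() {}

    /**
     * Writes updatedContents to a temporary file in the same directory as
     * mainIndex, then moves it over mainIndex so readers never see a
     * half-written index.
     */
    static void replaceIndex(Path mainIndex, String updatedContents) {
        try {
            Path dir = mainIndex.toAbsolutePath().getParent();
            Path tmp = Files.createTempFile(dir, "index", ".tmp");
            Files.write(tmp, updatedContents.getBytes(StandardCharsets.UTF_8));
            try {
                Files.move(tmp, mainIndex,
                        StandardCopyOption.REPLACE_EXISTING,
                        StandardCopyOption.ATOMIC_MOVE);
            } catch (AtomicMoveNotSupportedException e) {
                // Fall back to a plain replace where atomic moves are unsupported.
                Files.move(tmp, mainIndex, StandardCopyOption.REPLACE_EXISTING);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping the temporary file in the same directory as the target matters: atomic moves are generally only possible within a single file system.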
Crawler(File, EnginePrefs)
- "working_dir" should be a directory that only this Crawler and a
given Indexer will be accessing.
addURL(URL)
- Takes "url_to_queue" and adds it to this Crawler's queue of URLs.
main(File, EnginePrefs)
main(File, EnginePrefs, boolean)
main(String[])
- This is the method that is called when this class is invoked from
the command line.
run()
- This is where the actual crawling occurs.
Crawler
public Crawler(File working_dir,
EnginePrefs eng_prefs)
- "working_dir" should be a directory that only this
Crawler and a given Indexer will be
accessing. This means that if several Crawlers are running
simultaneously, they should all be given different "working_dir"
directories. Also, no other threads should write to this
directory (except for the selected Indexer).
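When several Crawlers run at once, each needs its own "working_dir". A minimal way to guarantee that is to let the JDK pick a unique temporary directory per crawler. The helper below is a hypothetical sketch (WorkingDirs and createDistinct are not part of the package); each resulting Path would be converted with toFile() before being passed as the File argument of Crawler(File, EnginePrefs).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: one fresh working directory per concurrently running crawler.
final class WorkingDirs {
    private WorkingDirs() {}

    /** Creates `count` new, distinct directories under `parent`. */
    static List<Path> createDistinct(Path parent, int count) {
        List<Path> dirs = new ArrayList<>();
        try {
            for (int i = 0; i < count; i++) {
                // createTempDirectory chooses an unused name, so no two crawlers share one.
                dirs.add(Files.createTempDirectory(parent, "crawler-" + i + "-"));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return dirs;
    }
}
```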
addURL
public void addURL(URL url_to_queue)
- Takes "url_to_queue" and adds it to this Crawler's queue of URLs.
This method should be used to add all of the desired starting URLs to
the queue before the Crawler is started. If the URL has already
been processed, or if it is not an allowed URL, it is not added.
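The queueing rule described for addURL (skip URLs already seen, skip URLs that are not allowed) can be modeled with a set of seen URLs and a filter. This is a hypothetical sketch: UrlQueue is not a class in the package, and the "allowed" test is passed in as a Predicate because the page does not say what makes a URL disallowed.

```java
import java.net.URL;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical model of the rule documented for addURL(URL).
final class UrlQueue {
    private final Deque<URL> queue = new ArrayDeque<>();
    // Keyed by string form: URL.equals/hashCode may resolve host names over the network.
    private final Set<String> seen = new HashSet<>();
    private final Predicate<URL> allowed;

    UrlQueue(Predicate<URL> allowed) {
        this.allowed = allowed;
    }

    /** Returns true if the URL was queued, false if it was disallowed or already seen. */
    synchronized boolean add(URL url) {
        if (!allowed.test(url)) {
            return false;
        }
        // Set.add returns false when this URL was queued or processed earlier.
        if (!seen.add(url.toExternalForm())) {
            return false;
        }
        queue.addLast(url);
        return true;
    }

    /** Returns the next URL to crawl, or null when the queue is empty. */
    synchronized URL poll() {
        return queue.pollFirst();
    }
}
```

Because polled URLs stay in the seen set, a URL that has already been processed is still rejected later, matching the documented behavior.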
run
public void run()
- This is where the actual crawling occurs.
- Overrides:
- run in class Thread
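The page only says that run() is where crawling happens. As a rough illustration of the documented lifecycle (seed the queue, call start(), crawl until the queue is empty, then publish the new index), here is a hypothetical Thread subclass. SketchCrawler, addStart and fetchLinks are invented for this sketch, pages are plain strings, and the network fetch is replaced by a caller-supplied function.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of a crawl loop in Thread.run(); not the actual Crawler code.
class SketchCrawler extends Thread {
    private final Deque<String> queue = new ArrayDeque<>();
    // Hypothetical stand-in for "fetch a page and return the links found on it".
    private final Function<String, List<String>> fetchLinks;
    // Insertion-ordered record of every page indexed so far.
    private final Set<String> indexed = new LinkedHashSet<>();
    // Written once at the end of run(); volatile so other threads see the swap.
    private volatile List<String> publishedIndex = Collections.emptyList();

    SketchCrawler(Function<String, List<String>> fetchLinks) {
        this.fetchLinks = fetchLinks;
    }

    /** Seeds the queue; call before start(), as the Crawler docs require for addURL. */
    void addStart(String url) {
        queue.addLast(url);
    }

    @Override
    public void run() {
        String url;
        while ((url = queue.pollFirst()) != null) {
            if (!indexed.add(url)) {
                continue; // already indexed earlier in this crawl
            }
            // Links discovered on this page join the end of the same queue.
            queue.addAll(fetchLinks.apply(url));
        }
        // The new index becomes visible only after the whole queue is drained.
        publishedIndex = Collections.unmodifiableList(new ArrayList<>(indexed));
    }

    List<String> publishedIndex() {
        return publishedIndex;
    }
}
```

Seeding the queue before start() is safe without extra locking because Thread.start() guarantees the new thread sees everything written before it was called.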
main
public static void main(String arg[])
- This is the method that is called when this class is invoked from
the command line. Calling this method creates and starts a Crawler
whose starting URLs are read from the file named by the first
argument (arg[0]). That file should contain only URLs, one per line.
Blank lines are allowed, and lines beginning with "#" are treated as
comments and ignored.
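The start-URL file format described above is simple enough to parse in a few lines. This is a hypothetical sketch (StartUrlFile and parse are not part of the package); it also assumes that whitespace-only lines count as blank and that surrounding whitespace is trimmed from each URL, which the page does not state.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for the start-URL file that main(String[]) reads.
final class StartUrlFile {
    private StartUrlFile() {}

    /** Returns one entry per URL line, skipping blank lines and "#" comments. */
    static List<String> parse(BufferedReader in) {
        List<String> urls = new ArrayList<>();
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith("#")) {
                    continue; // comment line
                }
                String url = line.trim();
                if (url.isEmpty()) {
                    continue; // blank line (assumed to include whitespace-only lines)
                }
                urls.add(url);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return urls;
    }
}
```

In a command-line entry point, the reader could come from java.nio.file.Files.newBufferedReader(java.nio.file.Paths.get(arg[0])); each returned string would then be turned into a URL and handed to addURL before start() is called.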
main
public static void main(File file,
EnginePrefs prefs)
main
public static void main(File file,
EnginePrefs prefs,
boolean exit)