Running the BDDBot

There are two ways to start the BDDBot. You can start it using a graphical interface, which makes for very intuitive interaction. You can also start the individual components from the command line, which means that you can start the BDDBot remotely via a telnet session, through cron for scheduled crawling, or through a CGI interface.

Please note that you should never run more than one Crawler at the same time because the Crawler needs exclusive access to a particular temporary directory ("searchtmp").
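If the Crawler is launched from scripts or cron, one way to guard against an accidental second run is a lock file held by a small wrapper. The sketch below is only an illustration and is not part of the BDDBot: the class name CrawlerLock and the lock file location are hypothetical, and it assumes each crawler run happens in its own Java process.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper (not part of the BDDBot): allows one crawler run at a time. */
final class CrawlerLock implements AutoCloseable {
    private final FileChannel channel;
    private final FileLock lock;

    private CrawlerLock(FileChannel channel, FileLock lock) {
        this.channel = channel;
        this.lock = lock;
    }

    /**
     * Tries to take an exclusive lock on lockFile without blocking.
     * Returns the held lock, or null if someone else already holds it.
     * Intended to be called once per process.
     */
    static CrawlerLock tryAcquire(Path lockFile) throws IOException {
        FileChannel ch = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock held;
        try {
            held = ch.tryLock(); // null when another process holds the lock
        } catch (OverlappingFileLockException e) {
            held = null;         // already locked inside this same JVM
        }
        if (held == null) {
            ch.close();
            return null;
        }
        return new CrawlerLock(ch, held);
    }

    /** Releases the lock so that a later run can proceed. */
    @Override
    public void close() throws IOException {
        lock.release();
        channel.close();
    }
}
```

A wrapper would call CrawlerLock.tryAcquire on a path of its choosing before starting the Crawler, exit immediately if the result is null, and close the lock once the crawl has finished.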

Graphical Monitor

The easiest way to run the BDDBot is through the graphical interface. You can start the graphical interface with the command


This will pop up a window that looks like figure 2 (in the book), which will allow you to control the BDDBot. It will also start up the standalone web server for search queries, although you won't have any visual cues for this.

Once you have started the monitor, the top list allows you to watch queries that are made to the custom web server. Only the most recent 200 or so queries are shown (older queries are removed so that the list doesn't explode).
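The "most recent 200 or so" behavior can be pictured as a bounded list that drops its oldest entry when full. The following is a minimal sketch of that idea, not the monitor's actual code: the class name RecentQueries and the exact capacity are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical bounded log: keeps only the newest entries, up to a fixed capacity. */
final class RecentQueries {
    private final int capacity;
    private final Deque<String> entries = new ArrayDeque<>();

    RecentQueries(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
    }

    /** Appends a query, discarding the oldest one first if the log is full. */
    void add(String query) {
        if (entries.size() == capacity) {
            entries.removeFirst();
        }
        entries.addLast(query);
    }

    /** Returns a copy of the current entries, oldest first. */
    List<String> snapshot() {
        return new ArrayList<>(entries);
    }
}
```

With a capacity of 200, the 201st call to add would remove the first query that was recorded, so memory use stays flat no matter how long the server runs.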

Unchecking the check boxes will temporarily keep the lists from growing.

The bottom two lists show the URLs that have been indexed and the URLs that produced errors when indexing was attempted.

The buttons clear their respective lists.

The button labeled "Start Crawler" causes the crawler to re-index your site using the file called "urls.txt" in your "searchdb" directory.

Command Line

The alternative way to control the BDDBot is from the command line. This method does not provide logging and is not as easy as the graphical monitor, but it can be run without any windowing system (e.g., when you're in a telnet session). The web server can be set up to start at system startup, and the crawler can be scheduled via cron.

To start the web server


To start the crawler

	java starting-urls

where you should replace starting-urls with the name of the file that contains the starting URLs (this will probably be "searchdb/urls.txt" or "searchdb\urls.txt", depending on your platform's path separator).
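The two spellings of the path differ only in the separator character. If you build the path in Java, for example in a wrapper that launches the crawler, the standard library can pick the separator for you. The class name StartingUrlsPath below is hypothetical and only serves as an illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: builds the usual starting-URLs path for the current platform. */
final class StartingUrlsPath {
    /** Joins "searchdb" and "urls.txt" using the platform's own separator. */
    static Path defaultPath() {
        return Paths.get("searchdb", "urls.txt");
    }

    public static void main(String[] args) {
        // Prints the relative path in the form the current platform expects.
        System.out.println(defaultPath());
    }
}
```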