Class bdd.search.EnginePrefs
java.lang.Object
|
+----bdd.search.EnginePrefs
public class EnginePrefs
extends Object
Written by Tim Macinta, 1997.
Distributed under the GNU Public License
(a copy of which is enclosed with the source).
Encapsulates the preferences for the crawler and the search
engine.
Variable Index

pause_time
- The time to pause between URL fetches (in seconds).
port

Constructor Index

EnginePrefs()

Method Index

getEmailAddress()
getFooterFile()
getHeaderFile()
getMainDir()
getMainIndex()
getMonitor()
getNotFoundFile()
getRulesFile()
- The rules file contains rules which determine what URLs are allowed
and what URLs should be excluded.
getStartingFile()
getUserAgent()
getWorkingDir()
- Returns the working directory for use by a crawler.
pauseBetweenURLs()
- Pauses for the amount of time that has been specified for pausing
between URL fetches.
readRobotsDotText(String, int)
- Reads the "robots.txt" file on the given host and uses the results
to determine what files on "host" are crawlable.
readRulesFile()
- Causes the inclusion/exclusion rules to be read.
URLAllowed(URL)
- Returns true if "url" is allowed to be indexed and false otherwise.
Variables

pause_time
public int pause_time
- The time to pause between URL fetches (in seconds).
port
public static int port
Constructors

EnginePrefs
public EnginePrefs()
Methods

URLAllowed
public boolean URLAllowed(URL url)
- Returns true if "url" is allowed to be indexed and false otherwise.
pauseBetweenURLs
public void pauseBetweenURLs()
- Pauses for the amount of time that has been specified for pausing
between URL fetches.
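
As an illustration only (the candidate URLs and the fetch step are hypothetical; only EnginePrefs, URLAllowed(URL), and pauseBetweenURLs() come from this class), a crawler loop might combine these two methods like this:

    import bdd.search.EnginePrefs;
    import java.net.MalformedURLException;
    import java.net.URL;

    public class CrawlLoopSketch {
        public static void main(String[] args) throws MalformedURLException {
            EnginePrefs prefs = new EnginePrefs();
            // Hypothetical list of URLs waiting to be fetched.
            String[] candidates = {
                "http://gsd.mit.edu/",
                "http://gsd.mit.edu/about.html"
            };
            for (int i = 0; i < candidates.length; i++) {
                URL url = new URL(candidates[i]);
                if (!prefs.URLAllowed(url)) {
                    continue;                 // skip URLs excluded by the rules
                }
                // ... fetch and index the page here (not part of EnginePrefs) ...
                prefs.pauseBetweenURLs();     // wait pause_time seconds before the next fetch
            }
        }
    }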
getMainIndex
public File getMainIndex()
getMainDir
public File getMainDir()
getWorkingDir
public File getWorkingDir()
- Returns the working directory for use by a crawler. If more than
one crawler is running at the same time, they should be given different
working directories.
getHeaderFile
public File getHeaderFile()
getFooterFile
public File getFooterFile()
getNotFoundFile
public File getNotFoundFile()
getStartingFile
public File getStartingFile()
getRulesFile
public File getRulesFile()
- The rules file contains rules which determine what URLs are allowed
and what URLs should be excluded. A line of the form:
    include http://gsd.mit.edu/
will cause all URLs that start with "http://gsd.mit.edu/" to be
included. Similarly, to exclude URLs, use the keyword "exclude"
instead of "include". Blank lines and lines starting with "#" are
ignored.
When a URL is checked against the inclusion/exclusion rules, the
exclusion rules are checked first; if the URL matches an
exclusion rule it is not included. If a URL is not covered by
either rule it is not included, unless it is a "file://" URL, in
which case it is included by default.
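
For example, a rules file along the following lines (the cgi-bin exclusion is just illustrative) would limit the crawler to gsd.mit.edu while keeping it out of one directory:

    # Crawl the main site, but stay out of the CGI directory.
    include http://gsd.mit.edu/
    exclude http://gsd.mit.edu/cgi-bin/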
readRulesFile
public void readRulesFile() throws IOException
- Causes the inclusion/exclusion rules to be read. This method should
be called if the rules file is changed.
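
A minimal sketch of that workflow (the URL being re-checked is made up for illustration):

    import bdd.search.EnginePrefs;
    import java.io.IOException;
    import java.net.URL;

    public class ReloadRulesSketch {
        public static void main(String[] args) {
            EnginePrefs prefs = new EnginePrefs();
            try {
                prefs.readRulesFile();   // re-read the rules after the rules file has been edited
                URL url = new URL("http://gsd.mit.edu/courses/index.html");
                System.out.println(url + " allowed: " + prefs.URLAllowed(url));
            } catch (IOException e) {
                // Covers readRulesFile() failures and MalformedURLException
                // (which is a subclass of IOException).
                e.printStackTrace();
            }
        }
    }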
readRobotsDotText
public void readRobotsDotText(String host, int port)
- Reads the "robots.txt" file on the given host and uses the results
to determine what files on "host" are crawlable.
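
As a sketch (the host name, the use of port 80, and the follow-up URLAllowed check are assumptions for illustration), a crawler might prime the preferences with a host's robots.txt before fetching from it:

    import bdd.search.EnginePrefs;
    import java.net.URL;

    public class RobotsSketch {
        public static void main(String[] args) throws Exception {
            EnginePrefs prefs = new EnginePrefs();
            // Read http://gsd.mit.edu:80/robots.txt and record what may be crawled on that host.
            prefs.readRobotsDotText("gsd.mit.edu", 80);
            URL url = new URL("http://gsd.mit.edu/private/report.html");
            // Assumes the robots.txt results are reflected in the URL check.
            System.out.println("Crawlable: " + prefs.URLAllowed(url));
        }
    }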
getUserAgent
public String getUserAgent()
getEmailAddress
public String getEmailAddress()
getMonitor
public Monitor getMonitor()