Class bdd.search.EnginePrefs
java.lang.Object
|
+----bdd.search.EnginePrefs
public class EnginePrefs
extends Object
Written by Tim Macinta, 1997.
Distributed under the GNU Public License
(a copy of which is enclosed with the source).
Encapsulates the preferences for the crawler and the search
engine.
Variable Index

pause_time
- The time to pause between URL fetches (in seconds).
port

Constructor Index

EnginePrefs()

Method Index

getEmailAddress()
getFooterFile()
getHeaderFile()
getMainDir()
getMainIndex()
getMonitor()
getNotFoundFile()
getRulesFile()
- The rules file contains rules which determine what URLs are allowed
and what URLs should be excluded.
getStartingFile()
getUserAgent()
getWorkingDir()
- Returns the working directory for use by a crawler.
pauseBetweenURLs()
- Pauses for the amount of time that has been specified for pausing
between URL fetches.
readRobotsDotText(String, int)
- Reads the "robots.txt" file on the given host and uses the results
to determine what files on "host" are crawlable.
readRulesFile()
- Causes the inclusion/exclusion rules to be read.
URLAllowed(URL)
- Returns true if "url" is allowed to be indexed and false otherwise.
Variables

pause_time
public int pause_time
- The time to pause between URL fetches (in seconds).
port
public static int port
Constructors

EnginePrefs
public EnginePrefs()
Methods

URLAllowed
public boolean URLAllowed(URL url)
- Returns true if "url" is allowed to be indexed and false otherwise.
pauseBetweenURLs
public void pauseBetweenURLs()
- Pauses for the amount of time that has been specified for pausing
between URL fetches.
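
As an illustration only (the candidate URLs and the fetch step are hypothetical; only EnginePrefs, URLAllowed(URL), and pauseBetweenURLs() come from this class), a crawler loop might combine these two methods like this:

    import bdd.search.EnginePrefs;
    import java.net.MalformedURLException;
    import java.net.URL;

    public class CrawlLoopSketch {
        public static void main(String[] args) throws MalformedURLException {
            EnginePrefs prefs = new EnginePrefs();
            // Hypothetical list of URLs waiting to be fetched.
            String[] candidates = {
                "http://gsd.mit.edu/",
                "http://gsd.mit.edu/about.html"
            };
            for (int i = 0; i < candidates.length; i++) {
                URL url = new URL(candidates[i]);
                if (!prefs.URLAllowed(url)) {
                    continue;                 // skip URLs excluded by the rules
                }
                // ... fetch and index the page here (not part of EnginePrefs) ...
                prefs.pauseBetweenURLs();     // wait pause_time seconds before the next fetch
            }
        }
    }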
getMainIndex
public File getMainIndex()
getMainDir
public File getMainDir()
getWorkingDir
public File getWorkingDir()
- Returns the working directory for use by a crawler. If more than
one crawler is running at the same time, they should be given different
working directories.
getHeaderFile
public File getHeaderFile()
getFooterFile
public File getFooterFile()
getNotFoundFile
public File getNotFoundFile()
getStartingFile
public File getStartingFile()
getRulesFile
public File getRulesFile()
- The rules file contains rules which determine what URLs are allowed
and what URLs should be excluded. A line of the form:
    include http://gsd.mit.edu/
will cause all URLs that start with "http://gsd.mit.edu/" to be
included. Similarly, to exclude URLs, use the keyword "exclude"
instead of "include". Blank lines and lines starting with "#" are
ignored.
When a URL is checked against the inclusion/exclusion rules, the
exclusion rules are checked first; if the URL matches an
exclusion rule it is not included. If a URL is not covered by
either rule it is not included, unless it is a "file://" URL, in
which case it is included by default.
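
For example, a rules file along the following lines (the cgi-bin exclusion is just illustrative) would limit the crawler to gsd.mit.edu while keeping it out of one directory:

    # Crawl the main site, but stay out of the CGI directory.
    include http://gsd.mit.edu/
    exclude http://gsd.mit.edu/cgi-bin/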
readRulesFile
public void readRulesFile() throws IOException
- Causes the inclusion/exclusion rules to be read. This method should
be called if the rules file is changed.
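
A minimal sketch of that workflow (the URL being re-checked is made up for illustration):

    import bdd.search.EnginePrefs;
    import java.io.IOException;
    import java.net.URL;

    public class ReloadRulesSketch {
        public static void main(String[] args) {
            EnginePrefs prefs = new EnginePrefs();
            try {
                prefs.readRulesFile();   // re-read the rules after the rules file has been edited
                URL url = new URL("http://gsd.mit.edu/courses/index.html");
                System.out.println(url + " allowed: " + prefs.URLAllowed(url));
            } catch (IOException e) {
                // Covers readRulesFile() failures and MalformedURLException
                // (which is a subclass of IOException).
                e.printStackTrace();
            }
        }
    }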
readRobotsDotText
public void readRobotsDotText(String host, int port)
- Reads the "robots.txt" file on the given host and uses the results
to determine what files on "host" are crawlable.
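
As a sketch (the host name, the use of port 80, and the follow-up URLAllowed check are assumptions for illustration), a crawler might prime the preferences with a host's robots.txt before fetching from it:

    import bdd.search.EnginePrefs;
    import java.net.URL;

    public class RobotsSketch {
        public static void main(String[] args) throws Exception {
            EnginePrefs prefs = new EnginePrefs();
            // Read http://gsd.mit.edu:80/robots.txt and record what may be crawled on that host.
            prefs.readRobotsDotText("gsd.mit.edu", 80);
            URL url = new URL("http://gsd.mit.edu/private/report.html");
            // Assumes the robots.txt results are reflected in the URL check.
            System.out.println("Crawlable: " + prefs.URLAllowed(url));
        }
    }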
getUserAgent
public String getUserAgent()
getEmailAddress
public String getEmailAddress()
getMonitor
public Monitor getMonitor()