Class bdd.search.spider.URLStatus



java.lang.Object
   |
   +----bdd.search.spider.URLStatus

public class URLStatus
extends Object

Written by Tim Macinta 1997
Distributed under the GNU General Public License (a copy is included with the source).

This class holds information about the content at a particular URL. It can also be used to fetch and parse a URL.

URLStatus(URL, File, EnginePrefs): "url" is the location of the information and "temp_file" is the temporary file that can be used to store the contents of this URL.

dumpToDatabase(DataOutputStream): Writes a database containing only this URL to the given stream.
finalize(): Deletes the temporary file.
getCacheFile(): Returns the file that is used to cache the contents of this URL.
getLinkExtractor(): Returns a LinkExtractor that can handle this URL's MIME type.
getWordExtractor(): Returns a WordExtractor that can handle this URL's MIME type.
loaded(): Returns true if and only if this URL was loaded without an error.
mimeTypeUnderstood(String): Returns true if and only if the given MIME type can be processed.
moved(): Returns true if and only if this URL causes a redirection.
readContent(): Downloads the content of this URL and stores it in a temporary cache file.

URLStatus

  public URLStatus(URL url,
                   File temp_file,
                   EnginePrefs eng_prefs)

"url" is the location of the information, "temp_file" is a temporary file that can be used to store the contents of this URL, and "eng_prefs" holds the search engine's preferences.
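The constructor leaves it to the caller to supply the temporary file. A minimal sketch of how a caller might allocate one with the standard File.createTempFile; the class name TempFiles and the "urlcache"/".tmp" name parts are hypothetical choices, not taken from the original source.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper (not part of bdd.search.spider): allocates a
// temporary file suitable for the temp_file argument of URLStatus.
class TempFiles {
    static File newCacheFile() throws IOException {
        // Creates an empty file in the default temporary-file directory
        // (java.io.tmpdir), named "urlcache<random digits>.tmp".
        File f = File.createTempFile("urlcache", ".tmp");
        // Safety net in case the owning object never deletes the file.
        f.deleteOnExit();
        return f;
    }
}
```

A caller would then pass the result as the second argument, for example new URLStatus(url, TempFiles.newCacheFile(), prefs), where prefs is an existing EnginePrefs instance.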

loaded

  public boolean loaded()

Returns true if and only if this URL was loaded without an error.
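What counts as "loaded without an error" is not spelled out here. A minimal sketch of one plausible rule, assuming the content was fetched over HTTP and that only a 2xx status without an I/O failure counts as success; the class LoadCheck and its parameters are hypothetical. With java.net.HttpURLConnection the status would come from getResponseCode().

```java
// Hypothetical sketch: one plausible definition of "loaded without an error",
// assuming an HTTP fetch. Not the actual URLStatus logic.
class LoadCheck {
    static boolean loaded(int httpStatus, boolean ioErrorOccurred) {
        if (ioErrorOccurred) {
            // e.g. connection refused or read timed out
            return false;
        }
        // Only 2xx (success) responses count as loaded.
        return httpStatus >= 200 && httpStatus < 300;
    }
}
```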

dumpToDatabase

  public void dumpToDatabase(DataOutputStream out) throws IOException

Writes a database containing only this URL to the given stream.
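The on-disk database format is not documented on this page. The sketch below only illustrates the DataOutputStream mechanics with a hypothetical record layout (the URL, a word count, then the words); SingleUrlRecord and this layout are assumptions, not the format dumpToDatabase actually writes.

```java
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical record layout, for illustration only: the real database
// format written by URLStatus.dumpToDatabase is not documented here.
class SingleUrlRecord {
    static void write(DataOutputStream out, String url, String[] words) throws IOException {
        out.writeUTF(url);          // the URL, as length-prefixed modified UTF-8
        out.writeInt(words.length); // number of words that follow
        for (String w : words) {
            out.writeUTF(w);        // each word, length-prefixed
        }
        out.flush();
    }
}
```

Because every field is length-prefixed or fixed-size, a DataInputStream can read the record back in the same order with readUTF() and readInt().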

getWordExtractor

  public WordExtractor getWordExtractor() throws IOException

Returns a WordExtractor that can handle this URL's MIME type. To add support for a new MIME type, add a WordExtractor that handles it here, add an appropriate LinkExtractor to the getLinkExtractor() method, and add the type to the list in the mimeTypeUnderstood() method.
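The "add an extractor here" pattern amounts to a dispatch on MIME type. A minimal sketch using a lookup table, written against a modern JDK (Java 9+ for Map.of); WordExtractorSketch, its tokenizing rule, and the two supported types are hypothetical and do not reproduce the bdd.search WordExtractor classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of choosing a word extractor by MIME type.
class WordExtractorSketch {
    // Splits text into lower-case runs of letters and digits.
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                out.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    // Supporting a new MIME type means adding one entry to this table.
    static final Map<String, Function<String, List<String>>> EXTRACTORS = Map.of(
        "text/plain", WordExtractorSketch::words,
        // Crude tag stripping; a real extractor would use an HTML parser.
        "text/html", html -> words(html.replaceAll("<[^>]*>", " "))
    );

    static Function<String, List<String>> forMimeType(String mimeType) {
        Function<String, List<String>> f = EXTRACTORS.get(mimeType);
        if (f == null) {
            throw new IllegalArgumentException("unsupported MIME type: " + mimeType);
        }
        return f;
    }
}
```

Keeping the table in one place also gives mimeTypeUnderstood() a single source of truth, instead of a list that must be kept in sync by hand.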

getLinkExtractor

  public LinkExtractor getLinkExtractor() throws IOException

Returns a LinkExtractor that can handle this URL's MIME type. To add support for a new MIME type, add a LinkExtractor that handles it here, add an appropriate WordExtractor to the getWordExtractor() method, and add the type to the list in the mimeTypeUnderstood() method.
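For HTML, a link extractor's core job is to find href values and resolve relative ones against the page's own URL. A minimal sketch of that for text/html; HrefExtractor is hypothetical, and the regular expression is a deliberate simplification that misses unquoted attributes, base tags, and comments.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of an HTML link extractor (not the bdd.search LinkExtractor).
class HrefExtractor {
    // Matches href="..." or href='...' in any letter case.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    // Returns every quoted href value, resolved against the page URL.
    static List<URI> links(URI base, String html) {
        List<URI> out = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                out.add(base.resolve(m.group(1)));
            } catch (IllegalArgumentException e) {
                // Skip hrefs that are not valid URI syntax.
            }
        }
        return out;
    }
}
```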

mimeTypeUnderstood

  public boolean mimeTypeUnderstood(String mime_type)

Returns true if and only if the given MIME type can be processed.
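Content-Type values often carry parameters (for example "text/html; charset=UTF-8"), and media type names match case-insensitively, so a check like this usually normalizes before looking the type up. A minimal sketch; MimeSupport and the supported set {text/html, text/plain} are assumptions, not the list the original method contains.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; the set of supported types is an assumption.
class MimeSupport {
    private static final Set<String> UNDERSTOOD = Set.of("text/html", "text/plain");

    static boolean mimeTypeUnderstood(String mimeType) {
        if (mimeType == null) {
            return false;
        }
        // Drop parameters such as "; charset=UTF-8", then ignore case,
        // since type and subtype names are case-insensitive.
        int semi = mimeType.indexOf(';');
        String base = (semi >= 0 ? mimeType.substring(0, semi) : mimeType)
                .trim().toLowerCase(Locale.ROOT);
        return UNDERSTOOD.contains(base);
    }
}
```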

getCacheFile

  public File getCacheFile()

Returns the file that is used to cache the contents of this URL.

readContent

  public void readContent()

Downloads the content of this URL and stores it in a temporary cache file.
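The download-and-cache step can be sketched with URL.openStream() and Files.copy. Unlike readContent(), which takes no arguments and returns nothing, this hypothetical ContentCache.download receives the URL and cache file explicitly and returns the byte count; it also omits any error or MIME-type bookkeeping the real method may perform.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of "download and cache".
class ContentCache {
    // Streams the URL's content into cacheFile, replacing any previous
    // contents, and returns the number of bytes copied.
    static long download(URL url, File cacheFile) throws IOException {
        try (InputStream in = url.openStream()) {
            return Files.copy(in, cacheFile.toPath(), StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

Streaming to disk keeps memory use flat regardless of page size, which suits a spider that later re-reads the cached file for word and link extraction.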

finalize

  public void finalize() throws Throwable

Deletes the temporary file.

Overrides: finalize in class Object
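Cleanup in finalize() runs only if and when the garbage collector gets to it, and Object.finalize has been deprecated since Java 9. A minimal sketch of the usual modern alternative, in which the owner closes the object explicitly (for example with try-with-resources); TempFileHolder is a hypothetical class, not part of this package.

```java
import java.io.File;

// Hypothetical alternative to finalize(): deletes the temporary file
// at a predictable point, when close() is called.
class TempFileHolder implements AutoCloseable {
    private final File tempFile;

    TempFileHolder(File tempFile) {
        this.tempFile = tempFile;
    }

    File file() {
        return tempFile;
    }

    @Override
    public void close() {
        // File.delete() returns false rather than throwing if deletion fails.
        tempFile.delete();
    }
}
```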

moved

  public boolean moved()

Returns true if and only if this URL causes a redirection.
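Assuming an HTTP fetch, a redirection shows up as a 3xx redirect status plus a Location header. Note that java.net.HttpURLConnection follows redirects automatically by default; a client that wants to detect them would call setInstanceFollowRedirects(false) before connecting. RedirectCheck and its parameters are hypothetical.

```java
import java.util.Set;

// Hypothetical sketch, assuming an HTTP fetch: a response "causes a
// redirection" when it has a redirect status and a Location to follow.
class RedirectCheck {
    // 304 Not Modified is a 3xx code but not a redirect, so it is excluded.
    private static final Set<Integer> REDIRECT_CODES = Set.of(301, 302, 303, 307, 308);

    static boolean moved(int httpStatus, String locationHeader) {
        return REDIRECT_CODES.contains(httpStatus)
                && locationHeader != null
                && !locationHeader.isEmpty();
    }
}
```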
