Class bdd.search.spider.WordExtractor
java.lang.Object
|
+----bdd.search.spider.WordExtractor
public class WordExtractor
extends Object
Written by Tim Macinta 1997
Distributed under the GNU Public License
(a copy of which is enclosed with the source).
A WordExtractor should be able to extract the words from
a given file. This class should be subclassed by classes which
understand different document types.
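As a rough illustration of the subclassing idea, the sketch below defines a hypothetical PlainTextWordExtractor that splits plain text on whitespace and punctuation and passes each token to addWord(String). The WordExtractor class shown here is only a stand-in so the example compiles on its own: its method signatures follow this page, but its bodies, and the choice to have countWords() count every occurrence, are assumptions rather than the author's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Stand-in for bdd.search.spider.WordExtractor (assumption: the real class
// stores words differently; only the signatures come from the documentation).
class WordExtractor {
    private final List<String> words = new ArrayList<>();

    public void addWord(String word) {
        words.add(word);
    }

    // Assumption: counts every occurrence, not just distinct words.
    public int countWords() {
        return words.size();
    }
}

// Hypothetical subclass that understands one document type: plain text.
class PlainTextWordExtractor extends WordExtractor {
    PlainTextWordExtractor(String text) {
        // Split on whitespace and common punctuation; each token is one word.
        StringTokenizer tokens = new StringTokenizer(text, " \t\r\n.,;:!?\"()");
        while (tokens.hasMoreTokens()) {
            addWord(tokens.nextToken());
        }
    }
}

class SubclassDemo {
    public static void main(String[] args) {
        WordExtractor extractor = new PlainTextWordExtractor("the cat saw the other cat.");
        System.out.println(extractor.countWords());
    }
}
```

A subclass for another document type would presumably differ only in how it finds word boundaries, for example by skipping markup before tokenizing.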
WordExtractor()
addWord(String)
- Used internally to add a word to the list of words as they are found
in the document.
countOccurances(String)
- Returns the number of times that "word" appears in the document.
countWords()
- Returns the number of words in this document.
firstOccurance(String)
- Returns the index of the first occurrence of "word", or -1 if the
word is not in the document.
getWords()
- Returns an Enumeration that returns each distinct word in the
document in no particular order.
WordExtractor
public WordExtractor()
getWords
public Enumeration getWords()
- Returns an Enumeration that returns each word in the document in
no particular order. A word is returned once at most regardless of
the number of times it appears in the document. The Enumeration
returns a String for each call to nextElement().
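One plausible way to get these semantics (each word once, in no particular order) is to keep the words as keys of a java.util.Hashtable and hand out its keys() Enumeration. The sketch below assumes that design purely for illustration. The class name DistinctWordsSketch and the tally helper are hypothetical, and the source does not say how WordExtractor actually stores its words.

```java
import java.util.Enumeration;
import java.util.Hashtable;

class DistinctWordsSketch {
    // Maps each distinct word to the number of times it was seen.
    static Hashtable<String, Integer> tally(String[] words) {
        Hashtable<String, Integer> counts = new Hashtable<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] document = {"the", "cat", "saw", "the", "other", "cat"};
        Hashtable<String, Integer> counts = tally(document);

        // keys() yields each distinct word exactly once; order is unspecified.
        Enumeration<String> words = counts.keys();
        while (words.hasMoreElements()) {
            String word = words.nextElement();
            System.out.println(word + " appears " + counts.get(word) + " time(s)");
        }
    }
}
```

Because the documented getWords() returns a raw Enumeration, a caller of the real class would likely need to cast each nextElement() result to String.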
countWords
public int countWords()
- Returns the number of words in this document.
countOccurances
public int countOccurances(String word)
- Returns the number of times that "word" appears in the document.
firstOccurance
public int firstOccurance(String word)
- Returns the index of the first occurrence of "word". The index is
determined by counting the words in the document until the first
occurrence of "word" is found. For instance, firstOccurance("the") would
return 5 if the document started like this: "Once upon a time the giant
tomato of...". Returns -1 if the word is not in the document.
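The counting rule can be sketched with the example sentence from the description. This is an illustration of the documented behaviour, not the class's actual code: WordIndexSketch and its static helpers, which take the word list as a parameter, are hypothetical. It assumes counting starts at 1 (as the example implies, since "the" is the fifth word) and that matching is exact and case-sensitive.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class WordIndexSketch {
    // Returns the 1-based position of the first match, or -1 if absent.
    static int firstOccurance(List<String> words, String word) {
        int zeroBased = words.indexOf(word); // -1 when the word is missing
        return zeroBased < 0 ? -1 : zeroBased + 1;
    }

    // Returns how many times the word appears in the list.
    static int countOccurances(List<String> words, String word) {
        return Collections.frequency(words, word);
    }

    public static void main(String[] args) {
        List<String> words =
            Arrays.asList("Once upon a time the giant tomato of".split(" "));

        System.out.println(firstOccurance(words, "the"));    // prints 5
        System.out.println(firstOccurance(words, "potato")); // prints -1
        System.out.println(countOccurances(words, "the"));   // prints 1
    }
}
```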
addWord
public void addWord(String word)
- Used internally to add a word to the list of words as they are found
in the document.