Interface Document

All Superinterfaces:
AnchorText, Text
All Known Implementing Classes:
SimpleDocument

public interface Document extends Text, AnchorText
Represents the terms in a text.
  • Method Details

    • id

      String id()
      Returns the id of the document, which must be unique in the corpus.
      Returns:
      the id of the document.
    • size

      int size()
      Returns the number of words in the document.
      Returns:
      the number of words in the document.
    • words

      Iterable<String> words()
      Returns an iterable over the words of the document. Stop words and punctuation may be removed.
      Returns:
      an iterable over the words of the document.
    • unique

      Iterable<String> unique()
      Returns an iterable over the unique words of the document.
      Returns:
      an iterable over the unique words of the document.
    • tf

      int tf(String term)
      Returns the frequency of the given term in the document.
      Parameters:
      term - the term.
      Returns:
      the frequency of the term in the document.
    • maxtf

      int maxtf()
      Returns the maximum term frequency over all terms in the document.
      Returns:
      the maximum term frequency.
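
The following is a minimal sketch of how the six methods above could relate to one another, not the library's implementation. The class name BagOfWordsSketch is hypothetical, and it deliberately does not implement Document, because the methods inherited from Text and AnchorText are not listed here. Two behaviours are assumptions that the descriptions above do not specify: tf returns 0 for a term that does not occur, and maxtf returns 0 for an empty document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory class that mirrors the method signatures listed above. */
class BagOfWordsSketch {
    private final String id;
    private final List<String> words;
    // Term -> number of occurrences; LinkedHashMap keeps first-seen order.
    private final Map<String, Integer> freq = new LinkedHashMap<>();
    private int maxtf = 0;

    BagOfWordsSketch(String id, List<String> words) {
        this.id = id;
        this.words = Collections.unmodifiableList(new ArrayList<>(words));
        for (String w : this.words) {
            // merge returns the updated count for this term.
            int count = freq.merge(w, 1, Integer::sum);
            if (count > maxtf) {
                maxtf = count;
            }
        }
    }

    String id() { return id; }

    int size() { return words.size(); }

    Iterable<String> words() { return words; }

    Iterable<String> unique() { return Collections.unmodifiableSet(freq.keySet()); }

    // Assumption: a term that never occurs has frequency 0.
    int tf(String term) { return freq.getOrDefault(term, 0); }

    // Assumption: an empty document has maximum term frequency 0.
    int maxtf() { return maxtf; }

    public static void main(String[] args) {
        BagOfWordsSketch doc = new BagOfWordsSketch(
                "doc-1", Arrays.asList("the", "cat", "saw", "the", "dog"));
        System.out.println(doc.size());    // prints 5
        System.out.println(doc.tf("the")); // prints 2
        System.out.println(doc.tf("owl")); // prints 0
        System.out.println(doc.maxtf());   // prints 2
        for (String term : doc.unique()) {
            System.out.println(term + " " + doc.tf(term));
        }
    }
}
```

Counting every term once in the constructor makes tf and maxtf constant-time lookups afterwards, at the cost of one extra map per document. An implementation that removes stop words or punctuation would do so before the words reach the constructor.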