Class SimpleDocument

java.lang.Object
smile.nlp.SimpleDocument
All Implemented Interfaces:
AnchorText, Document, Text

public class SimpleDocument extends Object implements Text, Document, AnchorText
A list-of-words representation of documents.
  • Constructor Details

    • SimpleDocument

      public SimpleDocument(String id, String title, String content, String[] words)
      Constructs a document from its id, title, text content, and word list.
      Parameters:
      id - the id of the document.
      title - the text title.
      content - the text content.
      words - the word list of the document.
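      Judging from the signature, the word list is passed in separately from content, so tokenization happens before the document is constructed. The sketch below is a hypothetical, self-contained stand-in that does not use Smile. The class name ListOfWords, its defensive copy, and the tokenize helper (lowercase, then split on runs of non-letter characters) are assumptions for illustration, not Smile's tokenizer or implementation.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical stand-in for a list-of-words document; not Smile's class.
final class ListOfWords {
    final String id;
    final String title;
    final String content;
    final String[] words;

    ListOfWords(String id, String title, String content, String[] words) {
        this.id = id;
        this.title = title;
        this.content = content;
        // Defensive copy (a design choice of this sketch) so later changes
        // to the caller's array do not leak into the document.
        this.words = Arrays.copyOf(words, words.length);
    }

    // Assumed tokenizer: lowercase, then split on runs of non-letter characters.
    static String[] tokenize(String text) {
        String cleaned = text.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}]+", " ")
                .trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split(" ");
    }
}
```

      For example, tokenize("The cat sat on the mat.") yields the six words the, cat, sat, on, the, mat, which can then be passed as the words argument.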
  • Method Details

    • id

      public String id()
      Description copied from interface: Document
      Returns the id of the document, which must be unique within the corpus.
      Specified by:
      id in interface Document
      Returns:
      the id of the document.
    • title

      public String title()
      Description copied from interface: Text
      Returns the title of the text, if there is one.
      Specified by:
      title in interface Text
      Returns:
      the title of the text, if there is one.
    • content

      public String content()
      Description copied from interface: Text
      Returns the text content.
      Specified by:
      content in interface Text
      Returns:
      the text content.
    • size

      public int size()
      Description copied from interface: Document
      Returns the number of words.
      Specified by:
      size in interface Document
      Returns:
      the number of words.
    • words

      public Iterable<String> words()
      Description copied from interface: Document
      Returns the words of the document as an Iterable. Stop words and punctuation may have been removed.
      Specified by:
      words in interface Document
      Returns:
      an Iterable over the words of the document.
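      words() hands back an Iterable rather than the raw array, and size() reports how many words the list holds. A minimal sketch, assuming the words are stored in an array as the constructor suggests: wrapping the array in an unmodifiable list view lets callers iterate without copying it and without being able to modify it. The class WordView is hypothetical, and this is not confirmed to be how Smile implements either method.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: an array-backed document exposing its words read-only.
final class WordView {
    private final String[] words;

    WordView(String[] words) {
        this.words = words;
    }

    // Number of entries in the word list, repeats included.
    int size() {
        return words.length;
    }

    // Read-only view over the backing array; no copy is made.
    Iterable<String> words() {
        List<String> view = Collections.unmodifiableList(Arrays.asList(words));
        return view;
    }
}
```

      Usage: for (String w : doc.words()) { ... } visits the words in list order.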
    • unique

      public Iterable<String> unique()
      Description copied from interface: Document
      Returns the unique words of the document as an Iterable.
      Specified by:
      unique in interface Document
      Returns:
      an Iterable over the unique words.
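      unique() yields each distinct word once. The documentation does not specify an iteration order. The sketch below assumes first-occurrence order and gets it from a LinkedHashSet. Iterating the key set of a plain HashMap of frequencies would also give each word once, but in an unspecified order. The class UniqueWords is hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of unique(): each distinct word once, in first-occurrence order.
final class UniqueWords {
    static Iterable<String> unique(String[] words) {
        Set<String> seen = new LinkedHashSet<>();
        for (String w : words) {
            seen.add(w); // the set ignores words it already contains
        }
        return seen;
    }
}
```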
    • tf

      public int tf(String term)
      Description copied from interface: Document
      Returns the term frequency, i.e., the number of times the term occurs in the document.
      Specified by:
      tf in interface Document
      Parameters:
      term - the term.
      Returns:
      the term frequency.
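      If term frequencies are counted once when the document is built, each tf(term) call becomes a single hash lookup. This sketch makes two assumptions: the counts are built eagerly, and an absent term reports 0. The class TermFrequency is hypothetical and does not use Smile.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: precompute term frequencies so tf(term) is a map lookup.
final class TermFrequency {
    private final Map<String, Integer> freq = new HashMap<>();

    TermFrequency(String[] words) {
        for (String w : words) {
            freq.merge(w, 1, Integer::sum); // increment the count for w
        }
    }

    // Number of occurrences of term; 0 if the term does not occur (assumed).
    int tf(String term) {
        return freq.getOrDefault(term, 0);
    }
}
```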
    • maxtf

      public int maxtf()
      Description copied from interface: Document
      Returns the maximum term frequency over all terms in the document.
      Specified by:
      maxtf in interface Document
      Returns:
      the maximum term frequency.
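      maxtf can be tracked while the frequencies are counted. A common reason to keep it is to normalize raw counts. One standard IR weighting is the augmented term frequency 0.5 + 0.5 * tf / maxtf, which offsets the bias toward long documents. The sketch computes both values. The class MaxTf and the augmentedTf method are hypothetical and appear only to illustrate the normalization. This page does not say the normalization is part of SimpleDocument.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: track the largest count while counting terms.
final class MaxTf {
    private final Map<String, Integer> freq = new HashMap<>();
    private int maxtf = 0; // stays 0 for an empty document

    MaxTf(String[] words) {
        for (String w : words) {
            int f = freq.merge(w, 1, Integer::sum); // merge returns the updated count
            if (f > maxtf) {
                maxtf = f;
            }
        }
    }

    int maxtf() {
        return maxtf;
    }

    // Augmented tf: ranges from 0.5 (absent term) to 1.0 (most frequent term).
    double augmentedTf(String term) {
        if (maxtf == 0) {
            return 0.0; // empty document: nothing to normalize against
        }
        int tf = freq.getOrDefault(term, 0);
        return 0.5 + 0.5 * tf / maxtf;
    }
}
```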
    • getAnchor

      public String getAnchor()
      Returns the anchor text, if any. Anchor text is the visible, clickable text of a hyperlink. For this text, the anchor text comprises all anchor text in the corpus that points to it.
      Specified by:
      getAnchor in interface AnchorText
      Returns:
      the anchor text.
    • setAnchor

      public SimpleDocument setAnchor(String anchor)
      Sets the anchor text. Because the anchor text comprises all link labels in the corpus that point to this text, addAnchor is more appropriate in most cases.
      Specified by:
      setAnchor in interface AnchorText
      Parameters:
      anchor - the anchor text.
      Returns:
      this object.
    • addAnchor

      public SimpleDocument addAnchor(String linkLabel)
      Description copied from interface: AnchorText
      Adds a link label to the anchor text.
      Specified by:
      addAnchor in interface AnchorText
      Parameters:
      linkLabel - the link label.
      Returns:
      this object.
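      setAnchor replaces the collected anchor text, while addAnchor contributes one more link label. Both return the object itself, so calls can be chained. A minimal sketch under stated assumptions: labels are joined with a newline, and getAnchor returns null before any label is added. The separator and the null default are guesses, and the class Anchored is hypothetical, not Smile's code.

```java
// Hypothetical sketch of anchor-text accumulation; the "\n" separator is an assumption.
final class Anchored {
    private String anchor; // null until a label is set or added (assumed default)

    String getAnchor() {
        return anchor;
    }

    // Replaces whatever anchor text has been collected so far.
    Anchored setAnchor(String anchor) {
        this.anchor = anchor;
        return this;
    }

    // Appends one more link label that points at this text.
    Anchored addAnchor(String linkLabel) {
        anchor = (anchor == null) ? linkLabel : anchor + "\n" + linkLabel;
        return this;
    }
}
```

      Usage: while scanning a corpus of links, call target.addAnchor(label) once per incoming link. Reserve setAnchor for cases where the complete anchor text is already known.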
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • equals

      public boolean equals(Object obj)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
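      This page does not describe what the equals and hashCode overrides compare. The id must be unique in the corpus, so one plausible design treats two documents as equal when their ids match and derives the hash from the id alone. That keeps the overrides consistent with each other, as HashSet and HashMap require. The sketch below shows this design under that assumption. The class IdDocument is hypothetical, and this is not confirmed to be SimpleDocument's behavior.

```java
import java.util.Objects;

// Hypothetical sketch: identity by id, relying on ids being unique within a corpus.
final class IdDocument {
    final String id;
    final String title;

    IdDocument(String id, String title) {
        this.id = Objects.requireNonNull(id, "id");
        this.title = title;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof IdDocument)) return false;
        return id.equals(((IdDocument) obj).id); // only the id is compared
    }

    @Override
    public int hashCode() {
        return id.hashCode(); // equal ids give equal hashes, matching equals
    }

    @Override
    public String toString() {
        return "IdDocument[" + id + "]";
    }
}
```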