Package smile.nlp

Class SimpleText

java.lang.Object
smile.nlp.Text
smile.nlp.SimpleText
All Implemented Interfaces:
AnchorText, TextTerms

public class SimpleText extends Text implements TextTerms, AnchorText
A list-of-words representation of documents.
  • Constructor Details

    • SimpleText

      public SimpleText(String id, String title, String body, String[] words)
      Constructs a document from its ID, title, body, and word list.
      Parameters:
      id - the ID of the document.
      title - the title of the document.
      body - the text body of the document.
      words - the word list of the document.
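
The constructor takes the word list separately from the body, so the caller is expected to tokenize the text first. Below is a minimal sketch of preparing such an array in plain Java. It assumes a naive lowercase split on non-letter characters; WordListSketch and tokenize are hypothetical names, and no Smile tokenizer or stop-word filter is used.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Smile.
final class WordListSketch {
    // Naive tokenizer (an assumption): lowercase, split on runs of non-letters,
    // and drop the empty strings that split() can produce at the start.
    static String[] tokenize(String body) {
        return Arrays.stream(body.toLowerCase(Locale.ROOT).split("[^\\p{L}]+"))
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String body = "To be, or not to be.";
        String[] words = tokenize(body);
        System.out.println(String.join(" ", words)); // prints "to be or not to be"

        // With Smile on the classpath, the array would then be passed to the
        // constructor documented above, for example:
        //   new SimpleText("doc-1", "Hamlet", body, words);
        // The id and title values here are made up for the example.
    }
}
```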
  • Method Details

    • size

      public int size()
      Description copied from interface: TextTerms
      Returns the number of words.
      Specified by:
      size in interface TextTerms
      Returns:
      the number of words.
    • words

      public Iterable<String> words()
      Description copied from interface: TextTerms
      Returns the words of the document as an iterable. Stop words and punctuation may have been removed.
      Specified by:
      words in interface TextTerms
      Returns:
      an iterable over the words of the document.
    • unique

      public Iterable<String> unique()
      Description copied from interface: TextTerms
      Returns an iterable over the unique words.
      Specified by:
      unique in interface TextTerms
      Returns:
      an iterable over the unique words.
    • tf

      public int tf(String term)
      Description copied from interface: TextTerms
      Returns the term frequency, that is, the number of times the given term occurs in the document.
      Specified by:
      tf in interface TextTerms
      Parameters:
      term - the term.
      Returns:
      the term frequency.
    • maxtf

      public int maxtf()
      Description copied from interface: TextTerms
      Returns the maximum term frequency over all terms in the document.
      Specified by:
      maxtf in interface TextTerms
      Returns:
      the maximum term frequency.
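
How size, unique, tf, and maxtf relate to one another can be illustrated with a small stand-alone sketch. It does not use the Smile library: TermStatsSketch is a hypothetical stand-in, and returning 0 for a term that never occurs is an assumption, not behavior documented on this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is not the Smile class.
final class TermStatsSketch {
    private final String[] words;
    // Term -> number of occurrences, in order of first appearance.
    private final Map<String, Integer> freq = new LinkedHashMap<>();
    private int maxtf = 0;

    TermStatsSketch(String[] words) {
        this.words = words.clone();
        for (String w : this.words) {
            int f = freq.merge(w, 1, Integer::sum); // increment the count
            if (f > maxtf) {
                maxtf = f; // track the largest count seen so far
            }
        }
    }

    int size() { return words.length; }                 // total number of words
    Iterable<String> unique() { return freq.keySet(); } // distinct words
    int tf(String term) { return freq.getOrDefault(term, 0); } // assumption: 0 if absent
    int maxtf() { return maxtf; }

    public static void main(String[] args) {
        TermStatsSketch doc = new TermStatsSketch(
                new String[] {"to", "be", "or", "not", "to", "be"});
        System.out.println(doc.size());        // prints 6
        System.out.println(doc.tf("to"));      // prints 2
        System.out.println(doc.tf("missing")); // prints 0
        System.out.println(doc.maxtf());       // prints 2
        System.out.println(String.join(",", doc.unique())); // prints to,be,or,not
    }
}
```

In this sketch the counts are computed once in the constructor, so each query is a map lookup or a field read.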
    • getAnchor

      public String getAnchor()
      Returns the anchor text, if any. Anchor text is the visible, clickable text of a hyperlink. For a document, it is the combined anchor text of all links in the corpus that point to this text.
      Specified by:
      getAnchor in interface AnchorText
      Returns:
      the anchor text.
    • setAnchor

      public SimpleText setAnchor(String anchor)
      Sets the anchor text. Note that the anchor text consists of all link labels in the corpus that point to this text, so addAnchor is more appropriate in most cases.
      Specified by:
      setAnchor in interface AnchorText
      Parameters:
      anchor - the anchor text.
      Returns:
      this object.
    • addAnchor

      public SimpleText addAnchor(String linkLabel)
      Description copied from interface: AnchorText
      Adds a link label to the anchor text.
      Specified by:
      addAnchor in interface AnchorText
      Parameters:
      linkLabel - the link label.
      Returns:
      this object.
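
The difference between setAnchor (replace) and addAnchor (accumulate), and the way both return this object for chaining, can be sketched without the Smile library. AnchorSketch is a hypothetical stand-in, and joining labels with a newline is an assumption; this page does not state which separator, if any, is used.

```java
// Hypothetical stand-in for illustration only; this is not the Smile class.
final class AnchorSketch {
    private String anchor; // null until a label is set or added

    String getAnchor() {
        return anchor;
    }

    // Replaces whatever anchor text was collected so far.
    AnchorSketch setAnchor(String anchor) {
        this.anchor = anchor;
        return this; // returning this allows call chaining
    }

    // Appends one link label. The newline separator is an assumption.
    AnchorSketch addAnchor(String linkLabel) {
        anchor = (anchor == null) ? linkLabel : anchor + "\n" + linkLabel;
        return this;
    }

    public static void main(String[] args) {
        AnchorSketch doc = new AnchorSketch()
                .addAnchor("project home page")
                .addAnchor("API docs");
        System.out.println(doc.getAnchor()); // prints the two labels on separate lines

        doc.setAnchor("replaced");
        System.out.println(doc.getAnchor()); // prints "replaced"
    }
}
```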
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • equals

      public boolean equals(Object obj)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object