Record Class NGram

Record Components:
words - the word sequence.
count - the total number of occurrences of the n-gram in the corpus.
All Implemented Interfaces:
Comparable<NGram>

public record NGram(String[] words, int count) extends Record implements Comparable<NGram>
An n-gram is a contiguous sequence of n words from a given sequence of text. An n-gram of size 1 is referred to as a unigram; size 2 is a bigram; size 3 is a trigram.
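The definition above can be illustrated with a minimal sliding-window sketch. It is self-contained and does not use this class; the class name `NGramWindowSketch` and the method `ngrams` are hypothetical names chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class NGramWindowSketch {
    /** Returns every contiguous run of n words, in order of appearance. */
    static List<String[]> ngrams(String[] words, int n) {
        List<String[]> result = new ArrayList<>();
        // Slide a window of width n over the sentence, one word at a time.
        for (int i = 0; i + n <= words.length; i++) {
            result.add(Arrays.copyOfRange(words, i, i + n));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] sentence = {"the", "quick", "brown", "fox"};
        // n = 1 yields unigrams, n = 2 bigrams, n = 3 trigrams.
        for (int n = 1; n <= 3; n++) {
            for (String[] gram : ngrams(sentence, n)) {
                System.out.println(n + "-gram: " + String.join(" ", gram));
            }
        }
    }
}
```

A sentence of k words has k - n + 1 n-grams when n is at most k, and none otherwise.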
  • Constructor Summary

    Constructors
    Constructor
    Description
    NGram(String[] words)
    Constructs an n-gram from a word sequence.
    NGram(String[] words, int count)
    Creates an instance of an NGram record class.
  • Method Summary

    Modifier and Type
    Method
    Description
    static NGram[][]
    apriori(Collection<String[]> sentences, int maxNGramSize, int minFrequency)
    Extracts n-gram phrases by an Apriori-like algorithm.
    int
    compareTo(NGram o)
     
    int
    count()
    Returns the value of the count record component.
    boolean
    equals(Object obj)
    Indicates whether some other object is "equal to" this one.
    int
    hashCode()
    Returns a hash code value for this object.
    String
    toString()
    Returns a string representation of this record class.
    String[]
    words()
    Returns the value of the words record component.

    Methods inherited from class Object

    clone, finalize, getClass, notify, notifyAll, wait, wait, wait
  • Constructor Details

    • NGram

      public NGram(String[] words)
      Constructs an n-gram from a word sequence.
      Parameters:
      words - the n-gram word sequence.
    • NGram

      public NGram(String[] words, int count)
      Creates an instance of an NGram record class.
      Parameters:
      words - the value for the words record component
      count - the value for the count record component
  • Method Details

    • toString

      public String toString()
      Returns a string representation of this record class. The representation contains the name of the class, followed by the name and value of each of the record components.
      Specified by:
      toString in class Record
      Returns:
      a string representation of this object
    • hashCode

      public int hashCode()
      Returns a hash code value for this object. The value is derived from the hash code of each of the record components.
      Specified by:
      hashCode in class Record
      Returns:
      a hash code value for this object
    • equals

      public boolean equals(Object obj)
      Indicates whether some other object is "equal to" this one. The objects are equal if the other object is of the same class and if all the record components are equal. Reference components are compared with Objects::equals(Object,Object); primitive components are compared with the compare method from their corresponding wrapper classes.
      Specified by:
      equals in class Record
      Parameters:
      obj - the object with which to compare
      Returns:
      true if this object is the same as the obj argument; false otherwise.
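The rule above has a consequence for array-typed components that a short sketch may make clearer: `Objects.equals` on two arrays compares references, not contents. The example uses a hypothetical record `Gram` that keeps the default, compiler-generated `equals`; it illustrates the generic record behavior described here and makes no claim about whether NGram supplies its own implementation. The helper `sameContent` is likewise an assumption for illustration.

```java
import java.util.Arrays;

final class RecordEqualitySketch {
    // Hypothetical record with the same component types; it keeps the default equals.
    record Gram(String[] words, int count) {}

    /** Element-wise comparison for callers that hold distinct array instances. */
    static boolean sameContent(Gram a, Gram b) {
        return a.count() == b.count() && Arrays.equals(a.words(), b.words());
    }

    public static void main(String[] args) {
        String[] shared = {"new", "york"};
        Gram a = new Gram(shared, 3);
        Gram b = new Gram(shared, 3);                        // same array reference
        Gram c = new Gram(new String[] {"new", "york"}, 3);  // equal content, different array

        System.out.println(a.equals(b));        // components share one array
        System.out.println(a.equals(c));        // arrays are distinct instances
        System.out.println(sameContent(a, c));  // compares the words element by element
    }
}
```

When content-based equality matters, comparing `words()` with `Arrays.equals` is a safe choice regardless of how `equals` is implemented.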
    • compareTo

      public int compareTo(NGram o)
      Specified by:
      compareTo in interface Comparable<NGram>
    • apriori

      public static NGram[][] apriori(Collection<String[]> sentences, int maxNGramSize, int minFrequency)
      Extracts n-gram phrases by an Apriori-like algorithm. The algorithm was proposed in "A Study Using n-gram Features for Text Categorization" by Johannes Fürnkranz.

      The algorithm takes a collection of sentences and generates all n-grams of length at most maxNGramSize that occur at least minFrequency times in the sentences.

      Parameters:
      sentences - a collection of sentences, each already split into words.
      maxNGramSize - the maximum length of an n-gram.
      minFrequency - the minimum frequency of an n-gram in the sentences.
      Returns:
      An array of n-gram sets. The i-th entry is the set of i-grams.
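The idea behind an Apriori-like extraction can be sketched as follows. This is not the library's implementation: it is a minimal, self-contained illustration that assumes an n-gram is counted only when both its (n-1)-word prefix and suffix were already found frequent. The names `AprioriNGramSketch` and `frequentNGrams` are hypothetical, and the sketch returns maps of word lists to counts rather than NGram[][].

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AprioriNGramSketch {
    /**
     * Returns one map per n-gram size: index n holds the frequent n-grams
     * (as word lists) with their counts. Index 0 is an empty placeholder.
     */
    static List<Map<List<String>, Integer>> frequentNGrams(
            Collection<String[]> sentences, int maxNGramSize, int minFrequency) {
        List<Map<List<String>, Integer>> levels = new ArrayList<>();
        levels.add(new HashMap<>()); // placeholder so that levels.get(n) holds n-grams

        for (int n = 1; n <= maxNGramSize; n++) {
            Map<List<String>, Integer> previous = levels.get(n - 1);
            Map<List<String>, Integer> counts = new HashMap<>();

            for (String[] sentence : sentences) {
                List<String> words = Arrays.asList(sentence);
                for (int i = 0; i + n <= words.size(); i++) {
                    // Pruning step: skip windows whose prefix or suffix is not frequent.
                    if (n > 1 && !(previous.containsKey(words.subList(i, i + n - 1))
                            && previous.containsKey(words.subList(i + 1, i + n)))) {
                        continue;
                    }
                    // Copy the window so the map key does not depend on the sentence array.
                    counts.merge(List.copyOf(words.subList(i, i + n)), 1, Integer::sum);
                }
            }

            // Keep only the n-grams that reach the minimum frequency.
            counts.values().removeIf(c -> c < minFrequency);
            levels.add(counts);
        }
        return levels;
    }

    public static void main(String[] args) {
        List<String[]> sentences = List.of(
                new String[] {"new", "york", "city"},
                new String[] {"new", "york", "times"},
                new String[] {"york", "city"});
        List<Map<List<String>, Integer>> levels = frequentNGrams(sentences, 3, 2);
        for (int n = 1; n < levels.size(); n++) {
            System.out.println(n + "-grams: " + levels.get(n));
        }
    }
}
```

The pruning keeps the candidate set small: an n-gram cannot occur more often than any of its sub-sequences, so windows containing an infrequent prefix or suffix never need to be counted.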
    • words

      public String[] words()
      Returns the value of the words record component.
      Returns:
      the value of the words record component
    • count

      public int count()
      Returns the value of the count record component.
      Returns:
      the value of the count record component