Class Bigram

java.lang.Object
smile.nlp.Bigram
smile.nlp.collocation.Bigram
All Implemented Interfaces:
Comparable<Bigram>

public class Bigram extends Bigram implements Comparable<Bigram>
Collocations are expressions of multiple words that commonly co-occur. A bigram collocation is a pair of words w1 w2 that appear together with statistical significance.
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    final int
    count
    The frequency of the bigram in the corpus.
    final double
    score
    The chi-square statistical score of the collocation.

    Fields inherited from class smile.nlp.Bigram

    w1, w2
  • Constructor Summary

    Constructors
    Constructor
    Description
    Bigram(String w1, String w2, int count, double score)
    Constructor.
  • Method Summary

    Modifier and Type
    Method
    Description
    int
    compareTo(Bigram o)
     
    static Bigram[]
    of(Corpus corpus, double p, int minFrequency)
    Finds bigram collocations in the given corpus whose p-value is less than the given threshold.
    static Bigram[]
    of(Corpus corpus, int k, int minFrequency)
    Finds the top k bigram collocations in the given corpus.
    String
    toString()
     

    Methods inherited from class smile.nlp.Bigram

    equals, hashCode

    Methods inherited from class java.lang.Object

    clone, finalize, getClass, notify, notifyAll, wait, wait, wait
  • Field Details

    • count

      public final int count
      The frequency of the bigram in the corpus.
    • score

      public final double score
      The chi-square statistical score of the collocation.
  • Constructor Details

    • Bigram

      public Bigram(String w1, String w2, int count, double score)
      Constructor.
      Parameters:
      w1 - the first word of the bigram.
      w2 - the second word of the bigram.
      count - the frequency of the bigram in the corpus.
      score - the chi-square statistical score of the collocation in the corpus.
  • Method Details

    • toString

      public String toString()
      Overrides:
      toString in class Bigram
    • compareTo

      public int compareTo(Bigram o)
      Specified by:
      compareTo in interface Comparable<Bigram>
    • of

      public static Bigram[] of(Corpus corpus, int k, int minFrequency)
      Finds the top k bigram collocations in the given corpus.
      Parameters:
      corpus - the corpus.
      k - the number of top-scoring bigrams to return.
      minFrequency - the minimum frequency of a bigram in the corpus.
      Returns:
      the significant bigram collocations in descending order of likelihood ratio.
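      The selection described above (drop rare bigrams, rank by score, keep the best k) can be sketched in plain Java. This is only an illustration and not Smile's implementation: the class TopKBigramSketch and its nested Scored type are hypothetical stand-ins that mirror the documented w1, w2, count and score fields, and treating minFrequency as an inclusive lower bound is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class TopKBigramSketch {
    /** Hypothetical stand-in mirroring the documented fields; not the Smile class. */
    static final class Scored {
        public final String w1;
        public final String w2;
        public final int count;
        public final double score;

        Scored(String w1, String w2, int count, double score) {
            this.w1 = w1;
            this.w2 = w2;
            this.count = count;
            this.score = score;
        }

        @Override
        public String toString() {
            return "(" + w1 + " " + w2 + ", " + count + ", " + score + ")";
        }
    }

    /**
     * Keeps candidates seen at least minFrequency times (inclusive bound is an
     * assumption), sorts them by score in descending order, and returns the first k.
     */
    static List<Scored> topK(List<Scored> candidates, int k, int minFrequency) {
        List<Scored> kept = new ArrayList<>();
        for (Scored b : candidates) {
            if (b.count >= minFrequency) {
                kept.add(b);
            }
        }
        kept.sort(Comparator.comparingDouble((Scored b) -> b.score).reversed());
        int limit = Math.max(0, Math.min(k, kept.size()));
        return new ArrayList<>(kept.subList(0, limit));
    }

    public static void main(String[] args) {
        // Made-up counts and scores, for illustration only.
        List<Scored> candidates = Arrays.asList(
                new Scored("new", "york", 20, 150.0),
                new Scored("of", "the", 500, 3.2),
                new Scored("rare", "pair", 1, 900.0),
                new Scored("machine", "learning", 8, 75.5));
        System.out.println(topK(candidates, 2, 5));
    }
}
```

      Filtering before ranking matters here: a bigram seen only once can receive a very high score by chance, which is the usual motivation for a minimum-frequency cutoff.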
    • of

      public static Bigram[] of(Corpus corpus, double p, int minFrequency)
      Finds bigram collocations in the given corpus whose p-value is less than the given threshold.
      Parameters:
      corpus - the corpus.
      p - the p-value threshold.
      minFrequency - the minimum frequency of a bigram in the corpus.
      Returns:
      the significant bigram collocations in descending order of likelihood ratio.
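      To make the link between a likelihood-ratio score and a p-value threshold concrete, the sketch below computes the textbook log-likelihood ratio statistic (-2 ln λ) for a bigram from raw counts and compares it with the chi-square critical value for one degree of freedom at p = 0.05 (3.841). It assumes that this standard statistic is the one intended by "likelihood ratio" above; the class LikelihoodRatioSketch, its method names, and the parameter names c1, c2, c12 and n are hypothetical and are not part of the Smile API.

```java
final class LikelihoodRatioSketch {
    /** Chi-square critical value for 1 degree of freedom at p = 0.05. */
    static final double CHI2_1DF_P05 = 3.841;

    /** k * ln(x), using the convention 0 * ln(0) = 0. */
    static double xlogy(double k, double x) {
        return k == 0.0 ? 0.0 : k * Math.log(x);
    }

    /** Binomial log-likelihood of k successes in n trials (constant term omitted). */
    static double logL(double k, double n, double x) {
        return xlogy(k, x) + xlogy(n - k, 1.0 - x);
    }

    /**
     * Returns -2 ln(lambda) for the bigram "w1 w2".
     *
     * @param c1  occurrences of w1
     * @param c2  occurrences of w2
     * @param c12 occurrences of the bigram w1 w2
     * @param n   total number of tokens
     */
    static double score(int c1, int c2, int c12, long n) {
        double p = (double) c2 / n;                  // P(w2) under independence
        double p1 = (double) c12 / c1;               // P(w2 | w1)
        double p2 = (double) (c2 - c12) / (n - c1);  // P(w2 | not w1)
        double logLambda = logL(c12, c1, p) + logL(c2 - c12, n - c1, p)
                - logL(c12, c1, p1) - logL(c2 - c12, n - c1, p2);
        return -2.0 * logLambda;
    }

    /** True if the score exceeds the p = 0.05 critical value. */
    static boolean isSignificant(double score) {
        return score > CHI2_1DF_P05;
    }

    public static void main(String[] args) {
        // Made-up counts: w2 follows w1 in 90 of w1's 100 occurrences.
        double s = score(100, 100, 90, 100_000);
        System.out.println(s + " significant: " + isSignificant(s));
    }
}
```

      When w2 is exactly as likely after w1 as anywhere else, the statistic is zero up to rounding, so such a pair would not pass any p-value threshold.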