bigram

fun bigram(k: Int, minFreq: Int, text: List<String>): Array<Bigram>

Identify bigram collocations (pairs of words that often appear consecutively) within a corpus. The same technique may also be used to find other associations between word occurrences.

Finding collocations requires first calculating the frequencies of words and of their appearance in the context of other words. Often the collection of words will then require filtering to retain only useful content terms. Each n-gram of words may then be scored according to some association measure, in order to determine the relative likelihood of each n-gram being a collocation.
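The steps above can be sketched in plain Kotlin. This is a minimal illustration, not the library's implementation: the names `ScoredBigram`, `likelihoodRatio` and `topBigrams` are hypothetical, the input is assumed to be a single pre-tokenized list of words, the association measure is assumed to be the log-likelihood ratio (G²) of a 2x2 contingency table, and a bigram is assumed to pass the filter when its count is at least `minFreq`. Filtering for content terms (for example, stop-word removal) is left out.

```kotlin
import kotlin.math.ln

// Hypothetical result type for this sketch; not the library's Bigram class.
data class ScoredBigram(val w1: String, val w2: String, val count: Int, val score: Double)

// One cell of the G^2 sum: observed * ln(observed / expected), taken as 0 for an empty cell.
fun cellTerm(observed: Double, expected: Double): Double =
    if (observed == 0.0) 0.0 else observed * ln(observed / expected)

// Log-likelihood ratio (G^2) for the bigram (w1, w2).
// c12: count of the bigram, c1: bigrams starting with w1,
// c2: bigrams ending with w2, n: total number of bigrams.
fun likelihoodRatio(c12: Int, c1: Int, c2: Int, n: Int): Double {
    val o11 = c12.toDouble()                  // w1 followed by w2
    val o12 = (c1 - c12).toDouble()           // w1 followed by another word
    val o21 = (c2 - c12).toDouble()           // another word followed by w2
    val o22 = (n - c1 - c2 + c12).toDouble()  // neither w1 first nor w2 second
    val total = n.toDouble()
    val r1 = o11 + o12
    val r2 = o21 + o22
    val k1 = o11 + o21
    val k2 = o12 + o22
    return 2.0 * (
        cellTerm(o11, r1 * k1 / total) + cellTerm(o12, r1 * k2 / total) +
        cellTerm(o21, r2 * k1 / total) + cellTerm(o22, r2 * k2 / total)
    )
}

// Count bigrams, drop the rare ones, score the rest and keep the k best.
fun topBigrams(tokens: List<String>, k: Int, minFreq: Int): List<ScoredBigram> {
    val pairs = tokens.zipWithNext()
    val n = pairs.size
    val pairCounts = pairs.groupingBy { it }.eachCount()
    val firstCounts = pairs.groupingBy { it.first }.eachCount()
    val secondCounts = pairs.groupingBy { it.second }.eachCount()
    return pairCounts
        .filter { it.value >= minFreq }       // assumption: inclusive threshold
        .map { (pair, c12) ->
            val score = likelihoodRatio(
                c12, firstCounts.getValue(pair.first), secondCounts.getValue(pair.second), n
            )
            ScoredBigram(pair.first, pair.second, c12, score)
        }
        .sortedByDescending { it.score }
        .take(k)
}
```

A call such as `topBigrams(words, k = 10, minFreq = 2)` returns at most ten scored bigrams, highest score first. Note that G² measures departure from independence in either direction, so a real implementation may also want to check that the observed count exceeds the expected one.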

Return

significant bigram collocations in descending order of likelihood ratio.

Parameters

k

the number of top-ranked bigrams to find.

minFreq

the minimum frequency of a collocation.

text

input text.


fun bigram(p: Double, minFreq: Int, text: List<String>): Array<Bigram>

Identify bigram collocations whose p-value is less than the given threshold.

Return

significant bigram collocations in descending order of likelihood ratio.

Parameters

p

the p-value threshold; only collocations with a p-value below it are returned.

minFreq

the minimum frequency of a collocation.

text

input text.
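
How a p-value threshold can select collocations is sketched below. The sketch assumes that the likelihood ratio statistic is compared against a chi-square distribution with one degree of freedom, which is the usual asymptotic reference for a 2x2 contingency table; this page does not state that the library does exactly this. The names `erfc`, `pValue` and `significant` are hypothetical helpers, and the scores in the usage note are made-up illustrative values.

```kotlin
import kotlin.math.exp
import kotlin.math.sqrt

// Complementary error function for x >= 0, using the Abramowitz-Stegun
// approximation 7.1.26 (absolute error about 1.5e-7, so extremely small
// p-values are only rough).
fun erfc(x: Double): Double {
    val t = 1.0 / (1.0 + 0.3275911 * x)
    val poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
        t * (-1.453152027 + t * 1.061405429))))
    return poly * exp(-x * x)
}

// Upper-tail probability of a chi-square variable with 1 degree of freedom:
// P(X > g2) = erfc(sqrt(g2 / 2)).
fun pValue(g2: Double): Double = erfc(sqrt(g2.coerceAtLeast(0.0) / 2.0))

// Keep the entries whose p-value is below the threshold p,
// in descending order of score.
fun significant(scores: Map<String, Double>, p: Double): List<Pair<String, Double>> =
    scores.toList()
        .filter { pValue(it.second) < p }
        .sortedByDescending { it.second }
```

For example, `significant(mapOf("new york" to 13.5, "is big" to 1.2), 0.01)` keeps only the first entry, since a score of 1.2 lies well below the 1% critical value of roughly 6.63.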