tfidf

fun tfidf(tf: Double, maxtf: Double, n: Int, df: Int): Double

Computes the TF-IDF relevance score of a term for a document, relative to a corpus.

Parameters

tf

the frequency of the search term in the document being ranked.

maxtf

the maximum frequency over all terms in the document.

n

the number of documents in the corpus.

df

the number of documents containing the given term in the corpus.
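
To illustrate how the four parameters combine, here is a minimal sketch of one common TF-IDF variant: term frequency scaled by maxtf, multiplied by a smoothed inverse document frequency. The name tfidfSketch and the exact smoothing are assumptions made for illustration; this page does not state the formula the library uses.

```kotlin
import kotlin.math.ln

// Hypothetical helper, not the library's implementation.
// Assumed formula: (tf / maxtf) * ln((1 + n) / (1 + df)).
fun tfidfSketch(tf: Double, maxtf: Double, n: Int, df: Int): Double {
    require(maxtf > 0.0) { "maxtf must be positive" }
    require(df in 0..n) { "df must be between 0 and n" }
    // Scaling by maxtf keeps long documents from dominating the score.
    val scaledTf = tf / maxtf
    // The +1 smoothing avoids division by zero when df is 0.
    val idf = ln((1.0 + n) / (1.0 + df))
    return scaledTf * idf
}
```

Under this assumed formula, a term that appears in every document (df equal to n) scores 0, and a term absent from the document (tf equal to 0) also scores 0.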


Converts a corpus to TF-IDF feature vectors, each normalized to unit L2 norm.

Return

a matrix in which each row is the TF-IDF feature vector of one document.

Parameters

corpus

the corpus of documents in bag-of-words representation.
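
The corpus conversion can be sketched as follows, assuming each document is a DoubleArray of raw term counts with the same term order in every row. The name tfidfCorpusSketch, the List<DoubleArray> types, and the smoothed weighting formula are assumptions for illustration, not the library's confirmed signature or implementation.

```kotlin
import kotlin.math.ln
import kotlin.math.sqrt

// Hypothetical sketch: rows are documents, columns are term counts.
fun tfidfCorpusSketch(corpus: List<DoubleArray>): List<DoubleArray> {
    val n = corpus.size
    if (n == 0) return emptyList()
    val p = corpus[0].size
    require(corpus.all { it.size == p }) { "all bags must have the same length" }

    // Document frequency: in how many documents each term occurs.
    val df = IntArray(p)
    for (bag in corpus) {
        for (j in 0 until p) if (bag[j] > 0.0) df[j]++
    }

    return corpus.map { bag ->
        val maxtf = bag.maxOrNull() ?: 0.0
        // Assumed weighting: (tf / maxtf) * ln((1 + n) / (1 + df)).
        val v = DoubleArray(p) { j ->
            if (maxtf > 0.0) (bag[j] / maxtf) * ln((1.0 + n) / (1.0 + df[j])) else 0.0
        }
        // Scale each row to unit L2 norm; all-zero rows are left unchanged.
        val norm = sqrt(v.sumOf { it * it })
        if (norm > 0.0) for (j in v.indices) v[j] /= norm
        v
    }
}
```

Because every row is scaled to unit length, the dot product of two rows equals their cosine similarity.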


Converts a bag of words to a TF-IDF feature vector, normalized to unit L2 norm.

Return

the TF-IDF feature vector.

Parameters

bag

the bag-of-words feature vector of a document.

n

the number of documents in the training corpus.

df

the number of documents containing the given term in the corpus.
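
Converting a single bag of words against precomputed corpus statistics might look like the sketch below. It assumes df is an IntArray holding one document frequency per term, aligned with the entries of bag; that type, the name tfidfBagSketch, and the smoothed weighting formula are all assumptions for illustration rather than the library's documented behavior.

```kotlin
import kotlin.math.ln
import kotlin.math.sqrt

// Hypothetical sketch: weight one document using corpus-level statistics.
fun tfidfBagSketch(bag: DoubleArray, n: Int, df: IntArray): DoubleArray {
    require(bag.size == df.size) { "bag and df must have the same length" }
    val maxtf = bag.maxOrNull() ?: 0.0
    // Assumed weighting: (tf / maxtf) * ln((1 + n) / (1 + df)).
    val v = DoubleArray(bag.size) { j ->
        if (maxtf > 0.0) (bag[j] / maxtf) * ln((1.0 + n) / (1.0 + df[j])) else 0.0
    }
    // Scale to unit L2 norm; an all-zero vector is returned unchanged.
    val norm = sqrt(v.sumOf { it * it })
    if (norm > 0.0) for (j in v.indices) v[j] /= norm
    return v
}
```

This form suits scoring a new document at query time, since n and df come from the training corpus and do not need to be recomputed.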