Interface Corpus
All Known Implementing Classes: SimpleCorpus
public interface Corpus
A corpus is a collection of documents.
-
Method Summary
Modifier and Type, Method: Description
int avgDocSize(): Returns the average size of documents in the corpus.
long bigramCount(): Returns the number of bigrams in the corpus.
bigrams(): Returns the iterator over the bigrams in the corpus.
int count(term): Returns the total frequency of the term in the corpus.
int count(bigram): Returns the total frequency of the bigram in the corpus.
int docCount(): Returns the number of documents in the corpus.
search(term): Returns the iterator over the set of documents containing the given term.
search(RelevanceRanker ranker, String term): Returns the iterator over the set of documents containing the given term in descending order of relevance.
search(RelevanceRanker ranker, String[] terms): Returns the iterator over the set of documents containing (at least one of) the given terms in descending order of relevance.
long size(): Returns the number of words in the corpus.
int termCount(): Returns the number of unique terms in the corpus.
terms(): Returns the iterator over the terms in the corpus.
-
Method Details
-
size
long size()
Returns the number of words in the corpus.
- Returns: the number of words in the corpus.
-
docCount
int docCount()
Returns the number of documents in the corpus.
- Returns: the number of documents in the corpus.
-
termCount
int termCount()
Returns the number of unique terms in the corpus.
- Returns: the number of unique terms in the corpus.
-
bigramCount
long bigramCount()
Returns the number of bigrams in the corpus.
- Returns: the number of bigrams in the corpus.
-
avgDocSize
int avgDocSize()
Returns the average size of documents in the corpus.
- Returns: the average size of documents in the corpus.
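The five counting methods above can be illustrated with a small in-memory sketch. This is not the library's implementation: the `Stats` interface and `ToyStats` class are hypothetical stand-ins, and the sketch assumes whitespace tokenization, lowercasing, bigrams counted as distinct adjacent token pairs within a document, and an average document size obtained by integer division of `size()` by `docCount()`.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class CorpusStatsSketch {
    /** Hypothetical stand-in mirroring the five counting methods documented above. */
    interface Stats {
        long size();
        int docCount();
        int termCount();
        long bigramCount();
        int avgDocSize();
    }

    /** Toy in-memory implementation; tokenization and bigram rules are assumptions. */
    static final class ToyStats implements Stats {
        private long words;
        private int docs;
        private final Set<String> terms = new HashSet<>();
        private final Set<String> bigrams = new HashSet<>();

        /** Adds one document, splitting on whitespace and lowercasing each token. */
        void add(String text) {
            String trimmed = text.trim().toLowerCase(Locale.ROOT);
            String[] tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
            docs++;
            words += tokens.length;
            for (int i = 0; i < tokens.length; i++) {
                terms.add(tokens[i]);
                if (i > 0) {
                    // Assumption: a bigram is an adjacent token pair inside one document.
                    bigrams.add(tokens[i - 1] + " " + tokens[i]);
                }
            }
        }

        @Override public long size() { return words; }
        @Override public int docCount() { return docs; }
        @Override public int termCount() { return terms.size(); }
        @Override public long bigramCount() { return bigrams.size(); }

        // Assumption: the int result comes from integer division (truncation).
        @Override public int avgDocSize() { return docs == 0 ? 0 : (int) (words / docs); }
    }

    public static void main(String[] args) {
        ToyStats corpus = new ToyStats();
        corpus.add("the quick brown fox");
        corpus.add("the lazy dog");
        System.out.println("words=" + corpus.size() + ", docs=" + corpus.docCount()
                + ", terms=" + corpus.termCount() + ", bigrams=" + corpus.bigramCount()
                + ", avg=" + corpus.avgDocSize());
    }
}
```

Whether a real implementation counts distinct or total bigrams, and how it rounds the average, is not stated in this page, so those choices should be checked against the implementing class.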
-
count
Returns the total frequency of the term in the corpus.
- Parameters:
  term - the term.
- Returns: the total frequency of the term in the corpus.
-
count
Returns the total frequency of the bigram in the corpus.
- Parameters:
  bigram - the bigram.
- Returns: the total frequency of the bigram in the corpus.
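A common use of the two `count` overloads is to turn raw frequencies into estimates. The sketch below is an illustration under stated assumptions, not the library's API: the `Counts` interface is a hypothetical stand-in, and because the bigram parameter type is not shown on this page, the bigram overload here takes two strings instead. It computes a term's relative frequency as `count(term) / size()` and a maximum-likelihood estimate of the second word given the first as `count(bigram) / count(first)`.

```java
import java.util.HashMap;
import java.util.Map;

class FrequencySketch {
    /** Hypothetical stand-in for the counting part of a corpus. */
    interface Counts {
        long size();
        int count(String term);
        // Assumption: the documented overload takes a bigram object; two strings are used here.
        int count(String first, String second);
    }

    /** Relative frequency of a term: count(term) / size(), or 0 for an empty corpus. */
    static double relativeFrequency(Counts c, String term) {
        long n = c.size();
        return n == 0 ? 0.0 : (double) c.count(term) / n;
    }

    /** Maximum-likelihood estimate of P(second | first): count(first, second) / count(first). */
    static double conditional(Counts c, String first, String second) {
        int base = c.count(first);
        return base == 0 ? 0.0 : (double) c.count(first, second) / base;
    }

    /** Builds toy counts from a single token sequence. */
    static Counts fromTokens(String... tokens) {
        final Map<String, Integer> unigrams = new HashMap<>();
        final Map<String, Integer> bigrams = new HashMap<>();
        for (int i = 0; i < tokens.length; i++) {
            unigrams.merge(tokens[i], 1, Integer::sum);
            if (i > 0) {
                bigrams.merge(tokens[i - 1] + " " + tokens[i], 1, Integer::sum);
            }
        }
        final long total = tokens.length;
        return new Counts() {
            @Override public long size() { return total; }
            @Override public int count(String term) { return unigrams.getOrDefault(term, 0); }
            @Override public int count(String first, String second) {
                return bigrams.getOrDefault(first + " " + second, 0);
            }
        };
    }

    public static void main(String[] args) {
        Counts c = fromTokens("a", "b", "a", "b", "a", "c");
        System.out.println(relativeFrequency(c, "a"));
        System.out.println(conditional(c, "a", "b"));
    }
}
```

Both helpers return 0 when the denominator is 0, which avoids a division by zero for unseen terms; a real application might prefer smoothing instead.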
-
terms
Returns the iterator over the terms in the corpus.
-
bigrams
Returns the iterator over the bigrams in the corpus.
-
search
Returns the iterator over the set of documents containing the given term.
-
search
Returns the iterator over the set of documents containing the given term in descending order of relevance.
- Parameters:
  ranker - the relevance ranker.
  term - the search term.
- Returns: the iterator of documents in descending order of relevance.
-
search
Returns the iterator over the set of documents containing (at least one of) the given terms in descending order of relevance.
- Parameters:
  ranker - the relevance ranker.
  terms - the search terms.
- Returns: the iterator of documents in descending order of relevance.
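The ranked search described above can be sketched with a toy inverted index. Everything here is an assumption made for illustration: the `Ranker` interface, the `Hit` class, and the TF-IDF style scoring are hypothetical stand-ins and do not reproduce the `RelevanceRanker` contract, whose methods are not shown on this page. The sketch scores every document that contains at least one query term, sums the per-term scores, and returns an iterator in descending order of score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class RankedSearchSketch {
    /** Hypothetical ranker: scores one term against one document. */
    interface Ranker {
        double rank(int termFreq, int docSize, int docCount, int docFreq);
    }

    /** Hypothetical result pair: a document id and its relevance score. */
    static final class Hit {
        final int docId;
        final double score;
        Hit(int docId, double score) { this.docId = docId; this.score = score; }
    }

    /** TF-IDF style scoring, used only as a stand-in ranking function. */
    static final Ranker TF_IDF = (tf, size, n, df) ->
            ((double) tf / size) * Math.log(1.0 + (double) n / df);

    private final List<String[]> docs = new ArrayList<>();
    // term -> (document id -> term frequency in that document)
    private final Map<String, Map<Integer, Integer>> index = new HashMap<>();

    /** Adds a non-empty document, splitting on whitespace and lowercasing. */
    void add(String text) {
        String[] tokens = text.trim().toLowerCase(Locale.ROOT).split("\\s+");
        int id = docs.size();
        docs.add(tokens);
        for (String t : tokens) {
            index.computeIfAbsent(t, k -> new HashMap<>()).merge(id, 1, Integer::sum);
        }
    }

    /** Returns documents containing at least one term, best score first. */
    Iterator<Hit> search(Ranker ranker, String... terms) {
        Map<Integer, Double> scores = new HashMap<>();
        for (String term : terms) {
            Map<Integer, Integer> postings = index.get(term);
            if (postings == null) continue; // term does not occur in any document
            for (Map.Entry<Integer, Integer> e : postings.entrySet()) {
                int id = e.getKey();
                double s = ranker.rank(e.getValue(), docs.get(id).length,
                        docs.size(), postings.size());
                scores.merge(id, s, Double::sum);
            }
        }
        List<Hit> hits = new ArrayList<>();
        scores.forEach((id, s) -> hits.add(new Hit(id, s)));
        hits.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return hits.iterator();
    }

    public static void main(String[] args) {
        RankedSearchSketch corpus = new RankedSearchSketch();
        corpus.add("apple banana apple");
        corpus.add("banana cherry");
        Iterator<Hit> it = corpus.search(TF_IDF, "apple", "cherry");
        while (it.hasNext()) {
            Hit h = it.next();
            System.out.println(h.docId + " " + h.score);
        }
    }
}
```

Because `search` takes varargs, the same method covers both the single-term and the multi-term case; a document matching several query terms accumulates one score contribution per term.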
-