The anchor text is the visible, clickable text in a hyperlink.
A corpus is a collection of documents.
The terms (distinct words or tokens) that occur in a text.
Bigrams, or digrams, are sequences of two adjacent words, and are commonly used as the basis for simple statistical analysis of text.
An n-gram is a contiguous sequence of n words from a given sequence of text.
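As a minimal sketch of the n-gram definition above, the Python function below slides a window of n words over a list of words. The name `ngrams` and the whitespace-based tokenization are assumptions made for illustration; they are not taken from any particular library.

```python
def ngrams(words, n):
    """Return every contiguous run of n words as a tuple (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence of len(words) words contains len(words) - n + 1 windows of size n;
    # if the sequence is shorter than n, the range is empty and so is the result.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


# Assumption: splitting on whitespace is a good enough tokenizer for this example.
words = "the quick brown fox".split()
bigrams = ngrams(words, 2)   # the n = 2 case, i.e. bigrams
trigrams = ngrams(words, 3)
```

With n = 2 this yields the bigrams of the text, so a bigram is simply the two-word special case of an n-gram.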
An in-memory text corpus.
A list-of-words representation of documents.
A minimal interface for a text in the corpus.
A trie, also called a digital tree or prefix tree, is an ordered tree data structure used to store a dynamic set or associative array whose keys are usually strings.
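To make the trie definition concrete, here is a small Python sketch of a trie used as an associative array with string keys. The class name `Trie` and its methods (`insert`, `get`, `keys_with_prefix`) are hypothetical names chosen for this example, and a real implementation might store nodes quite differently.

```python
class Trie:
    """A minimal prefix tree mapping string keys to values (illustrative only)."""

    def __init__(self):
        self.children = {}   # maps one character to the child node
        self.is_key = False  # True if a stored key ends at this node
        self.value = None    # value associated with that key, if any

    def insert(self, key, value=None):
        node = self
        for ch in key:
            # Create the child for this character if it does not exist yet.
            node = node.children.setdefault(ch, Trie())
        node.is_key = True
        node.value = value

    def _find(self, prefix):
        """Return the node reached by following prefix, or None if absent."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def get(self, key, default=None):
        node = self._find(key)
        return node.value if node is not None and node.is_key else default

    def keys_with_prefix(self, prefix):
        """Return all stored keys that start with prefix, sorted."""
        start = self._find(prefix)
        if start is None:
            return []
        results = []
        stack = [(start, prefix)]
        while stack:
            node, path = stack.pop()
            if node.is_key:
                results.append(path)
            for ch, child in node.children.items():
                stack.append((child, path + ch))
        return sorted(results)


trie = Trie()
trie.insert("corpus", 1)
trie.insert("corpora", 2)
trie.insert("term", 3)
matches = trie.keys_with_prefix("corp")
```

Because keys that share a prefix share the same path from the root, a prefix lookup only walks the characters of the prefix before enumerating the subtree beneath it.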