Package smile.llm.llama
Class Tokenizer
java.lang.Object
smile.llm.tokenizer.Tiktoken
smile.llm.llama.Tokenizer
- All Implemented Interfaces:
Tokenizer
Custom tokenizer for Llama 3 models.
-
Field Summary
Fields inherited from class smile.llm.tokenizer.Tiktoken
ranks, specialTokens
-
Constructor Summary
-
Method Summary
Modifier and TypeMethodDescriptionint[]
encodeDialog
(Message... dialog) Encodes the messages of a dialog.int[]
encodeMessage
(Message message) Encodes a message.static Tokenizer
Loads a llama3 tokenizer model.Methods inherited from class smile.llm.tokenizer.Tiktoken
allowSpecialTokens, decode, encode, encode, isSpecialTokenAllowed, load, tokenize
-
Constructor Details
-
Tokenizer
Constructor with default BOS, EOS, and special tokens.- Parameters:
ranks
- The token to rank map.
-
Tokenizer
Constructor.- Parameters:
ranks
- The token to id map.bos
- beginning of sequence token.eos
- end of sequence token.specialTokens
- Optional special tokens.
-
-
Method Details
-
encodeMessage
Encodes a message.- Parameters:
message
- the message.- Returns:
- the tokens.
-
encodeDialog
Encodes the messages of a dialog.- Parameters:
dialog
- the messages.- Returns:
- the tokens.
-
of
Loads a llama3 tokenizer model.- Parameters:
path
- The llama3 model file path.- Returns:
- a llama3 tokenizer.
- Throws:
IOException
- if fail to load the model.
-