Package smile.llm.llama
Class Llama
java.lang.Object
smile.llm.llama.Llama
LLaMA model specification.
Constructor Summary

Llama
    Constructor.

Method Summary

static Llama
    build(String checkpointDir, String tokenizerPath, int maxBatchSize, int maxSeqLen)
    Builds a Llama instance by initializing and loading a model checkpoint.

static Llama
    build(String checkpointDir, String tokenizerPath, int maxBatchSize, int maxSeqLen, Integer deviceId)
    Builds a Llama instance by initializing and loading a model checkpoint.

CompletionPrediction[]
    chat(Message[][] dialogs, int maxGenLen, double temperature, double topp, boolean logprobs, Long seed, SubmissionPublisher<String> publisher)
    Generates assistant responses for a list of conversational dialogs.

CompletionPrediction[]
    complete(String[] prompts, int maxGenLen, double temperature, double topp, boolean logprobs, Long seed, SubmissionPublisher<String> publisher)
    Performs text completion for a list of prompts.

String
    family()
    Returns the model family name.

CompletionPrediction[]
    generate(int[][] prompts, int maxGenLen, double temperature, double topp, boolean logprobs, Long seed, SubmissionPublisher<String> publisher)
    Generates text sequences based on provided prompts.

String
    name()
    Returns the model instance name.

String
    toString()
Constructor Details

Llama
Constructor.
Parameters:
    name - the model name.
    model - the transformer model.
    tokenizer - the tokenizer.
Method Details

toString
Overrides:
    toString in class Object

family
Returns the model family name.
Returns:
    the model family name.

name
Returns the model instance name.
Returns:
    the model instance name.
build
public static Llama build(String checkpointDir, String tokenizerPath, int maxBatchSize, int maxSeqLen) throws IOException
Builds a Llama instance by initializing and loading a model checkpoint.
Parameters:
    checkpointDir - the directory path of the checkpoint files.
    tokenizerPath - the path of the tokenizer model file.
    maxBatchSize - the maximum batch size for inference.
    maxSeqLen - the maximum sequence length for input text.
Returns:
    an instance of the Llama model.
Throws:
    IOException
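For illustration, a minimal sketch of loading a model with this overload. The checkpoint directory and tokenizer path are hypothetical placeholders, not files shipped with the library:

    import java.io.IOException;
    import smile.llm.llama.Llama;

    public class LlamaBuildExample {
        public static void main(String[] args) throws IOException {
            // Hypothetical paths: point them at a real checkpoint directory
            // and its tokenizer model file.
            Llama model = Llama.build(
                    "models/Llama-3-8B-Instruct",  // checkpointDir
                    "models/tokenizer.model",      // tokenizerPath
                    4,      // maxBatchSize: up to 4 prompts per inference batch
                    2048);  // maxSeqLen: maximum input length in tokens
            System.out.println(model);
        }
    }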
build
public static Llama build(String checkpointDir, String tokenizerPath, int maxBatchSize, int maxSeqLen, Integer deviceId) throws IOException
Builds a Llama instance by initializing and loading a model checkpoint.
Parameters:
    checkpointDir - the directory path of the checkpoint files.
    tokenizerPath - the path of the tokenizer model file.
    maxBatchSize - the maximum batch size for inference.
    maxSeqLen - the maximum sequence length for input text.
    deviceId - the optional CUDA device ID.
Returns:
    an instance of the Llama model.
Throws:
    IOException
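The same sketch pinned to a specific GPU through the extra deviceId argument. Device 0 is an arbitrary choice here; since the parameter is a nullable Integer documented as optional, passing null is assumed to defer device selection to the runtime:

    // deviceId 0 selects the first CUDA device (an arbitrary choice for
    // illustration; null is assumed to defer device selection).
    Llama model = Llama.build(
            "models/Llama-3-8B-Instruct", "models/tokenizer.model",
            4, 2048,
            0);  // deviceId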
generate
public CompletionPrediction[] generate(int[][] prompts, int maxGenLen, double temperature, double topp, boolean logprobs, Long seed, SubmissionPublisher<String> publisher)
Generates text sequences based on the provided prompts, employing nucleus sampling to produce text with controlled randomness.
Parameters:
    prompts - the list of tokenized prompts, where each prompt is represented as a list of integers.
    maxGenLen - the maximum length of the generated text sequence.
    temperature - the temperature value for controlling randomness in sampling.
    topp - the top-p probability threshold for nucleus sampling.
    logprobs - the flag indicating whether to compute token log probabilities.
    seed - the optional random number generation seed to sample deterministically.
    publisher - an optional flow publisher that asynchronously emits generated text chunks. The batch size must be 1 when a publisher is supplied.
Returns:
    the generated text completions.
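A sketch of raw generation over pre-tokenized prompts, continuing with the model built in the earlier sketch. The token IDs are meaningless placeholders, since this page does not document the tokenizer; real prompts would come from the model's tokenizer:

    // model is the Llama instance built above.
    // Two pre-tokenized prompts (placeholder token IDs).
    int[][] prompts = {
            {1, 15043, 11148, 3304},
            {1, 3057, 9508}
    };
    var predictions = model.generate(
            prompts,
            128,    // maxGenLen
            0.6,    // temperature: lower values sample more conservatively
            0.9,    // topp: nucleus sampling threshold
            false,  // logprobs: skip token log probabilities
            null,   // seed: null for non-deterministic sampling
            null);  // publisher: null disables streaming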
complete
public CompletionPrediction[] complete(String[] prompts, int maxGenLen, double temperature, double topp, boolean logprobs, Long seed, SubmissionPublisher<String> publisher)
Performs text completion for a list of prompts.
Parameters:
    prompts - the list of text prompts.
    maxGenLen - the maximum length of the generated text sequence.
    temperature - the temperature value for controlling randomness in sampling.
    topp - the top-p probability threshold for nucleus sampling.
    logprobs - the flag indicating whether to compute token log probabilities.
    seed - the optional random number generation seed to sample deterministically.
    publisher - an optional flow publisher that asynchronously emits generated text chunks. The batch size must be 1 when a publisher is supplied.
Returns:
    the generated text completions.
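A sketch of completion with streaming output. SubmissionPublisher is java.util.concurrent's Flow publisher, and per the note above the prompt batch must contain exactly one entry when a publisher is supplied. Whether the caller closes the publisher afterwards is an assumption here:

    import java.util.concurrent.SubmissionPublisher;

    // model is the Llama instance built above.
    SubmissionPublisher<String> publisher = new SubmissionPublisher<>();
    publisher.consume(System.out::print);  // print each chunk as it arrives

    String[] prompts = {"The quick brown fox"};  // batch size 1 for streaming
    var predictions = model.complete(
            prompts,
            256,   // maxGenLen
            0.8,   // temperature
            0.95,  // topp
            false, // logprobs
            42L,   // seed: fixed for reproducible sampling
            publisher);
    publisher.close();  // assumption: safe to close once complete() returns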
chat
public CompletionPrediction[] chat(Message[][] dialogs, int maxGenLen, double temperature, double topp, boolean logprobs, Long seed, SubmissionPublisher<String> publisher)
Generates assistant responses for a list of conversational dialogs.
Parameters:
    dialogs - the list of conversational dialogs, where each dialog is a list of messages.
    maxGenLen - the maximum length of the generated text sequence.
    temperature - the temperature value for controlling randomness in sampling.
    topp - the top-p probability threshold for nucleus sampling.
    logprobs - the flag indicating whether to compute token log probabilities.
    seed - the optional random number generation seed to sample deterministically.
    publisher - an optional flow publisher that asynchronously emits generated text chunks. The batch size must be 1 when a publisher is supplied.
Returns:
    the generated chat responses.
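A sketch of a single two-message dialog. The Message and Role construction below is an assumption for illustration, since neither type is documented on this page; consult their class documentation for the actual API:

    // model is the Llama instance built above.
    // Assumed Message/Role API; the real constructors may differ.
    Message[][] dialogs = {
            {
                new Message(Role.system, "You are a helpful assistant."),
                new Message(Role.user, "What is the capital of France?")
            }
    };
    var responses = model.chat(
            dialogs,
            256,   // maxGenLen
            0.7,   // temperature
            0.9,   // topp
            false, // logprobs
            null,  // seed
            null); // publisher: null disables streaming
    System.out.println(responses[0]);  // one response per dialog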