Class Llama
java.lang.Object
smile.llm.llama.Llama
LLaMA model specification.
Constructor Summary

Llama(String name, Transformer model, Tokenizer tokenizer)
    Constructor.

Method Summary

static Llama build(String checkpointDir, String tokenizerPath, int maxBatchSize, int maxSeqLen)
    Builds a Llama instance by initializing and loading a model checkpoint.

static Llama build(String checkpointDir, String tokenizerPath, int maxBatchSize, int maxSeqLen, Integer deviceId)
    Builds a Llama instance by initializing and loading a model checkpoint.

CompletionPrediction[] chat(Message[][] dialogs, int maxGenLen, double temperature, double topp, boolean logprobs, Long seed, SubmissionPublisher<String> publisher)
    Generates assistant responses for a list of conversational dialogs.

CompletionPrediction[] complete(String[] prompts, int maxGenLen, double temperature, double topp, boolean logprobs, Long seed, SubmissionPublisher<String> publisher)
    Performs text completion for a list of prompts.

String family()
    Returns the model family name.

CompletionPrediction[] generate(int[][] prompts, int maxGenLen, double temperature, double topp, boolean logprobs, Long seed, SubmissionPublisher<String> publisher)
    Generates text sequences based on provided prompts.

String name()
    Returns the model instance name.

String toString()
Constructor Details

Llama
public Llama(String name, Transformer model, Tokenizer tokenizer)
Constructor.
Parameters:
    name - the model name.
    model - the transformer model.
    tokenizer - the tokenizer.
Method Details

toString
public String toString()

family
public String family()
Returns the model family name.

name
public String name()
Returns the model instance name.
build
public static Llama build(String checkpointDir, String tokenizerPath, int maxBatchSize, int maxSeqLen) throws IOException
Builds a Llama instance by initializing and loading a model checkpoint.
Parameters:
    checkpointDir - the directory path of the checkpoint files.
    tokenizerPath - the path of the tokenizer model file.
    maxBatchSize - the maximum batch size for inference.
    maxSeqLen - the maximum sequence length for input text.
Returns:
    an instance of the Llama model.
Throws:
    IOException - if it fails to open the model checkpoint.
build
public static Llama build(String checkpointDir, String tokenizerPath, int maxBatchSize, int maxSeqLen, Integer deviceId) throws IOException
Builds a Llama instance by initializing and loading a model checkpoint.
Parameters:
    checkpointDir - the directory path of the checkpoint files.
    tokenizerPath - the path of the tokenizer model file.
    maxBatchSize - the maximum batch size for inference.
    maxSeqLen - the maximum sequence length for input text.
    deviceId - the optional CUDA device ID.
Returns:
    an instance of the Llama model.
Throws:
    IOException - if it fails to open the model checkpoint.
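For example, the following minimal sketch loads a model with the five-argument overload. The checkpoint directory, tokenizer path, and sizing values below are placeholder assumptions, not values shipped with the library:

    import java.io.IOException;
    import smile.llm.llama.Llama;

    public class LlamaLoad {
        public static void main(String[] args) throws IOException {
            // Hypothetical paths; point these at a real checkpoint
            // directory and its tokenizer model file.
            Llama llama = Llama.build(
                    "model/Llama-3-8B-Instruct",  // checkpointDir
                    "model/tokenizer.model",      // tokenizerPath
                    1,                            // maxBatchSize
                    2048,                         // maxSeqLen
                    0);                           // deviceId: CUDA device 0
            System.out.println(llama.name() + " / " + llama.family());
        }
    }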
generate
public CompletionPrediction[] generate(int[][] prompts, int maxGenLen, double temperature, double topp, boolean logprobs, Long seed, SubmissionPublisher<String> publisher)
Generates text sequences based on the provided prompts, employing nucleus sampling to produce text with controlled randomness.
Parameters:
    prompts - the list of tokenized prompts, where each prompt is represented as a list of integers.
    maxGenLen - the maximum length of the generated text sequence.
    temperature - the temperature value for controlling randomness in sampling.
    topp - the top-p probability threshold for nucleus sampling.
    logprobs - the flag indicating whether to compute token log probabilities.
    seed - the optional random number generation seed to sample deterministically.
    publisher - an optional flow publisher that asynchronously issues generated chunks. The batch size must be 1 when a publisher is supplied.
Returns:
    the generated text completions.
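As a sketch of this low-level entry point, continuing from the build example above: the token IDs below are made-up placeholders, since real IDs would come from the model's tokenizer, and the null publisher assumes streaming is optional as the parameter description suggests.

    // Continuing from the build sketch; `llama` is a loaded model.
    // These token IDs are placeholders, not a real encoding.
    int[][] prompts = {
        {128000, 9906, 1917}  // hypothetical token IDs for one prompt
    };
    var predictions = llama.generate(
            prompts,
            64,     // maxGenLen: emit at most 64 new tokens
            0.6,    // temperature
            0.9,    // topp: nucleus sampling threshold
            false,  // logprobs: skip token log probabilities
            42L,    // seed: sample deterministically
            null);  // publisher: no streaming
    for (var p : predictions) {
        System.out.println(p);  // relies on CompletionPrediction's toString
    }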
complete
public CompletionPrediction[] complete(String[] prompts, int maxGenLen, double temperature, double topp, boolean logprobs, Long seed, SubmissionPublisher<String> publisher)
Performs text completion for a list of prompts.
Parameters:
    prompts - the list of text prompts.
    maxGenLen - the maximum length of the generated text sequence.
    temperature - the temperature value for controlling randomness in sampling.
    topp - the top-p probability threshold for nucleus sampling.
    logprobs - the flag indicating whether to compute token log probabilities.
    seed - the optional random number generation seed to sample deterministically.
    publisher - an optional flow publisher that asynchronously issues generated chunks. The batch size must be 1 when a publisher is supplied.
Returns:
    the generated text completions.
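A sketch of streaming completion, continuing from the build example; java.util.concurrent.SubmissionPublisher is the standard JDK flow publisher, and per the parameter notes a single prompt is required when a publisher is supplied. The null seed assumes non-deterministic sampling is acceptable here.

    import java.util.concurrent.SubmissionPublisher;

    // Continuing from the build sketch; `llama` is a loaded model.
    try (var publisher = new SubmissionPublisher<String>()) {
        publisher.consume(System.out::print);  // print chunks as they arrive
        // Only one prompt: the batch size must be 1 with a publisher.
        llama.complete(
                new String[] {"The theory of everything is"},
                128,    // maxGenLen
                0.6,    // temperature
                0.9,    // topp
                false,  // logprobs
                null,   // seed: non-deterministic sampling
                publisher);
    }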
chat
public CompletionPrediction[] chat(Message[][] dialogs, int maxGenLen, double temperature, double topp, boolean logprobs, Long seed, SubmissionPublisher<String> publisher)
Generates assistant responses for a list of conversational dialogs.
Parameters:
    dialogs - the list of conversational dialogs, where each dialog is a list of messages.
    maxGenLen - the maximum length of the generated text sequence.
    temperature - the temperature value for controlling randomness in sampling.
    topp - the top-p probability threshold for nucleus sampling.
    logprobs - the flag indicating whether to compute token log probabilities.
    seed - the optional random number generation seed to sample deterministically.
    publisher - an optional flow publisher that asynchronously issues generated chunks. The batch size must be 1 when a publisher is supplied.
Returns:
    the generated chat responses.
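Finally, a single-turn chat sketch, continuing from the build example. The Message and Role types are assumed to be companion classes of this package (a role plus text content); their exact constructors are an assumption to verify against the smile.llm API.

    import smile.llm.Message;
    import smile.llm.Role;

    // Continuing from the build sketch; `llama` is a loaded model.
    // One dialog with a single user turn. The Message/Role construction
    // below is an assumption for illustration.
    Message[][] dialogs = {
        { new Message(Role.user, "Which Llama variant fits on one GPU?") }
    };
    var responses = llama.chat(
            dialogs,
            256,    // maxGenLen
            0.6,    // temperature
            0.9,    // topp
            false,  // logprobs
            null,   // seed
            null);  // publisher: no streaming
    System.out.println(responses[0]);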