Class Llama

java.lang.Object
smile.llm.llama.Llama

public class Llama extends Object
LLaMA model specification.
  • Constructor Details

    • Llama

      public Llama(String name, Transformer model, Tokenizer tokenizer)
      Constructor.
      Parameters:
      name - the model name.
      model - the transformer model.
      tokenizer - the tokenizer.
  • Method Details

    • toString

      public String toString()
      Overrides:
      toString in class Object
    • family

      public String family()
      Returns the model family name.
      Returns:
      the model family name.
    • name

      public String name()
      Returns the model instance name.
      Returns:
      the model instance name.
    • build

      public static Llama build(String checkpointDir, String tokenizerPath, int maxBatchSize, int maxSeqLen) throws IOException
      Builds a Llama instance by initializing and loading a model checkpoint.
      Parameters:
      checkpointDir - the directory path of the checkpoint files.
      tokenizerPath - the path of the tokenizer model file.
      maxBatchSize - the maximum batch size for inference.
      maxSeqLen - the maximum sequence length for input text.
      Returns:
      an instance of Llama model.
      Throws:
      IOException - if an I/O error occurs when loading the model checkpoint.
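
      A minimal usage sketch. The checkpoint directory and tokenizer paths below are placeholders for your local files, not paths shipped with the library:

          import java.io.IOException;
          import smile.llm.llama.Llama;

          public class LlamaBuildExample {
              public static void main(String[] args) throws IOException {
                  // Placeholder paths; point these at a real checkpoint
                  // directory and tokenizer model file.
                  Llama llama = Llama.build(
                          "models/Llama3-8B",       // checkpointDir (hypothetical)
                          "models/tokenizer.model", // tokenizerPath (hypothetical)
                          4,                        // maxBatchSize
                          2048);                    // maxSeqLen
                  System.out.println(llama);        // toString() describes the model
              }
          }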
    • build

      public static Llama build(String checkpointDir, String tokenizerPath, int maxBatchSize, int maxSeqLen, Integer deviceId) throws IOException
      Builds a Llama instance by initializing and loading a model checkpoint.
      Parameters:
      checkpointDir - the directory path of the checkpoint files.
      tokenizerPath - the path of the tokenizer model file.
      maxBatchSize - the maximum batch size for inference.
      maxSeqLen - the maximum sequence length for input text.
      deviceId - the optional CUDA device ID.
      Returns:
      an instance of Llama model.
      Throws:
      IOException - if an I/O error occurs when loading the model checkpoint.
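
      To pin the model to a specific GPU, pass the CUDA device ID. A short sketch, using the same placeholder paths as above and assuming device 0 exists:

          // Pin the model to CUDA device 0.
          Llama llama = Llama.build("models/Llama3-8B", "models/tokenizer.model",
                  4, 2048, 0);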
    • generate

      public CompletionPrediction[] generate(int[][] prompts, int maxGenLen, double temperature, double topp, boolean logprobs, Long seed, SubmissionPublisher<String> publisher)
      Generates text sequences based on the provided prompts, employing nucleus (top-p) sampling to produce text with controlled randomness.
      Parameters:
      prompts - the tokenized prompts, where each prompt is an array of token IDs.
      maxGenLen - the maximum length of the generated text sequence.
      temperature - the temperature value for controlling randomness in sampling.
      topp - the top-p probability threshold for nucleus sampling.
      logprobs - the flag indicating whether to compute token log probabilities.
      seed - the optional random number generation seed for deterministic sampling.
      publisher - an optional flow publisher that asynchronously emits generated text chunks. When a publisher is supplied, the batch size must be 1.
      Returns:
      the generated text completions.
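
      A sketch of calling generate with pre-tokenized input. The token IDs below are made up for illustration; real IDs come from the model's tokenizer:

          // Each row is one tokenized prompt; these IDs are placeholders.
          int[][] prompts = { {1, 15043, 3186} };
          CompletionPrediction[] out = llama.generate(
                  prompts,
                  128,   // maxGenLen
                  0.6,   // temperature: lower is more deterministic
                  0.9,   // topp: nucleus sampling threshold
                  false, // logprobs: skip per-token log probabilities
                  42L,   // seed for reproducible sampling (null for random)
                  null); // no streaming publisher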
    • complete

      public CompletionPrediction[] complete(String[] prompts, int maxGenLen, double temperature, double topp, boolean logprobs, Long seed, SubmissionPublisher<String> publisher)
      Performs text completion for a list of prompts.
      Parameters:
      prompts - the text prompts.
      maxGenLen - the maximum length of the generated text sequence.
      temperature - the temperature value for controlling randomness in sampling.
      topp - the top-p probability threshold for nucleus sampling.
      logprobs - the flag indicating whether to compute token log probabilities.
      seed - the optional random number generation seed for deterministic sampling.
      publisher - an optional flow publisher that asynchronously emits generated text chunks. When a publisher is supplied, the batch size must be 1.
      Returns:
      the generated text completions.
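
      A sketch of streaming completion through the JDK's java.util.concurrent.SubmissionPublisher; note the single prompt, since the batch size must be 1 when a publisher is supplied:

          SubmissionPublisher<String> publisher = new SubmissionPublisher<>();
          var done = publisher.consume(System.out::print); // print each chunk as it arrives
          llama.complete(new String[] {"Once upon a time"},
                  256,   // maxGenLen
                  0.7,   // temperature
                  0.95,  // topp
                  false, // logprobs
                  null,  // seed: random sampling
                  publisher);
          publisher.close(); // signals onComplete to the consumer
          done.join();       // wait until every chunk has been printed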
    • chat

      public CompletionPrediction[] chat(Message[][] dialogs, int maxGenLen, double temperature, double topp, boolean logprobs, Long seed, SubmissionPublisher<String> publisher)
      Generates assistant responses for a list of conversational dialogs.
      Parameters:
      dialogs - the conversational dialogs, where each dialog is an array of messages.
      maxGenLen - the maximum length of the generated text sequence.
      temperature - the temperature value for controlling randomness in sampling.
      topp - the top-p probability threshold for nucleus sampling.
      logprobs - the flag indicating whether to compute token log probabilities.
      seed - the optional random number generation seed for deterministic sampling.
      publisher - an optional flow publisher that asynchronously emits generated text chunks. When a publisher is supplied, the batch size must be 1.
      Returns:
      the generated chat responses.
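
      A sketch of a single-turn dialog. The Message constructor and the Role values shown here are assumptions about the smile.llm API, not confirmed by this page; check the actual Message documentation:

          // Assumed shape: a Message pairs a role with text content.
          Message[][] dialogs = {
              {
                  new Message(Role.system, "You are a helpful assistant."),
                  new Message(Role.user, "Explain nucleus sampling in one sentence.")
              }
          };
          CompletionPrediction[] replies = llama.chat(
                  dialogs, 256, 0.7, 0.95, false, null, null);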