Class TransformerBlock

java.lang.Object
smile.llm.llama.TransformerBlock

public class TransformerBlock extends Object
A block in a Transformer model. It consists of an attention mechanism followed by a feedforward neural network. Blocks can be stacked to build a complete Transformer model.
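The structure described above can be illustrated with a toy sketch. It is not the smile implementation: the class `BlockSketch` and its methods are hypothetical, vectors are plain `double[]` arrays rather than `Tensor` objects, and the residual connections around each sub-layer are an assumption borrowed from common Transformer designs (normalization is left out for brevity).

```java
import java.util.Arrays;
import java.util.function.UnaryOperator;

/** Hypothetical toy model of the block structure; not the smile implementation. */
class BlockSketch {
    /** Element-wise sum of two vectors of equal length. */
    static double[] add(double[] a, double[] b) {
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] + b[i];
        }
        return out;
    }

    /** Multiplies every element by a constant; a stand-in for a real sub-layer. */
    static double[] scale(double[] v, double factor) {
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] * factor;
        }
        return out;
    }

    /** One block: attention sub-layer, then feedforward sub-layer. */
    static double[] block(double[] x,
                          UnaryOperator<double[]> attention,
                          UnaryOperator<double[]> feedForward) {
        // Assumed residual connection around the attention sub-layer.
        double[] h = add(x, attention.apply(x));
        // Assumed residual connection around the feedforward sub-layer.
        return add(h, feedForward.apply(h));
    }

    /** Stacks the same block `depth` times, feeding each output to the next block. */
    static double[] stack(double[] x, int depth,
                          UnaryOperator<double[]> attention,
                          UnaryOperator<double[]> feedForward) {
        double[] h = x;
        for (int layer = 0; layer < depth; layer++) {
            h = block(h, attention, feedForward);
        }
        return h;
    }

    public static void main(String[] args) {
        // Identity-like stand-ins so the arithmetic is easy to follow.
        UnaryOperator<double[]> attention = v -> scale(v, 1.0);
        UnaryOperator<double[]> feedForward = v -> scale(v, 1.0);
        double[] y = stack(new double[] {1.0, 2.0}, 2, attention, feedForward);
        System.out.println(Arrays.toString(y)); // prints [16.0, 32.0]
    }
}
```

With these stand-ins each block multiplies its input by 4, so two stacked blocks multiply it by 16. In a real model every block has its own weights; the sketch reuses one pair of functions only to keep it short.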
  • Constructor Details

    • TransformerBlock

      public TransformerBlock(int layerId, ModelArgs args)
      Constructs a Transformer block.
      Parameters:
      layerId - the identifier of the block.
      args - the model configuration parameters.
  • Method Details

    • forward

      public Tensor forward(Tensor x, int startPos, Tensor cis, Tensor mask)
      Forward pass through the block.
      Parameters:
      x - the input tensor.
      startPos - the starting position for attention caching.
      cis - the precomputed frequency tensor (cis denotes cos + i sin, i.e. complex exponentials).
      mask - the attention mask tensor.
      Returns:
      the output tensor.
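To make the relationship between startPos and mask more concrete, the following is a minimal sketch of how a causal attention mask is often built when earlier positions are cached. It assumes the convention used in typical Llama-style inference code: an additive mask with one row per new token and one column per absolute position, including the startPos cached ones. This page does not confirm that smile builds its mask this way, and `CausalMaskSketch` and `causalMask` are hypothetical names that use `double[][]` in place of `Tensor`.

```java
import java.util.Arrays;

/** Hypothetical helper illustrating a causal mask with cached positions; not part of smile. */
class CausalMaskSketch {
    /**
     * Builds an additive mask with seqLen rows and (startPos + seqLen) columns.
     * Row i is the i-th new token; column j is absolute position j,
     * where positions 0 .. startPos - 1 are assumed to be cached already.
     */
    static double[][] causalMask(int seqLen, int startPos) {
        double[][] mask = new double[seqLen][startPos + seqLen];
        for (int i = 0; i < seqLen; i++) {
            for (int j = 0; j < startPos + seqLen; j++) {
                // Token i sits at absolute position startPos + i and may attend
                // to every position up to and including its own.
                mask[i][j] = (j <= startPos + i) ? 0.0 : Double.NEGATIVE_INFINITY;
            }
        }
        return mask;
    }

    public static void main(String[] args) {
        // Two positions already cached, three new tokens in this call.
        double[][] mask = causalMask(3, 2);
        for (double[] row : mask) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

In this example the first new token can attend to the two cached positions and to itself, while the last new token can attend to all five positions. Adding negative infinity to an attention score before the softmax drives the corresponding weight to zero, which is why the blocked entries use that value.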