smile.llm.Transformer

public class Transformer extends Object

A transformer is a deep learning architecture based on the multi-head attention mechanism, proposed in the 2017 paper "Attention Is All You Need". It has no recurrent units and therefore requires less training time than earlier recurrent neural architectures such as long short-term memory (LSTM).

Later variants have been widely adopted for training large language models (LLMs). Text is converted to numerical representations called tokens, and each token is mapped to a vector by lookup in a word embedding table. At each layer, each token is then contextualized with the other (unmasked) tokens within the context window through a parallel multi-head attention mechanism, which amplifies the signal for key tokens and diminishes less important ones.
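The embedding lookup and attention step described above can be sketched in plain Java. This is a minimal, hypothetical illustration of a single attention head with a causal mask (as used by GPT-style decoders), not Smile's implementation: the class and method names are assumptions, the learned query/key/value projections are omitted (the embeddings are used directly as Q, K and V), and a real multi-head layer runs several such heads in parallel on separate projections and concatenates their outputs.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of embedding lookup plus one causal self-attention
 * head. Not Smile's code: projections, multiple heads and layers are omitted.
 */
final class AttentionSketch {

    /** Looks up one row of the embedding table per token id. */
    static double[][] embed(int[] tokens, double[][] table) {
        double[][] x = new double[tokens.length][];
        for (int i = 0; i < tokens.length; i++) {
            x[i] = table[tokens[i]].clone();
        }
        return x;
    }

    /**
     * Scaled dot-product attention with a causal mask:
     * position i may only attend to positions j <= i.
     */
    static double[][] causalSelfAttention(double[][] x) {
        int n = x.length, d = x[0].length;
        double scale = 1.0 / Math.sqrt(d);
        double[][] out = new double[n][d];
        for (int i = 0; i < n; i++) {
            // Scores against the unmasked (earlier or same) positions only.
            double[] scores = new double[i + 1];
            double max = Double.NEGATIVE_INFINITY;
            for (int j = 0; j <= i; j++) {
                double dot = 0.0;
                for (int k = 0; k < d; k++) dot += x[i][k] * x[j][k];
                scores[j] = dot * scale;
                max = Math.max(max, scores[j]);
            }
            // Numerically stable softmax over the unmasked scores.
            double sum = 0.0;
            for (int j = 0; j <= i; j++) {
                scores[j] = Math.exp(scores[j] - max);
                sum += scores[j];
            }
            // Output row i is the attention-weighted average of the value vectors.
            for (int j = 0; j <= i; j++) {
                double w = scores[j] / sum;
                for (int k = 0; k < d; k++) out[i][k] += w * x[j][k];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical 4-token vocabulary with 2-dimensional embeddings.
        double[][] table = { {1, 0}, {0, 1}, {1, 1}, {0, 0} };
        double[][] x = embed(new int[] {2, 0, 1}, table);
        for (double[] row : causalSelfAttention(x)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Because position i only sees positions 0 through i, the outputs for earlier tokens do not change when later tokens change. Dropping the mask, so that every position attends to all others, gives the bidirectional attention used by BERT-style encoders.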

This architecture is now used not only in natural language processing and computer vision, but also in audio and multi-modal processing. It has also led to the development of pre-trained systems, such as GPTs (Generative Pre-trained Transformers) and BERT (Bidirectional Encoder Representations from Transformers).

Nested Class Summary

Nested Classes

| Modifier and Type | Class | Description |
|---|---|---|
| static final record | Transformer.Options | Transformer architecture configuration. |

Constructor Summary

Constructors

| Constructor | Description |
|---|---|
| Transformer(int numTokens) | Creates a Transformer model with default architecture configuration. |
| Transformer(Transformer.Options options) | Creates a Transformer model with custom architecture configuration. |

Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| Tensor | forward(Tensor source) | Forward propagation (or forward pass). |
| void | init() | Initializes the model weights. |
| Transformer | to(Device device) | Moves the model to a device. |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- Transformer
  
  public Transformer(int numTokens)
  
  Creates a Transformer model with default architecture configuration.
  
  Parameters:
  
  numTokens - the number of tokens in the vocabulary.
- Transformer
  
  public Transformer(Transformer.Options options)
  
  Creates a Transformer model with custom architecture configuration.
  
  Parameters:
  
  options - Transformer architecture configuration.
Method Details
- init
  
  public void init()
  
  Initializes the model weights.
- forward
  
  public Tensor forward(Tensor source)
  
  Forward propagation (or forward pass).
  
  Parameters:
  
  source - the source sequence.
  
  Returns:
  
  the log probabilities of the prediction.
- to
  
  public Transformer to(Device device)
  
  Moves the model to a device.
  
  Parameters:
  
  device - the compute device.
  
  Returns:
  
  this model.
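
The forward method is documented as returning log probabilities. Assuming these are log-softmax-normalized scores over the numTokens vocabulary entries (an assumption; the normalization axis and output shape are not stated on this page), the following plain-Java sketch shows what such values are and how the most likely next token could be chosen greedily. LogSoftmaxSketch and its methods are hypothetical helpers, not part of Smile.

```java
/**
 * Hypothetical helpers illustrating log-softmax and greedy token choice.
 * Not part of Smile's API.
 */
final class LogSoftmaxSketch {

    /** Converts raw scores (logits) to log probabilities. */
    static double[] logSoftmax(double[] logits) {
        // Subtract the maximum logit so that exp() cannot overflow.
        double max = Double.NEGATIVE_INFINITY;
        for (double z : logits) max = Math.max(max, z);
        double sum = 0.0;
        for (double z : logits) sum += Math.exp(z - max);
        double logSumExp = max + Math.log(sum);
        double[] out = new double[logits.length];
        for (int i = 0; i < logits.length; i++) out[i] = logits[i] - logSumExp;
        return out;
    }

    /** Index of the most probable token (greedy decoding). */
    static int argmax(double[] logProbs) {
        int best = 0;
        for (int i = 1; i < logProbs.length; i++) {
            if (logProbs[i] > logProbs[best]) best = i;
        }
        return best;
    }

    public static void main(String[] args) {
        double[] logProbs = logSoftmax(new double[] {2.0, 1.0, 0.1});
        double total = 0.0;
        for (double lp : logProbs) total += Math.exp(lp);
        System.out.println("sum of probabilities = " + total);
        System.out.println("most likely token id = " + argmax(logProbs));
    }
}
```

Working in log space avoids underflow when many small probabilities are multiplied, which is one common reason models return log probabilities rather than probabilities.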