smile.regression.GLM

All Implemented Interfaces: Serializable, ToDoubleFunction<Tuple>, DataFrameRegression, Regression<Tuple>

public class GLM extends Object implements DataFrameRegression, Serializable

Generalized linear models. The generalized linear model (GLM) is a flexible generalization of ordinary linear regression that allows for response variables that have error distribution models other than a normal distribution. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value.

In GLM, each outcome Y of the dependent variables is assumed to be generated from a particular distribution in an exponential family. The mean, μ, of the distribution depends on the independent variables, X, through:

E(Y) = μ = g^-1(Xβ)

where E(Y) is the expected value of Y; Xβ is the linear predictor, a linear combination of the independent variables X weighted by the unknown parameters β; and g is the link function, which is monotonic and differentiable. The link function that transforms the mean to the natural parameter is called the canonical link.
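
As a minimal sketch of the relation E(Y) = μ = g^-1(Xβ), the following self-contained Java computes the linear predictor for one observation and maps it through the inverse of two common links. The names `LinkSketch`, `linearPredictor`, `inverseLogit`, and `inverseLog` are illustrative and are not part of the Smile API. Storing the intercept as the last element of β is an assumption taken from the `coefficients()` description on this page.

```java
/** Illustrative only: these names are not part of smile.regression.GLM. */
final class LinkSketch {
    /** Linear predictor eta = x . beta[0..p-1] + beta[p], with the intercept assumed to be stored last. */
    static double linearPredictor(double[] x, double[] beta) {
        if (beta.length != x.length + 1) {
            throw new IllegalArgumentException("beta must have x.length + 1 elements");
        }
        double eta = beta[x.length]; // intercept
        for (int i = 0; i < x.length; i++) {
            eta += x[i] * beta[i];
        }
        return eta;
    }

    /** Inverse of the logit link g(mu) = log(mu / (1 - mu)). */
    static double inverseLogit(double eta) {
        return 1.0 / (1.0 + Math.exp(-eta));
    }

    /** Inverse of the log link g(mu) = log(mu). */
    static double inverseLog(double eta) {
        return Math.exp(eta);
    }

    public static void main(String[] args) {
        double[] beta = {0.5, -1.0, 0.25}; // two slopes, then the intercept
        double[] x = {2.0, 1.0};
        double eta = linearPredictor(x, beta); // 0.5*2 - 1.0*1 + 0.25 = 0.25
        System.out.println("eta = " + eta);
        System.out.println("mean under logit link = " + inverseLogit(eta));
        System.out.println("mean under log link   = " + inverseLog(eta));
    }
}
```

The logit inverse keeps the mean in (0, 1), which suits a Bernoulli response, while the log inverse keeps it positive, which suits a Poisson response.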

In this framework, the variance is typically a function, V, of the mean:

Var(Y) = V(μ) = V(g^-1(Xβ))

It is convenient if V follows from an exponential family of distributions, but it may simply be that the variance is a function of the predicted value, such as V(μ_i) = μ_i for the Poisson, V(μ_i) = μ_i(1 - μ_i) for the Bernoulli, and V(μ_i) = σ² (i.e., constant) for the normal.
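
The three variance functions named above can be written down directly. This is a small sketch for illustration; `VarianceSketch` and its members are hypothetical names, not Smile classes.

```java
import java.util.function.DoubleUnaryOperator;

/** Illustrative variance functions V(mu); not part of the Smile API. */
final class VarianceSketch {
    /** Poisson: V(mu) = mu. */
    static final DoubleUnaryOperator POISSON = mu -> mu;

    /** Bernoulli: V(mu) = mu * (1 - mu). */
    static final DoubleUnaryOperator BERNOULLI = mu -> mu * (1.0 - mu);

    /** Normal: V(mu) = sigma^2, constant in mu. */
    static DoubleUnaryOperator normal(double sigma2) {
        return mu -> sigma2;
    }

    public static void main(String[] args) {
        System.out.println("Poisson   V(3.0)  = " + POISSON.applyAsDouble(3.0));
        System.out.println("Bernoulli V(0.25) = " + BERNOULLI.applyAsDouble(0.25));
        System.out.println("Normal    V(3.0)  = " + normal(2.0).applyAsDouble(3.0));
    }
}
```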

The unknown parameters, β, are typically estimated with maximum likelihood, maximum quasi-likelihood, or Bayesian techniques.

Nested Class Summary

Nested Classes

| Modifier and Type | Class | Description |
| --- | --- | --- |
| static final record | GLM.Options | GLM hyperparameters. |

Nested classes/interfaces inherited from interface DataFrameRegression

DataFrameRegression.Trainer<M>

Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| GLM(Formula formula, StructType schema, Model model, double[] beta, double logLikelihood, double deviance, double nullDeviance, double[] mu, double[] residuals, double[][] ztest) | Constructor. |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| double | AIC() | Returns the AIC score. |
| double | BIC() | Returns the BIC score. |
| double[] | coefficients() | Returns an array of size (p+1) containing the linear weights of the model, where p is the dimension of feature vectors. |
| double | deviance() | Returns the deviance of the model. |
| double[] | devianceResiduals() | Returns the deviance residuals. |
| static GLM | fit(Formula formula, DataFrame data, Model model) | Fits the generalized linear model with IWLS (iteratively reweighted least squares). |
| static GLM | fit(Formula formula, DataFrame data, Model model, GLM.Options options) | Fits the generalized linear model with IWLS (iteratively reweighted least squares). |
| double[] | fittedValues() | Returns the fitted mean values. |
| Formula | formula() | Returns the model formula. |
| double | logLikelihood() | Returns the log-likelihood of the model. |
| double[] | predict(DataFrame data) | Predicts the dependent variables of a data frame. |
| double | predict(Tuple x) | Predicts the dependent variable of an instance. |
| StructType | schema() | Returns the schema of predictors. |
| String | toString() | |
| double[][] | ztest() | Returns the z-test of the coefficients (including the intercept). |

Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface Regression
applyAsDouble, online, predict, predict, predict, update, update, update

Constructor Details
- GLM
  
  public GLM(Formula formula, StructType schema, Model model, double[] beta, double logLikelihood, double deviance, double nullDeviance, double[] mu, double[] residuals, double[][] ztest)
  
  Constructor.
  
  Parameters:
  
  formula - the model formula.
  
  schema - the schema of the design matrix.
  
  model - the generalized linear model specification.
  
  beta - the linear weights.
  
  logLikelihood - the log-likelihood.
  
  deviance - the deviance.
  
  nullDeviance - the null deviance.
  
  mu - the fitted mean values.
  
  residuals - the residuals of fitted values of training data.
  
  ztest - the z-test of the coefficients.
Method Details
- coefficients
  
  public double[] coefficients()
  
  Returns an array of size (p+1) containing the linear weights of the model, where p is the dimension of feature vectors. The last element is the weight of the bias (intercept) term.
  
  Returns:
  
  the linear weights.
- ztest
  
  public double[][] ztest()
  
  Returns the z-test of the coefficients (including the intercept). Each row corresponds to one coefficient: the first column is the coefficient estimate, the second is its standard error, the third is the z-score for testing the hypothesis that the coefficient is zero, and the fourth is the p-value of that test. The last row is for the intercept.
  
  Returns:
  
  the z-test of the coefficients.
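
  To make the four columns concrete, here is a sketch of how one row could be derived from an estimate and its standard error under the usual Wald z-test, with a two-sided p-value 2(1 - Φ(|z|)). Whether Smile computes the p-value exactly this way is an assumption. `ZTestSketch` is a hypothetical name. Because the Java standard library has no error function, the normal tail uses the Abramowitz and Stegun 7.1.26 approximation, which is accurate to roughly 1.5e-7.

  ```java
  /** Illustrative only: derives {estimate, standard error, z-score, p-value} for one coefficient. */
  final class ZTestSketch {
      /** Two-sided p-value 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)), approximated per A&S 7.1.26. */
      static double twoSidedPValue(double z) {
          double x = Math.abs(z) / Math.sqrt(2.0);
          double t = 1.0 / (1.0 + 0.3275911 * x);
          double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                  + t * (-1.453152027 + t * 1.061405429))));
          return poly * Math.exp(-x * x);
      }

      /** Builds one row in the column order described above. */
      static double[] row(double estimate, double standardError) {
          double z = estimate / standardError;
          return new double[] {estimate, standardError, z, twoSidedPValue(z)};
      }

      public static void main(String[] args) {
          double[] r = row(0.98, 0.5); // z = 1.96, p-value close to 0.05
          System.out.printf("estimate=%.3f se=%.3f z=%.3f p=%.5f%n", r[0], r[1], r[2], r[3]);
      }
  }
  ```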
- devianceResiduals
  
  public double[] devianceResiduals()
  
  Returns the deviance residuals.
  
  Returns:
  
  the deviance residuals.
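
  As an illustration of what a deviance residual is, the sketch below uses the standard Poisson form d = sign(y - μ) · sqrt(2 [y ln(y/μ) - (y - μ)]), with y ln(y/μ) taken as 0 when y = 0. Other families have different unit deviances, and this is not Smile's code. `DevianceSketch` is a hypothetical name.

  ```java
  /** Illustrative Poisson deviance residuals; not part of the Smile API. */
  final class DevianceSketch {
      /** Signed square root of the Poisson unit deviance for one observation. */
      static double poissonDevianceResidual(double y, double mu) {
          double term = (y > 0.0) ? y * Math.log(y / mu) : 0.0;
          double unit = 2.0 * (term - (y - mu)); // non-negative in exact arithmetic
          return Math.signum(y - mu) * Math.sqrt(Math.max(unit, 0.0));
      }

      /** The deviance is the sum of squared deviance residuals. */
      static double deviance(double[] y, double[] mu) {
          double sum = 0.0;
          for (int i = 0; i < y.length; i++) {
              double d = poissonDevianceResidual(y[i], mu[i]);
              sum += d * d;
          }
          return sum;
      }

      public static void main(String[] args) {
          double[] y = {0.0, 2.0, 3.0};
          double[] mu = {1.0, 2.0, 1.0};
          for (int i = 0; i < y.length; i++) {
              System.out.println("d[" + i + "] = " + poissonDevianceResidual(y[i], mu[i]));
          }
          System.out.println("deviance = " + deviance(y, mu));
      }
  }
  ```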
- fittedValues
  
  public double[] fittedValues()
  
  Returns the fitted mean values.
  
  Returns:
  
  the fitted mean values.
- deviance
  
  public double deviance()
  
  Returns the deviance of the model.
  
  Returns:
  
  the deviance of the model.
- logLikelihood
  
  public double logLikelihood()
  
  Returns the log-likelihood of the model.
  
  Returns:
  
  the log-likelihood of the model.
- AIC
  
  public double AIC()
  
  Returns the AIC score.
  
  Returns:
  
  the AIC score.
- BIC
  
  public double BIC()
  
  Returns the BIC score.
  
  Returns:
  
  the BIC score.
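
  For reference, the textbook definitions of both scores are AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where ln L is the log-likelihood, k is the number of estimated parameters, and n is the number of observations. The sketch below applies these formulas. How Smile counts k (for example, whether a dispersion parameter is included) is not stated on this page, so treat the parameter count here as an assumption. `InformationCriteriaSketch` is a hypothetical name.

  ```java
  /** Textbook AIC and BIC from a log-likelihood; not Smile's implementation. */
  final class InformationCriteriaSketch {
      /** AIC = 2k - 2 ln L. */
      static double aic(double logLikelihood, int k) {
          return 2.0 * k - 2.0 * logLikelihood;
      }

      /** BIC = k ln(n) - 2 ln L. */
      static double bic(double logLikelihood, int k, int n) {
          return k * Math.log(n) - 2.0 * logLikelihood;
      }

      public static void main(String[] args) {
          double logLikelihood = -100.0; // made-up value for illustration
          int k = 3;                     // e.g. two slopes plus an intercept
          int n = 50;
          System.out.println("AIC = " + aic(logLikelihood, k));
          System.out.println("BIC = " + bic(logLikelihood, k, n));
      }
  }
  ```

  Lower values indicate a better trade-off between fit and complexity; BIC penalizes additional parameters more heavily than AIC once n exceeds about 7, since ln(n) is then greater than 2.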
- formula
  
  public Formula formula()
  
  Description copied from interface: DataFrameRegression
  
  Returns the model formula.
  
  Specified by:
  
  formula in interface DataFrameRegression
  
  Returns:
  
  the model formula.
- schema
  
  public StructType schema()
  
  Description copied from interface: DataFrameRegression
  
  Returns the schema of predictors.
  
  Specified by:
  
  schema in interface DataFrameRegression
  
  Returns:
  
  the schema of predictors.
- predict
  
  public double predict(Tuple x)
  
  Description copied from interface: Regression
  
  Predicts the dependent variable of an instance.
  
  Specified by:
  
  predict in interface Regression<Tuple>
  
  Parameters:
  
  x - an instance.
  
  Returns:
  
  the predicted value of the dependent variable.
- predict
  
  public double[] predict(DataFrame data)
  
  Description copied from interface: DataFrameRegression
  
  Predicts the dependent variables of a data frame.
  
  Specified by:
  
  predict in interface DataFrameRegression
  
  Parameters:
  
  data - the data frame.
  
  Returns:
  
  the predicted values.
- toString
  
  public String toString()
  
  Overrides:
  
  toString in class Object
- fit
  
  public static GLM fit(Formula formula, DataFrame data, Model model)
  
  Fits the generalized linear model with IWLS (iteratively reweighted least squares).
  
  Parameters:
  
  formula - a symbolic description of the model to be fitted.
  
  data - the data frame of the explanatory and response variables.
  
  model - the generalized linear model specification.
  
  Returns:
  
  the model.
- fit
  
  public static GLM fit(Formula formula, DataFrame data, Model model, GLM.Options options)
  
  Fits the generalized linear model with IWLS (iteratively reweighted least squares).
  
  Parameters:
  
  formula - a symbolic description of the model to be fitted.
  
  data - the data frame of the explanatory and response variables.
  
  model - the generalized linear model specification.
  
  options - the hyperparameters.
  
  Returns:
  
  the model.
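
Both `fit` methods are documented as using IWLS. To show the idea, here is a minimal, self-contained sketch of IWLS for a Poisson response with a log link, one predictor, and an intercept. It is not Smile's implementation: the starting values, the convergence test on the coefficient change, and the names `IwlsSketch` and `fitPoissonLog` are assumptions made for illustration. Each iteration forms the working response z = η + (y - μ)/μ and the weight w = μ, then solves the 2x2 weighted least squares problem in closed form.

```java
/** Illustrative IWLS for log(mu) = slope * x + intercept; not part of the Smile API. */
final class IwlsSketch {
    /** Returns {slope, intercept}, with the intercept last. Requires mean(y) > 0. */
    static double[] fitPoissonLog(double[] x, double[] y, int maxIter, double tol) {
        int n = x.length;
        double mean = 0.0;
        for (double v : y) {
            mean += v / n;
        }
        double b0 = Math.log(mean); // start from the intercept-only fit
        double b1 = 0.0;

        for (int iter = 0; iter < maxIter; iter++) {
            double sw = 0, sx = 0, sxx = 0, sz = 0, sxz = 0;
            for (int i = 0; i < n; i++) {
                double eta = b0 + b1 * x[i];
                double mu = Math.exp(eta);
                double w = mu;                     // (dmu/deta)^2 / V(mu) = mu^2 / mu
                double z = eta + (y[i] - mu) / mu; // working response
                sw += w;
                sx += w * x[i];
                sxx += w * x[i] * x[i];
                sz += w * z;
                sxz += w * x[i] * z;
            }
            // Solve the weighted normal equations [sw sx; sx sxx] [b0; b1] = [sz; sxz].
            double det = sw * sxx - sx * sx;
            double newB0 = (sxx * sz - sx * sxz) / det;
            double newB1 = (sw * sxz - sx * sz) / det;
            double change = Math.abs(newB0 - b0) + Math.abs(newB1 - b1);
            b0 = newB0;
            b1 = newB1;
            if (change < tol) {
                break;
            }
        }
        return new double[] {b1, b0};
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 2, 3, 4};
        double[] y = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            y[i] = Math.exp(0.5 + 0.3 * x[i]); // noise-free means, so the fit should recover 0.3 and 0.5
        }
        double[] beta = fitPoissonLog(x, y, 25, 1e-10);
        System.out.println("slope = " + beta[0] + ", intercept = " + beta[1]);
    }
}
```

For the canonical log link, each IWLS step coincides with a Newton step on the Poisson log-likelihood, which is why the iteration typically settles within a handful of passes on well-behaved data.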
