Class GLM

java.lang.Object
smile.regression.GLM
All Implemented Interfaces:
Serializable, ToDoubleFunction<Tuple>, DataFrameRegression, Regression<Tuple>

public class GLM extends Object implements DataFrameRegression, Serializable
Generalized linear models. The generalized linear model (GLM) is a flexible generalization of ordinary linear regression that allows for response variables that have error distribution models other than a normal distribution. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value.

In GLM, each outcome Y of the dependent variables is assumed to be generated from a particular distribution in an exponential family. The mean, μ, of the distribution depends on the independent variables, X, through:

$$E(Y) = \mu = g^{-1}(X\beta)$$

where E(Y) is the expected value of Y; Xβ is the linear predictor, a linear combination of the independent variables weighted by the unknown parameters β; and g is the link function, a monotonic, differentiable function. The link function that transforms the mean to the natural parameter of the distribution is called the canonical link.
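As a minimal sketch of the mean relation above, the following Java class implements the logit link g(μ) = log(μ / (1 - μ)), which is the canonical link of the Bernoulli distribution, together with its inverse g⁻¹(η) = 1 / (1 + exp(-η)). The class `LogitLink` and its methods are hypothetical illustrations, not part of the Smile API.

```java
/**
 * Hypothetical helper illustrating the logit link, the canonical link of the
 * Bernoulli distribution. Not part of the Smile API.
 */
final class LogitLink {
    private LogitLink() {}

    /** Link g(mu) = log(mu / (1 - mu)): maps a mean in (0, 1) to the real line. */
    static double link(double mu) {
        return Math.log(mu / (1.0 - mu));
    }

    /** Inverse link g^{-1}(eta) = 1 / (1 + exp(-eta)): maps a linear predictor to a mean. */
    static double inverse(double eta) {
        return 1.0 / (1.0 + Math.exp(-eta));
    }

    /** Mean mu = g^{-1}(x . beta) for a single observation x. */
    static double mean(double[] x, double[] beta) {
        double eta = 0.0;
        for (int j = 0; j < x.length; j++) {
            eta += x[j] * beta[j];
        }
        return inverse(eta);
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0};      // the leading 1.0 plays the role of an intercept column here
        double[] beta = {-1.0, 0.5};  // eta = -1.0 + 0.5 * 2.0 = 0.0
        System.out.println(mean(x, beta));        // prints 0.5
        System.out.println(link(inverse(1.3)));   // approximately 1.3 (round trip)
    }
}
```

Because g is monotonic, `link` and `inverse` undo each other, so any linear predictor maps to a valid mean in (0, 1).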

In this framework, the variance is typically a function, V, of the mean:

$$\operatorname{Var}(Y) = V(\mu) = V(g^{-1}(X\beta))$$

It is convenient if V follows from an exponential family of distributions, but it may simply be that the variance is a function of the predicted value, such as $V(\mu_i) = \mu_i$ for the Poisson, $V(\mu_i) = \mu_i(1 - \mu_i)$ for the Bernoulli, and $V(\mu_i) = \sigma^2$ (i.e., constant) for the normal.
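The three variance functions in the preceding paragraph can be written as plain Java functions of the mean. This is a hypothetical sketch (`VarianceFunctions` is not a Smile class), and the normal case takes σ² as an assumed known constant.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical illustration of the variance functions V(mu) for three families. */
final class VarianceFunctions {
    private VarianceFunctions() {}

    /** Poisson: the variance equals the mean. */
    static final DoubleUnaryOperator POISSON = mu -> mu;

    /** Bernoulli: V(mu) = mu (1 - mu), largest at mu = 0.5. */
    static final DoubleUnaryOperator BERNOULLI = mu -> mu * (1.0 - mu);

    /** Normal: constant variance sigma^2, independent of mu. */
    static DoubleUnaryOperator normal(double sigma2) {
        return mu -> sigma2;
    }

    public static void main(String[] args) {
        System.out.println(POISSON.applyAsDouble(3.0));      // prints 3.0
        System.out.println(BERNOULLI.applyAsDouble(0.25));   // prints 0.1875
        System.out.println(normal(2.0).applyAsDouble(10.0)); // prints 2.0
    }
}
```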

The unknown parameters, β, are typically estimated with maximum likelihood, maximum quasi-likelihood, or Bayesian techniques.

  • Constructor Details

    • GLM

      public GLM(Formula formula, StructType schema, Model model, double[] beta, double logLikelihood, double deviance, double nullDeviance, double[] mu, double[] residuals, double[][] ztest)
      Constructor.
      Parameters:
      formula - the model formula.
      schema - the schema of the design matrix.
      model - the generalized linear model specification.
      beta - the linear weights.
      logLikelihood - the log-likelihood.
      deviance - the deviance.
      nullDeviance - the null deviance.
      mu - the fitted mean values.
      residuals - the residuals of the fitted values on the training data.
      ztest - the z-test of the coefficients.
  • Method Details

    • coefficients

      public double[] coefficients()
      Returns an array of size (p+1) containing the linear weights of the model, where p is the dimension of the feature vectors. The last element is the weight of the bias (intercept).
      Returns:
      the linear weights.
    • ztest

      public double[][] ztest()
      Returns the z-test of the coefficients (including the intercept). The first column is the coefficients, the second column is the standard errors of the coefficients, the third column is the z-score of the test of the null hypothesis that the coefficient is zero, and the fourth column is the p-value of the test. The last row corresponds to the intercept.
      Returns:
      the z-test of the coefficients.
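A sketch of how one row of such a table could be computed from a coefficient and its standard error, assuming the usual Wald z-test with a two-sided p-value from the standard normal distribution. The class `ZTestRow` is hypothetical. Because the Java standard library has no `erf`, the complementary error function is approximated with Abramowitz and Stegun formula 7.1.26 (absolute error below 1.5e-7).

```java
/**
 * Hypothetical sketch of one z-test row: coefficient, standard error,
 * z-score and two-sided p-value under H0: coefficient = 0.
 */
final class ZTestRow {
    private ZTestRow() {}

    /** erfc(x) for x >= 0, Abramowitz and Stegun 7.1.26 (absolute error < 1.5e-7). */
    static double erfc(double x) {
        double t = 1.0 / (1.0 + 0.3275911 * x);
        // Horner form of a1 t + a2 t^2 + a3 t^3 + a4 t^4 + a5 t^5.
        double poly = t * (0.254829592
                + t * (-0.284496736
                + t * (1.421413741
                + t * (-1.453152027
                + t * 1.061405429))));
        return poly * Math.exp(-x * x);
    }

    /** Returns {coefficient, standard error, z-score, two-sided p-value}. */
    static double[] row(double coef, double stdErr) {
        double z = coef / stdErr;
        // Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
        double p = erfc(Math.abs(z) / Math.sqrt(2.0));
        return new double[] {coef, stdErr, z, p};
    }

    public static void main(String[] args) {
        double[] r = row(0.98, 0.5); // z = 1.96, so p should be close to 0.05
        System.out.printf("coef=%.3f se=%.3f z=%.3f p=%.4f%n", r[0], r[1], r[2], r[3]);
    }
}
```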
    • devianceResiduals

      public double[] devianceResiduals()
      Returns the deviance residuals.
      Returns:
      the deviance residuals.
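For illustration, deviance residuals under the standard textbook definition r_i = sign(y_i - μ_i) √d_i, where d_i is the unit deviance of observation i, can be sketched for a Poisson model. The reference above does not say how Smile computes them, so `PoissonDevianceResiduals` is a hypothetical helper and the Poisson family is an assumption chosen for concreteness.

```java
/**
 * Hypothetical sketch of Poisson deviance residuals (textbook definition,
 * not Smile's implementation).
 */
final class PoissonDevianceResiduals {
    private PoissonDevianceResiduals() {}

    /** Unit deviance d(y, mu) = 2 [y log(y / mu) - (y - mu)], with 0 log 0 taken as 0. */
    static double unitDeviance(double y, double mu) {
        double ylogy = y > 0.0 ? y * Math.log(y / mu) : 0.0;
        // Clamp tiny negative values caused by rounding when y is very close to mu.
        return Math.max(0.0, 2.0 * (ylogy - (y - mu)));
    }

    /** Deviance residual r_i = sign(y_i - mu_i) * sqrt(d(y_i, mu_i)). */
    static double[] residuals(double[] y, double[] mu) {
        double[] r = new double[y.length];
        for (int i = 0; i < y.length; i++) {
            r[i] = Math.signum(y[i] - mu[i]) * Math.sqrt(unitDeviance(y[i], mu[i]));
        }
        return r;
    }

    public static void main(String[] args) {
        double[] y = {0.0, 2.0, 5.0};   // observed counts
        double[] mu = {1.0, 2.0, 3.0};  // illustrative fitted means
        for (double r : residuals(y, mu)) {
            System.out.println(r);
        }
    }
}
```

The sign carries the direction of the misfit, and the squared residuals add up to the model deviance.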
    • fittedValues

      public double[] fittedValues()
      Returns the fitted mean values.
      Returns:
      the fitted mean values.
    • deviance

      public double deviance()
      Returns the deviance of the model.
      Returns:
      the deviance of the model.
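The deviance compares the fitted model with the saturated model, D = 2(log L_saturated - log L_model). The minimal sketch below assumes a Bernoulli model with 0/1 responses, for which the saturated log-likelihood is zero, and also computes the null deviance (the deviance of the intercept-only model, whose fitted mean is the sample mean), the quantity passed to the constructor as `nullDeviance`. The class `BernoulliDeviance` is hypothetical.

```java
/**
 * Hypothetical sketch (not Smile's code): deviance and null deviance of a
 * Bernoulli GLM with 0/1 responses.
 */
final class BernoulliDeviance {
    private BernoulliDeviance() {}

    /** Bernoulli log-likelihood; assumes every y[i] is exactly 0.0 or 1.0. */
    static double logLikelihood(double[] y, double[] mu) {
        double ll = 0.0;
        for (int i = 0; i < y.length; i++) {
            ll += (y[i] == 1.0) ? Math.log(mu[i]) : Math.log(1.0 - mu[i]);
        }
        return ll;
    }

    /** D = 2 (logL_saturated - logL_model); the saturated logL is 0 for 0/1 data. */
    static double deviance(double[] y, double[] mu) {
        return -2.0 * logLikelihood(y, mu);
    }

    /** Null deviance: deviance of the intercept-only model, whose fitted mean is mean(y). */
    static double nullDeviance(double[] y) {
        double mean = 0.0;
        for (double v : y) {
            mean += v;
        }
        mean /= y.length;
        double[] mu = new double[y.length];
        for (int i = 0; i < mu.length; i++) {
            mu[i] = mean;
        }
        return deviance(y, mu);
    }

    public static void main(String[] args) {
        double[] y = {0.0, 1.0, 1.0, 0.0};
        double[] mu = {0.2, 0.7, 0.9, 0.4}; // illustrative fitted means
        System.out.println("deviance      = " + deviance(y, mu));
        System.out.println("null deviance = " + nullDeviance(y));
    }
}
```

A useful fit has a deviance well below the null deviance; the gap measures how much the predictors explain.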
    • logLikelihood

      public double logLikelihood()
      Returns the log-likelihood of the model.
      Returns:
      the log-likelihood of the model.
    • AIC

      public double AIC()
      Returns the AIC (Akaike information criterion) score.
      Returns:
      the AIC score.
    • BIC

      public double BIC()
      Returns the BIC (Bayesian information criterion) score.
      Returns:
      the BIC score.
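AIC and BIC are conventionally computed from the log-likelihood and the number of estimated parameters k as AIC = -2 log L + 2k and BIC = -2 log L + k ln n, where n is the number of training samples. The sketch below uses these textbook formulas. How `AIC()` and `BIC()` count k is not stated here, so treating k as the number of coefficients including the intercept is an assumption, and `InformationCriteria` is a hypothetical class.

```java
/**
 * Hypothetical sketch of the textbook AIC and BIC formulas. The parameter count k
 * is an assumption (number of coefficients, intercept included).
 */
final class InformationCriteria {
    private InformationCriteria() {}

    /** Akaike information criterion: -2 logL + 2k. */
    static double aic(double logLikelihood, int k) {
        return -2.0 * logLikelihood + 2.0 * k;
    }

    /** Bayesian (Schwarz) information criterion: -2 logL + k ln(n). */
    static double bic(double logLikelihood, int k, int n) {
        return -2.0 * logLikelihood + k * Math.log(n);
    }

    public static void main(String[] args) {
        double logL = -120.5; // illustrative value, not from a real fit
        int k = 4;            // e.g. three predictors plus an intercept
        int n = 200;          // number of training samples
        System.out.println(aic(logL, k));    // prints 249.0
        System.out.println(bic(logL, k, n)); // larger than AIC because ln(200) > 2
    }
}
```

Lower values are better for both criteria; BIC penalizes extra parameters more heavily than AIC once n ≥ 8.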
    • formula

      public Formula formula()
      Description copied from interface: DataFrameRegression
      Returns the model formula.
      Specified by:
      formula in interface DataFrameRegression
      Returns:
      the model formula.
    • schema

      public StructType schema()
      Description copied from interface: DataFrameRegression
      Returns the schema of predictors.
      Specified by:
      schema in interface DataFrameRegression
      Returns:
      the schema of predictors.
    • predict

      public double predict(Tuple x)
      Description copied from interface: Regression
      Predicts the dependent variable of an instance.
      Specified by:
      predict in interface Regression<Tuple>
      Parameters:
      x - an instance.
      Returns:
      the predicted value of dependent variable.
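The following hypothetical sketch shows how a prediction could be formed from a coefficient array laid out as described for `coefficients()` (p feature weights followed by the bias), assuming a log link as in Poisson regression, so that the prediction is μ = exp(x·w + bias). It works on a plain `double[]` rather than Smile's `Tuple` and is not Smile's implementation.

```java
/**
 * Hypothetical sketch: prediction from a coefficient array whose last element
 * is the bias, with a log link (mu = exp(eta)). Not Smile's implementation.
 */
final class LogLinkPredictor {
    private LogLinkPredictor() {}

    static double predict(double[] x, double[] coefficients) {
        int p = x.length;
        if (coefficients.length != p + 1) {
            throw new IllegalArgumentException("expected " + (p + 1) + " coefficients");
        }
        double eta = coefficients[p]; // the bias is the last element
        for (int j = 0; j < p; j++) {
            eta += coefficients[j] * x[j];
        }
        return Math.exp(eta); // inverse of the log link
    }

    public static void main(String[] args) {
        double[] beta = {0.5, -0.25, 1.0}; // two feature weights, then the bias
        double[] x = {2.0, 4.0};           // eta = 1.0 + 0.5 * 2.0 - 0.25 * 4.0 = 1.0
        System.out.println(predict(x, beta)); // exp(1.0), about 2.71828
    }
}
```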
    • predict

      public double[] predict(DataFrame data)
      Description copied from interface: DataFrameRegression
      Predicts the dependent variables of a data frame.
      Specified by:
      predict in interface DataFrameRegression
      Parameters:
      data - the data frame.
      Returns:
      the predicted values.
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • fit

      public static GLM fit(Formula formula, DataFrame data, Model model)
      Fits the generalized linear model with IWLS (iteratively reweighted least squares).
      Parameters:
      formula - a symbolic description of the model to be fitted.
      data - the data frame of the explanatory and response variables.
      model - the generalized linear model specification.
      Returns:
      the model.
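IWLS repeatedly solves a weighted least-squares problem with weights w_i = 1 / (V(μ_i) g'(μ_i)²) and working responses z_i = η_i + (y_i - μ_i) g'(μ_i), then updates the fitted means. A minimal sketch for logistic regression (Bernoulli response, logit link) with one predictor and an intercept, where w_i reduces to μ_i(1 - μ_i), is shown below. `LogisticIwls` is hypothetical; it omits safeguards such as step-halving or singularity checks, and its iteration limit and tolerance are assumed parameters rather than the contents of `GLM.Options`.

```java
/**
 * Hypothetical, minimal IWLS sketch for logistic regression with a single
 * predictor plus an intercept. Not Smile's implementation.
 */
final class LogisticIwls {
    private LogisticIwls() {}

    /** Returns {intercept, slope}. */
    static double[] fit(double[] x, double[] y, int maxIter, double tol) {
        double b0 = 0.0, b1 = 0.0;
        for (int iter = 0; iter < maxIter; iter++) {
            // Accumulate X^T W X (2x2, symmetric) and X^T W z (2-vector).
            double s00 = 0, s01 = 0, s11 = 0, t0 = 0, t1 = 0;
            for (int i = 0; i < x.length; i++) {
                double eta = b0 + b1 * x[i];
                double mu = 1.0 / (1.0 + Math.exp(-eta));
                // IWLS weight 1 / (V(mu) g'(mu)^2), which equals mu(1 - mu) for the logit link.
                double w = mu * (1.0 - mu);
                double z = eta + (y[i] - mu) / w; // working response
                s00 += w;
                s01 += w * x[i];
                s11 += w * x[i] * x[i];
                t0 += w * z;
                t1 += w * x[i] * z;
            }
            // Solve the 2x2 weighted least-squares system by Cramer's rule.
            double det = s00 * s11 - s01 * s01;
            double nb0 = (s11 * t0 - s01 * t1) / det;
            double nb1 = (s00 * t1 - s01 * t0) / det;
            double change = Math.abs(nb0 - b0) + Math.abs(nb1 - b1);
            b0 = nb0;
            b1 = nb1;
            if (change < tol) {
                break;
            }
        }
        return new double[] {b0, b1};
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 2, 3, 4, 5, 6, 7};
        double[] y = {0, 0, 1, 0, 1, 0, 1, 1}; // not perfectly separable, so the MLE exists
        double[] beta = fit(x, y, 100, 1e-10);
        System.out.printf("intercept=%.4f slope=%.4f%n", beta[0], beta[1]);
    }
}
```

At convergence the score equations Σ(y_i - μ_i) = 0 and Σ x_i(y_i - μ_i) = 0 hold, which is a convenient check on any IWLS implementation.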
    • fit

      public static GLM fit(Formula formula, DataFrame data, Model model, GLM.Options options)
      Fits the generalized linear model with IWLS (iteratively reweighted least squares).
      Parameters:
      formula - a symbolic description of the model to be fitted.
      data - the data frame of the explanatory and response variables.
      model - the generalized linear model specification.
      options - the hyperparameters.
      Returns:
      the model.