Package smile.glm

Class GLM

java.lang.Object
smile.glm.GLM
All Implemented Interfaces:
Serializable

public class GLM extends Object implements Serializable
Generalized linear models. The generalized linear model (GLM) is a flexible generalization of ordinary linear regression that allows for response variables that have error distribution models other than a normal distribution. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value.

In GLM, each outcome Y of the dependent variables is assumed to be generated from a particular distribution in an exponential family. The mean, μ, of the distribution depends on the independent variables, X, through:

E(Y) = μ = g^{-1}(Xβ)

where E(Y) is the expected value of Y; Xβ is the linear predictor, a linear combination of the predictors X and the unknown parameters β; and g is the link function, a monotonic, differentiable function. The link function that transforms the mean to the natural parameter is called the canonical link.
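
As a rough illustration of how a link function ties the linear predictor to the mean, the sketch below computes the mean response for a logit link (Bernoulli) and a log link (Poisson). The class and method names are hypothetical and are not part of smile.glm; storing the intercept as the last weight is an assumption taken from the coefficients() description on this page.

```java
import java.util.function.DoubleUnaryOperator;

/** Illustrative only: these names are hypothetical and not part of smile.glm. */
final class LinkSketch {
    /** Logit link g(mu) = log(mu / (1 - mu)); canonical for the Bernoulli. */
    static double logit(double mu) {
        return Math.log(mu / (1.0 - mu));
    }

    /** Inverse logit g^{-1}(eta) = 1 / (1 + exp(-eta)). */
    static double inverseLogit(double eta) {
        return 1.0 / (1.0 + Math.exp(-eta));
    }

    /** Linear predictor eta = x . beta plus the intercept, assumed to be the last weight. */
    static double linearPredictor(double[] x, double[] beta) {
        double eta = beta[x.length]; // intercept
        for (int j = 0; j < x.length; j++) {
            eta += x[j] * beta[j];
        }
        return eta;
    }

    /** Mean response mu = g^{-1}(eta) for a caller-supplied inverse link. */
    static double mean(double[] x, double[] beta, DoubleUnaryOperator inverseLink) {
        return inverseLink.applyAsDouble(linearPredictor(x, beta));
    }

    public static void main(String[] args) {
        double[] beta = {0.5, -1.0}; // one slope, then the intercept
        double[] x = {2.0};
        System.out.println(mean(x, beta, LinkSketch::inverseLogit)); // logit link
        System.out.println(mean(x, beta, Math::exp));                // log link
    }
}
```

Passing the inverse link as a function keeps the linear predictor code identical across distributions; only the final transformation changes.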

In this framework, the variance is typically a function, V, of the mean:

Var(Y) = V(μ) = V(g^{-1}(Xβ))

It is convenient if V follows from an exponential family of distributions, but it may simply be that the variance is a function of the predicted value, such as V(μ_i) = μ_i for the Poisson, V(μ_i) = μ_i(1 - μ_i) for the Bernoulli, and V(μ_i) = σ^2 (i.e., constant) for the normal.
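
The three variance functions named above can be written down directly. This is a minimal sketch with hypothetical names, not code from smile.glm.

```java
/** Illustrative only: variance functions V(mu) for three common families. */
final class VarianceSketch {
    /** Poisson: the variance equals the mean. */
    static double poisson(double mu) {
        return mu;
    }

    /** Bernoulli: the variance is mu * (1 - mu), largest at mu = 0.5. */
    static double bernoulli(double mu) {
        return mu * (1.0 - mu);
    }

    /** Normal: the variance is the constant sigma^2, independent of the mean. */
    static double normal(double mu, double sigma2) {
        return sigma2;
    }

    public static void main(String[] args) {
        System.out.println(poisson(3.0));
        System.out.println(bernoulli(0.5));
        System.out.println(normal(10.0, 2.0));
    }
}
```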

The unknown parameters, β, are typically estimated with maximum likelihood, maximum quasi-likelihood, or Bayesian techniques.

  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    protected final double[]
    beta
    The linear weights.
    protected final double
    deviance
    The deviance = 2 * (LogLikelihood(Saturated Model) - LogLikelihood(Proposed Model)).
    protected double[]
    devianceResiduals
    The deviance residuals.
    protected final int
    df
    The degrees of freedom of the residual deviance.
    protected final Formula
    formula
    The symbolic description of the model to be fitted.
    protected final double
    logLikelihood
    Log-likelihood.
    protected final Model
    model
    The model specifications (link function, deviance, etc.).
    protected final double[]
    mu
    The fitted mean values.
    protected final double
    nullDeviance
    The null deviance = 2 * (LogLikelihood(Saturated Model) - LogLikelihood(Null Model)).
    protected final double[][]
    ztest
    The coefficients, their standard errors, z-scores, and p-values.
  • Constructor Summary

    Constructors
    Constructor
    Description
    GLM(Formula formula, String[] predictors, Model model, double[] beta, double logLikelihood, double deviance, double nullDeviance, double[] mu, double[] residuals, double[][] ztest)
    Constructor.
  • Method Summary

    Modifier and Type
    Method
    Description
    double
    AIC()
    Returns the AIC score.
    double
    BIC()
    Returns the BIC score.
    double[]
    coefficients()
    Returns an array of size (p+1) containing the linear weights of the fitted model, where p is the dimension of feature vectors.
    double
    deviance()
    Returns the deviance of the model.
    double[]
    devianceResiduals()
    Returns the deviance residuals.
    static GLM
    fit(Formula formula, DataFrame data, Model model)
    Fits the generalized linear model with IWLS (iteratively reweighted least squares).
    static GLM
    fit(Formula formula, DataFrame data, Model model, double tol, int maxIter)
    Fits the generalized linear model with IWLS (iteratively reweighted least squares).
    static GLM
    fit(Formula formula, DataFrame data, Model model, Properties params)
    Fits the generalized linear model with IWLS (iteratively reweighted least squares).
    double[]
    fittedValues()
    Returns the fitted mean values.
    double
    logLikelihood()
    Returns the log-likelihood of the model.
    double[]
    predict(DataFrame data)
    Predicts the mean response.
    double
    predict(Tuple x)
    Predicts the mean response.
    String
    toString()
    double[][]
    ztest()
    Returns the z-test of the coefficients (including intercept).

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Field Details

    • formula

      protected final Formula formula
      The symbolic description of the model to be fitted.
    • model

      protected final Model model
      The model specifications (link function, deviance, etc.).
    • beta

      protected final double[] beta
      The linear weights.
    • ztest

      protected final double[][] ztest
      The coefficients, their standard errors, z-scores, and p-values.
    • mu

      protected final double[] mu
      The fitted mean values.
    • nullDeviance

      protected final double nullDeviance
      The null deviance = 2 * (LogLikelihood(Saturated Model) - LogLikelihood(Null Model)).

      The saturated model, also referred to as the full model or maximal model, allows a different mean response for each group of replicates. One can think of the saturated model as having the most general possible mean structure for the data since the means are unconstrained.

      The null model assumes that all observations have the same distribution with a common parameter. Like the saturated model, the null model does not depend on predictor variables. While the saturated model is the most general model, the null model is the most restricted model.

    • deviance

      protected final double deviance
      The deviance = 2 * (LogLikelihood(Saturated Model) - LogLikelihood(Proposed Model)).
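
To make the two deviance definitions concrete, the sketch below evaluates them for a Poisson response, where the saturated model sets each mean to the observed value and the null model sets every mean to the sample average. The class and method names are hypothetical; this is a worked instance of the formulas above, not the code Smile uses.

```java
/** Illustrative only: Poisson deviance and null deviance computed from their definitions. */
final class DevianceSketch {
    /** Contribution of one observation: 2 * (y * log(y / mu) - (y - mu)), with 0 * log(0) taken as 0. */
    static double unitDeviance(double y, double mu) {
        double term = y > 0 ? y * Math.log(y / mu) : 0.0;
        return 2.0 * (term - (y - mu));
    }

    /** Deviance = 2 * (logLik(saturated) - logLik(proposed)), summed over observations. */
    static double deviance(double[] y, double[] mu) {
        double sum = 0.0;
        for (int i = 0; i < y.length; i++) {
            sum += unitDeviance(y[i], mu[i]);
        }
        return sum;
    }

    /** Null deviance: the proposed model is replaced by a single common mean. */
    static double nullDeviance(double[] y) {
        double mean = 0.0;
        for (double v : y) {
            mean += v;
        }
        mean /= y.length;
        double[] mu = new double[y.length];
        java.util.Arrays.fill(mu, mean);
        return deviance(y, mu);
    }

    public static void main(String[] args) {
        double[] y = {1, 2, 3};
        System.out.println(deviance(y, y));   // a saturated fit has zero deviance
        System.out.println(nullDeviance(y));  // a common mean of 2 leaves some deviance
    }
}
```

A fitted model is useful to the extent that its deviance falls well below the null deviance.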
    • devianceResiduals

      protected double[] devianceResiduals
      The deviance residuals.
    • df

      protected final int df
      The degrees of freedom of the residual deviance.
    • logLikelihood

      protected final double logLikelihood
      Log-likelihood.
  • Constructor Details

    • GLM

      public GLM(Formula formula, String[] predictors, Model model, double[] beta, double logLikelihood, double deviance, double nullDeviance, double[] mu, double[] residuals, double[][] ztest)
      Constructor.
      Parameters:
      formula - the model formula.
      predictors - the names of the predictors in the design matrix.
      model - the generalized linear model specification.
      beta - the linear weights.
      logLikelihood - the log-likelihood.
      deviance - the deviance.
      nullDeviance - the null deviance.
      mu - the fitted mean values.
      residuals - the residuals of fitted values of training data.
      ztest - the z-test of the coefficients.
  • Method Details

    • coefficients

      public double[] coefficients()
      Returns an array of size (p+1) containing the linear weights of the fitted model, where p is the dimension of feature vectors. The last element is the weight of bias.
      Returns:
      the linear weights.
    • ztest

      public double[][] ztest()
      Returns the z-test of the coefficients (including intercept). The first column holds the coefficients, the second their standard errors, the third the z-scores for testing the hypothesis that the coefficient is zero, and the fourth the p-values of the test. The last row corresponds to the intercept.
      Returns:
      the z-test of the coefficients.
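
The four columns described above can be reproduced for a single coefficient as follows. This is a sketch under two assumptions that this page does not confirm: the p-value is two-sided, and the normal tail is evaluated with a polynomial approximation of erfc (Abramowitz and Stegun 7.1.26, absolute error around 1.5e-7). All names are hypothetical.

```java
/** Illustrative only: builds one {coefficient, standard error, z-score, p-value} row. */
final class ZTestSketch {
    /** Complementary error function for x >= 0, polynomial approximation. */
    static double erfc(double x) {
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = t * (0.254829592 + t * (-0.284496736
                + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
        return poly * Math.exp(-x * x);
    }

    /** z = coefficient / standard error; two-sided p-value = erfc(|z| / sqrt(2)). */
    static double[] row(double coefficient, double standardError) {
        double z = coefficient / standardError;
        double p = erfc(Math.abs(z) / Math.sqrt(2.0));
        return new double[] {coefficient, standardError, z, p};
    }

    public static void main(String[] args) {
        double[] r = row(1.96, 1.0);
        System.out.printf("z = %.2f, p = %.4f%n", r[2], r[3]);
    }
}
```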
    • devianceResiduals

      public double[] devianceResiduals()
      Returns the deviance residuals.
      Returns:
      the deviance residuals.
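
A common textbook definition of the deviance residual is sign(y_i - μ_i) * sqrt(d_i), where d_i is the contribution of observation i to the deviance, so that the squared residuals sum to the deviance. The sketch below applies that definition to a Poisson response; whether each Smile model follows exactly this form is an assumption, and the names are hypothetical.

```java
/** Illustrative only: Poisson deviance residuals, r_i = sign(y_i - mu_i) * sqrt(d_i). */
final class DevianceResidualSketch {
    /** Per-observation Poisson deviance, with 0 * log(0) taken as 0. */
    static double unitDeviance(double y, double mu) {
        double term = y > 0 ? y * Math.log(y / mu) : 0.0;
        return 2.0 * (term - (y - mu));
    }

    static double[] residuals(double[] y, double[] mu) {
        double[] r = new double[y.length];
        for (int i = 0; i < y.length; i++) {
            // Math.max guards against a tiny negative value caused by rounding
            double d = Math.max(0.0, unitDeviance(y[i], mu[i]));
            r[i] = Math.signum(y[i] - mu[i]) * Math.sqrt(d);
        }
        return r;
    }

    public static void main(String[] args) {
        double[] r = residuals(new double[] {1, 2, 3}, new double[] {2, 2, 2});
        System.out.println(java.util.Arrays.toString(r));
    }
}
```

The sign shows whether the model over- or under-predicts an observation, which makes these residuals convenient for diagnostic plots.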
    • fittedValues

      public double[] fittedValues()
      Returns the fitted mean values.
      Returns:
      the fitted mean values.
    • deviance

      public double deviance()
      Returns the deviance of the model.
      Returns:
      the deviance of the model.
    • logLikelihood

      public double logLikelihood()
      Returns the log-likelihood of the model.
      Returns:
      the log-likelihood of the model.
    • AIC

      public double AIC()
      Returns the AIC score.
      Returns:
      the AIC score.
    • BIC

      public double BIC()
      Returns the BIC score.
      Returns:
      the BIC score.
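
For reference, the standard definitions are AIC = 2k - 2 log L and BIC = k ln(n) - 2 log L, where k is the number of estimated parameters and n the number of observations. The sketch below uses these textbook forms with hypothetical names; how Smile counts k (for example, whether a dispersion parameter is included) is not stated on this page.

```java
/** Illustrative only: textbook AIC and BIC from a log-likelihood. */
final class InformationCriteriaSketch {
    /** AIC = 2k - 2 * logLikelihood. */
    static double aic(double logLikelihood, int k) {
        return 2.0 * k - 2.0 * logLikelihood;
    }

    /** BIC = k * ln(n) - 2 * logLikelihood. */
    static double bic(double logLikelihood, int k, int n) {
        return k * Math.log(n) - 2.0 * logLikelihood;
    }

    public static void main(String[] args) {
        System.out.println(aic(-10.0, 3));
        System.out.println(bic(-10.0, 3, 100));
    }
}
```

Lower values are better for both scores; BIC penalizes extra parameters more heavily than AIC once n exceeds about 7.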
    • predict

      public double predict(Tuple x)
      Predicts the mean response.
      Parameters:
      x - the instance.
      Returns:
      the mean response.
    • predict

      public double[] predict(DataFrame data)
      Predicts the mean response.
      Parameters:
      data - the data frame.
      Returns:
      the mean response.
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • fit

      public static GLM fit(Formula formula, DataFrame data, Model model)
      Fits the generalized linear model with IWLS (iteratively reweighted least squares).
      Parameters:
      formula - a symbolic description of the model to be fitted.
      data - the data frame of the explanatory and response variables.
      model - the generalized linear model specification.
      Returns:
      the model.
    • fit

      public static GLM fit(Formula formula, DataFrame data, Model model, Properties params)
      Fits the generalized linear model with IWLS (iteratively reweighted least squares).
      Parameters:
      formula - a symbolic description of the model to be fitted.
      data - the data frame of the explanatory and response variables.
      model - the generalized linear model specification.
      params - the hyper-parameters.
      Returns:
      the model.
    • fit

      public static GLM fit(Formula formula, DataFrame data, Model model, double tol, int maxIter)
      Fits the generalized linear model with IWLS (iteratively reweighted least squares).
      Parameters:
      formula - a symbolic description of the model to be fitted.
      data - the data frame of the explanatory and response variables.
      model - the generalized linear model specification.
      tol - the tolerance for stopping iterations.
      maxIter - the maximum number of iterations.
      Returns:
      the model.
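
To show what IWLS does with the tol and maxIter arguments, here is a deliberately small sketch for Poisson regression with a log link, one predictor, and an intercept. It is not Smile's implementation: the class name, the starting values, the convergence test on coefficient change, and the closed-form 2x2 weighted least squares solve are all assumptions made to keep the example self-contained.

```java
/** Illustrative only: hand-rolled IWLS for log(mu) = b1 * x + b0 with a Poisson response. */
final class IwlsSketch {
    /** Returns {b1, b0}, with the intercept last. */
    static double[] fitPoisson(double[] x, double[] y, double tol, int maxIter) {
        int n = x.length;
        double[] eta = new double[n];
        for (int i = 0; i < n; i++) {
            eta[i] = Math.log(y[i] + 0.1); // starting values; the offset avoids log(0)
        }
        double b0 = 0.0, b1 = 0.0;
        for (int iter = 0; iter < maxIter; iter++) {
            double sw = 0, sx = 0, sxx = 0, sz = 0, sxz = 0;
            for (int i = 0; i < n; i++) {
                double mu = Math.exp(eta[i]);          // inverse link
                double w = mu;                         // weight (dmu/deta)^2 / V(mu) = mu
                double z = eta[i] + (y[i] - mu) / mu;  // working response
                sw += w;
                sx += w * x[i];
                sxx += w * x[i] * x[i];
                sz += w * z;
                sxz += w * x[i] * z;
            }
            // weighted least squares of z on (x, 1) via the 2x2 normal equations
            double det = sw * sxx - sx * sx;
            double nb1 = (sw * sxz - sx * sz) / det;
            double nb0 = (sz - nb1 * sx) / sw;
            double change = Math.abs(nb0 - b0) + Math.abs(nb1 - b1);
            b0 = nb0;
            b1 = nb1;
            for (int i = 0; i < n; i++) {
                eta[i] = b0 + b1 * x[i];
            }
            if (change < tol) {
                break; // coefficients have stabilized
            }
        }
        return new double[] {b1, b0};
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 2, 3, 4};
        double[] y = {1, 3, 2, 6, 9};
        double[] beta = fitPoisson(x, y, 1e-10, 50);
        System.out.println("slope = " + beta[0] + ", intercept = " + beta[1]);
    }
}
```

Each pass refits a weighted linear regression to a linearized response, which is why a tolerance and an iteration cap are the only tuning parameters exposed.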