Class GLM
- All Implemented Interfaces:
Serializable
In GLM, each outcome Y of the dependent variables is assumed to be generated from a particular distribution in an exponential family. The mean, μ, of the distribution depends on the independent variables, X, through:

E(Y) = μ = g⁻¹(Xβ)

where E(Y) is the expected value of Y; Xβ is the linear predictor, a linear combination of the independent variables weighted by the unknown parameters β; and g is the link function, which is monotonic and differentiable. The link function that transforms the mean to the natural parameter is called the canonical link.

In this framework, the variance is typically a function, V, of the mean:

Var(Y) = V(μ) = V(g⁻¹(Xβ))

It is convenient if V follows from an exponential family of distributions, but it may simply be that the variance is a function of the predicted value, such as V(μᵢ) = μᵢ for the Poisson, V(μᵢ) = μᵢ(1 - μᵢ) for the Bernoulli, and V(μᵢ) = σ² (i.e., constant) for the normal.

The unknown parameters, β, are typically estimated with maximum likelihood, maximum quasi-likelihood, or Bayesian techniques.
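The relationship between the linear predictor, the inverse link, and the variance function can be made concrete with a small sketch. The class and method names below are hypothetical and are not part of this library's `Model` type; the intercept is stored as the last element of `beta`, following the convention described for `coefficients()`.

```java
import java.util.function.DoubleUnaryOperator;

/** Illustrative only: mean/variance relationships from the text, not the library's Model type. */
final class GlmFamilySketch {
    /** Inverse logit link: maps a linear predictor eta to a mean in (0, 1). */
    static double inverseLogit(double eta) {
        return 1.0 / (1.0 + Math.exp(-eta));
    }

    /** Inverse log link: maps a linear predictor eta to a positive mean. */
    static double inverseLog(double eta) {
        return Math.exp(eta);
    }

    /** Variance functions V(mu) named in the text. */
    static final DoubleUnaryOperator POISSON_VARIANCE = mu -> mu;
    static final DoubleUnaryOperator BERNOULLI_VARIANCE = mu -> mu * (1.0 - mu);

    /** Linear predictor with the intercept stored as the last element of beta. */
    static double linearPredictor(double[] x, double[] beta) {
        double eta = beta[beta.length - 1];
        for (int j = 0; j < x.length; j++) {
            eta += x[j] * beta[j];
        }
        return eta;
    }

    public static void main(String[] args) {
        double[] beta = {0.5, -1.0, 0.2}; // hypothetical weights; last entry is the intercept
        double[] x = {2.0, 1.0};
        double eta = linearPredictor(x, beta);
        double muBernoulli = inverseLogit(eta);
        double muPoisson = inverseLog(eta);
        System.out.println("Bernoulli mean " + muBernoulli
                + ", variance " + BERNOULLI_VARIANCE.applyAsDouble(muBernoulli));
        System.out.println("Poisson mean " + muPoisson
                + ", variance " + POISSON_VARIANCE.applyAsDouble(muPoisson));
    }
}
```

The same linear predictor yields different means and variances depending on the chosen family and link, which is the sense in which the model specification is separate from the fitted weights.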
Field Summary

| Modifier and Type | Field | Description |
|---|---|---|
| protected final double[] | beta | The linear weights. |
| protected final double | deviance | The deviance = 2 * (LogLikelihood(Saturated Model) - LogLikelihood(Proposed Model)). |
| protected final double[] | devianceResiduals | The deviance residuals. |
| protected final int | df | The degrees of freedom of the residual deviance. |
| protected final Formula | formula | The symbolic description of the model to be fitted. |
| protected final double | logLikelihood | Log-likelihood. |
| protected final Model | model | The model specifications (link function, deviance, etc.). |
| protected final double[] | mu | The fitted mean values. |
| protected final double | nullDeviance | The null deviance = 2 * (LogLikelihood(Saturated Model) - LogLikelihood(Null Model)). |
| protected final double[][] | ztest | The coefficients, their standard errors, z-scores, and p-values. |

-
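Constructor Summary

| Constructor | Description |
|---|---|
| GLM(Formula formula, String[] predictors, Model model, double[] beta, double logLikelihood, double deviance, double nullDeviance, double[] mu, double[] residuals, double[][] ztest) | Constructor. |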
-
Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| double | AIC() | Returns the AIC score. |
| double | BIC() | Returns the BIC score. |
| double[] | coefficients() | Returns an array of size (p+1) containing the linear weights of the model, where p is the dimension of the feature vectors. |
| double | deviance() | Returns the deviance of the model. |
| double[] | devianceResiduals() | Returns the deviance residuals. |
| static GLM | fit(Formula formula, DataFrame data, Model model) | Fits the generalized linear model with IWLS (iteratively reweighted least squares). |
| static GLM | fit(Formula formula, DataFrame data, Model model, tol, maxIter) | Fits the generalized linear model with IWLS (iteratively reweighted least squares). |
| static GLM | fit(Formula formula, DataFrame data, Model model, Properties params) | Fits the generalized linear model with IWLS (iteratively reweighted least squares). |
| double[] | fittedValues() | Returns the fitted mean values. |
| double | logLikelihood() | Returns the log-likelihood of the model. |
| double[] | predict(data) | Predicts the mean response. |
| double | predict(x) | Predicts the mean response. |
| String | toString() | |
| double[][] | ztest() | Returns the z-test of the coefficients (including intercept). |
-
Field Details
-
formula
The symbolic description of the model to be fitted. -
model
The model specifications (link function, deviance, etc.). -
beta
protected final double[] beta
The linear weights.
-
ztest
protected final double[][] ztest
The coefficients, their standard errors, z-scores, and p-values.
-
mu
protected final double[] mu
The fitted mean values.
-
nullDeviance
protected final double nullDeviance
The null deviance = 2 * (LogLikelihood(Saturated Model) - LogLikelihood(Null Model)).
The saturated model, also referred to as the full model or maximal model, allows a different mean response for each group of replicates. One can think of the saturated model as having the most general possible mean structure for the data, since the means are unconstrained.
The null model assumes that all observations have the same distribution with a common parameter. Like the saturated model, the null model does not depend on predictor variables. While the saturated model is the most general model, the null model is the most restricted model.
-
deviance
protected final double deviance
The deviance = 2 * (LogLikelihood(Saturated Model) - LogLikelihood(Proposed Model)).
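To illustrate the two deviance definitions, the following sketch computes them for Poisson-distributed responses, where the log-likelihood difference has a closed form. The class name and helper methods are hypothetical; the library derives the deviance from its `Model` specification, and this sketch only assumes the standard Poisson formula, with the null model's common mean taken to be the sample average.

```java
import java.util.Arrays;

/** Illustrative Poisson deviance, assuming the standard closed form; not the library's code. */
final class PoissonDevianceSketch {
    /** y * log(y / mu), using the convention 0 * log(0) = 0. */
    static double yLogRatio(double y, double mu) {
        return y == 0.0 ? 0.0 : y * Math.log(y / mu);
    }

    /** 2 * (logLik(saturated) - logLik(model with means mu)) for Poisson responses. */
    static double deviance(double[] y, double[] mu) {
        double sum = 0.0;
        for (int i = 0; i < y.length; i++) {
            sum += yLogRatio(y[i], mu[i]) - (y[i] - mu[i]);
        }
        return 2.0 * sum;
    }

    /** Null deviance: every observation shares one mean, the sample average. */
    static double nullDeviance(double[] y) {
        double mean = 0.0;
        for (double v : y) {
            mean += v;
        }
        mean /= y.length;
        double[] mu = new double[y.length];
        Arrays.fill(mu, mean);
        return deviance(y, mu);
    }

    public static void main(String[] args) {
        double[] y = {1.0, 3.0};
        System.out.println("saturated fit: " + deviance(y, y));
        System.out.println("null model:    " + nullDeviance(y));
    }
}
```

A model whose fitted means equal the observations has zero deviance, and the null deviance is the upper reference point against which a proposed model's deviance is compared.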
devianceResiduals
protected final double[] devianceResiduals
The deviance residuals.
-
df
protected final int df
The degrees of freedom of the residual deviance.
-
logLikelihood
protected final double logLikelihood
Log-likelihood.
-
-
Constructor Details
-
GLM
public GLM(Formula formula, String[] predictors, Model model, double[] beta, double logLikelihood, double deviance, double nullDeviance, double[] mu, double[] residuals, double[][] ztest)
Constructor.
- Parameters:
formula - the model formula.
predictors - the predictors of design matrix.
model - the generalized linear model specification.
beta - the linear weights.
logLikelihood - the log-likelihood.
deviance - the deviance.
nullDeviance - the null deviance.
mu - the fitted mean values.
residuals - the residuals of fitted values of training data.
ztest - the z-test of the coefficients.
-
-
Method Details
-
coefficients
public double[] coefficients()
Returns an array of size (p+1) containing the linear weights of the model, where p is the dimension of the feature vectors. The last element is the weight of the bias (intercept) term.
- Returns:
the linear weights.
-
ztest
public double[][] ztest()
Returns the z-test of the coefficients (including intercept). The first column holds the coefficients, the second column their standard errors, the third column the z-scores for testing the hypothesis that each coefficient is zero, and the fourth column the p-values of the test. The last row is for the intercept.
- Returns:
the z-test of the coefficients.
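As a rough illustration of the four-column layout, the sketch below builds one such row from a coefficient and its standard error. It assumes a two-sided test against the standard normal distribution and uses a textbook polynomial approximation of the error function (Abramowitz and Stegun 7.1.26), since the Java standard library has no `erf`. The class and method names are hypothetical, and how this class computes its p-values internally is not stated in this page.

```java
/** Illustrative z-test row; assumes a two-sided normal test, not the library's code. */
final class ZTestSketch {
    /** Approximate erf(x) for x >= 0 (Abramowitz and Stegun 7.1.26, error about 1.5e-7). */
    static double erf(double x) {
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        return 1.0 - poly * Math.exp(-x * x);
    }

    /** Returns {coefficient, standard error, z-score, two-sided p-value}. */
    static double[] row(double coefficient, double standardError) {
        double z = coefficient / standardError;
        // Two-sided p-value: 2 * (1 - Phi(|z|)) = 1 - erf(|z| / sqrt(2)).
        double p = 1.0 - erf(Math.abs(z) / Math.sqrt(2.0));
        return new double[] {coefficient, standardError, z, p};
    }

    public static void main(String[] args) {
        double[] r = row(0.98, 0.5); // hypothetical coefficient and standard error
        System.out.println("z = " + r[2] + ", p = " + r[3]);
    }
}
```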
-
devianceResiduals
public double[] devianceResiduals()
Returns the deviance residuals.
- Returns:
the deviance residuals.
-
fittedValues
public double[] fittedValues()
Returns the fitted mean values.
- Returns:
the fitted mean values.
-
deviance
public double deviance()
Returns the deviance of model.
- Returns:
the deviance of model.
-
logLikelihood
public double logLikelihood()
Returns the log-likelihood of model.
- Returns:
the log-likelihood of model.
-
AIC
public double AIC()
Returns the AIC score.
- Returns:
the AIC score.
-
BIC
public double BIC()
Returns the BIC score.
- Returns:
the BIC score.
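For reference, the textbook definitions of the two scores can be sketched as follows, with k the number of estimated parameters and n the number of observations. The class name is hypothetical, and whether this class counts parameters in exactly this way is an assumption, since the page does not state the formula it uses.

```java
/** Textbook AIC and BIC; the library's exact parameter-counting convention is an assumption. */
final class InformationCriteriaSketch {
    /** AIC = 2k - 2 * logLikelihood. */
    static double aic(double logLikelihood, int k) {
        return 2.0 * k - 2.0 * logLikelihood;
    }

    /** BIC = k * ln(n) - 2 * logLikelihood. */
    static double bic(double logLikelihood, int k, int n) {
        return k * Math.log(n) - 2.0 * logLikelihood;
    }

    public static void main(String[] args) {
        double logLikelihood = -10.0; // hypothetical fitted log-likelihood
        System.out.println("AIC = " + aic(logLikelihood, 3));
        System.out.println("BIC = " + bic(logLikelihood, 3, 100));
    }
}
```

Lower values indicate a better trade-off between fit and complexity; BIC penalizes extra parameters more heavily than AIC once n exceeds about 7.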
-
predict
Predicts the mean response.
- Parameters:
x - the instance.
- Returns:
the mean response.
-
predict
Predicts the mean response.
- Parameters:
data - the data frame.
- Returns:
the mean response.
-
toString
-
fit
Fits the generalized linear model with IWLS (iteratively reweighted least squares).
- Parameters:
formula - a symbolic description of the model to be fitted.
data - the data frame of the explanatory and response variables.
model - the generalized linear model specification.
- Returns:
the model.
-
fit
Fits the generalized linear model with IWLS (iteratively reweighted least squares).
- Parameters:
formula - a symbolic description of the model to be fitted.
data - the data frame of the explanatory and response variables.
model - the generalized linear model specification.
params - the hyperparameters.
- Returns:
the model.
-
fit
Fits the generalized linear model with IWLS (iteratively reweighted least squares).
- Parameters:
formula - a symbolic description of the model to be fitted.
data - the data frame of the explanatory and response variables.
model - the generalized linear model specification.
tol - the tolerance for stopping iterations.
maxIter - the maximum number of iterations.
- Returns:
the model.
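The IWLS procedure named in the `fit` methods can be sketched for the simplest case: a Poisson response with a log link, one predictor, and an intercept. This is a standalone illustration under stated assumptions, not this class's implementation. The class name is hypothetical, the stopping rule (largest coefficient change below `tol`) is one common choice that may differ from the library's, and the inputs are assumed to have a positive mean response and a non-constant predictor.

```java
/** Minimal IWLS sketch for Poisson regression with a log link; not the library's code. */
final class PoissonIwlsSketch {
    /** Fits log(mu) = b1 * x + b0 and returns {b1, b0}, with the intercept last. */
    static double[] fit(double[] x, double[] y, double tol, int maxIter) {
        int n = x.length;
        double mean = 0.0;
        for (double v : y) {
            mean += v;
        }
        mean /= n;
        double b0 = Math.log(mean); // start from the intercept-only (null) model
        double b1 = 0.0;

        for (int iter = 0; iter < maxIter; iter++) {
            double sw = 0, swx = 0, swxx = 0, swz = 0, swxz = 0;
            for (int i = 0; i < n; i++) {
                double eta = b0 + b1 * x[i];       // linear predictor
                double mu = Math.exp(eta);         // inverse log link
                double w = mu;                     // IWLS weight for Poisson with log link
                double z = eta + (y[i] - mu) / mu; // working response
                sw += w;
                swx += w * x[i];
                swxx += w * x[i] * x[i];
                swz += w * z;
                swxz += w * x[i] * z;
            }
            // Solve the 2x2 weighted least squares normal equations for z on x.
            double det = sw * swxx - swx * swx;
            double newB1 = (sw * swxz - swx * swz) / det;
            double newB0 = (swz - newB1 * swx) / sw;
            double change = Math.max(Math.abs(newB0 - b0), Math.abs(newB1 - b1));
            b0 = newB0;
            b1 = newB1;
            if (change < tol) {
                break; // coefficients have stabilized
            }
        }
        return new double[] {b1, b0};
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 2, 3, 4};
        double[] y = {2, 2, 3, 4, 6}; // hypothetical counts
        double[] beta = fit(x, y, 1e-8, 25);
        System.out.println("slope = " + beta[0] + ", intercept = " + beta[1]);
    }
}
```

Each pass reweights the observations by the current fitted means and solves an ordinary weighted least squares problem, which is why `tol` and `maxIter` are the natural controls for the iteration.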
-