Class GLM
- All Implemented Interfaces:
Serializable, ToDoubleFunction<Tuple>, DataFrameRegression, Regression<Tuple>
In GLM, each outcome Y of the dependent variables is assumed
to be generated from a particular distribution in an exponential family.
The mean, μ, of the distribution depends on the
independent variables, X, through:
E(Y) = μ = g⁻¹(Xβ)
where E(Y) is the expected value of Y;
Xβ is the linear predictor, a linear combination of the
independent variables X and the unknown parameters β; g is the link function,
a monotonic, differentiable function. The link function that transforms the mean to
the natural parameter is called the canonical link.
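The relationship μ = g⁻¹(Xβ) can be illustrated with a minimal, library-independent sketch. The class and method names below are hypothetical and are not part of this API; the intercept is assumed to be stored as the last element of the coefficient array, matching the layout described for coefficients().

```java
/** Illustrative inverse link functions; not part of the documented API. */
final class InverseLinks {
    /** Linear predictor eta = sum(beta[i] * x[i]) + intercept, intercept assumed last. */
    static double linearPredictor(double[] x, double[] beta) {
        double eta = beta[x.length]; // assumption: bias weight is the last element
        for (int i = 0; i < x.length; i++) {
            eta += beta[i] * x[i];
        }
        return eta;
    }

    /** Inverse of the identity link (normal family). */
    static double inverseIdentity(double eta) {
        return eta;
    }

    /** Inverse of the log link (canonical for Poisson). */
    static double inverseLog(double eta) {
        return Math.exp(eta);
    }

    /** Inverse of the logit link (canonical for Bernoulli). */
    static double inverseLogit(double eta) {
        return 1.0 / (1.0 + Math.exp(-eta));
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0};
        double[] beta = {0.5, -0.25, 0.0}; // two weights plus intercept
        double eta = linearPredictor(x, beta);
        System.out.println("eta = " + eta);
        System.out.println("Bernoulli mean = " + inverseLogit(eta));
        System.out.println("Poisson mean = " + inverseLog(eta));
    }
}
```

The same linear predictor yields different means depending on which inverse link is applied, which is the only place the distribution family enters the prediction.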
In this framework, the variance is typically a function, V,
of the mean:
Var(Y) = V(μ) = V(g⁻¹(Xβ))
It is convenient if V follows from an exponential family
of distributions, but it may simply be that the variance is a function
of the predicted value, such as V(μᵢ) = μᵢ
for the Poisson, V(μᵢ) = μᵢ(1 - μᵢ)
for the Bernoulli, and V(μᵢ) = σ²
(i.e., constant) for the normal.
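The three variance functions named above can be written out directly. This is a sketch with hypothetical names, shown only to make the formulas concrete; it is not how the library represents a model.

```java
/** Illustrative variance functions V(mu); names are hypothetical. */
final class VarianceFunctions {
    /** Poisson: the variance equals the mean. */
    static double poisson(double mu) {
        return mu;
    }

    /** Bernoulli: the variance is mu * (1 - mu), largest at mu = 0.5. */
    static double bernoulli(double mu) {
        return mu * (1.0 - mu);
    }

    /** Normal: the variance is the constant sigma^2, independent of the mean. */
    static double normal(double mu, double sigma2) {
        return sigma2;
    }

    public static void main(String[] args) {
        System.out.println("Poisson V(3.0) = " + poisson(3.0));
        System.out.println("Bernoulli V(0.5) = " + bernoulli(0.5));
        System.out.println("Normal V(10.0) with sigma^2 = 2.0: " + normal(10.0, 2.0));
    }
}
```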
The unknown parameters, β, are typically estimated
with maximum likelihood, maximum quasi-likelihood, or Bayesian techniques.
-
Nested Class Summary
Nested classes/interfaces inherited from interface DataFrameRegression
DataFrameRegression.Trainer<M> -
Constructor Summary
Constructor | Description
GLM(Formula formula, StructType schema, Model model, double[] beta, double logLikelihood, double deviance, double nullDeviance, double[] mu, double[] residuals, double[][] ztest) | Constructor.
-
Method Summary
Modifier and Type | Method | Description
double | AIC() | Returns the AIC score.
double | BIC() | Returns the BIC score.
double[] | coefficients() | Returns an array of size (p+1) containing the linear weights of the fitted model, where p is the dimension of feature vectors.
double | deviance() | Returns the deviance of model.
double[] | devianceResiduals() | Returns the deviance residuals.
static GLM | fit(Formula formula, DataFrame data, Model model) | Fits the generalized linear model with IWLS (iteratively reweighted least squares).
static GLM | fit(Formula formula, DataFrame data, Model model, GLM.Options options) | Fits the generalized linear model with IWLS (iteratively reweighted least squares).
double[] | fittedValues() | Returns the fitted mean values.
Formula | formula() | Returns the model formula.
double | logLikelihood() | Returns the log-likelihood of model.
double[] | predict(DataFrame data) | Predicts the dependent variables of a data frame.
double | predict(Tuple x) | Predicts the dependent variable of an instance.
StructType | schema() | Returns the schema of predictors.
String | toString() |
double[][] | ztest() | Returns the z-test of the coefficients (including intercept).
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface Regression
applyAsDouble, online, predict, predict, predict, update, update, update
-
Constructor Details
-
GLM
public GLM(Formula formula, StructType schema, Model model, double[] beta, double logLikelihood, double deviance, double nullDeviance, double[] mu, double[] residuals, double[][] ztest)
Constructor.
- Parameters:
- formula - the model formula.
- schema - the schema of design matrix.
- model - the generalized linear model specification.
- beta - the linear weights.
- logLikelihood - the log-likelihood.
- deviance - the deviance.
- nullDeviance - the null deviance.
- mu - the fitted mean values.
- residuals - the residuals of fitted values of training data.
- ztest - the z-test of the coefficients.
-
-
Method Details
-
coefficients
public double[] coefficients()
Returns an array of size (p+1) containing the linear weights of the fitted model, where p is the dimension of feature vectors. The last element is the weight of bias.
- Returns:
- the linear weights.
-
ztest
public double[][] ztest()
Returns the z-test of the coefficients (including intercept). The first column is the coefficient, the second column is the standard error of the coefficient, the third column is the z-score for the hypothesis test that the coefficient is zero, and the fourth column is the p-value of the test. The last row is the intercept.
- Returns:
- the z-test of the coefficients.
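The column layout described above can be read with a small helper like the following sketch. The class name, the predictor names, and the numbers in main are hypothetical and purely illustrative (they are not output from a real fit); only the column order and the intercept-last convention come from the description above.

```java
import java.util.Locale;

/** Illustrative formatter for a matrix laid out as {estimate, std. error, z, p-value}. */
final class ZTestTable {
    /** Formats one line per row; the last row is labeled as the intercept. */
    static String format(String[] predictors, double[][] ztest) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < ztest.length; i++) {
            // Rows beyond the named predictors are treated as the intercept row.
            String name = i < predictors.length ? predictors[i] : "(Intercept)";
            sb.append(String.format(Locale.ROOT, "%-12s %10.4f %10.4f %8.3f %8.4f%n",
                    name, ztest[i][0], ztest[i][1], ztest[i][2], ztest[i][3]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up values: z is estimate / std. error in each row.
        double[][] ztest = {
            { 1.0, 0.5,  2.0, 0.0455},
            {-0.3, 0.2, -1.5, 0.1336}
        };
        System.out.print(format(new String[] {"x"}, ztest));
    }
}
```

With a fitted model, the matrix returned by ztest() would take the place of the hand-written array.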
-
devianceResiduals
public double[] devianceResiduals()
Returns the deviance residuals.
- Returns:
- the deviance residuals.
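As an illustration of what a deviance residual measures, the following sketch uses the textbook definition for the Poisson family, d = sign(y - μ) · sqrt(2 · (y · ln(y/μ) - (y - μ))), with y · ln(y/μ) taken as 0 when y = 0. The class is hypothetical, and whether the library computes residuals exactly this way for every model is an assumption.

```java
/** Illustrative Poisson deviance residuals (textbook definition); names are hypothetical. */
final class PoissonDeviance {
    /** Signed square root of one observation's contribution to the deviance. */
    static double residual(double y, double mu) {
        double logTerm = y > 0 ? y * Math.log(y / mu) : 0.0; // y*ln(y/mu) -> 0 as y -> 0
        double d2 = 2.0 * (logTerm - (y - mu));
        // Guard against tiny negative values caused by rounding.
        return Math.signum(y - mu) * Math.sqrt(Math.max(d2, 0.0));
    }

    /** The deviance is the sum of squared deviance residuals. */
    static double deviance(double[] y, double[] mu) {
        double sum = 0.0;
        for (int i = 0; i < y.length; i++) {
            double d = residual(y[i], mu[i]);
            sum += d * d;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] y = {0.0, 2.0, 3.0};
        double[] mu = {1.0, 2.0, 1.0};
        for (int i = 0; i < y.length; i++) {
            System.out.println("residual = " + residual(y[i], mu[i]));
        }
        System.out.println("deviance = " + deviance(y, mu));
    }
}
```

A residual of zero means the fitted mean matches the observation exactly; the sign indicates whether the model under- or over-predicts.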
-
fittedValues
public double[] fittedValues()
Returns the fitted mean values.
- Returns:
- the fitted mean values.
-
deviance
public double deviance()
Returns the deviance of model.
- Returns:
- the deviance of model.
-
logLikelihood
public double logLikelihood()
Returns the log-likelihood of model.
- Returns:
- the log-likelihood of model.
-
AIC
public double AIC()
Returns the AIC score.
- Returns:
- the AIC score.
-
BIC
public double BIC()
Returns the BIC score.
- Returns:
- the BIC score.
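For orientation, the standard definitions of the two criteria are AIC = 2k - 2·ln L and BIC = k·ln(n) - 2·ln L, where ln L is the log-likelihood, k the number of estimated parameters, and n the number of observations. The sketch below applies those formulas; the class is hypothetical, and how the library counts k (for example, whether a dispersion parameter is included) is an assumption not stated here.

```java
/** Illustrative information criteria from the standard definitions; names are hypothetical. */
final class InformationCriteria {
    /** AIC = 2k - 2 ln L. */
    static double aic(double logLikelihood, int k) {
        return 2.0 * k - 2.0 * logLikelihood;
    }

    /** BIC = k ln(n) - 2 ln L. */
    static double bic(double logLikelihood, int k, int n) {
        return k * Math.log(n) - 2.0 * logLikelihood;
    }

    public static void main(String[] args) {
        double logLikelihood = -10.0; // illustrative value, not from a real fit
        System.out.println("AIC = " + aic(logLikelihood, 3));
        System.out.println("BIC = " + bic(logLikelihood, 3, 100));
    }
}
```

Lower values indicate a better trade-off between fit and complexity; BIC penalizes extra parameters more heavily than AIC once ln(n) exceeds 2.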
-
formula
Description copied from interface: DataFrameRegression
Returns the model formula.
- Specified by:
- formula in interface DataFrameRegression
- Returns:
- the model formula.
-
schema
Description copied from interface: DataFrameRegression
Returns the schema of predictors.
- Specified by:
- schema in interface DataFrameRegression
- Returns:
- the schema of predictors.
-
predict
Description copied from interface: Regression
Predicts the dependent variable of an instance.
- Specified by:
- predict in interface Regression<Tuple>
- Parameters:
- x - an instance.
- Returns:
- the predicted value of dependent variable.
-
predict
Description copied from interface: DataFrameRegression
Predicts the dependent variables of a data frame.
- Specified by:
- predict in interface DataFrameRegression
- Parameters:
- data - the data frame.
- Returns:
- the predicted values.
-
toString
-
fit
Fits the generalized linear model with IWLS (iteratively reweighted least squares).
- Parameters:
- formula - a symbolic description of the model to be fitted.
- data - the data frame of the explanatory and response variables.
- model - the generalized linear model specification.
- Returns:
- the model.
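To give a feel for what IWLS does, here is a minimal, self-contained sketch for a Bernoulli model with the logit link, one predictor, and an intercept. It is not the library's implementation: the class name, the fixed 2x2 normal-equation solver, and the stopping rule are assumptions chosen to keep the example short. Each iteration forms a working response z = η + (y - μ)/w with weights w = μ(1 - μ), then solves a weighted least-squares problem for the new coefficients.

```java
/** Illustrative IWLS for logistic regression with one predictor; not the library's code. */
final class LogisticIwls {
    /** Returns {slope, intercept}, with the intercept stored last. */
    static double[] fit(double[] x, double[] y, int maxIter, double tol) {
        double slope = 0.0, intercept = 0.0;
        for (int iter = 0; iter < maxIter; iter++) {
            double sw = 0, swx = 0, swxx = 0, swz = 0, swxz = 0;
            for (int i = 0; i < x.length; i++) {
                double eta = intercept + slope * x[i];   // linear predictor
                double mu = 1.0 / (1.0 + Math.exp(-eta)); // inverse logit link
                double w = mu * (1.0 - mu);              // IWLS weight for the canonical link
                double z = eta + (y[i] - mu) / w;        // working response
                sw += w;
                swx += w * x[i];
                swxx += w * x[i] * x[i];
                swz += w * z;
                swxz += w * x[i] * z;
            }
            // Solve the 2x2 weighted normal equations for (intercept, slope).
            double det = sw * swxx - swx * swx;
            double newSlope = (sw * swxz - swx * swz) / det;
            double newIntercept = (swxx * swz - swx * swxz) / det;
            double change = Math.abs(newSlope - slope) + Math.abs(newIntercept - intercept);
            slope = newSlope;
            intercept = newIntercept;
            if (change < tol) {
                break; // coefficients have stabilized
            }
        }
        return new double[] {slope, intercept};
    }

    public static void main(String[] args) {
        // Small, non-separable toy data set (illustrative only).
        double[] x = {0, 1, 2, 3, 4, 5};
        double[] y = {0, 0, 1, 0, 1, 1};
        double[] beta = fit(x, y, 50, 1e-10);
        System.out.println("slope = " + beta[0] + ", intercept = " + beta[1]);
    }
}
```

At convergence the score equations Σ(y - μ) = 0 and Σx(y - μ) = 0 hold, which is a convenient check that the iteration has reached the maximum-likelihood estimate. The sketch does not guard against perfectly separable data, where the weights collapse toward zero.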
-
fit
Fits the generalized linear model with IWLS (iteratively reweighted least squares).
- Parameters:
- formula - a symbolic description of the model to be fitted.
- data - the data frame of the explanatory and response variables.
- model - the generalized linear model specification.
- options - the hyperparameters.
- Returns:
- the model.
-