Class GAM
java.lang.Object
smile.regression.GAM
- All Implemented Interfaces:
Serializable, ToDoubleFunction<Tuple>, DataFrameRegression, Regression<Tuple>
Generalized Additive Models (GAM). A GAM is a flexible statistical model
that extends Generalized Linear Models (GLMs) by replacing the linear
predictor with a sum of smooth functions of the predictors:
g(E[Y]) = alpha + f_1(x_1) + f_2(x_2) + ... + f_p(x_p)
where g is the link function (the same as in GLM), alpha
is the intercept, and f_j are smooth (non-parametric) functions
estimated from the data. Each f_j is represented as a penalized
B-spline (P-spline), which balances fidelity to the data against smoothness
of the fitted curve via a smoothing parameter lambda_j.
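As a minimal illustration of the formula above, the sketch below evaluates the additive predictor and maps it to the mean under a log link. The class name, the choice of log link, and the two toy smooths are assumptions made for this example; none of them are part of this class's API.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical helper, not part of the library: evaluates
// eta = alpha + f_1(x_1) + ... + f_p(x_p) for given smooth functions.
final class AdditivePredictorSketch {
    /** Sums the intercept and each smooth applied to its own predictor. */
    static double linearPredictor(double alpha, DoubleUnaryOperator[] smooths, double[] x) {
        double eta = alpha;
        for (int j = 0; j < smooths.length; j++) {
            eta += smooths[j].applyAsDouble(x[j]);
        }
        return eta;
    }

    /** Inverse of the log link g(mu) = log(mu), assumed here for illustration. */
    static double meanWithLogLink(double eta) {
        return Math.exp(eta);
    }

    public static void main(String[] args) {
        // Two toy smooths standing in for fitted P-splines.
        DoubleUnaryOperator[] f = { Math::sin, v -> 0.5 * v * v };
        double eta = linearPredictor(1.0, f, new double[] {0.0, 2.0});
        System.out.println("eta = " + eta + ", E[Y] = " + meanWithLogLink(eta));
    }
}
```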
Algorithm
GAMs are fitted by a combination of two iterative procedures:
- PIRLS (Penalized Iteratively Reweighted Least Squares): The outer loop is identical to the IWLS algorithm used for GLMs, but each weighted least squares step is replaced by a penalized weighted least squares step that incorporates the smoothing penalties.
- Backfitting: Within each PIRLS iteration, the smooth functions f_1, ..., f_p are estimated by cycling through each predictor and fitting a weighted penalized spline to the partial residuals, keeping the other smooth functions fixed.
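The backfitting cycle can be sketched as follows. To keep the example short, each smooth is replaced by a centered straight line fitted by weighted least squares; this is a stand-in for the weighted penalized spline, not what the class does internally. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of weighted backfitting on a working response z.
// Each "smooth" is f_j(x) = slope_j * (x - mean_j), a simplification of a P-spline.
final class BackfittingSketch {
    /** Returns {alpha, slope_1, ..., slope_p}. */
    static double[] fit(double[][] x, double[] z, double[] w, int sweeps) {
        int n = z.length, p = x[0].length;
        double sumW = 0.0, alpha = 0.0;
        double[] mean = new double[p];
        for (int i = 0; i < n; i++) {
            sumW += w[i];
            alpha += w[i] * z[i];
            for (int j = 0; j < p; j++) mean[j] += w[i] * x[i][j];
        }
        alpha /= sumW;                       // intercept = weighted mean of z
        for (int j = 0; j < p; j++) mean[j] /= sumW;

        double[] slope = new double[p];
        for (int s = 0; s < sweeps; s++) {
            for (int j = 0; j < p; j++) {
                double num = 0.0, den = 0.0;
                for (int i = 0; i < n; i++) {
                    // Partial residual: remove the intercept and all other smooths.
                    double r = z[i] - alpha;
                    for (int k = 0; k < p; k++) {
                        if (k != j) r -= slope[k] * (x[i][k] - mean[k]);
                    }
                    double c = x[i][j] - mean[j];
                    num += w[i] * c * r;
                    den += w[i] * c * c;
                }
                slope[j] = num / den;        // refit smooth j, others held fixed
            }
        }
        double[] result = new double[p + 1];
        result[0] = alpha;
        System.arraycopy(slope, 0, result, 1, p);
        return result;
    }

    public static void main(String[] args) {
        double[][] x = {{0, 1}, {1, 0}, {2, 2}, {3, 1}, {4, 3}};
        double[] z = new double[x.length];
        double[] w = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            z[i] = 2.0 + 3.0 * x[i][0] - x[i][1];
            w[i] = 1.0;
        }
        System.out.println(Arrays.toString(fit(x, z, w, 100)));
    }
}
```

Because every component is centered, the intercept is simply the weighted mean of the working response, and the sweeps only need to update the smooths.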
Identifiability
Because the intercept is estimated separately, each smooth is constrained to be centered: the mean of f_j(x_{ij}) over the training data
is zero. This ensures that the intercept has a unique interpretation as
the grand mean of the linear predictor.
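The centering constraint can be illustrated with a small sketch: subtract the training-data mean from a smooth's fitted values and move that constant into the intercept, which leaves the linear predictor unchanged. The helper below is hypothetical and only shows the arithmetic.

```java
// Hypothetical sketch of the centering step for one smooth term.
final class CenteringSketch {
    /**
     * Centers the fitted values of a smooth in place and returns the mean that
     * was removed, so the caller can add it to the intercept.
     */
    static double center(double[] f) {
        double mean = 0.0;
        for (double v : f) mean += v;
        mean /= f.length;
        for (int i = 0; i < f.length; i++) f[i] -= mean;
        return mean;
    }

    public static void main(String[] args) {
        double alpha = 0.5;
        double[] f = {1.0, 2.0, 6.0};
        alpha += center(f);   // intercept absorbs the removed constant
        System.out.println("alpha = " + alpha + ", f = " + java.util.Arrays.toString(f));
    }
}
```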
Smoothing Parameter
Each smooth f_j has its own smoothing parameter lambda_j >= 0.
A larger lambda_j forces f_j to be smoother (closer to
linear), while lambda_j = 0 gives an unpenalized spline.
By default, a single shared lambda is used for all smooths.
Users can override the per-predictor lambdas via GAM.Options.
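To see why a large lambda_j pulls a smooth toward a straight line, the sketch below computes a P-spline roughness penalty on the B-spline coefficients. It assumes a second-order difference penalty, as in Eilers and Marx (1996); the penalty order actually used by this class is not stated here, and the class name is hypothetical.

```java
// Hypothetical sketch: lambda * sum_k (beta_k - 2*beta_{k-1} + beta_{k-2})^2.
final class DifferencePenaltySketch {
    /** Second-order difference penalty on spline coefficients (assumed order). */
    static double penalty(double[] beta, double lambda) {
        double sum = 0.0;
        for (int k = 2; k < beta.length; k++) {
            double d2 = beta[k] - 2.0 * beta[k - 1] + beta[k - 2];
            sum += d2 * d2;
        }
        return lambda * sum;
    }

    public static void main(String[] args) {
        // Coefficients on a straight line are not penalized at all.
        System.out.println(penalty(new double[] {1, 2, 3, 4}, 10.0));
        // A wiggly coefficient sequence is penalized in proportion to lambda.
        System.out.println(penalty(new double[] {0, 1, 0}, 2.0));
    }
}
```

With lambda = 0 the penalty vanishes for any coefficients, which corresponds to the unpenalized spline described above.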
Degrees of Freedom
The basis dimension (number of B-spline basis functions) per predictor is controlled by GAM.Options.df(). More basis functions allow a richer
class of smooth functions but increase the risk of overfitting without
sufficient penalization.
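What "basis dimension" means can be made concrete with a small B-spline evaluator. The sketch below builds df basis functions of a given degree on equally spaced knots using the Cox-de Boor recursion. The knot placement and all names are assumptions for illustration; how GAM.Options.df() maps to knots inside the library is not documented here.

```java
// Hypothetical sketch of a uniform B-spline basis with df basis functions.
final class BSplineBasisSketch {
    /** Equally spaced knots so that exactly df basis functions cover [lo, hi]. Requires df > degree. */
    static double[] knots(double lo, double hi, int df, int degree) {
        double h = (hi - lo) / (df - degree);
        double[] t = new double[df + degree + 1];
        for (int k = 0; k < t.length; k++) t[k] = lo + (k - degree) * h;
        return t;
    }

    /** Cox-de Boor recursion for the k-th basis function of degree d. */
    static double basis(int k, int d, double x, double[] t) {
        if (d == 0) return (t[k] <= x && x < t[k + 1]) ? 1.0 : 0.0;
        double left = (x - t[k]) / (t[k + d] - t[k]) * basis(k, d - 1, x, t);
        double right = (t[k + d + 1] - x) / (t[k + d + 1] - t[k + 1]) * basis(k + 1, d - 1, x, t);
        return left + right;
    }

    /** One row of the design matrix: all df basis functions evaluated at x in [lo, hi). */
    static double[] row(double x, double lo, double hi, int df, int degree) {
        double[] t = knots(lo, hi, df, degree);
        double[] b = new double[df];
        for (int k = 0; k < df; k++) b[k] = basis(k, degree, x, t);
        return b;
    }

    public static void main(String[] args) {
        // Eight cubic basis functions on [0, 1]; only four are nonzero at any x.
        System.out.println(java.util.Arrays.toString(row(0.37, 0.0, 1.0, 8, 3)));
    }
}
```

A larger df adds columns to this row, so the smooth has more coefficients to fit and relies more heavily on the penalty to stay smooth.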
References
- Hastie, T., & Tibshirani, R. (1986). Generalized Additive Models. Statistical Science, 1(3), 297–310.
- Wood, S.N. (2017). Generalized Additive Models: An Introduction with R (2nd ed.). CRC Press.
- Eilers, P.H.C. & Marx, B.D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11(2), 89–102.
Nested Class Summary
Nested classes/interfaces inherited from interface DataFrameRegression
DataFrameRegression.Trainer<M>
Constructor Summary
Constructors
- GAM(Formula formula, StructType schema, Model model, double intercept, SmoothingSpline[] smooths, double[] mu, double[] devianceResiduals, double nullDeviance, double deviance, double logLikelihood, double totalEdf): Constructor.
Method Summary
Modifier and type, method, and description:
- double AIC(): Returns the AIC score.
- double BIC(): Returns the BIC score.
- double deviance(): Returns the deviance of the model.
- double[] devianceResiduals(): Returns the deviance residuals.
- static GAM fit(Formula formula, DataFrame data, Model model): Fits a GAM with default options.
- static GAM fit(Formula formula, DataFrame data, Model model, GAM.Options options): Fits a GAM with PIRLS (Penalized Iteratively Reweighted Least Squares) and backfitting.
- double[] fittedValues(): Returns the fitted mean values on the training data.
- Formula formula(): Returns the model formula.
- double intercept(): Returns the intercept of the model.
- double logLikelihood(): Returns the log-likelihood of the model.
- double nullDeviance(): Returns the null deviance.
- double[] predict(DataFrame data): Predicts the dependent variables of a data frame.
- double predict(Tuple x): Predicts the dependent variable of an instance.
- StructType schema(): Returns the schema of predictors.
- smooths(): Returns the smooth terms of the model.
- String toString()
- double totalEdf(): Returns the total effective degrees of freedom (intercept + sum of smooth EDFs).
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface Regression
applyAsDouble, online, predict, predict, predict, update, update, update
-
Constructor Details
-
GAM
public GAM(Formula formula, StructType schema, Model model, double intercept, SmoothingSpline[] smooths, double[] mu, double[] devianceResiduals, double nullDeviance, double deviance, double logLikelihood, double totalEdf)
Constructor.
- Parameters:
formula - the model formula.
schema - the schema of the design matrix.
model - the GLM family specification.
intercept - the fitted intercept.
smooths - the fitted smooth terms.
mu - the fitted mean values.
devianceResiduals - the deviance residuals.
nullDeviance - the null deviance.
deviance - the residual deviance.
logLikelihood - the log-likelihood.
totalEdf - the total effective degrees of freedom.
-
-
Method Details
-
intercept
public double intercept()
Returns the intercept of the model.
- Returns:
- the intercept.
-
smooths
Returns the smooth terms of the model.
- Returns:
- the smooth terms.
-
fittedValues
public double[] fittedValues()
Returns the fitted mean values on the training data.
- Returns:
- the fitted mean values.
-
devianceResiduals
public double[] devianceResiduals()
Returns the deviance residuals.
- Returns:
- the deviance residuals.
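For readers unfamiliar with deviance residuals, the sketch below computes them for a Poisson response, where the residual is sign(y - mu) * sqrt(2 * (y * log(y / mu) - (y - mu))). The Poisson family is an assumption chosen for the example; the actual formula depends on the family passed as the model, and the class below is hypothetical.

```java
// Hypothetical sketch: deviance residuals for a Poisson response.
final class PoissonDevianceSketch {
    /** Signed square root of the unit deviance; y * log(y / mu) is taken as 0 when y = 0. */
    static double residual(double y, double mu) {
        double term = (y > 0.0) ? y * Math.log(y / mu) : 0.0;
        double d = 2.0 * (term - (y - mu));
        return Math.signum(y - mu) * Math.sqrt(Math.max(d, 0.0));
    }

    /** The deviance is the sum of squared deviance residuals. */
    static double deviance(double[] y, double[] mu) {
        double sum = 0.0;
        for (int i = 0; i < y.length; i++) {
            double r = residual(y[i], mu[i]);
            sum += r * r;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(residual(0.0, 1.0));
        System.out.println(deviance(new double[] {0.0, 2.0}, new double[] {1.0, 2.0}));
    }
}
```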
-
deviance
public double deviance()
Returns the deviance of the model.
- Returns:
- the deviance.
-
nullDeviance
public double nullDeviance()
Returns the null deviance.
- Returns:
- the null deviance.
-
logLikelihood
public double logLikelihood()
Returns the log-likelihood of the model.
- Returns:
- the log-likelihood.
-
totalEdf
public double totalEdf()
Returns the total effective degrees of freedom (intercept + sum of smooth EDFs).
- Returns:
- the total EDF.
-
AIC
public double AIC()
Returns the AIC score.
- Returns:
- the AIC score.
-
BIC
public double BIC()
Returns the BIC score.
- Returns:
- the BIC score.
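The sketch below shows the conventional definitions of these scores, with the effective degrees of freedom playing the role of the parameter count: AIC = -2 logL + 2 edf and BIC = -2 logL + log(n) edf. Whether this class uses exactly these expressions is an assumption; the helper is hypothetical and takes the quantities as plain arguments.

```java
// Hypothetical sketch of the textbook AIC and BIC formulas with effective degrees of freedom.
final class InformationCriteriaSketch {
    /** AIC = -2 * logLikelihood + 2 * edf. */
    static double aic(double logLikelihood, double edf) {
        return -2.0 * logLikelihood + 2.0 * edf;
    }

    /** BIC = -2 * logLikelihood + log(n) * edf, where n is the sample size. */
    static double bic(double logLikelihood, double edf, int n) {
        return -2.0 * logLikelihood + Math.log(n) * edf;
    }

    public static void main(String[] args) {
        System.out.println(aic(-10.0, 3.0));
        System.out.println(bic(-10.0, 3.0, 100));
    }
}
```

Lower values indicate a better trade-off between fit and complexity under either criterion.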
-
formula
Description copied from interface: DataFrameRegression
Returns the model formula.
- Specified by:
formula in interface DataFrameRegression
- Returns:
- the model formula.
-
schema
Description copied from interface: DataFrameRegression
Returns the schema of predictors.
- Specified by:
schema in interface DataFrameRegression
- Returns:
- the schema of predictors.
-
predict
Description copied from interface: Regression
Predicts the dependent variable of an instance.
- Specified by:
predict in interface Regression<Tuple>
- Parameters:
x - an instance.
- Returns:
- the predicted value of dependent variable.
-
predict
Description copied from interface: DataFrameRegression
Predicts the dependent variables of a data frame.
- Specified by:
predict in interface DataFrameRegression
- Parameters:
data - the data frame.
- Returns:
- the predicted values.
-
toString
-
fit
Fits a GAM with default options.
- Parameters:
formula - a symbolic description of the model to be fitted. The formula must have a response variable.
data - the data frame of predictor and response variables.
model - the GLM family specification (link function, variance, etc.).
- Returns:
- the fitted GAM.
-
fit
Fits a GAM with PIRLS (Penalized Iteratively Reweighted Least Squares) and backfitting.
The algorithm is:
- Initialize: set mu_i = mustart(y_i), compute linear predictor eta_i = link(mu_i).
- Outer PIRLS loop (until deviance converges):
  - Compute working weights w_i and adjusted dependent variable z_i = eta_i + (y_i - mu_i) * g'(mu_i).
  - Run backfitting on the working response to update the intercept and all smooth terms.
  - Update eta_i and mu_i.
- Parameters:
formula - a symbolic description of the model to be fitted.
data - the data frame of predictor and response variables.
model - the GLM family specification.
options - the hyperparameters.
- Returns:
- the fitted GAM.
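The first step of the outer loop can be sketched for one concrete family. Assuming a Poisson response with log link, g'(mu) = 1 / mu and the variance function is V(mu) = mu, so the standard IRLS weight 1 / (V(mu) * g'(mu)^2) reduces to mu. The class below is a hypothetical illustration and does not call the library.

```java
// Hypothetical sketch: one PIRLS working-response update for Poisson with log link.
final class PirlsStepSketch {
    /** Returns {z, w}: the adjusted dependent variable and the working weights. */
    static double[][] workingResponseAndWeights(double[] y, double[] eta) {
        int n = y.length;
        double[] z = new double[n];
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            double mu = Math.exp(eta[i]);          // inverse log link
            double gPrime = 1.0 / mu;              // derivative of g(mu) = log(mu)
            z[i] = eta[i] + (y[i] - mu) * gPrime;  // z_i = eta_i + (y_i - mu_i) * g'(mu_i)
            w[i] = 1.0 / (mu * gPrime * gPrime);   // 1 / (V(mu) * g'(mu)^2), equal to mu here
        }
        return new double[][] {z, w};
    }

    public static void main(String[] args) {
        double[][] zw = workingResponseAndWeights(new double[] {3.0}, new double[] {Math.log(2.0)});
        System.out.println("z = " + zw[0][0] + ", w = " + zw[1][0]);
    }
}
```

The pair (z, w) is what the backfitting step then smooths against each predictor before eta_i and mu_i are updated.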
-