smile.regression.GAM

All Implemented Interfaces: Serializable, ToDoubleFunction<Tuple>, DataFrameRegression, Regression<Tuple>

public class GAM extends Object implements DataFrameRegression, Serializable

Generalized Additive Models (GAM). A GAM is a flexible statistical model that extends Generalized Linear Models (GLMs) by replacing the linear predictor with a sum of smooth functions of the predictors:

    g(E[Y]) = alpha + f_1(x_1) + f_2(x_2) + ... + f_p(x_p)

where g is the link function (the same as in GLM), alpha is the intercept, and f_j are smooth (non-parametric) functions estimated from the data. Each f_j is represented as a penalized B-spline (P-spline), which balances fidelity to the data against smoothness of the fitted curve via a smoothing parameter lambda_j.
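
As a minimal sketch of the model equation above (not the library's implementation), the following evaluates the additive predictor and maps it back to the mean under a log link. The class name `AdditivePredictorSketch`, the example smooths, and the choice of a log link are illustrative assumptions.

```java
import java.util.function.DoubleUnaryOperator;

/** Evaluates g(E[Y]) = alpha + f_1(x_1) + ... + f_p(x_p); hypothetical helper, not part of Smile. */
class AdditivePredictorSketch {
    /** Linear predictor: alpha plus the sum of each smooth applied to its own predictor. */
    static double eta(double alpha, DoubleUnaryOperator[] f, double[] x) {
        double sum = alpha;
        for (int j = 0; j < f.length; j++) {
            sum += f[j].applyAsDouble(x[j]);
        }
        return sum;
    }

    /** Mean response assuming a log link, so E[Y] = exp(eta). */
    static double meanLogLink(double alpha, DoubleUnaryOperator[] f, double[] x) {
        return Math.exp(eta(alpha, f, x));
    }

    public static void main(String[] args) {
        // Two made-up smooth functions standing in for fitted P-splines.
        DoubleUnaryOperator[] f = { Math::sin, v -> v * v - 1.0 };
        double[] x = {0.0, 1.0};
        System.out.println("eta = " + eta(0.5, f, x));
        System.out.println("mu  = " + meanLogLink(0.5, f, x));
    }
}
```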

Algorithm

GAMs are fitted by a combination of two iterative procedures:

PIRLS (Penalized Iteratively Reweighted Least Squares): The outer loop is identical to the IWLS algorithm used for GLMs, but each weighted least squares step is replaced by a penalized weighted least squares step that incorporates the smoothing penalties.
Backfitting: Within each PIRLS iteration, the smooth functions f_1, ..., f_p are estimated by cycling through each predictor and fitting a weighted penalized spline to the partial residuals, keeping the other smooth functions fixed.
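
The backfitting cycle can be sketched for the simplest case, an identity link with a fixed working response. This is an illustration under stated assumptions, not the library's code: the class `BackfittingSketch` is hypothetical, and the per-predictor smoother is a ridge-penalized straight line swapped in for the weighted penalized spline so that the example stays short.

```java
/** Backfitting on partial residuals; hypothetical sketch, not part of Smile. */
class BackfittingSketch {
    /**
     * Stand-in smoother (an assumption, not a P-spline): weighted ridge slope
     * on a centered predictor, beta = sum(w*xc*r) / (sum(w*xc^2) + lambda).
     */
    static double ridgeSlope(double[] xc, double[] r, double[] w, double lambda) {
        double num = 0.0, den = lambda;
        for (int i = 0; i < xc.length; i++) {
            num += w[i] * xc[i] * r[i];
            den += w[i] * xc[i] * xc[i];
        }
        return num / den;
    }

    /** x[j][i] is predictor j at row i. Returns {intercept, beta_1, ..., beta_p}. */
    static double[] fit(double[][] x, double[] z, double[] w, double lambda, int sweeps) {
        int p = x.length, n = z.length;
        double sw = 0.0, alpha = 0.0;
        for (int i = 0; i < n; i++) { sw += w[i]; alpha += w[i] * z[i]; }
        alpha /= sw; // intercept = weighted mean of the working response

        // Center each predictor so every fitted term has weighted mean zero.
        double[][] xc = new double[p][n];
        for (int j = 0; j < p; j++) {
            double mean = 0.0;
            for (int i = 0; i < n; i++) mean += w[i] * x[j][i];
            mean /= sw;
            for (int i = 0; i < n; i++) xc[j][i] = x[j][i] - mean;
        }

        double[] beta = new double[p];
        double[] r = new double[n];
        for (int s = 0; s < sweeps; s++) {
            for (int j = 0; j < p; j++) {
                // Partial residuals: remove the intercept and all other terms.
                for (int i = 0; i < n; i++) {
                    double others = 0.0;
                    for (int k = 0; k < p; k++) {
                        if (k != j) others += beta[k] * xc[k][i];
                    }
                    r[i] = z[i] - alpha - others;
                }
                beta[j] = ridgeSlope(xc[j], r, w, lambda); // refit term j only
            }
        }
        double[] out = new double[p + 1];
        out[0] = alpha;
        System.arraycopy(beta, 0, out, 1, p);
        return out;
    }

    public static void main(String[] args) {
        double[][] x = { {0, 1, 2, 3, 4, 5}, {1, 0, 1, 0, 1, 0} };
        double[] z = new double[6], w = new double[6];
        for (int i = 0; i < 6; i++) { z[i] = 1 + 2 * x[0][i] - 3 * x[1][i]; w[i] = 1.0; }
        System.out.println(java.util.Arrays.toString(fit(x, z, w, 0.0, 100)));
    }
}
```

In a full PIRLS fit, `z` and `w` would be the working response and working weights recomputed at each outer iteration, and each term would be refit with a penalized spline rather than a single slope.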

Identifiability

Because the intercept is estimated separately, each smooth is constrained to be centered: the mean of f_j(x_{ij}) over the training data is zero. This ensures that the intercept has a unique interpretation as the grand mean of the linear predictor.
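
The centering constraint can be illustrated with a short sketch: subtracting the training mean from a smooth and adding it to the intercept leaves every linear predictor value unchanged. The class `CenteringSketch` and the numbers are made up for illustration and say nothing about how the library applies the constraint internally.

```java
/** Centering a smooth on the training data; hypothetical sketch, not part of Smile. */
class CenteringSketch {
    /** Shifts the fitted values of one smooth to mean zero and returns the removed mean. */
    static double center(double[] fj) {
        double mean = 0.0;
        for (double v : fj) mean += v;
        mean /= fj.length;
        for (int i = 0; i < fj.length; i++) fj[i] -= mean;
        return mean;
    }

    public static void main(String[] args) {
        double alpha = 10.0;
        double[] f = {1.0, 2.0, 6.0};   // uncentered values f_j(x_ij)
        alpha += center(f);             // move the mean into the intercept
        // alpha + f[i] is the same as before centering for every row i.
        System.out.println(alpha + " " + java.util.Arrays.toString(f));
    }
}
```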

Smoothing Parameter

Each smooth f_j has its own smoothing parameter lambda_j >= 0. A larger lambda_j forces f_j to be smoother (closer to linear), while lambda_j = 0 gives an unpenalized spline. By default, a single shared lambda is used for all smooths. Users can override the per-predictor lambdas via GAM.Options.
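
To make the role of lambda_j concrete, the sketch below computes a P-spline roughness penalty on the B-spline coefficients. It assumes a second-order difference penalty in the style of Eilers and Marx, which matches the "closer to linear" behaviour described above; the difference order actually used by the library is not stated here, and `DifferencePenaltySketch` is a hypothetical name.

```java
/** Second-order difference penalty on spline coefficients; hypothetical sketch, not part of Smile. */
class DifferencePenaltySketch {
    /** Returns lambda * sum_k (beta[k] - 2*beta[k+1] + beta[k+2])^2. */
    static double penalty(double[] beta, double lambda) {
        double sum = 0.0;
        for (int k = 0; k + 2 < beta.length; k++) {
            double d2 = beta[k] - 2.0 * beta[k + 1] + beta[k + 2];
            sum += d2 * d2;
        }
        return lambda * sum;
    }

    public static void main(String[] args) {
        // Coefficients on a straight line are not penalized at all.
        System.out.println(penalty(new double[] {1, 2, 3, 4}, 5.0));
        // Wiggly coefficients are penalized in proportion to lambda.
        System.out.println(penalty(new double[] {0, 1, 0, 1}, 0.5));
    }
}
```

Because equally spaced coefficients incur no penalty, increasing lambda pulls the fit toward a straight line rather than toward zero.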

Degrees of Freedom

The basis dimension (number of B-spline basis functions) per predictor is controlled by GAM.Options.df(). More basis functions allow a richer class of smooth functions but increase the risk of overfitting without sufficient penalization.
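
The following sketch shows what a basis of dimension df looks like for one predictor, using the Cox-de Boor recursion. The equally spaced knots, the cubic degree in the demo, and the class `BSplineBasisSketch` are assumptions made for illustration; the text above does not specify how the library places knots.

```java
/** B-spline basis row via Cox-de Boor; hypothetical sketch, not part of Smile. */
class BSplineBasisSketch {
    /** Value at x of the i-th B-spline of the given degree on knot vector t. */
    static double basis(int i, int degree, double[] t, double x) {
        if (degree == 0) {
            return (t[i] <= x && x < t[i + 1]) ? 1.0 : 0.0;
        }
        double left = 0.0, right = 0.0;
        double d1 = t[i + degree] - t[i];
        double d2 = t[i + degree + 1] - t[i + 1];
        if (d1 > 0) left = (x - t[i]) / d1 * basis(i, degree - 1, t, x);
        if (d2 > 0) right = (t[i + degree + 1] - x) / d2 * basis(i + 1, degree - 1, t, x);
        return left + right;
    }

    /** Equally spaced knots such that exactly df basis functions cover [lo, hi]. */
    static double[] knots(int df, int degree, double lo, double hi) {
        double h = (hi - lo) / (df - degree);       // width of one knot interval
        double[] t = new double[df + degree + 1];   // df functions need df+degree+1 knots
        for (int k = 0; k < t.length; k++) t[k] = lo + (k - degree) * h;
        return t;
    }

    /** One row of the design matrix: all df basis functions evaluated at x. */
    static double[] row(int df, int degree, double lo, double hi, double x) {
        double[] t = knots(df, degree, lo, hi);
        double[] b = new double[df];
        for (int i = 0; i < df; i++) b[i] = basis(i, degree, t, x);
        return b;
    }

    public static void main(String[] args) {
        // Eight cubic basis functions on [0, 1], evaluated at x = 0.37.
        System.out.println(java.util.Arrays.toString(row(8, 3, 0.0, 1.0, 0.37)));
    }
}
```

Each row has at most degree + 1 nonzero entries, and the entries sum to one inside [lo, hi], so a larger df gives more, narrower local bumps for the penalty to regularize.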

References

Hastie, T., & Tibshirani, R. (1986). Generalized Additive Models. Statistical Science, 1(3), 297–310.
Wood, S.N. (2017). Generalized Additive Models: An Introduction with R (2nd ed.). CRC Press.
Eilers, P.H.C. & Marx, B.D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11(2), 89–102.


Nested Class Summary

Nested Classes

| Modifier and Type | Class | Description |
| --- | --- | --- |
| static final record | GAM.Options | GAM hyperparameters. |

Nested classes/interfaces inherited from interface DataFrameRegression
DataFrameRegression.Trainer<M>
Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| GAM(Formula formula, StructType schema, Model model, double intercept, SmoothingSpline[] smooths, double[] mu, double[] devianceResiduals, double nullDeviance, double deviance, double logLikelihood, double totalEdf) | Constructor. |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| double | AIC() | Returns the AIC score. |
| double | BIC() | Returns the BIC score. |
| double | deviance() | Returns the deviance of the model. |
| double[] | devianceResiduals() | Returns the deviance residuals. |
| static GAM | fit(Formula formula, DataFrame data, Model model) | Fits a GAM with default options. |
| static GAM | fit(Formula formula, DataFrame data, Model model, GAM.Options options) | Fits a GAM with PIRLS (Penalized Iteratively Reweighted Least Squares) and backfitting. |
| double[] | fittedValues() | Returns the fitted mean values on the training data. |
| Formula | formula() | Returns the model formula. |
| double | intercept() | Returns the intercept of the model. |
| double | logLikelihood() | Returns the log-likelihood of the model. |
| double | nullDeviance() | Returns the null deviance. |
| double[] | predict(DataFrame data) | Predicts the dependent variables of a data frame. |
| double | predict(Tuple x) | Predicts the dependent variable of an instance. |
| StructType | schema() | Returns the schema of predictors. |
| SmoothingSpline[] | smooths() | Returns the smooth terms of the model. |
| String | toString() | |
| double | totalEdf() | Returns the total effective degrees of freedom (intercept + sum of smooth EDFs). |

Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface Regression
applyAsDouble, online, predict, predict, predict, update, update, update

Constructor Details
- GAM
  
  public GAM(Formula formula, StructType schema, Model model, double intercept, SmoothingSpline[] smooths, double[] mu, double[] devianceResiduals, double nullDeviance, double deviance, double logLikelihood, double totalEdf)
  
  Constructor.
  
  Parameters:
  
  formula - the model formula.
  
  schema - the schema of the design matrix.
  
  model - the GLM family specification.
  
  intercept - the fitted intercept.
  
  smooths - the fitted smooth terms.
  
  mu - the fitted mean values.
  
  devianceResiduals - the deviance residuals.
  
  nullDeviance - the null deviance.
  
  deviance - the residual deviance.
  
  logLikelihood - the log-likelihood.
  
  totalEdf - the total effective degrees of freedom.
Method Details
- intercept
  
  public double intercept()
  
  Returns the intercept of the model.
  
  Returns:
  
  the intercept.
- smooths
  
  public SmoothingSpline[] smooths()
  
  Returns the smooth terms of the model.
  
  Returns:
  
  the smooth terms.
- fittedValues
  
  public double[] fittedValues()
  
  Returns the fitted mean values on the training data.
  
  Returns:
  
  the fitted mean values.
- devianceResiduals
  
  public double[] devianceResiduals()
  
  Returns the deviance residuals.
  
  Returns:
  
  the deviance residuals.
- deviance
  
  public double deviance()
  
  Returns the deviance of the model.
  
  Returns:
  
  the deviance.
- nullDeviance
  
  public double nullDeviance()
  
  Returns the null deviance.
  
  Returns:
  
  the null deviance.
- logLikelihood
  
  public double logLikelihood()
  
  Returns the log-likelihood of the model.
  
  Returns:
  
  the log-likelihood.
- totalEdf
  
  public double totalEdf()
  
  Returns the total effective degrees of freedom (intercept + sum of smooth EDFs).
  
  Returns:
  
  the total EDF.
- AIC
  
  public double AIC()
  
  Returns the AIC score.
  
  Returns:
  
  the AIC score.
- BIC
  
  public double BIC()
  
  Returns the BIC score.
  
  Returns:
  
  the BIC score.
- formula
  
  public Formula formula()
  
  Description copied from interface: DataFrameRegression
  
  Returns the model formula.
  
  Specified by:
  
  formula in interface DataFrameRegression
  
  Returns:
  
  the model formula.
- schema
  
  public StructType schema()
  
  Description copied from interface: DataFrameRegression
  
  Returns the schema of predictors.
  
  Specified by:
  
  schema in interface DataFrameRegression
  
  Returns:
  
  the schema of predictors.
- predict
  
  public double predict(Tuple x)
  
  Description copied from interface: Regression
  
  Predicts the dependent variable of an instance.
  
  Specified by:
  
  predict in interface Regression<Tuple>
  
  Parameters:
  
  x - an instance.
  
  Returns:
  
  the predicted value of the dependent variable.
- predict
  
  public double[] predict(DataFrame data)
  
  Description copied from interface: DataFrameRegression
  
  Predicts the dependent variables of a data frame.
  
  Specified by:
  
  predict in interface DataFrameRegression
  
  Parameters:
  
  data - the data frame.
  
  Returns:
  
  the predicted values.
- toString
  
  public String toString()
  
  Overrides:
  
  toString in class Object
- fit
  
  public static GAM fit(Formula formula, DataFrame data, Model model)
  
  Fits a GAM with default options.
  
  Parameters:
  
  formula - a symbolic description of the model to be fitted. The formula must have a response variable.
  
  data - the data frame of predictor and response variables.
  
  model - the GLM family specification (link function, variance, etc.).
  
  Returns:
  
  the fitted GAM.
- fit
  public static GAM fit(Formula formula, DataFrame data, Model model, GAM.Options options)
  
  Fits a GAM with PIRLS (Penalized Iteratively Reweighted Least Squares) and backfitting.
  The algorithm is:
  
  1. Initialize: set mu_i = mustart(y_i) and compute the linear predictor eta_i = link(mu_i).
  2. Outer PIRLS loop (until the deviance converges):
     - Compute the working weights w_i and the adjusted dependent variable z_i = eta_i + (y_i - mu_i) * g'(mu_i).
     - Run backfitting on the working response to update the intercept and all smooth terms.
     - Update eta_i and mu_i.
  
  Parameters:
  
  formula - a symbolic description of the model to be fitted.
  
  data - the data frame of predictor and response variables.
  
  model - the GLM family specification.
  
  options - the hyperparameters.
  
  Returns:
  
  the fitted GAM.
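
The working response used in the outer PIRLS loop can be sketched for one concrete family. The example assumes a Poisson response with a log link, so g'(mu) = 1/mu and V(mu) = mu; the weight formula w_i = 1 / (V(mu_i) * g'(mu_i)^2) is the standard IRLS weight and is an assumption about what the library computes. `WorkingResponseSketch` is a hypothetical name.

```java
/** One PIRLS linearization step for Poisson with log link; hypothetical sketch, not part of Smile. */
class WorkingResponseSketch {
    /** z_i = eta_i + (y_i - mu_i) * g'(mu_i), with eta_i = log(mu_i) and g'(mu) = 1/mu. */
    static double[] workingResponse(double[] y, double[] mu) {
        double[] z = new double[y.length];
        for (int i = 0; i < y.length; i++) {
            double eta = Math.log(mu[i]);
            z[i] = eta + (y[i] - mu[i]) / mu[i];
        }
        return z;
    }

    /** w_i = 1 / (V(mu_i) * g'(mu_i)^2), which simplifies to mu_i for Poisson with log link. */
    static double[] workingWeights(double[] mu) {
        double[] w = new double[mu.length];
        for (int i = 0; i < mu.length; i++) {
            double variance = mu[i];          // Poisson variance function
            double gPrime = 1.0 / mu[i];      // derivative of the log link
            w[i] = 1.0 / (variance * gPrime * gPrime);
        }
        return w;
    }

    public static void main(String[] args) {
        double[] y = {2.0, 3.0};
        double[] mu = {1.0, 2.0};
        System.out.println(java.util.Arrays.toString(workingResponse(y, mu)));
        System.out.println(java.util.Arrays.toString(workingWeights(mu)));
    }
}
```

Backfitting would then be run on `z` with weights `w`, after which eta_i and mu_i are refreshed and the step repeats until the deviance stops changing.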
