Class GAM

java.lang.Object
smile.regression.GAM
All Implemented Interfaces:
Serializable, ToDoubleFunction<Tuple>, DataFrameRegression, Regression<Tuple>

public class GAM extends Object implements DataFrameRegression, Serializable
Generalized Additive Models (GAM). A GAM is a flexible statistical model that extends Generalized Linear Models (GLMs) by replacing the linear predictor with a sum of smooth functions of the predictors:
    g(E[Y]) = alpha + f_1(x_1) + f_2(x_2) + ... + f_p(x_p)
where g is the link function (the same as in GLM), alpha is the intercept, and f_j are smooth (non-parametric) functions estimated from the data. Each f_j is represented as a penalized B-spline (P-spline), which balances fidelity to the data against smoothness of the fitted curve via a smoothing parameter lambda_j.
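To make the formula above concrete, here is a minimal sketch of how an additive linear predictor could be evaluated and mapped back to the mean scale. The class and method names are hypothetical and are not part of the Smile API; the smooth functions are passed in as plain Java functions, and a log link is assumed purely for illustration.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical illustration of the additive predictor; not Smile code. */
final class AdditivePredictorSketch {
    /** Computes eta = alpha + f_1(x_1) + ... + f_p(x_p). */
    static double linearPredictor(double alpha, DoubleUnaryOperator[] f, double[] x) {
        if (f.length != x.length) {
            throw new IllegalArgumentException("one smooth per predictor is required");
        }
        double eta = alpha;
        for (int j = 0; j < f.length; j++) {
            eta += f[j].applyAsDouble(x[j]);
        }
        return eta;
    }

    /** Inverts an assumed log link g(mu) = log(mu), so E[Y] = exp(eta). */
    static double meanWithLogLink(double eta) {
        return Math.exp(eta);
    }
}
```

Each predictor contributes through its own function, so the effect of x_j can be inspected in isolation by evaluating f_j alone.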

Algorithm

GAMs are fitted by a combination of two iterative procedures:
  1. PIRLS (Penalized Iteratively Reweighted Least Squares): The outer loop follows the iteratively reweighted least squares (IRLS) algorithm used for GLMs, except that each weighted least squares step is replaced by a penalized weighted least squares step that incorporates the smoothing penalties.
  2. Backfitting: Within each PIRLS iteration, the smooth functions f_1, ..., f_p are estimated by cycling through each predictor and fitting a weighted penalized spline to the partial residuals, keeping the other smooth functions fixed.
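The backfitting step described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used by this class: a weighted straight-line fit stands in for the weighted penalized spline smoother, predictors are stored column-wise as x[j][i], and all names are hypothetical.

```java
/** Hypothetical backfitting sketch; a weighted line fit replaces the penalized spline. */
final class BackfittingSketch {
    /** Intercept plus each component f[j][i] evaluated at training point i. */
    static final class Fit {
        final double alpha;
        final double[][] f;
        Fit(double alpha, double[][] f) { this.alpha = alpha; this.f = f; }
    }

    static double weightedMean(double[] v, double[] w) {
        double sw = 0.0, swv = 0.0;
        for (int i = 0; i < v.length; i++) { sw += w[i]; swv += w[i] * v[i]; }
        return swv / sw;
    }

    /** Stand-in smoother: weighted least squares line through (x, r), returned centered. */
    static double[] smooth(double[] x, double[] r, double[] w) {
        double xbar = weightedMean(x, w), rbar = weightedMean(r, w);
        double sxx = 0.0, sxr = 0.0;
        for (int i = 0; i < x.length; i++) {
            sxx += w[i] * (x[i] - xbar) * (x[i] - xbar);
            sxr += w[i] * (x[i] - xbar) * (r[i] - rbar);
        }
        double slope = sxx > 0.0 ? sxr / sxx : 0.0;
        double[] fit = new double[x.length];
        for (int i = 0; i < x.length; i++) fit[i] = slope * (x[i] - xbar);
        return fit;
    }

    /** Cycles over predictors, refitting each component to its partial residuals. */
    static Fit backfit(double[][] x, double[] z, double[] w, int maxIter, double tol) {
        int p = x.length, n = z.length;
        double alpha = weightedMean(z, w);
        double[][] f = new double[p][n];
        for (int iter = 0; iter < maxIter; iter++) {
            double change = 0.0;
            for (int j = 0; j < p; j++) {
                // Partial residuals: remove the intercept and every other component.
                double[] r = new double[n];
                for (int i = 0; i < n; i++) {
                    r[i] = z[i] - alpha;
                    for (int k = 0; k < p; k++) if (k != j) r[i] -= f[k][i];
                }
                double[] updated = smooth(x[j], r, w);
                for (int i = 0; i < n; i++) {
                    change = Math.max(change, Math.abs(updated[i] - f[j][i]));
                }
                f[j] = updated;
            }
            if (change < tol) break; // no component moved by more than tol
        }
        return new Fit(alpha, f);
    }
}
```

Inside PIRLS, z and w would be the working response and working weights of the current outer iteration. Swapping smooth for a penalized spline fit leaves the cycling logic unchanged.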

Identifiability

Because the intercept is estimated separately, each smooth is constrained to be centered: the mean of f_j(x_{ij}) over the training data is zero. This ensures that the intercept has a unique interpretation as the grand mean of the linear predictor.
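Without this constraint, adding a constant c to f_j and subtracting c from the intercept would leave every fitted value unchanged, so the decomposition would not be unique. A minimal sketch of the centering step, using a hypothetical helper that is not part of the Smile API:

```java
/** Hypothetical helper illustrating the centering constraint. */
final class CenteringSketch {
    /**
     * Centers the values of one smooth (evaluated at the training points)
     * in place and returns the mean that was removed.
     */
    static double center(double[] f) {
        double mean = 0.0;
        for (double v : f) mean += v;
        mean /= f.length;
        for (int i = 0; i < f.length; i++) f[i] -= mean;
        return mean;
    }
}
```

Adding the returned mean to the intercept keeps the linear predictor unchanged while making the smooth average to zero over the training data.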

Smoothing Parameter

Each smooth f_j has its own smoothing parameter lambda_j >= 0. A larger lambda_j forces f_j to be smoother (closer to linear), while lambda_j = 0 gives an unpenalized spline. By default, a single shared lambda is used for all smooths. Users can override the per-predictor lambdas via GAM.Options.
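As a rough illustration of how lambda_j acts, the sketch below computes a P-spline roughness penalty in the style of Eilers and Marx: lambda times the sum of squared second-order differences of adjacent B-spline coefficients. The difference order of 2 and all names are assumptions made for this example; the penalty actually used by this class may differ.

```java
/** Hypothetical sketch of a second-order difference penalty on spline coefficients. */
final class DifferencePenaltySketch {
    /** Returns lambda * sum_k (beta[k] - 2*beta[k+1] + beta[k+2])^2. */
    static double penalty(double[] beta, double lambda) {
        double sum = 0.0;
        for (int k = 0; k + 2 < beta.length; k++) {
            double d2 = beta[k] - 2.0 * beta[k + 1] + beta[k + 2];
            sum += d2 * d2;
        }
        return lambda * sum;
    }
}
```

Coefficients that change linearly incur no penalty under this form, which is consistent with a large lambda_j pulling the fit toward a straight line, while lambda_j = 0 removes the penalty entirely.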

Degrees of Freedom

The basis dimension (number of B-spline basis functions) per predictor is controlled by GAM.Options.df(). More basis functions allow a richer class of smooth functions but increase the risk of overfitting without sufficient penalization.

References

  1. Hastie, T., & Tibshirani, R. (1986). Generalized Additive Models. Statistical Science, 1(3), 297–310.
  2. Wood, S.N. (2017). Generalized Additive Models: An Introduction with R (2nd ed.). CRC Press.
  3. Eilers, P.H.C. & Marx, B.D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11(2), 89–102.
  • Constructor Details

    • GAM

      public GAM(Formula formula, StructType schema, Model model, double intercept, SmoothingSpline[] smooths, double[] mu, double[] devianceResiduals, double nullDeviance, double deviance, double logLikelihood, double totalEdf)
      Constructs a GAM from fitted model components.
      Parameters:
      formula - the model formula.
      schema - the schema of the design matrix.
      model - the GLM family specification.
      intercept - the fitted intercept.
      smooths - the fitted smooth terms.
      mu - the fitted mean values.
      devianceResiduals - the deviance residuals.
      nullDeviance - the null deviance.
      deviance - the residual deviance.
      logLikelihood - the log-likelihood.
      totalEdf - the total effective degrees of freedom.
  • Method Details

    • intercept

      public double intercept()
      Returns the intercept of the model.
      Returns:
      the intercept.
    • smooths

      public SmoothingSpline[] smooths()
      Returns the smooth terms of the model.
      Returns:
      the smooth terms.
    • fittedValues

      public double[] fittedValues()
      Returns the fitted mean values on the training data.
      Returns:
      the fitted mean values.
    • devianceResiduals

      public double[] devianceResiduals()
      Returns the deviance residuals.
      Returns:
      the deviance residuals.
    • deviance

      public double deviance()
      Returns the residual deviance of the model.
      Returns:
      the deviance.
    • nullDeviance

      public double nullDeviance()
      Returns the null deviance.
      Returns:
      the null deviance.
    • logLikelihood

      public double logLikelihood()
      Returns the log-likelihood of the model.
      Returns:
      the log-likelihood.
    • totalEdf

      public double totalEdf()
      Returns the total effective degrees of freedom (intercept + sum of smooth EDFs).
      Returns:
      the total EDF.
    • AIC

      public double AIC()
      Returns the Akaike information criterion (AIC) score.
      Returns:
      the AIC score.
    • BIC

      public double BIC()
      Returns the Bayesian information criterion (BIC) score.
      Returns:
      the BIC score.
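      For orientation, the conventional definitions of the two criteria can be sketched as below. This assumes the total effective degrees of freedom play the role of the parameter count; whether AIC() and BIC() use exactly these formulas is an assumption, and the helper names are hypothetical.

```java
/** Hypothetical helper with the textbook AIC and BIC formulas. */
final class InformationCriteriaSketch {
    /** AIC = -2 * logLikelihood + 2 * edf. */
    static double aic(double logLikelihood, double edf) {
        return -2.0 * logLikelihood + 2.0 * edf;
    }

    /** BIC = -2 * logLikelihood + edf * ln(n), where n is the sample size. */
    static double bic(double logLikelihood, double edf, int n) {
        return -2.0 * logLikelihood + edf * Math.log(n);
    }
}
```

      Lower values indicate a better trade-off between fit and complexity; BIC penalizes each degree of freedom more heavily than AIC once n exceeds 7.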
    • formula

      public Formula formula()
      Description copied from interface: DataFrameRegression
      Returns the model formula.
      Specified by:
      formula in interface DataFrameRegression
      Returns:
      the model formula.
    • schema

      public StructType schema()
      Description copied from interface: DataFrameRegression
      Returns the schema of predictors.
      Specified by:
      schema in interface DataFrameRegression
      Returns:
      the schema of predictors.
    • predict

      public double predict(Tuple x)
      Description copied from interface: Regression
      Predicts the dependent variable of an instance.
      Specified by:
      predict in interface Regression<Tuple>
      Parameters:
      x - an instance.
      Returns:
      the predicted value of the dependent variable.
    • predict

      public double[] predict(DataFrame data)
      Description copied from interface: DataFrameRegression
      Predicts the dependent variables of a data frame.
      Specified by:
      predict in interface DataFrameRegression
      Parameters:
      data - the data frame.
      Returns:
      the predicted values.
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • fit

      public static GAM fit(Formula formula, DataFrame data, Model model)
      Fits a GAM with default options.
      Parameters:
      formula - a symbolic description of the model to be fitted. The formula must have a response variable.
      data - the data frame of predictor and response variables.
      model - the GLM family specification (link function, variance, etc.).
      Returns:
      the fitted GAM.
    • fit

      public static GAM fit(Formula formula, DataFrame data, Model model, GAM.Options options)
      Fits a GAM with PIRLS (Penalized Iteratively Reweighted Least Squares) and backfitting.

      The algorithm is:

      1. Initialize: set mu_i = mustart(y_i), compute linear predictor eta_i = g(mu_i), where g is the link function.
      2. Outer PIRLS loop (until deviance converges):
        1. Compute working weights w_i and adjusted dependent variable z_i = eta_i + (y_i - mu_i) * g'(mu_i).
        2. Run backfitting on the working response to update the intercept and all smooth terms.
        3. Update eta_i and mu_i.
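      As a worked instance of the first inner step, the sketch below computes the working response and working weights for a Poisson family with a log link, where g(mu) = log(mu), g'(mu) = 1/mu and the variance function is V(mu) = mu. It uses the standard IRLS weight w_i = 1 / (V(mu_i) * g'(mu_i)^2), which simplifies to mu_i here. The class is hypothetical, and the family choice and weight formula are assumptions rather than details confirmed by this page.

```java
/** Hypothetical sketch of the PIRLS working quantities for Poisson with log link. */
final class PoissonWorkingResponseSketch {
    /**
     * Returns {z, w}: the adjusted dependent variable
     * z_i = eta_i + (y_i - mu_i) * g'(mu_i) and the working weights w_i.
     */
    static double[][] workingResponse(double[] y, double[] eta) {
        int n = y.length;
        double[] z = new double[n];
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            double mu = Math.exp(eta[i]);      // inverse of the log link
            double gPrime = 1.0 / mu;          // derivative of log(mu)
            z[i] = eta[i] + (y[i] - mu) * gPrime;
            w[i] = 1.0 / (mu * gPrime * gPrime); // equals mu for this family
        }
        return new double[][] {z, w};
    }
}
```

      The resulting z and w are what the backfitting step would take as its response and weights in each outer iteration.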
      Parameters:
      formula - a symbolic description of the model to be fitted.
      data - the data frame of predictor and response variables.
      model - the GLM family specification.
      options - the hyperparameters.
      Returns:
      the fitted GAM.