public class RidgeRegression
extends java.lang.Object
When the columns of the design matrix X are approximately linearly dependent, the matrix X'X becomes close to singular. As a result, the least-squares estimate becomes highly sensitive to random errors in the observed response Y, producing a large variance.
Ridge regression is one method to address these issues. In ridge regression, the matrix X'X is perturbed so as to make its determinant appreciably different from 0.
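The effect of the perturbation can be illustrated with a small sketch. It assumes the usual form of the perturbation, X'X + lambda * I, and a design matrix with exactly two columns; the class and method names are hypothetical and are not part of this library.

```java
final class GramDeterminantSketch {
    /** Returns the 2x2 Gram matrix X'X for a design matrix with two columns. */
    static double[][] gram(double[][] x) {
        double[][] g = new double[2][2];
        for (double[] row : x) {
            g[0][0] += row[0] * row[0];
            g[0][1] += row[0] * row[1];
            g[1][1] += row[1] * row[1];
        }
        g[1][0] = g[0][1]; // the Gram matrix is symmetric
        return g;
    }

    /** Determinant of (G + lambda * I) for a 2x2 matrix G. */
    static double ridgeDeterminant(double[][] g, double lambda) {
        return (g[0][0] + lambda) * (g[1][1] + lambda) - g[0][1] * g[1][0];
    }

    public static void main(String[] args) {
        // Two nearly collinear columns: the second is almost exactly twice the first.
        double[][] x = {{1.0, 2.0}, {2.0, 4.001}, {3.0, 6.0}};
        double[][] g = gram(x);
        System.out.println("det(X'X)          = " + ridgeDeterminant(g, 0.0));
        System.out.println("det(X'X + 0.1 I)  = " + ridgeDeterminant(g, 0.1));
    }
}
```

With nearly collinear columns the unperturbed determinant is very close to 0, while even a small lambda moves it well away from 0.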
Ridge regression is a kind of Tikhonov regularization, which is the most commonly used method of regularization of ill-posed problems. Ridge regression shrinks the regression coefficients by imposing a penalty on their size. By allowing a small amount of bias in the estimates, more reasonable coefficients may often be obtained, because a small amount of bias can lead to a dramatic reduction in the variance of the estimated model coefficients.
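The shrinkage can be seen in the simplest possible case. The sketch below assumes a single predictor, no intercept, and the penalized criterion sum((y - x*b)^2) + lambda * b^2, whose minimizer is sum(x*y) / (sum(x*x) + lambda). The class and method names are hypothetical.

```java
final class RidgeShrinkageSketch {
    /**
     * Ridge coefficient for one predictor without an intercept:
     * the minimizer of sum((y - x*b)^2) + lambda * b^2.
     */
    static double coefficient(double[] x, double[] y, double lambda) {
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < x.length; i++) {
            sxy += x[i] * y[i];
            sxx += x[i] * x[i];
        }
        return sxy / (sxx + lambda);
    }

    public static void main(String[] args) {
        double[] x = {-1.0, 0.0, 1.0};
        double[] y = {-2.0, 0.0, 2.0};
        // A larger lambda pulls the coefficient toward 0 without reaching it.
        for (double lambda : new double[]{0.0, 2.0, 6.0}) {
            System.out.println("lambda = " + lambda + ", b = " + coefficient(x, y, lambda));
        }
    }
}
```

For lambda = 0 this reduces to the ordinary least-squares slope; any finite positive lambda yields a smaller, but non-zero, coefficient.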
Another interpretation of ridge regression is available through Bayesian estimation. In this setting, the belief that the weights should be small is encoded in a prior distribution.
The penalty term is unfair if the predictor variables are not on the same scale. Therefore, if we know that the variables are not measured in the same units, we typically scale the columns of X (to have sample variance 1), and then we perform ridge regression.
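A minimal sketch of the scaling step follows. It assumes at least two rows and no constant columns (otherwise the sample standard deviation is undefined or 0), and it only rescales; it does not center. The class and method names are hypothetical, and this is not necessarily how the library scales data internally.

```java
final class ColumnScalingSketch {
    /** Returns a copy of x whose columns are scaled to sample variance 1. */
    static double[][] scaleToUnitVariance(double[][] x) {
        int n = x.length;
        int p = x[0].length;
        double[][] z = new double[n][p];
        for (int j = 0; j < p; j++) {
            double mean = 0.0;
            for (int i = 0; i < n; i++) {
                mean += x[i][j];
            }
            mean /= n;

            double ss = 0.0;
            for (int i = 0; i < n; i++) {
                ss += (x[i][j] - mean) * (x[i][j] - mean);
            }
            double sd = Math.sqrt(ss / (n - 1)); // sample standard deviation

            for (int i = 0; i < n; i++) {
                z[i][j] = x[i][j] / sd;
            }
        }
        return z;
    }
}
```

For example, a column measured as 1, 2, 3 and a column measured as 100, 200, 300 end up identical after scaling, so the penalty treats their coefficients equally.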
When including an intercept term in the regression, we usually leave this coefficient unpenalized. Otherwise, adding some constant amount to the vector y would not result in the same solution. If we center the columns of X, then the intercept estimate ends up just being the mean of y.
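The role of the unpenalized intercept can be checked in a single-predictor sketch. It assumes the criterion sum((y - b0 - b1*x)^2) + lambda * b1^2, in which only the slope b1 is penalized; the class and method names are hypothetical.

```java
final class CenteredInterceptSketch {
    /**
     * Fits y ~ b0 + b1 * x with a ridge penalty on b1 only.
     * Returns {b0, b1}.
     */
    static double[] fit(double[] x, double[] y, double lambda) {
        int n = x.length;
        double xbar = 0.0;
        double ybar = 0.0;
        for (int i = 0; i < n; i++) {
            xbar += x[i];
            ybar += y[i];
        }
        xbar /= n;
        ybar /= n;

        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - xbar;
            sxy += dx * (y[i] - ybar);
            sxx += dx * dx;
        }
        double b1 = sxy / (sxx + lambda); // penalized slope
        double b0 = ybar - b1 * xbar;     // unpenalized intercept
        return new double[]{b0, b1};
    }
}
```

When x is already centered (its mean is 0), b0 equals the mean of y for every lambda. Adding a constant to every y shifts b0 by that constant and leaves b1 unchanged.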
Ridge regression doesn’t set coefficients exactly to zero unless λ = ∞, in which case they’re all zero. Hence ridge regression cannot perform variable selection, and even though it performs well in terms of prediction accuracy, it does poorly in terms of offering a clear interpretation.
Constructor and Description 

RidgeRegression() 
Modifier and Type  Method and Description

static LinearModel fit(Formula formula, DataFrame data)
    Fits a ridge regression model.

static LinearModel fit(Formula formula, DataFrame data, double lambda)
    Fits a ridge regression model.

static LinearModel fit(Formula formula, DataFrame data, double[] weights, double[] lambda, double[] beta0)
    Fits a generalized ridge regression model that minimizes a weighted least squares criterion augmented with a generalized ridge penalty.

static LinearModel fit(Formula formula, DataFrame data, java.util.Properties prop)
    Fits a ridge regression model.

public static LinearModel fit(Formula formula, DataFrame data)
formula - a symbolic description of the model to be fitted.
data - the data frame of the explanatory and response variables. NO NEED to include a constant column of 1s for bias.

public static LinearModel fit(Formula formula, DataFrame data, java.util.Properties prop)
The properties in prop include:
smile.ridge.lambda - the shrinkage/regularization parameter. Large lambda means more shrinkage. Choosing an appropriate value of lambda is important, and also difficult.
smile.ridge.standard.error - a boolean. If true, compute the estimated standard errors of the parameter estimates.
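A properties object carrying these two keys might be assembled as in the sketch below. Only the key names come from this page; the helper class and method are hypothetical, and encoding the values with Double.toString and Boolean.toString is an assumption about how they are parsed.

```java
import java.util.Properties;

final class RidgePropertiesSketch {
    /** Builds a Properties object with the two keys documented for ridge regression. */
    static Properties ridgeProperties(double lambda, boolean standardError) {
        Properties prop = new Properties();
        // Property values are strings, so the numeric and boolean settings are converted.
        prop.setProperty("smile.ridge.lambda", Double.toString(lambda));
        prop.setProperty("smile.ridge.standard.error", Boolean.toString(standardError));
        return prop;
    }

    public static void main(String[] args) {
        Properties prop = ridgeProperties(0.1, true);
        // The result would then be passed as the third argument:
        // RidgeRegression.fit(formula, data, prop)
        prop.list(System.out);
    }
}
```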
formula - a symbolic description of the model to be fitted.
data - the data frame of the explanatory and response variables. NO NEED to include a constant column of 1s for bias.
prop - training algorithm hyperparameters and properties.

public static LinearModel fit(Formula formula, DataFrame data, double lambda)
formula - a symbolic description of the model to be fitted.
data - the data frame of the explanatory and response variables. NO NEED to include a constant column of 1s for bias.
lambda - the shrinkage/regularization parameter. Large lambda means more shrinkage. Choosing an appropriate value of lambda is important, and also difficult.

public static LinearModel fit(Formula formula, DataFrame data, double[] weights, double[] lambda, double[] beta0)
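Fits a generalized ridge regression model that minimizes a weighted least squares criterion augmented with a generalized ridge penalty:
(Y - X'*beta)' * W * (Y - X'*beta) + (beta - beta0)' * lambda * (beta - beta0)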
formula - a symbolic description of the model to be fitted.
data - the data frame of the explanatory and response variables. NO NEED to include a constant column of 1s for bias.
weights - sample weights.
lambda - the shrinkage/regularization parameter. Large lambda means more shrinkage. Choosing an appropriate value of lambda is important, and also difficult. Its length may be 1 so that its value is applied to all variables.
beta0 - generalized ridge penalty target. Its length may be 1 so that its value is applied to all variables.
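To make the generalized criterion concrete, the sketch below evaluates it for a given coefficient vector. It assumes that W is the diagonal matrix of sample weights, that lambda acts as a diagonal matrix of per-variable penalties, and that there is no intercept. It also mirrors the documented rule that a length-1 lambda or beta0 applies to all variables. The class and method names are hypothetical, and this evaluates the objective only; it is not the library's fitting routine.

```java
import java.util.Arrays;

final class GeneralizedRidgeObjectiveSketch {
    /** Expands a length-1 array to length p; a length-p array is returned unchanged. */
    static double[] broadcast(double[] v, int p) {
        if (v.length == p) {
            return v;
        }
        if (v.length != 1) {
            throw new IllegalArgumentException("length must be 1 or " + p);
        }
        double[] out = new double[p];
        Arrays.fill(out, v[0]);
        return out;
    }

    /** Weighted squared error plus the generalized ridge penalty around beta0. */
    static double objective(double[][] x, double[] y, double[] weights,
                            double[] lambda, double[] beta0, double[] beta) {
        int p = beta.length;
        double[] lam = broadcast(lambda, p);
        double[] target = broadcast(beta0, p);

        double loss = 0.0;
        for (int i = 0; i < y.length; i++) {
            double fitted = 0.0;
            for (int j = 0; j < p; j++) {
                fitted += x[i][j] * beta[j];
            }
            double r = y[i] - fitted;
            loss += weights[i] * r * r;
        }

        double penalty = 0.0;
        for (int j = 0; j < p; j++) {
            double d = beta[j] - target[j];
            penalty += lam[j] * d * d;
        }
        return loss + penalty;
    }
}
```

Setting every weight to 1 and beta0 to 0 recovers the ordinary ridge criterion, which is why the simpler fit overloads can be viewed as special cases.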