Class LinearModel
- All Implemented Interfaces:
Serializable, ToDoubleFunction<Tuple>, DataFrameRegression, Regression<Tuple>
Once a regression model has been constructed, it may be important to confirm the goodness of fit of the model and the statistical significance of the estimated parameters. Commonly used checks of goodness of fit include the R-squared, analysis of the pattern of residuals and hypothesis testing. Statistical significance can be checked by an F-test of the overall fit, followed by t-tests of individual parameters.
Interpretations of these diagnostic tests rest heavily on the model assumptions. Although examination of the residuals can be used to invalidate a model, the results of a t-test or F-test are sometimes more difficult to interpret if the model's assumptions are violated. For example, if the error term does not have a normal distribution, then in small samples the estimated parameters will not follow normal distributions, which complicates inference. With relatively large samples, however, a central limit theorem can be invoked so that hypothesis testing may proceed using asymptotic approximations.
Nested Class Summary
Nested classes/interfaces inherited from interface DataFrameRegression
DataFrameRegression.Trainer<M> -
Constructor Summary
Constructors
Constructor  Description
LinearModel(Formula formula, StructType schema, DenseMatrix X, double[] y, Vector w, double b)  Constructor.
-
Method Summary
Modifier and Type  Method  Description
double  adjustedRSquared()  Returns adjusted R² statistic.
coefficients  Returns the linear coefficients without intercept.
int  df()  Returns the degrees of freedom of the residual standard error.
double  error()  Returns the residual standard error.
fittedValues  Returns the fitted values.
formula()  Returns the model formula.
double  ftest()  Returns the F-statistic of goodness-of-fit.
double  intercept()  Returns the intercept.
boolean  online()  Returns true if this is an online learner.
double  predict(double[] x)  Predicts the dependent variable of an instance.
double[]  predict  Predicts the dependent variables of a data frame.
double  predict  Predicts the dependent variable of an instance.
double  pvalue()  Returns the p-value of goodness-of-fit test.
residuals  Returns the residuals, which are the response minus the fitted values.
double  RSquared()  Returns R² statistic.
double  RSS()  Returns the residual sum of squares.
schema()  Returns the schema of predictors.
toString()
double[][]  ttest()  Returns the t-test of the coefficients (including intercept).
void  update(double[] x, double y)  Growing window recursive least squares with lambda = 1.
void  update(double[] x, double y, double lambda)  Recursive least squares.
void  update  Online update the regression model with a new data frame.
void  update  Online update the regression model with a new training instance.
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface Regression
applyAsDouble, predict, predict, predict, update, update, update
-
Constructor Details
-
LinearModel
public LinearModel(Formula formula, StructType schema, DenseMatrix X, double[] y, Vector w, double b)
Constructor.
- Parameters:
formula - a symbolic description of the model to be fitted.
schema - the schema of input data.
X - the design matrix.
y - the response variable.
w - the linear weights.
b - the intercept.
-
-
Method Details
-
formula
Description copied from interface: DataFrameRegression
Returns the model formula.
- Specified by:
formula in interface DataFrameRegression
- Returns:
- the model formula.
-
schema
Description copied from interface: DataFrameRegression
Returns the schema of predictors.
- Specified by:
schema in interface DataFrameRegression
- Returns:
- the schema of predictors.
-
ttest
public double[][] ttest()
Returns the t-test of the coefficients (including intercept). The first column holds the coefficients, the second column the standard errors of the coefficients, the third column the t-scores for the null hypothesis that each coefficient is zero, and the fourth column the p-values of the test. The last row corresponds to the intercept.
- Returns:
- the t-test of the coefficients.
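As a rough illustration of the table layout described for ttest, the sketch below labels each row and recomputes a t-score as the estimate divided by its standard error. The class TTestTableSketch, its methods, and the sample numbers are hypothetical and not part of LinearModel; the array is only assumed to follow the column order documented above.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of the library.
class TTestTableSketch {
    /** t-score for the null hypothesis that a coefficient is zero. */
    static double tScore(double estimate, double stdError) {
        return estimate / stdError;
    }

    /** Formats rows laid out as {estimate, std. error, t-score, p-value}, intercept last. */
    static String format(String[] predictors, double[][] ttest) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < ttest.length; i++) {
            // The last row is the intercept; every other row is a predictor.
            String name = (i == ttest.length - 1) ? "(Intercept)" : predictors[i];
            sb.append(String.format(Locale.ROOT,
                    "%-12s estimate=%.4f se=%.4f t=%.4f p=%.4f%n",
                    name, ttest[i][0], ttest[i][1], ttest[i][2], ttest[i][3]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up numbers, used only to show the layout.
        double[][] table = {
            {2.0, 0.5, tScore(2.0, 0.5), 0.01},
            {1.0, 0.25, tScore(1.0, 0.25), 0.01}
        };
        System.out.print(format(new String[] {"x1"}, table));
    }
}
```

In practice the array passed to format would come from a fitted model's ttest() call, with predictor names taken from the model schema.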
-
coefficients
Returns the linear coefficients without intercept.
- Returns:
- the linear coefficients without intercept.
-
intercept
public double intercept()
Returns the intercept.
- Returns:
- the intercept.
-
residuals
Returns the residuals, which are the response minus the fitted values.
- Returns:
- the residuals.
-
fittedValues
-
RSS
public double RSS()
Returns the residual sum of squares.
- Returns:
- the residual sum of squares.
-
error
public double error()
Returns the residual standard error.
- Returns:
- the residual standard error.
-
df
public int df()
Returns the degrees of freedom of the residual standard error.
- Returns:
- the degrees of freedom of the residual standard error.
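The residual sum of squares, the residual degrees of freedom, and the residual standard error are related by the usual least-squares identities. The sketch below assumes a model with p predictors plus an intercept fitted to n observations, so that df = n - p - 1 and the residual standard error is sqrt(RSS / df). The class ResidualErrorSketch and its method names are hypothetical, and whether the library counts degrees of freedom in exactly this way is an assumption.

```java
// Hypothetical helper for illustration; not part of the library.
class ResidualErrorSketch {
    /** Residual sum of squares: sum of squared (response - fitted) differences. */
    static double rss(double[] y, double[] fitted) {
        double sum = 0.0;
        for (int i = 0; i < y.length; i++) {
            double r = y[i] - fitted[i];
            sum += r * r;
        }
        return sum;
    }

    /** Residual degrees of freedom, assuming p predictors plus one intercept. */
    static int df(int n, int p) {
        return n - p - 1;
    }

    /** Residual standard error: square root of RSS divided by the degrees of freedom. */
    static double residualStandardError(double rss, int df) {
        return Math.sqrt(rss / df);
    }

    public static void main(String[] args) {
        double[] y = {1.0, 2.0, 3.0, 4.0};
        double[] fitted = {1.1, 1.9, 3.2, 3.8};
        double rss = rss(y, fitted);
        int df = df(y.length, 1);
        System.out.println("RSS = " + rss + ", df = " + df
                + ", error = " + residualStandardError(rss, df));
    }
}
```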
-
RSquared
public double RSquared()
Returns R² statistic. In regression, the R² coefficient of determination is a statistical measure of how well the regression line approximates the real data points. An R² of 1.0 indicates that the regression line perfectly fits the data.
In the case of ordinary least-squares regression, R² never decreases as more variables are added to the model. This illustrates a drawback to one possible use of R², where one might try to include more variables in the model until "there is no more improvement". This leads to the alternative approach of looking at the adjusted R².
- Returns:
- R² statistic.
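For reference, the coefficient of determination is commonly computed as one minus the ratio of the residual sum of squares to the total sum of squares about the mean. The following is a minimal sketch of that textbook formula; the class RSquaredSketch is hypothetical, and the code is not taken from the library's implementation.

```java
// Hypothetical helper for illustration; not part of the library.
class RSquaredSketch {
    /** R² = 1 - RSS / TSS, where TSS is the sum of squared deviations from the mean response. */
    static double rSquared(double[] y, double[] fitted) {
        double mean = 0.0;
        for (double v : y) mean += v;
        mean /= y.length;

        double rss = 0.0; // residual sum of squares
        double tss = 0.0; // total sum of squares
        for (int i = 0; i < y.length; i++) {
            rss += (y[i] - fitted[i]) * (y[i] - fitted[i]);
            tss += (y[i] - mean) * (y[i] - mean);
        }
        return 1.0 - rss / tss;
    }

    public static void main(String[] args) {
        double[] y = {1.0, 2.0, 3.0, 4.0};
        double[] fitted = {1.1, 1.9, 3.2, 3.8};
        System.out.println("R^2 = " + rSquared(y, fitted));
    }
}
```

A perfect fit gives RSS = 0 and therefore R² = 1.0, matching the description above.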
-
adjustedRSquared
public double adjustedRSquared()
Returns adjusted R² statistic. The adjusted R² has almost the same interpretation as R², but it penalizes the statistic as extra variables are included in the model.
- Returns:
- adjusted R² statistic.
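One common form of the penalty is 1 - (1 - R²)(n - 1) / (n - p - 1), for n observations and p predictors plus an intercept. The sketch below uses that textbook definition; the class AdjustedRSquaredSketch is hypothetical, and that the library uses this exact expression is an assumption.

```java
// Hypothetical helper for illustration; not part of the library.
class AdjustedRSquaredSketch {
    /**
     * Adjusted R² for n observations and p predictors (intercept not counted in p).
     * Assumes n > p + 1 so that the denominator is positive.
     */
    static double adjustedRSquared(double rSquared, int n, int p) {
        return 1.0 - (1.0 - rSquared) * (n - 1) / (n - p - 1);
    }

    public static void main(String[] args) {
        // Same R², one extra predictor: the adjusted value drops.
        System.out.println(adjustedRSquared(0.98, 4, 1));
        System.out.println(adjustedRSquared(0.98, 4, 2));
    }
}
```

With R² held fixed, increasing p lowers the adjusted value, which is the penalty the description refers to.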
-
ftest
public double ftest()
Returns the F-statistic of goodness-of-fit.
- Returns:
- the F-statistic of goodness-of-fit.
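For an overall test of fit, the F-statistic is conventionally the explained mean square divided by the residual mean square. The sketch below assumes p predictors plus an intercept and n observations; the class FStatisticSketch is hypothetical and the code is a textbook formula, not the library's source.

```java
// Hypothetical helper for illustration; not part of the library.
class FStatisticSketch {
    /**
     * F = ((TSS - RSS) / p) / (RSS / (n - p - 1)),
     * where TSS is the total sum of squares and RSS the residual sum of squares.
     */
    static double fStatistic(double tss, double rss, int n, int p) {
        double explainedMeanSquare = (tss - rss) / p;
        double residualMeanSquare = rss / (n - p - 1);
        return explainedMeanSquare / residualMeanSquare;
    }

    public static void main(String[] args) {
        System.out.println("F = " + fStatistic(5.0, 0.1, 4, 1));
    }
}
```

Under the usual assumptions, this statistic is compared against an F distribution with p and n - p - 1 degrees of freedom to obtain a p-value.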
-
pvalue
public double pvalue()
Returns the p-value of goodness-of-fit test.
- Returns:
- the p-value of goodness-of-fit test.
-
predict
public double predict(double[] x)
Predicts the dependent variable of an instance.
- Parameters:
x - an instance.
- Returns:
- the predicted value of dependent variable.
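Conceptually, a linear model's prediction is the dot product of the weights with the instance plus the intercept. The following minimal sketch shows that arithmetic with the constructor's w and b as plain Java values; the class LinearPredictSketch is hypothetical and does not reproduce the library's internals.

```java
// Hypothetical helper for illustration; not part of the library.
class LinearPredictSketch {
    /** Returns b + sum_i w[i] * x[i]. */
    static double predict(double[] w, double b, double[] x) {
        if (w.length != x.length) {
            throw new IllegalArgumentException("expected " + w.length + " features, got " + x.length);
        }
        double y = b;
        for (int i = 0; i < w.length; i++) {
            y += w[i] * x[i];
        }
        return y;
    }

    public static void main(String[] args) {
        System.out.println(predict(new double[] {2.0, 3.0}, 1.0, new double[] {1.0, 1.0}));
    }
}
```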
-
predict
Description copied from interface: Regression
Predicts the dependent variable of an instance.
- Specified by:
predict in interface Regression<Tuple>
- Parameters:
x - an instance.
- Returns:
- the predicted value of dependent variable.
-
predict
Description copied from interface: DataFrameRegression
Predicts the dependent variables of a data frame.
- Specified by:
predict in interface DataFrameRegression
- Parameters:
df - the data frame.
- Returns:
- the predicted values.
-
update
Updates the regression model online with a new training instance.
- Parameters:
data - the training data.
-
update
Updates the regression model online with a new data frame.
- Parameters:
data - the training data.
-
online
public boolean online()
Description copied from interface: Regression
Returns true if this is an online learner.
- Specified by:
online in interface Regression<Tuple>
- Returns:
- true if online learner.
-
update
public void update(double[] x, double y)
Growing window recursive least squares with lambda = 1. RLS updates an ordinary least squares fit with samples that arrive sequentially.
- Parameters:
x - training instance.
y - response variable.
-
update
public void update(double[] x, double y, double lambda)
Recursive least squares. RLS updates an ordinary least squares fit with samples that arrive sequentially.
In some adaptive configurations it can be useful not to give equal importance to all the historical data but to assign higher weights to the most recent data (and thereby gradually forget the oldest). This may happen when the phenomenon underlying the data is non-stationary or when we want to approximate a nonlinear dependence by using a linear model which is local in time. Both of these situations are common in adaptive control problems.
- Parameters:
x - training instance.
y - response variable.
lambda - The forgetting factor in (0, 1]. The smaller lambda is, the smaller is the contribution of previous samples to the covariance matrix. This makes the filter more sensitive to recent samples, which means more fluctuations in the filter coefficients. The lambda = 1 case is referred to as the growing window RLS algorithm. In practice, lambda is usually chosen between 0.98 and 1.
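To make the role of the forgetting factor concrete, here is a minimal, self-contained sketch of the textbook recursive least squares update. It is not the library's implementation: the class RlsSketch, its fields theta and P, and the initial scale delta are all assumptions chosen for illustration. The intercept is handled by appending a constant 1 to each instance.

```java
// Hypothetical textbook RLS for illustration; not the library's implementation.
class RlsSketch {
    final double[] theta; // coefficients; the last entry is the intercept
    final double[][] P;   // running estimate of the inverse of the weighted Gram matrix

    /** p is the number of predictors; delta is a large initial scale for P (an assumption). */
    RlsSketch(int p, double delta) {
        int d = p + 1;
        theta = new double[d];
        P = new double[d][d];
        for (int i = 0; i < d; i++) P[i][i] = delta;
    }

    /** Growing window update, equivalent to lambda = 1. */
    void update(double[] x, double y) {
        update(x, y, 1.0);
    }

    /** One RLS step with forgetting factor lambda in (0, 1]. */
    void update(double[] x, double y, double lambda) {
        if (lambda <= 0.0 || lambda > 1.0) {
            throw new IllegalArgumentException("lambda must be in (0, 1]: " + lambda);
        }
        int d = theta.length;
        double[] z = new double[d]; // instance augmented with a constant 1
        System.arraycopy(x, 0, z, 0, d - 1);
        z[d - 1] = 1.0;

        double[] Pz = new double[d]; // P * z
        for (int i = 0; i < d; i++) {
            for (int j = 0; j < d; j++) Pz[i] += P[i][j] * z[j];
        }
        double denom = lambda;       // lambda + z' P z
        double err = y;              // prediction error before the update
        for (int i = 0; i < d; i++) {
            denom += z[i] * Pz[i];
            err -= theta[i] * z[i];
        }

        // theta <- theta + gain * err, with gain = P z / (lambda + z' P z)
        for (int i = 0; i < d; i++) theta[i] += Pz[i] / denom * err;

        // P <- (P - (P z)(P z)' / denom) / lambda; P is symmetric, so z' P = (P z)'
        for (int i = 0; i < d; i++) {
            for (int j = 0; j < d; j++) {
                P[i][j] = (P[i][j] - Pz[i] * Pz[j] / denom) / lambda;
            }
        }
    }

    public static void main(String[] args) {
        RlsSketch model = new RlsSketch(1, 1e6);
        for (int i = 0; i < 50; i++) {
            double x = 0.1 * i;
            model.update(new double[] {x}, 2.0 * x + 1.0, 0.98); // noiseless y = 2x + 1
        }
        System.out.println("slope = " + model.theta[0] + ", intercept = " + model.theta[1]);
    }
}
```

Dividing P by lambda at each step inflates the uncertainty attached to old samples, which is what lets newer samples dominate when lambda is below 1.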
-
toString
-