Class LinearModel

java.lang.Object
smile.regression.LinearModel
All Implemented Interfaces:
Serializable, ToDoubleFunction<Tuple>, DataFrameRegression, Regression<Tuple>

public class LinearModel extends Object implements DataFrameRegression
Linear model. In linear regression, the model specification is that the dependent variable is a linear combination of the parameters (but need not be linear in the independent variables). The residual is the difference between the value of the dependent variable predicted by the model, and the true value of the dependent variable.

Once a regression model has been constructed, it may be important to confirm the goodness of fit of the model and the statistical significance of the estimated parameters. Commonly used checks of goodness of fit include the R-squared, analysis of the pattern of residuals and hypothesis testing. Statistical significance can be checked by an F-test of the overall fit, followed by t-tests of individual parameters.

Interpretations of these diagnostic tests rest heavily on the model assumptions. Although examination of the residuals can be used to invalidate a model, the results of a t-test or F-test are sometimes more difficult to interpret if the model's assumptions are violated. For example, if the error term does not have a normal distribution, the estimated parameters will not follow a normal distribution in small samples, which complicates inference. With relatively large samples, however, a central limit theorem can be invoked such that hypothesis testing may proceed using asymptotic approximations.
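
For illustration, a minimal end-to-end sketch is given below. It assumes the usual fitting entry point OLS.fit(Formula, DataFrame) and the DataFrame.of(double[][], String...) factory; the exact data-frame construction call may vary across Smile versions, and the toy data are invented for the example.

    import smile.data.DataFrame;
    import smile.data.formula.Formula;
    import smile.regression.LinearModel;
    import smile.regression.OLS;

    public class LinearModelExample {
        public static void main(String[] args) {
            // Toy data: y is roughly 2*x1 + 3*x2 + 1 plus a little noise.
            double[][] data = {
                {1, 2,  9.1},
                {2, 1,  8.0},
                {3, 4, 18.9},
                {4, 3, 18.1},
                {5, 5, 26.2},
                {6, 2, 19.0}
            };
            DataFrame df = DataFrame.of(data, "x1", "x2", "y");

            // Fitting routines such as OLS.fit produce a LinearModel; the constructor
            // documented below is typically invoked by them rather than by user code.
            LinearModel model = OLS.fit(Formula.lhs("y"), df);

            // Goodness of fit and overall significance, as discussed above.
            System.out.println("R^2          = " + model.RSquared());
            System.out.println("adjusted R^2 = " + model.adjustedRSquared());
            System.out.println("F = " + model.ftest() + ", p = " + model.pvalue());

            // Point prediction for a new instance (x1 = 2.5, x2 = 2.5).
            System.out.println("prediction   = " + model.predict(new double[] {2.5, 2.5}));
        }
    }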

  • Constructor Details

    • LinearModel

      public LinearModel(Formula formula, StructType schema, Matrix X, double[] y, double[] w, double b)
      Constructor.
      Parameters:
      formula - a symbolic description of the model to be fitted.
      schema - the schema of input data.
      X - the design matrix.
      y - the response variable.
      w - the linear weights.
      b - the intercept.
  • Method Details

    • formula

      public Formula formula()
      Description copied from interface: DataFrameRegression
      Returns the model formula.
      Specified by:
      formula in interface DataFrameRegression
      Returns:
      the model formula.
    • schema

      public StructType schema()
      Description copied from interface: DataFrameRegression
      Returns the schema of predictors.
      Specified by:
      schema in interface DataFrameRegression
      Returns:
      the schema of predictors.
    • ttest

      public double[][] ttest()
      Returns the t-test of the coefficients (including the intercept). The first column contains the coefficients, the second the standard errors, the third the t-scores for the hypothesis that the coefficient is zero, and the fourth the p-values of that test. The last row corresponds to the intercept (see the sketch after this entry).
      Returns:
      the t-test of the coefficients.
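
      A minimal sketch of reading the four-column layout described above, assuming model is an already fitted LinearModel; the last row is reported as the intercept.

        double[][] ttest = model.ttest();
        for (int i = 0; i < ttest.length; i++) {
            String name = (i == ttest.length - 1) ? "(intercept)" : "beta[" + i + "]";
            System.out.printf("%-12s coef = %10.4f  se = %10.4f  t = %8.3f  p = %10.4g%n",
                    name, ttest[i][0], ttest[i][1], ttest[i][2], ttest[i][3]);
        }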
    • coefficients

      public double[] coefficients()
      Returns the linear coefficients without intercept.
      Returns:
      the linear coefficients without intercept.
    • intercept

      public double intercept()
      Returns the intercept.
      Returns:
      the intercept.
    • residuals

      public double[] residuals()
      Returns the residuals, that is, the response minus the fitted values.
      Returns:
      the residuals
    • fittedValues

      public double[] fittedValues()
      Returns the fitted values.
      Returns:
      the fitted values.
    • RSS

      public double RSS()
      Returns the residual sum of squares.
      Returns:
      the residual sum of squares.
    • error

      public double error()
      Returns the residual standard error.
      Returns:
      the residual standard error.
    • df

      public int df()
      Returns the degrees of freedom of the residual standard error.
      Returns:
      the degrees of freedom of the residual standard error.
    • RSquared

      public double RSquared()
      Returns the R² statistic. In regression, the R² coefficient of determination is a statistical measure of how well the regression line approximates the real data points. An R² of 1.0 indicates that the regression line perfectly fits the data.

      In the case of ordinary least-squares regression, R² increases as variables are added to the model (it never decreases). This illustrates a drawback of using R² alone to decide how many variables to include, since one might keep adding variables until "there is no more improvement". This motivates the alternative approach of looking at the adjusted R².

      Returns:
      the R² statistic.
    • adjustedRSquared

      public double adjustedRSquared()
      Returns the adjusted R² statistic. The adjusted R² has almost the same interpretation as R², but it penalizes the statistic as extra variables are included in the model (see the sketch after this entry).
      Returns:
      the adjusted R² statistic.
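
      As a sketch, the adjusted statistic can be reproduced from R² under the common convention adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors; the library's own degrees-of-freedom bookkeeping may differ slightly, and model is assumed to be a fitted LinearModel.

        double r2 = model.RSquared();
        int n = model.residuals().length;      // number of observations
        int p = model.coefficients().length;   // predictors, excluding the intercept
        double adjusted = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1.0);
        System.out.println(adjusted + " vs " + model.adjustedRSquared());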
    • ftest

      public double ftest()
      Returns the F-statistic of the goodness-of-fit test.
      Returns:
      the F-statistic of the goodness-of-fit test.
    • pvalue

      public double pvalue()
      Returns the p-value of the goodness-of-fit test.
      Returns:
      the p-value of the goodness-of-fit test.
    • predict

      public double predict(double[] x)
      Predicts the dependent variable of an instance.
      Parameters:
      x - an instance.
      Returns:
      the predicted value of the dependent variable.
    • predict

      public double predict(Tuple x)
      Description copied from interface: Regression
      Predicts the dependent variable of an instance.
      Specified by:
      predict in interface Regression<Tuple>
      Parameters:
      x - an instance.
      Returns:
      the predicted value of the dependent variable.
    • predict

      public double[] predict(DataFrame df)
      Description copied from interface: DataFrameRegression
      Predicts the dependent variables of a data frame.
      Specified by:
      predict in interface DataFrameRegression
      Parameters:
      df - the data frame.
      Returns:
      the predicted values.
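
      A short sketch of the three overloads; model is a fitted LinearModel and newData is assumed to be a DataFrame whose predictor columns match the training schema.

        // Raw vector: feature order must match the training schema.
        double yhat = model.predict(new double[] {2.5, 2.5});

        // Whole data frame: one prediction per row.
        double[] yhats = model.predict(newData);

        // Single row as a Tuple (DataFrame.get(int) is assumed to return a Tuple here).
        double first = model.predict(newData.get(0));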
    • update

      public void update(Tuple data)
      Updates the regression model online with a new training instance.
      Parameters:
      data - the training data.
    • update

      public void update(DataFrame data)
      Updates the regression model online with a new data frame (see the sketch after the online() entry below).
      Parameters:
      data - the training data.
    • online

      public boolean online()
      Description copied from interface: Regression
      Returns true if this is an online learner.
      Specified by:
      online in interface Regression<Tuple>
      Returns:
      true if online learner.
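
      A sketch of incremental updating with newly arrived labeled rows; newData is assumed to be a DataFrame containing both the predictor columns and the response.

        if (model.online()) {
            // Row by row ...
            for (int i = 0; i < newData.size(); i++) {
                model.update(newData.get(i));
            }
            // ... or, equivalently, the whole frame at once:
            // model.update(newData);
        }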
    • update

      public void update(double[] x, double y)
      Growing window recursive least squares with lambda = 1. RLS updates an ordinary least squares with samples that arrive sequentially.
      Parameters:
      x - training instance.
      y - response variable.
    • update

      public void update(double[] x, double y, double lambda)
      Recursive least squares. RLS updates an ordinary least squares with samples that arrive sequentially.

      In some adaptive configurations it can be useful not to give equal importance to all the historical data but to assign higher weights to the most recent data (and then to forget the oldest one). This may happen when the phenomenon underlying the data is non-stationary or when we want to approximate a nonlinear dependence by using a linear model which is local in time. Both these situations are common in adaptive control problems.

      Parameters:
      x - training instance.
      y - response variable.
      lambda - The forgetting factor in (0, 1]. The smaller lambda is, the smaller is the contribution of previous samples to the covariance matrix. This makes the filter more sensitive to recent samples, which means more fluctuations in the filter coefficients. The lambda = 1 case is referred to as the growing window RLS algorithm. In practice, lambda is usually chosen between 0.98 and 1.
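
      A sketch of a streaming update loop with a forgetting factor; xs (a double[][] of instances) and ys (their responses) stand for sequentially arriving observations and are assumptions of the example.

        double lambda = 0.99;                 // typical choice between 0.98 and 1
        for (int t = 0; t < xs.length; t++) {
            // lambda < 1 down-weights older samples, so the fit tracks recent data more closely.
            model.update(xs[t], ys[t], lambda);
        }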
    • toString

      public String toString()
      Overrides:
      toString in class Object