Class LinearModel

java.lang.Object
smile.regression.LinearModel
All Implemented Interfaces:
Serializable, ToDoubleFunction<Tuple>, DataFrameRegression, Regression<Tuple>

public class LinearModel extends Object implements DataFrameRegression
Linear model. In linear regression, the model specification is that the dependent variable is a linear combination of the parameters (but need not be linear in the independent variables). The residual is the difference between the value of the dependent variable predicted by the model, and the true value of the dependent variable.

Once a regression model has been constructed, it may be important to confirm the goodness of fit of the model and the statistical significance of the estimated parameters. Commonly used checks of goodness of fit include the R-squared, analysis of the pattern of residuals and hypothesis testing. Statistical significance can be checked by an F-test of the overall fit, followed by t-tests of individual parameters.

Interpretations of these diagnostic tests rest heavily on the model assumptions. Although examination of the residuals can be used to invalidate a model, the results of a t-test or F-test are sometimes more difficult to interpret if the model's assumptions are violated. For example, if the error term does not have a normal distribution, the estimated parameters will not follow a normal distribution in small samples, which complicates inference. With relatively large samples, however, a central limit theorem can be invoked such that hypothesis testing may proceed using asymptotic approximations.
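
For illustration, a minimal end-to-end sketch is given below. It assumes the usual fitting entry point OLS.fit(Formula, DataFrame) and the DataFrame.of(double[][], String...) factory; the exact data-frame construction call may vary across Smile versions, and the toy data are invented for the example.

    import smile.data.DataFrame;
    import smile.data.formula.Formula;
    import smile.regression.LinearModel;
    import smile.regression.OLS;

    public class LinearModelExample {
        public static void main(String[] args) {
            // Toy data: y is roughly 2*x1 + 3*x2 + 1 plus a little noise.
            double[][] data = {
                {1, 2,  9.1},
                {2, 1,  8.0},
                {3, 4, 18.9},
                {4, 3, 18.1},
                {5, 5, 26.2},
                {6, 2, 19.0}
            };
            DataFrame df = DataFrame.of(data, "x1", "x2", "y");

            // Fitting routines such as OLS.fit produce a LinearModel; the constructor
            // documented below is typically invoked by them rather than by user code.
            LinearModel model = OLS.fit(Formula.lhs("y"), df);

            // Goodness of fit and overall significance, as discussed above.
            System.out.println("R^2          = " + model.RSquared());
            System.out.println("adjusted R^2 = " + model.adjustedRSquared());
            System.out.println("F = " + model.ftest() + ", p = " + model.pvalue());

            // Point prediction for a new instance (x1 = 2.5, x2 = 2.5).
            System.out.println("prediction   = " + model.predict(new double[] {2.5, 2.5}));
        }
    }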

  • Constructor Details

    • LinearModel

      public LinearModel(Formula formula, StructType schema, Matrix X, double[] y, double[] w, double b)
      Constructor.
      Parameters:
      formula - a symbolic description of the model to be fitted.
      schema - the schema of input data.
      X - the design matrix.
      y - the response variable.
      w - the linear weights.
      b - the intercept.
  • Method Details

    • formula

      public Formula formula()
      Description copied from interface: DataFrameRegression
      Returns the model formula.
      Specified by:
      formula in interface DataFrameRegression
      Returns:
      the model formula.
    • schema

      public StructType schema()
      Description copied from interface: DataFrameRegression
      Returns the schema of predictors.
      Specified by:
      schema in interface DataFrameRegression
      Returns:
      the schema of predictors.
    • ttest

      public double[][] ttest()
      Returns the t-test of the coefficients (including the intercept). The first column contains the coefficients, the second the standard errors, the third the t-scores for the hypothesis that the coefficient is zero, and the fourth the p-values of that test. The last row corresponds to the intercept (see the sketch after this entry).
      Returns:
      the t-test of the coefficients.
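
      A minimal sketch of reading the four-column layout described above, assuming model is an already fitted LinearModel; the last row is reported as the intercept.

        double[][] ttest = model.ttest();
        for (int i = 0; i < ttest.length; i++) {
            String name = (i == ttest.length - 1) ? "(intercept)" : "beta[" + i + "]";
            System.out.printf("%-12s coef = %10.4f  se = %10.4f  t = %8.3f  p = %10.4g%n",
                    name, ttest[i][0], ttest[i][1], ttest[i][2], ttest[i][3]);
        }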
    • coefficients

      public double[] coefficients()
      Returns the linear coefficients without intercept.
      Returns:
      the linear coefficients without intercept.
    • intercept

      public double intercept()
      Returns the intercept.
      Returns:
      the intercept.
    • residuals

      public double[] residuals()
      Returns the residuals, that is, the response minus the fitted values.
      Returns:
      the residuals
    • fittedValues

      public double[] fittedValues()
      Returns the fitted values.
      Returns:
      the fitted values.
    • RSS

      public double RSS()
      Returns the residual sum of squares.
      Returns:
      the residual sum of squares.
    • error

      public double error()
      Returns the residual standard error.
      Returns:
      the residual standard error.
    • df

      public int df()
      Returns the degrees of freedom of the residual standard error.
      Returns:
      the degrees of freedom of the residual standard error.
    • RSquared

      public double RSquared()
      Returns the R² statistic. In regression, the R² coefficient of determination is a statistical measure of how well the regression line approximates the real data points. An R² of 1.0 indicates that the regression line perfectly fits the data.

      In the case of ordinary least-squares regression, R² increases as variables are added to the model (it never decreases). This illustrates a drawback of using R² alone to decide how many variables to include, since one might keep adding variables until "there is no more improvement". This motivates the alternative approach of looking at the adjusted R².

      Returns:
      the R² statistic.
    • adjustedRSquared

      public double adjustedRSquared()
      Returns the adjusted R² statistic. The adjusted R² has almost the same interpretation as R², but it penalizes the statistic as extra variables are included in the model (see the sketch after this entry).
      Returns:
      the adjusted R² statistic.
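
      As a sketch, the adjusted statistic can be reproduced from R² under the common convention adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors; the library's own degrees-of-freedom bookkeeping may differ slightly, and model is assumed to be a fitted LinearModel.

        double r2 = model.RSquared();
        int n = model.residuals().length;      // number of observations
        int p = model.coefficients().length;   // predictors, excluding the intercept
        double adjusted = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1.0);
        System.out.println(adjusted + " vs " + model.adjustedRSquared());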
    • ftest

      public double ftest()
      Returns the F-statistic of the goodness-of-fit test.
      Returns:
      the F-statistic of the goodness-of-fit test.
    • pvalue

      public double pvalue()
      Returns the p-value of the goodness-of-fit test.
      Returns:
      the p-value of the goodness-of-fit test.
    • predict

      public double predict(double[] x)
      Predicts the dependent variable of an instance.
      Parameters:
      x - an instance.
      Returns:
      the predicted value of the dependent variable.
    • predict

      public double predict(Tuple x)
      Description copied from interface: Regression
      Predicts the dependent variable of an instance.
      Specified by:
      predict in interface Regression<Tuple>
      Parameters:
      x - an instance.
      Returns:
      the predicted value of the dependent variable.
    • predict

      public double[] predict(DataFrame df)
      Description copied from interface: DataFrameRegression
      Predicts the dependent variables of a data frame.
      Specified by:
      predict in interface DataFrameRegression
      Parameters:
      df - the data frame.
      Returns:
      the predicted values.
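
      A short sketch of the three overloads; model is a fitted LinearModel and newData is assumed to be a DataFrame whose predictor columns match the training schema.

        // Raw vector: feature order must match the training schema.
        double yhat = model.predict(new double[] {2.5, 2.5});

        // Whole data frame: one prediction per row.
        double[] yhats = model.predict(newData);

        // Single row as a Tuple (DataFrame.get(int) is assumed to return a Tuple here).
        double first = model.predict(newData.get(0));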
    • update

      public void update(Tuple data)
      Updates the regression model online with a new training instance.
      Parameters:
      data - the training data.
    • update

      public void update(DataFrame data)
      Updates the regression model online with a new data frame (see the sketch after the online() entry below).
      Parameters:
      data - the training data.
    • online

      public boolean online()
      Description copied from interface: Regression
      Returns true if this is an online learner.
      Specified by:
      online in interface Regression<Tuple>
      Returns:
      true if online learner.
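
      A sketch of incremental updating with newly arrived labeled rows; newData is assumed to be a DataFrame containing both the predictor columns and the response.

        if (model.online()) {
            // Row by row ...
            for (int i = 0; i < newData.size(); i++) {
                model.update(newData.get(i));
            }
            // ... or, equivalently, the whole frame at once:
            // model.update(newData);
        }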
    • update

      public void update(double[] x, double y)
      Growing window recursive least squares with lambda = 1. RLS updates an ordinary least squares with samples that arrive sequentially.
      Parameters:
      x - training instance.
      y - response variable.
    • update

      public void update(double[] x, double y, double lambda)
      Recursive least squares. RLS updates an ordinary least squares with samples that arrive sequentially.

      In some adaptive configurations it can be useful not to give equal importance to all the historical data but to assign higher weights to the most recent data (and then to forget the oldest one). This may happen when the phenomenon underlying the data is non-stationary or when we want to approximate a nonlinear dependence by using a linear model which is local in time. Both these situations are common in adaptive control problems.

      Parameters:
      x - training instance.
      y - response variable.
      lambda - The forgetting factor in (0, 1]. The smaller lambda is, the smaller is the contribution of previous samples to the covariance matrix. This makes the filter more sensitive to recent samples, which means more fluctuations in the filter coefficients. The lambda = 1 case is referred to as the growing window RLS algorithm. In practice, lambda is usually chosen between 0.98 and 1.
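
      A sketch of a streaming update loop with a forgetting factor; xs (a double[][] of instances) and ys (their responses) stand for sequentially arriving observations and are assumptions of the example.

        double lambda = 0.99;                 // typical choice between 0.98 and 1
        for (int t = 0; t < xs.length; t++) {
            // lambda < 1 down-weights older samples, so the fit tracks recent data more closely.
            model.update(xs[t], ys[t], lambda);
        }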
    • toString

      public String toString()
      Overrides:
      toString in class Object