Class OLS

java.lang.Object
smile.regression.OLS

public class OLS extends Object
Ordinary least squares. In linear regression, the model specification is that the dependent variable is a linear combination of the parameters (but need not be linear in the independent variables). The residual is the difference between the value of the dependent variable predicted by the model, and the true value of the dependent variable. Ordinary least squares obtains parameter estimates that minimize the sum of squared residuals, SSE (also denoted RSS).
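The minimization described above has a closed form when there is a single explanatory variable. The following is a minimal sketch of that special case, assuming the model y = b0 + b1 * x; the class `SimpleOlsSketch` and its methods are hypothetical illustrations and are not part of smile.regression.OLS.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Smile): closed-form OLS for y = b0 + b1 * x. */
final class SimpleOlsSketch {
    /** Returns {intercept, slope}, the values that minimize the sum of squared residuals. */
    static double[] fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("x and y must have the same length (at least 2)");
        }
        int n = x.length;

        // Sample means of the explanatory and response variables.
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Centered cross-product and sum of squares.
        double sxy = 0.0, sxx = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("x must not be constant");
        }

        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        return new double[] {intercept, slope};
    }

    /** Sum of squared residuals (SSE, also denoted RSS) for the coefficients {intercept, slope}. */
    static double sse(double[] beta, double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double residual = y[i] - (beta[0] + beta[1] * x[i]);
            sum += residual * residual;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4};
        double[] y = {3, 5, 7, 9}; // generated from y = 1 + 2x with no noise
        double[] beta = fit(x, y);
        System.out.println("coefficients = " + Arrays.toString(beta));
        System.out.println("SSE = " + sse(beta, x, y));
    }
}
```

With several explanatory variables the same criterion is minimized through a matrix decomposition rather than these scalar formulas.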

The OLS estimator is consistent when the independent variables are exogenous and there is no perfect multicollinearity, and optimal in the class of linear unbiased estimators when the errors are homoscedastic and serially uncorrelated. Under these conditions, the method of OLS provides minimum-variance mean-unbiased estimation when the errors have finite variances.

There are several different frameworks in which the linear regression model can be cast in order to make the OLS technique applicable. Each of these settings produces the same formulas and the same results; the only differences are the interpretation and the assumptions that have to be imposed for the method to give meaningful results. The choice of framework depends mostly on the nature of the data at hand and on the inference task to be performed.

Least squares corresponds to the maximum likelihood criterion if the experimental errors have a normal distribution, and it can also be derived as a method-of-moments estimator.
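The link to maximum likelihood can be sketched as follows. The notation is an assumption introduced here for illustration (observations y_i, regressor vectors x_i, coefficients β, and independent errors with common variance σ²); it does not appear in the class documentation itself.

```latex
\begin{align*}
y_i &= x_i^\top \beta + \varepsilon_i,
  \qquad \varepsilon_i \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2),
  \qquad i = 1, \dots, n \\
\ell(\beta, \sigma^2) &= -\frac{n}{2} \log\left(2\pi\sigma^2\right)
  - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left(y_i - x_i^\top \beta\right)^2 \\
\hat\beta_{\mathrm{ML}} &= \arg\max_{\beta}\, \ell(\beta, \sigma^2)
  = \arg\min_{\beta} \sum_{i=1}^{n} \left(y_i - x_i^\top \beta\right)^2
  = \hat\beta_{\mathrm{OLS}}
\end{align*}
```

Only the last term of the log-likelihood depends on β, and it enters with a negative sign, so maximizing the likelihood over β is the same as minimizing the sum of squared residuals.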

Once a regression model has been constructed, it may be important to confirm the goodness of fit of the model and the statistical significance of the estimated parameters. Commonly used checks of goodness of fit include the R-squared, analysis of the pattern of residuals and hypothesis testing. Statistical significance can be checked by an F-test of the overall fit, followed by t-tests of individual parameters.
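As an illustration of the first of these checks, R-squared can be computed from the observed responses and the fitted values as 1 - SSE/SST, where SST is the total sum of squares around the mean. The sketch below is a standalone illustration; the class `GoodnessOfFitSketch` is hypothetical and is not how LinearModel necessarily computes its statistics.

```java
/** Hypothetical helper (not part of Smile): R-squared from observed and fitted values. */
final class GoodnessOfFitSketch {
    /** Returns 1 - SSE/SST, the fraction of the response variance explained by the fit. */
    static double rSquared(double[] observed, double[] fitted) {
        if (observed.length != fitted.length || observed.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        int n = observed.length;

        // Mean of the observed responses.
        double mean = 0.0;
        for (double value : observed) {
            mean += value;
        }
        mean /= n;

        // SSE: squared residuals of the fit. SST: squared deviations from the mean.
        double sse = 0.0, sst = 0.0;
        for (int i = 0; i < n; i++) {
            double residual = observed[i] - fitted[i];
            double deviation = observed[i] - mean;
            sse += residual * residual;
            sst += deviation * deviation;
        }
        if (sst == 0.0) {
            throw new IllegalArgumentException("observed values must not all be equal");
        }
        return 1.0 - sse / sst;
    }

    public static void main(String[] args) {
        double[] observed = {1.0, 2.0, 3.0};
        double[] fitted = {1.1, 1.9, 3.0};
        System.out.println("R-squared = " + rSquared(observed, fitted));
    }
}
```

A perfect fit gives a value of 1, while a model that only predicts the mean of the response gives 0.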

Interpretations of these diagnostic tests rest heavily on the model assumptions. Although examination of the residuals can be used to invalidate a model, the results of a t-test or F-test are sometimes more difficult to interpret if the model's assumptions are violated. For example, if the error term does not have a normal distribution, then in small samples the estimated parameters will not follow normal distributions, which complicates inference. With relatively large samples, however, a central limit theorem can be invoked so that hypothesis testing may proceed using asymptotic approximations.

  • Constructor Details

    • OLS

      public OLS()
  • Method Details

    • fit

      public static LinearModel fit(Formula formula, DataFrame data)
      Fits an ordinary least squares model.
      Parameters:
      formula - a symbolic description of the model to be fitted.
      data - the data frame of the explanatory and response variables. There is no need to include a constant column of 1s for the bias (intercept) term.
      Returns:
      the model.
    • fit

      public static LinearModel fit(Formula formula, DataFrame data, Properties params)
      Fits an ordinary least squares model. The hyper-parameters in params include
      • smile.ols.method (default "svd") is a string (svd or qr) for the fitting method
      • smile.ols.standard.error (default true) is a boolean. If true, compute the standard errors of the parameter estimates
      • smile.ols.recursive (default true) is a boolean. If true, the returned model supports recursive least squares
      Parameters:
      formula - a symbolic description of the model to be fitted.
      data - the data frame of the explanatory and response variables. There is no need to include a constant column of 1s for the bias (intercept) term.
      params - the hyper-parameters.
      Returns:
      the model.
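A hyper-parameter object for this overload might be assembled as in the sketch below. It uses only java.util.Properties; the key names are copied from the list above, while the class `OlsParamsSketch` is hypothetical and the string encoding of the boolean values ("true"/"false") is an assumption based on how Properties stores values.

```java
import java.util.Properties;

/** Hypothetical helper (not part of Smile): builds hyper-parameters for OLS.fit. */
final class OlsParamsSketch {
    /** Requests the QR method, with standard errors and recursive updates switched off. */
    static Properties qrWithoutExtras() {
        Properties params = new Properties();
        // Key names follow the documented list; values are stored as strings.
        params.setProperty("smile.ols.method", "qr");
        params.setProperty("smile.ols.standard.error", "false"); // assumed to be parsed as a boolean
        params.setProperty("smile.ols.recursive", "false");      // assumed to be parsed as a boolean
        return params;
    }

    public static void main(String[] args) {
        Properties params = qrWithoutExtras();
        // The result would be passed as the third argument of fit(Formula, DataFrame, Properties).
        params.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Keys that are left unset fall back to the defaults listed above.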
    • fit

      public static LinearModel fit(Formula formula, DataFrame data, String method, boolean stderr, boolean recursive)
      Fits an ordinary least squares model.
      Parameters:
      formula - a symbolic description of the model to be fitted.
      data - the data frame of the explanatory and response variables. There is no need to include a constant column of 1s for the bias (intercept) term.
      method - the fitting method ("svd" or "qr").
      stderr - if true, compute the standard errors of the parameter estimates.
      recursive - if true, the returned model supports recursive least squares.
      Returns:
      the model.