Class Formula

java.lang.Object
smile.data.formula.Formula
All Implemented Interfaces:
Serializable

public class Formula extends Object implements Serializable
The model fitting formula in a compact symbolic form. An expression of the form y ~ model is interpreted as a specification that the response y is modelled by a linear predictor specified symbolically by model. Such a model consists of a series of terms separated by + operators. The terms themselves consist of variable and factor names separated by :: operators. Such a term is interpreted as the interaction of all the variables and factors appearing in the term. The special term "." means all columns not otherwise in the formula in the context of a data frame.
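The meaning of the "." term can be illustrated with a small standalone sketch. It does not use or reproduce the library's implementation: the class `DotExpansionSketch` and its method `expandDot` are hypothetical names, and the sketch assumes that predictors are plain column names (no interactions or functions).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of the Smile API.
class DotExpansionSketch {
    /**
     * Replaces "." with every schema column that is not otherwise
     * mentioned in the formula (as response or as explicit predictor).
     */
    static List<String> expandDot(String response, List<String> predictors, List<String> columns) {
        // Collect the column names that the formula already mentions.
        Set<String> used = new LinkedHashSet<>(predictors);
        used.remove(".");
        if (response != null) {
            used.add(response);
        }

        List<String> expanded = new ArrayList<>();
        for (String predictor : predictors) {
            if (predictor.equals(".")) {
                // "." stands for all remaining columns, in schema order.
                for (String column : columns) {
                    if (!used.contains(column)) {
                        expanded.add(column);
                    }
                }
            } else {
                expanded.add(predictor);
            }
        }
        return expanded;
    }

    public static void main(String[] args) {
        List<String> columns = List.of("y", "a", "b", "c");
        // y ~ .      prints [a, b, c]
        System.out.println(expandDot("y", List.of("."), columns));
        // y ~ a + .  prints [a, b, c]
        System.out.println(expandDot("y", List.of("a", "."), columns));
    }
}
```

In this sketch the response and any explicitly listed predictor are excluded from the "." expansion, which matches the description "all columns not otherwise in the formula".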

In addition to + and ::, a number of other operators are useful in model formulae. The && operator denotes factor crossing: a && b is interpreted as a+b+a::b. The ^ operator indicates crossing to the specified degree. For example, (a+b+c)^2 is identical to (a+b+c) && (a+b+c), which in turn expands to a formula containing the main effects for a, b and c together with their second-order interactions. The - operator removes the specified terms, so that (a+b+c)^2 - a::b is identical to a + b + c + b::c + a::c. It can also be used to remove the intercept term: when fitting a linear model, y ~ x - 1 specifies a line through the origin. A model with no intercept can also be specified as y ~ x + 0.
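The expansion rules above can be checked with a small standalone sketch. It models a term as a set of variable names and a model as an ordered set of terms; it is not the library's implementation, and the class `CrossingSketch` and all of its methods are hypothetical names introduced for illustration.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical illustration only; not part of the Smile API.
class CrossingSketch {
    /** Builds a model of main effects, one single-variable term per name. */
    static Set<Set<String>> mainEffects(String... names) {
        Set<Set<String>> model = new LinkedHashSet<>();
        for (String name : names) {
            model.add(new TreeSet<>(List.of(name)));
        }
        return model;
    }

    /** Crossing: all terms of both operands plus their pairwise interactions. */
    static Set<Set<String>> cross(Set<Set<String>> left, Set<Set<String>> right) {
        Set<Set<String>> result = new LinkedHashSet<>(left);
        result.addAll(right);
        for (Set<String> l : left) {
            for (Set<String> r : right) {
                // The interaction of two terms is the union of their variables.
                Set<String> interaction = new TreeSet<>(l);
                interaction.addAll(r);
                result.add(interaction);
            }
        }
        return result;
    }

    /** Crossing to a degree: crosses the model with itself (degree - 1) times. */
    static Set<Set<String>> power(Set<Set<String>> model, int degree) {
        Set<Set<String>> result = model;
        for (int i = 1; i < degree; i++) {
            result = cross(result, model);
        }
        return result;
    }

    /** Removal: drops a single term, given by its variable names. */
    static Set<Set<String>> remove(Set<Set<String>> model, String... term) {
        Set<Set<String>> result = new LinkedHashSet<>(model);
        result.remove(new TreeSet<>(List.of(term)));
        return result;
    }

    /** Renders terms joined by " + ", with variables joined by "::". */
    static String render(Set<Set<String>> model) {
        return model.stream()
                .map(term -> String.join("::", term))
                .collect(Collectors.joining(" + "));
    }

    public static void main(String[] args) {
        // a && b  prints: a + b + a::b
        System.out.println(render(cross(mainEffects("a"), mainEffects("b"))));
        // (a+b+c)^2  prints: a + b + c + a::b + a::c + b::c
        Set<Set<String>> squared = power(mainEffects("a", "b", "c"), 2);
        System.out.println(render(squared));
        // (a+b+c)^2 - a::b  prints: a + b + c + a::c + b::c
        System.out.println(render(remove(squared, "a", "b")));
    }
}
```

The order of terms in the rendered output follows insertion order in this sketch; the set of terms is what matters for the model.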

While formulae usually involve just variable and factor names, they can also involve arithmetic expressions. The formula log(y) ~ a + log(x), for example, is legal.

Note that the operators ~, +, ::, ^ are only available in the Scala API.

  • Constructor Details

    • Formula

      public Formula(Term response, Term... predictors)
      Constructor.
      Parameters:
      response - the left-hand side of formula, i.e. dependent variable.
      predictors - the right-hand side of formula, i.e. independent/predictor variables.
  • Method Details

    • predictors

      public Term[] predictors()
      Returns the predictors.
      Returns:
      the predictors.
    • response

      public Term response()
      Returns the response term.
      Returns:
      the response term.
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • equals

      public boolean equals(Object o)
      Overrides:
      equals in class Object
    • lhs

      public static Formula lhs(String lhs)
      Factory method. The predictors will be all the columns not otherwise in the formula in the context of a data frame.
      Parameters:
      lhs - the left-hand side of formula, i.e. dependent variable.
      Returns:
      the formula.
    • lhs

      public static Formula lhs(Term lhs)
      Factory method. The predictors will be all the columns not otherwise in the formula in the context of a data frame.
      Parameters:
      lhs - the left-hand side of formula, i.e. dependent variable.
      Returns:
      the formula.
    • rhs

      public static Formula rhs(String... predictors)
      Factory method. No response variable.
      Parameters:
      predictors - the right-hand side of formula, i.e. independent/predictor variables.
      Returns:
      the formula.
    • rhs

      public static Formula rhs(Term... predictors)
      Factory method. No response variable.
      Parameters:
      predictors - the right-hand side of formula, i.e. independent/predictor variables.
      Returns:
      the formula.
    • of

      public static Formula of(String response, String... predictors)
      Factory method.
      Parameters:
      response - the left-hand side of formula, i.e. dependent variable.
      predictors - the right-hand side of formula, i.e. independent/predictor variables.
      Returns:
      the formula.
    • of

      public static Formula of(String response, Term... predictors)
      Factory method.
      Parameters:
      response - the left-hand side of formula, i.e. dependent variable.
      predictors - the right-hand side of formula, i.e. independent/predictor variables.
      Returns:
      the formula.
    • of

      public static Formula of(Term response, Term... predictors)
      Factory method.
      Parameters:
      response - the left-hand side of formula, i.e. dependent variable.
      predictors - the right-hand side of formula, i.e. independent/predictor variables.
      Returns:
      the formula.
    • of

      public static Formula of(String s)
      Parses a formula string.
      Parameters:
      s - the string representation of formula.
      Returns:
      the formula.
    • expand

      public Formula expand(StructType inputSchema)
      Expands the Dot and FactorCrossing terms on the given schema.
      Parameters:
      inputSchema - the schema to expand on
      Returns:
      the expanded formula.
    • bind

      public StructType bind(StructType inputSchema)
      Binds the formula to a schema and returns the schema of predictors.
      Parameters:
      inputSchema - the schema to bind with
      Returns:
      the schema of the output data frame.
    • apply

      public Tuple apply(Tuple tuple)
      Applies the formula to a tuple to generate the model data.
      Parameters:
      tuple - the input tuple.
      Returns:
      the output tuple.
    • x

      public Tuple x(Tuple tuple)
      Applies the formula to a tuple to generate the predictor data.
      Parameters:
      tuple - the input tuple.
      Returns:
      the output tuple.
    • frame

      public DataFrame frame(DataFrame data)
      Returns a data frame of predictors and optionally response variable (if input data frame has the related variable(s)).
      Parameters:
      data - The input data frame.
      Returns:
      the output data frame.
    • x

      public DataFrame x(DataFrame data)
      Returns a data frame of predictors.
      Parameters:
      data - The input data frame.
      Returns:
      the data frame of predictors.
    • matrix

      public Matrix matrix(DataFrame data)
      Returns the design matrix of predictors. All categorical variables will be dummy encoded. If the formula has no Intercept term, the bias column is included. Otherwise, the setting of the Intercept term determines whether it is included.
      Parameters:
      data - The input data frame.
      Returns:
      the design matrix.
    • matrix

      public Matrix matrix(DataFrame data, boolean bias)
      Returns the design matrix of predictors. All categorical variables will be dummy encoded.
      Parameters:
      data - The input data frame.
      bias - If true, include the bias column.
      Returns:
      the design matrix.
    • y

      public BaseVector y(DataFrame data)
      Returns the response vector.
      Parameters:
      data - The input data frame.
      Returns:
      the response vector.
    • y

      public double y(Tuple tuple)
      Returns the real-valued response value.
      Parameters:
      tuple - the input tuple.
      Returns:
      the real-valued response value.
    • yint

      public int yint(Tuple tuple)
      Returns the integer-valued response value.
      Parameters:
      tuple - the input tuple.
      Returns:
      the integer-valued response value.
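
The matrix methods mention dummy encoding and an optional bias column. The following standalone sketch shows what such a design matrix could look like for one numeric and one categorical predictor. It is not the library's implementation: the class `DesignMatrixSketch` and its method `designMatrix` are hypothetical, and the sketch assumes the common convention that a factor with k levels yields k - 1 indicator columns, with the first level as the reference.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of the Smile API.
class DesignMatrixSketch {
    /**
     * Builds rows of [bias?, x, indicator for level 1, ..., indicator for level k-1].
     * The first level is the reference and gets no column (assumed convention).
     */
    static double[][] designMatrix(double[] x, String[] factor, List<String> levels, boolean bias) {
        int offset = bias ? 1 : 0;
        int dummies = levels.size() - 1;
        double[][] matrix = new double[x.length][offset + 1 + dummies];
        for (int i = 0; i < x.length; i++) {
            if (bias) {
                matrix[i][0] = 1.0; // bias (intercept) column of ones
            }
            matrix[i][offset] = x[i];
            int level = levels.indexOf(factor[i]);
            if (level < 0) {
                throw new IllegalArgumentException("Unknown level: " + factor[i]);
            }
            if (level > 0) {
                matrix[i][offset + level] = 1.0; // indicator of a non-reference level
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        double[] x = {2.0, 3.0};
        String[] color = {"red", "blue"};
        List<String> levels = List.of("red", "blue", "green");
        // With bias: [1.0, 2.0, 0.0, 0.0] and [1.0, 3.0, 1.0, 0.0]
        for (double[] row : designMatrix(x, color, levels, true)) {
            System.out.println(Arrays.toString(row));
        }
        // Without bias: [2.0, 0.0, 0.0] and [3.0, 1.0, 0.0]
        for (double[] row : designMatrix(x, color, levels, false)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Passing false for the bias flag in this sketch corresponds to a model without an intercept, such as y ~ x - 1.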