Class Formula
- All Implemented Interfaces:
Serializable, AutoCloseable
A formula of the form y ~ model is interpreted as a
specification that the response y is modelled by a linear predictor
specified symbolically by model. Such a model consists of a series
of terms separated by + operators. The terms themselves
consist of variable and factor names separated by :: operators.
Such a term is interpreted as the interaction of all the variables and
factors appearing in the term. The special term "." means
all columns not otherwise in the formula in the context of a data frame.
In addition to + and ::, a number of other operators
are useful in model formulae. The && operator denotes factor
crossing: a && b is interpreted as a+b+a::b. The ^
operator indicates crossing to the specified degree. For example,
(a+b+c)^2 is identical to (a+b+c)*(a+b+c), which in turn
expands to a formula containing the main effects for a,
b and c together with their second-order interactions.
The - operator removes the specified terms, so that
(a+b+c)^2 - a::b is identical to a + b + c + b::c + a::c.
It can also be used to remove the intercept term: when fitting a linear model,
y ~ x - 1 specifies a line through the origin. A model with
no intercept can also be specified as y ~ x + 0.
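The crossing and removal rules above can be illustrated with a small stand-alone sketch. It is not the library's implementation: the class `FormulaTermSketch` and its methods `cross` and `remove` are hypothetical names, and terms are modelled as plain strings joined by `::`.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names here are hypothetical, not part of the Formula API.
class FormulaTermSketch {
    /** Returns main effects and interactions of the variables up to the given degree. */
    static List<String> cross(List<String> vars, int degree) {
        List<String> terms = new ArrayList<>();
        for (int k = 1; k <= degree; k++) {
            combine(vars, k, 0, new ArrayList<>(), terms);
        }
        return terms;
    }

    /** Collects every k-element combination of vars as an interaction term. */
    private static void combine(List<String> vars, int k, int start,
                                List<String> current, List<String> out) {
        if (current.size() == k) {
            out.add(String.join("::", current));
            return;
        }
        for (int i = start; i < vars.size(); i++) {
            current.add(vars.get(i));
            combine(vars, k, i + 1, current, out);
            current.remove(current.size() - 1);
        }
    }

    /** Mimics the - operator: returns a copy of terms without the given term. */
    static List<String> remove(List<String> terms, String term) {
        List<String> result = new ArrayList<>(terms);
        result.remove(term);
        return result;
    }

    public static void main(String[] args) {
        List<String> full = cross(List.of("a", "b", "c"), 2);
        // prints: a + b + c + a::b + a::c + b::c
        System.out.println(String.join(" + ", full));
        // prints: a + b + c + a::c + b::c
        System.out.println(String.join(" + ", remove(full, "a::b")));
    }
}
```

The second line printed contains the same terms as a + b + c + b::c + a::c, only in a different order.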
While formulae usually involve just variable and factor names, they
can also involve arithmetic expressions. The formula
log(y) ~ a + log(x), for example, is legal.
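One way to picture arithmetic terms such as log(x) is as functions evaluated on each row. The sketch below assumes a row is a simple map from column name to value; the class `TransformedTermSketch` and the method `evaluate` are hypothetical and do not reflect how the library represents terms.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Illustrative sketch only; names here are hypothetical, not part of the Formula API.
class TransformedTermSketch {
    /** Evaluates each named term on one row and returns the values in term order. */
    static Map<String, Double> evaluate(
            Map<String, ToDoubleFunction<Map<String, Double>>> terms,
            Map<String, Double> row) {
        Map<String, Double> out = new LinkedHashMap<>();
        terms.forEach((name, term) -> out.put(name, term.applyAsDouble(row)));
        return out;
    }

    public static void main(String[] args) {
        // Terms of the formula log(y) ~ a + log(x), written as functions of a row.
        Map<String, ToDoubleFunction<Map<String, Double>>> terms = new LinkedHashMap<>();
        terms.put("log(y)", r -> Math.log(r.get("y")));
        terms.put("a", r -> r.get("a"));
        terms.put("log(x)", r -> Math.log(r.get("x")));

        Map<String, Double> row = Map.of("y", 10.0, "a", 2.0, "x", 1.0);
        System.out.println(evaluate(terms, row));
    }
}
```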
Note that the operators ~, +, ::, ^
are only available in the Scala API.
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method: Description
- apply: Apply the formula on a tuple to generate the model data.
- bind(StructType inputSchema): Binds the formula to a schema and returns the schema of predictors.
- void close()
- boolean equals
- expand(StructType inputSchema): Expands the Dot and FactorCrossing terms on the given schema.
- frame: Returns a data frame of predictors and optionally response variable (if input data frame has the related variable(s)).
- static Formula lhs: Factory method.
- static Formula lhs: Factory method.
- matrix: Returns the design matrix of predictors.
- matrix: Returns the design matrix of predictors.
- static Formula of: Parses a formula string.
- static Formula of: Factory method.
- static Formula of: Factory method.
- static Formula of: Factory method.
- Term[] predictors: Returns the predictors.
- response(): Returns the response term.
- static Formula rhs: Factory method.
- static Formula rhs: Factory method.
- toString()
- x: Returns a data frame of predictors.
- x: Apply the formula on a tuple to generate the predictor data.
- y: Returns the response vector.
- double y: Returns the real-valued response value.
- int yint: Returns the integer-valued response value.
-
Constructor Details
-
Formula
-
-
Method Details
-
predictors
-
response
-
close
public void close()
- Specified by:
close in interface AutoCloseable
-
toString
-
equals
-
lhs
-
lhs
-
rhs
-
rhs
-
of
-
of
-
of
-
of
-
expand
Expands the Dot and FactorCrossing terms on the given schema.
- Parameters:
inputSchema - the schema to expand on
- Returns:
the expanded formula.
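As a rough illustration of what expanding the "." term involves, the sketch below replaces "." with every schema column not otherwise used in the formula. It is an assumption-laden simplification: the class `DotExpansionSketch` and the method `expandDot` are hypothetical, the schema is reduced to a list of column names, and only plain variable terms (no interactions) are handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; names here are hypothetical, not part of the Formula API.
class DotExpansionSketch {
    /** Replaces "." with all columns that are neither the response nor an explicit predictor. */
    static List<String> expandDot(String response, List<String> predictors, List<String> columns) {
        // Columns already mentioned in the formula must not be added again.
        Set<String> used = new LinkedHashSet<>(predictors);
        used.remove(".");
        used.add(response);

        List<String> expanded = new ArrayList<>();
        for (String term : predictors) {
            if (term.equals(".")) {
                for (String column : columns) {
                    if (!used.contains(column)) {
                        expanded.add(column);
                    }
                }
            } else {
                expanded.add(term);
            }
        }
        return expanded;
    }

    public static void main(String[] args) {
        List<String> columns = List.of("y", "age", "income", "city");
        // y ~ age + . on the columns above
        System.out.println(expandDot("y", List.of("age", "."), columns));
    }
}
```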
-
bind
Binds the formula to a schema and returns the schema of predictors.
- Parameters:
inputSchema - the schema to bind with
- Returns:
the data structure of output data frame.
-
apply
-
x
-
frame
-
x
-
matrix
Returns the design matrix of predictors. All categorical variables will be dummy encoded. If the formula has no explicit Intercept term, the bias column is included. Otherwise, whether it is included follows the setting of the Intercept term.
- Parameters:
data - The input data frame.
- Returns:
the design matrix.
-
matrix
Returns the design matrix of predictors. All categorical variables will be dummy encoded.
- Parameters:
data - The input data frame.
bias - If true, include the bias column.
- Returns:
the design matrix.
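To make the terms "dummy encoded" and "bias column" concrete, here is a minimal stand-alone sketch that builds a design matrix from one numeric column and one categorical column. It is not the library's code: the class `DesignMatrixSketch` and the method `designMatrix` are hypothetical, and the sketch assumes reference-level coding, in which a factor with k levels yields k - 1 indicator columns and the first level is the baseline.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; names here are hypothetical, not part of the Formula API.
class DesignMatrixSketch {
    /**
     * Builds rows of the form [bias?, x, indicator for level 1, ..., indicator for level k-1].
     * Assumption: the first entry of levels is the baseline and gets no column.
     */
    static double[][] designMatrix(double[] x, String[] factor, List<String> levels, boolean bias) {
        int n = x.length;
        int offset = bias ? 1 : 0;
        int p = offset + 1 + (levels.size() - 1);
        double[][] matrix = new double[n][p];
        for (int i = 0; i < n; i++) {
            if (bias) {
                matrix[i][0] = 1.0; // constant column for the intercept
            }
            matrix[i][offset] = x[i];
            int level = levels.indexOf(factor[i]);
            if (level < 0) {
                throw new IllegalArgumentException("Unknown level: " + factor[i]);
            }
            if (level > 0) {
                matrix[i][offset + level] = 1.0; // baseline level stays all zeros
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        double[] x = {1.5, 2.0, 3.5};
        String[] color = {"red", "green", "blue"};
        List<String> levels = List.of("red", "green", "blue");
        System.out.println(Arrays.deepToString(designMatrix(x, color, levels, true)));
        System.out.println(Arrays.deepToString(designMatrix(x, color, levels, false)));
    }
}
```

With the bias flag set, every row starts with 1.0; without it, the matrix has one column fewer and a model fitted on it has no intercept.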
-
y
Returns the response vector.
- Parameters:
data - The input data frame.
- Returns:
the response vector.
-
y
Returns the real-valued response value.
- Parameters:
tuple - the input tuple.
- Returns:
the response variable.
-
yint
Returns the integer-valued response value.
- Parameters:
tuple - the input tuple.
- Returns:
the response variable.
-