Class Formula
- All Implemented Interfaces:
Serializable
An expression of the form y ~ model is interpreted as a specification that
the response y is modelled by a linear predictor specified symbolically by
model. Such a model consists of a series of terms separated by + operators.
The terms themselves consist of variable and factor names separated by ::
operators. Such a term is interpreted as the interaction of all the variables
and factors appearing in the term. The special term "." means all columns not
otherwise in the formula in the context of a data frame.
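
The meaning of the special term "." can be illustrated with a small plain-Java sketch. It is not the library's implementation: the class DotExpansionSketch and the method expandDot are hypothetical names introduced here, and the schema is assumed to be a simple list of column names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of the documented API.
final class DotExpansionSketch {
    /** Returns the columns "." stands for: every column not already named in the formula. */
    static List<String> expandDot(List<String> columns, String response, List<String> named) {
        Set<String> used = new HashSet<>(named);
        if (response != null) {
            used.add(response);          // the response is never a predictor
        }
        List<String> rest = new ArrayList<>();
        for (String column : columns) {
            if (!used.contains(column)) {
                rest.add(column);        // keep schema order
            }
        }
        return rest;
    }

    public static void main(String[] args) {
        List<String> columns = List.of("y", "age", "height", "weight");
        // y ~ .        prints [age, height, weight]
        System.out.println(expandDot(columns, "y", List.of()));
        // y ~ age + .  here "." contributes [height, weight]
        System.out.println(expandDot(columns, "y", List.of("age")));
    }
}
```
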
In addition to + and ::, a number of other operators are useful in model
formulae. The && operator denotes factor crossing: a && b is interpreted as
a+b+a::b. The ^ operator indicates crossing to the specified degree. For
example, (a+b+c)^2 is identical to (a+b+c)*(a+b+c), which in turn expands to a
formula containing the main effects for a, b and c together with their
second-order interactions. The - operator removes the specified terms, so that
(a+b+c)^2 - a::b is identical to a + b + c + b::c + a::c. It can also be used
to remove the intercept term: when fitting a linear model, y ~ x - 1 specifies
a line through the origin. A model with no intercept can also be specified as
y ~ x + 0.
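
The crossing and removal rules above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the library's implementation: the class CrossingSketch and its methods are hypothetical, terms are represented as strings, and an interaction is always written with its variables in the order they were listed (so a::b and b::a are not recognised as the same term).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the documented API.
final class CrossingSketch {
    /** Expands (v1 + ... + vn)^degree into all interactions of at most `degree` distinct variables. */
    static List<String> cross(List<String> vars, int degree) {
        List<String> terms = new ArrayList<>();
        for (int size = 1; size <= degree; size++) {
            combine(vars, size, 0, new ArrayList<>(), terms);
        }
        return terms;
    }

    // Enumerates all combinations of `size` variables, joined with the :: operator.
    private static void combine(List<String> vars, int size, int start,
                                List<String> picked, List<String> out) {
        if (picked.size() == size) {
            out.add(String.join("::", picked));
            return;
        }
        for (int i = start; i < vars.size(); i++) {
            picked.add(vars.get(i));
            combine(vars, size, i + 1, picked, out);
            picked.remove(picked.size() - 1);
        }
    }

    /** Applies the - operator: drops the listed terms from an expanded formula. */
    static List<String> remove(List<String> terms, List<String> removed) {
        List<String> kept = new ArrayList<>(terms);
        kept.removeAll(removed);
        return kept;
    }

    public static void main(String[] args) {
        List<String> full = cross(List.of("a", "b", "c"), 2);
        System.out.println(full);                          // [a, b, c, a::b, a::c, b::c]
        System.out.println(remove(full, List.of("a::b"))); // [a, b, c, a::c, b::c]
        System.out.println(cross(List.of("a", "b"), 2));   // a && b: [a, b, a::b]
    }
}
```
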
While formulae usually involve just variable and factor names, they can also
involve arithmetic expressions. The formula log(y) ~ a + log(x), for example,
is legal.
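
As a rough illustration of what evaluating log(y) ~ a + log(x) on one row involves, the sketch below applies each term to a map of raw column values. Everything here is an assumption made for the example: the class TransformSketch, its methods, the row representation, and the treatment of a as a numeric column. It does not show how the library evaluates terms.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Hypothetical helper for illustration only; not part of the documented API.
final class TransformSketch {
    /** Evaluates each named term on one row of raw column values, preserving term order. */
    static Map<String, Double> apply(Map<String, ToDoubleFunction<Map<String, Double>>> terms,
                                     Map<String, Double> row) {
        Map<String, Double> out = new LinkedHashMap<>();
        terms.forEach((name, term) -> out.put(name, term.applyAsDouble(row)));
        return out;
    }

    /** The terms of log(y) ~ a + log(x), response first. */
    static Map<String, ToDoubleFunction<Map<String, Double>>> logModel() {
        Map<String, ToDoubleFunction<Map<String, Double>>> terms = new LinkedHashMap<>();
        terms.put("log(y)", row -> Math.log(row.get("y")));
        terms.put("a", row -> row.get("a"));
        terms.put("log(x)", row -> Math.log(row.get("x")));
        return terms;
    }

    public static void main(String[] args) {
        Map<String, Double> row = Map.of("y", 100.0, "a", 2.0, "x", 10.0);
        // Prints the transformed response followed by the two predictor values.
        System.out.println(apply(logModel(), row));
    }
}
```
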
Note that the operators ~, +, ::, ^ are only available in the Scala API.
Constructor Summary
-
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| | apply | Apply the formula on a tuple to generate the model data. |
| | bind(StructType inputSchema) | Binds the formula to a schema and returns the schema of predictors. |
| boolean | equals | |
| | expand(StructType inputSchema) | Expands the Dot and FactorCrossing terms on the given schema. |
| | frame | Returns a data frame of predictors and optionally response variable (if input data frame has the related variable(s)). |
| static Formula | lhs | Factory method. |
| static Formula | lhs | Factory method. |
| | matrix | Returns the design matrix of predictors. |
| | matrix | Returns the design matrix of predictors. |
| static Formula | of | Parses a formula string. |
| static Formula | of | Factory method. |
| static Formula | of | Factory method. |
| static Formula | of | Factory method. |
| Term[] | predictors | Returns the predictors. |
| | response() | Returns the response term. |
| static Formula | rhs | Factory method. |
| static Formula | rhs | Factory method. |
| | toString() | |
| | x | Returns a data frame of predictors. |
| | x | Apply the formula on a tuple to generate the predictor data. |
| | y | Returns the response vector. |
| double | y | Returns the real-valued response value. |
| int | yint | Returns the integer-valued response value. |
-
Constructor Details
-
Formula
Constructor.
- Parameters:
  response - the left-hand side of the formula, i.e. dependent variable.
  predictors - the right-hand side of the formula, i.e. independent/predictor variables.
-
-
Method Details
-
predictors
Returns the predictors.
- Returns:
  the predictors.
-
response
Returns the response term.
- Returns:
  the response term.
-
toString
-
equals
-
lhs
Factory method. The predictors will be all the columns not otherwise in the formula in the context of a data frame.
- Parameters:
  lhs - the left-hand side of the formula, i.e. dependent variable.
- Returns:
  the formula.
-
lhs
Factory method. The predictors will be all the columns not otherwise in the formula in the context of a data frame.
- Parameters:
  lhs - the left-hand side of the formula, i.e. dependent variable.
- Returns:
  the formula.
-
rhs
Factory method. No response variable.
- Parameters:
  predictors - the right-hand side of the formula, i.e. independent/predictor variables.
- Returns:
  the formula.
-
rhs
Factory method. No response variable.
- Parameters:
  predictors - the right-hand side of the formula, i.e. independent/predictor variables.
- Returns:
  the formula.
-
of
Factory method.
- Parameters:
  response - the left-hand side of the formula, i.e. dependent variable.
  predictors - the right-hand side of the formula, i.e. independent/predictor variables.
- Returns:
  the formula.
-
of
Factory method.
- Parameters:
  response - the left-hand side of the formula, i.e. dependent variable.
  predictors - the right-hand side of the formula, i.e. independent/predictor variables.
- Returns:
  the formula.
-
of
Factory method.
- Parameters:
  response - the left-hand side of the formula, i.e. dependent variable.
  predictors - the right-hand side of the formula, i.e. independent/predictor variables.
- Returns:
  the formula.
-
of
Parses a formula string.
- Parameters:
  s - the string representation of the formula.
- Returns:
  the formula.
-
expand
Expands the Dot and FactorCrossing terms on the given schema.
- Parameters:
  inputSchema - the schema to expand on.
- Returns:
  the expanded formula.
-
bind
Binds the formula to a schema and returns the schema of predictors.
- Parameters:
  inputSchema - the schema to bind with.
- Returns:
  the data structure of the output data frame.
-
apply
Apply the formula on a tuple to generate the model data.
- Parameters:
  tuple - the input tuple.
- Returns:
  the output tuple.
-
x
Apply the formula on a tuple to generate the predictor data.
- Parameters:
  tuple - the input tuple.
- Returns:
  the output tuple.
-
frame
Returns a data frame of predictors and, optionally, the response variable (if the input data frame has the related variable(s)).
- Parameters:
  data - the input data frame.
- Returns:
  the output data frame.
-
x
Returns a data frame of predictors.
- Parameters:
  data - the input data frame.
- Returns:
  the data frame of predictors.
-
matrix
Returns the design matrix of predictors. All categorical variables will be dummy encoded. If the formula doesn't have an Intercept term, the bias column will be included. Otherwise, whether it is included depends on the setting of the Intercept term.
- Parameters:
  data - the input data frame.
- Returns:
  the design matrix.
-
matrix
Returns the design matrix of predictors. All categorical variables will be dummy encoded.
- Parameters:
  data - the input data frame.
  bias - if true, include the bias column.
- Returns:
  the design matrix.
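
To make "dummy encoded" and "bias column" concrete, here is a minimal plain-Java sketch of a design matrix built from one numeric and one categorical column. It is only an illustration: the class DesignMatrixSketch and its matrix method are hypothetical, and it assumes the common convention that the first level is the reference level, so k levels yield k - 1 indicator columns. The column layout the library actually produces may differ.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of the documented API.
final class DesignMatrixSketch {
    /**
     * Builds a design matrix from one numeric column and one categorical column.
     * Assumption: levels[0] is the reference level, so k levels give k - 1 indicator columns.
     */
    static double[][] matrix(double[] x, String[] factor, String[] levels, boolean bias) {
        int offset = bias ? 1 : 0;
        int cols = offset + 1 + (levels.length - 1);
        double[][] m = new double[x.length][cols];
        for (int i = 0; i < x.length; i++) {
            if (bias) {
                m[i][0] = 1.0;                 // bias (intercept) column of ones
            }
            m[i][offset] = x[i];               // numeric predictor copied as-is
            for (int j = 1; j < levels.length; j++) {
                if (levels[j].equals(factor[i])) {
                    m[i][offset + j] = 1.0;    // indicator for a non-reference level
                }
            }
        }
        return m;
    }

    public static void main(String[] args) {
        double[] x = {1.5, 2.5};
        String[] color = {"red", "blue"};
        String[] levels = {"red", "green", "blue"};
        // Columns: bias, x, color=green, color=blue
        System.out.println(Arrays.deepToString(matrix(x, color, levels, true)));
        // prints [[1.0, 1.5, 0.0, 0.0], [1.0, 2.5, 0.0, 1.0]]
    }
}
```

Passing false for the bias argument in this sketch simply drops the leading column of ones, which corresponds to fitting without an intercept.
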
-
y
Returns the response vector.
- Parameters:
  data - the input data frame.
- Returns:
  the response vector.
-
y
Returns the real-valued response value.
- Parameters:
  tuple - the input tuple.
- Returns:
  the response variable.
-
yint
Returns the integer-valued response value.
- Parameters:
  tuple - the input tuple.
- Returns:
  the response variable.
-