public class Formula
extends java.lang.Object
implements java.io.Serializable
In addition to + and ::, a number of other operators are useful in model formulae. The && operator denotes factor crossing: a && b is interpreted as a + b + a::b. The ^ operator indicates crossing to the specified degree. For example, (a+b+c)^2 is identical to (a+b+c) && (a+b+c), which in turn expands to a formula containing the main effects for a, b and c together with their second-order interactions. The - operator removes the specified terms, so that (a+b+c)^2 - a::b is identical to a + b + c + b::c + a::c. It can also be used to remove the intercept term: when fitting a linear model, y ~ x - 1 specifies a line through the origin. A model with no intercept can also be specified as y ~ x + 0 or y ~ 0 + x.
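The expansion rule for ^ and the effect of - can be illustrated with a small, self-contained sketch. This is not the library's implementation: the class `CrossingSketch` and its `cross` method are hypothetical names, and terms are modeled as plain strings joined by `::`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of the Formula API.
final class CrossingSketch {
    /** Expands (f1 + ... + fn)^degree into main effects and interactions up to the given degree. */
    static List<String> cross(List<String> factors, int degree) {
        List<String> terms = new ArrayList<>();
        for (int order = 1; order <= degree; order++) {
            combine(factors, order, 0, new ArrayList<>(), terms);
        }
        return terms;
    }

    // Enumerates every combination of `order` distinct factors, in input order.
    private static void combine(List<String> factors, int order, int start,
                                List<String> current, List<String> out) {
        if (current.size() == order) {
            out.add(String.join("::", current));
            return;
        }
        for (int i = start; i < factors.size(); i++) {
            current.add(factors.get(i));
            combine(factors, order, i + 1, current, out);
            current.remove(current.size() - 1);
        }
    }

    public static void main(String[] args) {
        // (a+b+c)^2: main effects plus second-order interactions.
        List<String> terms = cross(List.of("a", "b", "c"), 2);
        System.out.println(terms); // [a, b, c, a::b, a::c, b::c]

        // (a+b+c)^2 - a::b: the - operator drops the named term.
        terms.remove("a::b");
        System.out.println(terms); // [a, b, c, a::c, b::c]
    }
}
```

The result after removal contains the same terms as a + b + c + b::c + a::c; only the ordering differs, which does not change the model.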
While formulae usually involve just variable and factor names, they can also involve arithmetic expressions. The formula log(y) ~ a + log(x) is quite legal.
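One way to picture an arithmetic term such as log(x) is as a function evaluated on each row of the data. The sketch below assumes a row is a simple map from column name to value; `TransformedTermSketch`, `column` and `log` are hypothetical names chosen for illustration and are not the library's term classes.

```java
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Hypothetical model of formula terms as row-level functions; illustration only.
final class TransformedTermSketch {
    /** A plain variable term: reads the named column from the row. */
    static ToDoubleFunction<Map<String, Double>> column(String name) {
        return row -> row.get(name);
    }

    /** An arithmetic term: applies the natural logarithm to another term. */
    static ToDoubleFunction<Map<String, Double>> log(ToDoubleFunction<Map<String, Double>> term) {
        return row -> Math.log(term.applyAsDouble(row));
    }

    public static void main(String[] args) {
        Map<String, Double> row = Map.of("y", Math.E, "a", 2.0, "x", 1.0);

        // log(y) ~ a + log(x): one response term and two predictor terms.
        ToDoubleFunction<Map<String, Double>> response = log(column("y"));
        ToDoubleFunction<Map<String, Double>> a = column("a");
        ToDoubleFunction<Map<String, Double>> logX = log(column("x"));

        System.out.println("response   = " + response.applyAsDouble(row));
        System.out.println("predictors = " + a.applyAsDouble(row) + ", " + logX.applyAsDouble(row));
    }
}
```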
Note that the operators ~, +, ::, ^ are only available in the Scala API.
Constructor and Description |
---|
Formula(Term response, Term... predictors) Constructor. |
Modifier and Type | Method and Description |
---|---|
Tuple | apply(Tuple t) Apply the formula on a tuple to generate the model data. |
StructType | bind(StructType inputSchema) Binds the formula to a schema and returns the schema of predictors. |
boolean | equals(java.lang.Object o) |
Formula | expand(StructType inputSchema) Expands the Dot and FactorCrossing terms on the given schema. |
DataFrame | frame(DataFrame df) Returns a data frame of predictors and optionally response variable (if input data frame has the related variable(s)). |
static Formula | lhs(java.lang.String lhs) Factory method. |
static Formula | lhs(Term lhs) Factory method. |
Matrix | matrix(DataFrame df) Returns the design matrix of predictors. |
Matrix | matrix(DataFrame df, boolean bias) Returns the design matrix of predictors. |
static Formula | of(java.lang.String response, java.lang.String... predictors) Factory method. |
static Formula | of(java.lang.String response, Term... predictors) Factory method. |
static Formula | of(Term response, Term... predictors) Factory method. |
Term[] | predictors() Returns the predictors. |
Term | response() Returns the response term. |
static Formula | rhs(java.lang.String... predictors) Factory method. |
static Formula | rhs(Term... predictors) Factory method. |
java.lang.String | toString() |
DataFrame | x(DataFrame df) Returns a data frame of predictors. |
Tuple | x(Tuple t) Apply the formula on a tuple to generate the predictors data. |
BaseVector | y(DataFrame df) Returns the response vector. |
double | y(Tuple t) Returns the real-valued response value. |
int | yint(Tuple t) Returns the integer-valued response value. |
public Term[] predictors()

public Term response()

public java.lang.String toString()
Overrides: toString in class java.lang.Object

public boolean equals(java.lang.Object o)
Overrides: equals in class java.lang.Object

public static Formula lhs(java.lang.String lhs)
Parameters:
lhs - the left-hand side of formula, i.e. dependent variable.

public static Formula lhs(Term lhs)
Parameters:
lhs - the left-hand side of formula, i.e. dependent variable.

public static Formula rhs(java.lang.String... predictors)
Parameters:
predictors - the right-hand side of formula, i.e. independent/predictor variables.

public static Formula rhs(Term... predictors)
Parameters:
predictors - the right-hand side of formula, i.e. independent/predictor variables.

public static Formula of(java.lang.String response, java.lang.String... predictors)
Parameters:
response - the left-hand side of formula, i.e. dependent variable.
predictors - the right-hand side of formula, i.e. independent/predictor variables.

public static Formula of(java.lang.String response, Term... predictors)
Parameters:
response - the left-hand side of formula, i.e. dependent variable.
predictors - the right-hand side of formula, i.e. independent/predictor variables.

public static Formula of(Term response, Term... predictors)
Parameters:
response - the left-hand side of formula, i.e. dependent variable.
predictors - the right-hand side of formula, i.e. independent/predictor variables.

public Formula expand(StructType inputSchema)
Parameters:
inputSchema - the schema to expand on

public StructType bind(StructType inputSchema)
Parameters:
inputSchema - the schema to bind with

public DataFrame frame(DataFrame df)
Parameters:
df - The input DataFrame.

public DataFrame x(DataFrame df)
Parameters:
df - The input DataFrame.

public Matrix matrix(DataFrame df)
Parameters:
df - The input DataFrame.

public Matrix matrix(DataFrame df, boolean bias)
Parameters:
df - The input DataFrame.
bias - If true, include the bias column.

public BaseVector y(DataFrame df)
Parameters:
df - The input DataFrame.

public double y(Tuple t)

public int yint(Tuple t)
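
To make the bias parameter of matrix(DataFrame df, boolean bias) concrete, the following sketch builds a design matrix from raw predictor rows and optionally adds a constant column of ones (the intercept). It is a stand-alone illustration using plain arrays rather than the library's DataFrame and Matrix types; the class name `DesignMatrixSketch` is hypothetical, and placing the bias column first is an assumption of this sketch, not a statement about where the library puts it.

```java
import java.util.Arrays;

// Hypothetical stand-alone illustration; not the library's matrix() implementation.
final class DesignMatrixSketch {
    /**
     * Copies the predictor rows into a new matrix. When bias is true, a column
     * of ones is added (here as column 0, an assumption of this sketch).
     */
    static double[][] designMatrix(double[][] predictors, boolean bias) {
        int offset = bias ? 1 : 0;
        double[][] matrix = new double[predictors.length][];
        for (int i = 0; i < predictors.length; i++) {
            double[] row = new double[predictors[i].length + offset];
            if (bias) {
                row[0] = 1.0; // intercept term
            }
            System.arraycopy(predictors[i], 0, row, offset, predictors[i].length);
            matrix[i] = row;
        }
        return matrix;
    }

    public static void main(String[] args) {
        double[][] x = {{2.0, 3.0}, {4.0, 5.0}};
        System.out.println(Arrays.deepToString(designMatrix(x, true)));
        System.out.println(Arrays.deepToString(designMatrix(x, false)));
    }
}
```

Omitting the column of ones corresponds to a model without an intercept, such as y ~ x - 1.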