Record Class DataFrame
java.lang.Object
java.lang.Record
smile.data.DataFrame
- Record Components:
schema - the schema of DataFrame.
columns - the columns of DataFrame.
index - the optional row index.
- All Implemented Interfaces:
Serializable, Iterable<Row>
public record DataFrame(StructType schema, List<ValueVector> columns, RowIndex index)
extends Record
implements Iterable<Row>, Serializable
Two-dimensional, potentially heterogeneous tabular data.
-
Constructor Summary
- DataFrame(RowIndex index, ValueVector... columns): Constructor.
- DataFrame(StructType schema, List<ValueVector> columns, RowIndex index): Constructor.
- DataFrame(ValueVector... columns): Constructor.
-
Method Summary
- add(ValueVector... vectors): Adds columns to this data frame.
- apply(boolean[] index): Returns a new data frame with boolean indexing.
- apply(int i): Returns the row at the specified index.
- apply(int i, int j): Returns the cell at (i, j).
- apply: Returns the column of the given name.
- apply: Returns a new DataFrame with the selected columns.
- apply: Returns a new data frame with row indexing.
- column(int j): Returns the j-th column.
- column: Returns the column of the given name.
- columns(): Returns the value of the columns record component.
- concat: Concatenates data frames vertically by rows.
- describe(): Returns the data structure and statistics.
- drop(int... indices): Returns a new DataFrame without the selected columns.
- drop: Returns a new DataFrame without the selected columns.
- dropna(): Returns a new data frame without rows that have null/missing values.
- DataType[] dtypes(): Returns the column data types.
- final boolean equals: Indicates whether some other object is "equal to" this one.
- factorize: Returns a new DataFrame with the given columns converted to nominal.
- fillna(double value): Fills null/NaN/Inf values of numeric columns with the specified value.
- get(boolean[] index): Returns a new data frame with boolean indexing.
- get(int i): Returns the row at the specified index.
- get(int i, int j): Returns the cell at (i, j).
- get: Returns a new data frame with row indexing.
- double getDouble(int i, int j): Returns the double value at position (i, j).
- float getFloat(int i, int j): Returns the float value at position (i, j).
- int getInt(int i, int j): Returns the int value at position (i, j).
- long getLong(int i, int j): Returns the long value at position (i, j).
- getScale(int i, int j): Returns the value at position (i, j) of a NominalScale or OrdinalScale column.
- getString(int i, int j): Returns the string representation of the value at position (i, j).
- final int hashCode(): Returns a hash code value for this object.
- head(int numRows): Returns the string representation of the top rows.
- index(): Returns the value of the index record component.
- boolean isEmpty(): Returns true if the data frame is empty.
- boolean isNullAt(int i, int j): Checks whether the value at position (i, j) is null or a missing value.
- iterator()
- join: Joins two data frames on their index.
- loc: Returns the row with the specified index.
- loc: Returns a new data frame with the specified rows.
- Measure[] measures(): Returns the columns' levels of measurement.
- merge: Merges data frames horizontally by columns.
- String[] names(): Returns the column names.
- int ncol(): Returns the number of columns.
- int nrow(): Returns the number of rows.
- static DataFrame of: Creates a DataFrame from a 2-dimensional array (three overloads).
- static <T> DataFrame of: Creates a DataFrame from a collection of objects.
- static DataFrame of: Creates a DataFrame from a JDBC ResultSet.
- static DataFrame of(StructType schema, List<? extends Tuple> data): Creates a DataFrame from a list of tuples.
- static DataFrame of(StructType schema, Stream<? extends Tuple> data): Creates a DataFrame from a stream of tuples.
- schema(): Returns the value of the schema record component.
- select(int... indices): Returns a new DataFrame with the selected columns.
- select: Returns a new DataFrame with the selected columns.
- void set: Sets the value at position (i, j).
- set(String name, ValueVector column): Sets the column values.
- setIndex: Sets the DataFrame index.
- setIndex: Sets the DataFrame index using an existing column.
- int shape(int dim): Returns the size of the given dimension.
- int size(): Returns the number of rows.
- stream(): Returns a (possibly parallel) Stream of rows.
- tail(int numRows): Returns the string representation of the bottom rows.
- double[][] toArray(boolean bias, CategoricalEncoder encoder, String... names): Returns an array obtained by converting the columns in a data frame to numeric mode and then binding them together as the columns of a matrix.
- double[][] toArray: Returns an array obtained by converting the columns in a data frame to numeric mode and then binding them together as the columns of a matrix.
- toList(): Returns the List of rows.
- toMatrix(): Returns a matrix obtained by converting all the variables in a data frame to numeric mode and then binding them together as the columns of a matrix.
- toMatrix(boolean bias, CategoricalEncoder encoder, String rowNames): Returns a matrix obtained by converting all the variables in a data frame to numeric mode and then binding them together as the columns of a matrix.
- toString(): Returns a string representation of this record class.
- toString(int from, int to, boolean truncate): Returns the string representation of rows in the specified range.
- void update: Updates the value at position (i, j).
- update(String name, ValueVector column): Sets the column values.
Methods inherited from interface Iterable
forEach, spliterator
-
Constructor Details
-
DataFrame
Constructor.
-
DataFrame
Constructor.
- Parameters:
columns - the columns of DataFrame.
-
DataFrame
Constructor.
- Parameters:
index - the row index.
columns - the columns of DataFrame.
-
-
Method Details
-
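toString
Returns a string representation of this record class.
-
names
Returns the column names.
-
dtypes
Returns the column data types.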
-
measures
Returns the columns' levels of measurement.
- Returns:
- the columns' levels of measurement.
-
shape
public int shape(int dim)
Returns the size of the given dimension. Provided for the convenience of pandas users.
- Parameters:
dim - the dimension index.
- Returns:
- the size of the given dimension.
-
size
public int size()
Returns the number of rows. This is an alias of nrow, following the Java Collections naming convention.
- Returns:
- the number of rows.
-
nrow
public int nrow()
Returns the number of rows.
- Returns:
- the number of rows.
-
ncol
public int ncol()
Returns the number of columns.
- Returns:
- the number of columns.
-
isEmpty
public boolean isEmpty()
Returns true if the data frame is empty.
- Returns:
- true if the data frame is empty.
-
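The size accessors above fit together in a simple way. The following is a minimal plain-Java model, not Smile's implementation: FrameShape is a hypothetical record, and it assumes that shape(0) and shape(1) mirror pandas' (rows, columns) tuple, that any other dimension is rejected, and that a frame with zero rows counts as empty.

```java
// Hypothetical model of the DataFrame size accessors (not Smile's code).
// Assumption: a frame is fully described here by its row and column counts.
record FrameShape(int nrow, int ncol) {

    // Assumption: dim 0 is the row count and dim 1 the column count,
    // like pandas' DataFrame.shape tuple; other dimensions are rejected.
    int shape(int dim) {
        return switch (dim) {
            case 0 -> nrow;
            case 1 -> ncol;
            default -> throw new IllegalArgumentException("Invalid dimension: " + dim);
        };
    }

    // Alias of nrow(), named after java.util.Collection#size().
    int size() {
        return nrow;
    }

    // Assumption: "empty" means the frame has no rows.
    boolean isEmpty() {
        return nrow == 0;
    }
}
```

For example, new FrameShape(3, 2).shape(0) is 3 and shape(1) is 2.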
setIndex
-
setIndex
-
column
Returns the j-th column.
- Parameters:
j - the column index.
- Returns:
- the column vector.
-
column
Returns the column of the given name.
- Parameters:
name - the column name.
- Returns:
- the column vector.
-
apply
Returns the column of the given name. This is an alias of column, provided for the convenience of Scala users.
- Parameters:
name - the column name.
- Returns:
- the column vector.
-
apply
-
get
Returns the row at the specified index.
- Parameters:
i - the row index.
- Returns:
- the i-th row.
-
apply
-
loc
-
loc
-
get
Returns a new data frame with row indexing.
-
apply
-
get
Returns a new data frame with boolean indexing.
- Parameters:
index - the boolean row mask.
- Returns:
- the data frame of selected rows.
-
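Boolean indexing keeps exactly the rows whose mask entry is true. The sketch below models that with column-major double arrays; BooleanIndexing, greaterThan and filter are hypothetical names, and requiring the mask length to equal the row count is an assumption rather than documented Smile behavior.

```java
// Hypothetical model of boolean indexing (not Smile's implementation).
// Storage is column-major: columns[j][i] is row i of column j.
final class BooleanIndexing {
    private BooleanIndexing() {}

    // Builds a mask that is true where the column value exceeds the threshold.
    static boolean[] greaterThan(double[] column, double threshold) {
        boolean[] mask = new boolean[column.length];
        for (int i = 0; i < column.length; i++) {
            mask[i] = column[i] > threshold;
        }
        return mask;
    }

    // Returns a new frame with only the rows whose mask entry is true,
    // in their original order; the input frame is not modified.
    static double[][] filter(double[][] columns, boolean[] mask) {
        int kept = 0;
        for (boolean keep : mask) {
            if (keep) kept++;
        }
        double[][] result = new double[columns.length][kept];
        for (int j = 0; j < columns.length; j++) {
            // Assumption: the mask must have one entry per row.
            if (columns[j].length != mask.length) {
                throw new IllegalArgumentException("mask length must equal the number of rows");
            }
            for (int i = 0, k = 0; i < mask.length; i++) {
                if (mask[i]) result[j][k++] = columns[j][i];
            }
        }
        return result;
    }
}
```

The same mask is applied to every column, so the selected rows stay aligned across columns.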
apply
-
isNullAt
public boolean isNullAt(int i, int j)
Checks whether the value at position (i, j) is null or a missing value.
- Parameters:
i - the row index.
j - the column index.
- Returns:
- true if the cell value is null or missing.
-
get
Returns the cell at (i, j).
- Parameters:
i - the row index.
j - the column index.
- Returns:
- the cell value.
-
apply
-
getInt
public int getInt(int i, int j)
Returns the int value at position (i, j).
- Parameters:
i - the row index.
j - the column index.
- Returns:
- the int value of the cell.
-
getLong
public long getLong(int i, int j)
Returns the long value at position (i, j).
- Parameters:
i - the row index.
j - the column index.
- Returns:
- the long value of the cell.
-
getFloat
public float getFloat(int i, int j)
Returns the float value at position (i, j).
- Parameters:
i - the row index.
j - the column index.
- Returns:
- the float value of the cell.
-
getDouble
public double getDouble(int i, int j)
Returns the double value at position (i, j).
- Parameters:
i - the row index.
j - the column index.
- Returns:
- the double value of the cell.
-
getString
Returns the string representation of the value at position (i, j).
- Parameters:
i - the row index.
j - the column index.
- Returns:
- the string representation of the cell value.
-
getScale
Returns the value at position (i, j) of a NominalScale or OrdinalScale column.
- Parameters:
i - the row index.
j - the column index.
- Returns:
- the cell scale.
- Throws:
ClassCastException - when the data is not nominal or ordinal.
-
set
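Sets the value at position (i, j).
- Parameters:
i - the row index.
j - the column index.
value - the new value.
-
update
Updates the value at position (i, j).
-
stream
Returns a (possibly parallel) Stream of rows.
-
iterator
-
toList
Returns the List of rows.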
-
dropna
Returns a new data frame without rows that have null/missing values.
- Returns:
- the data frame without null/missing values.
-
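A row is dropped as soon as any of its cells is null or missing, and the result is a new frame. This plain-Java sketch models that on row-major Object arrays; DropNa and isMissing are hypothetical, and treating Double or Float NaN as "missing" is an assumption, not a statement about Smile's exact rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of dropna() on row-major data (not Smile's code).
final class DropNa {
    private DropNa() {}

    // Assumption: a cell is missing if it is null or a floating-point NaN.
    static boolean isMissing(Object cell) {
        return cell == null
            || (cell instanceof Double d && d.isNaN())
            || (cell instanceof Float f && f.isNaN());
    }

    // Returns a new list with only the rows that have no missing cell;
    // the input list is left unchanged.
    static List<Object[]> dropna(List<Object[]> rows) {
        List<Object[]> result = new ArrayList<>();
        for (Object[] row : rows) {
            if (Arrays.stream(row).noneMatch(DropNa::isMissing)) {
                result.add(row);
            }
        }
        return result;
    }
}
```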
fillna
Fills null/NaN/Inf values of numeric columns with the specified value.
- Parameters:
value - the value that replaces NAs.
- Returns:
- this data frame.
-
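Unlike dropna, fillna modifies the frame in place and returns it. The sketch below models that on primitive double columns, where only NaN and the two infinities can occur (a primitive array cannot hold null); FillNa is a hypothetical name and this is not Smile's implementation.

```java
// Hypothetical in-place model of fillna(double) on numeric columns.
final class FillNa {
    private FillNa() {}

    static double[][] fillna(double[][] columns, double value) {
        for (double[] column : columns) {
            for (int i = 0; i < column.length; i++) {
                // Double.isFinite is false for NaN, +Infinity and -Infinity.
                if (!Double.isFinite(column[i])) {
                    column[i] = value;
                }
            }
        }
        // The same array object is returned, mirroring "Returns: this data frame".
        return columns;
    }
}
```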
select
Returns a new DataFrame with the selected columns.
- Parameters:
indices - the column indices.
- Returns:
- a new DataFrame with the selected columns.
-
select
Returns a new DataFrame with the selected columns.
-
drop
Returns a new DataFrame without the selected columns.
- Parameters:
indices - the column indices.
- Returns:
- a new DataFrame without the selected columns.
-
drop
Returns a new DataFrame without the selected columns.
-
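select and drop are complements over column indices. This sketch models them on column names only; ColumnSelection is hypothetical, and the orderings (select follows the order of the given indices, drop keeps the remaining columns in their original order) are assumptions about Smile's behavior.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical model of select(int...) and drop(int...) (not Smile's code).
final class ColumnSelection {
    private ColumnSelection() {}

    // Keeps the listed columns, in the order the indices are given.
    static String[] select(String[] names, int... indices) {
        return Arrays.stream(indices).mapToObj(j -> names[j]).toArray(String[]::new);
    }

    // Keeps every column that is not listed, in its original order.
    static String[] drop(String[] names, int... indices) {
        boolean[] dropped = new boolean[names.length];
        for (int j : indices) {
            dropped[j] = true;
        }
        return IntStream.range(0, names.length)
            .filter(j -> !dropped[j])
            .mapToObj(j -> names[j])
            .toArray(String[]::new);
    }
}
```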
add
Adds columns to this data frame.
- Parameters:
vectors - the columns to add.
- Returns:
- this data frame.
-
set
Sets the column values. If the column does not exist, it is added as the last column of the data frame.
- Parameters:
name - the column name.
column - the new column values.
- Returns:
- this data frame.
-
update
Sets the column values. If the column does not exist, it is added as the last column of the data frame. This is an alias of set, provided for the convenience of Scala users.
- Parameters:
name - the column name.
column - the new column values.
- Returns:
- this data frame.
-
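The "replace in place, or append as the last column" rule of set and update matches how java.util.LinkedHashMap treats put: re-inserting an existing key keeps its position, while a new key goes to the end. The sketch uses that to model named columns; ColumnStore is a hypothetical class, not Smile's implementation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of set(String, ValueVector) with named columns.
final class ColumnStore {
    // Insertion-ordered map from column name to column values.
    private final Map<String, double[]> columns = new LinkedHashMap<>();

    // An existing name keeps its position; a new name is appended last.
    ColumnStore set(String name, double[] column) {
        columns.put(name, column);
        return this; // allows chaining, like "Returns: this data frame"
    }

    List<String> names() {
        return List.copyOf(columns.keySet());
    }

    double[] column(String name) {
        return columns.get(name);
    }
}
```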
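join
Joins two data frames on their index.
-
merge
Merges data frames horizontally by columns.
-
concat
Concatenates data frames vertically by rows.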
-
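merge and concat combine frames along different axes: merge places columns side by side (so row counts must agree), while concat stacks rows (so the column layout must agree). The sketch models both on column-major double arrays; Combine and its methods are hypothetical, and the argument checks are assumptions. join, which aligns rows by index, is not modeled.

```java
import java.util.Arrays;

// Hypothetical model of merge and concat (not Smile's code).
// Storage is column-major: frame[j][i] is row i of column j.
final class Combine {
    private Combine() {}

    // Horizontal: the right frame's columns are appended after the left's.
    // Column arrays are shared with the inputs, not copied.
    static double[][] merge(double[][] left, double[][] right) {
        if (left.length > 0 && right.length > 0 && left[0].length != right[0].length) {
            throw new IllegalArgumentException("row counts differ");
        }
        double[][] out = Arrays.copyOf(left, left.length + right.length);
        System.arraycopy(right, 0, out, left.length, right.length);
        return out;
    }

    // Vertical: each bottom column is appended below the matching top column.
    static double[][] concat(double[][] top, double[][] bottom) {
        if (top.length != bottom.length) {
            throw new IllegalArgumentException("column counts differ");
        }
        double[][] out = new double[top.length][];
        for (int j = 0; j < top.length; j++) {
            out[j] = Arrays.copyOf(top[j], top[j].length + bottom[j].length);
            System.arraycopy(bottom[j], 0, out[j], top[j].length, bottom[j].length);
        }
        return out;
    }
}
```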
factorize
Returns a new DataFrame with the given columns converted to nominal.
-
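Converting a string column to nominal means replacing each value with an integer code into a list of levels. The sketch shows one way to do that; Factorize and Nominal are hypothetical, and the choices of sorted level order and code -1 for null are assumptions, not Smile's documented encoding.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical model of factorizing a string column (not Smile's code).
final class Factorize {
    private Factorize() {}

    // levels[k] is the string for code k.
    record Nominal(String[] levels, int[] codes) {}

    static Nominal factorize(String[] column) {
        // Assumption: levels are the distinct non-null values, sorted.
        TreeSet<String> distinct = new TreeSet<>();
        for (String s : column) {
            if (s != null) distinct.add(s);
        }
        String[] levels = distinct.toArray(new String[0]);
        int[] codes = new int[column.length];
        for (int i = 0; i < column.length; i++) {
            // Assumption: null stays missing and is encoded as -1 here.
            codes[i] = column[i] == null ? -1 : Arrays.binarySearch(levels, column[i]);
        }
        return new Nominal(levels, codes);
    }
}
```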
toArray
Returns an array obtained by converting the columns in a data frame to numeric mode and then binding them together as the columns of a matrix. Missing values/nulls are encoded as Double.NaN. No bias term is added, and categorical variables use level encoding.
- Parameters:
columns - the columns to export. If empty, all columns are used.
- Returns:
- the numeric array.
-
toArray
Returns an array obtained by converting the columns in a data frame to numeric mode and then binding them together as the columns of a matrix. Missing values/nulls are encoded as Double.NaN.
- Parameters:
bias - if true, add a first column of all 1s (the bias term).
encoder - the categorical variable encoder.
names - the columns to export. If empty, all columns are used.
- Returns:
- the numeric array.
-
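The bias and encoder parameters determine the width of the exported array: an optional leading column of 1s, then each numeric column as-is, then each categorical column either as one integer level or as one indicator column per level. The sketch models one numeric and one nominal column; NumericExport and its Encoding enum are hypothetical stand-ins (not Smile's CategoricalEncoder), and writing NaN into every indicator column for a missing category is an assumption.

```java
// Hypothetical model of toArray(bias, encoder, names) (not Smile's code).
// The nominal column holds codes 0..levels-1, with -1 meaning missing.
final class NumericExport {
    private NumericExport() {}

    // Hypothetical stand-in for the categorical encoder choice.
    enum Encoding { LEVEL, ONE_HOT }

    static double[][] toArray(double[] numeric, int[] codes, int levels,
                              boolean bias, Encoding encoding) {
        int n = numeric.length;
        int width = (bias ? 1 : 0) + 1 + (encoding == Encoding.LEVEL ? 1 : levels);
        double[][] x = new double[n][width];
        for (int i = 0; i < n; i++) {
            int j = 0;
            if (bias) x[i][j++] = 1.0;   // intercept column of 1s
            x[i][j++] = numeric[i];      // NaN passes through unchanged
            if (encoding == Encoding.LEVEL) {
                x[i][j] = codes[i] < 0 ? Double.NaN : codes[i];
            } else {
                for (int k = 0; k < levels; k++) {
                    x[i][j + k] = codes[i] < 0 ? Double.NaN : (codes[i] == k ? 1.0 : 0.0);
                }
            }
        }
        return x;
    }
}
```

With bias and one-hot encoding over three levels, each row has 1 + 1 + 3 = 5 columns; with level encoding and no bias it has 2.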
toMatrix
Returns a matrix obtained by converting all the variables in a data frame to numeric mode and then binding them together as the columns of a matrix. Missing values/nulls are encoded as Double.NaN. No bias term is added, and categorical variables use level encoding.
- Returns:
- the numeric matrix.
-
toMatrix
Returns a matrix obtained by converting all the variables in a data frame to numeric mode and then binding them together as the columns of a matrix. Missing values/nulls are encoded as Double.NaN.
- Parameters:
bias - if true, add a first column of all 1s (the bias term).
encoder - the categorical variable encoder.
rowNames - the column to be used as row names.
- Returns:
- the numeric matrix.
-
describe
Returns the data structure and statistics.
- Returns:
- the data structure and statistics.
-
head
Returns the string representation of the top rows.
- Parameters:
numRows - the number of rows to show.
- Returns:
- the string representation of the top rows.
-
tail
Returns the string representation of the bottom rows.
- Parameters:
numRows - the number of rows to show.
- Returns:
- the string representation of the bottom rows.
-
toString
Returns the string representation of rows in the specified range.
- Parameters:
from - the initial index of the range to show, inclusive.
to - the final index of the range to show, exclusive.
truncate - whether to truncate long strings and right-align cells.
- Returns:
- the string representation of rows in the specified range.
-
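head, tail and toString(from, to, truncate) all render a half-open row range [from, to). The sketch computes those ranges and formats a single cell; RowRanges is hypothetical, and clamping numRows to the frame size, the "..." truncation marker, and the fixed cell width are assumptions rather than Smile's documented output format.

```java
// Hypothetical model of the row ranges and cell formatting behind
// head, tail and toString(from, to, truncate) (not Smile's code).
final class RowRanges {
    private RowRanges() {}

    // [0, min(numRows, nrow)); assumption: oversized requests are clamped.
    static int[] head(int nrow, int numRows) {
        return new int[]{0, Math.min(numRows, nrow)};
    }

    // [max(0, nrow - numRows), nrow).
    static int[] tail(int nrow, int numRows) {
        return new int[]{Math.max(0, nrow - numRows), nrow};
    }

    // With truncate, long cells are cut to width (marked with "...") and
    // right-aligned; width must be at least 3. Without it, cells are unchanged.
    static String format(String cell, int width, boolean truncate) {
        if (!truncate) {
            return cell;
        }
        String s = cell.length() <= width ? cell : cell.substring(0, width - 3) + "...";
        return String.format("%" + width + "s", s);
    }
}
```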
of
-
of
-
of
-
of
-
of
Creates a DataFrame from a stream of tuples.
- Parameters:
schema - the schema of the data frame.
data - the data stream.
- Returns:
- the data frame.
-
of
Creates a DataFrame from a list of tuples.
- Parameters:
schema - the schema of the tuples.
data - the data collection.
- Returns:
- the data frame.
-
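Since DataFrame stores a list of column vectors, building one from row tuples amounts to a transposition. The sketch shows that step on plain Object arrays; FromRows is hypothetical, the field count stands in for the schema, and a Stream could be collected with Stream.toList() before calling it. It is not Smile's implementation.

```java
import java.util.List;

// Hypothetical model of turning row tuples into column storage.
final class FromRows {
    private FromRows() {}

    // Transposes rows (each with ncol fields) into ncol column arrays.
    static Object[][] toColumns(List<Object[]> rows, int ncol) {
        Object[][] columns = new Object[ncol][rows.size()];
        for (int i = 0; i < rows.size(); i++) {
            Object[] row = rows.get(i);
            if (row.length != ncol) {
                throw new IllegalArgumentException(
                    "row " + i + " has " + row.length + " fields, expected " + ncol);
            }
            for (int j = 0; j < ncol; j++) {
                columns[j][i] = row[j];
            }
        }
        return columns;
    }
}
```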
of
Creates a DataFrame from a JDBC ResultSet.
- Parameters:
rs - the JDBC result set.
- Returns:
- the data frame.
- Throws:
SQLException - when a JDBC operation fails.
-
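hashCode
Returns a hash code value for this object.
-
equals
Indicates whether some other object is "equal to" this one. The objects are equal if the other object is of the same class and if all the record components are equal. All components in this record class are compared with Objects::equals(Object,Object).
-
schema
Returns the value of the schema record component.
-
columns
Returns the value of the columns record component.
-
index
Returns the value of the index record component.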
-