Package smile.data

Interface DataFrame

All Superinterfaces:
Dataset<Tuple>, Iterable<BaseVector>
All Known Implementing Classes:
IndexDataFrame

public interface DataFrame extends Dataset<Tuple>, Iterable<BaseVector>
An immutable collection of data organized into named columns.
  • Method Details

    • schema

      StructType schema()
      Returns the schema of the data frame.
      Returns:
      the schema.
    • names

      default String[] names()
      Returns the column names.
      Returns:
      the column names.
    • types

      default DataType[] types()
      Returns the column data types.
      Returns:
      the column data types.
    • measures

      default Measure[] measures()
      Returns the levels of measurement of the columns.
      Returns:
      the levels of measurement of the columns.
    • nrow

      default int nrow()
      Returns the number of rows.
      Returns:
      the number of rows.
    • ncol

      int ncol()
      Returns the number of columns.
      Returns:
      the number of columns.
    • structure

      default DataFrame structure()
      Returns the structure of the data frame.
      Returns:
      the structure of the data frame.
    • omitNullRows

      default DataFrame omitNullRows()
      Returns a new data frame without rows that have null/missing values.
      Returns:
      the data frame without nulls.
    • get

      default Object get(int i, int j)
      Returns the cell at (i, j).
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
    • get

      default Object get(int i, String column)
      Returns the cell at row i of the named column.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
    • of

      default DataFrame of(int... index)
      Returns a new data frame with row indexing.
      Parameters:
      index - the row indices.
      Returns:
      the data frame of selected rows.
    • of

      default DataFrame of(boolean... index)
      Returns a new data frame with boolean indexing.
      Parameters:
      index - the boolean index.
      Returns:
      the data frame of selected rows.
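      The two row-selection overloads of of can be pictured on a plain array of rows. The following is a minimal sketch, not Smile's implementation: the class RowIndexingSketch and its methods are hypothetical, and the rules that rows come back in the order the indices are given and that the mask must have one entry per row are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class RowIndexingSketch {
    /** Picks rows by position, in the order given (assumed meaning of of(int... index)). */
    static double[][] byIndex(double[][] rows, int... index) {
        double[][] out = new double[index.length][];
        for (int k = 0; k < index.length; k++) {
            out[k] = rows[index[k]];
        }
        return out;
    }

    /** Keeps row i when mask[i] is true (assumed meaning of of(boolean... index)). */
    static double[][] byMask(double[][] rows, boolean... mask) {
        if (mask.length != rows.length) {
            // Assumption for this sketch: one mask entry per row.
            throw new IllegalArgumentException("mask length must equal the number of rows");
        }
        List<double[]> out = new ArrayList<>();
        for (int i = 0; i < rows.length; i++) {
            if (mask[i]) {
                out.add(rows[i]);
            }
        }
        return out.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        double[][] rows = {{1.0, 10.0}, {2.0, 20.0}, {3.0, 30.0}};
        // Selects the third row, then the first.
        System.out.println(byIndex(rows, 2, 0).length);
        // Keeps the first and third rows.
        System.out.println(byMask(rows, true, false, true).length);
    }
}
```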
    • slice

      default DataFrame slice(int from, int to)
      Copies the specified range into a new data frame.
      Parameters:
      from - the initial index of the range to be copied, inclusive
      to - the final index of the range to be copied, exclusive.
      Returns:
      the data frame of selected range of rows.
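      The from-inclusive, to-exclusive convention is the same half-open range used by java.util.Arrays.copyOfRange. A minimal sketch on a plain array of rows, assuming nothing about how Smile stores its data (SliceSketch is a hypothetical name):

```java
import java.util.Arrays;

final class SliceSketch {
    /** Copies rows [from, to): row from is included, row to is not. */
    static double[][] slice(double[][] rows, int from, int to) {
        return Arrays.copyOfRange(rows, from, to);
    }

    public static void main(String[] args) {
        double[][] rows = {{0.0}, {1.0}, {2.0}, {3.0}, {4.0}};
        double[][] middle = slice(rows, 1, 3);
        // The result has to - from rows: here rows 1 and 2.
        System.out.println(middle.length);
    }
}
```

      With this convention, slice(0, nrow()) covers every row and slice(k, k) is empty.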
    • isNullAt

      default boolean isNullAt(int i, int j)
      Checks whether the value at position (i, j) is null.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      true if the cell value is null.
    • isNullAt

      default boolean isNullAt(int i, String column)
      Checks whether the field value is null.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      true if the cell value is null.
    • getBoolean

      default boolean getBoolean(int i, int j)
      Returns the value at position (i, j) as a primitive boolean.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
    • getBoolean

      default boolean getBoolean(int i, String column)
      Returns the field value as a primitive boolean.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
    • getChar

      default char getChar(int i, int j)
      Returns the value at position (i, j) as a primitive char.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
    • getChar

      default char getChar(int i, String column)
      Returns the field value as a primitive char.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
    • getByte

      default byte getByte(int i, int j)
      Returns the value at position (i, j) as a primitive byte.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
    • getByte

      default byte getByte(int i, String column)
      Returns the field value as a primitive byte.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
    • getShort

      default short getShort(int i, int j)
      Returns the value at position (i, j) as a primitive short.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
    • getShort

      default short getShort(int i, String column)
      Returns the field value as a primitive short.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
    • getInt

      default int getInt(int i, int j)
      Returns the value at position (i, j) as a primitive int.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
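      Both exceptions listed for the primitive getters follow from ordinary Java cast and unboxing rules. The sketch below assumes a cell held as a boxed Object, which is an illustration rather than a statement about Smile's storage; TypedGetterSketch and its methods are hypothetical.

```java
final class TypedGetterSketch {
    /** Casts a boxed cell to Integer and unboxes it, as a typed getter might. */
    static int asInt(Object cell) {
        // The cast throws ClassCastException for a non-Integer value;
        // unboxing a null Integer throws NullPointerException.
        return (Integer) cell;
    }

    /** Returns the simple name of the exception thrown by asInt, or "none". */
    static String failure(Object cell) {
        try {
            asInt(cell);
            return "none";
        } catch (ClassCastException | NullPointerException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(failure(42));   // an Integer: no exception
        System.out.println(failure("42")); // a String: type mismatch
        System.out.println(failure(null)); // a missing value
    }
}
```

      Checking isNullAt(i, j) before calling a primitive getter avoids the null case.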
    • getInt

      default int getInt(int i, String column)
      Returns the field value as a primitive int.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
    • getLong

      default long getLong(int i, int j)
      Returns the value at position (i, j) as a primitive long.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
    • getLong

      default long getLong(int i, String column)
      Returns the field value as a primitive long.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
    • getFloat

      default float getFloat(int i, int j)
      Returns the value at position (i, j) as a primitive float. Throws an exception if the type mismatches or if the value is null.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
    • getFloat

      default float getFloat(int i, String column)
      Returns the field value as a primitive float. Throws an exception if the type mismatches or if the value is null.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
    • getDouble

      default double getDouble(int i, int j)
      Returns the value at position (i, j) as a primitive double.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
    • getDouble

      default double getDouble(int i, String column)
      Returns the field value as a primitive double.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
    • getString

      default String getString(int i, int j)
      Returns the value at position (i, j) as a String object.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
    • getString

      default String getString(int i, String column)
      Returns the field value as a String object.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
    • toString

      default String toString(int i, int j)
      Returns the string representation of the value at position (i, j).
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the string representation of cell value.
    • toString

      default String toString(int i, String column)
      Returns the string representation of the field value.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the string representation of cell value.
    • getDecimal

      default BigDecimal getDecimal(int i, int j)
      Returns the value at position (i, j) of decimal type as java.math.BigDecimal.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
    • getDecimal

      default BigDecimal getDecimal(int i, String column)
      Returns the field value of decimal type as java.math.BigDecimal.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
    • getDate

      default LocalDate getDate(int i, int j)
      Returns the value at position (i, j) of date type as java.time.LocalDate.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
    • getDate

      default LocalDate getDate(int i, String column)
      Returns the field value of date type as java.time.LocalDate.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
    • getTime

      default LocalTime getTime(int i, int j)
      Returns the value at position (i, j) of time type as java.time.LocalTime.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
    • getTime

      default LocalTime getTime(int i, String column)
      Returns the field value of time type as java.time.LocalTime.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
    • getDateTime

      default LocalDateTime getDateTime(int i, int j)
      Returns the value at position (i, j) as java.time.LocalDateTime.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
    • getDateTime

      default LocalDateTime getDateTime(int i, String column)
      Returns the field value as java.time.LocalDateTime.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
    • getScale

      default String getScale(int i, int j)
      Returns the value at position (i, j) of NominalScale or OrdinalScale.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell scale.
      Throws:
      ClassCastException - when the data is not nominal or ordinal.
    • getScale

      default String getScale(int i, String column)
      Returns the field value of NominalScale or OrdinalScale.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell scale.
      Throws:
      ClassCastException - when the data is not nominal or ordinal.
    • getArray

      default <T> T[] getArray(int i, int j)
      Returns the value at position (i, j) of array type.
      Type Parameters:
      T - the data type of array elements.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
    • getArray

      default <T> T[] getArray(int i, String column)
      Returns the field value of array type.
      Type Parameters:
      T - the data type of array elements.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
    • getStruct

      default Tuple getStruct(int i, int j)
      Returns the value at position (i, j) of struct type.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
    • getStruct

      default Tuple getStruct(int i, String column)
      Returns the field value of struct type.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
    • indexOf

      int indexOf(String column)
      Returns the index of a given column name.
      Parameters:
      column - the column name.
      Returns:
      the index of column.
      Throws:
      IllegalArgumentException - when no column with the given name exists.
    • apply

      default BaseVector apply(String column)
      Selects a column by name and returns it as a BaseVector.
      Parameters:
      column - the column name.
      Returns:
      the column vector.
    • apply

      default BaseVector apply(Enum<?> column)
      Selects column using an enum value.
      Parameters:
      column - the field enum.
      Returns:
      the column vector.
    • column

      BaseVector column(int i)
      Selects column based on the column index.
      Parameters:
      i - the column index.
      Returns:
      the column vector.
    • column

      default BaseVector column(String column)
      Selects column based on the column name.
      Parameters:
      column - the column name.
      Returns:
      the column vector.
    • column

      default BaseVector column(Enum<?> column)
      Selects column using an enum value.
      Parameters:
      column - the field enum.
      Returns:
      the column vector.
    • vector

      <T> Vector<T> vector(int i)
      Selects column based on the column index.
      Type Parameters:
      T - the data type of column.
      Parameters:
      i - the column index.
      Returns:
      the column vector.
    • vector

      default <T> Vector<T> vector(String column)
      Selects column based on the column name.
      Type Parameters:
      T - the data type of column.
      Parameters:
      column - the column name.
      Returns:
      the column vector.
    • vector

      default <T> Vector<T> vector(Enum<?> column)
      Selects column using an enum value.
      Type Parameters:
      T - the data type of column.
      Parameters:
      column - the field enum.
      Returns:
      the column vector.
    • booleanVector

      BooleanVector booleanVector(int i)
      Selects column based on the column index.
      Parameters:
      i - the column index.
      Returns:
      the column vector.
    • booleanVector

      default BooleanVector booleanVector(String column)
      Selects column based on the column name.
      Parameters:
      column - the column name.
      Returns:
      the column vector.
    • booleanVector

      default BooleanVector booleanVector(Enum<?> column)
      Selects column using an enum value.
      Parameters:
      column - the field enum.
      Returns:
      the column vector.
    • charVector

      CharVector charVector(int i)
      Selects column based on the column index.
      Parameters:
      i - the column index.
      Returns:
      the column vector.
    • charVector

      default CharVector charVector(String column)
      Selects column based on the column name.
      Parameters:
      column - the column name.
      Returns:
      the column vector.
    • charVector

      default CharVector charVector(Enum<?> column)
      Selects column using an enum value.
      Parameters:
      column - the field enum.
      Returns:
      the column vector.
    • byteVector

      ByteVector byteVector(int i)
      Selects column based on the column index.
      Parameters:
      i - the column index.
      Returns:
      the column vector.
    • byteVector

      default ByteVector byteVector(String column)
      Selects column based on the column name.
      Parameters:
      column - the column name.
      Returns:
      the column vector.
    • byteVector

      default ByteVector byteVector(Enum<?> column)
      Selects column using an enum value.
      Parameters:
      column - the field enum.
      Returns:
      the column vector.
    • shortVector

      ShortVector shortVector(int i)
      Selects column based on the column index.
      Parameters:
      i - the column index.
      Returns:
      the column vector.
    • shortVector

      default ShortVector shortVector(String column)
      Selects column based on the column name.
      Parameters:
      column - the column name.
      Returns:
      the column vector.
    • shortVector

      default ShortVector shortVector(Enum<?> column)
      Selects column using an enum value.
      Parameters:
      column - the field enum.
      Returns:
      the column vector.
    • intVector

      IntVector intVector(int i)
      Selects column based on the column index.
      Parameters:
      i - the column index.
      Returns:
      the column vector.
    • intVector

      default IntVector intVector(String column)
      Selects column based on the column name.
      Parameters:
      column - the column name.
      Returns:
      the column vector.
    • intVector

      default IntVector intVector(Enum<?> column)
      Selects column using an enum value.
      Parameters:
      column - the field enum.
      Returns:
      the column vector.
    • longVector

      LongVector longVector(int i)
      Selects column based on the column index.
      Parameters:
      i - the column index.
      Returns:
      the column vector.
    • longVector

      default LongVector longVector(String column)
      Selects column based on the column name.
      Parameters:
      column - the column name.
      Returns:
      the column vector.
    • longVector

      default LongVector longVector(Enum<?> column)
      Selects column using an enum value.
      Parameters:
      column - the field enum.
      Returns:
      the column vector.
    • floatVector

      FloatVector floatVector(int i)
      Selects column based on the column index.
      Parameters:
      i - the column index.
      Returns:
      the column vector.
    • floatVector

      default FloatVector floatVector(String column)
      Selects column based on the column name.
      Parameters:
      column - the column name.
      Returns:
      the column vector.
    • floatVector

      default FloatVector floatVector(Enum<?> column)
      Selects column using an enum value.
      Parameters:
      column - the field enum.
      Returns:
      the column vector.
    • doubleVector

      DoubleVector doubleVector(int i)
      Selects column based on the column index.
      Parameters:
      i - the column index.
      Returns:
      the column vector.
    • doubleVector

      default DoubleVector doubleVector(String column)
      Selects column based on the column name.
      Parameters:
      column - the column name.
      Returns:
      the column vector.
    • doubleVector

      default DoubleVector doubleVector(Enum<?> column)
      Selects column using an enum value.
      Parameters:
      column - the field enum.
      Returns:
      the column vector.
    • stringVector

      StringVector stringVector(int i)
      Selects column based on the column index.
      Parameters:
      i - the column index.
      Returns:
      the column vector.
    • stringVector

      default StringVector stringVector(String column)
      Selects column based on the column name.
      Parameters:
      column - the column name.
      Returns:
      the column vector.
    • stringVector

      default StringVector stringVector(Enum<?> column)
      Selects column using an enum value.
      Parameters:
      column - the field enum.
      Returns:
      the column vector.
    • select

      DataFrame select(int... columns)
      Returns a new DataFrame with selected columns.
      Parameters:
      columns - the column indices.
      Returns:
      a new DataFrame with selected columns.
    • select

      default DataFrame select(String... columns)
      Returns a new DataFrame with selected columns.
      Parameters:
      columns - the column names.
      Returns:
      a new DataFrame with selected columns.
    • drop

      DataFrame drop(int... columns)
      Returns a new DataFrame without selected columns.
      Parameters:
      columns - the column indices.
      Returns:
      a new DataFrame without selected columns.
    • drop

      default DataFrame drop(String... columns)
      Returns a new DataFrame without selected columns.
      Parameters:
      columns - the column names.
      Returns:
      a new DataFrame without selected columns.
    • merge

      DataFrame merge(DataFrame... dataframes)
      Merges data frames horizontally by columns.
      Parameters:
      dataframes - the data frames to merge.
      Returns:
      a new data frame that combines this DataFrame with one or more other DataFrames by columns.
    • merge

      DataFrame merge(BaseVector... vectors)
      Merges vectors with this data frame.
      Parameters:
      vectors - the vectors to merge.
      Returns:
      a new data frame that combines this DataFrame with one or more additional vectors.
    • union

      DataFrame union(DataFrame... dataframes)
      Unions data frames vertically by rows.
      Parameters:
      dataframes - the data frames to union.
      Returns:
      a new data frame that combines all the rows.
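      The difference between merging by columns and unioning by rows can be shown on plain arrays. This is a minimal sketch, not Smile's implementation; CombineSketch and its methods are hypothetical, and the requirement that merged inputs have the same number of rows is an assumption.

```java
final class CombineSketch {
    /** Horizontal merge: row i of the result is row i of a followed by row i of b. */
    static double[][] mergeColumns(double[][] a, double[][] b) {
        if (a.length != b.length) {
            // Assumption for this sketch: both inputs have the same row count.
            throw new IllegalArgumentException("row counts differ");
        }
        double[][] out = new double[a.length][];
        for (int i = 0; i < a.length; i++) {
            out[i] = new double[a[i].length + b[i].length];
            System.arraycopy(a[i], 0, out[i], 0, a[i].length);
            System.arraycopy(b[i], 0, out[i], a[i].length, b[i].length);
        }
        return out;
    }

    /** Vertical union: all rows of a, then all rows of b. */
    static double[][] unionRows(double[][] a, double[][] b) {
        double[][] out = new double[a.length + b.length][];
        System.arraycopy(a, 0, out, 0, a.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    public static void main(String[] args) {
        double[][] a = {{1.0, 2.0}, {3.0, 4.0}};
        double[][] b = {{5.0, 6.0}, {7.0, 8.0}};
        // Merge keeps the row count and adds columns.
        System.out.println(mergeColumns(a, b)[0].length);
        // Union keeps the column count and adds rows.
        System.out.println(unionRows(a, b).length);
    }
}
```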
    • factorize

      default DataFrame factorize(String... columns)
      Returns a new DataFrame with given columns converted to nominal.
      Parameters:
      columns - column names. If empty, all object columns in the data frame will be converted.
      Returns:
      a new DataFrame.
    • toArray

      default double[][] toArray(String... columns)
      Returns an array obtained by converting the columns in a data frame to numeric mode and then binding them together as the columns of a matrix. Missing values/nulls will be encoded as Double.NaN. No bias term is added, and categorical variables use level encoding.
      Parameters:
      columns - the columns to export. If empty, all columns will be used.
      Returns:
      the numeric array.
    • toArray

      default double[][] toArray(boolean bias, CategoricalEncoder encoder, String... columns)
      Returns an array obtained by converting the columns in a data frame to numeric mode and then binding them together as the columns of a matrix. Missing values/nulls will be encoded as Double.NaN.
      Parameters:
      bias - if true, add the first column of all 1's.
      encoder - the categorical variable encoder.
      columns - the columns to export. If empty, all columns will be used.
      Returns:
      the numeric array.
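      The two rules stated for the numeric export, nulls becoming Double.NaN and an optional leading column of 1's, can be sketched on boxed values. This is an illustration under assumptions, not Smile's implementation: ToArraySketch is a hypothetical name, the input is a plain Number[][], and categorical encoding is left out entirely.

```java
import java.util.Arrays;

final class ToArraySketch {
    /** Converts boxed cells to doubles; null becomes NaN; bias prepends a column of 1's. */
    static double[][] toArray(boolean bias, Number[][] cells) {
        int offset = bias ? 1 : 0;
        double[][] out = new double[cells.length][];
        for (int i = 0; i < cells.length; i++) {
            out[i] = new double[cells[i].length + offset];
            if (bias) {
                out[i][0] = 1.0;
            }
            for (int j = 0; j < cells[i].length; j++) {
                Number v = cells[i][j];
                out[i][j + offset] = (v == null) ? Double.NaN : v.doubleValue();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Number[][] cells = {{1, 2.5}, {null, 4L}};
        // With bias, each output row has one more entry than the input row.
        System.out.println(Arrays.deepToString(toArray(true, cells)));
    }
}
```

      Because NaN propagates through arithmetic, callers that cannot tolerate it may prefer to drop incomplete rows first, for example with omitNullRows().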
    • toMatrix

      default Matrix toMatrix()
      Returns a matrix obtained by converting all the variables in a data frame to numeric mode and then binding them together as the columns of a matrix. Missing values/nulls will be encoded as Double.NaN. No bias term is added, and categorical variables use level encoding.
      Returns:
      the numeric matrix.
    • toMatrix

      default Matrix toMatrix(boolean bias, CategoricalEncoder encoder, String rowNames)
      Returns a matrix obtained by converting all the variables in a data frame to numeric mode and then binding them together as the columns of a matrix. Missing values/nulls will be encoded as Double.NaN.
      Parameters:
      bias - if true, add the first column of all 1's.
      encoder - the categorical variable encoder.
      rowNames - the column to be used as row names.
      Returns:
      the numeric matrix.
    • summary

      default DataFrame summary()
      Returns the statistical summary of numeric columns.
      Returns:
      the statistical summary of numeric columns.
    • toString

      default String toString(int numRows)
      Returns the string representation of top rows.
      Specified by:
      toString in interface Dataset<Tuple>
      Parameters:
      numRows - the number of rows to show
      Returns:
      the string representation of top rows.
    • toString

      default String toString(int numRows, boolean truncate)
      Returns the string representation of top rows.
      Parameters:
      numRows - the number of rows to show.
      truncate - whether to truncate long strings and right-align cells.
      Returns:
      the string representation of top rows.
    • toStrings

      default String[][] toStrings(int numRows)
      Returns the string representation of top rows.
      Parameters:
      numRows - the number of rows to show.
      Returns:
      the string representation of top rows.
    • toStrings

      default String[][] toStrings(int numRows, boolean truncate)
      Returns the string representation of top rows.
      Parameters:
      numRows - the number of rows to show.
      truncate - whether to truncate long strings.
      Returns:
      the string representation of top rows.
    • of

      static DataFrame of(BaseVector... vectors)
      Creates a DataFrame from a set of vectors.
      Parameters:
      vectors - The column vectors.
      Returns:
      the data frame.
    • of

      static DataFrame of(double[][] data, String... names)
      Creates a DataFrame from a 2-dimensional array.
      Parameters:
      data - The data array.
      names - the name of columns.
      Returns:
      the data frame.
    • of

      static DataFrame of(float[][] data, String... names)
      Creates a DataFrame from a 2-dimensional array.
      Parameters:
      data - The data array.
      names - the name of columns.
      Returns:
      the data frame.
    • of

      static DataFrame of(int[][] data, String... names)
      Creates a DataFrame from a 2-dimensional array.
      Parameters:
      data - The data array.
      names - the name of columns.
      Returns:
      the data frame.
    • of

      static <T> DataFrame of(List<T> data, Class<T> clazz)
      Creates a DataFrame from a collection.
      Type Parameters:
      T - The data type of elements.
      Parameters:
      data - The data collection.
      clazz - The class type of elements.
      Returns:
      the data frame.
    • of

      static DataFrame of(Stream<? extends Tuple> data)
      Creates a DataFrame from a stream of tuples.
      Parameters:
      data - The data stream.
      Returns:
      the data frame.
    • of

      static DataFrame of(Stream<? extends Tuple> data, StructType schema)
      Creates a DataFrame from a stream of tuples.
      Parameters:
      data - The data stream.
      schema - The schema of tuple.
      Returns:
      the data frame.
    • of

      static DataFrame of(List<? extends Tuple> data)
      Creates a DataFrame from a list of tuples.
      Parameters:
      data - The data collection.
      Returns:
      the data frame.
    • of

      static DataFrame of(List<? extends Tuple> data, StructType schema)
      Creates a DataFrame from a list of tuples.
      Parameters:
      data - The data collection.
      schema - The schema of tuple.
      Returns:
      the data frame.
    • of

      static <T> DataFrame of(Collection<Map<String,T>> data, StructType schema)
      Creates a DataFrame from a collection of maps.
      Type Parameters:
      T - The data type of elements.
      Parameters:
      data - The data collection.
      schema - The schema of data.
      Returns:
      the data frame.
    • of

      static DataFrame of(ResultSet rs) throws SQLException
      Creates a DataFrame from a JDBC ResultSet.
      Parameters:
      rs - The JDBC result set.
      Returns:
      the data frame.
      Throws:
      SQLException - when JDBC operation fails.