Package smile.io

Class Parquet

java.lang.Object
smile.io.Parquet

public class Parquet extends Object
Apache Parquet is a columnar storage format that supports nested data structures. It uses the record shredding and assembly algorithm described in the Dremel paper.
  • Method Details

    • read

      public static DataFrame read(Path path) throws IOException
      Reads a local parquet file.
      Parameters:
      path - the input file path.
      Returns:
      the data frame.
      Throws:
      IOException - when it fails to read the file.
    • read

      public static DataFrame read(Path path, int limit) throws IOException
      Reads a limited number of records from a local parquet file.
      Parameters:
      path - the input file path.
      limit - the number of records to read.
      Returns:
      the data frame.
      Throws:
      IOException - when it fails to read the file.
    • read

      public static DataFrame read(String path) throws IOException, URISyntaxException
      Reads an HDFS parquet file.
      Parameters:
      path - the input file path.
      Returns:
      the data frame.
      Throws:
      IOException - when it fails to read the file.
      URISyntaxException - when the file path syntax is wrong.
    • read

      public static DataFrame read(String path, int limit) throws IOException, URISyntaxException
      Reads a limited number of records from an HDFS parquet file.
      Parameters:
      path - the input file path.
      limit - the number of records to read.
      Returns:
      the data frame.
      Throws:
      IOException - when it fails to read the file.
      URISyntaxException - when the file path syntax is wrong.
    • read

      public static DataFrame read(org.apache.parquet.io.InputFile file) throws IOException
      Reads a parquet file.
      Parameters:
      file - an interface with the methods needed by Parquet to read data files. See HadoopInputFile for example.
      Returns:
      the data frame.
      Throws:
      IOException - when it fails to read the file.
    • read

      public static DataFrame read(org.apache.parquet.io.InputFile file, int limit) throws IOException
      Reads a limited number of records from a parquet file.
      Parameters:
      file - an interface with the methods needed by Parquet to read data files. See HadoopInputFile for example.
      limit - the number of records to read.
      Returns:
      the data frame.
      Throws:
      IOException - when it fails to read the file.
    • toDataType

      public static DataType toDataType(org.apache.parquet.schema.PrimitiveType primitiveType)
      Converts a parquet primitive type to a smile data type.
      Parameters:
      primitiveType - a parquet primitive type.
      Returns:
      the data type.
    • toStructField

      public static StructField toStructField(org.apache.parquet.column.ColumnDescriptor column)
      Converts a parquet column to a smile field.
      Parameters:
      column - a parquet column descriptor.
      Returns:
      the struct field.
    • toStructType

      public static StructType toStructType(org.apache.parquet.schema.MessageType schema)
      Converts a parquet schema to a smile schema.
      Parameters:
      schema - a parquet schema.
      Returns:
      the struct type.
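
The read(String) overloads above declare URISyntaxException for a path whose syntax is wrong. The sketch below shows how such a path string could be checked before calling them. It uses only java.net.URI from the JDK and assumes, without confirmation from this page, that the path is interpreted as a URI in the same way. The class HdfsPathCheck, its isWellFormed helper, and the namenode host and file names are hypothetical and are not part of smile.io.Parquet.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class HdfsPathCheck {
    /**
     * Returns true if the string parses as a URI.
     * Hypothetical helper for illustration; not part of smile.io.Parquet.
     */
    static boolean isWellFormed(String path) {
        try {
            new URI(path);
            return true;
        } catch (URISyntaxException e) {
            // java.net.URI rejects illegal characters such as an unescaped space.
            return false;
        }
    }

    public static void main(String[] args) throws URISyntaxException {
        // Assumed example location; replace with a real cluster address.
        String good = "hdfs://namenode:8020/data/users.parquet";
        String bad = "hdfs://namenode:8020/data/my file.parquet";

        URI uri = new URI(good);
        System.out.println(uri.getScheme()); // hdfs
        System.out.println(uri.getPath());   // /data/users.parquet

        System.out.println(isWellFormed(good)); // true
        System.out.println(isWellFormed(bad));  // false
    }
}
```

A check like this only validates the syntax of the string. Whether the file exists and can be opened is still reported through IOException by the read methods.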