Package smile.io

Class Parquet

java.lang.Object
smile.io.Parquet

public class Parquet extends Object
Apache Parquet is a columnar storage format that supports nested data structures. It uses the record shredding and assembly algorithm described in the Dremel paper.
  • Method Details

    • read

      public static DataFrame read(Path path) throws IOException
      Reads a local parquet file.
      Parameters:
      path - the input file path.
      Returns:
      the data frame.
      Throws:
      IOException - when it fails to read the file.
    • read

      public static DataFrame read(Path path, int limit) throws IOException
      Reads a limited number of records from a local parquet file.
      Parameters:
      path - the input file path.
      limit - the number of records to read.
      Returns:
      the data frame.
      Throws:
      IOException - when it fails to read the file.
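Both local overloads report an unreadable file through IOException. The sketch below shows one possible pre-flight check before calling read(Path): it tests whether a local file starts and ends with the 4-byte marker "PAR1" that unencrypted Parquet files use. The class name ParquetMagicCheck and the method looksLikeParquet are hypothetical helpers made up for this illustration; they are not part of smile.io.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of smile.io.
public class ParquetMagicCheck {
    // Unencrypted Parquet files begin and end with these four ASCII bytes.
    private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the file both starts and ends with the "PAR1" marker. */
    public static boolean looksLikeParquet(Path path) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path.toFile(), "r")) {
            long length = file.length();
            if (length < 2L * MAGIC.length) {
                return false; // too short to hold both markers
            }
            byte[] head = new byte[MAGIC.length];
            byte[] tail = new byte[MAGIC.length];
            file.readFully(head);               // first four bytes
            file.seek(length - MAGIC.length);
            file.readFully(tail);               // last four bytes
            return Arrays.equals(head, MAGIC) && Arrays.equals(tail, MAGIC);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ParquetMagicCheck <file>");
            return;
        }
        Path path = Paths.get(args[0]);
        if (looksLikeParquet(path)) {
            // With Smile on the classpath, the file could now be loaded, e.g.:
            // DataFrame df = Parquet.read(path, 100);
            System.out.println(path + " has the Parquet markers");
        } else {
            System.out.println(path + " does not look like a Parquet file");
        }
    }
}
```

The check only inspects the markers. A file that passes can still be corrupt, so the IOException declared by read still needs to be handled.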
    • read

      public static DataFrame read(String path) throws IOException, URISyntaxException
      Reads a parquet file from HDFS.
      Parameters:
      path - the input file path.
      Returns:
      the data frame.
      Throws:
      IOException - when it fails to read the file.
      URISyntaxException - when the file path syntax is wrong.
    • read

      public static DataFrame read(String path, int limit) throws IOException, URISyntaxException
      Reads a limited number of records from a parquet file on HDFS.
      Parameters:
      path - the input file path.
      limit - the number of records to read.
      Returns:
      the data frame.
      Throws:
      IOException - when it fails to read the file.
      URISyntaxException - when the file path syntax is wrong.
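The String overloads declare URISyntaxException for a malformed path. As a rough illustration of what "wrong syntax" can mean, the sketch below validates a path string with java.net.URI before it is handed to read(String). It assumes the library parses the string in a comparable way, which this page does not confirm. The class HdfsPathCheck and the host name "namenode" are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of smile.io.
public class HdfsPathCheck {
    /** Returns true if the string parses as a URI under java.net.URI rules. */
    public static boolean isWellFormed(String path) {
        try {
            new URI(path);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "namenode" and the file names are placeholders for illustration.
        String ok = "hdfs://namenode:9000/data/events.parquet";
        String broken = "hdfs://namenode:9000/data/my events.parquet"; // unescaped space

        System.out.println(isWellFormed(ok));     // prints "true"
        System.out.println(isWellFormed(broken)); // prints "false"

        // With Smile on the classpath, a well-formed path could then be read, e.g.:
        // DataFrame df = Parquet.read(ok, 1000);
    }
}
```

Characters such as spaces have to be percent-encoded (for example as %20) before the string is valid under these rules.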
    • read

      public static DataFrame read(org.apache.parquet.io.InputFile file) throws IOException
      Reads a parquet file.
      Parameters:
      file - an interface with the methods needed by Parquet to read data files. See HadoopInputFile for example.
      Returns:
      the data frame.
      Throws:
      IOException - when it fails to read the file.
    • read

      public static DataFrame read(org.apache.parquet.io.InputFile file, int limit) throws IOException
      Reads a limited number of records from a parquet file.
      Parameters:
      file - an interface with the methods needed by Parquet to read data files. See HadoopInputFile for example.
      limit - the number of records to read.
      Returns:
      the data frame.
      Throws:
      IOException - when it fails to read the file.