Class Parquet
java.lang.Object
smile.io.Parquet
Apache Parquet is a columnar storage format that supports
nested data structures. It uses the record shredding and
assembly algorithm described in the Dremel paper.
-
Method Summary
Modifier and Type    Method    Description
static DataFrame     read(path)    Reads an HDFS Parquet file.
static DataFrame     read(path, limit)    Reads a limited number of records from an HDFS Parquet file.
static DataFrame     read(path)    Reads a local Parquet file.
static DataFrame     read(path, limit)    Reads a limited number of records from a local Parquet file.
static DataFrame     read(org.apache.parquet.io.InputFile file)    Reads a Parquet file.
static DataFrame     read(org.apache.parquet.io.InputFile file, int limit)    Reads a limited number of records from a Parquet file.
static DataType      toDataType(org.apache.parquet.schema.PrimitiveType primitiveType)    Converts a Parquet primitive type to a Smile data type.
static StructField   toStructField(org.apache.parquet.column.ColumnDescriptor column)    Converts a Parquet column to a Smile field.
static StructType    toStructType(org.apache.parquet.schema.MessageType schema)    Converts a Parquet schema to a Smile schema.
-
Method Details
-
read
Reads a local Parquet file.
Parameters:
path - the input file path.
Returns:
the data frame.
Throws:
IOException - if the file cannot be read.
-
read
Reads a limited number of records from a local Parquet file.
Parameters:
path - the input file path.
limit - the number of records to read.
Returns:
the data frame.
Throws:
IOException - if the file cannot be read.
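The two local overloads can be combined to preview a file before loading it in full. The following is a minimal sketch, assuming the local overloads accept a java.nio.file.Path (the parameter type is not shown on this page) and that the file name data/users.parquet, the class name LocalParquetExample, and the helper preview are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

import smile.data.DataFrame;
import smile.io.Parquet;

public class LocalParquetExample {
    /** Reads at most {@code limit} records from a local Parquet file. */
    public static DataFrame preview(Path path, int limit) throws IOException {
        // Assumption: the local overload has the shape read(java.nio.file.Path, int).
        return Parquet.read(path, limit);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file name; replace with a real Parquet file.
        Path path = Paths.get("data/users.parquet");

        // Inspect the schema from a small sample first.
        DataFrame head = preview(path, 10);
        System.out.println(head.schema());

        // Then load every record.
        DataFrame all = Parquet.read(path);
        System.out.println(all.size() + " rows");
    }
}
```

Reading with a small limit first is a cheap way to check the column types before committing memory to the whole file.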
-
read
Reads an HDFS Parquet file.
Parameters:
path - the input file path.
Returns:
the data frame.
Throws:
IOException - if the file cannot be read.
URISyntaxException - if the file path is not syntactically valid.
-
read
Reads a limited number of records from an HDFS Parquet file.
Parameters:
path - the input file path.
limit - the number of records to read.
Returns:
the data frame.
Throws:
IOException - if the file cannot be read.
URISyntaxException - if the file path is not syntactically valid.
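The declared URISyntaxException suggests that the HDFS path is interpreted as a URI. Under that assumption (this page does not state how the path is parsed), a path can be checked up front with the standard java.net.URI class. The class name HdfsPathCheck, the helper hasValidUriSyntax, and the host and file names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class HdfsPathCheck {
    /** Returns true if the string parses as a URI, false otherwise. */
    public static boolean hasValidUriSyntax(String path) {
        try {
            new URI(path);
            return true;
        } catch (URISyntaxException e) {
            // For example, an unescaped space in the path is rejected.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical name node and file names.
        System.out.println(hasValidUriSyntax("hdfs://namenode:8020/data/users.parquet"));
        System.out.println(hasValidUriSyntax("hdfs://namenode:8020/data/my users.parquet"));
    }
}
```

A check like this only covers syntax. A well-formed path can still fail with an IOException if the file is missing or unreadable.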
-
read
Reads a Parquet file.
Parameters:
file - an interface with the methods needed by Parquet to read data files. See HadoopInputFile for an example.
Returns:
the data frame.
Throws:
IOException - if the file cannot be read.
-
read
Reads a limited number of records from a Parquet file.
Parameters:
file - an interface with the methods needed by Parquet to read data files. See HadoopInputFile for an example.
limit - the number of records to read.
Returns:
the data frame.
Throws:
IOException - if the file cannot be read.
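The InputFile overloads can be used with the HadoopInputFile implementation mentioned above. The following is a minimal sketch, assuming parquet-hadoop and the Hadoop client libraries are on the classpath and that a default Configuration is sufficient. The class name InputFileExample, the helper head, and the file location are hypothetical.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.util.HadoopInputFile;
import org.apache.parquet.io.InputFile;

import smile.data.DataFrame;
import smile.io.Parquet;

public class InputFileExample {
    /** Reads at most {@code limit} records through a HadoopInputFile. */
    public static DataFrame head(String location, int limit) throws IOException {
        // A default configuration; real clusters usually need site-specific settings.
        Configuration conf = new Configuration();
        // fromPath resolves the file system from the path and the configuration.
        InputFile file = HadoopInputFile.fromPath(new Path(location), conf);
        return Parquet.read(file, limit);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical location; replace with a real Parquet file.
        DataFrame df = head("data/users.parquet", 100);
        System.out.println(df.schema());
    }
}
```

Going through InputFile is useful when the caller already controls how the file is opened, for example with a custom Hadoop configuration.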
-
toDataType
Converts a Parquet primitive type to a Smile data type.
Parameters:
primitiveType - a Parquet primitive type.
Returns:
the data type.
-
toStructField
Converts a Parquet column to a Smile field.
Parameters:
column - a Parquet column descriptor.
Returns:
the struct field.
-
toStructType
Converts a Parquet schema to a Smile schema.
Parameters:
schema - a Parquet schema.
Returns:
the struct type.
-