Apache Parquet Reader/Writer (Technology Preview) (FME Desktop Package)
Technology Preview
Note: This format is still under active development, so please expect changes to the current behavior. We do not recommend using this format in production environments.
This package allows you to read and write Apache Parquet format datasets, which are commonly used in Big Data workflows.
Integrating this Format with FME Desktop
Download from FME Hub | |
Created By | |
Package Filename | safe-lab.parquet-0.1.3.fpkg |
Version | 0.1.3 |
Requirements | FME build 19231+ |
To install the Apache Parquet format package, double-click the downloaded fpkg file.
This FME Desktop Package contains the Apache Parquet Reader/Writer introduced with FME 2019.
Overview
Apache Parquet is a columnar, file-based storage format that originated in the Apache Hadoop ecosystem. Because values are stored column by column, a query can read only the columns it needs, and the data compresses well. Parquet is supported by many Apache big data frameworks, such as Drill, Hive, and Spark.
Parquet is also supported by several large-scale query services, such as Amazon Athena, Google BigQuery, and Microsoft Azure Data Lake Analytics.
Parquet File Extensions
A Parquet dataset consists of multiple *.parquet files in a folder, potentially nested into partitions by attribute. For example, a Parquet dataset of customer information, partitioned by account type, might look like this:
- customers
  - starter
    - Ks4Fju.parquet
    - N7GQGb.parquet
    - DsuO3K.parquet
    - ...
  - plus
    - e0UOXZ.parquet
    - 5htd7H.parquet
    - ...
  - enterprise
    - GqZIV1.parquet
    - GZgJAk.parquet
    - GShRhm.parquet
    - Tz06cp.parquet
    - ...
To read from or write to this dataset, you would select the "customers" folder as the format dataset.
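The folder layout described above can be explored with a short script. The following is a minimal sketch using only the Python standard library. It is not part of FME or of any Parquet library: the function names list_partitions and build_example_dataset are hypothetical, and the files it creates are empty placeholders rather than real Parquet files.

```python
from pathlib import Path
import tempfile


def list_partitions(dataset_root):
    """Map each partition folder (relative to the root) to its *.parquet file names."""
    root = Path(dataset_root)
    partitions = {}
    for path in sorted(root.rglob("*.parquet")):
        # Files directly in the root get the key "."; nested files get their folder path.
        key = path.parent.relative_to(root).as_posix()
        partitions.setdefault(key, []).append(path.name)
    return partitions


def build_example_dataset(parent):
    """Create a mock of the "customers" layout (hypothetical helper, demo only)."""
    root = Path(parent) / "customers"
    layout = {
        "starter": ["Ks4Fju.parquet", "N7GQGb.parquet", "DsuO3K.parquet"],
        "plus": ["e0UOXZ.parquet", "5htd7H.parquet"],
        "enterprise": ["GqZIV1.parquet", "GZgJAk.parquet",
                       "GShRhm.parquet", "Tz06cp.parquet"],
    }
    for folder, names in layout.items():
        (root / folder).mkdir(parents=True)
        for name in names:
            (root / folder / name).touch()  # empty placeholder, not real Parquet data
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = build_example_dataset(tmp)
        for partition, files in list_partitions(root).items():
            print(partition, len(files))
```

In this sketch, the path returned by build_example_dataset plays the role of the "customers" folder that you would select as the format dataset.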
Reader Overview
While Parquet is a columnar format, the Parquet reader produces a feature for each row in the dataset. The reader operates on a directory containing multiple *.parquet files, optionally partitioned further by column.
A dataset has only one reader feature type.
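To illustrate how column-oriented data maps to one feature per row, here is a minimal sketch in plain Python. It is a conceptual illustration only, not the reader's actual implementation; the function name features_from_columns and the sample attribute names are assumptions made for the example.

```python
def features_from_columns(columns):
    """Yield one dict per row from a mapping of column name -> list of values."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of values")
    names = list(columns)
    # zip(*...) walks the columns in parallel, producing one tuple per row.
    for row in zip(*columns.values()):
        yield dict(zip(names, row))


if __name__ == "__main__":
    # Hypothetical sample data: two columns, two rows.
    columns = {
        "name": ["Ada", "Grace"],
        "account_type": ["plus", "starter"],
    }
    for feature in features_from_columns(columns):
        print(feature)
```

Each dictionary stands in for a feature, with one attribute per column.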
Writer Overview
The Parquet writer writes all the attributes of a feature to a Parquet dataset.
A dataset has only one writer feature type.
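Conceptually, writing features to a columnar format means pivoting per-feature attributes into per-column value lists. The sketch below shows that idea in plain Python. It does not describe the writer's actual implementation: the function name columns_from_features is hypothetical, and storing a missing attribute as None is an assumption made for illustration.

```python
def columns_from_features(features):
    """Pivot an iterable of attribute dicts into a mapping of column name -> values."""
    features = list(features)
    # Collect attribute names in first-seen order so the column order is stable.
    names = []
    for feature in features:
        for name in feature:
            if name not in names:
                names.append(name)
    # Assumption: an attribute missing from a feature is stored as None (null).
    return {name: [feature.get(name) for feature in features] for name in names}


if __name__ == "__main__":
    # Hypothetical sample features; the second one lacks the "region" attribute.
    features = [
        {"name": "Ada", "account_type": "plus", "region": "west"},
        {"name": "Grace", "account_type": "starter"},
    ]
    for column, values in columns_from_features(features).items():
        print(column, values)
```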