Notice: This format package is now deprecated. It is replaced by the Apache Parquet Reader/Writer.
Note: Safe Software does not recommend using this format in production environments.
This package allows you to read and write Apache Parquet format datasets, which are commonly used in Big Data workflows.
About Apache Parquet Format Support
Note that this format differs from the Apache Parquet format that is integrated with FME:
This package offers an easy way to add the Parquet format to an existing FME installation. By contrast, the Apache Parquet Writer integrated with FME is released with FME itself and has full bulk-mode functionality.
Integrating this Format with FME Desktop
Download from FME Hub
Apache Parquet is a columnar, file-based storage format, originating in the Apache Hadoop ecosystem. It can be queried efficiently, and is highly compressed. It is supported by many Apache big data frameworks, such as Drill, Hive, and Spark.
Parquet is also supported by several large-scale query services, such as Amazon Athena, Google BigQuery, and Microsoft Azure Data Lake Analytics.
Parquet File Extensions
A Parquet dataset consists of multiple *.parquet files in a folder, potentially nested into partitions by attribute. For example, a Parquet dataset of customer information, partitioned by account type, might look like this:
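A plausible layout, using the conventional key=value partition folders (the folder and file names here are illustrative, not produced by any specific tool):

```
customers/
├── account_type=basic/
│   ├── part-00000.parquet
│   └── part-00001.parquet
└── account_type=premium/
    └── part-00000.parquet
```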
To read from or write to this dataset, you would select the "customers" folder as the format dataset.
While Parquet is a columnar format, the Parquet reader produces a feature for each row in the dataset. The reader operates on a directory containing multiple *.parquet files, optionally partitioned further by column.
A dataset has only one reader feature type.
The Parquet writer writes all the attributes of a feature to a Parquet dataset.
A dataset has only one writer feature type.