Apache Parquet Reader/Writer
Apache Parquet is a columnar, file-based storage format that originated in the Apache Hadoop ecosystem. It can be queried efficiently, is highly compressed, supports null values, and has no native spatial component. It is supported by many Apache big data frameworks, such as Drill, Hive, and Spark.
Parquet is also supported by several large-scale query services, such as Amazon Athena, Google BigQuery, and Microsoft Azure Data Lake Analytics.
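Outside of FME, Parquet files can be inspected with common libraries. The following is a minimal sketch using pyarrow; the file name and column are hypothetical and only illustrate the columnar, selectively readable nature of the format.

```python
import pyarrow.parquet as pq

# Read an entire (hypothetical) Parquet file into an Arrow table.
table = pq.read_table("example.parquet")
print(table.schema)  # column names and types, including nullable columns

# Because the format is columnar, individual columns can be read selectively,
# which is part of why Parquet can be queried efficiently.
names_only = pq.read_table("example.parquet", columns=["name"])
```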
About Apache Parquet Format Support
Note that this format replaces the now-deprecated Apache Parquet FME package format.
Parquet Product and System Requirements
| Format | FME Desktop License | FME Server | FME Cloud | Windows | Linux | Mac |
|---|---|---|---|---|---|---|
| Reader | Available in FME Professional Edition and higher | Yes | Yes | 64-bit: Yes | Yes | Yes |
| Writer | Available in FME Professional Edition and higher | Yes | Yes | 64-bit: Yes | Yes | Yes |
- More about FME Licenses and Subscriptions.
- More about FME Desktop Editions and Licenses.
Parquet File Extensions
A Parquet dataset consists of multiple *.parquet files in a folder, potentially nested into partitions by attribute. For example, a Parquet dataset of customer information, partitioned by account type, might look like this:
- customers
  - starter
    - Ks4Fju.parquet
    - N7GQGb.parquet
    - DsuO3K.parquet
    - ...
  - plus
    - e0UOXZ.parquet
    - 5htd7H.parquet
    - ...
  - enterprise
    - GqZIV1.parquet
    - GZgJAk.parquet
    - GShRhm.parquet
    - Tz06cp.parquet
    - ...
To write to this dataset, you would select the "customers" folder as the format dataset.
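For illustration, a folder layout like this could be produced outside FME with pyarrow. This is only a sketch: the column names, values, and randomized file names are assumptions chosen to mirror the example above.

```python
import os
import uuid

import pyarrow as pa
import pyarrow.compute as pc
import pyarrow.parquet as pq

# Sample customer data with an account_type column to partition on (assumed schema).
table = pa.table({
    "customer_id": [101, 102, 103, 104],
    "name": ["Ada", "Bram", "Chen", "Dara"],
    "account_type": ["starter", "plus", "enterprise", "starter"],
})

root = "customers"
for account_type in pc.unique(table.column("account_type")).to_pylist():
    # Select the rows for this partition and write them to their own subfolder.
    subset = table.filter(pc.equal(table.column("account_type"), account_type))
    folder = os.path.join(root, account_type)
    os.makedirs(folder, exist_ok=True)
    # Randomized file names, similar in spirit to the Ks4Fju.parquet-style names above.
    pq.write_table(subset, os.path.join(folder, f"{uuid.uuid4().hex[:6]}.parquet"))
```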
Reader Overview
While Parquet is a columnar format, the Parquet reader produces one feature for each row in the dataset. Bulk Mode bridges the gap between columnar storage and FME's row-based feature processing.
The reader operates on a single *.parquet file, but multiple files making up a partitioned dataset can be selected. The writer will write a single file per feature type, but data can be partitioned using feature type fanout.
A dataset has only one reader feature type.
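As a rough analogy outside FME (not the reader's actual implementation), the sketch below reads every *.parquet file under a partitioned folder and exposes the columnar data row by row. The "customers" path and column names refer to the example layout above and are assumptions.

```python
import pyarrow.dataset as ds

# Discover all *.parquet files under the partitioned folder.
dataset = ds.dataset("customers", format="parquet")

# Columns are read in bulk, then rows are materialized from them --
# loosely analogous to one feature per row in FME.
for batch in dataset.to_batches():
    for row in batch.to_pylist():
        print(row["customer_id"], row["account_type"])
```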
Latitude/Longitude and x, y, z coordinates
FME automatically recognizes some common attribute names as potential x,y,z coordinates and sets their types.
Parquet data does not necessarily have a spatial component, but columns can be identified as x, y, or z coordinates to create point geometry. If a schema scan is performed and field names contain variations of x/y, east/north, or easting/northing, FME will create point geometry from those columns.
If FME detects latitude and longitude column names (for example, Latitude/Longitude or Lat/Long), the source coordinate system will be set to LL-WGS84.
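The sketch below gives a rough sense of this kind of column-name matching. It is illustrative only and is not FME's actual detection logic; the patterns and function name are assumptions.

```python
import re

# Illustrative patterns only -- not FME's actual rules.
X_NAMES = re.compile(r"^(x|lon(gitude)?|long|east(ing)?)$", re.IGNORECASE)
Y_NAMES = re.compile(r"^(y|lat(itude)?|north(ing)?)$", re.IGNORECASE)

def guess_point_columns(column_names):
    """Return the first column that looks like x and the first that looks like y."""
    x = next((c for c in column_names if X_NAMES.match(c)), None)
    y = next((c for c in column_names if Y_NAMES.match(c)), None)
    return x, y

print(guess_point_columns(["Longitude", "Latitude", "name"]))  # ('Longitude', 'Latitude')
```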
Writer Overview
The Parquet writer writes all the attributes of a feature to a Parquet dataset. The writer operates on a folder and will write a single .parquet file to the folder for each feature type. The file name will be the feature type name. Existing files of the same name will be overwritten. There is a file version option for backwards compatibility with older external readers and a compression option to reduce file size.
Each .parquet file corresponds to a single writer feature type, meaning a file only contains features from that feature type.
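For comparison, equivalent options exist when writing Parquet directly with pyarrow. The sketch below is an illustration of the file version and compression settings, not FME's writer; the table schema and file name are assumptions.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Minimal table standing in for a feature type's attributes (assumed schema).
table = pa.table({"id": [1, 2, 3], "value": [10.0, 20.0, 30.0]})

pq.write_table(
    table,
    "customers.parquet",  # in FME, the file name comes from the feature type name
    version="1.0",        # older format version for backwards-compatible readers
    compression="gzip",   # smaller files; "snappy", "zstd", etc. are alternatives
)
```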