Apache Parquet Reader/Writer

Apache Parquet is a columnar, file-based storage format, originating in the Apache Hadoop ecosystem. It can be queried efficiently, is highly compressed, supports null values, and is non-spatial. It is supported by many Apache big data frameworks, such as Drill, Hive, and Spark.

Parquet is additionally supported by several large-scale query providers, such as Amazon AWS Athena, Google Cloud BigQuery, and Microsoft Azure Data Lake Analytics.

About Apache Parquet Format Support

Note that this format replaces the now-deprecated Apache Parquet FME package format.

Parquet Product and System Requirements

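| Format | FME Desktop License (Product)                    | FME Server (Product) | FME Cloud (Product) | Windows (Operating System) | Linux (Operating System) | Mac (Operating System) |
|--------|--------------------------------------------------|----------------------|---------------------|----------------------------|--------------------------|------------------------|
| Reader | Available in FME Professional Edition and higher | Yes                  | Yes                 | 64-bit: Yes                | Yes                      | Yes                    |
| Writer | Available in FME Professional Edition and higher | Yes                  | Yes                 | 64-bit: Yes                | Yes                      | Yes                    |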

Parquet File Extensions

A Parquet dataset consists of multiple *.parquet files in a folder, potentially nested into partitions by attribute. For example, a Parquet dataset of customer information, partitioned by account type, might look like this:

  • customers
    • starter
      • Ks4Fju.parquet
      • N7GQGb.parquet
      • DsuO3K.parquet
      • ...
    • plus
      • e0UOXZ.parquet
      • 5htd7H.parquet
      • ...
    • enterprise
      • GqZIV1.parquet
      • GZgJAk.parquet
      • GShRhm.parquet
      • Tz06cp.parquet
      • ...

To write to this dataset, you would select the "customers" folder as the format dataset.
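The folder layout above can be inspected with a short script. The following is a minimal sketch in Python using only the standard library. The function name `list_partitions` is hypothetical and is not part of FME. It groups *.parquet files by the partition folder they sit in, relative to the dataset root, and ignores files with other extensions.

```python
from collections import defaultdict
from pathlib import Path


def list_partitions(dataset_root):
    """Group *.parquet file names by their partition folder (relative to the root)."""
    root = Path(dataset_root)
    partitions = defaultdict(list)
    for path in sorted(root.rglob("*.parquet")):
        # The partition is the folder path between the dataset root and the file.
        # Files stored directly in the root are reported under ".".
        partition = path.parent.relative_to(root).as_posix()
        partitions[partition].append(path.name)
    return dict(partitions)
```

Called on the "customers" folder shown above, this would return one entry each for "starter", "plus", and "enterprise", each listing the files in that partition.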

Reader Overview

While Parquet is a columnar format, the Parquet reader produces a feature for each row in the dataset. However, Bulk Mode bridges the gap between columnar storage and row-based access in FME.
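To illustrate the difference between columnar storage and row-based features, here is a minimal conceptual sketch in Python. It does not show how FME or Parquet is implemented. The column names and the `columns_to_rows` helper are hypothetical. It turns a set of equal-length columns into one record per row, keeping null values as `None`.

```python
def columns_to_rows(columns):
    """Turn a mapping of column name -> list of values into one dict per row."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of values")
    names = list(columns)
    # zip(*...) walks the columns in parallel, yielding one tuple per row.
    return [dict(zip(names, row)) for row in zip(*columns.values())]


# Hypothetical columnar data: each key is a column, each list holds its values.
columns = {
    "customer_id": [1, 2, 3],
    "account_type": ["starter", "plus", None],  # None stands in for a null value
}
rows = columns_to_rows(columns)
```

Each element of `rows` corresponds to what the reader would emit as a single feature, with one attribute per column.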

The reader operates on a single *.parquet file, but multiple files making up a partitioned dataset can be selected. The writer will write a single file per feature type, but data can be partitioned using feature type fanout.

A dataset has only one reader feature type.

Latitude/Longitude and x, y, z Coordinates

FME automatically recognizes some common attribute names as potential x, y, z coordinates and sets their types.

Parquet data does not necessarily have a spatial component, but columns can be identified as x, y, or z coordinates to create point geometries. If a schema scan is performed and field labels contain variations of x/y, east/north, or easting/northing, FME will create the point geometry.

If FME detects latitude and longitude column names (for example, Latitude/Longitude or Lat/Long), the source coordinate system will be set to LL-WGS84.
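The exact matching rules FME applies are not listed here. As an illustration only, the sketch below shows one way such name-based detection could work. The name lists, the `guess_coordinate_columns` function, and the matching order are assumptions, not FME's implementation.

```python
# Hypothetical name variants; FME's actual list is not given in this section.
AXIS_NAMES = {
    "x": {"x", "east", "easting", "lon", "long", "longitude"},
    "y": {"y", "north", "northing", "lat", "latitude"},
    "z": {"z"},
}
LAT_LONG_NAMES = {"lat", "latitude", "lon", "long", "longitude"}


def guess_coordinate_columns(column_names):
    """Return guessed x/y/z columns and a coordinate system name, if any."""
    guess = {"x": None, "y": None, "z": None, "coordinate_system": None}
    matched = {}  # axis -> normalized column name that matched
    for name in column_names:
        key = name.strip().lower()
        for axis, variants in AXIS_NAMES.items():
            # Keep the first column that matches each axis.
            if key in variants and axis not in matched:
                matched[axis] = key
                guess[axis] = name
    # Only assume LL-WGS84 when both x and y came from latitude/longitude names.
    if matched.get("x") in LAT_LONG_NAMES and matched.get("y") in LAT_LONG_NAMES:
        guess["coordinate_system"] = "LL-WGS84"
    return guess
```

In this sketch, columns named Easting and Northing would be picked up as x and y with no coordinate system assigned, while Latitude and Longitude would also set the coordinate system to LL-WGS84.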

Writer Overview

The Parquet writer writes all the attributes of a feature to a Parquet dataset. The writer operates on a folder and will write a single .parquet file to the folder for each feature type. The file name will be the feature type name. Existing files of the same name will be overwritten. There is a file version option for backwards compatibility with older external readers and a compression option to reduce file size.

A dataset has only one writer feature type, meaning that each file contains features from a single feature type only.
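The relationship between feature types and output files can be sketched as follows. This is a conceptual illustration in Python, not FME's writer logic. The `plan_output_files` function and the sample attribute names are hypothetical, and the sketch assumes a fanout in which the value of one attribute is used as the feature type name.

```python
from collections import defaultdict
from pathlib import Path


def plan_output_files(dataset_folder, features, fanout_attribute):
    """Map each output file path to the features that would be written to it."""
    plan = defaultdict(list)
    for feature in features:
        # The fanout attribute value plays the role of the feature type name,
        # which in turn becomes the file name inside the dataset folder.
        feature_type = str(feature[fanout_attribute])
        plan[Path(dataset_folder) / f"{feature_type}.parquet"].append(feature)
    return dict(plan)
```

For example, fanning out customer features on a hypothetical `account_type` attribute would yield one planned file per account type, each containing only the features of that type.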
