Apache Parquet Reader/Writer (Technology Preview) (FME Desktop Package)

Technology Preview

Note: This format is still under active development, so please expect changes to the current behavior. Safe Software does not recommend using this format in production environments.

This package allows you to read and write Apache Parquet format datasets, which are commonly used in Big Data workflows.

About Apache Parquet Format Support

Note that this package format differs from the Apache Parquet format that is integrated with FME:

The package format is an easy way to add Parquet support to an existing FME installation. The Apache Parquet Writer that is released with FME, by contrast, has full bulk-mode functionality.

Integrating this Format with FME Desktop

Download from FME Hub

https://hub.safe.com/publishers/safe-lab/packages/parquet

Created By

Safe Software Lab

Package Filename

safe-lab.parquet-0.1.3.fpkg

To install the Apache Parquet format package, double-click the downloaded .fpkg file.

Version

0.1.3

Requirements

FME build 19231+

This FME Desktop Package contains the Apache Parquet Reader/Writer introduced with FME 2019.

Overview

Apache Parquet is a columnar, file-based storage format that originated in the Apache Hadoop ecosystem. Because values are stored column by column, it compresses well and can be queried efficiently. It is supported by many Apache big data frameworks, such as Drill, Hive, and Spark.

Parquet is also supported by several large-scale query services, such as Amazon Athena, Google BigQuery, and Microsoft Azure Data Lake Analytics.

Parquet File Extensions

A Parquet dataset consists of multiple *.parquet files in a folder, optionally nested into subfolders that partition the data by attribute value. For example, a Parquet dataset of customer information, partitioned by account type, might look like this:

  • customers
    • starter
      • Ks4Fju.parquet
      • N7GQGb.parquet
      • DsuO3K.parquet
      • ...
    • plus
      • e0UOXZ.parquet
      • 5htd7H.parquet
      • ...
    • enterprise
      • GqZIV1.parquet
      • GZgJAk.parquet
      • GShRhm.parquet
      • Tz06cp.parquet
      • ...

To read from or write to this dataset, you would select the "customers" folder as the format dataset.
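
To make the folder layout concrete, here is a minimal Python sketch that walks a dataset folder and groups its *.parquet files by partition subfolder. It uses only the standard library and is not part of FME. The helper names list_partitions and build_demo_dataset are hypothetical, and the demo files are empty placeholders rather than valid Parquet files.

```python
from pathlib import Path
import tempfile


def list_partitions(dataset_dir):
    """Map each partition folder (relative to the dataset root) to its *.parquet file names."""
    root = Path(dataset_dir)
    layout = {}
    # rglob descends into nested partition folders of any depth.
    for path in sorted(root.rglob("*.parquet")):
        partition = path.parent.relative_to(root).as_posix()
        layout.setdefault(partition, []).append(path.name)
    return layout


def build_demo_dataset(parent):
    """Create a small 'customers' folder tree like the example above (hypothetical helper)."""
    root = Path(parent) / "customers"
    demo = {
        "starter": ["Ks4Fju.parquet", "N7GQGb.parquet"],
        "plus": ["e0UOXZ.parquet"],
    }
    for partition, names in demo.items():
        folder = root / partition
        folder.mkdir(parents=True)
        for name in names:
            (folder / name).touch()  # empty placeholder, not a valid Parquet file
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dataset = build_demo_dataset(tmp)
        for partition, files in list_partitions(dataset).items():
            print(partition, files)
```

The dataset root passed to list_partitions plays the same role as the "customers" folder: the top-level folder identifies the dataset, and the individual files are discovered beneath it.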

Reader Overview

While Parquet is a columnar format, the Parquet reader produces a feature for each row in the dataset. The reader operates on a directory containing multiple *.parquet files, optionally partitioned further by column.

A dataset has only one reader feature type.
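
The relationship between columnar storage and row-based features can be illustrated with a short Python sketch. It is a conceptual illustration only, not the reader's actual implementation: the function name columns_to_features is hypothetical, and plain dictionaries stand in for both the column data and FME features.

```python
def columns_to_features(columns):
    """Turn a mapping of column name -> list of values into one dict per row."""
    names = list(columns)
    lengths = {len(columns[name]) for name in names}
    if len(lengths) > 1:
        raise ValueError("all columns must contain the same number of values")
    features = []
    # zip(*...) walks the columns in parallel, yielding one tuple per row.
    for values in zip(*(columns[name] for name in names)):
        features.append(dict(zip(names, values)))
    return features


if __name__ == "__main__":
    # Hypothetical sample data: two columns holding three rows.
    sample = {
        "customer_id": [101, 102, 103],
        "account_type": ["starter", "plus", "enterprise"],
    }
    for feature in columns_to_features(sample):
        print(feature)
```

Each resulting dictionary corresponds to one row of the dataset, with one attribute per column.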

Writer Overview

The Parquet writer writes all the attributes of a feature to a Parquet dataset.

A dataset has only one writer feature type.
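
Conceptually, writing is the inverse of reading: the attributes of many features are gathered into columns. The Python sketch below illustrates that idea under stated assumptions. The function name features_to_columns is hypothetical, dictionaries stand in for features, and filling missing attributes with None is an assumption made for this sketch, not documented writer behavior.

```python
def features_to_columns(features):
    """Collect every attribute seen on any feature into a column of values."""
    names = []
    for feature in features:
        for name in feature:
            if name not in names:
                names.append(name)  # keep first-seen attribute order
    # Assumption: a feature lacking an attribute contributes None to that column.
    return {name: [feature.get(name) for feature in features] for name in names}


if __name__ == "__main__":
    # Hypothetical sample features with slightly different attributes.
    sample = [
        {"customer_id": 101, "account_type": "starter"},
        {"customer_id": 102, "region": "EU"},
    ]
    for name, values in features_to_columns(sample).items():
        print(name, values)
```

Every column ends up with exactly one value per feature, which is what a columnar file needs in order to reconstruct rows later.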
