Apache Parquet Writer Feature Type Parameters

To access feature type parameters, click the gear icon on a feature type in the workspace. This opens the Feature Type Parameter Editor.

Tip  To always display the editor in FME Workbench, you can select View > Windows > Parameter Editor.

General

All feature types share similar General parameters, which may include Feature Type Name, Reader or Writer information, and Geometry.

In most Writer Feature Type parameter dialogs, you can also control Dynamic Schema Definitions. Some database formats accept Table or Index Qualifier prefixes on the output table feature type.

Partition

When this parameter group is enabled, the writer writes a partitioned dataset, organized into subdirectories by partition attribute value, instead of a single .parquet file in the writer's output directory. See the Parquet File Extensions section for more details.

Partition Type

  • Hive – When partitioning by an attribute, the subdirectory will be named in the form attrName=attrValue. This way of partitioning was introduced by Apache Hive.
    • Note that attribute values used for Hive partitioning are URI-encoded: special characters such as spaces, question marks, ampersands, hashes, parentheses, braces, brackets, and other punctuation are replaced with percent-encoded sequences.
    • For example, an attribute called Name with value John Smith would result in a partitioned subdirectory called Name=John%20Smith.
  • Directory – When partitioning by an attribute, the subdirectory will be named in the form attrValue. This is a simple form of directory partitioning in which the attribute name does not appear in the path.
    • Note that with the Directory partition type, attribute values containing forward or backward slashes result in additional nested subdirectories.
    • For example, an attribute called Date with value 2023/01/02 would result in a partitioned Parquet file under nested directories named 2023, 01, and 02.
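The difference between the two partition types comes down to how a single attribute value becomes a directory name. The following is a minimal standard-library Python sketch, not FME's implementation: the function names are hypothetical, and `urllib.parse.quote(value, safe="")` is only assumed to approximate the URI encoding FME applies (it percent-encodes everything except letters, digits, and `_.-~`).

```python
import re
from pathlib import PurePosixPath
from urllib.parse import quote


def hive_segment(name: str, value: str) -> str:
    # Hive style: attrName=attrValue, with the value URI-encoded.
    # Assumption: quote(..., safe="") approximates FME's encoding.
    return f"{name}={quote(value, safe='')}"


def directory_segments(value: str) -> list[str]:
    # Directory style: the raw value only; every forward or backward
    # slash starts another nested subdirectory (empty parts are dropped).
    return [part for part in re.split(r"[/\\]", value) if part]


def partition_dir(root: str, name: str, value: str, partition_type: str) -> PurePosixPath:
    # Hypothetical helper: the subdirectory one attribute value maps to.
    if partition_type == "Hive":
        return PurePosixPath(root, hive_segment(name, value))
    if partition_type == "Directory":
        return PurePosixPath(root, *directory_segments(value))
    raise ValueError(f"unknown partition type: {partition_type}")


print(partition_dir("output", "Name", "John Smith", "Hive"))        # output/Name=John%20Smith
print(partition_dir("output", "Date", "2023/01/02", "Directory"))   # output/2023/01/02
```

The two calls reproduce the documented examples: the space in John Smith becomes %20 under Hive partitioning, while the slashes in 2023/01/02 produce three nested directories under Directory partitioning.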

Partition Attributes

The list of attributes to partition by. Attributes are partitioned in the order in which they appear on the schema, so the first partition attribute on the schema forms the top-level subdirectory.
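With several partition attributes, each one adds a directory level, and the nesting follows schema order rather than the order in which the attributes were selected. This is a hedged Python sketch of that rule for the Hive partition type only; `hive_partition_path` and its arguments are hypothetical and not part of FME, and the URI encoding is again approximated with `urllib.parse.quote`.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def hive_partition_path(root, schema, partition_attrs, feature):
    # schema: attribute names in schema order.
    # partition_attrs: the selected Partition Attributes, in any order.
    # feature: mapping of attribute name -> value for one feature.
    path = PurePosixPath(root)
    # Walk the schema, not the selection, so nesting follows schema order.
    for name in schema:
        if name in partition_attrs:
            value = quote(str(feature[name]), safe="")
            path /= f"{name}={value}"
    return path


schema = ["Country", "Year", "Name"]
feature = {"Country": "Canada", "Year": 2023, "Name": "John Smith"}
# Selected as Year, Country; the path still starts with Country.
print(hive_partition_path("output", schema, ["Year", "Country"], feature))
# output/Country=Canada/Year=2023
```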

Note  Partitioning by a structured or list attribute is not supported. Some Parquet types are not natively supported as partition values; when reading such a dataset, set these types manually on the reader so that the data is read correctly rather than in its raw form.

These types include bson, decimal, enum, interval, json, and uuid.

Partitioning by attributes whose types are written as binary (decimal, interval, uuid, bson, and binary) can produce subdirectory paths with embedded null characters, which causes a writer error.
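The null-character problem is easy to see with a UUID, whose 16-byte binary form often contains 0x00 bytes. The following is a hedged Python sketch of a hypothetical pre-write check that mirrors the restrictions above: it rejects structured (dict) and list values and any binary value containing a NUL byte. It illustrates why such values cannot become directory names; it is not how FME itself validates partition values.

```python
import uuid


def check_partition_value(name, value):
    # Hypothetical check mirroring the documented restrictions.
    # Structured (dict) and list attributes cannot be partitioned.
    if isinstance(value, (dict, list)):
        raise TypeError(f"{name!r}: structured or list attributes cannot be partitioned")
    # Values written as binary may contain NUL bytes, which cannot
    # appear in a directory name.
    if isinstance(value, (bytes, bytearray)) and b"\x00" in value:
        raise ValueError(f"{name!r}: binary value contains an embedded null byte")
    return value


# A UUID as its 16 raw bytes: here, 15 of them are 0x00.
uid = uuid.UUID("00000000-0000-0000-0000-000000000001")
try:
    check_partition_value("id", uid.bytes)
except ValueError as err:
    print(err)
```

Decimals show the same issue: Parquet can store a decimal as big-endian two's-complement bytes, so a small value such as 1234 in four bytes begins with 0x00 0x00.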

Maximum Number of Partitions

The maximum number of partitions or subdirectories to create.

Default: 1024
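The limit can be thought of as a cap on the number of distinct partitions a write would create. The following is a minimal Python sketch under one stated assumption: each distinct combination of partition attribute values counts as one partition (one leaf subdirectory). `count_partitions` is hypothetical, and FME's exact counting rule and its behavior when the limit is reached may differ.

```python
def count_partitions(features, partition_attrs, max_partitions=1024):
    # Assumption: one partition per distinct combination of values.
    seen = set()
    for feature in features:
        seen.add(tuple(feature[name] for name in partition_attrs))
        if len(seen) > max_partitions:
            raise RuntimeError(
                f"more than {max_partitions} partitions would be created"
            )
    return len(seen)


features = [
    {"Country": "Canada", "Year": 2023},
    {"Country": "Canada", "Year": 2024},
    {"Country": "Canada", "Year": 2023},
]
print(count_partitions(features, ["Country", "Year"]))  # 2
```

Partitioning by a high-cardinality attribute, such as a unique ID, would quickly exceed the default of 1024 under this model.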