Apache Parquet (Partitioned) Reader

Apache Parquet is a columnar, file-based storage format, originating in the Apache Hadoop ecosystem. It can be queried efficiently, is highly compressed, supports null values, and is non-spatial. It is supported by many Apache big data frameworks, such as Drill, Hive, and Spark.

Parquet is additionally supported by several large-scale query providers, such as Amazon AWS Athena, Google Cloud BigQuery, and Microsoft Azure Data Lake Analytics.

FME supports the following Parquet/GeoParquet readers:

FME supports the following Parquet/GeoParquet writers:

Apache Parquet (Partitioned) Product and System Requirements

Format	FME Platform			Operating System
Reader/Writer	FME Form	FME Flow	FME Flow Hosted	Windows 64-bit	Linux	Mac
Reader	Yes	Yes	Yes	Yes	Yes	Yes

Parquet File Extensions

A Parquet dataset consists of multiple *.parquet files in a directory, potentially nested into partitions by attribute. For example, a Parquet dataset of customer information, partitioned by account type, might look like this:

customers

To read this dataset, you would select the customers directory as the format dataset using the Apache Parquet (Partitioned) Reader, set the Partition Type to Directory, and Partition Attributes to an appropriate attribute name like account.

To write this dataset (for example, using the Apache Parquet or GeoParquet writers), you would select the customers parent directory as the format dataset, name the feature type customers, and partition on an account type attribute.

Partitioning on multiple attributes is possible and will result in further nesting of subdirectories.

Note: Partition attribute values are not written to the .parquet files, because that data is redundant: it is already encoded in the filesystem path.
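Because partition values live in the path rather than in the files, a reader has to recover them from the directory names. The following is a minimal Python sketch of that idea, not FME's implementation. The function name partition_values, the file name part-0.parquet, and the directory names business, personal, and west are hypothetical. The "hive" branch (name=value directories) illustrates a common convention in the Parquet ecosystem and is an assumption here; the text above only describes the Directory partition type.

```python
from pathlib import PurePosixPath


def partition_values(relative_path, partition_attributes, partition_type="directory"):
    """Recover partition attribute values from a path relative to the dataset root.

    Illustrative only: the names and the "hive" option are assumptions,
    not FME parameters.
    """
    # Every path component except the file name is a partition directory.
    directories = PurePosixPath(relative_path).parts[:-1]
    values = {}
    if partition_type == "hive":
        # Hive-style directories carry the attribute name, e.g. account=business.
        for directory in directories:
            name, separator, value = directory.partition("=")
            if separator:
                values[name] = value
    else:
        # Directory-style directories hold only the value, so the attribute
        # names must be supplied in nesting order.
        for name, value in zip(partition_attributes, directories):
            values[name] = value
    return values


print(partition_values("business/part-0.parquet", ["account"]))
print(partition_values("account=personal/region=west/part-0.parquet", [], "hive"))
```

With directory-style partitioning the attribute names cannot be inferred from the path, which is why the reader asks for them in Partition Attributes.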

Reader Overview

While Parquet is a columnar format, the Parquet reader produces a feature for each row in the dataset.
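To illustrate what producing a feature per row from columnar data means, here is a small Python sketch. It is not how FME reads Parquet; the function name rows_from_columns and the sample columns are hypothetical, and plain lists stand in for Parquet column chunks.

```python
def rows_from_columns(columns):
    """Yield one dict per row from a mapping of column name -> list of values."""
    names = list(columns)
    # zip(*...) walks all columns in parallel, one row at a time.
    for values in zip(*(columns[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical columnar data; None stands in for a Parquet null.
columns = {"id": [1, 2, 3], "account": ["business", "personal", None]}
features = list(rows_from_columns(columns))
print(features)
```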

The Apache Parquet (Partitioned) reader operates on a partitioned dataset directory of *.parquet files. The Apache Parquet writer will write a single file per feature type by default, but data can be partitioned by enabling the writer’s native partition options or using feature type fanout.

A dataset has only one reader feature type.

Latitude/Longitude and x, y, z coordinates

FME automatically recognizes some common attribute names as potential x,y,z coordinates and sets their types.

Parquet data does not necessarily have a spatial component, but columns can be identified as x, y, or z coordinates to create point geometries. If a schema scan is performed and the field names contain variations of x/y, east/north, or easting/northing, FME creates point geometry from those columns.

If FME detects latitude and longitude column names (for example, Latitude/Longitude or Lat/Long), the source coordinate system will be set to LL-WGS84.

Supported Coordinate Attributes

X and Y coordinate attributes are recognized by name, using one of the well-known coordinate groups below. If the X coordinate matches a group, the Y coordinate must match the same group. These well-known coordinate groups are:

Geographic Group – Longitude/Latitude (not case-sensitive):

(English/French/Swedish) lon, long, longitude and lat, latitude, latitud
(Dutch) Lengtegraad, Breedtegraad
(German) Laenge, Länge, Laengengrad, Längengrad (optionally prefixed with Östliche) and Breite, Breitengrad (optionally prefixed with Nördliche)
(Japanese) 十進経度 and 十進緯度

XYCoordinateGroup (not case-sensitive):

east(ing), north(ing)
coord_x, coord_y
my_x_coord, my_y_coord
x, y
xCoord, yCoord
CoordX, CoordY
Japanese X and Y coordinates: x座標, y座標

z values do not have to match the X and Y grouping:

z, coord_z, z_coord, my_z_coord
height
elev, elevation
depth
zCoord, zKoordinate
CoordZ, KoordinateZ
z coordinate in Japanese: z座標
elevation/altitude in Japanese: 標高, 高度
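
The matching rules above can be sketched in Python. This is a simplified illustration under stated assumptions, not FME's matching logic: it covers only a subset of the English names, uses exact case-insensitive matches rather than name variations, and treats each group as one set of X names and one set of Y names. The names COORDINATE_GROUPS, Z_NAMES, first_match, and detect_coordinates are hypothetical.

```python
# (x names, y names, coordinate system assumed when the group matches)
COORDINATE_GROUPS = [
    ({"lon", "long", "longitude"}, {"lat", "latitude", "latitud"}, "LL-WGS84"),
    ({"x", "east", "easting", "coord_x", "xcoord", "coordx"},
     {"y", "north", "northing", "coord_y", "ycoord", "coordy"}, None),
]
# z names are matched independently of the X/Y group.
Z_NAMES = {"z", "coord_z", "z_coord", "height", "elev", "elevation",
           "depth", "zcoord", "coordz"}


def first_match(candidates, lowered):
    """Return the original column name of the first candidate present, else None."""
    for name in sorted(candidates):
        if name in lowered:
            return lowered[name]
    return None


def detect_coordinates(column_names):
    """Pick x, y, z columns by name; x and y must come from the same group."""
    lowered = {name.lower(): name for name in column_names}
    result = {"x": None, "y": None,
              "z": first_match(Z_NAMES, lowered),
              "coordinate_system": None}
    for x_names, y_names, coordinate_system in COORDINATE_GROUPS:
        x = first_match(x_names, lowered)
        y = first_match(y_names, lowered)
        if x is not None and y is not None:
            result.update(x=x, y=y, coordinate_system=coordinate_system)
            break
    return result


print(detect_coordinates(["ID", "Longitude", "Latitude", "Elevation"]))
print(detect_coordinates(["Longitude", "northing"]))  # mixed groups: no x/y pair
```

In the second call the longitude column has no latitude partner in its own group, so no point geometry would be built from that pair in this sketch.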
