Apache GeoParquet Reader/Writer
Apache
Apache Parquet is a columnar, file-based storage format, originating in the Apache Hadoop ecosystem. It can be queried efficiently, is highly compressed, supports null values, and is non-spatial. It is supported by many Apache big data frameworks, such as Drill, Hive, and Spark.
Parquet is additionally supported by several large-scale query providers, such as Amazon AWS Athena, Google Cloud BigQuery, and Microsoft Azure Data Lake Analytics.
GeoParquet Product and System Requirements
| Format | FME Platform | | | Operating System | | |
|---|---|---|---|---|---|---|
| Reader/Writer | FME Form | FME Flow | FME Flow Hosted | Windows 64-bit | Linux | Mac |
| Reader | Yes | Yes | Yes | Yes | Yes | Yes |
| Writer | Yes | Yes | Yes | Yes | Yes | Yes |
GeoParquet File Extensions
A
- customers
  - starter
    - Ks4Fju.parquet
    - N7GQGb.parquet
    - DsuO3K.parquet
    - ...
  - plus
    - e0UOXZ.parquet
    - 5htd7H.parquet
    - ...
  - enterprise
    - GqZIV1.parquet
    - GZgJAk.parquet
    - GShRhm.parquet
    - Tz06cp.parquet
    - ...
To write this dataset, select the customers parent directory as the format dataset, name the feature type customers, and partition on an account type attribute.
Partitioning on multiple attributes is possible and will result in further nesting of subdirectories.
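The nesting described above can be sketched in a few lines of Python. This is only an illustration of the directory layout, not FME's implementation: the function name `partition_dir`, the attribute names `account_type` and `region`, and the sample features are all hypothetical.

```python
from pathlib import PurePosixPath


def partition_dir(root, feature, partition_attrs):
    """Return the subdirectory a feature would land in.

    Each partition attribute adds one directory level, in the order given.
    """
    path = PurePosixPath(root)
    for attr in partition_attrs:
        path = path / str(feature[attr])
    return path


# Hypothetical sample features; "region" is an assumed second attribute.
features = [
    {"name": "Ada", "account_type": "starter", "region": "west"},
    {"name": "Lin", "account_type": "enterprise", "region": "east"},
]

# One partition attribute gives one level of subdirectories.
one_level = [str(partition_dir("customers", f, ["account_type"])) for f in features]

# Two partition attributes give two levels of nesting.
two_levels = [
    str(partition_dir("customers", f, ["account_type", "region"])) for f in features
]
```

With a single attribute, the features fall under `customers/starter` and `customers/enterprise`; adding the second attribute nests one directory deeper under each.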
Reader Overview
While
The reader operates on a single *.parquet file, but multiple files making up a partitioned dataset can be selected. The writer will write a single file per feature type by default, but data can be partitioned by enabling the writer’s native partition options or using feature type fanout.
A dataset has only one reader feature type.
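When selecting the files of a partitioned dataset by script, a recursive search for `*.parquet` is usually enough. The following is a minimal sketch using only the Python standard library; the helper name `find_parquet_files` and the file names are hypothetical, and the files created here are empty placeholders rather than real Parquet data.

```python
import tempfile
from pathlib import Path


def find_parquet_files(dataset_root):
    """Return every .parquet file under dataset_root as sorted relative POSIX paths."""
    root = Path(dataset_root)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.parquet"))


# Build a small stand-in directory tree with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "customers"
    for rel in ["starter/a.parquet", "plus/b.parquet", "enterprise/c.parquet"]:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    (root / "starter" / "notes.txt").touch()  # ignored: not a .parquet file
    found = find_parquet_files(root)
```

The resulting list could then be supplied as the set of source files for the reader.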
Latitude/Longitude and x, y, z coordinates
FME automatically recognizes some common attribute names as potential x, y, z coordinates and sets their types.
This data may not necessarily have a spatial component, but columns can be identified as x, y, or z coordinates to create point geometries. If a schema scan is performed and field labels contain variations of x/y, east/north, or easting/northing, FME will create the point geometry.
If FME detects latitude and longitude column names (for example, Latitude/Longitude or Lat/Long), the source coordinate system will be set to LL-WGS84.
| Supported Coordinate Attributes |
|---|
| X and Y coordinate attributes are labeled using one of the well-known coordinate systems. If the X coordinate is found, the Y coordinate must also match. These well-known coordinate groups are: |
| Geographic Group – Longitude/Latitude (not case-sensitive): |
| XYCoordinateGroup (not case-sensitive): |
| z values do not have to match the X and Y grouping: |
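The matching rule can be illustrated with a short Python sketch. It is not FME's detection code: the name groups below are limited to the examples mentioned in this section and are assumptions, as are the function name `detect_coordinates` and the choice to check the geographic group before the x/y groups.

```python
# Illustrative name groups only, taken from the examples in this section.
# FME's actual list of recognized names is not reproduced here.
GEOGRAPHIC_GROUPS = [("longitude", "latitude"), ("long", "lat")]
XY_GROUPS = [("x", "y"), ("east", "north"), ("easting", "northing")]
Z_NAMES = ["z"]


def detect_coordinates(column_names):
    """Detect x, y, z columns and a coordinate system; undetected entries are None."""
    by_lower = {name.lower(): name for name in column_names}  # case-insensitive lookup
    result = {"x": None, "y": None, "z": None, "coordinate_system": None}

    # X and Y must come from the same group; geographic names imply LL-WGS84.
    for groups, coord_sys in ((GEOGRAPHIC_GROUPS, "LL-WGS84"), (XY_GROUPS, None)):
        for x_name, y_name in groups:
            if x_name in by_lower and y_name in by_lower:
                result["x"] = by_lower[x_name]
                result["y"] = by_lower[y_name]
                result["coordinate_system"] = coord_sys
                break
        if result["x"] is not None:
            break

    # The z column is matched independently of the X and Y grouping.
    for z_name in Z_NAMES:
        if z_name in by_lower:
            result["z"] = by_lower[z_name]
            break
    return result


geographic = detect_coordinates(["id", "Latitude", "Longitude", "Z"])
projected = detect_coordinates(["Easting", "Northing", "name"])
mismatched = detect_coordinates(["x", "north"])
```

In this sketch, `mismatched` finds no coordinates because `x` and `north` belong to different groups, which mirrors the rule that the Y coordinate must match the group of the X coordinate.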
Writer Overview
The
The writer operates on a directory and will write a single .parquet file to the directory for each feature type. The file name will be the feature type name. Existing files of the same name will be overwritten.
There is a file version option for backwards compatibility with older external readers and a compression option to reduce file size.
A dataset has only one writer feature type, which means that a file contains only features from a single feature type.
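As a rough illustration of the file naming rule above, the sketch below maps feature type names to the files a writer would produce in a directory and flags which existing files would be replaced. The function name `plan_outputs` and the feature type names `roads` and `parcels` are hypothetical, and nothing here calls FME itself.

```python
import tempfile
from pathlib import Path


def plan_outputs(directory, feature_type_names):
    """Map each feature type name to (file name, whether an existing file would be overwritten)."""
    directory = Path(directory)
    plan = {}
    for name in feature_type_names:
        target = directory / f"{name}.parquet"  # file name is the feature type name
        plan[name] = (target.name, target.exists())
    return plan


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "roads.parquet").touch()  # empty placeholder for a file from an earlier run
    plan = plan_outputs(tmp, ["roads", "parcels"])
```

A check like this can be useful before a run if files already in the output directory need to be preserved.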
The writer can also be configured to write a partitioned dataset. In this case, the writer operates on a directory and writes