GeoParquet Writer Parameters
Dataset
The compression type to apply to each column when writing the file. Use this option to compress and reduce file size.
- UNCOMPRESSED – This is the default.
- ZSTD – This option provides a good compression ratio across diverse datasets. ZSTD works well with CHAR and VARCHAR columns that store a wide range of long and short strings, including JSON strings.
- Snappy – This option provides a reasonable compression ratio with fast compression speeds.
Determines which Parquet format version is written. The default is version 2.0.
Controls whether a geometry column is written by default, and which type it is. This parameter also affects how the geometry user attribute type is handled.
- Parquet – Write a spatial column using the Parquet Geometry type.
- GeoParquet – Write a spatial column using the GeoParquet geometry type.
- None – Do not write a spatial column.
When this option group is enabled, the writer writes a partitioned dataset to its specified output directory instead of a single .parquet file. See the GeoParquet File Extensions section for more details.
Overwrite Existing Dataset
Check this option to overwrite an existing partitioned dataset.
- If this option is unchecked (default) and a partitioned dataset already exists, the writer reports an error. Features cannot be added to an existing dataset, and stopping with an error avoids unexpectedly deleting data.
- If this option is checked, any existing directories or files in the partitioned dataset directory will be deleted before writing begins.
If a dataset does not exist when writing begins, a new dataset is created.
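The overwrite rules above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not FME's implementation: the helper name prepare_dataset_dir is hypothetical, and treating "a dataset exists" as "the directory exists" is an assumption.

```python
import shutil
from pathlib import Path


def prepare_dataset_dir(dataset_dir: Path, overwrite: bool = False) -> Path:
    """Hypothetical helper mirroring the Overwrite Existing Dataset rules."""
    if dataset_dir.exists():
        if not overwrite:
            # Unchecked (default): refuse to touch an existing dataset.
            raise FileExistsError(f"Partitioned dataset already exists: {dataset_dir}")
        # Checked: delete existing directories and files before writing begins.
        shutil.rmtree(dataset_dir)
    # No dataset present (or it was just removed): create a new, empty one.
    dataset_dir.mkdir(parents=True)
    return dataset_dir
```

Calling prepare_dataset_dir(path) on an existing directory raises an error, while prepare_dataset_dir(path, overwrite=True) clears it first.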
Partition Type
- Hive – When partitioning by an attribute, the subdirectory will be named in the form attrName=attrValue. This way of partitioning was introduced by Apache Hive.
- Note that attribute values partitioned via the Hive partition type will be URI-encoded. This means that special characters (like spaces, question marks, ampersands, hashes, parentheses, braces, brackets, and punctuation) will be encoded.
- For example, an attribute called Name with value John Smith would result in a partitioned subdirectory called Name=John%20Smith.
- Directory – When partitioning by an attribute, the subdirectory will be named in the form attrValue. This is a simple type of directory partitioning.
- Note that attribute values partitioned via the Directory partition type containing forward or backward slashes will result in additional subdirectories.
- For example, an attribute called Date with value 2023/01/02 would result in a partitioned GeoParquet file under nested directories named 2023, 01, and 02.
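As a rough illustration of the two naming schemes, the following Python sketch reproduces the examples above using only the standard library. The function names hive_segment and directory_path are hypothetical, and the exact set of characters the writer encodes is an assumption; here every reserved character, including the slash, is percent-encoded for Hive names.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def hive_segment(attr_name: str, attr_value: str) -> str:
    """Build a Hive-style directory name with a URI-encoded value."""
    # safe="" percent-encodes every reserved character (assumed behavior).
    return f"{attr_name}={quote(str(attr_value), safe='')}"


def directory_path(root: str, attr_value: str) -> PurePosixPath:
    """Build a Directory-style path; slashes in the value nest directories."""
    # Treat backslashes like forward slashes so both create subdirectories.
    normalized = str(attr_value).replace("\\", "/")
    return PurePosixPath(root) / normalized


print(hive_segment("Name", "John Smith"))
print(directory_path("out", "2023/01/02").parts)
```

The first call yields the single subdirectory name from the Hive example, and the second shows the Date value splitting into three nested directories under a hypothetical output folder named out.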
Spatial
Coordinate systems may be extracted from input feature data sources, may come predefined with FME, or may be user-defined. FME allows different output and input coordinate systems, and performs the required coordinate conversions when necessary.
If a coordinate system is specified in both the source format and the workspace, the coordinate system in the workspace is used. The coordinate system specified in the source format is not used, and a warning is logged. If a source coordinate system is not specified in the workspace and the format or system does not store coordinate system information, then the coordinate system is not set for the features that are read.
If a destination coordinate system is set and the feature has been tagged with a coordinate system, then a coordinate system conversion is performed to put the feature into the destination system. This happens immediately before the feature enters the writer.
If the destination coordinate system was not set, then the features are written out in their original coordinate system.
If a destination coordinate system is set, but the source coordinate system was not specified in the workspace or stored in the source format, then no conversion is performed. The features are simply tagged with the output system name before being written to the output dataset.
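The three writer-side cases above can be summarized as a small decision function. This is a sketch of the documented rules only, not FME code: the function name, the action labels, and the coordinate system names used in the usage lines are illustrative assumptions.

```python
from typing import Optional, Tuple


def plan_output_coordsys(
    feature_cs: Optional[str], destination_cs: Optional[str]
) -> Tuple[str, Optional[str]]:
    """Return (action, coordinate system the written feature carries)."""
    if destination_cs is None:
        # No destination set: features keep their original coordinate system.
        return ("write_as_is", feature_cs)
    if feature_cs is None:
        # Destination set, source unknown: no conversion, the feature is only tagged.
        return ("tag_only", destination_cs)
    # Both known: convert into the destination system before writing.
    return ("reproject", destination_cs)


print(plan_output_coordsys("LL84", "UTM83-10"))
print(plan_output_coordsys(None, "UTM83-10"))
print(plan_output_coordsys("LL84", None))
```

The tag_only case is the one to watch: coordinates are left untouched, so the output is only correct if the data was already in the destination system.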
For systems that know their coordinate system, the Coordinate System field will display Read from Source and FME will read the coordinate system from the source dataset. For most other input sources, the field will display Unknown (which simply means that FME will use default values). In most cases, the default value is all you'll need to perform the translation.
You can always choose to override the defaults and choose a new coordinate system. Select More Coordinate Systems from the drop-down menu to open the Coordinate System Gallery.
Changing a Reprojection
To perform a reprojection, FME typically uses the CS-MAP reprojection engine, which includes definitions for thousands of coordinate systems, with a large variety of projections, datums, ellipsoids, and units. However, GIS applications use slightly different algorithms for reprojecting data between coordinate systems. To ensure that the data FME writes exactly matches your existing data, you can use the reprojection engine from a different application.
To change the reprojection engine, select Workspace Parameters > Spatial > Reprojection Engine. For example, you can select Esri; the engines available depend on your installed applications.
- The coordinate systems file coordsys.db in the FME installation folder contains the names and descriptions of all predefined coordinate systems.
- Some users may wish to use coordinate systems that do not ship with FME, and in those cases, FME also supports custom coordinate systems.
- Learn more about Working with Coordinate Systems in FME.