Zarr Reader/Writer
Zarr is an open-source, file-based storage format for large, multi-dimensional typed arrays.
It is designed primarily with performance in mind. It supports highly efficient parallel reading and writing, whether the data is held in directories or in cloud storage. In addition, its compression is highly configurable, allowing different trade-offs between speed and compression ratio.
Zarr was originally created for integration with NumPy and SciPy, but it has gained traction in the geospatial raster space as a more fully featured counterpart to HDF5 and NetCDF.
Zarr data can be stored in almost any storage system: in a local OS directory or zip file, in key-value stores such as Amazon S3, or in relational databases.
Zarr Product and System Requirements
Format | FME Platform | | | Operating System | | |
---|---|---|---|---|---|---|
Reader/Writer | FME Form | FME Flow | FME Flow Hosted | Windows 64-bit | Linux | Mac |
Reader | Yes | Yes | Yes | Yes | Yes | Yes |
Writer | Yes | Yes | Yes | Yes | Yes | Yes |
Zarr File Extensions
A Zarr dataset is a folder that is commonly, but not necessarily, named with the extension *.zarr. This folder must contain a .zgroup file, a .zarray file, or a .zmetadata file. It may also contain array files or array directories. For example, a simple 2D raster Zarr dataset might look like this:
- raster.zarr
  - .zgroup
    - .zarray
    - .zattrs
    - 0
    - ...
  - y
    - .zarray
    - .zattrs
    - 0
    - ...
  - main
    - .zarray
    - .zattrs
    - 0.0
    - 0.1
    - ...
  - .zgroup
To read this dataset, select the .zgroup file or the .zmetadata file in raster.zarr.
To write this dataset, write to raster or raster.zarr.
- On Mac, these files are hidden because their names begin with a period, so you cannot select them through the file picker. Set the file path manually instead.
- On Linux, these files are hidden by default, but you can enable viewing hidden files through the OS.
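The folder layout described above can be reproduced with a few lines of standard-library Python, which may help when checking what a reader expects to find on disk. This is a minimal sketch assuming the Zarr version 2 metadata layout with an uncompressed array; `write_minimal_zarr` and `file_to_select` are hypothetical helper names, and preferring .zmetadata over .zgroup when both exist is an assumption rather than documented FME behavior.

```python
import json
import struct
import tempfile
from pathlib import Path


def write_minimal_zarr(root: Path) -> None:
    """Write a tiny uncompressed Zarr v2 group holding one 2x2 array named 'main'."""
    main = root / "main"
    main.mkdir(parents=True)
    # Group marker at the top of the dataset folder.
    (root / ".zgroup").write_text(json.dumps({"zarr_format": 2}))
    # Array metadata: a single chunk covers the whole 2x2 little-endian int32 array.
    (main / ".zarray").write_text(json.dumps({
        "zarr_format": 2,
        "shape": [2, 2],
        "chunks": [2, 2],
        "dtype": "<i4",
        "compressor": None,
        "fill_value": 0,
        "order": "C",
        "filters": None,
    }))
    (main / ".zattrs").write_text(json.dumps({}))
    # Chunk "0.0" stores the raw values in row-major order.
    (main / "0.0").write_bytes(struct.pack("<4i", 1, 2, 3, 4))


def file_to_select(root: Path) -> Path:
    """Return the metadata file to point a reader at (hypothetical helper)."""
    # Assumption: .zmetadata is preferred when both files exist.
    for name in (".zmetadata", ".zgroup"):
        candidate = root / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"no .zmetadata or .zgroup file in {root}")


with tempfile.TemporaryDirectory() as tmp:
    dataset = Path(tmp) / "raster.zarr"
    write_minimal_zarr(dataset)
    selected = file_to_select(dataset).name
    entries = sorted(p.relative_to(dataset).as_posix() for p in dataset.rglob("*"))
    print(selected)
    print(entries)
```

Because the helper returns a full path, its result can be pasted directly into a path field on systems where the file picker hides dot-files.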
Reader Overview
FME considers a single Zarr dataset to be a dataset.
If the dataset can be interpreted as a single 2D FME raster feature, it is read as one. If the dataset is interpreted as an n-dimensional array, it is represented as a series of 2D FME raster features, one per slice. Otherwise, each array in the dataset is considered a single FME raster feature.
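To illustrate how an n-dimensional array breaks down into 2D slices, the sketch below enumerates one index tuple per slice. It assumes that the last two dimensions are the raster rows and columns and that every combination of the leading dimensions yields one slice; the reader's actual slicing rules are not stated here, and `slice_indices` is a hypothetical helper.

```python
from itertools import product


def slice_indices(shape):
    """List the leading-dimension index tuples, one per 2D slice.

    Assumption: the last two dimensions are the raster's rows and columns.
    """
    if len(shape) < 2:
        raise ValueError("need at least two dimensions")
    leading = shape[:-2]
    return list(product(*(range(n) for n in leading)))


# A 4D array of shape (2, 3, 4, 5) yields 2 * 3 = 6 slices of 4 x 5 cells.
indices = slice_indices((2, 3, 4, 5))
print(len(indices), indices[0], indices[-1])
```

A plain 2D shape produces a single empty index tuple, which corresponds to the single-raster case.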
Writer Overview
FME considers a dataset to be a folder name.
The name of the Zarr folder written to the output dataset folder is determined by the FME feature type. The folder does not have to exist before the translation occurs. If a Zarr folder with the same name already exists, it is overwritten.
Every feature passed to the Zarr writer is added as a subdataset of the Zarr dataset. The name of the array in the Zarr dataset can be set in the advanced options. If it is unspecified, the name defaults to zarr_logical_path if present, and to fme_basename if not.
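The naming fallback can be summarized in a short sketch. Features are modeled here as plain dictionaries purely for illustration; `array_name` and its `option_name` parameter are hypothetical and are not part of any FME API.

```python
def array_name(feature, option_name=None):
    """Pick the array name: advanced option, then zarr_logical_path, then fme_basename."""
    if option_name:
        return option_name
    # Assumption: an empty zarr_logical_path is treated the same as a missing one.
    return feature.get("zarr_logical_path") or feature["fme_basename"]


features = [
    {"fme_basename": "tile_a", "zarr_logical_path": "temperature"},
    {"fme_basename": "elevation"},
]
names = [array_name(f) for f in features]
overridden = array_name(features[0], option_name="custom")
print(names, overridden)
```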
The writer will not create multiple Zarr datasets. If the writer receives several features with the same logical path, and a slice index is either specified in the advanced options or supplied through zarr_slice_index, it consolidates these features into a single subdataset.
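The consolidation rule can be pictured as a grouping step. The sketch below only covers the case where each feature carries its own zarr_slice_index; features are again modeled as dictionaries, `group_slices` is a hypothetical helper, and ordering slices by ascending index is an assumption.

```python
from collections import defaultdict


def group_slices(features):
    """Group features sharing a zarr_logical_path, ordered by zarr_slice_index."""
    groups = defaultdict(list)
    for feature in features:
        # Only features with both a logical path and a slice index are grouped.
        if "zarr_logical_path" in feature and "zarr_slice_index" in feature:
            groups[feature["zarr_logical_path"]].append(feature)
    return {
        path: sorted(members, key=lambda f: f["zarr_slice_index"])
        for path, members in groups.items()
    }


features = [
    {"fme_basename": "a", "zarr_logical_path": "temperature", "zarr_slice_index": 1},
    {"fme_basename": "b", "zarr_logical_path": "temperature", "zarr_slice_index": 0},
    {"fme_basename": "elevation"},
]
grouped = group_slices(features)
order = [f["fme_basename"] for f in grouped["temperature"]]
print(order)
```

Features without a slice index are left out of the grouping in this sketch; how the writer handles them beyond what is described above is not covered here.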