Microsoft Azure Cosmos DB (DocumentDB) Reader/Writer
Licensing options for this format begin with FME Desktop Professional Edition.
This reader/writer supports the Azure Cosmos DB SQL API. Note that although Cosmos DB also implements other APIs (Cassandra, MongoDB, Gremlin, and Azure Table Storage), they are not supported by this reader/writer.
Overview
The Microsoft Azure Cosmos DB Reader/Writer allows FME to read and write Documents inside Collections within a Cosmos DB account.
Microsoft Azure Cosmos DB is a managed database service that supports multiple data storage and access models, including graph, key-value, and JSON documents. The DocumentDB API for Cosmos DB provides a JSON document store, with support for GeoJSON, geospatial queries, and SQL statements. The DocumentDB API for Cosmos DB replaces the standalone DocumentDB service. For more information about Cosmos DB, visit the Azure Cosmos DB web page.
Note: For clarity, this document will use DocumentDB when referring to accessing Cosmos DB using the DocumentDB API.
Terminology in this Chapter
Cosmos DB | Definition or FME Representation |
---|---|
Collections | Feature Types |
Document | Feature |
Collection id values | Feature Type names |
Database, under which Collections reside | Specified at the Reader/Writer Parameters level |
Reader Overview
The DocumentDB Reader retrieves features (Documents) by executing SQL queries against a Collection.
Each JSON Document is converted to a feature based on the schema defined on the corresponding reader feature type. If a key in the JSON Document corresponds to a user attribute on the feature type schema, then a corresponding attribute is set on the feature.
The original JSON Document is available in the documentdb_json format attribute if the Read Original JSON Document parameter is enabled.
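The mapping described above can be illustrated with a minimal Python sketch. It is not FME's implementation: the function name `document_to_feature`, the dictionary representation of a feature, and the `keep_original` flag (standing in for the Read Original JSON Document parameter) are all assumptions made for illustration.

```python
import json


def document_to_feature(document_json, user_attributes, keep_original=False):
    """Map top-level keys of a JSON Document onto schema attributes.

    Hypothetical helper: a feature is modelled as a plain dict.
    """
    document = json.loads(document_json)
    # Only keys that match a user attribute on the schema become attributes.
    feature = {name: document[name] for name in user_attributes if name in document}
    if keep_original:
        # Stand-in for the Read Original JSON Document parameter.
        feature["documentdb_json"] = document_json
    return feature


raw = '{"id": "1", "name": "a", "extra": 2}'
feature = document_to_feature(raw, ["id", "name", "missing"], keep_original=True)
```

In this sketch, `extra` is dropped because it is not on the schema, and `missing` is simply absent from the feature because the Document has no such key.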
Schema Scanning
To generate the schema for feature types, the DocumentDB reader infers the schema based on an arbitrary first Document from the corresponding Collection. The names of top-level properties (attributes) in the Document are adopted as user attributes on the generated schema. Attribute types are also set based on the types of values found in the first Document. The attribute types defined on the generated schema are informational only, and are not enforced.
Depending on the nature of the Documents inside a Collection, and the use of a WHERE clause, the schema inferred by the DocumentDB reader may not accurately reflect the Documents being read. Reader feature type schema may be manually adjusted by enabling reader feature type editing in Workbench options.
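As a rough illustration of why a schema inferred from one Document may not match the others, the following Python sketch infers attribute names and types from a single Document. The function name `infer_schema` and the type labels are hypothetical, not the names FME uses.

```python
import json

# Hypothetical type labels; the names FME assigns are not stated here.
TYPE_NAMES = {
    bool: "boolean",
    int: "integer",
    float: "number",
    str: "string",
    list: "array",
    dict: "object",
    type(None): "null",
}


def infer_schema(first_document_json):
    """Infer {attribute name: type label} from the top-level properties of one Document."""
    document = json.loads(first_document_json)
    return {key: TYPE_NAMES[type(value)] for key, value in document.items()}


schema = infer_schema('{"id": "a1", "count": 3, "ok": true}')
# A later Document with a different shape is not reflected in the schema.
later = json.loads('{"id": "a2", "count": "three", "owner": "x"}')
unseen_keys = sorted(set(later) - set(schema))
```

Here `owner` never appears in the inferred schema, and `count` holds a string in the later Document even though the schema reports an integer, which is the kind of mismatch that manual schema editing addresses.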
Workbench Reader Dataset
The value for the Reader Dataset is the DocumentDB account.
This value can be the name of the account (for example, contoso), or the URL for the account (for example, https://contoso.documents.azure.com). Paths in the URL are ignored.
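The two accepted forms of the dataset value can be sketched in Python as follows. The function name `account_endpoint` is hypothetical, and expanding a bare account name to the `documents.azure.com` host is an assumption based on the example URL above.

```python
from urllib.parse import urlsplit


def account_endpoint(dataset):
    """Normalize an account name or account URL to a base endpoint (hypothetical helper)."""
    value = dataset.strip()
    if "://" in value:
        parts = urlsplit(value)
        # Any path in the URL is ignored; only scheme and host are kept.
        return f"{parts.scheme}://{parts.netloc}"
    # Assumption: a bare account name maps to the host shown in the example URL.
    return f"https://{value}.documents.azure.com"


from_name = account_endpoint("contoso")
from_url = account_endpoint("https://contoso.documents.azure.com/dbs/example")
```

Under these assumptions, both calls resolve to the same endpoint.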
Writer Overview
Each write operation in DocumentDB corresponds to a single HTTP request. There is no bulk writing mode. To maximize performance, the DocumentDB Writer issues multiple write requests in parallel, and there is no guaranteed order of write operations. Overall performance is influenced by Document size, network quality, and the Collection's provisioned throughput.
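The general pattern of issuing one request per Document in parallel, with no ordering guarantee, can be sketched in Python with a thread pool. This is an illustration of the pattern only, not the writer's actual implementation; `write_all`, `write_one`, and `max_workers` are hypothetical names, and `write_one` stands in for a single HTTP write request.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def write_all(documents, write_one, max_workers=8):
    """Submit one write per Document and collect results in completion order."""
    completed = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(write_one, doc) for doc in documents]
        # Results arrive as requests finish, not in submission order.
        for future in as_completed(futures):
            completed.append(future.result())
    return completed


# Stand-in for an HTTP request: returns the id of the Document it "wrote".
written_ids = write_all([{"id": str(i)} for i in range(5)], lambda doc: doc["id"])
```

Because results are collected as they complete, callers that depend on a specific write order cannot rely on this pattern.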
Update and delete operations must supply a value for the Document id. If an id is not provided for insert and upsert operations, a UUID is generated for it. For more information, refer to the documentation for the Document ID Attribute writer feature type parameter.
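The id rule for each operation can be sketched in Python as follows. The function name `resolve_document_id` is hypothetical, and the UUID version is not stated in this document; the use of `uuid4` is an assumption.

```python
import uuid


def resolve_document_id(operation, supplied_id=None):
    """Return the Document id to use for an operation (hypothetical helper)."""
    if supplied_id not in (None, ""):
        return str(supplied_id)
    if operation.upper() in ("UPDATE", "DELETE"):
        # Update and delete cannot proceed without an id.
        raise ValueError(f"{operation} requires a Document id")
    # Assumption: a random (version 4) UUID; the actual version is not documented here.
    return str(uuid.uuid4())


generated = resolve_document_id("INSERT")
kept = resolve_document_id("UPDATE", "abc")
```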
The DocumentDB writer can create Collections. Since there is an hourly cost associated with each Collection, you should use caution to avoid creating too many Collections. For more information, refer to the documentation for the Collection Handling writer feature type parameter.
JSON Document Generation
For INSERT, UPSERT, and DELSERT feature operations, the DocumentDB Writer converts each incoming feature into a JSON Document in one of two ways:
- If the documentdb_json format attribute is present and can be parsed as JSON, it’s used as the Document, and all user attributes are ignored. This method is useful for writing arbitrary JSON Documents that may have inconsistent or unknown schema, or when the source data is already serialized JSON.
  - If the JSON doesn’t have an id property, it will be set using the value of the attribute specified by the Document ID Attribute writer feature type parameter.
  - If the JSON has an id property, and the feature has a value in the attribute specified by the Document ID Attribute writer feature type parameter, but the two values are not the same, a warning is raised, and the feature is skipped.
- User attributes defined on the writer Feature Type are used to build a JSON Document with the corresponding properties.
  - If the feature has no value for a user attribute, then no corresponding property will be present on the generated Document.
  - Attribute types defined on the Feature Type schema are enforced. If a value can’t be cast to the target type, a warning is logged, and the value is written as a string.
  - A value must be present under the attribute specified by the Document ID Attribute writer feature type parameter.
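The two generation paths listed above can be sketched in Python as follows. This is a simplified illustration, not the writer's implementation: `build_document` and `id_attribute` are hypothetical names, a feature is modelled as a dict, type casting is omitted, and two further assumptions are made: only a parsed JSON object counts as a usable Document, and on the attribute path the id attribute value becomes the Document's `id` property.

```python
import json


def build_document(feature, user_attributes, id_attribute="id"):
    """Return the Document as a dict, or None if the feature should be skipped."""
    raw = feature.get("documentdb_json")
    feature_id = feature.get(id_attribute)
    if raw is not None:
        try:
            document = json.loads(raw)
        except (TypeError, ValueError):
            document = None
        # Assumption: only a JSON object is usable as a Document.
        if isinstance(document, dict):
            if "id" not in document:
                if feature_id is not None:
                    document["id"] = feature_id
            elif feature_id is not None and document["id"] != feature_id:
                return None  # conflicting ids: warn and skip the feature
            return document  # user attributes are ignored on this path
    # Attribute path: an id value is required.
    if feature_id is None:
        raise ValueError("no value for the Document ID Attribute")
    document = {n: feature[n] for n in user_attributes if feature.get(n) is not None}
    document["id"] = feature_id  # assumption: id attribute maps to the "id" property
    return document


from_json = build_document({"documentdb_json": '{"name": "a"}', "id": "7", "other": 1}, ["other"])
from_attributes = build_document({"id": "9", "name": "x", "empty": None}, ["name", "empty"])
```

In this sketch, `from_json` ignores the `other` user attribute and takes its id from the feature, while `from_attributes` omits `empty` because the feature has no value for it.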
Workbench Writer Dataset
The value for the Writer Dataset is the DocumentDB account.
This value can be the name of the account (for example, contoso), or the URL for the account (for example, https://contoso.documents.azure.com). Paths in the URL are ignored.
Automatic Translation
Due to the number of per feature type options on the DocumentDB writer, as well as the potential monetary cost of creating many feature types (Collections), the FME Quick Translator cannot use the DocumentDB writer.