Microsoft Azure Cosmos DB Reader/Writer
This reader/writer supports the Azure Cosmos DB SQL API.
Note that although Cosmos DB also implements other APIs (Cassandra, MongoDB, Gremlin, and Azure Table Storage), this reader/writer does not support them.
Cosmos DB Product and System Requirements
| Format | FME Platform | | | Operating System | | |
|---|---|---|---|---|---|---|
| Reader/Writer | FME Form | FME Flow | FME Flow Hosted | Windows 64-bit | Linux | Mac |
| Reader | Yes | Yes | Yes | Yes | Yes | Yes |
| Writer | Yes | Yes | Yes | Yes | Yes | Yes |
Overview
The Microsoft Azure Cosmos DB Reader/Writer allows FME to read and write Documents inside Collections within a Cosmos DB account.
Microsoft Azure Cosmos DB is a managed database service that supports multiple data storage and access models, including graph, key-value, and JSON documents. The Cosmos DB SQL API provides a JSON document store, with support for GeoJSON, geospatial queries, and SQL statements. For more information about Cosmos DB, visit the Azure Cosmos DB web page.
Microsoft Name Changes
- This format was originally named Microsoft Azure DocumentDB. In 2017, Microsoft renamed it to Microsoft Azure Cosmos DB.
- The keyword and format attribute prefixes in FME were defined before Microsoft renamed the service. For compatibility reasons, the Microsoft Azure Cosmos DB Reader/Writer format keyword and format attribute prefixes continue to use documentdb.
Terminology Notes
| Definition or FME Representation | Cosmos DB Equivalent |
|---|---|
| Feature Types | Collections |
| Feature | Document |
| Feature Type names | Collection id values |
| Specified at the Reader/Writer Parameters level | Database, under which Collections reside |
Reader Overview
The Cosmos DB Reader retrieves features (Documents) by executing SQL queries against a Collection.
Each JSON Document is converted to a feature based on the schema defined on the corresponding reader feature type. If a key in the JSON Document corresponds to a user attribute on the feature type schema, then a corresponding attribute is set on the feature.
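The mapping described above can be sketched in a few lines of Python. This is only an illustration, not the reader's actual implementation: the function name `document_to_feature`, the use of a plain dictionary to stand in for a feature, and the `keep_original` flag (standing in for the Read Original JSON Document parameter) are all assumptions.

```python
import json


def document_to_feature(document_json, schema_attributes, keep_original=False):
    """Map a JSON Document onto a flat attribute dictionary (hypothetical helper).

    schema_attributes lists the user attribute names on the feature type schema.
    """
    document = json.loads(document_json)
    # Only keys that match a user attribute on the schema become attributes.
    feature = {name: document[name] for name in schema_attributes if name in document}
    if keep_original:
        # Stands in for the Read Original JSON Document parameter.
        feature["documentdb_json"] = document_json
    return feature


doc = '{"id": "42", "name": "Main St", "lanes": 2}'
feature = document_to_feature(doc, ["id", "name", "surface"], keep_original=True)
```

In this sketch, `lanes` is dropped because it is not on the schema, and `surface` is left unset because the Document has no such key.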
The original JSON Document is available in the documentdb_json format attribute if the Read Original JSON Document parameter is enabled.
Schema Scanning
To generate the schema for feature types, the Cosmos DB reader infers the schema based on an arbitrary first Document from the corresponding Collection. The names of top-level properties (attributes) in the Document are adopted as user attributes on the generated schema. Attribute types are also set based on the types of values found in the first Document. The attribute types defined on the generated schema are informational only, and are not enforced.
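As a rough illustration of first-Document inference, the sketch below derives attribute names and types from a single parsed Document. The function name `infer_schema` and the type labels (`string`, `integer`, and so on) are placeholders chosen for this example; they are not the attribute type names FME uses.

```python
def infer_schema(first_document):
    """Infer attribute names and informational types from one Document (sketch)."""
    # Placeholder type labels; exact-type lookup keeps bool separate from int.
    type_names = {
        str: "string",
        bool: "boolean",
        int: "integer",
        float: "number",
        dict: "object",
        list: "array",
        type(None): "null",
    }
    # Only top-level properties are considered, as described above.
    return {
        key: type_names.get(type(value), "string")
        for key, value in first_document.items()
    }


schema = infer_schema({"id": "a1", "count": 3, "active": True, "tags": []})
```

Because only one Document is inspected, a property that first appears in a later Document would be missing from a schema inferred this way.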
Depending on the nature of the Documents inside a Collection, and the use of a WHERE clause, the schema inferred by the Cosmos DB reader may not accurately reflect the Documents being read. Reader feature type schema may be manually adjusted by enabling reader feature type editing in FME Workbench options.
FME Workbench Reader Dataset
The value for the Reader Dataset is the Cosmos DB account.
This value can be the name of the account (for example, contoso), or the URL for the account (for example, https://contoso.documents.azure.com). Paths in the URL are ignored.
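The two accepted dataset forms can be reduced to one endpoint, as in the sketch below. It assumes that a bare account name expands to `https://<name>.documents.azure.com`, which matches the example above but is not stated as a rule in this documentation. The function name `account_endpoint` is hypothetical.

```python
from urllib.parse import urlsplit


def account_endpoint(dataset):
    """Normalize a dataset value to an account endpoint URL (sketch)."""
    value = dataset.strip()
    if "://" in value:
        parts = urlsplit(value)
        # Keep only the scheme and host; any path in the URL is ignored.
        return f"{parts.scheme}://{parts.netloc}"
    # Assumption: a bare account name maps to the default documents.azure.com host.
    return f"https://{value}.documents.azure.com"


by_name = account_endpoint("contoso")
by_url = account_endpoint("https://contoso.documents.azure.com/dbs/example")
```

Both calls resolve to the same endpoint, so either form identifies the same account.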
Writer Overview
Each write operation corresponds to a single HTTP request. There is no bulk writing mode. To maximize performance, the Cosmos DB writer issues multiple write requests in parallel, and there is no guaranteed order of write operations. Overall performance is influenced by Document size, network quality, and the Collection's provisioned throughput.
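The general pattern of one request per Document, issued in parallel, can be illustrated with a thread pool. This is a generic sketch and not the writer's implementation: `write_document` is a hypothetical stand-in for a single HTTP write request, and the worker count is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def write_document(document):
    # Hypothetical stand-in for one HTTP write request; returns the Document id.
    return document["id"]


def write_all(documents, max_workers=4):
    """Issue one write per Document in parallel; completion order is not guaranteed."""
    completed = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(write_document, doc) for doc in documents]
        for future in as_completed(futures):
            # Results arrive in completion order, not submission order.
            completed.append(future.result())
    return completed


written = write_all([{"id": str(n)} for n in range(10)])
```

Every Document is written, but the order of `written` can differ between runs, so a workflow should not depend on the order in which writes complete.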
Update and delete operations must supply a value for the Document id. If an id is not provided for insert and upsert operations, a UUID is generated in its place. For more information, refer to the documentation for the Document ID Attribute writer feature type parameter.
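The id rule can be expressed as a small helper. This is a sketch under stated assumptions: the function name `resolve_document_id` is hypothetical, the UUID version (version 4 here) is not specified by this documentation, and raising an error for a missing id is an illustrative choice rather than the writer's documented behavior.

```python
import uuid


def resolve_document_id(operation, supplied_id=None):
    """Return the Document id to use for a write operation (sketch)."""
    if supplied_id not in (None, ""):
        return str(supplied_id)
    if operation.lower() in ("insert", "upsert"):
        # Assumption: a random (version 4) UUID; the actual version is unspecified.
        return str(uuid.uuid4())
    # Update and delete operations must supply an id.
    raise ValueError(f"{operation} requires a Document id")


explicit_id = resolve_document_id("update", "road-17")
generated_id = resolve_document_id("insert")
```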
The Cosmos DB writer can create Collections. Since there is an hourly cost associated with each Collection, you should use caution to avoid creating too many Collections. For more information, refer to the documentation for the Collection Handling writer feature type parameter.
JSON Document Generation
For Insert, Upsert, and Replace feature operations, the Cosmos DB writer converts each incoming feature into a JSON Document in one of two ways:
- If the documentdb_json format attribute is present and can be parsed as JSON, it’s used as the Document, and all user attributes are ignored. This method is useful for writing arbitrary JSON Documents that may have inconsistent or unknown schema, or when the source data is already serialized JSON.
  - If the JSON doesn’t have an id property, it will be set using the value of the attribute specified by the Document ID Attribute writer feature type parameter.
  - If the JSON has an id property, and the feature has a value in the attribute specified by the Document ID Attribute writer feature type parameter, but the two values are not the same, a warning is raised, and the feature is skipped.
- User attributes defined on the writer feature type are used to build a JSON Document with the corresponding properties.
  - If the feature has no value for a user attribute, then no corresponding property will be present on the generated Document.
  - Attribute types defined on the feature type schema are enforced. If a value can’t be cast to the target type, a warning is logged, and the value is written as a string.
  - A value must be present under the attribute specified by the Document ID Attribute writer feature type parameter.
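The two generation paths can be summarized in a simplified Python sketch. It is not the writer's implementation. The function name `build_document`, the dictionary used to represent a feature, and the attribute name `doc_id` are hypothetical. The sketch also assumes that a documentdb_json value that cannot be parsed falls through to the user-attribute path, and it omits type casting and the mapping of the Document ID Attribute onto the id property in that path.

```python
import json


def build_document(feature, schema_attributes, id_attribute):
    """Build a JSON Document from a feature; returns None if the feature is skipped."""
    raw = feature.get("documentdb_json")
    if raw is not None:
        try:
            document = json.loads(raw)
        except (TypeError, ValueError):
            document = None
        if isinstance(document, dict):
            id_value = feature.get(id_attribute)
            if "id" not in document:
                if id_value is not None:
                    # No id in the JSON: take it from the Document ID Attribute.
                    document["id"] = id_value
            elif id_value is not None and document["id"] != id_value:
                # Conflicting ids: the feature is skipped (with a warning).
                return None
            return document
    # Otherwise build the Document from user attributes that have a value.
    return {
        name: feature[name]
        for name in schema_attributes
        if feature.get(name) is not None
    }


from_json = build_document(
    {"documentdb_json": '{"a": 1}', "doc_id": "x", "name": "ignored"},
    ["name"],
    "doc_id",
)
conflict = build_document({"documentdb_json": '{"id": "y"}', "doc_id": "x"}, ["name"], "doc_id")
from_attrs = build_document(
    {"doc_id": "x", "name": "n", "height": None},
    ["doc_id", "name", "height"],
    "doc_id",
)
```

In the first call the user attribute `name` is ignored because the serialized JSON takes precedence. In the third call `height` produces no property because the feature has no value for it.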
FME Workbench Writer Dataset
The value for the Writer Dataset is the Cosmos DB account.
This value can be the name of the account (for example, contoso), or the URL for the account (for example, https://contoso.documents.azure.com). Paths in the URL are ignored.
Automatic Translation
Due to the number of per feature type options on the Cosmos DB writer, as well as the potential monetary cost of creating many feature types (Collections), the FME Quick Translator cannot use the Cosmos DB writer.