Microsoft Azure DocumentDB Reader/Writer

Licensing options for this format begin with FME Professional Edition.

The Microsoft Azure DocumentDB Reader/Writer provides FME with the ability to read and write Documents inside Document Collections within a DocumentDB account.

Microsoft Azure DocumentDB is a managed NoSQL JSON document store, hosted on the Microsoft Azure cloud platform. DocumentDB supports GeoJSON inside Documents, geospatial queries, and executing SQL statements against the document store. For more information about DocumentDB, visit http://documentdb.com.
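As an illustration of GeoJSON inside a Document, the sketch below builds a small Document with an embedded GeoJSON Point. The property names name and location, and the id value, are made up for this example; only the GeoJSON shape (a type plus coordinates in longitude, latitude order) is standard.

```python
import json

# Hypothetical Document: "name" and "location" are illustrative property names.
document = {
    "id": "station-001",
    "name": "Example station",
    # GeoJSON Point: coordinates are [longitude, latitude].
    "location": {"type": "Point", "coordinates": [-123.1, 49.25]},
}

# Serialized form, as it would be stored in a Document Collection.
document_json = json.dumps(document)
```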

In FME, Document Collections correspond to Feature Types, and each Document corresponds to a Feature. Feature Type names are Document Collection id values. The Database, under which Document Collections reside, is specified at the Reader/Writer Parameters level.

Reader Overview

The DocumentDB Reader retrieves features (Documents) by executing SQL queries against a Document Collection.

Each JSON Document is converted to a feature based on the schema defined on the corresponding reader feature type. If a key in the JSON Document corresponds to a user attribute on the feature type schema, then a corresponding attribute is set on the feature.

The original JSON Document is available in the documentdb_json format attribute if the Include Document JSON parameter is enabled.
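The conversion described above can be pictured roughly as follows. This is a minimal sketch and not FME's implementation: the function name and its arguments are hypothetical, a feature is modelled as a plain dictionary, and only top-level keys are considered.

```python
import json

def document_to_attributes(document_json, user_attributes, include_document_json=False):
    """Keep only the Document keys that match user attributes on the schema."""
    document = json.loads(document_json)
    # Keys that are not on the feature type schema are dropped.
    attributes = {name: document[name] for name in user_attributes if name in document}
    if include_document_json:
        # Mirrors the Include Document JSON parameter.
        attributes["documentdb_json"] = document_json
    return attributes
```

For example, a Document with the keys id, name, and extra, read against a schema that defines only id and name, would yield a feature carrying just those two attributes.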

Schema Scanning

To generate the schema for feature types, the DocumentDB Reader inspects a single Document from the corresponding Document Collection: whichever Document is returned first, which is arbitrary. The names of top-level properties (attributes) in that Document are adopted as user attributes on the generated schema. Attribute types are also set based on the types of values found in that Document. The attribute types defined on the generated schema are informational only, and are not enforced.

Depending on the nature of the Documents inside a Document Collection, and the use of a WHERE clause, the schema inferred by the DocumentDB Reader may not accurately reflect the Documents being read. The reader feature type schema can be adjusted manually by enabling reader feature type editing in Workbench options.
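The following sketch shows why a schema taken from one Document can miss properties found in others. It is an approximation under stated assumptions: the function name and the type names ("string", "integer", and so on) are placeholders, not the attribute types FME actually assigns.

```python
def infer_schema(first_document):
    """Map each top-level property to a type name based on its value."""
    # Placeholder type names; exact lookup keeps bool distinct from int.
    type_names = {bool: "boolean", int: "integer", float: "real", str: "string"}
    return {key: type_names.get(type(value), "json") for key, value in first_document.items()}

documents = [
    {"id": "a", "count": 3},
    {"id": "b", "count": "unknown", "note": "added later"},
]

schema = infer_schema(documents[0])
# Properties in the second Document that the inferred schema does not know about.
missing = sorted(set(documents[1]) - set(schema))
```

Here the second Document has a note property and a non-numeric count, neither of which is reflected in the schema inferred from the first Document.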

Workbench Reader Dataset

The value for the Reader Dataset is the DocumentDB account.

This value can be the name of the account (for example, contoso), or the URL for the account (for example, https://contoso.documents.azure.com). Paths in the URL are ignored.
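One way to picture how both forms resolve to the same account is the sketch below. The function name is hypothetical, and expanding a bare account name with the documents.azure.com suffix is an assumption based on the example URL above.

```python
from urllib.parse import urlparse

def account_endpoint(dataset):
    """Normalize an account name or account URL to a base endpoint URL."""
    value = dataset.strip()
    if "://" not in value:
        # Bare account name: assume the standard host suffix from the example.
        return f"https://{value}.documents.azure.com"
    parsed = urlparse(value)
    # Any path, query, or fragment in the URL is ignored.
    return f"{parsed.scheme}://{parsed.netloc}"
```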

Writer Overview

Each write operation in DocumentDB corresponds to a single HTTP request. There is no bulk writing mode. To maximize performance, the DocumentDB Writer issues multiple write requests in parallel, and there is no guaranteed order of write operations. Overall performance is influenced by Document size, network quality, and the Document Collection’s configured performance level. For more information about performance levels, see https://azure.microsoft.com/en-gb/documentation/articles/documentdb-performance-levels/. Limits also apply to the size of Documents and Document Collections. See https://azure.microsoft.com/en-us/documentation/articles/documentdb-limits/ for details.
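The effect of issuing one request per Document in parallel can be sketched as follows. This is not the Writer's actual code: write_one stands in for whatever sends a single HTTP request, and the worker count is an arbitrary choice for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def write_all(documents, write_one, max_workers=8):
    """Submit one write per Document and collect results as they finish."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(write_one, doc) for doc in documents]
        # as_completed yields in completion order, not submission order.
        for future in as_completed(futures):
            results.append(future.result())
    return results
```

Because results arrive in completion order, a workflow that depends on one Document being written before another should not rely on the order in which features reach the Writer.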

Update and delete operations must supply a value for the Document id. If no id is provided for insert and upsert operations, a UUID is generated and used as the id. For more information, refer to the documentation for the Document ID Attribute writer feature type parameter.
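The id rule above amounts to the following logic. This is an illustrative sketch: the function name is hypothetical, a missing id is modelled as None, and the version 4 UUID is an assumption, since the UUID variant is not specified here.

```python
import uuid

def resolve_document_id(operation, feature_id):
    """Return the id to use for a write, or raise if a required id is missing."""
    op = operation.upper()
    if feature_id is not None:
        return str(feature_id)
    if op in ("INSERT", "UPSERT"):
        # No id supplied: generate one (UUID version is an assumption).
        return str(uuid.uuid4())
    # UPDATE and DELETE cannot proceed without an id.
    raise ValueError(f"{op} requires a Document id")
```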

The DocumentDB Writer is capable of creating Document Collections. As there is an hourly cost associated with each Collection, caution is advised to avoid creating too many Collections. For more information, refer to the documentation for the Collection Handling writer feature type parameter.

JSON Document Generation

For INSERT, UPSERT, and DELSERT feature operations, the DocumentDB Writer converts each incoming feature into a JSON Document in one of two ways:

  1. If the documentdb_json format attribute is present and can be parsed as JSON, it’s used as the Document, and all user attributes are ignored. This method is useful for writing arbitrary JSON Documents that may have inconsistent or unknown schema, or when the source data is already serialized JSON.
    • If the JSON doesn’t have an id property, it will be set using the value of the attribute specified by the Document ID Attribute writer feature type parameter.
    • If the JSON has an id property, and the feature has a value in the attribute specified by the Document ID Attribute writer feature type parameter, but the two values are not the same, a warning is raised, and the feature is skipped.
  2. User attributes defined on the writer Feature Type are used to build a JSON Document with the corresponding properties.
    • If the feature has no value for a user attribute, then no corresponding property will be present on the generated Document.
    • Attribute types defined on the Feature Type schema are enforced. If a value can’t be cast to the target type, a warning is logged, and the value is written as a string.
    • A value must be present under the attribute specified by the Document ID Attribute writer feature type parameter.
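The two paths above can be summarized in a short sketch. It is a simplification and not the Writer's implementation: a feature is modelled as a dictionary, the function and argument names are hypothetical, only a JSON object is accepted as a Document (an assumption), a skipped feature is signalled by returning None, and type enforcement and id handling on the second path are left out.

```python
import json

def build_document(feature, user_attributes, id_attribute="id"):
    """Build a Document from documentdb_json if usable, else from user attributes."""
    raw = feature.get("documentdb_json")
    if raw is not None:
        try:
            document = json.loads(raw)
        except (TypeError, ValueError):
            document = None
        if isinstance(document, dict):
            feature_id = feature.get(id_attribute)
            if "id" not in document:
                if feature_id is not None:
                    document["id"] = feature_id
            elif feature_id is not None and document["id"] != feature_id:
                return None  # conflicting ids: feature is skipped with a warning
            return document  # user attributes are ignored on this path
    # Second path: attributes without a value produce no property.
    return {name: feature[name] for name in user_attributes if feature.get(name) is not None}
```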

Workbench Writer Dataset

The value for the Writer Dataset is the DocumentDB account.

This value can be the name of the account (for example, contoso), or the URL for the account (for example, https://contoso.documents.azure.com). Paths in the URL are ignored.

Automatic Translation

Due to the number of per-feature-type options on the DocumentDB Writer, as well as the potential monetary cost of creating many feature types (Collections), the FME Quick Translator cannot use the DocumentDB Writer.