XMLFragmenter
Maps elements from an XML document into XML fragments, and optionally flattens the content of the matched elements and their children into feature attributes.
Output Ports
Each fragment is output as a separate FME feature via the Fragments port. Each feature from the port will have an xml_fragment attribute holding the fragment. The fragment is a valid XML document that may be further processed via subsequent XML-based and/or XQuery-based transformers.
Four additional attributes are added to the Fragments features:
- xml_matched_element – records the element that was matched. This attribute can be used to identify which element matched the expression, if the last component of the matched expression is a wildcard character (*).
- xml_id – holds an ID for that element. This ID is unique within the input document, but it is not guaranteed to be globally unique.
- xml_parent_id – holds an ID for the parent of that element. If the parent of the element is not matched or it does not have any parent, then this attribute is empty.
- xml_parent_child_pos – holds the position of the element in relation to its parent. If the parent of the element is not matched or it doesn’t have any parent, then this attribute is empty. The xml_parent_child_pos starts its count at 0.
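The relationship between these attributes can be illustrated with a minimal Python sketch. It is not the transformer's implementation: the ID scheme (id-1, id-2, ...) is made up for illustration, the function name describe_matches is hypothetical, and the sketch assumes that the child position counts all child elements of the parent.

```python
import xml.etree.ElementTree as ET

def describe_matches(root, matched_names):
    """Return one record per matched element, in document order."""
    ids = {}       # matched element -> hypothetical ID
    records = []

    def visit(element, parent, position):
        name = element.tag.split("}")[-1]  # drop any {namespace} part
        if name in matched_names:
            ids[element] = f"id-{len(ids) + 1}"  # made-up ID scheme, not FME's
            parent_matched = parent is not None and parent in ids
            records.append({
                "xml_matched_element": name,
                "xml_id": ids[element],
                # Both parent attributes stay empty unless the parent matched too.
                "xml_parent_id": ids[parent] if parent_matched else "",
                "xml_parent_child_pos": position if parent_matched else "",
            })
        for index, child in enumerate(element):  # positions start at 0
            visit(child, element, index)

    visit(root, None, None)
    return records

doc = ET.fromstring("<a><b><c/><c/></b></a>")
for record in describe_matches(doc, {"b", "c"}):
    print(record)
```

In this sketch the b element has empty parent attributes because its parent a is not matched, while the two c elements refer to the ID of b and have positions 0 and 1.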
If Flatten Options is enabled, then the Fragments features will have additional attributes related to the contents of the matched XML element.
Parameters
XML Source
The XML source type is either an XML file or a feature attribute whose value is the entire XML document.
Feature Paths Configuration
This parameter specifies which fragments to map. The Feature Paths are xfMap match expressions, separated either by whitespace or by placing each expression on a new line.
Type the value directly in the text box, click the browse button to display the editor, or choose a feature attribute.
Example
<dc:metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:subject>Utah</dc:subject>
<dc:subject>boundaries</dc:subject>
<dc:subject>County</dc:subject>
<dc:subject>Administrative</dc:subject>
<dc:subject>geoscientificInformation</dc:subject>
<dc:description>This data set represents county boundaries in Utah at 1:24,000 scale.</dc:description>
<dc:date>2004-04-20T00:00:00.000</dc:date>
<dc:type>dataset</dc:type>
<dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/">{42AE2814-FCC1-4BC2-BAF4-CA3E55514997}</dc:identifier>
<dc:language>en</dc:language>
<dc:spatial>
<dcmiBox:Box name="Geographic" projection="EPSG:4326" xmlns:dcmiBox="http://dublincore.org/documents/2000/07/11/dcmi-box/">
<dcmiBox:northlimit units="decimal degrees">42.01</dcmiBox:northlimit>
<dcmiBox:eastlimit units="decimal degrees">-109.21</dcmiBox:eastlimit>
<dcmiBox:southlimit units="decimal degrees">36.98</dcmiBox:southlimit>
<dcmiBox:westlimit units="decimal degrees">-114.1</dcmiBox:westlimit>
</dcmiBox:Box>
</dc:spatial>
<dc:rights></dc:rights>
</dc:metadata>
These are a few Feature Paths xfMap expressions targeting the above <dc:metadata> input document:
- "dc:subject" – Extracts every <dc:subject> into an XML fragment, producing 5 fragment features in total.
- "dc:spatial/dcmiBox:Box" – Extracts the <dcmiBox:Box> fragment, but only where <dc:spatial> is its parent.
- "dcmiBox:Box/*" – Extracts every child of <dcmiBox:Box> into its own fragment. 4 fragment features are output, corresponding to <dcmiBox:northlimit>, <dcmiBox:eastlimit>, <dcmiBox:southlimit> and <dcmiBox:westlimit>.
- "dc:subject dc:spatial/dcmiBox:Box dcmiBox:Box/*" – The three previous match expressions combined, each separated by whitespace.
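The fragment counts quoted for these expressions can be checked with a short Python sketch. It only approximates xfMap matching, using the limited XPath support in the standard xml.etree.ElementTree module. The helper name count_matches and the trimmed copy of the sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <dc:metadata> sample document shown in this example.
SAMPLE = """<dc:metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:subject>Utah</dc:subject>
<dc:subject>boundaries</dc:subject>
<dc:subject>County</dc:subject>
<dc:subject>Administrative</dc:subject>
<dc:subject>geoscientificInformation</dc:subject>
<dc:spatial>
<dcmiBox:Box name="Geographic" projection="EPSG:4326" xmlns:dcmiBox="http://dublincore.org/documents/2000/07/11/dcmi-box/">
<dcmiBox:northlimit units="decimal degrees">42.01</dcmiBox:northlimit>
<dcmiBox:eastlimit units="decimal degrees">-109.21</dcmiBox:eastlimit>
<dcmiBox:southlimit units="decimal degrees">36.98</dcmiBox:southlimit>
<dcmiBox:westlimit units="decimal degrees">-114.1</dcmiBox:westlimit>
</dcmiBox:Box>
</dc:spatial>
</dc:metadata>"""

# Prefix-to-namespace map used to resolve the prefixes in each path.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcmiBox": "http://dublincore.org/documents/2000/07/11/dcmi-box/",
}

def count_matches(feature_paths):
    """Count the elements matched by each whitespace-separated path."""
    root = ET.fromstring(SAMPLE)
    # ".//" lets a path match at any depth below the document root.
    return {path: len(root.findall(".//" + path, NS))
            for path in feature_paths.split()}

print(count_matches("dc:subject dc:spatial/dcmiBox:Box dcmiBox:Box/*"))
# {'dc:subject': 5, 'dc:spatial/dcmiBox:Box': 1, 'dcmiBox:Box/*': 4}
```

Each matched element would correspond to one feature from the Fragments port, so the combined expression yields 10 fragments for this document.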
If a feature path in ‘Elements to Match’ matches multiple elements, then the ‘Elements to Exclude’ parameter can be used to specify which of those elements should be excluded from the results. The input to this parameter takes the same form as the feature path xfMap expressions described for the ‘Elements to Match’ parameter.
Using the input document above, if ‘Elements to Match’ is set to ‘dcmiBox:Box/*’ and ‘Elements to Exclude’ is set to ‘dcmiBox:northlimit dcmiBox:eastlimit’, then only 2 fragment features are output, corresponding to the <dcmiBox:southlimit> and <dcmiBox:westlimit> elements.
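The match-then-exclude behavior described here can be sketched in Python as follows. This is an approximation based on xml.etree.ElementTree path matching, not the xfMap engine itself. The helper names find_all and match_with_exclusions are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcmiBox": "http://dublincore.org/documents/2000/07/11/dcmi-box/",
}

# The <dc:spatial> part of the sample document.
SPATIAL = """<dc:spatial xmlns:dc="http://purl.org/dc/elements/1.1/">
<dcmiBox:Box name="Geographic" projection="EPSG:4326" xmlns:dcmiBox="http://dublincore.org/documents/2000/07/11/dcmi-box/">
<dcmiBox:northlimit units="decimal degrees">42.01</dcmiBox:northlimit>
<dcmiBox:eastlimit units="decimal degrees">-109.21</dcmiBox:eastlimit>
<dcmiBox:southlimit units="decimal degrees">36.98</dcmiBox:southlimit>
<dcmiBox:westlimit units="decimal degrees">-114.1</dcmiBox:westlimit>
</dcmiBox:Box>
</dc:spatial>"""

def find_all(root, paths):
    """Collect the elements matched by whitespace-separated paths."""
    found = []
    for path in paths.split():
        found.extend(root.findall(".//" + path, NS))
    return found

def match_with_exclusions(root, to_match, to_exclude):
    """Return local names of matched elements that are not excluded."""
    excluded = find_all(root, to_exclude)
    kept = [el for el in find_all(root, to_match)
            if not any(el is ex for ex in excluded)]
    return [el.tag.split("}")[-1] for el in kept]

root = ET.fromstring(SPATIAL)
print(match_with_exclusions(root, "dcmiBox:Box/*",
                            "dcmiBox:northlimit dcmiBox:eastlimit"))
# ['southlimit', 'westlimit']
```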
Customize Attributes
Setting this parameter to Yes will merge the attributes from the input feature to the output features.
The Elements As XML Fragments parameter can be specified to extract the children of the matched elements as XML fragments.
Example
Using the same XML input as in the example above, with the Feature Paths xfMap expression set to “dcmiBox:Box”, the default options accepted in Flatten Options, and Elements As XML Fragments set to “dcmiBox:northlimit dcmiBox:southlimit”, the transformer produces the following feature. (Compared with the output for the default Flatten Options alone, the additional attributes are xml_fragment_northlimit{0} and xml_fragment_southlimit{0}.)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Feature Type: `XMLFragmenter_FRAGMENTS'
Attribute(encoded: utf-16): `eastlimit' has value `-109.21'
Attribute(encoded: utf-16): `eastlimit.units' has value `decimal degrees'
Attribute(string) : `fme_type' has value `fme_no_geom'
Attribute(encoded: utf-16): `northlimit' has value `42.01'
Attribute(encoded: utf-16): `northlimit.units' has value `decimal degrees'
Attribute(encoded: utf-16): `southlimit' has value `36.98'
Attribute(encoded: utf-16): `southlimit.units' has value `decimal degrees'
Attribute(encoded: utf-16): `westlimit' has value `-114.1'
Attribute(encoded: utf-16): `westlimit.units' has value `decimal degrees'
Attribute(encoded: utf-16): `xml_fragment' has value `<?xml version="1.0" encoding="UTF-16"?><dcmiBox:Box name="Geographic" projection="EPSG:4326" xmlns:dcmiBox="http://dublincore.org/documents/2000/07/11/dcmi-box/">
<dcmiBox:northlimit units="decimal degrees">42.01</dcmiBox:northlimit>
<dcmiBox:eastlimit units="decimal degrees">-109.21</dcmiBox:eastlimit>
<dcmiBox:southlimit units="decimal degrees">36.98</dcmiBox:southlimit>
<dcmiBox:westlimit units="decimal degrees">-114.1</dcmiBox:westlimit>
</dcmiBox:Box>'
Attribute(encoded: utf-16): `xml_fragment_northlimit{0}' has value `<?xml version="1.0" encoding="UTF-16"?><dcmiBox:northlimit units="decimal degrees" xmlns:dcmiBox="http://dublincore.org/documents/2000/07/11/dcmi-box/">42.01</dcmiBox:northlimit>'
Attribute(encoded: utf-16): `xml_fragment_southlimit{0}' has value `<?xml version="1.0" encoding="UTF-16"?><dcmiBox:southlimit units="decimal degrees" xmlns:dcmiBox="http://dublincore.org/documents/2000/07/11/dcmi-box/">36.98</dcmiBox:southlimit>'
Attribute(encoded: utf-16): `xml_id' has value `id-Box-1.2.1.11.1'
Attribute(encoded: utf-16): `xml_matched_element' has value `Box'
Attribute(string) : `xml_type' has value `xml_no_geom'
Geometry Type: Unknown (0)
=================================================
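The xml_fragment_northlimit{0} and xml_fragment_southlimit{0} attributes in the log above can be imitated with a small Python sketch. It is an illustration only: the function child_fragments is hypothetical, children are selected by local name rather than by an xfMap expression, and the serialized text differs from FME's output (for example, there is no XML declaration).

```python
import xml.etree.ElementTree as ET

BOX_NS = "http://dublincore.org/documents/2000/07/11/dcmi-box/"
ET.register_namespace("dcmiBox", BOX_NS)  # keep the original prefix on output

BOX = f"""<dcmiBox:Box name="Geographic" projection="EPSG:4326" xmlns:dcmiBox="{BOX_NS}">
<dcmiBox:northlimit units="decimal degrees">42.01</dcmiBox:northlimit>
<dcmiBox:eastlimit units="decimal degrees">-109.21</dcmiBox:eastlimit>
<dcmiBox:southlimit units="decimal degrees">36.98</dcmiBox:southlimit>
<dcmiBox:westlimit units="decimal degrees">-114.1</dcmiBox:westlimit>
</dcmiBox:Box>"""

def child_fragments(matched, child_names):
    """Serialize selected children under xml_fragment_<name>{<index>} keys."""
    fragments = {}
    seen = {}  # local name -> number of children already stored
    for child in matched:
        local = child.tag.split("}")[-1]
        if local in child_names:
            index = seen.get(local, 0)
            seen[local] = index + 1
            text = ET.tostring(child, encoding="unicode").strip()
            fragments[f"xml_fragment_{local}{{{index}}}"] = text
    return fragments

box = ET.fromstring(BOX)
for name, value in child_fragments(box, {"northlimit", "southlimit"}).items():
    print(name, "=", value)
```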
The Options button opens the XML Flatten Options dialog. These options control how the children of the matched elements are flattened into attributes and attribute lists on the features produced.
The default view is Basic mode, where several options are listed:
- Ignore Specific Sub-Elements > Sub-Elements To Ignore: specify the children of the matched elements (from Flatten Paths) that should be ignored. For example, if Flatten Paths is “shipto” and this option is set to “country city”, then the contents of both country and city are ignored in the output.
- Skip Empty Elements: specify whether empty elements should be mapped as empty feature attributes.
- Add Custom Prefix > Prefix: specify the prefix for all the feature attributes that will be added from the flattened XML structure.
- Include XML Child Position > Attribute Name: specify the attribute name whose value will be the position of the child element within its parent.
- Add Ancestor Attributes > Ancestor Element(s): specify the parent element(s) of the elements in Flatten Path whose XML attributes will also be added as feature attributes in the output.
The Advanced button opens the Advanced Editor, which provides additional options for customizing the feature attributes. These options allow customization of the attributes and attribute lists of the matched XML subtree that are added to FME features. The functionality of each option is described in the table below.
Option (with example value) | Description | Default Value | Possible Values |
---|---|---|---|
separator="." | The value of this option is used as the separator in the names of the attributes generated for the children of the matched elements. | period (.) | any string |
map-empty-elements="yes" | If set to yes, any empty elements are added to the features as attributes with empty values. Otherwise, the attributes are not added to the features. | yes | yes, no |
matched-prefix="attributes" | Controls whether the FME feature attributes produced are prefixed with the matched element's name. If the value is yes, both the matched element's attributes and all its children are prefixed. If the value is no, none of the feature attributes are prefixed with the matched element's name. If the value is children, only the children of the matched element are prefixed. If the value is attributes, only the attributes of the matched element are prefixed. | yes | yes, no, children, attributes |
matched-attributes="yes" | If this option is set to yes, the attributes of the matched element are mapped as FME feature attributes. Otherwise, the attributes of the matched element are ignored. The matched-prefix option can also be set to attributes or yes so that the attributes are prefixed with the name of the matched element. | yes | yes, no |
matched-ancestor-attributes="" | Controls whether XML attributes from ancestors of the matched element are included as FME feature attributes. 'parent' or '1': XML attributes of the parent of the matched element are added. 'grandparent' or '2': XML attributes of the grandparent of the matched element are added. 'root' or '-1': XML attributes of the root of the document are added. Any non-negative number x: XML attributes of the ancestor x levels up from the matched element are added (0 is the matched element). To include more than one ancestor, separate multiple values with a space. For example, to get the attributes from the root, parent and grandparent, specify matched-ancestor-attributes="parent grandparent root". | | parent, grandparent, root |
cardinality="+{?}" | This option can be specified as a space-separated list of cardinality directives. | +{?} (treat child elements as a list if there is more than one with the same name) | Refer to the xfMap section in the XML Reader documentation. |
except="" | The except attribute accepts the same types of expressions as the match or except attribute of a mapping rule. For example, the expression except="parent/child{2}" could be used to exclude the second <child> element contained in a <parent> element from the output of the structure subrule. | | any path expression |
structure-prefix="" | This option can be set to a non-empty string that serves as a prefix to every attribute that is generated for a matched element. | | any string |
child-position-attribute="" | When this option is set to a non-empty string, each child element generates an additional feature attribute whose value is the position of the child element within its parent. | | any string |
attribute-identifier="" | XML attributes can be differentiated from leaf elements. When this option is set to a non-empty string, the XML reader adds a prefix to the attributes of the leaf elements. | | |
All the options have more detailed examples and descriptions in the FME Readers/Writers manual: XML (Extensible Markup Language) Reader/Writer.
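As an illustration of the matched-ancestor-attributes values described in the table, the following Python sketch resolves a space-separated specification against a chain of ancestors. It is a reading of the description above, not FME code. The function resolve_ancestors is hypothetical, and silently skipping levels that go past the root is an assumption.

```python
# Named values and their numeric equivalents, as listed in the table.
ALIASES = {"parent": 1, "grandparent": 2, "root": -1}

def resolve_ancestors(spec, chain):
    """Pick ancestors from chain for a space-separated specification.

    chain[0] is the matched element, chain[1] its parent, and so on;
    chain[-1] is the document root.
    """
    picked = []
    for token in spec.split():
        level = ALIASES[token] if token in ALIASES else int(token)
        if level == -1:
            picked.append(chain[-1])      # root of the document
        elif 0 <= level < len(chain):
            picked.append(chain[level])   # go up `level` levels
        # Assumption: levels beyond the root are skipped without error.
    return picked

# Ancestor chain of <dcmiBox:Box> in the sample document.
chain = ["dcmiBox:Box", "dc:spatial", "dc:metadata"]
print(resolve_ancestors("parent grandparent root", chain))
# ['dc:spatial', 'dc:metadata', 'dc:metadata']
```

For the sample document the grandparent of the matched element is also the root, so both values select the same element.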
Example
Given the same XML input as above, setting the Feature Paths xfMap expression to “dcmiBox:Box” and keeping the default options in “Flatten Options” produces the following feature:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Feature Type: `XMLFragmenter_FRAGMENTS'
Attribute(encoded: utf-16): `eastlimit' has value `-109.21'
Attribute(encoded: utf-16): `eastlimit.units' has value `decimal degrees'
Attribute(string) : `fme_type' has value `fme_no_geom'
Attribute(encoded: utf-16): `northlimit' has value `42.01'
Attribute(encoded: utf-16): `northlimit.units' has value `decimal degrees'
Attribute(encoded: utf-16): `southlimit' has value `36.98'
Attribute(encoded: utf-16): `southlimit.units' has value `decimal degrees'
Attribute(encoded: utf-16): `westlimit' has value `-114.1'
Attribute(encoded: utf-16): `westlimit.units' has value `decimal degrees'
Attribute(encoded: utf-16): `xml_fragment' has value `<?xml version="1.0" encoding="UTF-16"?><dcmiBox:Box name="Geographic" projection="EPSG:4326" xmlns:dcmiBox="http://dublincore.org/documents/2000/07/11/dcmi-box/">
<dcmiBox:northlimit units="decimal degrees">42.01</dcmiBox:northlimit>
<dcmiBox:eastlimit units="decimal degrees">-109.21</dcmiBox:eastlimit>
<dcmiBox:southlimit units="decimal degrees">36.98</dcmiBox:southlimit>
<dcmiBox:westlimit units="decimal degrees">-114.1</dcmiBox:westlimit>
</dcmiBox:Box>'
Attribute(encoded: utf-16): `xml_id' has value `id-Box-1.2.1.11.1'
Attribute(encoded: utf-16): `xml_matched_element' has value `Box'
Attribute(string) : `xml_type' has value `xml_no_geom'
Geometry Type: Unknown (0)
=================================================
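The flattened attribute names in the log above (for example northlimit and northlimit.units) follow a pattern that can be sketched in Python. This is a simplified illustration, not the transformer's logic. The function flatten_children is hypothetical, it handles only one level of children, and it reproduces only the child-derived attributes shown in the log.

```python
import xml.etree.ElementTree as ET

BOX = """<dcmiBox:Box name="Geographic" projection="EPSG:4326" xmlns:dcmiBox="http://dublincore.org/documents/2000/07/11/dcmi-box/">
<dcmiBox:northlimit units="decimal degrees">42.01</dcmiBox:northlimit>
<dcmiBox:eastlimit units="decimal degrees">-109.21</dcmiBox:eastlimit>
<dcmiBox:southlimit units="decimal degrees">36.98</dcmiBox:southlimit>
<dcmiBox:westlimit units="decimal degrees">-114.1</dcmiBox:westlimit>
</dcmiBox:Box>"""

def flatten_children(matched, separator="."):
    """Map each child's text and XML attributes to flat attribute names."""
    attributes = {}
    for child in matched:
        name = child.tag.split("}")[-1]  # local name without the namespace
        attributes[name] = (child.text or "").strip()
        for key, value in child.attrib.items():
            # Child XML attributes become <child><separator><attribute>.
            attributes[f"{name}{separator}{key}"] = value
    return attributes

for name, value in sorted(flatten_children(ET.fromstring(BOX)).items()):
    print(f"`{name}' has value `{value}'")
```

Passing a different separator, such as flatten_children(box, separator="_"), mirrors the effect described for the separator option in the Advanced Editor table.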
Expose Attributes
Exposes any attributes so they can be used by other transformers. Type directly in the text box or click the browse button to display the editor and add attributes there.
Editing Transformer Parameters
Using a set of menu options, transformer parameters can be assigned by referencing other elements in the workspace. More advanced functions, such as an advanced editor and an arithmetic editor, are also available in some transformers. To access a menu of these options, click beside the applicable parameter. For more information, see Transformer Parameter Menu Options.
Search FME Knowledge Center
Search for samples and information about this transformer on the FME Knowledge Center.
Tags Keywords: XMLExploder