GeoRSS/RSS Feed Reader/Writer
XML feeds are a popular method of publishing information to a set of subscribers. Using GeoRSS, an XML feed can be extended to include spatial data.
The GeoRSS reader/writer plug-in enables FME to read and write XML feeds and their spatial data extensions.
Overview
An XML feed can be in one of several different formats, with the most common formats being RSS and Atom. Both of these formats have a similar structure in that the feed contains metadata and a collection of entries. The specifications for the current versions of these formats can be found at http://www.rssboard.org/rss-specification and http://tools.ietf.org/html/rfc4287, respectively.
Currently the GeoRSS reader supports RSS versions 0.91, 0.92 and 2.0, as well as Atom 0.3 and 1.0. The GeoRSS writer can output feeds in RSS 2.0 or Atom 1.0.
The GeoRSS specification defines a way to add spatial information to an XML feed. The GeoRSS reader and writer both support each of the three methods used to include spatial information: W3C Geo, GeoRSS Simple, and GML. Specifications for each of these methods can be found at http://www.georss.org.
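To make the three encodings concrete, the following minimal Python sketch extracts a point from an RSS 2.0 item written in each style. It is an illustration only, not the reader's implementation: the sample feed, the `NS` table and the `entry_point` helper are made up for this example, and it assumes the commonly used namespace URIs and the latitude-then-longitude ordering of `georss:point` and `gml:pos`.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for the three encodings.
NS = {
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
    "georss": "http://www.georss.org/georss",
    "gml": "http://www.opengis.net/gml",
}

# Hypothetical feed: the same location encoded three ways.
SAMPLE_FEED = """<rss version="2.0"
    xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
    xmlns:georss="http://www.georss.org/georss"
    xmlns:gml="http://www.opengis.net/gml">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>W3C Geo</title>
      <geo:lat>45.256</geo:lat>
      <geo:long>-71.92</geo:long>
    </item>
    <item>
      <title>GeoRSS Simple</title>
      <georss:point>45.256 -71.92</georss:point>
    </item>
    <item>
      <title>GML</title>
      <georss:where>
        <gml:Point><gml:pos>45.256 -71.92</gml:pos></gml:Point>
      </georss:where>
    </item>
  </channel>
</rss>"""


def entry_point(item):
    """Return (lat, lon) for an RSS item, or None if no point is found."""
    # W3C Geo stores latitude and longitude in separate elements.
    lat = item.findtext("geo:lat", namespaces=NS)
    lon = item.findtext("geo:long", namespaces=NS)
    if lat is not None and lon is not None:
        return float(lat), float(lon)
    # GeoRSS Simple and GML both store "lat lon" as whitespace-separated text.
    for path in ("georss:point", "georss:where/gml:Point/gml:pos"):
        text = item.findtext(path, namespaces=NS)
        if text:
            lat, lon = text.split()
            return float(lat), float(lon)
    return None


root = ET.fromstring(SAMPLE_FEED)
points = [entry_point(item) for item in root.iter("item")]
```

Each of the three items yields the same coordinate pair, which shows that the encodings differ in syntax rather than in the information they carry for simple points.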
Reader Overview
The GeoRSS reader works by mapping an XML feed and its entries into FME features. A single FME feature is created for the feed metadata, and an FME feature is created for each entry in the feed. Because each feed format has a similar structure, the same schema is used for every feed that the reader processes, regardless of the feed format and version. The reader can handle an XML feed from a local or network file, or from a remote URL accessible via HTTP or FTP. The reader can access these URLs directly, or its requests can be routed through a proxy server.
The GeoRSS reader supports each of the three methods for extending an XML feed with spatial data. The reader also supports feeds and entries with aggregate geometries, even though these are not explicitly included in the GeoRSS specification. This includes feeds and entries with multiple instances of the same data format, or combinations of the three spatial data formats.
The EPSG:4326 coordinate system is used for all features that contain W3C Geo or GeoRSS Simple geometry extensions. The GML geometry extension allows a different coordinate system to be set for each feature, with EPSG:4326 being the default.
If a feature contains only GML geometry extensions, then the feature’s coordinate system will be set from the first extension. The lowest level non-default coordinate system from the first extension will be used, or if there are no coordinate systems specified, the default coordinate system will be used.
The GeoRSS reader will not ignore any XML elements that it encounters. If the reader encounters XML that it cannot use to populate the predefined GeoRSS feature schema, it will simply add the XML to the feature as a new attribute.
The new attribute will be named based on the prefix and name of the unknown element. If the XML element has a prefix, the new attribute will be named prefix_name. If the element has no prefix, the new attribute will be named _name.
If the XML element has no attributes and only text content, the value of the new feature attribute will be the text content of the XML element. If the XML element contains XML attributes or non-text child elements, then the entire XML element will be the value of the new attribute.
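The naming and value rules above can be sketched in Python as follows. This is only an illustration of the rules as described, not the reader's actual code. The function name `unknown_element_attribute` is hypothetical, namespace declarations are assumed not to count as XML attributes, and the exact serialization FME uses for "the entire XML element" may differ from what `toxml()` produces.

```python
from xml.dom import minidom


def unknown_element_attribute(element):
    """Return (attribute_name, attribute_value) for an unrecognized element."""
    # tagName is the qualified name: "prefix:name" or just "name".
    prefix, _, local_name = element.tagName.rpartition(":")
    name = f"{prefix}_{local_name}"  # "_name" when there is no prefix

    # Assumption: xmlns declarations are not treated as XML attributes.
    has_xml_attributes = any(
        not key.startswith("xmlns") for key in element.attributes.keys()
    )
    has_child_elements = any(
        child.nodeType == child.ELEMENT_NODE for child in element.childNodes
    )

    if has_xml_attributes or has_child_elements:
        # Keep the whole element when it is more than plain text.
        value = element.toxml()
    else:
        value = "".join(
            child.data
            for child in element.childNodes
            if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
        )
    return name, value


# Hypothetical entry containing three elements outside the GeoRSS schema.
doc = minidom.parseString(
    '<item xmlns:dc="http://purl.org/dc/elements/1.1/" '
    'xmlns:media="http://search.yahoo.com/mrss/">'
    "<dc:creator>Jane</dc:creator>"
    "<rating>5</rating>"
    '<media:thumbnail url="http://example.com/t.png"/>'
    "</item>"
)
results = [
    unknown_element_attribute(node)
    for node in doc.documentElement.childNodes
    if node.nodeType == node.ELEMENT_NODE
]
```

In this sketch the three elements map to attributes named dc_creator, _rating and media_thumbnail. The first two carry only their text content, while the third carries the serialized element because it has an XML attribute.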
Writer Overview
The GeoRSS writer can write any collection of features out as a GeoRSS feed. If no feature is specified to be a Feed type feature, then the writer will use default values for the metadata it produces. Any feature whose type is not ‘Feed’ will be treated as an ‘Entry’ type feature.
This means that the writer will look at features for the attributes specified by the GeoRSS schema. Any feature attributes which are not contained in the GeoRSS feature schema will be ignored. Furthermore, if a feature has no value for certain attributes, the GeoRSS writer will provide default values for these attributes. This ensures that the GeoRSS writer always tries to produce a valid Atom or RSS feed, regardless of the features that are passed to it.
The W3C Geo and GeoRSS Simple geometry formats use the EPSG:4326 coordinate system. Thus, if the GeoRSS writer is using either of these geometry formats, all features passed to the writer will be reprojected to EPSG:4326, provided that reprojection functionality is licensed. Features with no coordinate system are assumed to be in EPSG:4326.
The GML geometry format supports any coordinate system, so if the GeoRSS writer is using this geometry format, features passed to the writer will be written in whichever coordinate system they have been tagged with. In the event that no coordinate system is set on a feature, the coordinate system will be assumed to be EPSG:4326.
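The coordinate system rules for the three geometry formats can be summarized with a small decision function. This is a sketch of the behavior described above, not FME code. The function name, the format labels and the returned tuple are illustrative assumptions, and licensing of the reprojection step is not modeled.

```python
DEFAULT_CS = "EPSG:4326"

# Geometry formats that can only express EPSG:4326 coordinates.
FIXED_CS_FORMATS = {"W3C Geo", "GeoRSS Simple"}


def output_coordinate_system(geometry_format, feature_cs=None):
    """Return (coordinate system written, whether reprojection is needed)."""
    # A feature with no coordinate system is assumed to be in EPSG:4326.
    source_cs = feature_cs or DEFAULT_CS
    if geometry_format in FIXED_CS_FORMATS:
        return DEFAULT_CS, source_cs != DEFAULT_CS
    if geometry_format == "GML":
        # GML keeps whatever coordinate system the feature is tagged with.
        return source_cs, False
    raise ValueError(f"unknown geometry format: {geometry_format!r}")
```

For example, under these assumptions a feature tagged with EPSG:26910 is reprojected when written as GeoRSS Simple but written unchanged as GML.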
The three geometry formats support varying levels of geometry complexity. The GeoRSS writer will attempt to downgrade unsupported geometry to a supported type, but this is not always possible. Because the GeoRSS specification does not allow multi-geometries, the GeoRSS writer will attempt to write only the first item of an aggregate or multi-geometry.
The W3C Geo geometry format only supports point geometry. If a feature with any other type of geometry is passed to the GeoRSS writer while it is writing in this format, the feature’s geometry will be ignored.
The GeoRSS Simple geometry format supports point, line, and polygon geometry. When writing in this format, the GeoRSS writer will attempt to downgrade more complex geometries to one of these types. For example, a feature with donut geometry will have its geometry written out as a polygon, and the interior of the donut will be ignored. Similarly, areas and ellipses will be downgraded to polygon geometry. Arcs, paths, and curves will be downgraded to line geometry.
The GML geometry format supports geometries similar to those supported by the GeoRSS Simple geometry format. However, the GML format allows donut geometry, so features with donut geometry will be written out as donuts rather than downgraded to polygon geometry.
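The downgrade rules for the three geometry formats can be collected into a lookup table, sketched here in Python. The geometry type labels and the `written_geometry` function are hypothetical names chosen for illustration and are not FME identifiers. The sketch also assumes that GML applies the same downgrades as GeoRSS Simple for every type except donuts, which the text above implies but does not state outright.

```python
# Downgrades described for the GeoRSS Simple geometry format.
SIMPLE_DOWNGRADES = {
    "point": "point",
    "line": "line",
    "polygon": "polygon",
    "donut": "polygon",    # interior of the donut is ignored
    "area": "polygon",
    "ellipse": "polygon",
    "arc": "line",
    "path": "line",
    "curve": "line",
}


def written_geometry(geometry_format, geometry_type):
    """Return the geometry type written, or None if the geometry is dropped."""
    if geometry_format == "W3C Geo":
        # Only points are supported; any other geometry is ignored.
        return "point" if geometry_type == "point" else None
    if geometry_format == "GeoRSS Simple":
        return SIMPLE_DOWNGRADES.get(geometry_type)
    if geometry_format == "GML":
        # Assumption: same as GeoRSS Simple, except donuts are preserved.
        if geometry_type == "donut":
            return "donut"
        return SIMPLE_DOWNGRADES.get(geometry_type)
    raise ValueError(f"unknown geometry format: {geometry_format!r}")
```

Under these assumptions, a donut is written as a polygon in GeoRSS Simple, kept as a donut in GML, and dropped entirely in W3C Geo.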