MRF2DCleaner
Note: Support for the MRFClean transformers is no longer provided as of FME 2022.0. Please contact Safe Software for alternative transformers that can be used to transition your FME workspaces.
The MRFCleaner repairs geometry, particularly during data migration from CAD to GIS, and is built upon the MRFCleanFactory, which is an integration of MRF’s cleaning technology into FME. The MRFCleaner fixes geometric problems in input data such as line overshoots and undershoots within the user-specified tolerance. It is useful for multi-layer and multi-tolerance two-dimensional data cleaning. Typical applications include the correction of utility maps, parcel maps, topographic maps and resource maps as data is migrated from one system to another.
The MRFCleaner includes the following functionality:
- fuzzy tolerance
- extending lines
- weeding lines
- joining lines
- processing short elements
- removing gaps
- removing duplicates
- removing dangles
- performing conflation
The number of layers used in cleaning the data is determined by the number of different cleaning tolerance values of input features. Features that have the same cleaning tolerances are processed as being on the same layer. This allows feature data from a high-quality data source to be assigned a low cleaning tolerance and integrated with data from a lower-quality data source which would be given a larger cleaning tolerance.
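The layering rule above can be illustrated with a short Python sketch. This is not FME code: features are modelled as plain dictionaries, and the `tolerance` key and the `group_into_layers` helper are hypothetical stand-ins for each feature's cleaning tolerance and the transformer's internal grouping.

```python
from collections import defaultdict

def group_into_layers(features):
    """Group features by cleaning tolerance; each distinct value forms one layer."""
    layers = defaultdict(list)
    for feature in features:
        layers[feature["tolerance"]].append(feature)
    # Sort by tolerance only to make the result easy to read.
    return dict(sorted(layers.items()))

features = [
    {"id": "survey_1", "tolerance": 0.01},    # high-quality source, low tolerance
    {"id": "digitized_1", "tolerance": 0.5},  # lower-quality source, larger tolerance
    {"id": "survey_2", "tolerance": 0.01},
]

layers = group_into_layers(features)
print(len(layers))  # two distinct tolerances, so two layers
```

Two distinct tolerance values produce two layers, with both survey features sharing the low-tolerance layer.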
Geometries such as path, polygon, donut, ellipse, elliptical arc, multi-area, multi-curve, text, and multi-text are converted to basic geometries such as point, line, path, arc or multi-point prior to the cleaning process. The cleaner understands and works with circular arcs. Input features with invalid geometries are ignored and deleted.
Usage Tips
You can also use one or more of the following transformers to perform individual MRFCleaner operations. These transformer parameters are all available as part of this MRF2DCleaner transformer, but you may wish to use separate transformers so that the operations are more easily visible in your workflow.
Output Ports
Each feature that is output through the Cleaned port has a new attribute mrf_clean_status added to specify whether the feature was modified, created, or left unchanged by the cleaning process. The possible values of this attribute are "Modified", "Created" and "Original".
Features can also be output through the Flagged port if any of the Remove Dangles, Remove Short Geometries and Compute True Intersections is set to Flag. Each of these features has a new attribute mrf_clean_flag added to specify whether this feature is flagged as being shorter than the cleaning tolerance value ("short"), a dangling geometry ("dangle") or an intersection point ("intersection").
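As a rough illustration of how these two attributes might be inspected downstream, the Python sketch below tallies features by `mrf_clean_status` and `mrf_clean_flag`. It assumes the features have already been exported as dictionaries; the `summarize` helper and the sample records are hypothetical, and only the attribute names and values come from the description above.

```python
from collections import Counter

# Attribute values documented for the two output ports.
STATUS_VALUES = {"Modified", "Created", "Original"}
FLAG_VALUES = {"short", "dangle", "intersection"}

def summarize(cleaned, flagged):
    """Count Cleaned features by status and Flagged features by flag type."""
    status_counts = Counter(f["mrf_clean_status"] for f in cleaned)
    flag_counts = Counter(f["mrf_clean_flag"] for f in flagged)
    unexpected = (set(status_counts) - STATUS_VALUES) | (set(flag_counts) - FLAG_VALUES)
    if unexpected:
        raise ValueError(f"unexpected attribute values: {sorted(unexpected)}")
    return dict(status_counts), dict(flag_counts)

# Hypothetical sample records standing in for exported features.
cleaned = [
    {"id": 1, "mrf_clean_status": "Original"},
    {"id": 2, "mrf_clean_status": "Modified"},
    {"id": 3, "mrf_clean_status": "Modified"},
]
flagged = [{"id": 10, "mrf_clean_flag": "dangle"}]

status_counts, flag_counts = summarize(cleaned, flagged)
print(status_counts, flag_counts)
```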
Parameters
Group Processing
The default behavior is to use the entire set of input features as the group. This option allows you to select attributes that define which groups to form—each set of features that have the same value for all of these attributes will be processed as an independent group.
When All Features Received: This is the default behavior. Processing will only occur in this transformer once all input is present.
When Group Changes (Advanced): This transformer will process input groups in order. Changes of the value of the Group By parameter on the input stream will trigger processing on the currently accumulating group. This may improve overall speed (particularly with multiple, equally-sized groups), but could cause undesired behavior if input groups are not truly ordered.
There are two typical reasons for using When Group Changes (Advanced). The first is incoming data that is intended to be processed in groups (and is already so ordered). In this case, the structure dictates Group By usage, not performance considerations.
The second possible reason is potential performance gains.
Performance gains are most likely when the data is already sorted (or read using a SQL ORDER BY statement) since less work is required of FME. If the data needs ordering, it can be sorted in the workspace (though the added processing overhead may negate any gains).
Sorting becomes more difficult according to the number of data streams. Multiple streams of data could be almost impossible to sort into the correct order, since all features matching a Group By value need to arrive before any features (of any feature type or dataset) belonging to the next group. In this case, using Group By with When All Features Received may be the equivalent and simpler approach.
Note: Multiple feature types and features from multiple datasets will generally not arrive in the correct order on their own.
As with many scenarios, testing different approaches in your workspace with your data is the only definitive way to identify performance gains.
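The ordering requirement for When Group Changes (Advanced) can be illustrated outside FME. The Python sketch below uses `itertools.groupby`, which also starts a new group whenever the key value changes on the stream. It is an analogy rather than FME's implementation; the `zone` attribute and the `process_when_group_changes` helper are hypothetical.

```python
from itertools import groupby

def process_when_group_changes(features, key):
    """Close the current group each time the key value changes on the stream."""
    return [
        (value, [f["id"] for f in group])
        for value, group in groupby(features, key=lambda f: f[key])
    ]

ordered = [
    {"id": 1, "zone": "A"},
    {"id": 2, "zone": "A"},
    {"id": 3, "zone": "B"},
]
unordered = [
    {"id": 1, "zone": "A"},
    {"id": 3, "zone": "B"},
    {"id": 2, "zone": "A"},
]

print(process_when_group_changes(ordered, "zone"))
# [('A', [1, 2]), ('B', [3])]
print(process_when_group_changes(unordered, "zone"))
# [('A', [1]), ('B', [3]), ('A', [2])]
```

With unordered input, group A is split into two separate groups, which is the kind of undesired behavior described above.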
General
This is used as the default cleaning tolerance unless the Feature Tolerance Attribute is specified and valid. The minimum cleaning tolerance allowed is 0.0.
The number of layers used in cleaning the data is determined by the number of different cleaning tolerance values of input features. Features that have the same cleaning tolerances are processed as being on the same layer.
If set to Yes, intersections between all input features are computed, breaking arcs and lines wherever an intersection occurs.
If set to Flag, the intersection point will be output through the Flagged port, with an mrf_clean_flag attribute set to "intersection".
If set to Yes, a fuzzy intersection is created between geometries that are within one of the cleaning tolerance distances of each other but do not actually touch or cross.
If set to Yes, arcs and lines that are within the specified cleaning tolerance are extended – while maintaining line-work direction. No intersections are created while doing this. This option does not process overshoots; a combination of Compute Intersections and Remove Short Geometries can serve this purpose.
If set to Yes, a number of vertices of lines are removed. The number of vertices removed is controlled by a weeding tolerance of the value of (Filter Factor * value of Cleaning Tolerance) or (Filter Factor * value of Feature Tolerance Attribute). The latter is always used when it is valid and the Feature Tolerance Attribute is specified. The larger the value of weeding tolerance, the more vertices will be removed.
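The weeding tolerance calculation can be sketched in Python as follows. This is a minimal illustration, not the transformer's code: the function names are hypothetical, and treating a "valid" attribute value as a non-negative number is an assumption based on the stated minimum tolerance of 0.0.

```python
def effective_tolerance(feature, default_tolerance, tolerance_attribute=None):
    """Use the per-feature tolerance when specified and valid, else the default."""
    if tolerance_attribute is not None:
        try:
            value = float(feature.get(tolerance_attribute))
        except (TypeError, ValueError):
            value = None  # missing or non-numeric attribute value
        # Assumption: "valid" means a number no smaller than the 0.0 minimum.
        if value is not None and value >= 0.0:
            return value
    return default_tolerance

def weeding_tolerance(feature, filter_factor, default_tolerance, tolerance_attribute=None):
    """Weeding tolerance = Filter Factor * the tolerance that applies to the feature."""
    return filter_factor * effective_tolerance(feature, default_tolerance, tolerance_attribute)

# The feature-level value 0.25 overrides the default 0.5.
print(weeding_tolerance({"tol": 0.25}, 2.0, 0.5, "tol"))   # 0.5
# An invalid attribute value falls back to the default tolerance.
print(weeding_tolerance({"tol": "bad"}, 2.0, 0.5, "tol"))  # 1.0
```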
If set to Yes, then features that have at least one free end point and have lengths smaller than (Dangle Factor * value of Cleaning Tolerance) or (Dangle Factor * value of Feature Tolerance Attribute) are removed.
If set to Remove Short and Flag Long, then features that have at least one free end point will either be removed as above or have their end point output through the Flagged port, with an mrf_clean_flag attribute set to "dangle".
The default value of Dangle Factor is 1.0 and the minimum is 0.0.
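The dangle rule can be sketched in Python as shown below. The sketch rests on several assumptions that are not taken from the transformer itself: lines are coordinate lists, an end point counts as free when no other line end shares its exact coordinates, and the `classify_dangles` helper simply returns a label per line instead of writing to output ports.

```python
import math
from collections import Counter

def length(coords):
    """Total length of a polyline given as a list of (x, y) tuples."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))

def classify_dangles(lines, tolerance, dangle_factor=1.0, mode="Yes"):
    """Label each line 'keep', 'remove', or 'flag' under the dangle rule."""
    # Count how many line ends meet at each coordinate (exact match assumed).
    end_counts = Counter()
    for coords in lines.values():
        end_counts[coords[0]] += 1
        end_counts[coords[-1]] += 1

    threshold = dangle_factor * tolerance
    result = {}
    for name, coords in lines.items():
        has_free_end = any(end_counts[p] == 1 for p in (coords[0], coords[-1]))
        if not has_free_end:
            result[name] = "keep"
        elif length(coords) < threshold:
            result[name] = "remove"
        elif mode == "Remove Short and Flag Long":
            result[name] = "flag"
        else:
            result[name] = "keep"
    return result

lines = {
    "a": [(0, 0), (4, 0)],       # triangle edges: no free ends
    "b": [(4, 0), (4, 3)],
    "c": [(4, 3), (0, 0)],
    "spur": [(4, 0), (4.3, 0)],  # short dangle
    "long": [(0, 0), (-5, 0)],   # long dangle
}

print(classify_dangles(lines, tolerance=0.5))
print(classify_dangles(lines, tolerance=0.5, mode="Remove Short and Flag Long"))
```

With a tolerance of 0.5 and the default factor of 1.0, the spur is removed in both modes, while the long dangle is kept under Yes and flagged under Remove Short and Flag Long.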
This parameter is used with Remove Dangles to determine if a dangling feature is too short.
The default value is 1.0 and the minimum value is 0.0.
This parameter is used with Generalize Lines to determine a weeding tolerance.
The default value is 1.0 and the minimum value is 0.0.
Geometries
If set to Yes, features that have lengths smaller than the specified cleaning tolerances are deleted.
If set to Flag, a point on the feature will be output through the Flagged port, with an mrf_clean_flag attribute set to "short".
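A small Python sketch of the short-geometry rule follows. It is illustrative only: the `handle_short` helper is hypothetical, the first vertex is used as the "point on the feature", and keeping the feature itself when flagging is an assumption, since the description above does not say whether flagged features are also removed.

```python
import math

def length(coords):
    """Total length of a polyline given as a list of (x, y) tuples."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))

def handle_short(lines, tolerance, mode="Yes"):
    """Delete (mode 'Yes') or flag (mode 'Flag') lines shorter than the tolerance."""
    kept, flagged = {}, []
    for name, coords in lines.items():
        if length(coords) >= tolerance:
            kept[name] = coords
        elif mode == "Flag":
            kept[name] = coords  # assumption: a flagged feature is not removed
            flagged.append({"point": coords[0], "mrf_clean_flag": "short"})
        # mode "Yes": the short feature is dropped
    return kept, flagged

lines = {
    "road": [(0, 0), (10, 0)],
    "sliver": [(5, 5), (5.1, 5)],
}

kept_yes, flagged_yes = handle_short(lines, tolerance=0.5, mode="Yes")
kept_flag, flagged_flag = handle_short(lines, tolerance=0.5, mode="Flag")
print(sorted(kept_yes), flagged_flag)
```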
If set to Yes, duplicated features are deleted. Features are considered to be duplicates if their geometries are within the cleaning tolerance and only features with a smaller cleaning tolerance will remain after cleaning.
If set to Yes, then singly-connected features are joined to form longer ones. A pair of linear features becomes a candidate for joining only when the two are singly connected at a given node or end point.
If set to Yes, then the geometry of a feature can be changed to match that of another, if the two are approximately the same to begin with.
If set to Yes, then area features such as polygons or donuts will be cleaned without stroking them first.
Module Workflow
MRFCleaner Modules provide more detailed information on the modules in the underlying MRFCleanFactory.
This default workflow is suitable for most situations. However, using the individual modules, it is possible to create any number of customized workflows for specific projects and/or datasets (for example, in Workbench, by using a series of consecutive MRFCleaner transformers or custom transformers). It is important, however, to understand the data being processed and the desired end result.
More Information
Editing Transformer Parameters
Using a set of menu options, transformer parameters can be assigned by referencing other elements in the workspace. More advanced functions, such as an advanced editor and an arithmetic editor, are also available in some transformers. To access a menu of these options, click beside the applicable parameter. For more information, see Transformer Parameter Menu Options.
Defining Values
There are several ways to define a value for use in a Transformer. The simplest is to type in a value or string, which can include functions of various types such as attribute references, math and string functions, and workspace parameters. There are a number of tools and shortcuts that can assist in constructing values, generally available from the drop-down context menu adjacent to the value field.
Using the Text Editor
The Text Editor provides a convenient way to construct text strings (including regular expressions) from various data sources, such as attributes, parameters, and constants, where the result is used directly inside a parameter.
Using the Arithmetic Editor
The Arithmetic Editor provides a convenient way to construct math expressions from various data sources, such as attributes, parameters, and feature functions, where the result is used directly inside a parameter.
Conditional Values
Set values depending on one or more test conditions that either pass or fail.
Parameter Condition Definition Dialog
Content
Expressions and strings can include a number of functions, characters, parameters, and more.
When setting values - whether entered directly in a parameter or constructed using one of the editors - strings and expressions containing String, Math, Date/Time or FME Feature Functions will have those functions evaluated. Therefore, the names of these functions (in the form @<function_name>) should not be used as literal string values.
String Functions | These functions manipulate and format strings.
Special Characters | A set of control characters is available in the Text Editor.
Math Functions | Math functions are available in both editors.
Date/Time Functions | Date and time functions are available in the Text Editor.
Math Operators | These operators are available in the Arithmetic Editor.
FME Feature Functions | These return primarily feature-specific values.
FME and Workspace Parameters | FME and workspace-specific parameters may be used.
Creating and Modifying User Parameters | Create your own editable parameters.
Dialog Options - Tables
Transformers with table-style parameters have additional tools for populating and manipulating values.
Row Reordering | Enabled once you have clicked on a row item.
Cut, Copy, and Paste | Enabled once you have clicked on a row item. Cut, copy, and paste may be used within a transformer, or between transformers.
Filter | Start typing a string, and the matrix will only display rows matching those characters. Searches all columns. This only affects the display of attributes within the transformer - it does not alter which attributes are output.
Import | Import populates the table with a set of new attributes read from a dataset. Specific application varies between transformers.
Reset/Refresh | Generally resets the table to its initial state, and may provide additional options to remove invalid entries. Behavior varies between transformers.
Note: Not all tools are available in all transformers.
FME Community
The FME Community is the place for demos, how-tos, articles, FAQs, and more. Get answers to your questions, learn from other users, and suggest, vote, and comment on new features.
Search for samples and information about this transformer on the FME Community.
Keywords: MRFCleaner2D