MRF2DDuplicateRemover

Deletes duplicated features. Features are considered to be duplicates if their geometries are within tolerance and only features with a smaller tolerance will remain after cleaning.
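To make "within tolerance" concrete, here is a minimal sketch in Python. It is not MRF's algorithm. It assumes a simplified rule: two line geometries match if they have the same vertex count and every pair of corresponding vertices (in forward or reversed order) is at most `tol` apart. The function names are hypothetical. The sketch keeps the first feature it sees from each duplicate set, and it does not model how the transformer itself chooses which feature survives.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def within_tolerance(a: Sequence[Point], b: Sequence[Point], tol: float) -> bool:
    """Simplified duplicate test (an assumption, not MRF's rule): same vertex
    count, and each corresponding vertex pair at most `tol` apart, comparing
    either in the same direction or with one geometry reversed."""
    if len(a) != len(b):
        return False
    forward = all(math.dist(p, q) <= tol for p, q in zip(a, b))
    backward = all(math.dist(p, q) <= tol for p, q in zip(a, reversed(b)))
    return forward or backward

def remove_duplicates(geoms: List[Sequence[Point]], tol: float) -> List[Sequence[Point]]:
    """Keep the first geometry of each duplicate set and drop later matches."""
    kept: List[Sequence[Point]] = []
    for g in geoms:
        if not any(within_tolerance(g, k, tol) for k in kept):
            kept.append(g)
    return kept

lines = [
    [(0, 0), (10, 0)],
    [(0, 0.0005), (10, 0)],  # 0.0005 away from the first line
    [(10, 0), (0, 0)],       # the first line, digitized in reverse
    [(0, 5), (10, 5)],       # a genuinely different line
]
print(remove_duplicates(lines, tol=0.001))
```

Lowering the tolerance below 0.0005 makes the second line survive as a distinct feature. The reversed copy is still removed because its vertices coincide exactly.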

Output Ports

Parameters

Parallel Processing

Note: For detailed information on how parallel processing works in FME, see About Parallel Processing.

This parameter determines whether or not the transformer should perform the work across parallel processes. If it is enabled, a process will be launched for each group specified by the Group By parameter.

Parallel Processing Levels

Level	Number of Processes
No Parallelism	1
Minimal	cores / 2
Moderate	exact number of cores
Aggressive	cores x 1.5
Extreme	cores x 2

For example, on a quad-core machine, minimal parallelism will result in two simultaneous FME processes. Extreme parallelism on an 8-core machine would result in 16 simultaneous processes.
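Each level in the table is a multiplier on the core count. The Python sketch below reproduces that arithmetic. The function name is hypothetical. The rounding is an assumption: the table does not say how fractional results are handled (for example, Minimal on 3 cores gives 1.5), so the sketch rounds down and never goes below one process.

```python
import math

# Multipliers from the Parallel Processing Levels table; None means "always 1".
LEVEL_FACTORS = {
    "No Parallelism": None,
    "Minimal": 0.5,
    "Moderate": 1.0,
    "Aggressive": 1.5,
    "Extreme": 2.0,
}

def process_count(level: str, cores: int) -> int:
    """Number of simultaneous processes for a level on a machine with `cores` cores."""
    factor = LEVEL_FACTORS[level]
    if factor is None:
        return 1
    # Assumption: fractional results round down, with a floor of one process.
    return max(1, math.floor(cores * factor))

print(process_count("Minimal", 4))   # quad-core, Minimal
print(process_count("Extreme", 8))   # 8-core, Extreme
```

The two calls correspond to the quad-core and 8-core examples in the text. They print 2 and 16.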

You can experiment with this feature and view the information in the Windows Task Manager and the Workbench Log window.

Input Ordered

No: This is the default behavior. Processing will only occur in this transformer once all input is present.

By Group: This transformer will process input groups in order. A change in the value of the Group By attribute(s) in the input stream triggers processing of the group that has been accumulating so far. This can improve overall speed when groups are large or complex, but can cause undesired behavior if the input groups are not truly ordered.
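The By Group behavior can be sketched in Python as a stream that holds only the current group and flushes it whenever the Group By value changes. Features are modeled as plain dictionaries, and `ordered_batches` is a hypothetical name. This illustrates the idea and is not FME's implementation. The second example shows the "undesired behavior": when the input is not truly ordered, a group value that reappears later is processed as a separate, second batch.

```python
from typing import Callable, Hashable, Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")

def ordered_batches(
    features: Iterable[T], key: Callable[[T], Hashable]
) -> Iterator[Tuple[Hashable, List[T]]]:
    """Yield (group_value, features) each time the group value changes.

    Only one group is held in memory at a time. If the input is not ordered,
    a value that reappears later is yielded again as a separate batch.
    """
    current_key = None
    batch: List[T] = []
    for feature in features:
        k = key(feature)
        if batch and k != current_key:
            yield current_key, batch  # value changed: process the accumulated group
            batch = []
        current_key = k
        batch.append(feature)
    if batch:
        yield current_key, batch      # flush the final group

ordered = [{"zone": "A", "id": 1}, {"zone": "A", "id": 2}, {"zone": "B", "id": 3}]
unordered = [{"zone": "A", "id": 1}, {"zone": "B", "id": 2}, {"zone": "A", "id": 3}]

for zone, group in ordered_batches(ordered, key=lambda f: f["zone"]):
    print(zone, [f["id"] for f in group])
```

Python's `itertools.groupby` groups consecutive items in the same way. That is why both approaches split zone A in two when they are given the unordered list.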

Considerations for Using Input is Ordered By

Using ordered input can improve performance in some scenarios; however, it is not always preferable, or even possible. Consider the following when using it with both one- and two-input transformers.

Single Datasets/Feature Types: These are generally the best candidates for ordered processing. If you know that the dataset is correctly ordered by the Group By attribute, using Input is Ordered By can improve performance, depending on the size and complexity of the data.

If the input comes from a database, using ORDER BY in a SQL statement to have the database pre-order the data can be an extremely effective way to improve performance. Consider using a database reader with a SQL statement, or the SQLCreator transformer.
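Here is a small Python illustration of letting the database do the ordering. It uses Python's built-in `sqlite3` module as a stand-in for whatever database feeds the workspace. The `parcels` table, its `zone` column, and both function names are hypothetical, and this does not reflect how FME readers issue queries. Because ORDER BY returns rows sorted by the grouping column, a consumer that only merges consecutive rows sees each group exactly once.

```python
import sqlite3
from itertools import groupby
from typing import List, Tuple

def build_demo_db() -> sqlite3.Connection:
    # Hypothetical table standing in for a real source database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parcels (id INTEGER, zone TEXT)")
    conn.executemany(
        "INSERT INTO parcels (id, zone) VALUES (?, ?)",
        [(1, "B"), (2, "A"), (3, "B"), (4, "A"), (5, "C")],
    )
    return conn

def fetch_ordered_groups(conn: sqlite3.Connection) -> List[Tuple[str, List[int]]]:
    # ORDER BY makes the database return rows already grouped by zone.
    cursor = conn.execute("SELECT id, zone FROM parcels ORDER BY zone, id")
    # groupby() only merges consecutive rows, so it relies on that ordering.
    return [(zone, [row[0] for row in run])
            for zone, run in groupby(cursor, key=lambda row: row[1])]

print(fetch_ordered_groups(build_demo_db()))
# [('A', [2, 4]), ('B', [1, 3]), ('C', [5])]
```

If you drop the ORDER BY clause, the same query returns the rows in insertion order, and zones A and B are each split into two runs.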

Multiple Datasets/Feature Types: Since all features matching a Group By value need to arrive before any features (of any feature type or dataset) belonging to the next group, using Ordering with multiple feature types is more complicated than processing a single feature type.

Features from multiple feature types or multiple datasets will not usually arrive in the correct order on their own.

One approach is to send all features through a Sorter, sorting on the expected Group By attribute. The Sorter is a feature-holding transformer: it collects all input features, performs the sort, and then releases them all. The sorted features can then be sent through an appropriate filter (TestFilter, AttributeFilter, GeometryFilter, or others). Filters are not feature-holding, so they release the features one at a time to the transformer using Input is Ordered By, now in the expected order.
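The sort-then-filter pattern can be sketched in Python. The sorting step plays the role of the Sorter: it holds everything, then sorts. The generator plays the role of a non-holding filter that routes each feature to an output by its feature type. Features are plain dictionaries, and the `type` attribute and `sort_then_route` name are hypothetical. Python's `sorted()` is stable, so features with the same group value keep their arrival order. After sorting, each group is contiguous across both feature types.

```python
from operator import itemgetter
from typing import Dict, Iterable, Iterator, List, Tuple

Feature = Dict[str, object]

def sort_then_route(
    streams: Iterable[Iterable[Feature]], group_attr: str
) -> Iterator[Tuple[str, Feature]]:
    # Feature-holding step (like a Sorter): collect every input feature, then sort.
    held: List[Feature] = sorted(
        (f for stream in streams for f in stream), key=itemgetter(group_attr)
    )
    # Non-holding step (like a filter): release features one at a time,
    # tagged with the output they are routed to (here, their feature type).
    for feature in held:
        yield str(feature["type"]), feature

roads = [{"type": "road", "zone": "B", "id": 1}, {"type": "road", "zone": "A", "id": 2}]
parcels = [{"type": "parcel", "zone": "A", "id": 3}, {"type": "parcel", "zone": "B", "id": 4}]

for port, feature in sort_then_route([roads, parcels], group_attr="zone"):
    print(port, feature["zone"], feature["id"])
```

The routing step never reorders anything. Every zone A feature, of either type, is therefore released before any zone B feature, which is the order that Input is Ordered By requires.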

The processing overhead of sorting and filtering may outweigh the performance gains from using Input is Ordered By. In that case, using Group By without Input is Ordered By may be an equivalent and simpler approach.

In all cases when using Input is Ordered By, if you are not sure that the incoming features are properly ordered, they should be sorted (if a single feature type), or sorted and then filtered (for more than one feature or geometry type).
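If you are unsure whether an input is properly ordered, one way to check a sample while testing is to confirm that every group value occurs in a single contiguous run. The Python sketch below does this; `is_grouped` is a hypothetical helper. It assumes, based on the By Group description above, that contiguity is what matters, so sorting is one way to achieve it but not the only one. The check itself needs a full pass over the values, so it is a diagnostic rather than something to run on every translation.

```python
from typing import Hashable, Iterable

def is_grouped(values: Iterable[Hashable]) -> bool:
    """Return True if every group value occurs in one contiguous run."""
    seen = set()
    previous = object()  # sentinel that never equals a real group value
    for v in values:
        if v != previous:
            if v in seen:
                return False  # value reappeared after a different one
            seen.add(v)
            previous = v
    return True

print(is_grouped(["A", "A", "B", "B", "C"]))  # contiguous runs
print(is_grouped(["A", "B", "A"]))            # A is split
```

The first call prints True. The second prints False.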

As with many scenarios, testing different approaches in your workspace with your data is the only definitive way to identify performance gains.

Usage Notes

This transformer performs the same operation as the MRF2DCleaner with Remove Duplicate Geometries set to Yes and no other options selected. See the MRF2DCleaner for more details.

Portions of this work are the intellectual property of the MRF Geosystems Corporation and are used under license. Copyright © 2006 MRF Geosystems Corporation. All rights reserved.

FME Licensing Level

The MRFCleaner transformers are available as an extra-cost package from Safe Software. Please contact sales@safe.com or call 604-501-9985.

Related Transformers

MRF2DShortGeometryRemover

Editing Transformer Parameters

Using a set of menu options, transformer parameters can be assigned by referencing other elements in the workspace. More advanced functions, such as an advanced editor and an arithmetic editor, are also available in some transformers. To access a menu of these options, click beside the applicable parameter. For more information, see Transformer Parameter Menu Options.

Transformer Categories

Data Quality

Integrations

Technical History

Associated FME function or factory: MRFCleanFactory
