Syntax

FACTORY_DEF MRFCleanFactory2D | MRFCleanFactory3D
    [FACTORY_NAME ]
    [INPUT FEATURE_TYPE [ ]* []*]*
    [AGGREGATE_OUTPUT (YES|NO)]
    [DIMENSION (2D|3D)]
    [GROUP_BY ]
    TOLERANCE
    [TOLERANCE_ATTR ]
    [SIMPLIFY [(YES|NO)]]
    [SHORT_ELEMENT [(YES|NO|FLAG)]]
    [EXTEND [(YES|NO)]]
    [INTERSECT [(YES|NO|FLAG)]]
    [FUZZY_INTERSECT [(YES|NO)]]
    [DUPLICATE_REMOVE [(YES|NO)]]
    [JOIN [(YES|NO)]]
    [CONFLATE [(YES|NO)]]
    [DANGLER [(YES|NO|FLAG)]]
    [OBJECT_CLEAN [(YES|NO)]]
    [DANGLE_FACTOR ]
    [FILTER_FACTOR ]
    [OUTPUT (CLEANED|FLAGGED) FEATURE_TYPE [ ]* []*]*

Overview

Note: MRFCleanFactory requires an extra-cost plug-in.

This factory takes features as input and processes them based on the specified modules, tolerance, dangle factor, filter factor, and specific attribute data.

This factory is useful for multi-layer and multi-tolerance two-dimensional data cleaning. The number of layers used in cleaning the data is determined by the number of different tolerance values of the input features. Features that have the same tolerance are processed as being on the same layer.

Geometries such as path, polygon, donut, ellipse, elliptical arc, multi-area, multi-curve, text, and multi-text are converted to basic geometries such as point, line, path, arc, or multi-point prior to the cleaning process. In other words, the geometries of the features are not necessarily preserved. Input features with invalid geometries are ignored and deleted.

Module Sequence

The default workflow shown below is suitable for most situations. However, using the individual modules, it is possible to create any number of customized workflows for specific projects and/or datasets (for example, in Workbench, by using a series of consecutive MRFCleaner transformers or custom transformers). It is important, however, to understand the data being processed and the desired end result.

    Read Input Elements
    Complex Element Stroking
    Short Element Processing
    Line Simplification
    Extension
    Clustering and Splitting
    Duplicate Removal
    Element Joining 1
    Conflation
    Element Joining 2
    Dangle Removal
    Element Joining 3
    Output

General Processing Tips

There are several fundamental tips and tricks for optimizing processing. The following list identifies and briefly describes some key issues to consider when cleaning data.

Know your data

The first step in any cleaning exercise is to become familiar with your source data. Information on data quality (e.g., 1 m vs. 100 m accuracy), data currency, and intended use is important in determining which cleaning modules and tolerances should be used. If such information is not available, a visual inspection of the design file(s) should provide insight into average linework gap sizes, line weeding requirements, and other issues that may exist.

Start small

When setting cleaning tolerances, it is always best to start small. With smaller tolerances, the software uses a smaller search radius, which reduces the number of potential element intersections to consider and increases processing speed. Also, if the bulk of the linework errors can be corrected using a small tolerance, more detail can be maintained in the data set. One or more cleaning processes can always be repeated with larger tolerances to increase the number of errors automatically corrected.

Mix it up

Depending on the source dataset and its intended use, you may achieve better results by running the individual modules with different tolerances.

MRFCleanFactory Modules

Tip: The documentation for the various MRF transformers contains useful samples and images.

TOLERANCE

The TOLERANCE clause must be specified. Its value serves as the default tolerance of the input features if the TOLERANCE_ATTR clause is not specified or a feature has an invalid tolerance value. The minimum value allowed is 0.0.
If the TOLERANCE_ATTR clause is specified and the value of that attribute is greater than or equal to 0.0, that value is used instead of the value specified in the TOLERANCE clause.

SIMPLIFY

If the SIMPLIFY clause is set to "YES", line weeding (generalization) is included as part of the cleaning process. This involves the removal of line string vertices based on a specified tolerance. The weeding tolerance is (FILTER_FACTOR * TOLERANCE) or (FILTER_FACTOR * value of TOLERANCE_ATTR); the latter is used whenever the TOLERANCE_ATTR clause is specified and its value is valid. The larger the weeding tolerance, the more vertices are removed. The default value of FILTER_FACTOR is 1.0.

SHORT_ELEMENT

If the SHORT_ELEMENT clause is set to "YES", geometries of features that have lengths smaller than the specified tolerances are deleted. Short geometries created during the cleaning process are also deleted.

EXTEND

If the EXTEND clause is set to "YES", the MRF Extend module is enabled. It is useful for extending certain elements (correcting for undershoot) while maintaining line-work direction. If a feature has a free end, this module attempts to extend it until it meets other line-work within its tolerance; no intersections are created. This module does not process overshoots; the INTERSECT and DANGLER modules can be used in combination for that purpose.

The EXTEND clause processes elements in the following manner:

    line-line extension
    line-arc extension
    arc-arc extension

INTERSECT

If the INTERSECT clause is set to "YES", the MRF Intersect module is enabled. This module computes intersections between all input features, breaking arcs and lines wherever an intersection occurs. A fuzzy intersection is also created from geometries that are within one of the tolerance distances of each other but do not actually touch or cross. If the FUZZY_INTERSECT clause is set to "NO", fuzzy intersections are not created in the process.
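
The tolerance rules described under TOLERANCE and SIMPLIFY, together with the per-tolerance layering mentioned in the Overview, can be illustrated with a small Python sketch. This is not part of the factory or of any FME API: features are modelled as plain dictionaries, and the function names (effective_tolerance, weeding_tolerance, group_into_layers) and the attribute name "tol" are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional

Feature = Dict[str, Any]  # assumption: a feature is just a dict of attributes


def effective_tolerance(feature: Feature, default_tolerance: float,
                        tolerance_attr: Optional[str] = None) -> float:
    """Pick the tolerance for one feature.

    The attribute named by tolerance_attr wins when it holds a number
    that is >= 0.0; otherwise the default (the TOLERANCE clause) is used.
    """
    if default_tolerance < 0.0:
        raise ValueError("TOLERANCE must be at least 0.0")
    if tolerance_attr is not None:
        try:
            value = float(feature.get(tolerance_attr))
        except (TypeError, ValueError):
            value = None  # missing or non-numeric attribute: treat as invalid
        if value is not None and value >= 0.0:
            return value
    return default_tolerance


def weeding_tolerance(feature: Feature, default_tolerance: float,
                      filter_factor: float = 1.0,
                      tolerance_attr: Optional[str] = None) -> float:
    """Weeding tolerance = FILTER_FACTOR * effective tolerance."""
    return filter_factor * effective_tolerance(
        feature, default_tolerance, tolerance_attr)


def group_into_layers(features: List[Feature], default_tolerance: float,
                      tolerance_attr: Optional[str] = None
                      ) -> Dict[float, List[Feature]]:
    """Group features so that equal tolerances share one layer."""
    layers: Dict[float, List[Feature]] = defaultdict(list)
    for feature in features:
        tol = effective_tolerance(feature, default_tolerance, tolerance_attr)
        layers[tol].append(feature)
    return dict(layers)


features = [
    {"id": 1, "tol": 0.5},
    {"id": 2},               # no attribute: falls back to the default
    {"id": 3, "tol": -1},    # negative: invalid, falls back to the default
    {"id": 4, "tol": "0.5"}, # numeric string: accepted in this sketch
]
layers = group_into_layers(features, default_tolerance=1.0, tolerance_attr="tol")
```

In this sketch the four sample features end up on two layers, because only two distinct effective tolerances occur. How the real factory parses attribute values (for example, whether numeric strings are accepted) is not stated in this document, so that detail is an assumption.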

DUPLICATE_REMOVE

If the DUPLICATE_REMOVE clause is set to "YES", the MRF Duplicate Remover module is enabled. Features are considered to be duplicates if their geometries are within tolerance of each other. Among duplicates, only the features with the smaller tolerance remain after cleaning.

JOIN

If the JOIN clause is set to "YES", singly-connected features are joined to form longer ones. A pair of linear features become candidates for joining only when the two of them are singly connected at a given node or end point.

CONFLATE

If the CONFLATE clause is set to "YES", the geometry of a feature can be changed to match that of another, if the two are approximately the same to begin with.

DANGLER

A dangle is a geometry that has at least one free end point. If the DANGLER clause is set to "YES", MRFCleanFactory removes dangles whose lengths are less than (DANGLE_FACTOR * TOLERANCE) or (DANGLE_FACTOR * value of TOLERANCE_ATTR). Again, the latter is used whenever possible. The default value of DANGLE_FACTOR is 1.0. See below for a description of the other types of output clauses that are supported.

OBJECT_CLEAN

If the OBJECT_CLEAN clause is set to "YES", area features such as polygons or donuts are cleaned without being stroked first.

Output Tags

CLEANED

This tags output features that are cleaned by the MRFCleanFactory. One additional attribute (MRF_CLEAN_STATUS) is added to each feature; it specifies whether the feature was unchanged ("Original"), modified ("Modified"), or created ("Created") in the process of data cleaning.

FLAGGED

This tags output features that are flagged by the MRFCleanFactory when any of the SHORT_ELEMENT, INTERSECT, or DANGLER clauses is set to FLAG. One additional attribute (MRF_CLEAN_STATUS) is added to each feature; it specifies whether the feature was flagged for being shorter than the tolerance ("short"), for dangling ("dangle"), or as an intersection point ("intersection") in the process of data cleaning.
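
The DANGLER length test and the MRF_CLEAN_STATUS values described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the factory's implementation: output features are modelled as plain dictionaries, and the helper names (dangle_threshold, is_removable_dangle, tally_status) are hypothetical. Only the attribute name MRF_CLEAN_STATUS and its listed values come from this document.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def dangle_threshold(tolerance: float, dangle_factor: float = 1.0) -> float:
    """Length limit for dangles: DANGLE_FACTOR * tolerance."""
    return dangle_factor * tolerance


def is_removable_dangle(length: float, free_ends: int, tolerance: float,
                        dangle_factor: float = 1.0) -> bool:
    """True when a geometry has a free end and is strictly shorter
    than the dangle threshold."""
    has_free_end = free_ends >= 1
    return has_free_end and length < dangle_threshold(tolerance, dangle_factor)


def tally_status(features: Iterable[Dict[str, Any]]) -> Counter:
    """Count output features by their MRF_CLEAN_STATUS value."""
    return Counter(f.get("MRF_CLEAN_STATUS") for f in features)


# Hypothetical output records, as they might look after reading the
# CLEANED and FLAGGED outputs into dictionaries.
cleaned = [
    {"id": 1, "MRF_CLEAN_STATUS": "Original"},
    {"id": 2, "MRF_CLEAN_STATUS": "Modified"},
    {"id": 3, "MRF_CLEAN_STATUS": "Modified"},
]
flagged = [
    {"id": 4, "MRF_CLEAN_STATUS": "dangle"},
    {"id": 5, "MRF_CLEAN_STATUS": "short"},
]

cleaned_counts = tally_status(cleaned)
flagged_counts = tally_status(flagged)
```

A tally like this is one way to check how aggressively a given tolerance changed the data before repeating the run with a larger value, in line with the "Start small" tip.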

TO BE RESOLVED

The DIMENSION clause was added to the Syntax section above, but is not documented.