StringSearcher
Performs a regular expression match on the input text. If the text matches the regular expression, the feature is output via the Matched port, and the portion of the input text that matched the regular expression is stored in the attribute specified in the Matched Result Attribute parameter. Optionally, the matching pieces of the text, along with the starting index of each piece, are stored in the list attributes specified in the All Matches List Name and Subexpression Matches List Name parameters. Otherwise, the feature is output via the NotMatched port.
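The routing described above can be approximated in Python with the standard `re` module. This is a minimal, hypothetical emulation, not FME's implementation: the function name `string_searcher`, the dictionary keys, the use of Python's regex dialect, 0-based start indices, and taking subexpression matches from the first match only are all assumptions made for illustration.

```python
import re


def string_searcher(text: str, pattern: str, case_sensitive: bool = True):
    """Hypothetical sketch of StringSearcher: returns (port, attributes).

    The attribute keys are illustrative only; they are not FME's
    actual list or attribute structure.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    regex = re.compile(pattern, flags)
    first = regex.search(text)
    if first is None:
        # No match anywhere in the text: route to NotMatched, add nothing.
        return "NotMatched", {}
    attributes = {
        # Text of the first match (cf. Matched Result Attribute, default _first_match).
        "_first_match": first.group(0),
        # Every non-overlapping match with its 0-based start index (assumed indexing).
        "all_matches": [(m.group(0), m.start()) for m in regex.finditer(text)],
        # Capture groups of the first match with their start indices (assumption).
        "subexpression_matches": [
            (first.group(i), first.start(i)) for i in range(1, regex.groups + 1)
        ],
    }
    return "Matched", attributes


port, attrs = string_searcher("Oct 15 1915; Apr 7 1964", r"\d{4}")
print(port, attrs["_first_match"], attrs["all_matches"])
```

Python's regex engine stands in for FME's here, so edge cases (for example, exotic escape sequences) may behave differently in the transformer.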
Parameters
The input text to be searched.
Contains Regular Expression: The regular expression to search for. Regular expressions can be used to parse out portions of a string: each portion of the input text that matches the regular expression is stored as an entry in the All Matches List Name list. The elements of this list can then be exposed in Workbench by right-clicking the list attribute and specifying the number of elements to expose for later use. See below for some examples.
Advanced Regular Expressions (AREs) are supported. For a complete description of AREs, see Syntax of Perl-Compatible Regular Expressions at http://perldoc.perl.org/perlre.html#Regular-Expressions.
In brief, an ARE is one or more branches, separated by |, matching anything that matches any of the branches.
A brief summary of the special characters and their meanings is:
| | separates "branches" (or choices) |
* | a sequence of 0 or more matches of what precedes it |
+ | a sequence of 1 or more matches of what precedes it |
? | a sequence of 0 or 1 matches of what precedes it |
. | matches any single character |
^ | matches the start of the value |
$ | matches the end of the value |
[ ] | enclose a set of character choices |
( ) | enclose a "subexpression"; whatever matches each subexpression is placed into the “Subexpression Matches List Name”{} list attribute |
a | any other character, such as a, matches itself |
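One way to see how these characters behave is to try them with Python's `re` module, which gives the characters in this table the same meaning as Perl does. This sketch is illustrative only: `re.fullmatch` requires the whole value to match, much as wrapping an expression in ^ and $ does, and the helper name `full_match` is hypothetical.

```python
import re


def full_match(pattern: str, value: str) -> bool:
    """True if the entire value matches the pattern (similar to ^pattern$)."""
    return re.fullmatch(pattern, value) is not None


cases = [
    (r"ab*", "a"),        # *   : zero b's is allowed
    (r"ab+", "a"),        # +   : at least one b is required
    (r"ab?", "abb"),      # ?   : at most one b
    (r"(ab)*", "abab"),   # ( ) : the quantifier repeats the whole subexpression
    (r"a.c", "a-c"),      # .   : any single character
    (r"[xyz]", "y"),      # [ ] : one character from the set
    (r"cat|dog", "dog"),  # |   : either branch
]
for pattern, value in cases:
    print(f"{pattern:8} {value:5} {full_match(pattern, value)}")
```

Note how `ab*` applies the * only to b, while `(ab)*` repeats the whole subexpression.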
Examples:
^A | matches any value starting with an A |
^[0-9] | matches any value starting with a digit |
^[0-9]+$ | matches any value consisting exclusively of digits |
^(beef|chicken)$ | matches values of either “beef” or “chicken” |
^([0-9]+) ([0-9]+)$ | matches two integers separated by a space, and puts the first number into “Subexpression Matches List Name”{0} and the second into “Subexpression Matches List Name”{1} |
^N([0-9][0-9])[.]([0-9][0-9])[.]([0-9][0-9]) | matches N23.45.11 and puts 23 into “Subexpression Matches List Name”{0}, 45 into “Subexpression Matches List Name”{1}, and 11 into “Subexpression Matches List Name”{2} |
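Several of the examples above can be checked with Python's `re.search`, which looks for a match anywhere in the value unless the expression is anchored with ^ or $. In this hypothetical sketch, the tuple returned by `groups()` plays the role of the Subexpression Matches List Name list, with tuple position 0 corresponding to {0}. Python's engine stands in for FME's, so treat the results as an approximation.

```python
import re
from typing import Optional, Tuple


def subexpressions(pattern: str, value: str) -> Optional[Tuple[str, ...]]:
    """Return the captured subexpressions, or None if the value does not match."""
    match = re.search(pattern, value)
    return match.groups() if match else None


print(subexpressions(r"^A", "Apple"))                # -> ()  (match, no subexpressions)
print(subexpressions(r"^[0-9]+$", "123a"))           # -> None (not only digits)
print(subexpressions(r"^(beef|chicken)$", "chicken"))  # -> ('chicken',)
print(subexpressions(r"^N([0-9][0-9])[.]([0-9][0-9])[.]([0-9][0-9])", "N23.45.11"))
# -> ('23', '45', '11')
```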
In addition to regular characters, the Contains Regular Expression field can include any number of control characters. Special character sequences are interpreted as shown below:
Sequence | Description |
---|---|
Ctrl+Shift+h (^H) | Backspace (0x08) |
Ctrl+Shift+l (^L) | Form feed (0x0c) |
Ctrl+Shift+j (^J) | Newline (0x0a) |
Ctrl+Shift+r (^M) | Carriage return (0x0d) |
Ctrl+Shift+i (^I) | Tab (0x09) |
Ctrl+Shift+k (^K) | Vertical tab (0x0b) |
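The caret names in the table follow standard ASCII caret notation: a control character's code is the code of the uppercase letter minus 0x40 (for example, H is 0x48, so ^H is 0x08). The Python sketch below derives each code in the table from its caret letter; the function name `caret_to_char` is hypothetical. Note that the shortcut key does not always equal the caret letter (Ctrl+Shift+r produces ^M).

```python
def caret_to_char(letter: str) -> str:
    """Convert the letter of a caret sequence (^H, ^J, ...) to its control character."""
    letter = letter.upper()
    if len(letter) != 1 or not "A" <= letter <= "Z":
        raise ValueError(f"expected a single letter A-Z, got {letter!r}")
    # Uppercase letters occupy 0x41-0x5A; subtracting 0x40 gives 0x01-0x1A.
    return chr(ord(letter) - 0x40)


for letter, name in [("H", "Backspace"), ("L", "Form feed"), ("J", "Newline"),
                     ("M", "Carriage return"), ("I", "Tab"), ("K", "Vertical tab")]:
    print(f"^{letter}  0x{ord(caret_to_char(letter)):02x}  {name}")
```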
Defining Special Characters
You can define special characters in the Text Editor. Click Open Text Editor from the parameter menu.
Enter characters using the shortcuts from the table above.
Note: To see tab characters, click the Options menu on the bottom left and select Show Spaces/Tabs.
Note that the matches can be either case-sensitive or case-insensitive, depending on how the transformer is configured.
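In Python's `re` module, the comparable switch is the `re.IGNORECASE` flag. The sketch below is only an analogy for the transformer's case-sensitivity setting; the helper name `matches` is hypothetical.

```python
import re


def matches(pattern: str, value: str, case_sensitive: bool = True) -> bool:
    """Search for pattern in value, optionally ignoring letter case."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.search(pattern, value, flags) is not None


print(matches(r"^(beef|chicken)$", "Beef"))                        # False
print(matches(r"^(beef|chicken)$", "Beef", case_sensitive=False))  # True
```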
Matched Result Attribute: The name of the attribute that stores the matching result. The default attribute name is _first_match.
Advanced
All Matches List Name: The optional name of the list attribute that stores all matches. This parameter is empty by default.
For example, if All Matches List Name is somelist and the string "Oct 15 1915; Apr 7 1964; May 21 1937" is searched with the regular expression \d{4}, all four-digit strings are matched, with the following result:
somelist{0} = 1915
somelist{1} = 1964
somelist{2} = 1937
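The same matches can be reproduced with Python's `re.finditer`, which yields every non-overlapping match together with its start position. This is a sketch for checking an expression outside FME; the helper name `all_matches` is hypothetical, and the 0-based character offsets are Python's, which may not be exactly how the transformer reports starting indices.

```python
import re
from typing import List, Tuple


def all_matches(text: str, pattern: str) -> List[Tuple[str, int]]:
    """Every non-overlapping match of pattern in text, with its 0-based start offset."""
    return [(m.group(0), m.start()) for m in re.finditer(pattern, text)]


text = "Oct 15 1915; Apr 7 1964; May 21 1937"
for value, start in all_matches(text, r"\d{4}"):
    print(value, start)
```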
Note: List attributes are not accessible from the output schema in Workbench unless they are first processed using a transformer that operates on them, such as ListExploder or ListConcatenator. Alternatively, AttributeExposer can be used.
Subexpression Matches List Name: The optional name of the list attribute that stores subexpression matches. Unlike the All Matches List Name list, this list contains matches from capture groups, which are regex subexpressions enclosed in parentheses. This parameter is empty by default.
Note: List attributes are not accessible from the output schema in Workbench unless they are first processed using a transformer that operates on them, such as ListExploder or ListConcatenator. Alternatively, AttributeExposer can be used.
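The difference between the two lists can be illustrated in Python: `finditer` returns each whole match, while a match object's `groups()` returns only what the parenthesized subexpressions captured. In this hypothetical sketch, capture groups are taken from the first match only; whether the transformer does the same is an assumption here, as are the helper names.

```python
import re
from typing import List, Tuple

PATTERN = r"([A-Z][a-z]{2}) (\d{1,2}) (\d{4})"  # month, day, year
TEXT = "Oct 15 1915; Apr 7 1964"


def whole_matches(pattern: str, text: str) -> List[Tuple[str, int]]:
    """Whole-match text and start offset for every match (cf. All Matches List Name)."""
    return [(m.group(0), m.start()) for m in re.finditer(pattern, text)]


def group_matches(pattern: str, text: str) -> List[Tuple[str, int]]:
    """Captured text and start offset for each group of the first match."""
    m = re.search(pattern, text)
    if m is None:
        return []
    return [(m.group(i), m.start(i)) for i in range(1, m.re.groups + 1)]


print(whole_matches(PATTERN, TEXT))  # whole matches: 'Oct 15 1915', 'Apr 7 1964'
print(group_matches(PATTERN, TEXT))  # groups of the first match: 'Oct', '15', '1915'
```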
Preserve Feature Order: This parameter controls the order in which features exit a transformer.
When a transformer has more than one output port, features usually exit one port at a time. At times, it may be useful to keep the order that features arrived in, switching from port to port as necessary. This allows feature order to be preserved, though at a potential cost in processing efficiency.
Select a method for feature ordering.
Method | Description |
---|---|
Per Output Port (Default) | Only preserve the input order of features as they occur within the group of features exiting a given output port. All features exiting an output port retain their ordering relative to each other (within the group), but not relative to features exiting other output ports. This option is generally the most efficient, because large chunks of features can exit an output port together (taking advantage of bulk mode). Because features exiting different output ports may not be output strictly in the order they arrived, output ordering may be unpredictable. |
Across Output Ports | Strictly preserve the input order of features, regardless of which output port they exit. Features are output singly in the same order they arrived, switching from port to port as necessary. This option is generally less efficient because the processing gains of bulk mode are less likely to apply; however, feature order is predictable. |
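The difference between the two methods can be modelled with a small, hypothetical Python sketch. It assumes a Per Output Port run flushes each port's features as one batch, with ports in the order they were first used; in FME the batch order is not guaranteed, which is why output ordering with that method may be unpredictable.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Feature = Tuple[str, str]  # (feature id, output port it is routed to)


def per_output_port(features: List[Feature]) -> List[Tuple[str, str]]:
    """Keep arrival order within each port, but emit ports one batch at a time."""
    batches: Dict[str, List[str]] = defaultdict(list)
    for feature_id, port in features:
        batches[port].append(feature_id)
    # One possible batch order: ports in first-use order (not guaranteed in FME).
    return [(port, fid) for port, ids in batches.items() for fid in ids]


def across_output_ports(features: List[Feature]) -> List[Tuple[str, str]]:
    """Emit features one at a time in arrival order, switching ports as needed."""
    return [(port, feature_id) for feature_id, port in features]


arrivals = [("f1", "Matched"), ("f2", "NotMatched"), ("f3", "Matched"), ("f4", "NotMatched")]
print(per_output_port(arrivals))
print(across_output_ports(arrivals))
```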
Usage Notes
- Output feature order may be controlled with the Advanced > Preserve Feature Order parameter.
- To replace substrings matching a regular expression in a string, use the StringReplacer transformer.
Additional Resources
Test regular expressions with the Regular Expression Editor in the Contains Regular Expression field’s context menu.
For more information on regular expression syntax, see http://perldoc.perl.org/perlre.html#Regular-Expressions.
Editing Transformer Parameters
Using a set of menu options, transformer parameters can be assigned by referencing other elements in the workspace. More advanced functions, such as an advanced editor and an arithmetic editor, are also available in some transformers. To access a menu of these options, click the menu button beside the applicable parameter. For more information, see Transformer Parameter Menu Options.
Defining Values
There are several ways to define a value for use in a Transformer. The simplest is to type in a value or string, which can include functions of various types, such as attribute references, math and string functions, and workspace parameters. A number of tools and shortcuts that can assist in constructing values are generally available from the drop-down context menu adjacent to the value field.
Using the Text Editor
The Text Editor provides a convenient way to construct text strings (including regular expressions) from various data sources, such as attributes, parameters, and constants, where the result is used directly inside a parameter.
Using the Arithmetic Editor
The Arithmetic Editor provides a convenient way to construct math expressions from various data sources, such as attributes, parameters, and feature functions, where the result is used directly inside a parameter.
Conditional Values
Set values depending on one or more test conditions that either pass or fail.
Parameter Condition Definition Dialog
Content
Expressions and strings can include a number of functions, characters, parameters, and more.
When setting values, whether entered directly in a parameter or constructed using one of the editors, strings and expressions containing String, Math, Date/Time, or FME Feature Functions have those functions evaluated. Therefore, the names of these functions (in the form @<function_name>) should not be used as literal string values.
Item | Description |
---|---|
String Functions | These functions manipulate and format strings. |
Special Characters | A set of control characters is available in the Text Editor. |
Math Functions | Math functions are available in both editors. |
Date/Time Functions | Date and time functions are available in the Text Editor. |
Math Operators | These operators are available in the Arithmetic Editor. |
FME Feature Functions | These return primarily feature-specific values. |
FME and Workspace Parameters | FME and workspace-specific parameters may be used. |
Creating and Modifying User Parameters | Create your own editable parameters. |
Dialog Options - Tables
Transformers with table-style parameters have additional tools for populating and manipulating values.
Tool | Description |
---|---|
Row Reordering | Enabled once you have clicked on a row item. |
Cut, Copy, and Paste | Enabled once you have clicked on a row item. Cut, copy, and paste may be used within a transformer, or between transformers. |
Filter | Start typing a string, and the table displays only rows matching those characters. All columns are searched. This affects only the display of attributes within the transformer; it does not alter which attributes are output. |
Import | Populates the table with a set of new attributes read from a dataset. Specific application varies between transformers. |
Reset/Refresh | Generally resets the table to its initial state, and may provide additional options to remove invalid entries. Behavior varies between transformers. |
Note: Not all tools are available in all transformers.
Transformer History
This transformer was previously named Grepper.
FME Community
The FME Community is the place for demos, how-tos, articles, FAQs, and more. Get answers to your questions, learn from other users, and suggest, vote, and comment on new features.
Search for samples and information about this transformer on the FME Community.
Keywords: Grepper