StringSearcher
Performs a regular expression match on the specified input string. If the string matches the pattern, the feature is output via the Matched port, and the portion of the original search string that matched the regular expression is stored in the attribute specified in “Matched Result Attribute”. Optionally, the pieces of the string that matched subexpressions are stored in the attribute list specified in “Matched Parts Attribute”. If the string does not match, the feature is output via the NotMatched port.
This transformer takes its name and inspiration from the UNIX utility grep, which searches for patterns in text files.
Parameters
The input text to be searched.
To use this transformer to parse out portions of a string, use "subexpressions" within the regular expression. Subexpressions are enclosed in parentheses ( ), and the portion of the input text that matched each subexpression is stored as an entry in the “Matched Parts Attribute” list. The elements of this list can then be exposed to Workbench by right-clicking the list attribute and indicating the number of elements to expose for later use. See below for some examples.
Advanced Regular Expressions (AREs) are supported. For a complete description of AREs, see Syntax of Tcl Regular Expressions in the FME Functions and Factories manual, or see Regular Expression Information.
In brief, an ARE is one or more branches, separated by |, matching anything that matches any of the branches.
A brief summary of the special characters and their meanings is:
| | separates "branches" (or choices) |
* | a sequence of 0 or more matches of what precedes it |
+ | a sequence of 1 or more matches of what precedes it |
? | a sequence of 0 or 1 matches of what precedes it |
. | matches any single character |
^ | matches the start of the value |
$ | matches the end of the value |
[ ] | enclose a set of character choices |
( ) | enclose a "subexpression" -- whatever matches each subexpression is placed into the “Matched Parts Attribute”{} list attribute |
a | any character can be listed to be matched |
Examples:
^A | matches any value starting with an A |
^[0-9] | matches any value starting with a digit |
^[0-9]+$ | matches any value consisting exclusively of digits |
^(beef|chicken)$ | matches values of either “beef” or “chicken” |
^([0-9]*) ([0-9]*)$ | matches two integer numbers separated by a space, and puts the first number into “Matched Parts Attribute”{0} and the second into “Matched Parts Attribute”{1} |
^N([0-9][0-9])[.]([0-9][0-9])[.]([0-9][0-9]) | matches N23.45.11 and puts 23 into “Matched Parts Attribute”{0}, 45 into “Matched Parts Attribute”{1}, and 11 into “Matched Parts Attribute”{2} |
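The examples above can be tried outside Workbench with a short sketch. This is only an approximation: it uses Python's `re` module, not the Tcl ARE engine the transformer uses, and the function name `string_search` and its return shape are hypothetical, chosen to mirror the Matched/NotMatched ports, the matched result attribute, and the matched parts list.

```python
import re

def string_search(text, pattern):
    """Hypothetical stand-in for the transformer: returns
    (port, matched_characters, matched_parts) for one input value."""
    m = re.search(pattern, text)
    if m is None:
        return ("NotMatched", None, [])
    # group(0) is the portion of the input that matched the whole pattern;
    # groups() holds one entry per parenthesized subexpression, in order.
    return ("Matched", m.group(0), list(m.groups()))

print(string_search("N23.45.11", r"^N([0-9][0-9])[.]([0-9][0-9])[.]([0-9][0-9])"))
print(string_search("12 34", r"^([0-9]*) ([0-9]*)$"))
print(string_search("pork", r"^(beef|chicken)$"))
```

Because `*` allows zero repetitions, the pattern `^([0-9]*) ([0-9]*)$` also accepts a value consisting of a single space, with two empty parts. Using `+` in place of `*` would require at least one digit on each side.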
In addition to regular characters, the Regular Expression field can include any number of special characters, including control characters.
Special character sequences (Advanced Editor only) are interpreted as shown below:
Sequence | Description |
---|---|
Ctrl+Shift+h (^H) | Backspace (0x08) |
Ctrl+Shift+l (^L) | Form feed (0x0c) |
Ctrl+Shift+j (^J) | Newline (0x0a) |
Ctrl+Shift+r (^M) | Carriage return (0x0d) |
Ctrl+Shift+i (^I) | Tab (0x09) |
Ctrl+Shift+k (^K) | Vertical tab (0x0b) |
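For comparison, the same six control characters can be written as escape sequences in many programming languages. The sketch below uses Python string escapes (not FME syntax) to confirm the hexadecimal codes listed in the table; the dictionary name `CONTROL_CHARS` is purely illustrative.

```python
import re

# Python string escapes for the six control characters in the table.
# Note: in a regular expression written as a raw string, r"\b" means a
# word boundary, so the backspace here is a plain (non-raw) string escape.
CONTROL_CHARS = {
    "backspace": "\b",
    "form feed": "\f",
    "newline": "\n",
    "carriage return": "\r",
    "tab": "\t",
    "vertical tab": "\v",
}

for name, ch in CONTROL_CHARS.items():
    print(f"{name}: 0x{ord(ch):02x}")

# A literal tab inside a pattern matches a tab in the input.
parts = re.search("^([0-9]+)\t([0-9]+)$", "12\t34").groups()
print(parts)
```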
Defining Special Characters
You can define special characters through the Basic or Advanced Editors. Click Open Editor from the parameter menu.
Basic Text Editor
Select Constant from the String Type column (or, in some transformers, the Value column) and click on the empty field in the column.
Click the browse button to the right of the column to open an Edit Value dialog. In this editor, enter characters using the shortcut keys from the table above.
Advanced Text Editor
Enter characters using the shortcuts from the table above.
Note: To see tab characters, click the Options menu on the bottom left and select Show Spaces/Tabs.
Note that the matches can be either case-sensitive or case-insensitive, depending on how the transformer is configured.
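The effect of case sensitivity can be illustrated with a small sketch. It uses Python's `re` module and its `re.IGNORECASE` flag as a stand-in for the transformer's case-sensitivity setting, which is an assumption made for illustration only; the variable names are hypothetical.

```python
import re

pattern = r"^(beef|chicken)$"
value = "Chicken"

# Case-sensitive: the capital "C" prevents a match.
case_sensitive = re.search(pattern, value)

# Case-insensitive: the same pattern now matches, and the matched part
# keeps the casing of the input value.
case_insensitive = re.search(pattern, value, re.IGNORECASE)

print(case_sensitive is None)
print(case_insensitive.group(1))
```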
Matched Result Attribute: The name of the attribute used to store the matching result. The default attribute name is _matched_characters.
Matched Parts Attribute: The name of the attribute list used to store the optional matching subexpressions. The default attribute list name is _matched_parts.
Note: Attribute lists are not accessible from the output schema in Workbench unless they are first processed using a transformer that operates on them, such as ListExploder or ListConcatenator. All attribute list transformers are displayed in the Contents pane of the Transformer Help under Lists. Alternatively, AttributeExposer can be used.
Related Transformers
To replace substrings matching a regular expression in a string, use the StringReplacer transformer.
Additional Resources
Test regular expressions with Rubular, a Ruby-based regular expression editor.
Editing Transformer Parameters
Using a set of menu options, transformer parameters can be assigned by referencing other elements in the workspace. More advanced functions, such as an advanced editor and an arithmetic editor, are also available in some transformers. To access a menu of these options, click beside the applicable parameter. For more information, see Transformer Parameter Menu Options.
Transformer History
This transformer was previously named Grepper.
Search FME Knowledge Center
Search for samples and information about this transformer on the FME Knowledge Center.