HTMLExtractor
Typical Uses
- Extracting content from a web page
How does it work?
The HTMLExtractor lets you define multiple queries to run against incoming HTML content, which can be provided either as an attribute or as a file. Each query consists of an output attribute name, a CSS selector that identifies which tags to match, and a choice of what to extract: whole tags, values, or HTML attributes.
For each query, you may extract only the first matching tag or keep all matches as a list attribute.
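The query model described above can be sketched outside of FME. The following Python example is only an illustration, not the transformer's implementation: the names `SimpleExtractor` and `run_queries` are hypothetical, it uses the standard-library `html.parser` module, and it supports only `tag`, `.class`, and `tag.class` selectors rather than the full CSS selector syntax. Each query carries an output name, a selector, what to extract (text or a named HTML attribute), and whether to keep every match as a list.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they cannot contain text.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class SimpleExtractor(HTMLParser):
    """Hypothetical helper: collects matches for one `tag.class` style query."""

    def __init__(self, selector, extract="text"):
        super().__init__()
        # "a.item" -> tag "a", class "item"; ".item" -> any tag, class "item"
        self.tag, _, self.cls = selector.partition(".")
        self.extract = extract   # "text", or the name of an HTML attribute
        self.results = []
        self._open_tag = None    # name of the matched tag being captured
        self._depth = 0          # nesting depth of same-named tags
        self._buffer = []

    def _matches(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        tag_ok = not self.tag or tag == self.tag
        class_ok = not self.cls or self.cls in classes
        return tag_ok and class_ok

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Already capturing text; only track same-named nested tags.
            if tag == self._open_tag:
                self._depth += 1
            return
        if not self._matches(tag, attrs):
            return
        if self.extract != "text":
            value = dict(attrs).get(self.extract)
            if value is not None:
                self.results.append(value)
        elif tag not in VOID_TAGS:
            self._open_tag, self._depth, self._buffer = tag, 1, []

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._depth and tag == self._open_tag:
            self._depth -= 1
            if self._depth == 0:
                self.results.append("".join(self._buffer).strip())


def run_queries(html, queries):
    """Each query is (output_name, selector, extract, keep_all)."""
    output = {}
    for name, selector, extract, keep_all in queries:
        parser = SimpleExtractor(selector, extract)
        parser.feed(html)
        parser.close()
        matches = parser.results
        # First match only, or every match as a list.
        output[name] = matches if keep_all else (matches[0] if matches else None)
    return output


PAGE = """
<html><body>
  <h1 class="title">Parks</h1>
  <a class="item" href="/stanley">Stanley Park</a>
  <a class="item" href="/queen-elizabeth">Queen Elizabeth Park</a>
</body></html>
"""

attributes = run_queries(PAGE, [
    ("page_title", "h1.title", "text", False),  # first match only
    ("park_names", "a.item", "text", True),     # all matches, as a list
    ("park_links", "a.item", "href", True),     # an HTML attribute per match
])
print(attributes)
```

Running one pass per query keeps the sketch short; the sample page and output names are invented for the example.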
The HTMLExtractor is better suited to HTML content than the XML transformers or regular expression searches, due to more lenient parsing and filters that can withstand minor changes to page content.
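To illustrate why lenient parsing matters, the sketch below feeds the same imperfect HTML fragment to a strict XML parser, a regular expression, and a tolerant HTML parser. It uses only the Python standard library and is an analogy for the behavior described above, not the HTMLExtractor's internals. The sample markup, the `LinkCollector` class, and the `strict_xml_parse` function are made up for the example.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Common in real pages: an unclosed <p>, a bare <br>, an unquoted attribute.
MESSY = "<div><p>Opening hours<br><a href=/hours>See schedule</a></div>"


def strict_xml_parse(text):
    """Return True only if the text is well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags, tolerating sloppy markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


# 1. A strict XML parser rejects the fragment outright.
xml_ok = strict_xml_parse(MESSY)

# 2. A regular expression written for quoted attributes finds nothing.
regex_links = re.findall(r'<a href="([^"]*)">', MESSY)

# 3. A lenient HTML parser still recovers the link.
collector = LinkCollector()
collector.feed(MESSY)
collector.close()
html_links = collector.links

print(xml_ok, regex_links, html_links)
```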
Usage Notes
- Queries are written with standard CSS selectors. For a list of available selectors, see the CSS Selector Reference.
Configuration
Input Ports
Output Ports
Parameters
Dialog Options
Editing Transformer Parameters
Using a set of menu options, transformer parameters can be assigned by referencing other elements in the workspace. More advanced functions, such as an advanced editor and an arithmetic editor, are also available in some transformers. To access a menu of these options, click the button beside the applicable parameter. For more information, see Transformer Parameter Menu Options.
Defining Values
There are several ways to define a value for use in a transformer. The simplest is to type in a value or string, which can include functions of various types such as attribute references, math and string functions, and workspace parameters. A number of tools and shortcuts can assist in constructing values; they are generally available from the drop-down context menu adjacent to the value field.
Reference
Processing Behavior | |
Feature Holding | No |
Dependencies | None |
FME Licensing Level | FME Base Edition and above |
Aliases | |
History | Released: FME 2017.0 |
Categories | |
Examples may contain information licensed under the Open Government Licence – Vancouver