FME Transformers: 2024.1

Categories
Integrations
Strings
Web

Workflows
Related Transformers
HTTPCaller

HTMLExtractor

Extracts structured data from web pages or other HTML sources that are formatted for human readability (screen scraping), using CSS selectors to extract portions of HTML content into feature attributes.

Typical Uses

  • Extracting content from a web page

How does it work?

The HTMLExtractor lets you define multiple queries to run against incoming HTML content, which can be provided either as an attribute or as a file. Each query consists of an output attribute name, a CSS selector that defines which tags to extract, and a choice of what to extract: whole tags, values, text, or HTML attributes.

You may either extract the first matching tag only, or keep multiple results as a list attribute.
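
The query model described above can be sketched outside FME. The following is a minimal illustration in Python, not the transformer's implementation: it assumes the standard-library html.parser module as a stand-in, supports only `tag` and `tag.class` selectors, and covers only the text and HTML attribute extraction choices. All names in it (SimpleSelectorExtractor, run_queries, the query dictionary keys, the sample page) are hypothetical.

```python
from html.parser import HTMLParser


class SimpleSelectorExtractor(HTMLParser):
    """Collects matches for a 'tag' or 'tag.class' selector (illustrative subset only)."""

    def __init__(self, selector, extract="text", html_attribute=None):
        super().__init__()
        self.want_tag, _, self.want_class = selector.partition(".")
        self.extract = extract                # "text" or "attribute"
        self.html_attribute = html_attribute  # used when extract == "attribute"
        self.results = []
        self._depth = 0    # > 0 while inside a matching tag (text mode)
        self._buffer = []

    def _matches(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        return tag == self.want_tag and (
            not self.want_class or self.want_class in classes
        )

    def handle_starttag(self, tag, attrs):
        if self._depth:
            if tag == self.want_tag:
                self._depth += 1  # nested tag of the same name
        elif self._matches(tag, attrs):
            if self.extract == "attribute":
                self.results.append(dict(attrs).get(self.html_attribute))
            else:
                self._depth = 1
                self._buffer = []

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._depth and tag == self.want_tag:
            self._depth -= 1
            if self._depth == 0:
                self.results.append("".join(self._buffer).strip())


def run_queries(html, queries):
    """Runs each query and returns a dict of output attribute name -> value."""
    attributes = {}
    for query in queries:
        parser = SimpleSelectorExtractor(
            query["selector"], query.get("extract", "text"), query.get("html_attribute")
        )
        parser.feed(html)
        parser.close()
        if query.get("keep_all"):
            attributes[query["name"]] = parser.results  # all matches, as a list
        else:
            attributes[query["name"]] = parser.results[0] if parser.results else None
    return attributes


# Hypothetical sample page and queries.
PAGE = """
<html><body>
<h1 class="title">City Parks</h1>
<ul>
  <li class="park"><a href="/parks/1">Stanley Park</a></li>
  <li class="park"><a href="/parks/2">Queen Elizabeth Park</a></li>
</ul>
</body></html>
"""

QUERIES = [
    {"name": "page_title", "selector": "h1.title", "extract": "text"},
    {"name": "park_names", "selector": "li.park", "extract": "text", "keep_all": True},
    {"name": "first_link", "selector": "a", "extract": "attribute", "html_attribute": "href"},
]

result = run_queries(PAGE, QUERIES)
print(result)
```

Each query yields one output value: a single value when only the first match is kept, or a list when all matches are kept, mirroring the choice between a plain attribute and a list attribute.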

The HTMLExtractor is better suited to HTML content than the XML transformers or regular expression searches, due to more lenient parsing and filters that can withstand minor changes to page content.
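
To illustrate why lenient parsing matters, the sketch below feeds the same loosely written HTML fragment to a strict XML parser and to a tolerant HTML parser. It uses Python's standard-library xml.etree.ElementTree and html.parser modules as stand-ins; it does not show how FME's XML transformers or the HTMLExtractor are implemented, and the fragment and helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Typical hand-written HTML: <br> and <p> are never closed.
SLOPPY_HTML = "<div><p>Opening hours<br>9am to 5pm<p>Closed Mondays</div>"


def parses_as_xml(markup):
    """Returns True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TextCollector(HTMLParser):
    """Tolerant HTML parsing: gathers non-empty text chunks in document order."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def collect_text(markup):
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return collector.chunks


xml_ok = parses_as_xml(SLOPPY_HTML)     # strict parser rejects the fragment
html_text = collect_text(SLOPPY_HTML)   # lenient parser still yields the text
print(xml_ok, html_text)
```

The strict parser rejects the fragment because of the unclosed tags, while the tolerant parser still recovers all three pieces of text.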

Examples

Usage Notes

Configuration

Input Ports

Output Ports

Parameters

Editing Transformer Parameters

Transformer parameters can be set by directly entering values, using expressions, or referencing other elements in the workspace such as attribute values or user parameters. Various editors and context menus are available to assist. To see what is available, click the menu button beside the applicable parameter.

For more information, see Transformer Parameter Menu Options.

Reference

Processing Behavior: Feature-Based
Feature Holding: No
Dependencies: None
Aliases:
History: Released in FME 2017.0

FME Community

The FME Community is the place for demos, how-tos, articles, FAQs, and more. Get answers to your questions, learn from other users, and suggest, vote, and comment on new features.

Search for all results about the HTMLExtractor on the FME Community.

Examples may contain information licensed under the Open Government Licence – Vancouver, Open Government Licence - British Columbia, and/or Open Government Licence – Canada.