HTML Table Reader

FME can read table and list data from HTML (Hypertext Markup Language) documents.

HTML is used to format documents for display in web browsers. Although it is not designed to store machine-readable data, its table and list elements often contain useful data.

Although HTML resembles XML, it is not XML and is not compatible with strict XML parsing. As a further complication, because web browsers parse HTML leniently, an HTML document does not have to follow the HTML specification fully in order to display reasonably well.
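To illustrate the difference, here is a minimal sketch using only Python's standard library (it is not how FME parses HTML). The fragment is acceptable HTML that browsers render normally: `<br>` is a void element with no end tag, and the `</td>` and `</tr>` end tags may be omitted. A strict XML parser rejects it, while a lenient HTML parser reads every tag. The names `HTML_SNIPPET`, `parses_as_xml`, and `TagCollector` are illustrative only.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import List

# Valid HTML that a browser renders without complaint: <br> has no closing
# tag, and the </td> and </tr> end tags are optional in HTML.
HTML_SNIPPET = "<table><tr><td>Vancouver<br>BC<td>Canada</table>"


def parses_as_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Lenient HTML parser that records every start tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


collector = TagCollector()
collector.feed(HTML_SNIPPET)
collector.close()

print(parses_as_xml(HTML_SNIPPET))  # False: expat reports a mismatched tag
print(collector.tags)  # ['table', 'tr', 'td', 'br', 'td']
```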

Note: The HTML Table Reader processes static HTML; it cannot execute JavaScript. Tables generated dynamically by client-side scripts will not be readable.

HTML Table Product and System Requirements

Format	FME Platform			Operating System
Reader/Writer	FME Form	FME Flow	FME Flow Hosted	Windows 64-bit	Linux	Mac
Reader	Yes	Yes	Yes	Yes	Yes	Yes

Reader Overview

The HTML Table Reader creates features from the table and list data in the document. It lists all of the table and list (ul and ol) elements in the HTML document and lets you select which tables or lists to read.
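The listing step can be sketched as follows, assuming Python's standard-library `html.parser` as a stand-in for FME's internal parser, which this page does not describe. The sketch records every table, ul, and ol element in document order, with its `id` attribute if present; `CONTAINER_TAGS`, `ContainerLister`, and `list_containers` are hypothetical names. Because `html.parser`, like the reader, does not execute scripts, the table written by `document.write` in the sample page is not reported.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Container elements a table/list picker would offer (hypothetical constant).
CONTAINER_TAGS = {"table", "ul", "ol"}


class ContainerLister(HTMLParser):
    """Collect (tag, id) for every table, ul, and ol element, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.found: List[Tuple[str, Optional[str]]] = []

    def handle_starttag(self, tag, attrs):
        if tag in CONTAINER_TAGS:
            self.found.append((tag, dict(attrs).get("id")))


def list_containers(html: str) -> List[Tuple[str, Optional[str]]]:
    """Return the selectable containers found in an HTML string."""
    parser = ContainerLister()
    parser.feed(html)
    parser.close()
    return parser.found


page = """
<h1>Report</h1>
<table id="sales"><tr><th>Region</th><th>Total</th></tr></table>
<ul><li>North</li><li>South</li></ul>
<ol id="steps"><li>Open</li></ol>
<script>document.write("<table id='dynamic'></table>");</script>
"""

# Script content is treated as raw text, so the 'dynamic' table never appears.
for index, (tag, element_id) in enumerate(list_containers(page), start=1):
    print(index, tag, element_id)
```

Running it prints one line per element: the table `sales`, the unnamed ul, and the ol `steps`.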

FME Workbench Reader Dataset

The value for the Reader Dataset is the path or URL to an HTML document.


Schema Scanning

Because the values in an HTML table have no associated schema, each attribute is assigned the text data type by default. For lists, and for tables without a header row, generic attribute names are generated.
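A minimal sketch of this schema behaviour, again using Python's `html.parser` rather than FME's own code. Assumptions, all hypothetical: a first row made only of `<th>` cells counts as the header row, generic names follow a `column_1`, `column_2`, ... pattern, and only the first, non-nested table is read. FME's actual header detection and generated names may differ, and lists are not covered. Every value is kept as a Python `str`, mirroring the default text type.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

Cell = Tuple[bool, str]  # (is_header_cell, text)


class FirstTableRows(HTMLParser):
    """Collect the cells of the first <table>, row by row (no nested tables)."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[Cell]] = []
        self._state = "before"  # "before" -> "inside" -> "done"
        self._cell_tag: Optional[str] = None
        self._cell_text: List[str] = []

    def _finish_cell(self) -> None:
        # Store the open cell (if any) with whitespace collapsed.
        if self._cell_tag is not None and self.rows:
            text = " ".join("".join(self._cell_text).split())
            self.rows[-1].append((self._cell_tag == "th", text))
        self._cell_tag = None
        self._cell_text = []

    def handle_starttag(self, tag, attrs):
        if self._state == "before" and tag == "table":
            self._state = "inside"
        elif self._state == "inside":
            if tag == "tr":
                self._finish_cell()  # </td> and </tr> are optional in HTML
                self.rows.append([])
            elif tag in ("td", "th"):
                self._finish_cell()
                if not self.rows:
                    self.rows.append([])
                self._cell_tag = tag

    def handle_endtag(self, tag):
        if self._state != "inside":
            return
        if tag in ("td", "th", "tr"):
            self._finish_cell()
        elif tag == "table":
            self._finish_cell()
            self._state = "done"

    def handle_data(self, data):
        if self._state == "inside" and self._cell_tag is not None:
            self._cell_text.append(data)


def table_to_features(html: str) -> List[Dict[str, str]]:
    """Turn the first table into attribute dictionaries with text values."""
    parser = FirstTableRows()
    parser.feed(html)
    parser.close()
    rows = [row for row in parser.rows if row]
    if not rows:
        return []
    if all(is_header for is_header, _ in rows[0]):
        names = [text for _, text in rows[0]]  # header row supplies names
        body = rows[1:]
    else:
        width = max(len(row) for row in rows)
        names = [f"column_{i}" for i in range(1, width + 1)]  # hypothetical scheme
        body = rows
    # No type inference: "3" stays the string "3".
    return [{name: text for name, (_, text) in zip(names, row)} for row in body]


with_header = "<table><tr><th>Name</th><th>Count</th></tr><tr><td>alpha</td><td>3</td></tr></table>"
no_header = "<table><tr><td>alpha<td>3</table>"
print(table_to_features(with_header))
print(table_to_features(no_header))
```

Detecting the header row from `<th>` cells is only one plausible rule; pages sometimes mark headers with `<thead>` or with styled `<td>` cells instead.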