HTML Table Reader

FME can read table and list data from HTML (Hypertext Markup Language) documents.

HTML is used to format documents for display in web browsers. Although its primary purpose is not to store machine-readable data, table and list elements often contain useful data.

Although HTML resembles XML, it is not compatible with strict XML parsing. As a further complication, web browsers parse leniently, so an HTML document does not have to follow the HTML specification fully in order to display reasonably well.
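
The difference between strict XML parsing and lenient HTML parsing can be illustrated outside of FME. The following is a minimal Python sketch using only the standard library; it is not how FME parses HTML. The sample markup and the names SAMPLE, parses_as_xml, and TagCollector are hypothetical and chosen for illustration only.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Markup a browser would typically display without complaint: an unquoted
# attribute value, an unclosed <br>, and <li> elements with no end tags.
SAMPLE = "<ul class=items><li>one<br><li>two</ul>"


def parses_as_xml(text: str) -> bool:
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Lenient HTML parser that records every start tag it encounters."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


collector = TagCollector()
collector.feed(SAMPLE)
collector.close()

print(parses_as_xml(SAMPLE))  # the strict XML parser rejects the sample
print(collector.tags)         # the lenient parser still reports the tags
```

The strict parser stops at the first violation, while the lenient parser keeps going and reports what it can recognize.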

HTML Table Product and System Requirements

Format         FME Platform                         Operating System
Reader/Writer  FME Form  FME Flow  FME Flow Hosted  Windows 64-bit  Linux  Mac
Reader         Yes       Yes       Yes              Yes             Yes    Yes

Reader Overview

The HTML Table Reader lists all of the table and list (ul and ol) elements in the HTML document, allows you to select which tables or lists to read, and parses features from the ones you select.
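
To give a rough idea of what enumerating table and list elements involves, here is a minimal Python sketch using the standard library. It is an illustration only and does not reflect FME's implementation. The labels it produces (such as table_1) and the names ElementLister and list_elements are hypothetical.

```python
from html.parser import HTMLParser


class ElementLister(HTMLParser):
    """Record each table, ul, and ol start tag in document order."""

    TARGETS = {"table", "ul", "ol"}

    def __init__(self):
        super().__init__()
        self.found = []    # labels such as "table_1" (hypothetical naming)
        self._counts = {}  # running count per tag name

    def handle_starttag(self, tag, attrs):
        if tag in self.TARGETS:
            self._counts[tag] = self._counts.get(tag, 0) + 1
            self.found.append(f"{tag}_{self._counts[tag]}")


def list_elements(html_text: str) -> list:
    """Return a label for every table and list element in the text."""
    lister = ElementLister()
    lister.feed(html_text)
    lister.close()
    return lister.found


if __name__ == "__main__":
    page = "<table><tr><td>a</td></tr></table><ul><li>x</li></ul>"
    print(list_elements(page))
```

Note that this sketch also counts nested tables and lists as separate entries; whether that is desirable depends on the document being read.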

FME Workbench Reader Dataset

The value for the Reader Dataset is the path or URL to an HTML document.

Schema Scanning

Because the values in an HTML table do not have an associated schema, each attribute is assigned the text data type by default. For lists, and for tables without a header row, generic attribute names are generated.
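
The general idea of schema-less table reading can be sketched in Python as follows. This is not FME's implementation: the generic names (col_1, col_2, ...) are hypothetical, the sketch assumes a single well-formed table with explicit end tags, and it treats the first row as a header only if it contains th cells. Every value is kept as a string, mirroring the text default described above.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect every tr as (cell_texts, contains_th)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None     # cells of the row being read, or None
        self._cell = None    # text fragments of the current cell, or None
        self._has_th = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row, self._has_th = [], False
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
            if tag == "th":
                self._has_th = True

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append((self._row, self._has_th))
            self._row = None


def to_records(html_text: str) -> list:
    """Turn table rows into dicts whose values are all strings."""
    parser = TableRows()
    parser.feed(html_text)
    parser.close()
    rows = parser.rows
    if not rows:
        return []
    if rows[0][1]:
        # First row contains th cells: use its texts as attribute names.
        names, body = rows[0][0], rows[1:]
    else:
        # No header row: generate generic names (hypothetical scheme).
        width = max(len(cells) for cells, _ in rows)
        names = [f"col_{i}" for i in range(1, width + 1)]
        body = rows
    return [dict(zip(names, cells)) for cells, _ in body]
```

For example, a table whose first row is `<th>name</th><th>count</th>` yields records keyed by name and count, while the same data without a header row yields records keyed by col_1 and col_2. In both cases a value such as 3 stays the string "3".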