HTML Table Reader
FME can read table and list data from HTML (Hypertext Markup Language) documents.
HTML is used to format documents for display in web browsers. Although its primary purpose is not to store machine-readable data, table and list elements often contain useful data.
Although HTML resembles XML, it is not XML-based, and most HTML documents cannot be parsed by a strict XML parser. As a further complication, because web browsers parse leniently, an HTML document does not have to follow the HTML specification fully to display reasonably well.
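The difference between strict XML parsing and lenient HTML parsing can be illustrated with a minimal Python sketch. It uses only the Python standard library and is not how FME itself parses documents; the sample markup and the helper names `parses_as_xml` and `lenient_text` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Markup that browsers accept: the <li> and <br> elements are never closed.
SAMPLE = "<ul><li>Alpha<br><li>Beta</ul>"


def parses_as_xml(text):
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class TextCollector(HTMLParser):
    """Lenient parser that records every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.items = []

    def handle_data(self, data):
        if data.strip():
            self.items.append(data.strip())


def lenient_text(text):
    """Return the text nodes found by the lenient HTML parser."""
    parser = TextCollector()
    parser.feed(text)
    parser.close()
    return parser.items


print(parses_as_xml(SAMPLE))  # strict XML parsing rejects the unclosed tags
print(lenient_text(SAMPLE))   # lenient parsing still recovers the list text
```

The same markup that a strict XML parser rejects still yields both list items when parsed leniently, which is why HTML needs its own reader rather than an XML one.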
HTML Table Product and System Requirements
| Format | FME Platform | | | Operating System | | |
|---|---|---|---|---|---|---|
| Reader/Writer | FME Form | FME Flow | FME Flow Hosted | Windows 64-bit | Linux | Mac |
| Reader | Yes | Yes | Yes | Yes | Yes | Yes |
Reader Overview
The HTML Table Reader lists all of the table and list (ul and ol) elements in the HTML document, lets you select which tables or lists to read, and parses features from the selected elements.
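The idea of listing the table, ul, and ol elements in a document can be sketched in Python with the standard library's `html.parser`. This is only an illustration of the concept, not FME's implementation; the class `ContainerLister`, the function `list_containers`, and the labelling scheme (the element's `id` when present, otherwise the tag name plus a running index) are all hypothetical.

```python
from html.parser import HTMLParser


class ContainerLister(HTMLParser):
    """Records each table, ul, and ol element in document order."""

    TARGETS = ("table", "ul", "ol")

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in self.TARGETS:
            # Hypothetical label: the id attribute if present,
            # otherwise the tag name plus a running index.
            label = dict(attrs).get("id") or f"{tag}_{len(self.found)}"
            self.found.append((tag, label))


def list_containers(html_text):
    """Return (tag, label) pairs for every table and list element."""
    parser = ContainerLister()
    parser.feed(html_text)
    parser.close()
    return parser.found


doc = (
    '<table id="prices"><tr><td>1</td></tr></table>'
    "<ul><li>a</li></ul>"
    "<ol><li>b</li></ol>"
)
for tag, label in list_containers(doc):
    print(tag, label)
```

Note that this sketch also reports nested tables and lists as separate entries; whether a real reader should do so is a design choice that the sketch does not settle.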
FME Workbench Reader Dataset
The value for the Reader Dataset is the path or URL to an HTML document.
For additional information, see:
Schema Scanning
Because the values in an HTML table have no associated schema, each attribute is assigned the text data type by default. For lists, and for tables without a header row, generic attribute names are generated.
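The schema behaviour described above can be sketched in Python as follows. This is an illustrative approximation only: the generic name pattern `col0`, `col1`, ... is an assumption (the names FME actually generates are not stated here), and the functions `attribute_names` and `to_records` are hypothetical helpers that operate on rows already extracted as lists of cell values.

```python
def attribute_names(rows, has_header):
    """Pick attribute names from the header row, or generate generic ones."""
    width = max((len(row) for row in rows), default=0)
    header = rows[0] if has_header and rows else []
    names = []
    for i in range(width):
        cell = header[i].strip() if i < len(header) else ""
        # Assumed generic naming scheme: col0, col1, ...
        names.append(cell or f"col{i}")
    return names


def to_records(rows, has_header):
    """Build one record per data row, keeping every value as text."""
    names = attribute_names(rows, has_header)
    body = rows[1:] if has_header else rows
    return [dict(zip(names, (str(cell) for cell in row))) for row in body]


with_header = [["Name", "Count"], ["a", "1"], ["b", "2"]]
without_header = [["a", "1"], ["b", "2"]]

print(to_records(with_header, has_header=True))
print(to_records(without_header, has_header=False))
```

Every value stays a string, mirroring the default text type; any conversion to numeric or date types would be a separate, explicit step.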