HTML Table Reader Parameters
Source Dataset
HTML Document
The path or URL to an HTML document.
HTML File Extensions
By convention, HTML files have the extension .htm or .html. However, web URLs often have no file extension, or have one that reflects the script used to generate the HTML output, such as .php or .asp.
Note that URLs that generate HTML pages are valid datasets provided the request to the URL returns valid HTML. The HTML Table Reader allows any file extension when reading from disk.
Feature Type Configuration
This parameter specifies how the table feature types should be named.
The available options are:
- Element ID (default)
- Element Class
- Automatically Generated
These options refer to the id and class attributes of the HTML tags.
Note: If the requested attribute cannot be found on the tag, the reader will default to Automatically Generated, which produces sequential generic names.
Example feature type names:
<table id="mytable" class="data-table align-right">
...
</table>
From Element ID: mytable
From Element Class: .data-table.align-right
Automatically Generated: Table1
If any name is duplicated, numbers will be added for disambiguation:
.myclass
.myclass00
.myclass01
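The naming rules above can be illustrated with a minimal Python sketch. This is not the reader's implementation: the class TableNameCollector and the function feature_type_names are hypothetical names, and the sketch assumes that automatically generated names count only the tables that needed one and that duplicate suffixes start at 00.

```python
from html.parser import HTMLParser


class TableNameCollector(HTMLParser):
    """Hypothetical helper: derives one feature type name per <table> tag."""

    def __init__(self, mode="id"):
        super().__init__()
        self.mode = mode        # "id", "class", or "auto"
        self.names = []         # resulting feature type names, in document order
        self._seen = {}         # base name -> how many times it has been used
        self._auto_count = 0    # counter for automatically generated names

    def handle_starttag(self, tag, attrs):
        if tag != "table":
            return
        attributes = dict(attrs)
        base = None
        if self.mode == "id":
            base = attributes.get("id")
        elif self.mode == "class":
            classes = (attributes.get("class") or "").split()
            if classes:
                # Each class is prefixed with a dot, as in the example above.
                base = "." + ".".join(classes)
        if not base:
            # Requested attribute missing (or mode is "auto"):
            # fall back to a sequential generic name.
            self._auto_count += 1
            base = f"Table{self._auto_count}"
        self.names.append(self._disambiguate(base))

    def _disambiguate(self, base):
        # First use keeps the plain name; repeats get 00, 01, ... appended.
        count = self._seen.get(base, 0)
        self._seen[base] = count + 1
        if count == 0:
            return base
        return f"{base}{count - 1:02d}"


def feature_type_names(html, mode="id"):
    """Return the list of names for every table found in the HTML text."""
    collector = TableNameCollector(mode)
    collector.feed(html)
    collector.close()
    return collector.names
```

For a document containing the table from the example followed by three tables that share class="myclass", calling feature_type_names(html, "class") in this sketch yields .data-table.align-right, .myclass, .myclass00, and .myclass01.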
This parameter specifies how the list feature types should be named.
The available options are:
- Element ID (default)
- Element Class
- Automatically Generated
These options refer to the id and class attributes of the HTML tags.
Note: If the requested attribute cannot be found on the tag, the reader will default to Automatically Generated, which produces sequential generic names.
Removes HTML tags and leaves plain text.
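As a rough illustration of what removing tags and leaving plain text means, the following Python sketch strips markup from a cell's content using only the standard library. The names TextExtractor and strip_tags are hypothetical, and the reader's actual handling of whitespace and entities may differ.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: keeps text content and discards all tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; character references such as
        # &amp; are already decoded by the parser at this point.
        self.parts.append(data)


def strip_tags(fragment):
    """Return the plain text of an HTML fragment."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)
```

In this sketch, strip_tags('<b>Total</b>: <a href="#">42</a> items') returns the string Total: 42 items.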
This optional parameter presents a list of available ordered and unordered lists and allows the choice of one or more for reading.
Only the selected tables and lists will be read.
Schema Attributes
Use this parameter to expose Format Attributes in Workbench when you create a workspace:
- In a dynamic scenario, these attributes can be passed to the output dataset at runtime.
- In a non-dynamic scenario, this parameter allows you to expose additional attributes on multiple feature types.
Click the browse button to view the format attributes available for the reader (these differ for each format).