HTML Reader Parameters
Source Dataset
The path or URL to an HTML document.
Feature Type Configuration
This parameter specifies how the table feature types should be named.
The available options are Element ID (default), Element Class, and Automatically Generate. These options refer to the id and class attributes of the HTML tags.
If the requested attribute cannot be found on the tag, the reader will default to Automatically Generate, which generates sequential generic names.
Example feature type names:
<table id="mytable" class="data-table align-right">
...
</table>
From Element ID: mytable
From Element Class: .data-table.align-right
Automatically Generated: Table1
If any name is duplicated, numbers will be added for disambiguation:
.myclass
.myclass00
.myclass01
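The naming rules above can be sketched in Python. This is only an illustration of the described behavior, not the reader's actual implementation. The function names base_name and disambiguate are hypothetical, and the sketch assumes that the automatically generated number is the element's 1-based position in the document.

```python
def base_name(attrs, mode, index, kind="Table"):
    """Return a feature type name for one element, before disambiguation.

    attrs: dict of the tag's attributes, e.g. {"id": "mytable"}
    mode:  "id", "class", or "auto" (hypothetical option keys)
    index: 1-based position of the element (an assumption of this sketch)
    """
    if mode == "id" and attrs.get("id"):
        return attrs["id"]
    if mode == "class" and attrs.get("class"):
        # class="data-table align-right" becomes ".data-table.align-right"
        return "".join("." + name for name in attrs["class"].split())
    # Requested attribute missing, or mode is "auto": sequential generic name
    return f"{kind}{index}"


def disambiguate(names):
    """Append 00, 01, ... to the second and later occurrences of a name."""
    seen = {}
    result = []
    for name in names:
        count = seen.get(name, 0)
        result.append(name if count == 0 else f"{name}{count - 1:02d}")
        seen[name] = count + 1
    return result


table = {"id": "mytable", "class": "data-table align-right"}
print(base_name(table, "id", 1))
print(base_name(table, "class", 1))
print(base_name({}, "id", 1))
print(disambiguate([".myclass", ".myclass", ".myclass"]))
```

The sketch does not handle the case where a disambiguated name collides with a name that already exists in the document.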
This parameter specifies how the list feature types should be named. The available options are Element ID (default), Element Class, and Automatically Generate. These refer to the id and class attributes of the HTML tags in question. If the requested attribute cannot be found on the tag, the reader will fall back to the third option, Automatically Generate, which generates sequential generic names.
This parameter removes HTML tags from the content, leaving plain text.
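As a rough illustration of what removing tags means, the following Python sketch uses the standard library's html.parser module to keep only the text content of a fragment. The class TextExtractor and the function strip_tags are hypothetical names; the reader's own handling of whitespace and entities may differ.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes and ignore all tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(markup):
    """Return the plain text of an HTML fragment."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


print(strip_tags("<td><b>Total</b>: 42 &amp; up</td>"))
```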
This optional parameter lists the ordered and unordered lists available in the document and lets you select one or more of them for reading.
Only the selected tables and lists will be read.
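To show the idea of discovering candidates and then reading only a selection, here is a minimal Python sketch that lists the table, ul, and ol elements in a document and filters them by id. The names ElementLister and list_elements are hypothetical, and selecting by id is an assumption made for the example; it says nothing about how the reader identifies elements internally.

```python
from html.parser import HTMLParser


class ElementLister(HTMLParser):
    """Record every table, unordered list, and ordered list start tag."""

    TARGETS = {"table", "ul", "ol"}

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in self.TARGETS:
            # Store the tag name and its id attribute (None if absent)
            self.found.append((tag, dict(attrs).get("id")))


def list_elements(markup):
    """Return (tag, id) pairs in document order."""
    lister = ElementLister()
    lister.feed(markup)
    lister.close()
    return lister.found


doc = '<table id="t1"></table><ul id="nav"><li>a</li></ul><ol><li>b</li></ol>'
available = list_elements(doc)
selected_ids = {"t1", "nav"}  # hypothetical user selection
to_read = [item for item in available if item[1] in selected_ids]
print(to_read)
```

Nested lists are reported as separate entries in this sketch.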
Schema Attributes
Use this parameter to expose Format Attributes in Workbench when you create a workspace:
- In a dynamic scenario, these attributes can be passed to the output dataset at runtime.
- In a non-dynamic scenario where you have multiple feature types, it is convenient to expose additional attributes using this one parameter. For example, if you have ten feature types and want to expose the same attribute in each one, it is easier to define it once than it is to set each feature type individually in the workspace.