Reader Directives
The suffixes shown are prefixed by the current <ReaderKeyword> in a mapping file. By default, the <ReaderKeyword> for the CSV reader is CSV.
DATASET
Required/Optional: Required
This is the name of a folder containing one or more CSV files, or the name of a single CSV file. The default extension for CSV files is .csv.
Example:
CSV_DATASET /usr/data/csv/input
Workbench Parameter: Source Comma Separated Value (CSV) File(s)
GROUP_BY_DATASET
When the value is set to No, the only feature type this reader will use is the reader type name.
When the value is set to Yes, the feature type of each dataset is the filename (without the path or the extension) of the dataset.
Required/Optional: Required
Values: YES | NO (default)
Mapping File Syntax:
GROUP_BY_DATASET "Yes"
Workbench Parameter: Feature Type Name(s)
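The two GROUP_BY_DATASET settings can be illustrated with a minimal Python sketch. It is not FME's implementation: `feature_type_name` and `group_files` are hypothetical helpers, and the reader type name is passed as the placeholder "CSV".

```python
from pathlib import Path


def feature_type_name(dataset_path, group_by_dataset, reader_type="CSV"):
    """Feature type assigned to rows read from dataset_path."""
    if not group_by_dataset:
        # NO (default): every feature uses the reader type name.
        return reader_type
    # YES: the filename without its path or extension.
    return Path(dataset_path).stem


def group_files(paths, group_by_dataset, reader_type="CSV"):
    """Group input files by the feature type their rows would receive."""
    groups = {}
    for path in paths:
        name = feature_type_name(path, group_by_dataset, reader_type)
        groups.setdefault(name, []).append(path)
    return groups
```

With NO, all files collapse into one feature type; with YES, each file becomes its own feature type.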
DEF
Required/Optional: Required
Each CSV file must be defined before it can be read. The definition contains the file’s base name without any of the extensions, followed by the names and types of the attributes. There may be many DEF lines, one for each file to be read. If this is not specified, then all defined CSV files in the folder are read.
The syntax of a CSV DEF line is:
<ReaderKeyword>_DEF <baseName> \ [<attrName> <attrType>]+
The following table shows the attribute types supported.
Field Type | Description
char(<width>) | Character fields store fixed-length strings. The …
date | Date fields store dates as character strings in the format YYYYMMDD.
number(<width>,<decimals>) | Number fields store single- and double-precision floating-point values. The …
float | Float fields store floating-point values. The precision and width of the field cannot be specified.
int | Integer fields store 32-bit signed integers.
logical | Logical fields store TRUE/FALSE data. Data read from or written to such fields must always have a value of either TRUE or FALSE.
text | Text fields store unbounded character strings.
x coordinate, y coordinate, z coordinate | Coordinate fields store 64-bit floating-point values used to create the x, y, and optionally z coordinates of a point geometry for the row. The point geometry is created in addition to the coordinate-typed attributes.
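To make the attribute types more concrete, here is a minimal Python sketch of how raw CSV strings could be converted according to the types in the table. It is an illustration under assumptions, not the reader's actual code: the type spellings `x_coordinate`, `y_coordinate`, and `z_coordinate` are hypothetical stand-ins for the coordinate types, and width enforcement for char and number fields is omitted.

```python
from datetime import datetime

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
# Hypothetical spellings for the coordinate types; check the reader's real names.
COORD_TYPES = ("x_coordinate", "y_coordinate", "z_coordinate")


def convert(value, attr_type):
    """Convert one raw CSV string according to a DEF-line attribute type."""
    t = attr_type.lower()
    if t == "date":
        return datetime.strptime(value, "%Y%m%d").date()  # YYYYMMDD string
    if t == "int":
        n = int(value)
        if not INT32_MIN <= n <= INT32_MAX:
            raise ValueError(f"{n} does not fit in a 32-bit signed integer")
        return n
    if t == "logical":
        if value not in ("TRUE", "FALSE"):
            raise ValueError(f"logical value must be TRUE or FALSE, got {value!r}")
        return value == "TRUE"
    if t == "float" or t.startswith("number(") or t in COORD_TYPES:
        return float(value)
    return value  # char(<width>) and text stay as strings


def build_row(schema, values):
    """Return (attributes, point); point is None unless x and y coordinates exist."""
    attrs = {name: convert(v, t) for (name, t), v in zip(schema, values)}
    # Coordinate attributes are kept AND used to build the point geometry.
    coords = {t.lower(): attrs[name] for name, t in schema
              if t.lower() in COORD_TYPES and name in attrs}
    if "x_coordinate" not in coords or "y_coordinate" not in coords:
        return attrs, None
    point = (coords["x_coordinate"], coords["y_coordinate"])
    if "z_coordinate" in coords:
        point += (coords["z_coordinate"],)
    return attrs, point
```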
The following table shows all the DEF line directives that are supported by the CSV reader. Each of these directives has the same meaning as the global CSV reader keyword with the same suffix. Any value specified on a DEF line will override values defined for equivalent global directives, as they apply to the table being defined.
DEF Line Directive | Description | Required/Optional
CSV_FIELD_NAMES <yes|no> | See FIELD_NAMES for details. | Optional
CSV_FIELD_NAMES_AFTER_HEADER <yes|no> | See FIELD_NAMES_AFTER_HEADER for details. | Optional
CSV_SEPARATOR (<separator>) | See SEPARATOR for details. | Optional
CSV_SKIP_LINES <number> | See SKIP_LINES for details. | Optional
CSV_STRIP_QUOTES <yes|no> | See STRIP_QUOTES for details. | Optional
CSV_DUPLICATE_DELIMS <yes|no> | See DUPLICATE_DELIMS for details. | Optional
CSV_EXTENSION <extension> | See EXTENSION for details. | Optional
CSV_ENCODING <encoding> | See ENCODING for details. | Optional
The following mapping file fragment defines a CSV file called roads. It sets ‘?’ as the separator character for the columns in the file and indicates that the file does not contain a row of field names.
CSV_DEF roads \
  CSV_SEPARATOR (?) \
  CSV_FIELD_NAMES no \
  id_num number(11,0) \
  type char(20)
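How a DEF line separates per-table directives from attribute definitions, and how those directives override the global ones, can be sketched in Python. This is a hypothetical parser written for illustration, not FME's: it recognizes only the directive suffixes listed in the table above and assumes values contain no spaces.

```python
# Directive suffixes allowed on a DEF line (from the table above).
DEF_DIRECTIVES = {"FIELD_NAMES", "FIELD_NAMES_AFTER_HEADER", "SEPARATOR", "SKIP_LINES",
                  "STRIP_QUOTES", "DUPLICATE_DELIMS", "EXTENSION", "ENCODING"}


def parse_def_line(line, keyword="CSV"):
    """Split a DEF line into (base name, DEF-line directives, [(attr name, attr type)])."""
    # A standalone backslash is a line continuation, not a value.
    tokens = [tok for tok in line.split() if tok != "\\"]
    if not tokens or tokens[0] != f"{keyword}_DEF":
        raise ValueError("not a DEF line")
    base_name, rest = tokens[1], tokens[2:]
    if len(rest) % 2:
        raise ValueError("expected name/value pairs after the base name")
    directives, attrs = {}, []
    for name, value in zip(rest[0::2], rest[1::2]):
        prefix = keyword + "_"
        suffix = name[len(prefix):] if name.startswith(prefix) else None
        if suffix in DEF_DIRECTIVES:
            directives[suffix] = value
        else:
            attrs.append((name, value))
    return base_name, directives, attrs


def effective_directives(global_directives, def_directives):
    """DEF-line values override the equivalent global directives for this table."""
    return {**global_directives, **def_directives}
```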
IDs
Required/Optional: Optional
This directive limits which of the defined and available CSV files are read. If no IDs are specified, then all defined and available CSV files in the folder are read.
The syntax of the IDs keyword is:
<ReaderKeyword>_IDs <baseName> \ <baseName1> … \ <baseNameN>
The basenames must match those used in DEF lines.
The example below selects only the roads CSV file for input during a translation:
CSV_IDs roads
Workbench Parameter: Feature Types to Read
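The selection rule for IDs can be sketched in Python as follows. `files_to_read` is a hypothetical helper; treating an ID that matches no DEF line as an error is this sketch's own choice, since the text only says the basenames must match.

```python
def files_to_read(defined, available, ids=None):
    """Defined base names that exist in the folder, limited to IDs if any are given."""
    present = [name for name in defined if name in available]
    if not ids:
        return present  # no IDs: every defined and available file is read
    unknown = set(ids) - set(defined)
    if unknown:
        # Assumption: this sketch rejects IDs that match no DEF line.
        raise ValueError(f"IDs not defined on any DEF line: {sorted(unknown)}")
    wanted = set(ids)
    return [name for name in present if name in wanted]
```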
FIELD_NAMES
Required/Optional: Optional
If the CSV file contains its field (column) names, set this value to yes and the names are extracted from the file. Otherwise, with the setting no, the columns are given default names (col0, col1, ..., colN). The default is no.
Note: If FIELD_NAMES is set to yes, SKIP_LINES should also be set to skip at least one row; otherwise the first row will also be processed as a feature. You can also set FIELD_NAMES_AFTER_HEADER to yes. See FIELD_NAMES_AFTER_HEADER below for details.
Values: <yes | no>
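The interaction between FIELD_NAMES and SKIP_LINES described in the note can be seen in a short Python sketch. `read_table` is a hypothetical helper operating on rows that have already been split into fields.

```python
def read_table(rows, field_names=False, skip_lines=0):
    """Return (column names, features) for rows that are already split into fields."""
    if field_names and rows:
        names = list(rows[0])  # names taken from the first row of the file
    else:
        width = max((len(row) for row in rows), default=0)
        names = [f"col{i}" for i in range(width)]  # default names col0..colN
    # Rows are skipped only by SKIP_LINES, so the name row becomes a feature
    # unless at least one line is skipped.
    features = [dict(zip(names, row)) for row in rows[skip_lines:]]
    return names, features
```

Calling `read_table(rows, field_names=True)` without `skip_lines=1` returns the name row itself as the first feature, which is the pitfall the note warns about.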
FIELD_NAMES_AFTER_HEADER
Required/Optional: Optional
If the field names appear after the header information rather than before it, set FIELD_NAMES_AFTER_HEADER to yes. Otherwise, by default, the first line of the file is used as the field names.
Note: This parameter is ignored if FIELD_NAMES is not set or is set to no. If FIELD_NAMES_AFTER_HEADER is set to yes, SKIP_LINES should also be set to skip at least one row; otherwise the first row will also be processed as a feature.
Values: <yes | no>
SEPARATOR
Required/Optional: Optional
This directive identifies the separator used to divide the fields in the file. By default, a comma is used; however, any other single-character separator can also be specified. A tab separator is indicated by a backslash (\) followed by a “t”; for example:
CSV_SEPARATOR (\t)
Note: There must be a space between CSV_SEPARATOR and (<separator>). The begin and end parentheses are optional.
Values: (<separator>)
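The rules above (optional parentheses, a single character, and `\t` for tab) can be sketched in Python with the standard `csv` module. `parse_separator` and `split_rows` are hypothetical helpers; `csv.QUOTE_NONE` is used so that quotation marks stay in the data, leaving their removal to STRIP_QUOTES.

```python
import csv
import io


def parse_separator(spec):
    """Turn a SEPARATOR value such as '(\\t)', '(?)' or ',' into one character."""
    s = spec
    if len(s) >= 2 and s.startswith("(") and s.endswith(")"):
        s = s[1:-1]  # the parentheses are optional
    if s == r"\t":
        return "\t"  # backslash followed by t denotes a tab
    if len(s) != 1:
        raise ValueError(f"separator must be a single character, got {spec!r}")
    return s


def split_rows(text, spec=","):
    """Split text into rows of fields using the given SEPARATOR value."""
    reader = csv.reader(io.StringIO(text), delimiter=parse_separator(spec),
                        quoting=csv.QUOTE_NONE)
    return list(reader)
```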
SKIP_LINES
Required/Optional: Optional
This directive specifies the number of lines to skip at the top of the file. By default, no lines are skipped. Each skipped line is logged to the log file. This is useful if the CSV file contains a header line of field names or other descriptive material that should be skipped.
Values: <number>
Workbench Parameter: Number of Lines to Skip
SKIP_FOOTER
This directive specifies the number of footer lines to skip at the bottom of the file. By default, no footer lines are skipped. Each skipped footer line is logged to the log file.
This is useful if the file contains footer lines of descriptive material that should be skipped. This directive is ignored if reading the whole file at once. If reading backwards, the lines are skipped from the top of the file instead.
Required/Optional: Optional
Values: <number>
Workbench Parameter: Number of Footer Lines to Skip
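SKIP_LINES and SKIP_FOOTER together trim the top and bottom of the file, as in this minimal Python sketch. `body_lines` is a hypothetical helper; it ignores the whole-file and read-backwards cases described for SKIP_FOOTER, and takes a `log` callable in place of the real log file.

```python
def body_lines(lines, skip_lines=0, skip_footer=0, log=print):
    """Drop header and footer lines, logging each skipped line."""
    start = min(skip_lines, len(lines))
    # Never let the footer overlap lines already skipped as header.
    end = max(start, len(lines) - skip_footer)
    for line in lines[:start]:
        log(f"Skipped header line: {line}")
    for line in lines[end:]:
        log(f"Skipped footer line: {line}")
    return lines[start:end]
```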
STRIP_QUOTES
Required/Optional: Optional
Some CSV files place quotation marks around all the values they contain. Set this directive to yes to strip these quotes from each attribute value. The default is no.
Values: <yes|no>
Workbench Parameter: Strip Quotes from Fields
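A minimal Python sketch of quote stripping for a single value follows. `strip_quotes` is hypothetical. It assumes that only one matching pair of surrounding double or single quotes is removed and that doubled quotes inside the value are left unchanged; the reader's exact handling of embedded quotes is not described here.

```python
def strip_quotes(value):
    """Remove one pair of matching surrounding quotation marks, if present."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        return value[1:-1]
    return value
```

Applied to a row, this is simply `[strip_quotes(v) for v in row]`.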
DUPLICATE_DELIMS
Required/Optional: Optional
This directive specifies whether duplicate delimiters are treated as a single delimiter. If it is set to yes, multiple contiguous delimiters are treated as a single delimiter; otherwise, each delimiter is treated as delimiting a separate field.
Values: <yes|no>
Workbench Parameter: Skip Duplicate Delimiters
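The difference between the two settings can be shown with a short Python sketch. `split_fields` is a hypothetical helper that ignores quoting. With yes, a run of delimiters acts as one delimiter, so the empty fields between them disappear.

```python
import re


def split_fields(line, sep=",", duplicate_delims=False):
    """Split one line into fields, optionally collapsing runs of the delimiter."""
    if not duplicate_delims:
        return line.split(sep)  # each delimiter starts a new field
    # A run of one or more delimiters counts as a single delimiter.
    return re.split(f"(?:{re.escape(sep)})+", line)
```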
SCAN_MAX_FEATURES
Required/Optional: Optional
This directive specifies the maximum number of rows to scan to determine both the number of fields and the field lengths of the resulting source feature type. It does not affect the preview in the source parameter dialog, which always scans a fixed number of rows. If it is set to 0, the entire file is read to determine the schema.
Values: <number of lines>
Workbench Parameter: Maximum Lines to Scan (0 for no limit)
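Schema scanning can be sketched in Python as follows. `scan_schema` is a hypothetical helper that takes rows already split into fields and records the field count and the longest value seen in each column within the scan limit.

```python
from itertools import islice


def scan_schema(rows, scan_max_features=0):
    """Return (field count, max length per field) from the first N rows; 0 scans all."""
    scanned = rows if scan_max_features == 0 else islice(rows, scan_max_features)
    widths = []
    for row in scanned:
        if len(row) > len(widths):
            widths.extend([0] * (len(row) - len(widths)))  # a wider row adds fields
        for i, value in enumerate(row):
            widths[i] = max(widths[i], len(value))
    return len(widths), widths
```

Rows beyond the limit do not influence the schema, so a long value appearing late in the file is only accounted for when the limit is 0 or large enough.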
EXTENSION
Required/Optional: Optional
This directive specifies the file extension of the files to be read. The default is .csv.
Values: <.extension> (include the period (.) in front of the extension name)
Default: .csv
SORT_GROUP
This directive defines whether sorting is enabled. If it is not present, sorting is enabled whenever SORT_PARAMS has a value. If it is present, it controls whether sorting is enabled.
Required/Optional: Optional
Values: <yes|no>
SORT_PARAMS
This directive defines the fields, methods, and directions used to sort the CSV dataset. If it is present, the reader sorts the features and outputs them in the order defined by its values.
Required/Optional: Optional
Values: <field name>,<numeric|alphabetic>,<ascending|descending>
Workbench Parameter: Sort
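The SORT_GROUP and SORT_PARAMS rules can be sketched in Python. `sorting_enabled` and `sort_rows` are hypothetical helpers. The sketch assumes that several sort keys are supplied as a list of `<field name>,<method>,<direction>` strings with the primary key first; how multiple keys are actually encoded in the directive value is not specified here.

```python
def sorting_enabled(sort_group, sort_params):
    """SORT_GROUP decides when present; otherwise sort whenever SORT_PARAMS has a value."""
    if sort_group is None:
        return bool(sort_params)
    return sort_group.strip().lower() == "yes"


def sort_rows(rows, sort_params):
    """Sort dict rows by '<field>,<numeric|alphabetic>,<ascending|descending>' specs.

    Python's sort is stable, so sorting by the least significant key first and
    the primary (first) key last produces a multi-key sort.
    """
    result = list(rows)
    for spec in reversed(sort_params):
        field, method, direction = (part.strip() for part in spec.split(","))
        if method.lower() == "numeric":
            key = lambda row, f=field: float(row[f])  # compare as numbers
        else:
            key = lambda row, f=field: row[f]  # compare as strings
        result.sort(key=key, reverse=direction.lower() == "descending")
    return result
```

The numeric/alphabetic choice matters for digit strings: alphabetically "10" sorts before "9", numerically it sorts after.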
FILTER_GROUP
This directive defines whether filtering is enabled. If it is not present, filtering is enabled whenever FILTER_ATTRS has a value. If it is present, it controls whether filtering is enabled.
Required/Optional: Optional
Values: <yes|no>
FILTER_ATTRS
This directive defines which fields of the CSV dataset will be used to filter rows on read. The FILTER_REGEX directive specifies the regular expression that will be applied to the fields in FILTER_ATTRS. Depending on the value of the FILTER_UNMATCHED directive, the CSV reader will either ignore rows that match the filter, or only read rows that do match the filter.
Required/Optional: Optional
Values: <field name1> <field name2> …
Workbench Parameter: Field Names
FILTER_UNMATCHED
If FILTER_ATTRS is specified, this directive defines whether the CSV reader skips rows that match FILTER_REGEX or reads only those rows. If FILTER_UNMATCHED is yes, the CSV reader skips any rows whose values for the fields in FILTER_ATTRS match FILTER_REGEX. If FILTER_UNMATCHED is no, the CSV reader reads only the rows whose values for the fields in FILTER_ATTRS match FILTER_REGEX.
Required/Optional: Optional
Values: <yes|no>
Workbench Parameter: Filter Matching
FILTER
If FILTER_ATTRS is specified, this directive defines the regular expression string that will be compared against the values of fields specified in FILTER_ATTRS during filtering.
Required/Optional: Optional
Values: <regular expression string>
Workbench Parameter: Filter
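The filtering directives can be combined in a short Python sketch. `filtering_enabled` and `filter_rows` are hypothetical helpers. Two details are assumptions rather than documented behavior: a row counts as matching if any listed field matches, and matching uses `re.search` (a match anywhere in the value).

```python
import re


def filtering_enabled(filter_group, filter_attrs):
    """FILTER_GROUP decides when present; otherwise filter whenever FILTER_ATTRS has a value."""
    if filter_group is None:
        return bool(filter_attrs)
    return filter_group.strip().lower() == "yes"


def filter_rows(rows, filter_attrs, pattern, filter_unmatched=False):
    """Skip matching rows (filter_unmatched=True) or keep only matching rows (False)."""
    regex = re.compile(pattern)

    def matches(row):
        # Assumption: a row matches if ANY listed field matches the expression.
        return any(regex.search(row.get(field, "")) for field in filter_attrs)

    return [row for row in rows if matches(row) != filter_unmatched]
```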
ENCODING
This directive specifies the character encoding to use when reading the file.
Required/Optional: Optional
Values: <encoding>
Workbench Parameter: Character Encoding
Supported encodings:
UTF-8, UTF-16, UTF-16LE, UTF-16BE, UTF-32, UTF-32LE, UTF-32BE
ANSI, BIG5, SJIS
CP437, CP708, CP720, CP737, CP775, CP850, CP852, CP855, CP857, CP860, CP861, CP862, CP863, CP864, CP865, CP866, CP869, CP874, CP932, CP936, CP950, CP1250, CP1251, CP1252, CP1253, CP1254, CP1255, CP1256, CP1257, CP1258
ISO8859-1, ISO8859-2, ISO8859-3, ISO8859-4, ISO8859-5, ISO8859-6, ISO8859-7, ISO8859-8, ISO8859-9, ISO8859-13, ISO8859-15
Windows-874, Windows-949, Windows-10000, Windows-10006, Windows-10007, Windows-10029