Reader Directives

The suffixes shown are prefixed by the current <ReaderKeyword> in a mapping file. By default, the <ReaderKeyword> for the CSV reader is CSV.

DATASET

Required/Optional: Required

This is the name of a folder containing one or more CSV files, or the name of a single CSV file. The default extension for CSV files is .csv.

Example:

CSV_DATASET /usr/data/csv/input

Workbench Parameter: Source Comma Separated Value (CSV) File(s)

GROUP_BY_DATASET

When the value is set to No, the only feature type this reader will use is the reader type name.

When the value is set to Yes, the feature type of each dataset is the filename (without the path or the extension) of the dataset.

Required/Optional

Required

Values

YES | NO (default)

Mapping File Syntax

GROUP_BY_DATASET "Yes"

Workbench Parameter

Feature Type Name(s)
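The effect of the two settings can be sketched in Python. This is an illustration only, not FME code: the function name feature_type_name is hypothetical, and the reader type name is assumed to be CSV.

```python
from pathlib import PurePosixPath


def feature_type_name(dataset_path, group_by_dataset, reader_type_name="CSV"):
    """Return the feature type name for one dataset (illustrative only)."""
    if not group_by_dataset:
        # GROUP_BY_DATASET No: every dataset shares the reader type name.
        return reader_type_name
    # GROUP_BY_DATASET Yes: filename without the path or the extension.
    return PurePosixPath(dataset_path).stem


print(feature_type_name("/usr/data/csv/input/roads.csv", True))
print(feature_type_name("/usr/data/csv/input/roads.csv", False))
```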

DEF

Required/Optional

Required

Each CSV file must be defined before it can be read. The definition contains the file’s base name without any of the extensions, followed by the names and types of the attributes. There may be many DEF lines, one for each file to be read. If this is not specified, then all defined CSV files in the folder are read.

The syntax of a CSV DEF line is:

<ReaderKeyword>_DEF <baseName>	\
	[<attrName> <attrType>]+

The following table shows the attribute types supported.

Field Type

Description

char(<width>)

Character fields store fixed length strings. The width parameter controls the maximum number of characters that can be stored by the field. No padding is required for strings shorter than this width.

date

Date fields store dates as character strings in the format YYYYMMDD.

number(<width>,<decimals>)

Number fields store single and double precision floating point values. The width parameter is the total number of characters allocated to the field, including the decimal point. The decimals parameter controls the precision of the data and is the number of digits to the right of the decimal.

float

Float fields store floating point values. There is no ability to specify the precision and width of the field.

int

Integer fields store 32-bit signed integers.

logical

Logical fields store TRUE/FALSE data. Data read or written from and to such fields must always have a value of either true or false.

text

Text fields store unbounded character data strings.

x coordinate

y coordinate

z coordinate

Coordinate fields store 64-bit floating point values used to create the x, y and optionally z coordinates of a point geometry for the row. The point geometry is created in addition to the attributes of coordinate types.
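As a rough illustration of how the width and decimals parameters of number(<width>,<decimals>) relate, the Python sketch below checks whether a value fits a given definition. The helper fits_number_field is hypothetical, and counting a minus sign toward the width is an assumption, not documented behavior.

```python
def fits_number_field(value, width, decimals):
    """Check whether value fits number(width, decimals) (illustrative only)."""
    # Render with the requested number of digits after the decimal point.
    text = f"{value:.{decimals}f}"
    # The width counts every character, including the decimal point.
    return len(text) <= width


print(fits_number_field(12.5, 6, 2))    # "12.50" uses 5 of 6 characters
print(fits_number_field(1234.5, 6, 2))  # "1234.50" needs 7 characters
```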

The following table shows all the DEF line directives that are supported by the CSV reader. Each of these directives has the same meaning as the global CSV reader keyword with the same suffix. Any value specified on a DEF line will override values defined for equivalent global directives, as they apply to the table being defined.

DEF Line Directives

Value

Required/Optional

CSV_FIELD_NAMES

<yes|no>

See FIELD_NAMES for details.

Optional

CSV_FIELD_NAMES_AFTER_HEADER

<yes|no>

See FIELD_NAMES_AFTER_HEADER for details.

Optional

CSV_SEPARATOR

(<separator>)

See SEPARATOR for details.

Optional

CSV_SKIP_LINES

<number>

See SKIP_LINES for details.

Optional

CSV_STRIP_QUOTES

<yes|no>

See STRIP_QUOTES for details.

Optional

CSV_DUPLICATE_DELIMS

<yes|no>

See DUPLICATE_DELIMS for details.

Optional

CSV_EXTENSION

<extension>

See EXTENSION for details.

Optional

CSV_ENCODING

<encoding>

See ENCODING for details.

Optional

The following mapping file fragment defines a CSV file called roads. It sets ‘?’ as the column separator and sets CSV_FIELD_NAMES to no, so field names are not extracted from the file.

CSV_DEF roads \
	CSV_SEPARATOR	(?) \
	CSV_FIELD_NAMES	no \
	id_num	number(11,0) \
	type  char(20)
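To make the definition above concrete, the following Python sketch reads data laid out the way the roads DEF line describes: ‘?’ as the separator, no field-name row, and two attributes. The sample rows are invented for illustration, and mapping number(11,0) to a Python int is an assumption.

```python
import csv
import io

# Invented sample content for a hypothetical roads.csv.
SAMPLE = "101?highway\n102?local\n"

# Attribute names from the DEF line, with an assumed Python type for each.
ATTRIBUTES = [("id_num", int), ("type", str)]


def read_roads(text):
    """Parse '?'-separated rows into dictionaries keyed by attribute name."""
    reader = csv.reader(io.StringIO(text), delimiter="?")
    return [
        {name: cast(value) for (name, cast), value in zip(ATTRIBUTES, row)}
        for row in reader
    ]


for feature in read_roads(SAMPLE):
    print(feature)
```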

IDs

Required/Optional: Optional

This specification limits the available and defined CSV files read. If no IDs are specified, then all defined and available CSV files in the folder are read.

The syntax of the IDs keyword is:

<ReaderKeyword>_IDs  <baseName>	\
	<baseName1> … 	\
	<baseNameN> 

The basenames must match those used in DEF lines.

The example below selects only the roads CSV file for input during a translation:

CSV_IDs roads

Workbench Parameter

Feature Types to Read

FIELD_NAMES

Required/Optional: Optional

If the field or column names of the CSV table are specified in the file, then set this value to yes and the names will be extracted from the file. Otherwise, the columns of the CSV table are given default names (i.e. col0, col1, ... , colN) with the setting no. The default is no.

Note: If FIELD_NAMES is set to yes, SKIP_LINES should also be set to skip at least one row, or the first row will also be processed as a feature. You can also set FIELD_NAMES_AFTER_HEADER to yes. See FIELD_NAMES_AFTER_HEADER below for details.

Values: <yes | no>
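The interaction between FIELD_NAMES and SKIP_LINES can be sketched as follows. This Python fragment is one reading of the note above, not the reader's implementation; name_columns and its parameters are hypothetical.

```python
def name_columns(rows, field_names=False, skip_lines=0):
    """Return (column names, data rows) for already-split rows."""
    if field_names:
        # FIELD_NAMES yes: names are taken from the first line of the file.
        names = rows[0]
    else:
        # FIELD_NAMES no: default names col0, col1, ..., colN.
        widest = max(len(row) for row in rows)
        names = [f"col{i}" for i in range(widest)]
    # Only skipped lines are excluded from the data. With field_names=True
    # and skip_lines=0, the name row is also returned as a data row.
    return names, rows[skip_lines:]


rows = [["id", "type"], ["1", "road"]]
print(name_columns(rows, field_names=True, skip_lines=1))
print(name_columns(rows, field_names=True, skip_lines=0))
print(name_columns(rows))
```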

FIELD_NAMES_AFTER_HEADER

Required/Optional

Optional

If the column/field names are AFTER the header information instead of BEFORE, then you can set FIELD_NAMES_AFTER_HEADER to yes. Otherwise, by default, the first line of the file will be used as the column/field names.

Note: This parameter is ignored if FIELD_NAMES is not set, or it is set to no. If FIELD_NAMES_AFTER_HEADER is set to yes, SKIP_LINES should also be set to skip at least one row, or the first row will also be processed as a feature.

Values: <yes | no>

SEPARATOR

Required/Optional: Optional

A special field is listed to identify the separator used to divide the fields in the file. By default, a comma is used; however, different one-character separators can also be specified. Tab character separators are indicated by a backslash (\) followed by a “t”; for example:

	CSV_SEPARATOR (\t)

Note: There must be a space between CSV_SEPARATOR and (<separator>). The begin and end parentheses are optional.

Values: (<separator>)
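The rules for the separator value (optional parentheses, a single character, \t for a tab) can be expressed as a short Python sketch. parse_separator is a hypothetical helper, and raising an error for longer values is an assumption.

```python
def parse_separator(value):
    """Turn a SEPARATOR value such as '(\\t)' or '?' into one character."""
    text = value.strip()
    # The begin and end parentheses are optional.
    if len(text) >= 2 and text.startswith("(") and text.endswith(")"):
        text = text[1:-1]
    # A backslash followed by "t" denotes the tab character.
    if text == "\\t":
        return "\t"
    if len(text) != 1:
        raise ValueError(f"expected a one-character separator, got {value!r}")
    return text


print(repr(parse_separator("(\\t)")))
print(repr(parse_separator("(?)")))
print(repr(parse_separator(",")))
```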

SKIP_LINES

Required/Optional: Optional

This field can be listed to indicate the number of lines to skip at the top of the file. By default, no lines are skipped. Each line skipped is logged to the log file. This is useful if the CSV file contains a header line of field names or other descriptive material that should be skipped.

Values: <number>

Workbench Parameter: Number of Lines to Skip

SKIP_FOOTER

This field can be listed to indicate the number of footer lines to skip at the bottom of the file. By default, no footer lines are skipped. Each footer line skipped is logged to the log file.

This is useful if the file contains footer lines of descriptive material that should be skipped. This directive is ignored if reading the whole file at once. If reading backwards, the lines are skipped from the top of the file instead.

Required/Optional

Optional

Values

<number>

Workbench Parameter

Number of Footer Lines to Skip
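Taken together, SKIP_LINES and SKIP_FOOTER trim lines from both ends of the file. A minimal Python sketch of that trimming, assuming forward reading and using the hypothetical helper trim_lines:

```python
def trim_lines(lines, skip_lines=0, skip_footer=0):
    """Drop skip_lines from the top and skip_footer from the bottom."""
    end = len(lines) - skip_footer
    # Guard against skipping more lines than the file contains.
    return lines[skip_lines:max(end, skip_lines)]


lines = ["Road report", "id,type", "1,road", "2,path", "Total: 2"]
print(trim_lines(lines, skip_lines=2, skip_footer=1))
```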

STRIP_QUOTES

Required/Optional: Optional

Some CSV files place quotation marks around all values they contain. Setting this directive to yes strips these quotes from each attribute. The default is no.

Values: <yes|no>

Workbench Parameter: Strip Quotes from Fields
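As an illustration, stripping could look like the Python sketch below. It assumes that only a matching pair of surrounding double quotes is removed; strip_quotes is a hypothetical helper.

```python
def strip_quotes(value):
    """Remove one pair of surrounding double quotes, if present."""
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    return value


print([strip_quotes(v) for v in ['"Main St"', '"42"', "plain"]])
```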

DUPLICATE_DELIMS

Required/Optional: Optional

This field can be listed to indicate if duplicate delimiters are to be treated as a single delimiter. If set to yes then multiple contiguous delimiters are treated as a single delimiter; otherwise, each delimiter is treated as if it delimits a different field.

Values: <yes|no>

Workbench Parameter: Skip Duplicate Delimiters
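The difference between the two settings is easiest to see on a line with consecutive delimiters. The Python sketch below is illustrative only; split_fields is a hypothetical helper and quoting is ignored.

```python
import re


def split_fields(line, separator=",", duplicate_delims=False):
    """Split a line, optionally treating runs of the separator as one."""
    if duplicate_delims:
        # One or more contiguous separators act as a single delimiter.
        return re.split(f"(?:{re.escape(separator)})+", line)
    # Each separator delimits a field, so empty fields are preserved.
    return line.split(separator)


print(split_fields("a,,b"))
print(split_fields("a,,b", duplicate_delims=True))
```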

SCAN_MAX_FEATURES

Required/Optional

Optional

This field can be listed to indicate the maximum number of rows to scan to obtain both the number of fields and the lengths of fields in the resultant source feature type. This does not affect the preview in the source parameter dialog, which always scans a fixed number of rows. If set to a value of 0, the entire file will be read to determine the schema.

Values

<number of lines>

Workbench Parameter

Maximum Lines to Scan (0 for no limit)
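The scan can be pictured as follows: a Python sketch that derives the number of fields and the longest value per field from the first N rows, with 0 meaning all rows. scan_schema is a hypothetical helper and does not reflect how the reader infers attribute types.

```python
def scan_schema(rows, scan_max_features=0):
    """Return the maximum value length for each field in the scanned rows."""
    # 0 means no limit: scan every row.
    sample = rows if scan_max_features == 0 else rows[:scan_max_features]
    field_count = max((len(row) for row in sample), default=0)
    return [
        max((len(row[i]) for row in sample if i < len(row)), default=0)
        for i in range(field_count)
    ]


rows = [["a", "bb"], ["ccc", "d", "eeee"]]
print(scan_schema(rows, scan_max_features=1))
print(scan_schema(rows, scan_max_features=0))
```

With a limit of 1, the third field and the longer values in the second row are missed, which is the trade-off a low limit makes for speed.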

EXTENSION

Required/Optional: Optional

This specifies the extension of the files to be read. The default is .csv.

Values

<.extension> (Include the period (.) in front of the extension name.)

Default

.csv

SORT_GROUP

This directive defines whether sorting is enabled. If not present, sorting will be enabled if SORT_PARAMS has a value. Otherwise, if this directive is present, then it controls whether or not sorting is enabled.

Required/Optional

Optional

Values

<yes|no>

SORT_PARAMS

This directive defines the fields, methods, and directions to use to sort the CSV dataset. If this parameter is present, features will be produced in the order defined by the parameter values. Once sorting is complete, features will be output from the reader in sorted order.

Required/Optional

Optional

Values

<field name>,<numeric|alphabetic>,<ascending|descending>

Workbench Parameter

Sort
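The difference between numeric and alphabetic sorting, and between the two directions, can be sketched in Python. sort_rows is a hypothetical helper; it assumes a single <field name>,<method>,<direction> triple with no commas in the field name.

```python
def sort_rows(rows, sort_params):
    """Sort rows (dictionaries of strings) by one SORT_PARAMS-style triple."""
    field, method, direction = sort_params.split(",")
    if method == "numeric":
        key = lambda row: float(row[field])
    else:
        key = lambda row: row[field]
    return sorted(rows, key=key, reverse=(direction == "descending"))


rows = [{"id": "10"}, {"id": "9"}, {"id": "100"}]
print([r["id"] for r in sort_rows(rows, "id,numeric,ascending")])
print([r["id"] for r in sort_rows(rows, "id,alphabetic,ascending")])
```

Numeric sorting orders the values as 9, 10, 100, while alphabetic sorting compares them as text and yields 10, 100, 9.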

FILTER_GROUP

This directive defines whether filtering is enabled. If not present, filtering will be enabled if FILTER_ATTRS has a value. Otherwise, if this directive is present, then it controls whether or not filtering is enabled.

Required/Optional

Optional

Values

<yes|no>

FILTER_ATTRS

This directive defines which fields of the CSV dataset will be used to filter rows on read. The FILTER_REGEX directive specifies the regular expression that will be applied to the fields in FILTER_ATTRS. Depending on the value of the FILTER_UNMATCHED directive, the CSV reader will either ignore rows that match the filter, or only read rows that do match the filter.

Required/Optional

Optional

Values

<field name1> <field name2> …

Workbench Parameter

Field Names

FILTER_UNMATCHED

If FILTER_ATTRS is specified, this directive defines whether the CSV reader will skip rows that match FILTER_REGEX or exclusively read the matching rows. If FILTER_UNMATCHED is yes, then the CSV reader will skip any rows that match the FILTER_REGEX for the fields specified in FILTER_ATTRS. If FILTER_UNMATCHED is no, then the CSV reader will only read rows with values of FILTER_ATTRS that match FILTER_REGEX.

Required/Optional

Optional

Values

<yes|no>

Workbench Parameter

Filter Matching

FILTER

If FILTER_ATTRS is specified, this directive defines the regular expression string that will be compared against the values of fields specified in FILTER_ATTRS during filtering.

Required/Optional

Optional

Values

<regular expression string>

Workbench Parameter

Filter
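How the filter directives combine can be sketched in Python. This is an approximation under stated assumptions: filter_rows is a hypothetical helper, a row counts as matching when any listed field matches, and matching uses a search anywhere in the value. The reader's actual matching rules may differ.

```python
import re


def filter_rows(rows, filter_attrs, pattern, filter_unmatched):
    """Keep or drop rows based on a regular expression over some fields."""
    regex = re.compile(pattern)
    kept = []
    for row in rows:
        matched = any(regex.search(row[attr]) for attr in filter_attrs)
        # FILTER_UNMATCHED yes: skip matching rows.
        # FILTER_UNMATCHED no: read only matching rows.
        if matched != filter_unmatched:
            kept.append(row)
    return kept


rows = [{"type": "highway"}, {"type": "local"}]
print(filter_rows(rows, ["type"], "^high", filter_unmatched=False))
print(filter_rows(rows, ["type"], "^high", filter_unmatched=True))
```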

ENCODING

This specifies the file encoding to use when reading.

Required/Optional

Optional

Values

<encoding>

Workbench Parameter

Character Encoding

Encodings

UTF-8

UTF-16
UTF-16LE
UTF-16BE
UTF-32
UTF-32LE
UTF-32BE
ANSI
BIG5
SJIS
CP437
CP708
CP720
CP737
CP775
CP850
CP852
CP855
CP857
CP860
CP861
CP862
CP863
CP864
CP865
CP866
CP869
CP874
CP932
CP936
CP950
CP1250
CP1251
CP1252
CP1253
CP1254
CP1255
CP1256
CP1257
CP1258
ISO8859-1
ISO8859-2
ISO8859-3
ISO8859-4
ISO8859-5
ISO8859-6
ISO8859-7
ISO8859-8
ISO8859-9
ISO8859-13
ISO8859-15
Windows-874
Windows-949
Windows-10000
Windows-10006
Windows-10007
Windows-10029
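The encoding matters because the same bytes decode to different text, or fail to decode, under different encodings. The Python sketch below illustrates this with CP1252. Note that it uses Python's own codec names, and it is an assumption that a name in the list above corresponds to a Python codec of the same name. decode_rows is a hypothetical helper.

```python
def decode_rows(raw, encoding):
    """Decode raw bytes and split each line on commas."""
    return [line.split(",") for line in raw.decode(encoding).splitlines()]


# Sample bytes as a CP1252-encoded file would contain them.
raw = "café,1\n".encode("cp1252")

print(decode_rows(raw, "CP1252"))

try:
    decode_rows(raw, "UTF-8")
except UnicodeDecodeError as error:
    # The byte for "é" in CP1252 is not valid UTF-8 in this position.
    print("cannot decode as UTF-8:", error.reason)
```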