CSV (Comma Separated Value) Reader Parameters
Dataset Parameters
This parameter determines how the reader names feature types and, as a result, how many feature types it generates.
Feature type name choices:
- From File Name(s): Generates one feature type per source filename.
- From Format Name: Produces only a single feature type containing the format name.
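The two naming schemes can be sketched in Python as follows. This is an illustration, not FME's implementation. The function name `feature_type_names`, the scheme keys `"file"` and `"format"`, and the `"CSV"` format name are hypothetical. The sketch also assumes that a file-based name drops the directory and the extension.

```python
from pathlib import Path


def feature_type_names(paths, scheme, format_name="CSV"):
    """Return the feature type names a reader would generate (illustrative only).

    format_name="CSV" is a hypothetical stand-in for the format name.
    """
    if scheme == "file":
        # One feature type per source file; directory and extension are
        # dropped here by assumption.
        return [Path(p).stem for p in paths]
    if scheme == "format":
        # A single feature type named after the format, however many files are read.
        return [format_name]
    raise ValueError(f"unknown scheme: {scheme!r}")


print(feature_type_names(["data/roads.csv", "data/parcels.csv"], "file"))    # ['roads', 'parcels']
print(feature_type_names(["data/roads.csv", "data/parcels.csv"], "format"))  # ['CSV']
```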
Fields
The single character specified as the delimiter between values.
The line number that contains the field names. Note that the first line in the file is considered to be line number 1. If the file does not contain field names, leave this blank.
When the file does not contain field names, the columns of the CSV table are given default names (for example, col0, col1, ..., colN).
The line number at which the data starts. Note that the first line in the file is considered to be line number 1.
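The following minimal sketch shows how the delimiter, the field names line, and the data start line work together, including the 1-based line numbering and the default `colN` names. It uses Python's standard `csv` module. The function `read_rows` is hypothetical, and the sketch assumes that no quoted value spans more than one line.

```python
import csv


def read_rows(text, delimiter=",", field_names_line=None, data_start_line=1):
    """Hypothetical reader: line numbers are 1-based, as in the parameters above."""
    lines = text.splitlines()
    header = None
    if field_names_line is not None:
        # Line 1 is lines[0], so subtract one to index the list.
        header = next(csv.reader([lines[field_names_line - 1]], delimiter=delimiter))
    records = []
    for values in csv.reader(lines[data_start_line - 1:], delimiter=delimiter):
        # Without a field names line, fall back to default names col0, col1, ...
        names = header if header is not None else [f"col{i}" for i in range(len(values))]
        records.append(dict(zip(names, values)))
    return records


print(read_rows("name;city\nAda;London\nLin;Vancouver\n", ";", field_names_line=1, data_start_line=2))
print(read_rows("1;2\n3;4", ";"))  # [{'col0': '1', 'col1': '2'}, {'col0': '3', 'col1': '4'}]
```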
Advanced
If selected, multiple contiguous delimiters are treated as a single delimiter; otherwise, each delimiter is treated as if it delimits a different field.
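The difference between the two behaviours can be sketched for a single line that has no field qualifiers. This is a simplification: the function `split_fields` is hypothetical and does not handle qualified values.

```python
import re


def split_fields(line, delimiter=",", collapse=False):
    """Split one unqualified line (hypothetical helper; qualifiers are not handled)."""
    if collapse:
        # A run of consecutive delimiters acts as a single delimiter.
        return re.split(f"(?:{re.escape(delimiter)})+", line)
    # Each delimiter separates a field, so adjacent delimiters yield empty fields.
    return line.split(delimiter)


print(split_fields("a  b   c", " ", collapse=True))  # ['a', 'b', 'c']
print(split_fields("a  b   c", " "))                 # ['a', '', 'b', '', '', 'c']
```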
Specifies the character that encloses field values. When a field starts with this character, all text that follows this character and precedes the next occurrence of the character will be treated as one value, even if that text contains a delimiter or newline character.
For example, if the delimiter is a comma (,) and the field qualifier is a quotation mark ("), then the value
"Vancouver, BC"
will be treated as one value
Vancouver, BC
rather than two separate values
Vancouver
BC
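Python's standard `csv` module has an analogous `quotechar` option, so it can illustrate the "Vancouver, BC" example. This is an analogy only and does not show how FME itself parses files.

```python
import csv

line = '"Vancouver, BC",Canada'

# With " as the field qualifier, the comma inside the quotes stays part of the value.
qualified = next(csv.reader([line], delimiter=",", quotechar='"'))

# With quoting disabled, every comma splits a field and the quotes are kept as text.
unqualified = next(csv.reader([line], delimiter=",", quoting=csv.QUOTE_NONE))

print(qualified)    # ['Vancouver, BC', 'Canada']
print(unqualified)  # ['"Vancouver', ' BC"', 'Canada']
```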
Specifies the character that escapes the field qualifier character. Use it to include a literal field qualifier character inside a qualified value.
For example, if the field qualifier character is a quotation mark (") and the escape character is a backslash (\), then the value
"Vancouver \"Lotusland\", BC"
will be read as
Vancouver "Lotusland", BC
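The Python `csv` module's `escapechar` option behaves similarly and reproduces the "Lotusland" example above. As before, this is an analogy rather than FME's parser.

```python
import csv

line = r'"Vancouver \"Lotusland\", BC"'

# doublequote=False: a qualifier inside the value must be escaped with
# escapechar rather than written twice ("").
row = next(csv.reader([line], quotechar='"', escapechar="\\", doublequote=False))

print(row[0])  # Vancouver "Lotusland", BC
```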
Field Names
Specifies whether the field names should be matched against the schema in a case-sensitive or case-insensitive manner.
For example, suppose the schema contains the attribute "MyField" but the file contains the field "myfield". If field names are case-sensitive, they are considered not to match, and the attribute "MyField" will not be read. If field names are not case-sensitive, they are considered to match, and values from the "myfield" column will be read for the attribute "MyField".
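The matching rule in this example can be sketched as a lookup from schema attributes to file columns. The function `match_fields` is hypothetical. It uses Unicode case folding for the case-insensitive comparison, which is an assumption; FME's exact comparison rules may differ.

```python
def match_fields(schema_attrs, file_fields, case_sensitive=True):
    """Map each schema attribute to the file column it is read from (None = not read)."""
    def key(name):
        # Case-insensitive matching compares case-folded names (an assumption).
        return name if case_sensitive else name.casefold()

    lookup = {key(field): field for field in file_fields}
    return {attr: lookup.get(key(attr)) for attr in schema_attrs}


print(match_fields(["MyField"], ["myfield"], case_sensitive=True))   # {'MyField': None}
print(match_fields(["MyField"], ["myfield"], case_sensitive=False))  # {'MyField': 'myfield'}
```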
Specifies whether to enforce a strict schema.
- If this parameter is set to Yes and the fields in the file do not match the attributes on the schema in FME, the reader will fail.
- If this parameter is set to No, the reader will warn about any attributes that exist on the schema but are not present in the file, and will continue reading.
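A minimal sketch of the two modes follows. The function `check_schema` is hypothetical. The sketch assumes that a "mismatch" means either a schema attribute missing from the file or a file field absent from the schema. It raises an exception to stand in for the reader failing.

```python
import warnings


def check_schema(schema_attrs, file_fields, strict):
    """Hypothetical strict/non-strict schema check; returns the missing attributes."""
    missing = [a for a in schema_attrs if a not in file_fields]
    extra = [f for f in file_fields if f not in schema_attrs]
    if strict and (missing or extra):
        # Strict mode: any mismatch stops the read (modelled as an exception).
        raise ValueError(f"schema mismatch: missing={missing}, extra={extra}")
    for attr in missing:
        # Non-strict mode: warn about schema attributes absent from the file, then continue.
        warnings.warn(f"attribute {attr!r} is not present in the file")
    return missing
```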
Field Values
Specifies whether to trim the field qualifier character from values. These characters are trimmed only when they act as field qualifiers: that is, when the value begins with this character, the opening character and the next instance of that character are removed.
For example, if the field qualifier is a quotation mark ("), then the value
"Vancouver, BC" and "More"
will be read as
Vancouver, BC and "More"
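The trimming rule can be sketched as follows. The function `trim_qualifier` is hypothetical and assumes a single-character qualifier. Leaving a value untouched when it has no closing qualifier is also an assumption of this sketch.

```python
def trim_qualifier(value, qualifier='"'):
    """Remove the opening qualifier and its next occurrence (hypothetical helper)."""
    if not value.startswith(qualifier):
        return value  # the character is not acting as a field qualifier here
    end = value.find(qualifier, 1)
    if end == -1:
        return value  # no closing qualifier: left untouched (an assumption)
    return value[1:end] + value[end + 1:]


print(trim_qualifier('"Vancouver, BC" and "More"'))  # Vancouver, BC and "More"
```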
Encoding
This parameter applies when you are working with foreign (non-English) character sets. If your source data contains foreign characters, setting this parameter to the correct encoding ensures that the original data is preserved from the reader to the writer.
By default, the character encoding is detected automatically if the source file contains a Byte Order Mark (BOM). Otherwise, all input strings are read using the system encoding. If you select any other character encoding, it takes precedence over the automatically detected encoding.
Note that only UTF encodings are identified by the BOM; all other character sets must be specified explicitly, or they will be read using the system encoding.
FME supports most encodings.
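The detection order described above (explicit choice, then BOM, then system encoding) can be sketched in Python. The function `choose_encoding` is hypothetical. Modelling "system encoding" as `locale.getpreferredencoding(False)` is an assumption.

```python
import codecs
import locale

# Longer BOMs are checked first because the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def choose_encoding(head, explicit=None):
    """Pick a decoder name: explicit setting, then BOM, then system encoding."""
    if explicit:
        return explicit  # an explicitly selected encoding always wins
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name  # these codec names consume the BOM when decoding
    # No BOM: fall back to the system encoding (modelled here via locale).
    return locale.getpreferredencoding(False)


data = "é,ü".encode("utf-16")  # Python's utf-16 encoder writes a BOM
print(choose_encoding(data), data.decode(choose_encoding(data)))
```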
Specifies whether string attributes will be set in the file encoding.
- Yes: String attributes will always be in the encoding of the file.
- No: String attributes may be in the file encoding, but may also be in a Unicode encoding. Setting this parameter to No may improve performance when reading from an encoded file.
Skipped Lines
Specifies whether to read lines from the file that occur before the data start line. (Note: The field names line is never read as a feature.)
If set to Yes, the reader will produce features for these lines, with the attribute csv_skipped_line set to the content of that line.
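The following sketch models skipped-line features as dictionaries. The function `skipped_line_features` and the dict-per-feature representation are hypothetical; only the `csv_skipped_line` attribute name comes from the text above.

```python
def skipped_line_features(lines, data_start_line, field_names_line=None, read_skipped=True):
    """Features for lines before the data start line (1-based); the field names line is excluded."""
    if not read_skipped:
        return []
    return [
        {"csv_skipped_line": text}
        for number, text in enumerate(lines[:data_start_line - 1], start=1)
        if number != field_names_line  # the field names line never becomes a feature
    ]


lines = ["Exported from inventory system", "name,city", "Ada,London"]
print(skipped_line_features(lines, data_start_line=3, field_names_line=2))
# [{'csv_skipped_line': 'Exported from inventory system'}]
```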
If the field structure of the first several rows of a file is representative of the remainder of the file, this option can be set to prevent FME from unnecessarily reading further rows from a potentially large file when determining its schema.
If left blank, there will be no limit and all rows will be read.
Note: This setting only applies to the schema generation; it does not limit the number of rows read when the translation is run.
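The following sketch shows the difference between the rows scanned for the schema and the rows read during a translation. The function `rows_for_schema` is hypothetical, `None` stands for a blank setting, and the sketch does not model whether the field names row counts toward the limit.

```python
import csv
from itertools import islice


def rows_for_schema(reader, max_rows=None):
    """Rows examined when building the schema; None models a blank setting (no limit)."""
    if max_rows is None:
        return list(reader)
    return list(islice(reader, max_rows))


lines = ["1,Ada", "2,Lin", "3,Kai", "4,Sam"]
schema_rows = rows_for_schema(csv.reader(lines), max_rows=2)  # limited schema scan
all_rows = list(csv.reader(lines))                             # the translation still reads every row
print(len(schema_rows), len(all_rows))  # 2 4
```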
Specifies whether to try to determine the types of attributes when scanning for schema.
- No: All attributes will be treated as strings.
- Yes (default): FME will attempt to determine the correct type for each attribute (for example, int32 or real64). (For more information, see Usage Notes.)
Using properly typed attributes may improve reading and processing performance. However, if an attribute value is not valid for a scanned type (for example, because the value was not included when scanning for schema), it will be set to null.
When scanning for types, FME will also attempt to automatically map fields to coordinates (for example, a field named x will be given a type of x_coordinate).
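Type scanning can be sketched with deliberately simple rules. The functions `infer_type`, `column_type`, and `coerce` are hypothetical, and FME's real type rules are likely more elaborate. Only the `x` → `x_coordinate` mapping comes from the text; other coordinate mappings are not modelled. Values that cannot be parsed as the scanned type become `None`, mirroring the null behaviour described above.

```python
def infer_type(values):
    """Guess a column type from sample strings (illustrative rules only)."""
    def all_parse(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if values and all_parse(int) and all(-2**31 <= int(v) < 2**31 for v in values):
        return "int32"
    if values and all_parse(float):
        return "real64"
    return "string"


def column_type(name, values):
    # Only the "x" -> x_coordinate mapping from the text is modelled here.
    if name.lower() == "x":
        return "x_coordinate"
    return infer_type(values)


def coerce(value, type_name):
    """Convert a value to its scanned type; unparseable values become None (null)."""
    cast = {"int32": int, "real64": float, "x_coordinate": float}.get(type_name)
    if cast is None:
        return value  # strings are passed through unchanged
    try:
        return cast(value)
    except ValueError:
        return None


print(column_type("population", ["10", "25"]), coerce("n/a", "int32"))  # int32 None
```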
Specifies whether to scan for additional fields, beyond those found in the field names row.
- Yes: FME will attempt to find additional fields that aren’t included in the field names row.
- No: The field names row is assumed to contain all fields in the file.
This option has no effect when the file does not contain field names.
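A sketch of scanning for additional fields follows. The function `scan_field_names` is hypothetical. Naming the unnamed extra columns `colN` by position is also hypothetical, borrowed from the default naming used when a file has no field names.

```python
def scan_field_names(header, rows, scan_additional=True):
    """Field names from the header, optionally extended for rows wider than the header."""
    names = list(header)
    if not scan_additional:
        return names  # the header is assumed to list every field
    widest = max((len(row) for row in rows), default=0)
    # Hypothetical naming for extra columns: colN, where N is the column position.
    names += [f"col{i}" for i in range(len(names), widest)]
    return names


rows = [["1", "Ada"], ["2", "Lin", "extra"]]
print(scan_field_names(["id", "name"], rows))         # ['id', 'name', 'col2']
print(scan_field_names(["id", "name"], rows, False))  # ['id', 'name']
```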
Preview
Shows a preview of the input CSV dataset, as read with the current options.
Attributes
Shows the schema of the dataset, as read with the current options:
Read | Name | Type |
---|---|---|
Whether to read this field as an attribute. | The name of the attribute. | The type of the attribute. |
Use this parameter to expose Format Attributes in Workbench when you create a workspace:
- In a dynamic scenario, it means these attributes can be passed to the output dataset at runtime.
- In a non-dynamic scenario, this parameter allows you to expose additional attributes on multiple feature types. Click the browse button to view the available format attributes (which are different for each format) for the reader.