CSV (Comma Separated Value) Reader Parameters
Dataset Parameters
This parameter controls how feature types are named and how many feature types the reader generates.
Feature type name choices:
- From File Name(s): Generates one feature type per source filename.
- From Format Name: Produces only a single feature type containing the format name.
Fields
The single character specified as the delimiter between values.
The line number that contains the field names. Note that the first line in the file is considered to be line number 1. If the file does not contain field names, leave this blank.
When the file does not contain field names, the columns of the CSV table are given default names (for example, col0, col1, ..., colN).
The line number at which the data starts. Note that the first line in the file is considered to be line number 1.
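The interaction between the field names line, the data start line, and the default column names can be sketched in Python. This is an illustration only, not FME's implementation: the function `read_rows` and its arguments are hypothetical, and it counts parsed CSV records rather than physical lines, which differ when a quoted value contains a newline.

```python
import csv
import io

def read_rows(text, field_names_line=None, data_start_line=1, delimiter=","):
    """Return (field_names, data_rows); line numbers are 1-based."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    data = rows[data_start_line - 1:]
    if field_names_line is not None:
        names = rows[field_names_line - 1]
    else:
        # No field names line: fall back to col0, col1, ..., colN.
        width = max((len(r) for r in data), default=0)
        names = [f"col{i}" for i in range(width)]
    return names, data

names, data = read_rows("id,city\n1,Vancouver\n2,Victoria\n",
                        field_names_line=1, data_start_line=2)
```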
Advanced
If selected, multiple contiguous delimiters are treated as a single delimiter; otherwise, each delimiter is treated as if it delimits a different field.
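As a rough illustration of the difference, the following Python sketch splits one line both ways. The helper `split_fields` is hypothetical, and it deliberately ignores field qualifiers to keep the example short.

```python
import re

def split_fields(line, delimiter=",", merge_duplicates=False):
    """Split a line, optionally treating runs of delimiters as one."""
    if merge_duplicates:
        # One or more contiguous delimiters act as a single separator.
        return re.split(re.escape(delimiter) + "+", line)
    # Each delimiter separates a field, so empty fields are preserved.
    return line.split(delimiter)

merged = split_fields("a,,b", merge_duplicates=True)     # ['a', 'b']
separate = split_fields("a,,b", merge_duplicates=False)  # ['a', '', 'b']
```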
Specifies the character that encloses field values. When a field starts with this character, all text that follows this character and precedes the next occurrence of the character will be treated as one value, even if that text contains a delimiter or newline character.
For example, if the delimiter is a comma (,) and the field qualifier is a quotation mark ("), then the value
"Vancouver, BC"
will be treated as one value
Vancouver, BC
rather than two separate values
Vancouver
BC
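The same behavior can be reproduced with Python's standard `csv` module, shown here only as an analogy to the reader's field qualifier handling; the sample line is made up for the example.

```python
import csv
import io

line = '"Vancouver, BC",Canada\n'

# With a field qualifier, the delimiter inside the quotes is kept as data.
qualified = next(csv.reader(io.StringIO(line), delimiter=",", quotechar='"'))

# With qualifier handling turned off, the same line splits at every comma.
unqualified = next(csv.reader(io.StringIO(line), delimiter=",",
                              quoting=csv.QUOTE_NONE))
```

Here `qualified` holds two values and `unqualified` holds three, because the comma inside the quotation marks is no longer protected.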
Specifies the character that escapes the field qualifier character. Use it to include a literal field qualifier character inside a qualified value.
For example, if the field qualifier character is a quotation mark (") and the escape character is a backslash (\), then the value
"Vancouver \"Lotusland\", BC"
will be read as
Vancouver "Lotusland", BC
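For comparison, Python's standard `csv` module accepts an equivalent pair of settings. The sketch below is an analogy rather than FME's own parser, and reads the example value with a backslash escape character.

```python
import csv
import io

# The file content is: "Vancouver \"Lotusland\", BC"
line = '"Vancouver \\"Lotusland\\", BC"\n'

reader = csv.reader(
    io.StringIO(line),
    delimiter=",",
    quotechar='"',      # field qualifier character
    escapechar="\\",    # escape character
    doublequote=False,  # a doubled qualifier is not treated as an escape
)
row = next(reader)
```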
Field Names
Specifies whether the field names should be matched against the schema in a case-sensitive or case-insensitive manner.
For example, suppose the schema contains the attribute "MyField" but the file contains the field "myfield":
- If field names are case-sensitive, these are considered to not match, and the attribute "MyField" will not be read.
- If field names are not case-sensitive, these are considered to match, and values from the "myfield" column will be read for attribute "MyField".
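The two matching modes can be sketched as follows. The function `match_fields` is a hypothetical helper written for this illustration and is not part of FME.

```python
def match_fields(schema_attrs, file_fields, case_sensitive=True):
    """Map each schema attribute to the file field it matches, or None."""
    if case_sensitive:
        lookup = {f: f for f in file_fields}
        return {a: lookup.get(a) for a in schema_attrs}
    # Case-insensitive: compare lowercased names on both sides.
    lookup = {f.lower(): f for f in file_fields}
    return {a: lookup.get(a.lower()) for a in schema_attrs}

strict_match = match_fields(["MyField"], ["myfield"], case_sensitive=True)
loose_match = match_fields(["MyField"], ["myfield"], case_sensitive=False)
```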
Specifies whether to enforce a strict schema.
- If this parameter is set to Yes and the fields in the file do not match the attributes on the schema in FME, the reader will fail.
- If this parameter is set to No, the reader will warn about any attributes that exist on the schema but are not present in the file, and will continue reading.
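A minimal Python sketch of the two outcomes follows. It assumes that a mismatch means a schema attribute is absent from the file; the function `check_schema` and its messages are hypothetical and only mirror the behavior described above.

```python
import warnings

def check_schema(schema_attrs, file_fields, strict=False):
    """Return schema attributes missing from the file; fail if strict."""
    missing = [a for a in schema_attrs if a not in file_fields]
    if missing and strict:
        # Strict schema: a mismatch stops the read.
        raise ValueError(f"fields missing from file: {missing}")
    for name in missing:
        # Non-strict: warn and continue reading.
        warnings.warn(f"attribute {name!r} is not present in the file")
    return missing
```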
Field Values
Specifies whether to trim the field qualifier character from values. These characters are trimmed only when they act as field qualifiers, that is, when the value begins with the qualifier character. In that case, the opening qualifier and the next occurrence of the character are removed; later occurrences are left in place.
For example, if the field qualifier is a quotation mark ("), then the value
"Vancouver, BC" and "More"
will be read as
Vancouver, BC and "More"
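The trimming rule in this example can be expressed as a short Python function. `trim_qualifier` is a hypothetical name used only for this sketch.

```python
def trim_qualifier(value, qualifier='"'):
    """Remove the leading qualifier and its next occurrence, if any."""
    if not value.startswith(qualifier):
        return value  # the character is not acting as a field qualifier
    end = value.find(qualifier, len(qualifier))
    if end == -1:
        return value  # no closing qualifier, so leave the value untouched
    return value[len(qualifier):end] + value[end + len(qualifier):]

trimmed = trim_qualifier('"Vancouver, BC" and "More"')
```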
Specifies whether to read empty values as Null or Missing in FME.
Historically, FME read empty values as empty strings for string field types and as Null for numeric field types. This behavior continues in workspaces created before FME 2021. In newer (FME 2021+) workspaces containing readers with this option, both string and numeric fields containing empty values are read as either Null or Missing, as specified.
Note: This affects expected values for known columns. Additional fields containing empty data values will continue to be treated as Missing.
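One way to picture the difference between Null and Missing is with a Python dictionary standing in for a feature: a Null attribute is present with no value, and a Missing attribute is not present at all. The function `apply_empty_policy` and the dictionary representation are assumptions made for this illustration.

```python
def apply_empty_policy(record, read_empty_as="Null"):
    """Convert empty strings to None (Null) or drop the key (Missing)."""
    out = {}
    for name, value in record.items():
        if value != "":
            out[name] = value
        elif read_empty_as == "Null":
            out[name] = None  # attribute exists, but has no value
        # For "Missing", the attribute is simply not added.
    return out

as_null = apply_empty_policy({"id": "1", "city": ""}, "Null")
as_missing = apply_empty_policy({"id": "1", "city": ""}, "Missing")
```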
Encoding
This parameter is applicable if you are working with extended (not basic ASCII) character sets. If your source data contains non-ASCII characters, using this parameter along with the encoding value ensures that the original data is preserved from the reader to the writer.
By default, the character encoding will be automatically detected from the source file if there is a Byte Order Mark (BOM) present in the source file. If you select any other character encoding, it will override the automatically detected character encoding.
Note that only UTF encodings are identified using the BOM – all other character sets must be explicitly identified or they will be read as system encoding. (System encoding is dependent on your computer's operating system locale setting.)
FME supports most encodings.
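The detection order described above (explicit choice first, then BOM, then system encoding) can be sketched in Python. This is not FME's detection code: `detect_encoding` is a hypothetical helper, and the returned names are Python codec names.

```python
import codecs
import locale

# UTF-32 is checked before UTF-16 because the UTF-32 LE BOM
# begins with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def detect_encoding(raw, explicit=None, system_default=None):
    """Pick an encoding for the given leading bytes of a file."""
    if explicit is not None:
        return explicit  # an explicit choice overrides detection
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    # No BOM: fall back to the system encoding.
    return system_default or locale.getpreferredencoding(False)
```

A file without a BOM that is not in the system encoding would be misread under this scheme, which is why such encodings need to be selected explicitly.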
Specifies whether string attributes will be set in the file encoding.
- Yes: String attributes will always be in the encoding of the file.
- No: String attributes may be in the file encoding, but may also be in a Unicode encoding. Setting this parameter to No may improve performance when reading from an encoded file.
Skipped Lines
Specifies whether to read lines from the file that occur before the data start line. (Note: The field names line is never read as a feature.)
If set to Yes, the reader will produce features for these lines, where the attribute csv_skipped_line is set to the content of that line.
If the field structure of the first several rows of a file is representative of the remainder of the file, this option can be set to prevent FME from unnecessarily reading further rows from a potentially large file when determining its schema.
If left blank, there will be no limit and all rows will be read.
Note: This setting only applies to the schema generation; it does not limit the number of rows read when the translation is run.
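The idea of scanning only the first rows when determining the schema can be sketched in Python with `itertools.islice`. The function `sample_rows` is hypothetical, and the sketch assumes the first line holds the field names.

```python
import csv
import io
from itertools import islice

def sample_rows(text, max_rows=None):
    """Return the header and at most max_rows data rows for scanning."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # islice stops early, so later rows are never parsed;
    # max_rows=None means no limit.
    return header, list(islice(reader, max_rows))

header, sample = sample_rows("id,name\n1,a\n2,b\n3,c\n", max_rows=2)
```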
Specifies whether to try to determine the types of attributes when scanning for schema.
- No – All attributes will be treated as strings.
- Yes (default) – FME will attempt to determine the correct type for each attribute (for example, int32 or real64). (For more information, see Usage Notes.)
Using properly typed attributes may improve reading and processing performance. However, if an attribute value is not valid for a scanned type (for example, because the value was not included when scanning for schema), it will be set to Null or Missing, depending on the value of the Read Empty Values As parameter.
When scanning for types, FME will also attempt to automatically map fields to coordinates (for example, a field named x will be given a type of x_coordinate).
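The following Python sketch illustrates type scanning and what happens to a value that does not fit the scanned type. It is a simplification under stated assumptions: the type labels `integer`, `real`, and `string` and the functions `infer_type` and `coerce` are hypothetical, and invalid values are shown as `None` to stand in for Null.

```python
def infer_type(values):
    """Pick the narrowest type that fits every non-empty sample value."""
    sample = [v for v in values if v != ""]
    if not sample:
        return "string"
    for name, cast in (("integer", int), ("real", float)):
        try:
            for v in sample:
                cast(v)
            return name
        except ValueError:
            continue  # at least one value does not fit; try the next type
    return "string"

def coerce(value, type_name):
    """Convert a value to the scanned type, or None if it is not valid."""
    casts = {"integer": int, "real": float, "string": str}
    try:
        return casts[type_name](value)
    except ValueError:
        return None

scanned = infer_type(["1", "2", "3"])  # only these rows were scanned
later_value = coerce("n/a", scanned)   # a value seen after the scan
```

If the scan covers only the first rows of a file, a value such as `n/a` further down is lost in this way, so the scan should include enough rows to be representative.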
Specifies which numeric attribute types to scan for:
- Standard Types (default) – Standard numeric types.
- Explicit Width and Precision – Number types with an explicit width and precision.
Specifies which string attribute types to scan for:
- Explicit Width and Precision (default) – String types with an explicit width and precision.
- Standard Types – Standard string types.
The input format string from which to detect and create FME-formatted date, time, and datetime attributes.
When Scan for Types is set to Yes, this option is enabled.
Specify a format string inline, or specify an attribute that contains a format string.
The default is set to ISO (auto detect), and there is a list of additional presets to choose from.
For information on the presets <Auto detect FME and ISO formats>, FME (auto detect), and ISO (auto detect), please see the documentation about FME and ISO in the topic Format String Flags and Examples.
Specifies whether FME should scan for additional fields, beyond those found in the field names row.
- Yes – FME will attempt to find additional fields that aren’t included in the field names row.
- No – The field names row is assumed to contain all fields in the file.
This option has no effect when the file does not contain field names. If FME does not scan for additional fields and extra data is found beyond the defined fields, these data values will be Missing (that is, not present as attributes on the output data features).
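The effect on rows that are wider than the field names row can be sketched in Python. The function `build_records` is hypothetical, and naming the extra fields `colN` is an assumption borrowed from the default column names; the names FME assigns to additional fields may differ.

```python
def build_records(header, rows, scan_additional=False):
    """Turn rows into dicts, optionally exposing fields beyond the header."""
    width = len(header)
    if scan_additional:
        width = max([width] + [len(r) for r in rows])
    # Hypothetical naming for extra fields: col<index>.
    names = list(header) + [f"col{i}" for i in range(len(header), width)]
    # zip stops at the shorter sequence, so unnamed extra values are dropped.
    return [dict(zip(names, row)) for row in rows]

rows = [["1", "Vancouver", "BC"]]
without_scan = build_records(["id", "city"], rows)
with_scan = build_records(["id", "city"], rows, scan_additional=True)
```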
Preview
Shows a preview of the input CSV dataset, as read with the current options.
Attributes
Shows the schema of the dataset, as read with the current options:
Read | Name | Type |
---|---|---|
Whether to read this field as an attribute. | The name of the attribute. | The type of the attribute. |
Use this parameter to expose Format Attributes in FME Workbench when you create a workspace:
- In a dynamic scenario, it means these attributes can be passed to the output dataset at runtime.
- In a non-dynamic scenario, this parameter allows you to expose additional attributes on multiple feature types. Click the browse button to view the available format attributes (which are different for each format) for the reader.