Amazon Redshift Reader/Writer
FME can read and write Redshift attribute data.
Redshift is a petabyte-scale managed database, provided by Amazon Web Services. Redshift is based on PostgreSQL 8.0.2, an Object-Relational Database Management System (ORDBMS) that stores attribute information.
The Redshift reader/writer is specifically designed to handle the attribute portion of the data in the database. For maximum throughput, the Redshift Reader/Writer communicates directly with the Redshift database via the libpq interface.
This chapter assumes familiarity with Redshift, SQL, the attribute types supported, and its indexing mechanisms.
For more information, please see the Amazon Redshift website.
Reader Overview
FME considers a Redshift dataset to be a database containing a collection of relational tables together with their corresponding geometries. The tables must be defined in the workspace before they can be read.
Arbitrary WHERE clauses and joins are fully supported, as well as an entire arbitrary SQL SELECT statement; however, the user then assumes responsibility for the correctness of the statement or clause, including quoting where necessary.
When reading from the Redshift database, each table is considered a feature type in FME and each row of a table at least one feature in FME.
Features read from Redshift consist of a series of attribute values only and no geometry. The feature type of each feature is as defined in its Feature Type parameters, but the only type used is redshift_none.
Underlying Redshift geometries are not read as geometries, but are interpreted as strings.
Reading Process
The basic reading process involves opening a connection to the database, querying metadata, and querying data. The data is read using a series of select statements and rows are fetched to the client machine in batches of 10000 by default.
Tables and Columns
- Truncation – Table and column names are truncated at 64 characters. If duplicate names are produced by truncation, the behavior is undetermined. Please ensure that table names comply with Redshift naming conventions.
- Naming – Spaces and special characters are permissible in both table and column names. Case sensitivity has also been implemented, so table and column names are no longer changed to lowercase. Qualified table naming is supported in the form <schemaname>.<tablename>. The schema search path is read and interpreted to determine a user’s default schema when writing and the available schema to read from when reading. Failing a valid schema search path, the default public schema will be used for newer databases.
- Encoding – UNICODE support works with a client’s system encoding. Although there is no way to specify the encoding explicitly, the client is assumed to have entered data and created tables and columns in the encoding of their operating system. Multiple system encodings are now supported via the native conversions between client and server, particularly if the server encoding is set to UNICODE.
Writer Overview
The Redshift writer module stores attributes into a Redshift database. Note that attributes are always written as lowercase.
The Redshift writer provides the following capabilities:
- Transaction Support – The writer provides transaction support that eases the data loading process. Occasionally, a data load operation terminates prematurely due to data difficulties. The transaction support provides a mechanism for reloading corrected data without data loss or duplication.
- Bulk Loading – The writer uses a multi-row insert technique to ensure speedy data load.