Amazon Redshift Spatial Reader/Writer
FME can read and write Redshift spatial data.
Redshift is a petabyte-scale managed database, provided by Amazon Web Services. Redshift is based on PostgreSQL 8.0.2, an Object-Relational Database Management System (ORDBMS) that stores attribute information.
For maximum throughput, the Redshift reader/writer communicates directly with the Redshift database via the libpq interface.
This chapter assumes familiarity with Redshift, SQL, the attribute types supported, and its indexing mechanisms.
For more information, please see the Amazon Redshift website.
Reader Overview
FME considers a Redshift dataset to be a database containing a collection of relational tables together with their corresponding geometries. The tables must be defined before they can be read.
Arbitrary WHERE clauses and joins are fully supported, as well as an entire arbitrary SQL SELECT statement; however, the user then assumes responsibility for the correctness of the statement or clause, including quoting where necessary.
When reading from the Redshift database, each table is considered a feature type in FME, and each row of a table at least one feature in FME.
Reading Process
The basic reading process involves opening a connection to the database, querying metadata, and querying data. The data is read using a series of select statements and rows are fetched to the client machine in batches of 10000 by default.
Tables and Columns
Truncation – Table and column names are truncated at 64 characters. If duplicate names are produced by truncation, the behavior is undetermined. Please ensure that table names comply with Redshift naming conventions.
Naming – Spaces and special characters are permissible in both table and column names. Case sensitivity has also been implemented, so table and column names are no longer changed to lowercase. Qualified table naming is supported in the form <schemaname>.<tablename>. The schema search path is read and interpreted to determine a user’s default schema when writing and the available schema to read from when reading. Failing a valid schema search path, the default public schema will be used for newer databases.
Encoding – UNICODE support works with a client’s system encoding. Although there is no way to specify the encoding explicitly, the client is assumed to have entered data and created tables and columns in the encoding of their operating system. Multiple system encodings are now supported via the native PostgreSQL conversions between client and server, particularly if the server encoding is set to UNICODE.
Writer Overview
The Redshift Spatial writer module stores attributes into a Redshift database. Note that attributes are always written as lowercase.
The writer provides the following capabilities:
- Transaction Support – The Redshift writer provides transaction support that eases the data loading process. Occasionally, a data load operation terminates prematurely due to data difficulties. The transaction support provides a mechanism for reloading corrected data without data loss or duplication.
- Bulk Loading – The Redshift writer uses a multi-row insert technique to ensure speedy data load.