Amazon Redshift ETL connector for data replication
Instructions for granting Amazon Redshift access and setting up Amazon Redshift as a destination in Daton.
Instructions to grant Redshift cluster access and (optionally) creating a user and schema for setting up Redshift as a destination in Daton.
- Click on the specific Cluster.
- In the Configuration tab of the details page, note down the Redshift Host name and Port from the Endpoint. This will be used in the Daton integration page.
- In the Configuration tab of the details page, under the VPC security groups, click the name of security group.
- In the details page, click the Inbound tab, click Edit.
- Click Add Rule and white list the IP Addresses below.
White list Daton IP addresses on Amazon Redshift
Click on Actions and Modify publicly accessible setting and then set the instance to be publicly accessible
To enable Daton to write to a specific dataset, create a user for the relevant Amazon Redshift Dataset:
CREATE USER daton WITH PASSWORD 'yourpassword';
CREATE SCHEMA daton AUTHORIZATION daton;
GRANT CREATE ON DATABASE db-name TO daton;
That's it, you're done configuring access — time to connect to Amazon Redshift.
- Click Clusters from left menu and click on the cluster name to view the cluster details
Open the Redshift Cluster
- Scroll down to see the JDBC URL details and note them. This URL is required along with database username and password while integrating Redshift on Daton
Amazon Redshift JDBC URL
Daton - Public Key for SSH Tunnel
- Click Data Sources & select Choose Warehouse
- Click Amazon Redshift & enter any integration name
- Enter the JDBC URL, database username, and password from above steps
- Click Next, if successfully connected, select the schema from the options
- Click Finish to complete setup
Daton has been built to handle multiple data warehouses. Since all data warehouses are not built the same, the choice of a data warehouse may impact how Daton loads data. This section tables about some key capabilities and limitations of RedShift that impact how data looks in your data warehouse.
RedShift data warehouse does not support nested tables. Hence, Daton creates separate 'child' tables for each nested data present in source schema(or parent table).
For example, if data is present in Listrecommendations.Fulfillmentrecommendations.Member.Itemdimensions.Dimensiondescriptionthen the following tables would be created -
- 1.Listrecommendations (Parent Table)
- 2.Listrecommendations_Fulfillmentrecommendations (Level 1 Child)
- 3.Listrecommendations_Fulfillmentrecommendations_Member (Level 2 Child)
- 4.Listrecommendations_Fulfillmentrecommendations_Member_Itemdimensions (Level 3 Child)
Any nested data present in Itemdimesions would be stored at text and any excess data which does not fit into that field would be truncated.
Data between parent and child tables can be mapped or combined using daton_parent_batch_id and daton_batch_runtime meta fields. The same applies for other levels of child tables.
For example, if API response for ListOrderItems contains data in the following structure
then the following tables would be created by Daton on RedShift
ListOrderItem (Parent Table)
ListOrderItem_ItemPrice (Level 1 Child Table)
In order to fetch Item price for an AmazonOrderId, daton_batch_id and daton_batch_runtime must be used for mapping records
select Amount from listorderitems_itemprice
daton_parent_batch_id = (select daton_batch_id from mws_listorderitems where amazonorderid = 'XXX-7370285-XXXX')
daton_batch_runtime = (select daton_batch_runtime from mws_listorderitems where amazonorderid = 'XXX-7370285-XXXX')