# Amazon Redshift

## Summary

Instructions for granting Amazon Redshift access and setting up Amazon Redshift as a destination in Daton.&#x20;

## Amazon Redshift Setup Information

Setting up access to your [Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/mgmt/managing-clusters-console.html) data warehouse is not very complicated and just takes a couple of simple steps. If you don't have an existing account on AWS, you can sign up for a free tier [here](https://aws.amazon.com/free/).

### Step 1 - Grant Cluster Access

Instructions to grant Redshift cluster access and (optionally) creating a user and schema for setting up Redshift as a destination in Daton:

* &#x20;Access your [Amazon Redshift Console](https://console.aws.amazon.com/redshift/).
* &#x20;Click *Clusters* from the top left sidebar.

![](https://15515196-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LbJ1eMq5M51cIpU163R%2F-LbJI4L6RMiyMuoh3YHS%2F-LbJKUdXJHUUVIte8iuM%2Fredshiftclustersidebar.png?alt=media\&token=5f17ea61-c510-483a-b0e7-a762406227ca)

* Click on the specific Cluster.

![](https://15515196-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LbJ1eMq5M51cIpU163R%2F-LbJI4L6RMiyMuoh3YHS%2F-LbJKKrHKVDw2By0oAjo%2Fredshiftclickcluster.png?alt=media\&token=c784b02f-9591-4418-af40-c818ac8a4dd4)

* In the Configuration tab of the details page, note down the Redshift Hostname and Port from the Endpoint. This will be used on the Daton integration page.
* In the Configuration tab of the details page, under the VPC security groups, click the name of the security group.

![](https://15515196-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LbJ1eMq5M51cIpU163R%2F-LbJKhklFjC8yhRqecHX%2F-LbJKqc8uJjYEQsIHKY6%2Fredshiftvpcsecuritygroups.png?alt=media\&token=93c6583a-8089-44b0-8f4d-fc33d652d937)

* On the details page, click the Inbound tab, and click Edit.

![](https://15515196-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LbJ1eMq5M51cIpU163R%2F-LbJKhklFjC8yhRqecHX%2F-LbJLEvY59l-aczu5Xeh%2Fredshiftvpcsecuritygroupdetails.png?alt=media\&token=214680e1-11e7-4c29-a89a-b6993a77aeb6)

#### IP Whitelisting

* Click Add Rule and white list the IP Addresses below.
  * 35.238.200.205/32

![White list Daton IP addresses on Amazon Redshift](https://15515196-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LbJ1eMq5M51cIpU163R%2F-Lrhw6cD4XXYA-CfL5_U%2F-LrhwgNuwZNI7t_6GXgG%2FRedshift_security_group.jpg?alt=media\&token=7d51a279-188c-4514-aa6d-03f40480d77e)

#### Make Instance Publicly Accessible&#x20;

Click on Actions and Modify publicly accessible setting and then set the instance to be publicly accessible

![](https://15515196-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LbJ1eMq5M51cIpU163R%2F-MgRyxIYyN1_5itaqj-X%2F-MgRz4kfAqqvInlxc43R%2F2021-08-06%20at%203.40%20PM.png?alt=media\&token=c514284a-f10b-4201-9aea-0935d6c98686)

### **Step 2 - Create a User and Schema for Daton**

To enable Daton to write to a specific dataset, create a user for the relevant Amazon Redshift Dataset:

* Access your [Amazon Redshift Console](https://console.aws.amazon.com/redshift/).

  Click the Query editor and run the below commands to grant privileges to Daton. Change the user, password, and db-name per your choice.

![](https://15515196-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LbJ1eMq5M51cIpU163R%2F-LbJKhklFjC8yhRqecHX%2F-LbJMzBA7STHlaBqxusX%2Fredshiftqueryeditor.png?alt=media\&token=8ea7c931-399a-4537-8ca5-206965e1bffc)

```
    CREATE USER daton WITH PASSWORD 'yourpassword';
    CREATE SCHEMA daton AUTHORIZATION daton;
    GRANT CREATE ON DATABASE db-name TO daton;
```

That's it, you're done configuring access — time to connect to Amazon Redshift.

### **Step 3 - JDBC URL and other configuration parameters**

* Click *Clusters* from the left menu and click on the cluster name to view the cluster details.

![Open the Redshift Cluster](https://15515196-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LbJ1eMq5M51cIpU163R%2F-Lrh_qz9lBPma_3-15KE%2F-Lrhce9lJU9dKBYXKAA5%2FRedshift_JDBC1.jpg?alt=media\&token=1c17b6da-f852-4ad2-bd4b-5a2944925bc7)

* Scroll down to see the JDBC URL details and note them. This URL is required along with the database username and password while integrating Redshift on Daton.

![Amazon Redshift JDBC URL](https://15515196-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LbJ1eMq5M51cIpU163R%2F-Lrh_qz9lBPma_3-15KE%2F-LrhdY16HpLZSL75WDz0%2FRedshift_JDBC2.jpg?alt=media\&token=61bf1323-7f5d-455d-9c3e-f6eef2d6adea)

### **Step 4 - Optional SSH Tunnel Configuration**

* Copy the below SSH public key and use it while creating the [SSH rule](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/authorizing-access-to-an-instance.html#add-rule-authorize-access) for the Security Group.

![Daton - Public Key for SSH Tunnel](https://15515196-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LbJ1eMq5M51cIpU163R%2F-M2DM4XG4JIgxJ50_1q2%2F-M2DPIYiZpyaeI582UtX%2Fimage.png?alt=media\&token=8a726356-a06d-42f2-8f1d-378a220deaf3)

## Daton Setup Information

* Login to Daton [web-app](https://daton.sarasanalytics.com/).
* Click Data Sources & select Choose Warehouse.
* Click Amazon Redshift & enter any integration name.
* Enter the JDBC URL, database username, and password from the above steps.
* Click Next, if successfully connected, select the schema from the options.
* Click Finish to complete the setup.

## Replication Guide

Daton has been built to handle multiple data warehouses. Since all data warehouses are not built the same, the choice of a data warehouse may impact how Daton loads data. This section tables some key capabilities and limitations of RedShift that impact how data looks in your data warehouse.

<table><thead><tr><th width="190">Feature</th><th width="113">Supported</th><th>Comments</th></tr></thead><tbody><tr><td>Nested Fields</td><td>Yes</td><td></td></tr><tr><td>Primary Keys</td><td>Yes</td><td></td></tr><tr><td>Updates</td><td>No</td><td>Redshift doesnt allow updating while insert mechanism. Instead, Daton will append data to the end of the table with every replication run.</td></tr><tr><td>Full table replication</td><td>Yes</td><td></td></tr><tr><td>Column based</td><td>No</td><td></td></tr><tr><td>Inserts only</td><td>Yes</td><td></td></tr><tr><td>SSH Connections</td><td>Yes</td><td></td></tr><tr><td>Adding columns to existing integration</td><td>Yes</td><td>When new columns are added to an existing table in Amazon Redshift, user is given two options. Depending on their choice, the new column(s) is added and entire table is re-loaded or there is no re-load. Read the edit integration (proceed link) functionality to understand the impact of making changes.</td></tr><tr><td>Removing columns to existing integration</td><td>Yes</td><td>Users can use the edit integration functionality</td></tr><tr><td>Adding tables to existing integration</td><td>No</td><td>When tables are deselected from an integration, the loads into those tables stop. Daton doesn't delete the tables from the data warehouse. This is great for cases where usage of a source has been discontinued, but there is a desire to capture that data.</td></tr><tr><td>Table Partitioning</td><td>Yes</td><td></td></tr></tbody></table>

## Querying Daton created tables on RedShift

The redShift data warehouse does not support nested tables. Hence, Daton creates separate 'child' tables for each nested data present in the source schema(or parent table).&#x20;

#### At present Daton creates separate child tables up to three levels.&#x20;

For example, if data is present in *Listrecommendations.Fulfillmentrecommendations.Member.Itemdimensions.Dimensiondescription*then the following tables would be created -&#x20;

1. `Listrecommendations` (Parent Table)
2. `Listrecommendations_Fulfillmentrecommendations` (Level 1 Child)
3. `Listrecommendations_Fulfillmentrecommendations_Member` (Level 2 Child)
4. `Listrecommendations_Fulfillmentrecommendations_Member_Itemdimensions` (Level 3 Child)

Any nested data present in Itemdimesions would be stored in *text* and any excess data which does not fit into that field would be truncated.

#### Data Mapping

Data between parent and child tables can be mapped or combined using *`daton_parent_batch_id`* and *`daton_batch_runtime`* meta fields. The same applies to other levels of child tables.&#x20;

For example, if the API response for `ListOrderItems` contains data in the following structure,

| Column Name            | Datatype |
| ---------------------- | -------- |
| AmazonOrderId          | STRING   |
| ASIN                   | STRING   |
| SellerSKU              | STRING   |
| OrderItemId            | STRING   |
| ItemPrice              | RECORD   |
| ItemPrice.CurrencyCode | STRING   |
| ItemPrice.Amount       | STRING   |

then the following tables would be created by Daton on RedShift:

**`Table 1:`**` ``ListOrderItem (Parent Table)`

| Column Name           | Datatype |
| --------------------- | -------- |
| AmazonOrderId         | STRING   |
| ASIN                  | STRING   |
| SellerSKU             | STRING   |
| OrderItemId           | STRING   |
| daton\_user\_id       | NUMERIC  |
| daton\_batch\_runtime | NUMERIC  |
| daton\_batch\_id      | NUMERIC  |

**`Table 2:`**` ``ListOrderItem_ItemPrice (Level 1 Child Table)`

| Column Name              | Datatype |
| ------------------------ | -------- |
| CurrencyCode             | STRING   |
| Amount                   | STRING   |
| daton\_user\_id          | NUMERIC  |
| daton\_batch\_runtime    | NUMERIC  |
| daton\_batch\_id         | NUMERIC  |
| daton\_parent\_batch\_id | NUMERIC  |

In order to fetch the Item price for an `AmazonOrderId`*,* `daton_batch_id` and `daton_batch_runtime` must be used for mapping records

```
select  Amount from listorderitems_itemprice
where 
daton_parent_batch_id = (select daton_batch_id from mws_listorderitems where amazonorderid = 'XXX-7370285-XXXX')
and
daton_batch_runtime = (select daton_batch_runtime from mws_listorderitems where amazonorderid = 'XXX-7370285-XXXX')
```
