Google Cloud Storage

Google Cloud Storage ETL connector for data replication


Instructions to Integrate GCS with Daton

Pre-requisites

  1. To integrate GCS with Daton, you need an active Google Cloud project. Create one if you have not already done so:

    • Sign up for Google Cloud Storage. You can read more about the managed service in the Google Cloud Storage documentation.

    • Create a project and the required buckets, folders, and subfolders by following the Google Cloud instructions for each.

    See our step-by-step guide, How to set up a Google BigQuery Project.

  2. Grant the Storage Object Viewer role to the daton-bigquery@daton-210514.iam.gserviceaccount.com service account. See the GCP IAM documentation for details.

  3. Ensure that the storage location contains at least one file in the required format with at least 1000 records of valid data. Daton uses this file as a sample to generate the schema automatically.

  4. Daton supports only the following DateTime formats. Dates in any other format are processed as STRINGs:

    YYYY-MM-DD / HH:MM:SS

  5. All files in a folder must share the same format. If a folder contains files with a different file format or file type, data extraction throws exceptions and the data is not processed. You can create separate integrations for different file types and formats.
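
Several of these prerequisites can be checked before you create the integration. The sketch below is illustrative only and is not part of Daton: it assumes you have exported the bucket's IAM policy as JSON (for example with `gcloud storage buckets get-iam-policy gs://BUCKET --format=json`), listed the folder's file names, and downloaded one sample CSV. The helper names (`grants_object_viewer`, `count_records`, `single_format`, `preflight`) are hypothetical, and the IAM check only inspects the one policy you pass in, so a role granted at the project level would not be detected.

```python
import csv
import io
from pathlib import PurePosixPath

# Service account and role named in the prerequisites above.
DATON_SA = "serviceAccount:daton-bigquery@daton-210514.iam.gserviceaccount.com"
VIEWER_ROLE = "roles/storage.objectViewer"
MIN_SAMPLE_RECORDS = 1000


def grants_object_viewer(policy: dict) -> bool:
    """True if the IAM policy binds the Daton service account to Storage Object Viewer."""
    for binding in policy.get("bindings", []):
        if binding.get("role") == VIEWER_ROLE and DATON_SA in binding.get("members", []):
            return True
    return False


def count_records(csv_text: str, header_rows: int = 1) -> int:
    """Count data records in a CSV sample, excluding header rows and blank lines."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    return max(len(rows) - header_rows, 0)


def single_format(file_names: list[str]) -> bool:
    """True if every file in the folder has the same extension (e.g. all .csv)."""
    suffixes = {PurePosixPath(name).suffix.lower() for name in file_names}
    return len(suffixes) == 1


def preflight(policy: dict, file_names: list[str], sample_csv: str) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if not grants_object_viewer(policy):
        problems.append("Daton service account lacks Storage Object Viewer")
    if not file_names or not single_format(file_names):
        problems.append("folder is empty or mixes file formats")
    if count_records(sample_csv) < MIN_SAMPLE_RECORDS:
        problems.append(f"sample file has fewer than {MIN_SAMPLE_RECORDS} records")
    return problems


if __name__ == "__main__":
    policy = {"bindings": [{"role": VIEWER_ROLE, "members": [DATON_SA]}]}
    sample = "id,name\n" + "".join(f"{i},row{i}\n" for i in range(1000))
    print(preflight(policy, ["orders_1.csv", "orders_2.csv"], sample))  # prints []
```

Returning a list of problems rather than stopping at the first failure lets you fix everything in one pass before submitting the integration.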

Integrate GCS with Daton

  1. Sign in to Daton.

  2. Select Google Cloud Storage from the list of Integrations.

  3. Enter the Integration Name, Replication Frequency, and Replication Start Date, then click the 'Authenticate' button. Note that the Integration Name is used when creating the tables for the integration and cannot be changed later.

  4. Authenticate with your GCS username and password.

  5. Provide the following details for the integration:

    • File Path: GCP Project ID/Storage Bucket Id/Folder Name

    • File Type: only CSV is currently supported.

    • Number of Header Rows: nested headers are not supported.

  6. Review and update Column Names and Data Types from the auto-generated schema mapper.

  7. Submit the integration.
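
The File Path entered in step 5 combines three parts. The following minimal sketch, which is not Daton code, splits such a path and builds the matching `gs://` URI so you can confirm it points where you expect. It assumes the folder part may itself contain `/` (a subfolder) and therefore splits only on the first two separators; `GcsLocation`, `parse_file_path`, and `check_file_type` are hypothetical names, and `my-project`/`my-bucket` are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GcsLocation:
    project_id: str
    bucket: str
    folder: str

    def uri(self) -> str:
        """gs:// URI of the folder; the trailing slash marks it as a prefix."""
        return f"gs://{self.bucket}/{self.folder}/"


def parse_file_path(path: str) -> GcsLocation:
    """Split 'GCP Project ID/Storage Bucket Id/Folder Name' into its parts.

    Assumption: everything after the second '/' belongs to the folder name.
    """
    parts = path.strip().strip("/").split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'project/bucket/folder', got {path!r}")
    return GcsLocation(*parts)


def check_file_type(file_type: str) -> None:
    """Reject anything other than CSV, the only file type currently supported."""
    if file_type.strip().lower() != "csv":
        raise ValueError("only CSV files are currently supported")


if __name__ == "__main__":
    location = parse_file_path("my-project/my-bucket/exports")
    check_file_type("CSV")
    print(location.uri())  # prints gs://my-bucket/exports/
```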

Workflow

  1. Integrations start in the Pending state and move to the Active state as soon as the first job successfully loads data into the configured warehouse.

  2. Users can Re-Authenticate, Edit, Clone, Pause, or Delete the integration at any time from the integration's settings. You can also edit the integration to change the replication frequency and history.

  3. Users can view job status and process logs on the integration details page, which opens when you click the integration name in the active list.

  4. Files to be processed are identified by their modified dates. If even a few records in an existing file are updated, the entire file is processed again.

  5. Files in subfolders of the selected location are not processed.

  6. To add new fields to the files, append them after the existing fields, then edit the existing integration in Daton and append the same fields to it. Daton does not support adding fields in between existing fields of a file.
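
Items 4 and 5 together determine which files a run picks up. Daton's internal selection logic is not documented here, so the following is only a sketch of those two rules under stated assumptions: each object is described by its full name and last-modified time (with the `google-cloud-storage` client these would come from the `name` and `updated` attributes of the blobs returned by `Client.list_blobs`), and `last_run` is a hypothetical checkpoint of the previous successful run. `StoredFile` and `files_to_process` are illustrative names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class StoredFile:
    name: str          # full object name, e.g. "exports/orders.csv"
    updated: datetime  # last-modified timestamp of the object


def files_to_process(files: list[StoredFile], folder: str, last_run: datetime) -> list[str]:
    """Select files that sit directly in `folder` and changed after `last_run`.

    - Objects in subfolders are skipped (item 5).
    - A changed file is returned as a whole, so it is reprocessed in full (item 4).
    """
    prefix = folder.strip("/") + "/"
    selected = []
    for f in files:
        if not f.name.startswith(prefix):
            continue
        relative = f.name[len(prefix):]
        if not relative or "/" in relative:  # folder placeholder or subfolder object
            continue
        if f.updated > last_run:
            selected.append(f.name)
    return sorted(selected)


if __name__ == "__main__":
    checkpoint = datetime(2024, 1, 1, tzinfo=timezone.utc)
    listing = [
        StoredFile("exports/orders.csv", datetime(2024, 1, 2, tzinfo=timezone.utc)),
        StoredFile("exports/archive/old.csv", datetime(2024, 1, 2, tzinfo=timezone.utc)),
    ]
    print(files_to_process(listing, "exports", checkpoint))  # prints ['exports/orders.csv']
```

Comparing only modification times is what makes a single edited record trigger a full reload of its file: the sketch has no record-level change tracking, and neither rule in the list above describes one.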

Error Handling

  1. If the date format used in a file is not one of the supported formats listed above, the dates are processed as strings.

  2. Data types cannot be changed after the integration is initiated.
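
Because unsupported date formats silently become strings and data types cannot be changed afterwards, it can help to check date columns in your sample file first. The sketch below assumes the documented "YYYY-MM-DD / HH:MM:SS" means a date, optionally followed by a space and a 24-hour time; that reading, the DATE/DATETIME labels, and the rule that a column mixing formats falls back to STRING are all assumptions, not confirmed Daton behavior.

```python
import re
from datetime import datetime

# Assumed reading of the documented formats: shape check (regex), then a
# strptime validity check (rejects e.g. month 13), then the resulting type.
PATTERNS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}"), "%Y-%m-%d", "DATE"),
    (re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"), "%Y-%m-%d %H:%M:%S", "DATETIME"),
]


def classify_date_value(value: str) -> str:
    """Return DATE or DATETIME for a supported value, otherwise STRING."""
    text = value.strip()
    for pattern, fmt, type_name in PATTERNS:
        if not pattern.fullmatch(text):
            continue
        try:
            datetime.strptime(text, fmt)
        except ValueError:
            continue  # right shape but not a real date/time
        return type_name
    return "STRING"


def classify_column(values: list[str]) -> str:
    """Keep a date type only if every non-empty value classifies the same way."""
    kinds = {classify_date_value(v) for v in values if v.strip()}
    return kinds.pop() if len(kinds) == 1 else "STRING"


if __name__ == "__main__":
    print(classify_column(["2024-01-31", "", "2024-02-29"]))  # prints DATE
    print(classify_column(["2024-01-31", "31/01/2024"]))      # prints STRING
```

The regex is applied before `strptime` because `strptime` alone accepts single-digit months and days such as `2024-3-5`, which do not match the documented `YYYY-MM-DD` shape.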
