> ## Documentation Index
> Fetch the complete documentation index at: https://docs.sdf.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Getting Started with Redshift and SDF

> SDF as a best-in-class transformation layer for Redshift

## Overview

This guide will walk you through the steps of creating a Redshift integration. By the end, you will be able to reference remote Redshift tables in SDF as if they were locally defined!

SDF will use remote Redshift schema information to do type bindings, column level lineage, and other static analysis checks on your workspace.

Redshift catalog, schema, and tables map 1:1 to SDF's catalog.schema.table model.

<img src="https://cdn.sdf.com/docs/redshift_warehouse_structure.png" alt="drawing" width="500" />

## Guide using an IAM access\_key and secret\_key

<Steps>
  <Step title="Collect Required Information">
    To connect to Redshift via IAM credentials, you need a valid keypair with the minimum credentials to read schema information in your redshift instance.

    * `access_key` the access key of your IAM profile
    * `secret_access_key` the secret key of your IAM profile
    * `CLUSTER_IDENTIFIER` - The Redshift cluster you want to connect to

    <Note>
      Run `sdf auth login aws --help` to see all login options.
    </Note>

    If you already have an IAM role, you may use that one. Or, simply create a new IAM user. The below is the minimum viable allow ARN policy.

    ```json theme={null}
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Statement1",
                "Effect": "Allow",
                "Action": [
                    "redshift:ListDatabases",
                    "redshift:ListSchemas",
                    "redshift:ListTables",
                    "redshift:ListSavedQueries",
                    "redshift:DescribeQuery",
                    "redshift:DescribeSavedQueries",
                    "redshift:DescribeTable",
                    "redshift:DescribeTableRestoreStatus",
                    "redshift:FetchResults",
                    "redshift:CreateSavedQuery",
                    "redshift:ExecuteQuery",
                    "redshift:GetClusterCredentials",
                    "redshift:GetClusterCredentialsWithIAM"
                ],
                "Resource": [
                    "arn:aws:redshift:{region}:{account-number}:dbname:{cluster-identifier}/{dbname}"
                ]
            },
            {
                "Sid": "Statement2",
                "Effect": "Allow",
                "Action": [
                    "redshift-data:DescribeStatement",
                    "redshift-data:DescribeTable",
                    "redshift-data:GetStatementResult",
                    "redshift-data:ListDatabases",
                    "redshift-data:ListSchemas",
                    "redshift-data:ExecuteStatement",
                    "redshift-data:BatchExecuteStatement",
                    "redshift-data:CancelStatement",
                    "redshift-data:ListStatements",
                    "redshift-data:ListTables"
                ],
                "Resource": [
                    "*"
                ]
            }
        ]
    }
    ```
  </Step>

  <Step title="Register the User in the AWS CLI">
    The newly created user needs to be registered with the AWS CLI. You can optionally provide a profile name. As default region, please use the AWS region that your redshift instance is located in.

    The example below configures a profile called *redshift*

    ```shell theme={null}
    aws configure --profile redshift  
    ```
  </Step>

  <Step title="Connect SDF to AWS">
    Connect SDF to AWS by telling SDF which of your AWS profiles to use.

    ```shell theme={null}
    sdf auth login aws --profile redshift --default-region <REGION>
    ```

    <Info>
      This will create a new [credential](/reference/sdf-yml#block-credential) in a `~/.sdf/` directory in the root of your system. This credential will be used to authenticate with AWS services. By default, the
      credential's name is default. As such, the credential does not need to be explicitly referenced in the integrations configuration below.
    </Info>

    To validate the connection, run:

    ```shell theme={null}
    sdf auth status
    ```
  </Step>

  <Step title="Add Redshift Provider in Workspace">
    Once authenticated, add an integrations block in your `workspace.sdf.yml` workspace block. This tells SDF to use Redshift to hydrate missing table schemas.

    ```yml theme={null}
    ---
    workspace:
        ...
        integrations:
            - provider: redshift
              type: database
              cluster-identifier: <CLUSTER_IDENTIFIER>
              sources: 
                - pattern: <DATABASE>.*.*
    ```

    Replace `<DATABASE>` with the name of the database you want to hydrate. Note this is configurable and can be changed to any database you have access to. For example, if I wanted SDF to pull from two databases, called `db1` and `db2`, I would write:

    ```yml theme={null}
    ---
    workspace:
        ...
        integrations:
            - provider: redshift
              type: database
              cluster-identifier: <CLUSTER_IDENTIFIER>
              sources: 
                - pattern: <DATABASE>.*.*
                - pattern: <DATABASE>.*.*
    ```

    Lastly, integrations can also be configured to use specific credentials by referencing the credential name in the integration block. For example, if you wanted to use a credential called `redshift-creds`, you would write:

    ```yml theme={null}
    workspace:
        ...
        integrations:
            - provider: redshift
              type: database
              credential: redshift-creds
              cluster-identifier: <CLUSTER_IDENTIFIER>
              sources: 
                - pattern: db1.schema1.my_table
                - pattern: db1.schema2.*
    ```

    For more information on provider configuration, see the [integration documentation](/guide/setup/integrations).
  </Step>

  <Step title="Try it out!">
    Now that you're connected, let's make sure SDF can pull the schema information it needs.

    Run `sdf compile -q "select * from <DATABASE>.<SCHEMA>.<TABLE>" --show all`

    If the connection is successful, you will see the schema for the table you selected.
  </Step>
</Steps>

<Warning>
  The ARN you created also needs user access to the database. Depending on your DBs Access Policy, you might also have to give explicit access to
  this new user within your Redshift cluster. You can do this through by adding the ARN user to the appropriate redshift group:

  ```sql theme={null}
  ALTER GROUP <your-group-name> add user "IAMR:<IAM_ROLE_NAME>";
  ```
</Warning>

## Troubleshoot Resources

1. **Seeing errors like this `ERROR: permission denied for schema schema1` from `sdf run`?**

   This may be due to the Redshift user not having the correct permissions to this schema.
   To check if you have the right access, you can run the following query.

   Use `SVV_SCHEMA_PRIVILEGES` to view the schema owner, and schema\_acl which shows the access control list for the schema.

   ```sql theme={null}
   SELECT schema_owner, schema_acl FROM SVV_ALL_SCHEMAS WHERE schema_name = 'schema1';
   ```

   If your user does not have access to the schema, you can grant access to the schema by running the following query.
   To run this query, you either have to be a superuser or the owner of the schema.

   ```sql theme={null}
   GRANT CREATE ON SCHEMA schema1 TO <username>;
   ```

   The `<username>` is the user name of your IAM user, with `IAM:` prefix. For example, if your IAM user name is `my_user`, you can run:

   ```sql theme={null}
   GRANT CREATE ON SCHEMA schema1 TO "IAM:my_user";
   ```