Overview

This guide will walk you through the steps of creating a Redshift integration. By the end, you will be able to reference remote Redshift tables in SDF as if they were locally defined!

SDF will use remote Redshift schema information to do type bindings, column level lineage, and other static analysis checks on your workspace.

Redshift catalog, schema, and tables map 1:1 to SDF’s catalog.schema.table model.

drawing

Guide using an IAM access_key and secret_key

1

Collect Required Information

To connect to Redshift via IAM credentials, you need a valid keypair with the minimum credentials to read schema information in your redshift instance.

  • access_key the access key of your IAM profile
  • secret_access_key the secret key of your IAM profile
  • CLUSTER_IDENTIFIER - The Redshift cluster you want to connect to

Run sdf auth login aws --help to see all login options.

If you already have an IAM role, you may use that one. Or, simply create a new IAM user. The below is the minimum viable allow ARN policy.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Statement1",
            "Effect": "Allow",
            "Action": [
                "redshift:ListDatabases",
                "redshift:ListSchemas",
                "redshift:ListTables",
                "redshift:ListSavedQueries",
                "redshift:DescribeQuery",
                "redshift:DescribeSavedQueries",
                "redshift:DescribeTable",
                "redshift:DescribeTableRestoreStatus",
                "redshift:FetchResults",
                "redshift:CreateSavedQuery",
                "redshift:ExecuteQuery",
                "redshift:GetClusterCredentials",
                "redshift:GetClusterCredentialsWithIAM"
            ],
            "Resource": [
                "arn:aws:redshift:{region}:{account-number}:dbname:{cluster-identifier}/{dbname}"
            ]
        },
        {
            "Sid": "Statement2",
            "Effect": "Allow",
            "Action": [
                "redshift-data:DescribeStatement",
                "redshift-data:DescribeTable",
                "redshift-data:GetStatementResult",
                "redshift-data:ListDatabases",
                "redshift-data:ListSchemas",
                "redshift-data:ExecuteStatement",
                "redshift-data:BatchExecuteStatement",
                "redshift-data:CancelStatement",
                "redshift-data:ListStatements",
                "redshift-data:ListTables"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}
2

Register the User in the AWS CLI

The newly created user needs to be registered with the AWS CLI. You can optionally provide a profile name. As default region, please use the AWS region that your redshift instance is located in.

The example below configures a profile called redshift

aws configure --profile redshift  
3

Connect SDF to AWS

Connect SDF to AWS by telling SDF which of your AWS profiles to use.

sdf auth login aws --profile redshift --default-region <REGION>

This will create a new credential in a ~/.sdf/ directory in the root of your system. This credential will be used to authenticate with AWS services. By default, the credential’s name is default. As such, the credential does not need to be explicitly referenced in the provider configuration below.

4

Add Redshift Provider in Workspace

Once authenticated, add a table provider block in your workspace.sdf.yml workspace block. This tells SDF to use Redshift to hydrate missing table schemas.

---
workspace:
    ...
    providers:
        - type: redshift
          cluster-identifier: <CLUSTER_IDENTIFIER>
          sources: 
            - pattern: <DATABASE>.*.*

Replace <DATABASE> with the name of the database you want to hydrate. Note this is configurable and can be changed to any database you have access to. For example, if I wanted SDF to pull from two databases, called db1 and db2, I would write:

---
workspace:
    ...
    providers:
        - type: redshift
          cluster-identifier: <CLUSTER_IDENTIFIER>
          sources: 
            - pattern: <DATABASE>.*.*
            - pattern: <DATABASE>.*.*

Lastly, providers can also be configured to use specific credentials by referencing the credential name in the provider block. For example, if you wanted to use a credential called redshift-creds, you would write:

workspace:
    ...
    providers:
        - type: redshift
          credential: redshift-creds
          cluster-identifier: <CLUSTER_IDENTIFIER>
          sources: 
            - pattern: db1.schema1.my_table
            - pattern: db1.schema2.*

For more information on provider configuration, see the provider documentation.

5

Try it out!

Now that you’re connected, let’s make sure SDF can pull the schema information it needs.

Run sdf compile -q "select * from <DATABASE>.<SCHEMA>.<TABLE>" --show all

If the connection is successful, you will see the schema for the table you selected.

The ARN you created also needs user access to the database. Depending on your DBs Access Policy, you might also have to give explicit access to this new user within your Redshift cluster. You can do this through by adding the ARN user to the appropriate redshift group:

ALTER GROUP <your-group-name> add user "IAMR:<IAM_ROLE_NAME>";

Troubleshoot Resources

  1. Seeing errors like this ERROR: permission denied for schema schema1 from sdf run?

    This may be due to the Redshift user not having the correct permissions to this schema. To check if you have the right access, you can run the following query.

    Use SVV_SCHEMA_PRIVILEGES to view the schema owner, and schema_acl which shows the access control list for the schema.

    SELECT schema_owner, schema_acl FROM SVV_ALL_SCHEMAS WHERE schema_name = 'schema1';
    

    If your user does not have access to the schema, you can grant access to the schema by running the following query. To run this query, you either have to be a superuser or the owner of the schema.

    GRANT CREATE ON SCHEMA schema1 TO <username>;
    

    The <username> is the user name of your IAM user, with IAM: prefix. For example, if your IAM user name is my_user, you can run:

    GRANT CREATE ON SCHEMA schema1 TO "IAM:my_user";