Overview

This guide will walk you through the steps of creating an AWS S3 integration. By the end, you will be able to reference remote S3 objects in SDF as if they were locally defined!

SDF uses remote S3 schema information to perform type binding, column-level lineage, and other static analysis checks on your workspace.

Guide using an IAM access_key and secret_key

1. Collect Required Information

To connect to S3 via IAM credentials, you need a valid key pair with at least the permissions required to read objects from your S3 bucket:

  • access_key: the access key of your IAM profile
  • secret_access_key: the secret access key of your IAM profile
  • region: the AWS region containing the S3 objects

Run sdf auth login aws --help to see all login options.
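
As a rough sketch, a minimal IAM policy granting this read access could be attached with the AWS CLI as shown below. The user name my-sdf-user, policy name sdf-s3-read, and bucket my-bucket are placeholders for illustration:

aws iam put-user-policy \
    --user-name my-sdf-user \
    --policy-name sdf-s3-read \
    --policy-document '{
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::my-bucket",
                "arn:aws:s3:::my-bucket/*"
            ]
        }]
    }'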

2. Register the User in the AWS CLI

To begin, register your user with the AWS CLI. You can optionally provide a profile name. When prompted for a default region, use the AWS region your S3 bucket is located in.

The example below configures a profile called S3:

aws configure --profile S3  
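
To confirm the profile is registered correctly, you can check which identity it resolves to and that it can list your bucket. Both are standard AWS CLI commands; replace <BUCKET> with your bucket name:

aws sts get-caller-identity --profile S3
aws s3 ls s3://<BUCKET> --profile S3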

3. Connect SDF to AWS

Connect SDF to AWS by telling SDF which of your AWS profiles to use.

sdf auth login aws --profile S3 --default-region <REGION>

This will create a new credential in the ~/.sdf/ directory in your home directory. This credential will be used to authenticate with AWS services. By default, the credential's name is default, so it does not need to be explicitly referenced in the integrations configuration below.
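
If you want to confirm the credential was written, you can list the directory; the exact file layout may vary by SDF version:

ls ~/.sdf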

To validate the connection, run:

sdf auth status

4. Add S3 Integration in your SDF Workspace

Once authenticated, add an integrations block to the workspace block of your workspace.sdf.yml. This tells SDF to pull data from the specified bucket:

---
workspace:
    ...
    integrations:
        - provider: S3
          type: data
          bucket: 
            - uri: <LOCATION>
              region: <REGION>

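For example, a filled-in block for a hypothetical bucket named my-company-data in us-east-1 would look like this:

---
workspace:
    ...
    integrations:
        - provider: S3
          type: data
          bucket:
            - uri: s3://my-company-data
              region: us-east-1
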
For more information on SDF integrations, see the integration documentation.

5. Create a Table that Pulls Data from S3

Next, we’ll need to define an external table in SQL that references data in our S3 bucket. For example, if the table pulls data from a CSV file, the definition would look like this:

my_external_table.sql
    CREATE TABLE database.schema.my_external_table WITH (
        FORMAT='CSV', 
        SKIP_HEADER_LINE_COUNT=1,
        LOCATION='s3://path/to/your.csv'
    );

We recommend placing this file in the models directory, or whichever directory you have specified in your workspace.sdf.yml’s includes block.
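
The same pattern works for other file formats. As a sketch, assuming your SDF version supports a PARQUET format option, a table backed by a Parquet file (which carries its own schema, so no header-skipping option is needed) might look like this:

my_external_table.sql
    CREATE TABLE database.schema.my_external_table WITH (
        FORMAT='PARQUET',
        LOCATION='s3://path/to/your.parquet'
    );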

6. Try it out!

Now that you’re connected and you have a table defined, let’s make sure SDF can pull the schema information it needs.

Run:

sdf compile -q "select * from database.schema.my_external_table" --show all

If the connection is successful, you will see the schema for the table you selected.

The IAM user you created also needs access to the data itself. Depending on your bucket’s access policy, you might have to grant this new user explicit access to the S3 bucket. You can do this by adding the user’s ARN to the appropriate group.
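
For instance, if bucket access is managed through an IAM group, you could add the user with the standard AWS CLI command below; the group name s3-readers and user name my-sdf-user are placeholders:

aws iam add-user-to-group --group-name s3-readers --user-name my-sdf-user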