Getting Started
SDF supports reading data and metadata from S3, and writing results back to it.
Overview
This guide will walk you through the steps of creating an AWS S3 integration. By the end, you will be able to reference remote S3 objects in SDF as if they were locally defined!
SDF will use remote S3 schema information to do type bindings, column-level lineage, and other static analysis checks on your workspace.
Guide using an IAM access_key and secret_key
Collect Required Information
To connect to S3 via IAM credentials, you need a valid keypair with the minimum permissions required to read objects from your S3 Object Storage, along with the following information:
access_key: the access key of your IAM profile
secret_access_key: the secret key of your IAM profile
region: the AWS region containing the S3 objects
Run sdf auth login aws --help to see all login options.
Register the User in the AWS CLI
To begin, you need to register your user with the AWS CLI. You can optionally provide a profile name. For the default region, use the AWS region in which your S3 bucket is located.
The example below configures a profile called S3
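A minimal sketch of this step, assuming the AWS CLI is installed and on your PATH. The profile name S3 comes from this guide; wrapping the commands in shell functions named register_profile and check_profile is purely illustrative and not part of SDF or the AWS CLI.

```shell
# Profile name used throughout this guide.
PROFILE="S3"

# Hypothetical helper: registers the keypair under $PROFILE.
# `aws configure` prompts interactively for the access key ID,
# secret access key, default region name, and default output format.
register_profile() {
  aws configure --profile "$PROFILE"
}

# Hypothetical helper: asks AWS STS which identity the profile resolves to,
# which confirms that the stored keypair is accepted by AWS.
check_profile() {
  aws sts get-caller-identity --profile "$PROFILE"
}
```

Call register_profile once and enter the values collected above at the prompts, then call check_profile to confirm that the keypair is valid.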
Connect SDF to AWS
Connect SDF to AWS by telling SDF which of your AWS profiles to use when you run sdf auth login aws.
This will create a new credential in the ~/.sdf/ directory under your home directory. This credential will be used to authenticate with AWS services. By default, the credential’s name is default, so it does not need to be explicitly referenced in the integrations configuration below.
To validate the connection, run:
Add S3 Integration in your SDF Workspace
Once authenticated, add an integrations block to the workspace block of your workspace.sdf.yml. This tells SDF to pull data from the specified bucket.
For more information on SDF integrations, see the integration documentation.
Create a Table that Pulls Data from S3
Next, we’ll need to define an external table in SQL that references data in our S3 bucket. Suppose this table pulls data from a CSV file. The definition would then look like this:
We recommend placing this file in the models directory, or whichever directory you have specified in the includes block of your workspace.sdf.yml.
Try it out!
Now that you’re connected and you have a table defined, let’s make sure SDF can pull the schema information it needs.
Run sdf compile -q "select * from database.schema.my_external_table" --show all
If the connection is successful, you will see the schema for the table you selected.
The IAM user (ARN) you created also needs user access to the database. Depending on your database’s access policy, you might also have to grant this new user explicit access to the S3 bucket. You can do this by adding the user to the appropriate group.
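One way to check whether the profile can already reach the bucket is to list it with the AWS CLI. This is a sketch and not part of SDF: the bucket name my-bucket and the helper name can_list_bucket are hypothetical placeholders, and S3 is the profile registered earlier in this guide. Note that listing relies on the s3:ListBucket permission, which is separate from s3:GetObject, so a successful listing does not by itself prove that objects are readable.

```shell
PROFILE="S3"          # profile registered with the AWS CLI earlier in this guide
BUCKET="my-bucket"    # hypothetical bucket name; replace with your own

# Hypothetical helper: lists the bucket using the given profile.
# A non-zero exit status (for example an AccessDenied error) suggests
# that the user still needs to be granted access to the bucket.
can_list_bucket() {
  aws s3 ls "s3://$BUCKET" --profile "$PROFILE"
}
```

If can_list_bucket fails with an access error, grant the user access as described above and try again.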