# Getting Started with Snowflake and SDF
SDF as a best-in-class transformation layer for Snowflake
## Overview
SDF can hydrate table schemas and execute queries against a Snowflake database. This guide walks you through the steps of creating a Snowflake integration. By the end, you will be able to reference tables from your Snowflake instance in SDF.
SDF will use remote Snowflake schema information to do type bindings and other static analysis checks on your workspace.
## Prerequisites
To connect to Snowflake, you will need the following on hand:
- A Snowflake account to connect to.
- Valid credentials with read access to your schemas and, if you plan to materialize tables, write access in your Snowflake account (see the snippet below to check a role's grants).
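If you're not sure what a role can access, a quick way to inspect its grants from a Snowflake worksheet (the role name here is illustrative):

```sql
-- List every privilege currently granted to the role
SHOW GRANTS TO ROLE analyst_role;
```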
## Guide
### Collect Required Information
To connect to Snowflake, you need the following materials:
- Account ID - your Snowflake account identifier
- Username - your Snowflake username
- Password - your Snowflake password
- Role - a role with appropriate permissions
- Warehouse - the warehouse you are connecting to
Log in to your Snowflake account to find the relevant information. Run `sdf auth login snowflake --help` to see all login options.
### Connect SDF to Snowflake
Connect SDF to Snowflake. SDF will prompt you for your Snowflake password.

```shell
sdf auth login snowflake --account-id <ACCOUNT_ID> --username <USERNAME> --role <ROLE> --warehouse <WAREHOUSE_NAME>
```
This will create a new credential in the `~/.sdf/` directory in your home directory. This credential will be used to authenticate with Snowflake. By default, the credential's name is `default`. As such, the credential does not need to be explicitly referenced in the integrations configuration below.
To validate the connection, run:

```shell
sdf auth status
```
### Add Snowflake Provider in Workspace
Once authenticated, add an integration block to your `workspace.sdf.yml`. This tells SDF to use Snowflake to hydrate missing table schemas.
```yaml
workspace:
  ...
  integrations:
    - provider: snowflake
      type: database
      sources:
        - pattern: <DATABASE>.*.*
```
Replace `<DATABASE>` with the name of the database you want to hydrate. Note this is configurable and can be changed to any database you have access to. For example, if you wanted SDF to pull from two databases, called `db1` and `db2`, you would write:
```yaml
workspace:
  ...
  integrations:
    - provider: snowflake
      type: database
      sources:
        - pattern: db1.*.*
        - pattern: db2.*.*
```
Integrations can also be configured to use specific credentials by referencing the credential name in the integration block. For example, if you wanted to use a credential called `snowflake-creds`, you would write:
```yaml
workspace:
  ...
  integrations:
    - provider: snowflake
      type: database
      credential: snowflake-creds
      sources:
        - pattern: db1.schema1.mytable
        - pattern: db2.schema2.*
```
For more information on integration configuration, see the integration documentation.
### Try it out!
Now that you’re connected, let’s make sure SDF can pull the schema information it needs.
Run:

```shell
sdf compile -q "select * from <DATABASE>.<SCHEMA>.<TABLE>" --show all
```

If the connection is successful, you will see the schema for the table you selected.
Note: The term 'DATABASE' in Snowflake is interchangeable with the term 'CATALOG' in SDF.
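For example, with the `db1` database from the configuration above (the schema and table names here are illustrative):

```shell
sdf compile -q "select * from db1.public.my_table" --show all
```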
## Configuring an SDF Role for Snowflake
Different SDF commands require different permissions from Snowflake; as such, the Snowflake role SDF assumes should be configured to have the following permissions:
|  | `SELECT TABLE` | `SELECT VIEW` | `CREATE <MATERIALIZATION>` | `CREATE SCHEMA` | `CREATE DATABASE` |
|---|---|---|---|---|---|
| `compile` | 🟢 | 🟢 |  |  |  |
| `run` | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
The `CREATE <MATERIALIZATION>` permission listed above is meant to be substituted with the materialization type you are using. For example, if you are using `table` materializations, the permission should be `CREATE TABLE`. If using views or transient tables, the permission should be `CREATE VIEW` or `CREATE TRANSIENT TABLE` respectively.
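For instance, granting table and view creation on a target schema might look like this (the database, schema, and role names are illustrative):

```sql
-- Allow the role to create tables and views in a specific schema
GRANT CREATE TABLE ON SCHEMA dev_sandbox.public TO ROLE sdf_dev_role;
GRANT CREATE VIEW ON SCHEMA dev_sandbox.public TO ROLE sdf_dev_role;
```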
If you'd like to grant all write access on a database, this can be easily achieved with a `GRANT ALL` command on the database.
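As a sketch (names again illustrative), that broad grant looks like:

```sql
-- Grant all privileges on the database and on every schema within it
GRANT ALL ON DATABASE dev_sandbox TO ROLE sdf_dev_role;
GRANT ALL ON ALL SCHEMAS IN DATABASE dev_sandbox TO ROLE sdf_dev_role;
```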
Required permissions for your Snowflake role are based on your integration configuration, as this determines where SDF will attempt to write models out to and where it will read schemas from.
Let’s imagine we had the following integration configured:
```yaml
integrations:
  - provider: snowflake
    type: database
    credential: limited
    sources:
      - pattern: my_production_database.*.*
    targets:
      - pattern: my_production_database.*.*
        rename-as: dev_sandbox.${1}.${2}
```
As we can see from the integration, SDF will read schemas from the `my_production_database` database and write models to the `dev_sandbox` database. For example, a model matching `my_production_database.analytics.orders` would be written to `dev_sandbox.analytics.orders`, since `${1}` and `${2}` capture the schema and table segments matched by the pattern's wildcards.

Despite `my_production_database` being included as a target, SDF only requires write access to the databases and schemas after the rename operation. As such, `dev_sandbox` should be the only database that requires write access.
Now let's create a role `sdf_dev_role` with permissions to write to the `dev_sandbox` database and read from the `my_production_database` database. This role models exactly what's required to compile and run with the integration above.
```sql
-- Create the role
CREATE ROLE IF NOT EXISTS sdf_dev_role;
-- Allow your user to assume this role. Replace <YOUR_USERNAME> with the Snowflake username you're using with SDF.
GRANT ROLE sdf_dev_role TO USER <YOUR_USERNAME>;
-- Grant the role access to the warehouse. Replace <MY_WAREHOUSE> with the name of your warehouse.
GRANT USAGE ON WAREHOUSE <MY_WAREHOUSE> TO ROLE sdf_dev_role;
-- Grant usage on the database and its schemas; Snowflake requires this before the SELECT grants below are usable.
GRANT USAGE ON DATABASE my_production_database TO ROLE sdf_dev_role;
GRANT USAGE ON ALL SCHEMAS IN DATABASE my_production_database TO ROLE sdf_dev_role;
-- Grant the role read access to all tables and views in `my_production_database`
GRANT SELECT ON ALL TABLES IN DATABASE my_production_database TO ROLE sdf_dev_role;
GRANT SELECT ON ALL VIEWS IN DATABASE my_production_database TO ROLE sdf_dev_role;
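-- Optionally (an assumption, not required by this guide), add future grants so that
-- tables and views created after this point are readable too:
GRANT SELECT ON FUTURE TABLES IN DATABASE my_production_database TO ROLE sdf_dev_role;
GRANT SELECT ON FUTURE VIEWS IN DATABASE my_production_database TO ROLE sdf_dev_role;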
-- Everything above is all that is necessary for `sdf compile`. If you want to run models, you'll need to grant the following permissions as well.
-- Grant the role write access to the `dev_sandbox` database
GRANT ALL ON DATABASE dev_sandbox TO ROLE sdf_dev_role;
-- Lastly, if you'd like SDF to be able to create databases for you as you author them locally, you'll need to grant the following permissions as well.
-- This is useful for provisioning new sandbox databases per-developer.
GRANT CREATE DATABASE ON ACCOUNT TO ROLE sdf_dev_role;
```
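Once the role exists, point SDF at it by logging in again with the same command from earlier, this time passing the new role:

```shell
sdf auth login snowflake --account-id <ACCOUNT_ID> --username <USERNAME> --role sdf_dev_role --warehouse <WAREHOUSE_NAME>
```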
## Next Steps
Now that you have connected to Snowflake, you can start materializing tables in Snowflake. Check out the Snowflake materialization guide to get started.