Advanced Snowflake Features & Configurations
This guide covers advanced features and configurations for Snowflake integrations in SDF, including role configuration, case-preserving identifiers, and per-model warehouse specification.
Configuring an SDF Role for Snowflake
Different SDF commands require different permissions from Snowflake. As such, the Snowflake role SDF assumes should be configured with the following permissions:
| Command | SELECT TABLE | SELECT VIEW | CREATE <MATERIALIZATION> | CREATE SCHEMA | CREATE DATABASE |
|---|---|---|---|---|---|
| compile | 🟢 | 🟢 | | | |
| run | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 |
The `CREATE <MATERIALIZATION>` permission listed above should be substituted with the materialization type you are using. For example, if you are using table materializations, the permission should be `CREATE TABLE`. If using views or transient tables, the permission should be `CREATE VIEW` or `CREATE TRANSIENT TABLE` respectively.
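For instance, grants for table or view materializations might look like this (the schema and role names below are illustrative):

```sql
-- If models are materialized as tables:
GRANT CREATE TABLE ON SCHEMA dev_sandbox.public TO ROLE sdf_dev_role;

-- If models are materialized as views:
GRANT CREATE VIEW ON SCHEMA dev_sandbox.public TO ROLE sdf_dev_role;
```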
If you’d like to grant all write access on a database, this can be achieved with a `GRANT ALL` command on the database.
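For example (again with illustrative database and role names):

```sql
-- Grant all privileges on the database and on every schema within it
GRANT ALL ON DATABASE dev_sandbox TO ROLE sdf_dev_role;
GRANT ALL ON ALL SCHEMAS IN DATABASE dev_sandbox TO ROLE sdf_dev_role;
```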
The permissions your Snowflake role requires depend on your integration configuration, since this determines where SDF will write models and where it will read schemas from.
Let’s imagine we had the following integration configured:
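A sketch of such an integration is below; the exact keys and rename substitution syntax may vary by SDF version, and the workspace name is an assumption:

```yaml
workspace:
  name: my_workspace   # assumed workspace name
  integrations:
    - provider: snowflake
      type: database
      sources:
        - pattern: my_production_database.*.*
      targets:
        - pattern: my_production_database.*.*
          rename-as: dev_sandbox.${1}.${2}
```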
As we can see from the integration, SDF will read schemas from the `my_production_database` database and write models to the `dev_sandbox` database.
Despite `my_production_database` being included as a target, SDF only requires write access to the databases and schemas after the rename operation. As such, `dev_sandbox` should be the only database that requires write access.
Now let’s create a role `sdf_dev_role` with permissions to write to the `dev_sandbox` database and read from the `my_production_database` database. This role will model exactly what’s required to compile and run with the integration above.
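A sketch of the grants involved; the statements are standard Snowflake grants, and the warehouse name is an assumption:

```sql
-- Create the role
CREATE ROLE IF NOT EXISTS sdf_dev_role;

-- Read access to production schemas
GRANT USAGE ON DATABASE my_production_database TO ROLE sdf_dev_role;
GRANT USAGE ON ALL SCHEMAS IN DATABASE my_production_database TO ROLE sdf_dev_role;
GRANT SELECT ON ALL TABLES IN DATABASE my_production_database TO ROLE sdf_dev_role;
GRANT SELECT ON ALL VIEWS IN DATABASE my_production_database TO ROLE sdf_dev_role;

-- Write access to the development sandbox
GRANT ALL ON DATABASE dev_sandbox TO ROLE sdf_dev_role;
GRANT ALL ON ALL SCHEMAS IN DATABASE dev_sandbox TO ROLE sdf_dev_role;

-- Allow the role to run queries (warehouse name is illustrative)
GRANT USAGE ON WAREHOUSE dev_wh TO ROLE sdf_dev_role;
```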
Case-Preserving Identifiers
SDF strongly recommends using the default `to_upper` casing policy for all models in Snowflake, so that your local model specification matches Snowflake’s behavior in the cloud. Rarely, if ever, should identifiers have their case preserved from the file system; doing so likely creates unnecessary confusion about how models will be materialized and how they should be referenced in Snowflake.
In Snowflake, all names (including table names and column names) are case-sensitive. In addition, Snowflake normalizes all unquoted SQL identifiers to uppercase. This means if you execute the DDL `CREATE TABLE my_model AS ...` in Snowflake, the newly created table will be called `MY_MODEL`.
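For example (the query body here is just a stand-in):

```sql
-- Unquoted identifiers are normalized to uppercase
CREATE TABLE my_model AS SELECT 1 AS id;
SHOW TABLES LIKE 'MY_MODEL';  -- the table is stored as MY_MODEL
```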
Snowflake, however, will preserve the case of identifiers if they are enclosed in double quotes. This means the following SQL would produce a table called `MyModel`:
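```sql
-- Double-quoted identifiers keep their case (query body is illustrative)
CREATE TABLE "MyModel" AS SELECT 1 AS id;
```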
If you’d like to write a model in SDF that, when materialized, preserves the case of the filename that produces the model name, you can set the `casing-policy` option to `preserve` instead of `to_upper` for that model.
SDF determines the casing of identifiers using the `casing-policy` property. Because identifiers in Snowflake can differ post-normalization from how they were written pre-normalization, this property exists to capture that behavior. In most dialects, it defaults to `preserve`. However, Snowflake is the special child! Since Snowflake normalizes to upper case, we default the `casing-policy` to `to_upper` for Snowflake. You should only override this if you need to preserve identifier casing, or to handle SQL filenames that begin with numbers.
The `casing-policy` property can be set at an individual table-block level, like so:
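```yaml
table:
  name: MyModel            # hypothetical model whose filename casing should be kept
  casing-policy: preserve
```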
For all supported `casing-policy` options, see the casing-policy reference.
Snowflake Warehouse Specification
The Snowflake warehouse that a model runs on can be overridden at the model level. This allows for fine-grained control over how resources are utilized on a per-model basis, opening the door to cost and performance optimization through intelligent warehouse selection.
To specify the warehouse for a model, add the `warehouse` property to the table-block of an `sdf.yml` file.
The warehouse can be specified on the model level with a simple top-level warehouse specification, like so:
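```yaml
table:
  name: my_model       # hypothetical model name
  warehouse: BIG_WH    # warehouse name is illustrative
```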
In incremental and snapshot scenarios, the top-level warehouse specification will be used by default for the first run, full refreshes, and incremental/snapshot runs.
However, in certain scenarios you might want to use a smaller warehouse for incremental or snapshot runs. This is configurable within the `incremental-options` and `snapshot-options` configs respectively. Here’s an example:
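A sketch for the incremental case; the model name and strategy are assumptions, while `compact-mode-warehouse` is the property described below:

```yaml
table:
  name: my_incremental_model
  materialization: incremental-table
  warehouse: BIG_WH                    # first run, full refreshes, and tests
  incremental-options:
    strategy: append                   # strategy shown for illustration
    compact-mode-warehouse: SMALL_WH   # subsequent incremental runs
```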
In this example, `BIG_WH` will be used by default for the first run, full refresh runs, and all tests, since tests will scan the history of all increments. Then, due to the `compact-mode-warehouse` property, `SMALL_WH` will be used for incremental runs.