Overview
In this guide, we’ll materialize our first tables and views in Snowflake with SDF. We’ll be using datasets from our partner, Cybersyn, installed through Snowflake’s data marketplace, to bootstrap some example data.

Prerequisites
This guide should be followed after completing the Getting Started with Snowflake and SDF guide.
- A Snowflake account with this Cybersyn dataset installed.
- Valid Snowflake username / password credentials with write access to at least one database we can materialize tables to.
- Credentials instantiated as described in the previous guide.
When installing the Cybersyn dataset, make sure to grant the Snowflake role you’ll use with SDF read access to TECH__INNOVATION_ESSENTIALS (i.e. the Cybersyn database).

Guide
1. Create a New SDF Project from the Cybersyn Tech Innovation Sample
Create a new SDF project using the Cybersyn Tech Innovation sample. This will create a new project in your current working directory with the sample project files.
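A minimal sketch of the command, assuming the CLI’s --sample flag and a sample named cybersyn_tech_innovation:

```shell
# Hypothetical invocation: scaffolds the Cybersyn Tech Innovation sample project
# in the current working directory. The exact sample name may differ.
sdf new --sample cybersyn_tech_innovation
```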
2. Compile to Test Credentials
To ensure your credentials are working and have read access to the new Cybersyn database, let’s try compiling one of the models.
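A sketch of the compile command, assuming sdf compile accepts a model file path as a target (as the log below suggests):

```shell
# Compile a single model by file path; running `sdf compile` with no arguments
# would compile every model in the workspace instead.
sdf compile models/sdf_snowflake/cybersyn_tech_innovation/funder_aggregates.sql
```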
Working set 3 model files, 1 .sdf file
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.OPENALEX_FUNDERS_INDEX (schema)
Compiling sdf_snowflake.cybersyn_tech_innovation.funder_aggregates (./models/sdf_snowflake/cybersyn_tech_innovation/funder_aggregates.sql)
Finished 2 models [2 succeeded] in 1.985 secs
Schema sdf_snowflake.cybersyn_tech_innovation.FUNDER_AGGREGATES
┌──────────────────────┬────────────────┬────────────┬─────────────┐
│ column_name ┆ data_type ┆ classifier ┆ description │
╞══════════════════════╪════════════════╪════════════╪═════════════╡
│ FUNDER_NAME ┆ varchar ┆ ┆ │
│ TOTAL_FUNDERS ┆ decimal(38, 0) ┆ ┆ │
│ TOTAL_WORKS_COUNT ┆ decimal(38, 0) ┆ ┆ │
│ TOTAL_CITATIONS ┆ decimal(38, 0) ┆ ┆ │
│ TOTAL_GRANTS ┆ decimal(38, 0) ┆ ┆ │
│ UNIQUE_COUNTRY_COUNT ┆ decimal(38, 0) ┆ ┆ │
│ FIRST_CREATED_DATE ┆ timestamp ┆ ┆ │
│ LAST_UPDATED_DATE ┆ timestamp ┆ ┆ │
└──────────────────────┴────────────────┴────────────┴─────────────┘
If you do not see a successful compilation, please ensure you’ve:
- Followed the Getting Started with Snowflake and SDF guide to authenticate to your Snowflake account.
- Granted the correct role read access to the TECH__INNOVATION_ESSENTIALS database.
3. Materialize Tables in Snowflake
If you compiled successfully, great! That means SDF was able to read table schemas from the Cybersyn database. Now, let’s materialize our models in a separate database called sdf_snowflake.
You’ll notice the workspace.sdf.yml file specifies the following defaults:
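A sketch of what that defaults block might look like (the workspace name here is a placeholder):

```yml
workspace:
  name: cybersyn_tech_innovation   # placeholder workspace name
  defaults:
    dialect: snowflake
    preprocessor: jinja
    materialization: view
```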
Let’s go through these one by one:
- dialect: snowflake specifies that the SQL we’re working with uses Snowflake’s syntax and functions.
- preprocessor: jinja specifies that we’re using Jinja templating in our SQL models (this will come in handy later).
- materialization: view specifies that we’re materializing our models as views by default in Snowflake. Note this can be overwritten on a per-model or per-includes-path basis.
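The workspace also declares a Snowflake integration; a sketch of its likely shape, with the provider and pattern fields assumed:

```yml
workspace:
  integrations:
    - provider: snowflake
      type: database
      sources:
        # read table schemas for anything referenced under the Cybersyn share
        - pattern: tech__innovation_essentials.cybersyn.*
      targets:
        # write our materialized models into the sdf_snowflake database
        - pattern: sdf_snowflake.*.*
```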
This tells SDF to hydrate missing table schemas from the tech__innovation_essentials.cybersyn database and schema and materialize tables to the sdf_snowflake database. For more information on integration configuration, see the integration documentation.
Now, let’s materialize the tables in the sdf_snowflake database.
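A sketch of the run invocation (sdf run is the standard materialization command, as the run logs later in this guide show):

```shell
# Materialize every model in the workspace using the defaults above
# (views in the sdf_snowflake database).
sdf run
```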
This is great, but what if we want to see the data our views produced? Let’s add a small flag to sdf run that tells SDF to pull down a sample of the data from the views we just materialized.
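A hedged sketch, assuming your SDF version supports a --show output option (check sdf run --help):

```shell
# Assumed flag: also print a sample of rows from each materialized relation.
sdf run --show all
```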
Congratulations! You’ve just materialized your first tables in Snowflake with SDF, and even got a sneak peek at the data they produced.
4. Materialize a New Table
Let’s say we want to materialize a new table in the sdf_snowflake database, but in a different schema. You’ll notice in the workspace.sdf.yml file, we have the following includes block:
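A sketch of what that includes block might look like, with the exact index value assumed:

```yml
workspace:
  includes:
    - path: models
      # assumed value: derive database and schema names from the file path
      index: catalog-schema-table-name
```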
The index here means we’ll interpret the database and schema name from the file path like so - models/<database>/<schema>/<table>.sql.
Since we want to add a new table, let’s create a new file in the models directory called sdf_snowflake/staging/latest_repo_name.sql with the following content:
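A sketch of the model, assuming github_repos exposes a first_seen timestamp column; it matches the compiled schema below (REPO_NAME, REPO_ID):

```sql
-- Sketch of sdf_snowflake/staging/latest_repo_name.sql
-- Assumes github_repos exposes a first_seen timestamp; adjust to the actual
-- Cybersyn column names if they differ.
SELECT
    repo_name,
    repo_id
FROM tech__innovation_essentials.cybersyn.github_repos
WHERE first_seen >= DATEADD('hour', -24, CURRENT_TIMESTAMP())
-- keep only the latest name observed for each repo_id
QUALIFY ROW_NUMBER() OVER (PARTITION BY repo_id ORDER BY first_seen DESC) = 1
```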
This table will select the latest repo_name and repo_id from github_repos created in the last 24 hours.
Now let’s compile this new table to ensure the SQL is valid and selects columns that actually exist:
Working set 4 model files, 1 .sdf file
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.GITHUB_REPOS (schema)
Compiling sdf_snowflake.staging.latest_repo_name (./models/sdf_snowflake/staging/latest_repo_name.sql)
Finished 2 models [2 succeeded] in 1.750 secs
Schema sdf_snowflake.staging.LATEST_REPO_NAME
┌─────────────┬────────────────┬────────────┬─────────────┐
│ column_name ┆ data_type ┆ classifier ┆ description │
╞═════════════╪════════════════╪════════════╪═════════════╡
│ REPO_NAME ┆ varchar ┆ ┆ │
│ REPO_ID ┆ decimal(38, 0) ┆ ┆ │
└─────────────┴────────────────┴────────────┴─────────────┘
Great! Everything compiled. Now, let’s make sure this table is materialized as a table instead of a view in the sdf_snowflake database. We’re doing this now since later on we’ll want to use this table as an upstream dependency in our DAG.
To accomplish this, we’ll need to overwrite our materialization default in the workspace.sdf.yml file and specify the materialization as table for this specific model. Under our workspace.sdf.yml block, we can add:
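A sketch of the per-table override, with the exact name format assumed:

```yml
table:
  name: sdf_snowflake.staging.latest_repo_name
  materialization: table
```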
SDF YML blocks are composable like legos. You can add as many blocks as you need to your workspace.sdf.yml file, or even split them out into separate files and include them with the type: metadata in your includes block. Creating separate YML files for each purpose or model can help keep your configuration organized and easy to maintain.
Now, let’s run the model to materialize the table in the sdf_snowflake database.
Working set 4 model files, 1 .sdf file
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.GITHUB_REPOS (schema)
Running sdf_snowflake.staging.latest_repo_name (./models/sdf_snowflake/staging/latest_repo_name.sql)
Finished 2 models [2 succeeded] in 5.496 secs
Table sdf_snowflake.staging.LATEST_REPO_NAME
┌─────────────────────────────────┬───────────┐
│ REPO_NAME ┆ REPO_ID │
╞═════════════════════════════════╪═══════════╡
│ 0-Anonymous/Secure-SMTP-Server ┆ 926399969 │
│ 0-robinson-1/pizzeria ┆ 926565957 │
│ 00-berni/proj_0 ┆ 537524710 │
│ 00-berni/proj_3 ┆ 538682956 │
│ 00-kat/ghostty ┆ 926370638 │
│ 00-kat/wraith ┆ 926401364 │
│ 0000yuyu/youtube ┆ 926296773 │
│ 0001Moksh/Deva_voice_assistant ┆ 926354200 │
│ 0001Moksh/Deva_voice_assistant- ┆ 926350659 │
│ 0001Moksh/c_-_cpp_language ┆ 926330641 │
└─────────────────────────────────┴───────────┘
10 rows.
Congratulations! You’ve just materialized your first table in Snowflake with SDF. In the next step, we’ll add one more table and a downstream view to create a DAG.

5. Create a DAG
Now that we’ve materialized a new table, let’s create a DAG by adding a downstream view that depends on the table we just created and one more table we’ll materialize now. The DAG will consist of two staging tables, latest_repo_name and push_events, feeding a downstream view, events_by_repo.
Let’s first create a new file in the models directory called sdf_snowflake/staging/push_events.sql with the following content:
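A sketch of the model, assuming github_events has type and created_at_timestamp columns; it matches the compiled schema below (ID, REPO_ID):

```sql
-- Sketch of sdf_snowflake/staging/push_events.sql
-- Assumes github_events has `type` and `created_at_timestamp` columns; adjust
-- to the actual Cybersyn column names if they differ.
SELECT
    id,
    repo_id
FROM tech__innovation_essentials.cybersyn.github_events
WHERE type = 'PushEvent'
  AND created_at_timestamp >= DATEADD('hour', -24, CURRENT_TIMESTAMP())
```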
This table will select all push events from github_events created in the last 24 hours.
Now let’s compile this new table to ensure the SQL is valid and selects columns that actually exist and are properly typed:
Working set 5 model files, 1 .sdf file
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.GITHUB_EVENTS (schema)
Compiling sdf_snowflake.staging.push_events (./models/sdf_snowflake/staging/push_events.sql)
Finished 2 models [2 succeeded] in 1.732 secs
Schema sdf_snowflake.staging.PUSH_EVENTS
┌─────────────┬────────────────┬────────────┬─────────────┐
│ column_name ┆ data_type ┆ classifier ┆ description │
╞═════════════╪════════════════╪════════════╪═════════════╡
│ ID ┆ varchar ┆ ┆ │
│ REPO_ID ┆ decimal(38, 0) ┆ ┆ │
└─────────────┴────────────────┴────────────┴─────────────┘
Great! Everything compiled. Now, let’s make sure this table is materialized as a table instead of a view in the sdf_snowflake database. Let’s add another table block to our workspace.sdf.yml file:
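A sketch of the additional block, mirroring the earlier override (the name format again assumed):

```yml
table:
  name: sdf_snowflake.staging.push_events
  materialization: table
```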
Now, let’s run the model to materialize the table in the sdf_snowflake database.
Great! Now that we have our two tables, let’s finish up our DAG by creating a downstream view that depends on these two tables. Create a new file in the models directory called sdf_snowflake/cybersyn_tech_innovation/events_by_repo.sql with the following content:
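A sketch of the downstream view joining the two staging tables (the specific aggregation is an assumption):

```sql
-- Sketch of sdf_snowflake/cybersyn_tech_innovation/events_by_repo.sql
-- Joins the two staging tables materialized above and counts push events per repo.
SELECT
    r.repo_name,
    COUNT(e.id) AS push_event_count
FROM sdf_snowflake.staging.push_events AS e
JOIN sdf_snowflake.staging.latest_repo_name AS r
    ON e.repo_id = r.repo_id
GROUP BY r.repo_name
```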
Since our workspace’s default materialization is view, we don’t need to specify the materialization for this model.
Now, let’s compile again to ensure our entire workspace is still in a good state:
Working set 6 model files, 1 .sdf file
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.GITHUB_STARS (schema)
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.OPENALEX_FUNDERS_INDEX (schema)
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.USPTO_PATENT_INDEX (schema)
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.GITHUB_EVENTS (schema)
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.USPTO_CONTRIBUTOR_INDEX (schema)
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.USPTO_PATENT_CONTRIBUTOR_RELATIONSHIPS (schema)
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.GITHUB_REPOS (schema)
Compiling sdf_snowflake.staging.push_events (./models/sdf_snowflake/staging/push_events.sql)
Compiling sdf_snowflake.cybersyn_tech_innovation.funder_aggregates (./models/sdf_snowflake/cybersyn_tech_innovation/funder_aggregates.sql)
Compiling sdf_snowflake.cybersyn_tech_innovation.all_nvidia_patents (./models/sdf_snowflake/cybersyn_tech_innovation/all_nvidia_patents.sql)
Compiling sdf_snowflake.cybersyn_tech_innovation.most_starred_repos (./models/sdf_snowflake/cybersyn_tech_innovation/most_starred_repos.sql)
Compiling sdf_snowflake.staging.latest_repo_name (./models/sdf_snowflake/staging/latest_repo_name.sql)
Compiling sdf_snowflake.cybersyn_tech_innovation.events_by_repo (./models/sdf_snowflake/cybersyn_tech_innovation/events_by_repo.sql)
Finished 13 models [13 succeeded] in 2.054 secs
Woohoo! Another successful compile. Now, let’s try one final run of everything in this workspace to materialize our new view and run the entire DAG.
Congratulations! You’ve just created your first DAG in Snowflake with SDF.