Materialize tables and views in Snowflake with SDF.
TECH__INNOVATION_ESSENTIALS
(i.e. the Cybersyn database).Create a New SDF Project from the Cybersyn Tech Innovation Sample
Compile to Test Credentials
Working set 3 model files, 1 .sdf file
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.OPENALEX_FUNDERS_INDEX (schema)
Compiling sdf_snowflake.cybersyn_tech_innovation.funder_aggregates (./models/sdf_snowflake/cybersyn_tech_innovation/funder_aggregates.sql)
Finished 2 models [2 succeeded] in 1.985 secsSchema sdf_snowflake.cybersyn_tech_innovation.FUNDER_AGGREGATES
┌──────────────────────┬────────────────┬────────────┬─────────────┐
│ column_name ┆ data_type ┆ classifier ┆ description │
╞══════════════════════╪════════════════╪════════════╪═════════════╡
│ FUNDER_NAME ┆ varchar ┆ ┆ │
│ TOTAL_FUNDERS ┆ decimal(38, 0) ┆ ┆ │
│ TOTAL_WORKS_COUNT ┆ decimal(38, 0) ┆ ┆ │
│ TOTAL_CITATIONS ┆ decimal(38, 0) ┆ ┆ │
│ TOTAL_GRANTS ┆ decimal(38, 0) ┆ ┆ │
│ UNIQUE_COUNTRY_COUNT ┆ decimal(38, 0) ┆ ┆ │
│ FIRST_CREATED_DATE ┆ timestamp ┆ ┆ │
│ LAST_UPDATED_DATE ┆ timestamp ┆ ┆ │
└──────────────────────┴────────────────┴────────────┴─────────────┘
TECH__INNOVATION_ESSENTIALS
database.Materialize Tables in Snowflake
sdf_snowflake
.You’ll notice in the workspace.sdf.yml
file specifies the following defaults:dialect: snowflake
specifies that the SQL we’re working with uses Snowflake’s syntax and functions.preprocessor: jinja
specifies that we’re using Jinja templating in our SQL models (this will come in handy later).materialization: view
specifies that we’re materializing our models as views by default in Snowflake. Note this can be overwritten on a per-model, or per-includes-path basis.tech__innovation_essentials.cybersyn
database and schema and materialize tables to the sdf_snowflake
database. For more information on integration configuration, see the integration documentation.Now, let’s materialize the tables in the sdf_snowflake
database.sdf run
that tells SDF to pull down a sample of the data from the views we just materialized.Materialize a New Table
sdf_snowflake
database, but in a different schema. You’ll notice in the workspace.sdf.yml
file, we have the following includes block:models/<database>/<schema>/<table>.sql
.Since we want to add a new table, let’s create a new file in the models
directory called sdf_snowflake/staging/latest_repo_name.sql
with the following content:repo_name
and repo_id
from github_repos
created in the last 24 hours.Now let’s compile this new table to ensure the SQL is valid and selects columns that actually exist:Working set 4 model files, 1 .sdf file
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.GITHUB_REPOS (schema)
Compiling sdf_snowflake.staging.latest_repo_name (./models/sdf_snowflake/staging/latest_repo_name.sql)
Finished 2 models [2 succeeded] in 1.750 secsSchema sdf_snowflake.staging.LATEST_REPO_NAME
┌─────────────┬────────────────┬────────────┬─────────────┐
│ column_name ┆ data_type ┆ classifier ┆ description │
╞═════════════╪════════════════╪════════════╪═════════════╡
│ REPO_NAME ┆ varchar ┆ ┆ │
│ REPO_ID ┆ decimal(38, 0) ┆ ┆ │
└─────────────┴────────────────┴────────────┴─────────────┘
table
instead of a view
in the sdf_snowflake
database. We’re doing this now since later one we’ll want to use this table as an upstream dependency in our DAG.To accomplish this, we’ll need to overwrite our materialization
default in the workspace.sdf.yml
file, and specify the materialization
as table
for this specific model.Under our workspace.sdf.yml
block, we can add:workspace.sdf.yml
file, or even split them out into separate files and include them with the type: metadata
in your includes
block.
Creating separate YML files for each purpose or model can help keep your configuration organized and easy to maintain.sdf_snowflake
database.Working set 4 model files, 1 .sdf file
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.GITHUB_REPOS (schema)
Running sdf_snowflake.staging.latest_repo_name (./models/sdf_snowflake/staging/latest_repo_name.sql)
Finished 2 models [2 succeeded] in 5.496 secsTable sdf_snowflake.staging.LATEST_REPO_NAME
┌─────────────────────────────────┬───────────┐
│ REPO_NAME ┆ REPO_ID │
╞═════════════════════════════════╪═══════════╡
│ 0-Anonymous/Secure-SMTP-Server ┆ 926399969 │
│ 0-robinson-1/pizzeria ┆ 926565957 │
│ 00-berni/proj_0 ┆ 537524710 │
│ 00-berni/proj_3 ┆ 538682956 │
│ 00-kat/ghostty ┆ 926370638 │
│ 00-kat/wraith ┆ 926401364 │
│ 0000yuyu/youtube ┆ 926296773 │
│ 0001Moksh/Deva_voice_assistant ┆ 926354200 │
│ 0001Moksh/Deva_voice_assistant- ┆ 926350659 │
│ 0001Moksh/c_-_cpp_language ┆ 926330641 │
└─────────────────────────────────┴───────────┘
10 rows.
Create a DAG
models
directory called sdf_snowflake/staging/push_events.sql
with the following content:push
events from github_events
created in the last 24 hours.Now let’s compile this new table to ensure the SQL is valid and selects columns that actually exist and are properly typed:Working set 5 model files, 1 .sdf file
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.GITHUB_EVENTS (schema)
Compiling sdf_snowflake.staging.push_events (./models/sdf_snowflake/staging/push_events.sql)
Finished 2 models [2 succeeded] in 1.732 secsSchema sdf_snowflake.staging.PUSH_EVENTS
┌─────────────┬────────────────┬────────────┬─────────────┐
│ column_name ┆ data_type ┆ classifier ┆ description │
╞═════════════╪════════════════╪════════════╪═════════════╡
│ ID ┆ varchar ┆ ┆ │
│ REPO_ID ┆ decimal(38, 0) ┆ ┆ │
└─────────────┴────────────────┴────────────┴─────────────┘
table
instead of a view
in the sdf_snowflake
database.Let’s add another table
block to our workspace.sdf.yml
file:sdf_snowflake
database.models
directory called sdf_snowflake/cybersyn_tech_innovation/events_by_repo.sql
with the following content:materialization
is view, we don’t need to specify the materialization for this model.Now, let’s compile again to ensure our entire workspace is still in a good state:Working set 6 model files, 1 .sdf file
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.GITHUB_STARS (schema)
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.OPENALEX_FUNDERS_INDEX (schema)
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.USPTO_PATENT_INDEX (schema)
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.GITHUB_EVENTS (schema)
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.USPTO_CONTRIBUTOR_INDEX (schema)
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.USPTO_PATENT_CONTRIBUTOR_RELATIONSHIPS (schema)
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.GITHUB_REPOS (schema)
Compiling sdf_snowflake.staging.push_events (./models/sdf_snowflake/staging/push_events.sql)
Compiling sdf_snowflake.cybersyn_tech_innovation.funder_aggregates (./models/sdf_snowflake/cybersyn_tech_innovation/funder_aggregates.sql)
Compiling sdf_snowflake.cybersyn_tech_innovation.all_nvidia_patents (./models/sdf_snowflake/cybersyn_tech_innovation/all_nvidia_patents.sql)
Compiling sdf_snowflake.cybersyn_tech_innovation.most_starred_repos (./models/sdf_snowflake/cybersyn_tech_innovation/most_starred_repos.sql)
Compiling sdf_snowflake.staging.latest_repo_name (./models/sdf_snowflake/staging/latest_repo_name.sql)
Compiling sdf_snowflake.cybersyn_tech_innovation.events_by_repo (./models/sdf_snowflake/cybersyn_tech_innovation/events_by_repo.sql)
Finished 13 models [13 succeeded] in 2.054 secs