> ## Documentation Index
> Fetch the complete documentation index at: https://docs.sdf.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Basic Materialization

> Materialize tables and views in Snowflake with SDF.

## Overview

In this guide, we'll materialize our first tables and views in Snowflake with SDF. We'll be using some datasets from our partner, [Cybersyn](https://www.cybersyn.com/), to bootstrap some example data through Snowflake's data marketplace.

## Prerequisites

<Info>
  This guide should be followed after completing the [Getting Started with Snowflake and SDF](/integrations/snowflake/getting-started) guide.
</Info>

Before beginning, we'll need to make sure we have the following:

* A Snowflake account with [this](https://app.snowflake.com/marketplace/listing/GZTSZAS2KII/cybersyn-tech-innovation-essentials?search=cybersyn) Cybersyn dataset installed.
* Valid Snowflake username / password credentials with write access to at least one database we can materialize tables to.
* Instantiated credentials completed in the previous guide.

<Note>
  When installing the Cybersyn dataset, make sure to grant the Snowflake role you'll use with SDF read access to `TECH__INNOVATION_ESSENTIALS` (i.e. the Cybersyn database).
</Note>

## Guide

<Steps>
  <Step title="Create a New SDF Project from the Cybersyn Tech Innovation Sample">
    Create a new SDF project using the Cybersyn Tech Innovation sample. This will create a new project in your current working directory with the sample project files.

    ```shell theme={null}
    sdf new --sample cybersyn_tech_innovation 
    ```
  </Step>

  <Step title="Compile to Test Credentials">
    To ensure your credentials are working and have read access to the new Cybersyn database, let's try compiling one of the models.

    ```shell theme={null}
    sdf compile sdf_snowflake.cybersyn_tech_innovation.funder_aggregates
    ```

    <div className="bg-[#0F1117] dark:bg-codeblock rounded-xl dark:ring-1 dark:ring-gray-800/50 relative">
      <pre style={{ fontFamily: 'monospace', backgroundColor: 'transparent' }} className="language-shell">
        <code className="language-shell">
          Working set 3 model files, 1 .sdf file
          Downloading TECH\_\_INNOVATION\_ESSENTIALS.CYBERSYN.OPENALEX\_FUNDERS\_INDEX (schema)
            Compiling sdf\_snowflake.cybersyn\_tech\_innovation.funder\_aggregates (./models/sdf\_snowflake/cybersyn\_tech\_innovation/funder\_aggregates.sql)
             Finished 2 models \[2 succeeded] in 1.985 secs

          Schema sdf\_snowflake.cybersyn\_tech\_innovation.FUNDER\_AGGREGATES
          ┌──────────────────────┬────────────────┬────────────┬─────────────┐
          │ column\_name          ┆ data\_type      ┆ classifier ┆ description │
          ╞══════════════════════╪════════════════╪════════════╪═════════════╡
          │ FUNDER\_NAME          ┆ varchar        ┆            ┆             │
          │ TOTAL\_FUNDERS        ┆ decimal(38, 0) ┆            ┆             │
          │ TOTAL\_WORKS\_COUNT    ┆ decimal(38, 0) ┆            ┆             │
          │ TOTAL\_CITATIONS      ┆ decimal(38, 0) ┆            ┆             │
          │ TOTAL\_GRANTS         ┆ decimal(38, 0) ┆            ┆             │
          │ UNIQUE\_COUNTRY\_COUNT ┆ decimal(38, 0) ┆            ┆             │
          │ FIRST\_CREATED\_DATE   ┆ timestamp      ┆            ┆             │
          │ LAST\_UPDATED\_DATE    ┆ timestamp      ┆            ┆             │
          └──────────────────────┴────────────────┴────────────┴─────────────┘
        </code>
      </pre>
    </div>

    <Warning>
      If you do not see a successful compilation, please ensure you've:

      1. Followed the [Getting Started with Snowflake and SDF](/integrations/snowflake/getting-started) guide to authenticate to your Snowflake.
      2. Granted the correct role read access to the `TECH__INNOVATION_ESSENTIALS` database.
    </Warning>
  </Step>

  <Step title="Materialize Tables in Snowflake">
    If you compiled successfully, great! That means SDF was able to read table schemas from the Cybersyn database. Now, let's materialize in a separate database called `sdf_snowflake`.

    You'll notice in the `workspace.sdf.yml` file specifies the following defaults:

    ```yaml theme={null}
    defaults:
        dialect: snowflake
        preprocessor: jinja
        materialization: view
    ```

    Let's go through these one by one:

    * `dialect: snowflake` specifies that the SQL we're working with uses Snowflake's syntax and functions.
    * `preprocessor: jinja` specifies that we're using Jinja templating in our SQL models (this will come in handy later).
    * `materialization: view` specifies that we're materializing our models as views by default in Snowflake. Note this can be overwritten on a per-model, or per-includes-path basis.

    Next, let's look at the integrations block:

    ```yaml theme={null}
    integrations:
      - provider: snowflake
        type: database
        sources: 
           - pattern: tech__innovation_essentials.cybersyn.*
        targets:
           - pattern: sdf_snowflake.*.*
    ```

    This tells SDF to hydrate missing table schemas from the `tech__innovation_essentials.cybersyn` database and schema and materialize tables to the `sdf_snowflake` database. For more information on integration configuration, see the [integration documentation](/guide/setup/integrations).

    Now, let's materialize the tables in the `sdf_snowflake` database.

    ```shell theme={null}
    sdf run
    ```

    This is great, but what if we want to see the data our views produced?

    Let's add small flag to `sdf run` that tells SDF to pull down a sample of the data from the views we just materialized.

    ```shell theme={null}
    sdf run --show all
    ```

    {/* ```run shell
              cd tmp/cybersyn_tech_innovation && $sdf run --show all
              ``` */}

    Congratulations! You've just materialized your first tables in Snowflake with SDF, and even got a sneak peek at the data they produced.
  </Step>

  <Step title="Materialize a New Table">
    Let's say we want to materialize a new *table* in the `sdf_snowflake` database, but in a different schema. You'll notice in the `workspace.sdf.yml` file, we have the following includes block:

    ```yaml theme={null}
    - path: models 
      index: catalog-schema-table-name
    ```

    The index here means we'll interpret the database and schema name from the file path like so - `models/<database>/<schema>/<table>.sql`.

    Since we want to add a new table, let's create a new file in the `models` directory called `sdf_snowflake/staging/latest_repo_name.sql` with the following content:

    ```sql theme={null}
    WITH partitioned_repo_names AS (
        SELECT
            repo_name,
            repo_id,
            ROW_NUMBER() OVER (
                PARTITION BY repo_id
                ORDER BY
                first_seen DESC
            ) AS rn
        FROM
            tech__innovation_essentials.cybersyn.github_repos
        WHERE
            first_seen >= DATEADD(DAY, -1, CURRENT_TIMESTAMP())
    )
    SELECT
        repo_name,
        repo_id
    FROM partitioned_repo_names
    WHERE rn = 1;
    ```

    This table will select the latest `repo_name` and `repo_id` from `github_repos` created in the last 24 hours.

    Now let's compile this new table to ensure the SQL is valid and selects columns that actually exist:

    ```shell theme={null}
    sdf compile models/sdf_snowflake/staging/latest_repo_name.sql
    ```

    <div className="bg-[#0F1117] dark:bg-codeblock rounded-xl dark:ring-1 dark:ring-gray-800/50 relative">
      <pre style={{ fontFamily: 'monospace', backgroundColor: 'transparent' }} className="language-shell">
        <code className="language-shell">
          Working set 4 model files, 1 .sdf file
          Downloading TECH\_\_INNOVATION\_ESSENTIALS.CYBERSYN.GITHUB\_REPOS (schema)
            Compiling sdf\_snowflake.staging.latest\_repo\_name (./models/sdf\_snowflake/staging/latest\_repo\_name.sql)
             Finished 2 models \[2 succeeded] in 1.750 secs

          Schema sdf\_snowflake.staging.LATEST\_REPO\_NAME
          ┌─────────────┬────────────────┬────────────┬─────────────┐
          │ column\_name ┆ data\_type      ┆ classifier ┆ description │
          ╞═════════════╪════════════════╪════════════╪═════════════╡
          │ REPO\_NAME   ┆ varchar        ┆            ┆             │
          │ REPO\_ID     ┆ decimal(38, 0) ┆            ┆             │
          └─────────────┴────────────────┴────────────┴─────────────┘
        </code>
      </pre>
    </div>

    Great! Everything compiled. Now, let's make sure this table is materialized as a `table` instead of a `view` in the `sdf_snowflake` database. We're doing this now since later one we'll want to use this table as an upstream dependency in our DAG.

    To accomplish this, we'll need to overwrite our `materialization` default in the `workspace.sdf.yml` file, and specify the `materialization` as `table` for this specific model.

    Under our `workspace.sdf.yml` block, we can add:

    ```yml theme={null}
    ---
    table:
        name: sdf_snowflake.staging.latest_repo_name
        materialization: table
    ```

    <Tip>
      SDF YML blocks are composable like legos. You can add as many blocks as you need to your `workspace.sdf.yml` file, or even split them out into separate files and include them with the `type: metadata` in your `includes` block.
      Creating separate YML files for each purpose or model can help keep your configuration organized and easy to maintain.
    </Tip>

    Now, let's run the model to materialize the table in the `sdf_snowflake` database.

    ```shell theme={null}
    sdf run models/sdf_snowflake/staging/latest_repo_name.sql
    ```

    <div className="bg-[#0F1117] dark:bg-codeblock rounded-xl dark:ring-1 dark:ring-gray-800/50 relative">
      <pre style={{ fontFamily: 'monospace', backgroundColor: 'transparent' }} className="language-shell">
        <code className="language-shell">
          Working set 4 model files, 1 .sdf file
          Downloading TECH\_\_INNOVATION\_ESSENTIALS.CYBERSYN.GITHUB\_REPOS (schema)
              Running sdf\_snowflake.staging.latest\_repo\_name (./models/sdf\_snowflake/staging/latest\_repo\_name.sql)
             Finished 2 models \[2 succeeded] in 5.496 secs

          Table sdf\_snowflake.staging.LATEST\_REPO\_NAME
          ┌─────────────────────────────────┬───────────┐
          │ REPO\_NAME                       ┆ REPO\_ID   │
          ╞═════════════════════════════════╪═══════════╡
          │ 0-Anonymous/Secure-SMTP-Server  ┆ 926399969 │
          │ 0-robinson-1/pizzeria           ┆ 926565957 │
          │ 00-berni/proj\_0                 ┆ 537524710 │
          │ 00-berni/proj\_3                 ┆ 538682956 │
          │ 00-kat/ghostty                  ┆ 926370638 │
          │ 00-kat/wraith                   ┆ 926401364 │
          │ 0000yuyu/youtube                ┆ 926296773 │
          │ 0001Moksh/Deva\_voice\_assistant  ┆ 926354200 │
          │ 0001Moksh/Deva\_voice\_assistant- ┆ 926350659 │
          │ 0001Moksh/c\_-\_cpp\_language      ┆ 926330641 │
          └─────────────────────────────────┴───────────┘
          10 rows.
        </code>
      </pre>
    </div>

    Congratulations! You've just materialized your first table in Snowflake with SDF. In the next step, we'll add one more table and a downstream view to create a DAG.
  </Step>

  <Step title="Create a DAG">
    Now that we've materialized a new table, let's create a DAG by adding a downstream view that depends on the table we just created and one more table we'll materialize now.

    The DAG will look like this:

    ```mermaid theme={null}
    graph LR
        A[github_repos] --> B[latest_repo_name]
        C[github_events] --> D[push_events]
        B --> E[events_by_repo]
        D --> E
    ```

    Let's first create a new file in the `models` directory called `sdf_snowflake/staging/push_events.sql` with the following content:

    ```sql theme={null}
    SELECT
        id,
        repo_id
    FROM
        tech__innovation_essentials.cybersyn.github_events
    WHERE
        type like '%Push%'
    AND 
        created_at_timestamp >= DATEADD(DAY, -1, CURRENT_TIMESTAMP())
    ```

    This table will select all `push` events from `github_events` created in the last 24 hours.

    Now let's compile this new table to ensure the SQL is valid and selects columns that actually exist and are properly typed:

    ```shell theme={null}
    sdf compile models/sdf_snowflake/staging/push_events.sql
    ```

    <div className="bg-[#0F1117] dark:bg-codeblock rounded-xl dark:ring-1 dark:ring-gray-800/50 relative">
      <pre style={{ fontFamily: 'monospace', backgroundColor: 'transparent' }} className="language-shell">
        <code className="language-shell">
          Working set 5 model files, 1 .sdf file
          Downloading TECH\_\_INNOVATION\_ESSENTIALS.CYBERSYN.GITHUB\_EVENTS (schema)
            Compiling sdf\_snowflake.staging.push\_events (./models/sdf\_snowflake/staging/push\_events.sql)
             Finished 2 models \[2 succeeded] in 1.732 secs

          Schema sdf\_snowflake.staging.PUSH\_EVENTS
          ┌─────────────┬────────────────┬────────────┬─────────────┐
          │ column\_name ┆ data\_type      ┆ classifier ┆ description │
          ╞═════════════╪════════════════╪════════════╪═════════════╡
          │ ID          ┆ varchar        ┆            ┆             │
          │ REPO\_ID     ┆ decimal(38, 0) ┆            ┆             │
          └─────────────┴────────────────┴────────────┴─────────────┘
        </code>
      </pre>
    </div>

    Great! Everything compiled. Now, let's make sure this table is materialized as a `table` instead of a `view` in the `sdf_snowflake` database.

    Let's add another `table` block to our `workspace.sdf.yml` file:

    ```yaml theme={null}
    ---
    table:
        name: sdf_snowflake.staging.push_events
        materialization: table
    ```

    Now, let's run the model to materialize the table in the `sdf_snowflake` database.

    ```shell theme={null}
    sdf run models/sdf_snowflake/staging/push_events.sql
    ```

    {/* ```run shell
              cd tmp/cybersyn_tech_innovation && $sdf run models/sdf_snowflake/staging/push_events.sql
              ``` */}

    Great! Now that we have our two tables, let's finish up our DAG by creating a downstream view that depends on these two tables.

    Create a new file in the `models` directory called `sdf_snowflake/cybersyn_tech_innovation/events_by_repo.sql` with the following content:

    ```sql theme={null}
    SELECT
        r.repo_name,
        COUNT(DISTINCT e.id) AS event_count
    FROM
        sdf_snowflake.staging.push_events AS e
    JOIN sdf_snowflake.staging.latest_repo_name AS r ON e.repo_id = r.repo_id
    GROUP BY
        r.repo_name
    ORDER BY
        event_count DESC
    LIMIT 10;
    ```

    Since our workspace's default `materialization` is view, we don't need to specify the materialization for this model.

    Now, let's compile again to ensure our entire workspace is still in a good state:

    ```shell theme={null}
        sdf compile
    ```

    <div className="bg-[#0F1117] dark:bg-codeblock rounded-xl dark:ring-1 dark:ring-gray-800/50 relative">
      <pre style={{ fontFamily: 'monospace', backgroundColor: 'transparent' }} className="language-shell">
        <code className="language-shell">
          Working set 6 model files, 1 .sdf file
          Downloading TECH\_\_INNOVATION\_ESSENTIALS.CYBERSYN.GITHUB\_STARS (schema)
          Downloading TECH\_\_INNOVATION\_ESSENTIALS.CYBERSYN.OPENALEX\_FUNDERS\_INDEX (schema)
          Downloading TECH\_\_INNOVATION\_ESSENTIALS.CYBERSYN.USPTO\_PATENT\_INDEX (schema)
          Downloading TECH\_\_INNOVATION\_ESSENTIALS.CYBERSYN.GITHUB\_EVENTS (schema)
          Downloading TECH\_\_INNOVATION\_ESSENTIALS.CYBERSYN.USPTO\_CONTRIBUTOR\_INDEX (schema)
          Downloading TECH\_\_INNOVATION\_ESSENTIALS.CYBERSYN.USPTO\_PATENT\_CONTRIBUTOR\_RELATIONSHIPS (schema)
          Downloading TECH\_\_INNOVATION\_ESSENTIALS.CYBERSYN.GITHUB\_REPOS (schema)
            Compiling sdf\_snowflake.staging.push\_events (./models/sdf\_snowflake/staging/push\_events.sql)
            Compiling sdf\_snowflake.cybersyn\_tech\_innovation.funder\_aggregates (./models/sdf\_snowflake/cybersyn\_tech\_innovation/funder\_aggregates.sql)
            Compiling sdf\_snowflake.cybersyn\_tech\_innovation.all\_nvidia\_patents (./models/sdf\_snowflake/cybersyn\_tech\_innovation/all\_nvidia\_patents.sql)
            Compiling sdf\_snowflake.cybersyn\_tech\_innovation.most\_starred\_repos (./models/sdf\_snowflake/cybersyn\_tech\_innovation/most\_starred\_repos.sql)
            Compiling sdf\_snowflake.staging.latest\_repo\_name (./models/sdf\_snowflake/staging/latest\_repo\_name.sql)
            Compiling sdf\_snowflake.cybersyn\_tech\_innovation.events\_by\_repo (./models/sdf\_snowflake/cybersyn\_tech\_innovation/events\_by\_repo.sql)
             Finished 13 models \[13 succeeded] in 2.054 secs
        </code>
      </pre>
    </div>

    Woohoo! Another successful compile. Now, let's try one final run of everything in this workspace to materialize our new view and run the entire DAG.

    ```shell theme={null}
    sdf run 
    ```

    {/* ```run shell
              cd tmp/cybersyn_tech_innovation && $sdf run
              ``` */}

    Congratulations! You've just created your first DAG in Snowflake with SDF.
  </Step>
</Steps>

## Next Steps

Now that you've materialized your first tables and views in Snowflake with SDF, you can move on to more advanced topics like [incremental materialization](/integrations/snowflake/incremental-materialization).
