> ## Documentation Index
> Fetch the complete documentation index at: https://docs.sdf.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Incremental Materialization

> Materialize incremental models in Snowflake to save time and compute.

## Overview

This guide introduces the incremental materialization of models in Snowflake using SDF. We'll build on the basic concepts of materializing tables and views, focusing on incremental updates to optimize performance and resource usage. We continue to use datasets from Cybersyn available through Snowflake's marketplace.

## Prerequisites

<Info>
  This guide should be followed after completing the [Getting Started with Snowflake and SDF](/integrations/snowflake/getting-started) guide and the [Basic Materialization with Snowflake](/integrations/snowflake/basic-materialization) guide.
</Info>

You'll also need:

* A Snowflake account with [this](https://app.snowflake.com/marketplace/listing/GZTSZAS2KII/cybersyn-tech-innovation-essentials?search=cybersyn) Cybersyn dataset installed.
* Valid Snowflake username / password credentials with write access to at least one database we can materialize tables to.
* Instantiated credentials completed in the previous guide.

<Note>
  When installing the Cybersyn dataset, make sure to grant the Snowflake role you'll use with SDF read access to `TECH__INNOVATION_ESSENTIALS` (i.e. the Cybersyn database).
</Note>

## Guide

<Steps>
  <Step title="Create a New SDF Project from the Cybersyn Tech Innovation Sample">
    <Note>
      If you just completed the [Basic Materialization with Snowflake](/integrations/snowflake/basic-materialization) guide, you can skip the next two steps and use the project you created there.
    </Note>

    Create a new SDF project using the Cybersyn Tech Innovation sample. This will create a new project in your current working directory with the sample project files.

    ```shell theme={null}
    sdf new --sample cybersyn_tech_innovation cybersyn_tech_innovation_incremental
    ```
  </Step>

  <Step title="Compile to Test Credentials">
    To ensure your credentials are working and have read access to the new Cybersyn database, let's try compiling one of the models.

    ```shell theme={null}
    sdf compile sdf_snowflake.cybersyn_tech_innovation.funder_aggregates
    ```

    <div className="bg-[#0F1117] dark:bg-codeblock rounded-xl dark:ring-1 dark:ring-gray-800/50 relative">
      <pre style={{ fontFamily: 'monospace', backgroundColor: 'transparent' }} className="language-shell">
        <code className="language-shell">
          Working set 3 model files, 1 .sdf file
          Downloading TECH\_\_INNOVATION\_ESSENTIALS.CYBERSYN.OPENALEX\_FUNDERS\_INDEX (schema)
            Compiling sdf\_snowflake.cybersyn\_tech\_innovation.funder\_aggregates (./models/sdf\_snowflake/cybersyn\_tech\_innovation/funder\_aggregates.sql)
             Finished 2 models \[2 succeeded] in 1.782 secs

          Schema sdf\_snowflake.cybersyn\_tech\_innovation.FUNDER\_AGGREGATES
          ┌──────────────────────┬────────────────┬────────────┬─────────────┐
          │ column\_name          ┆ data\_type      ┆ classifier ┆ description │
          ╞══════════════════════╪════════════════╪════════════╪═════════════╡
          │ FUNDER\_NAME          ┆ varchar        ┆            ┆             │
          │ TOTAL\_FUNDERS        ┆ decimal(38, 0) ┆            ┆             │
          │ TOTAL\_WORKS\_COUNT    ┆ decimal(38, 0) ┆            ┆             │
          │ TOTAL\_CITATIONS      ┆ decimal(38, 0) ┆            ┆             │
          │ TOTAL\_GRANTS         ┆ decimal(38, 0) ┆            ┆             │
          │ UNIQUE\_COUNTRY\_COUNT ┆ decimal(38, 0) ┆            ┆             │
          │ FIRST\_CREATED\_DATE   ┆ timestamp      ┆            ┆             │
          │ LAST\_UPDATED\_DATE    ┆ timestamp      ┆            ┆             │
          └──────────────────────┴────────────────┴────────────┴─────────────┘
        </code>
      </pre>
    </div>

    <Warning>
      If you do not see a successful compilation, please ensure you've:

      1. Followed the [Getting Started with Snowflake and SDF](/integrations/snowflake/getting-started) guide to authenticate to your Snowflake.
      2. Granted the correct role read access to the `TECH__INNOVATION_ESSENTIALS` database.
    </Warning>
  </Step>

  <Step title="Materialize an Increment Model">
    Now that we've confirmed our credentials are working, let's materialize an incremental model. Cybersyn offers a table called `github_events` which gets updated frequently with new events across all repositories on GitHub. We'll use this table to demonstrate incremental materialization with the `append` merge strategy.

    Let's say we want to track push events to the [Apache DataFusion](https://github.com/apache/datafusion) repo. We can create a new model that finds all push events to DataFusion. Since the `github_events` source data is updated frequently, we can use incremental materialization to only query new push events, thereby saving on compute and optimizing our pipeline.
    Since we don't care about updates to existing rows, we can use the `append` merge strategy. This strategy appends new rows to the target table without updating existing rows.

    Let's start creating a file called `datafusion_push_events.sql` in the directory `models/sdf_snowflake/staging/` and adding the following SQL to it:

    ```sql models/sdf_snowflake/staging/datafusion_push_events.sql theme={null}
    SELECT
      *
    FROM
      tech__innovation_essentials.cybersyn.github_events
    WHERE
    {% if builtin.is_incremental_mode %}
      -- Only fetch rows that are newer than the newest row in the previous materialization of this table
      created_at_timestamp >= (SELECT MAX(created_at_timestamp) FROM sdf_snowflake.staging.datafusion_push_events)
    {% else %}
      created_at_timestamp >= DATEADD(WEEK, -1, CURRENT_TIMESTAMP())
    {% endif %}
    AND 
      type = 'PushEvent'
    AND
      repo_name = 'apache/datafusion';
    ```

    Let's unpack a few things here:

    * This relatively simple query fetches events and filters by their `type`, `repository name`, and `created_at_timestamp`.
    * The `{% if builtin.is_incremental_mode %}` block is a Jinja conditional that checks if the model is being materialized incrementally. If it is, we only fetch rows that are newer than the newest row in the previous materialization of this table. If not, we fetch rows from the last week.

    <Note>
      In a production scenario, you would likely want to fetch events from all time for non-incremental mode run as this would be a full refresh of the data. We are adding the week filter to prevent any major compute costs in this guide.
    </Note>

    You might be wondering, how does SDF know when to set `builtin.is_incremental_mode` to `True`? SDF sets this variable to `True` when the model has already been materialized in the cloud.

    Before running this model, we'll need need to tell SDF to overwrite the default materialization for this table. We can do this by adding the following to the `workspace.sdf.yml` file:

    ```yml workspace.sdf.yml theme={null}
    ---
    table:
      name: sdf_snowflake.staging.datafusion_push_events
      materialization: incremental-table  
    ```

    SDF defaults the incremental strategy to `append`, as such we don't need to specify it in this YML (we'll explore this later in the guide).
    Now, let's first compile our workspace with this new model:

    ```shell theme={null}
    sdf compile sdf_snowflake.staging.datafusion_push_events
    ```

    <div className="bg-[#0F1117] dark:bg-codeblock rounded-xl dark:ring-1 dark:ring-gray-800/50 relative">
      <pre style={{ fontFamily: 'monospace', backgroundColor: 'transparent' }} className="language-shell">
        <code className="language-shell">
          Working set 4 model files, 1 .sdf file
          Downloading SDF\_SNOWFLAKE.STAGING.DATAFUSION\_PUSH\_EVENTS (exists\_remotely)
          Downloading TECH\_\_INNOVATION\_ESSENTIALS.CYBERSYN.GITHUB\_EVENTS (schema)
            Compiling sdf\_snowflake.staging.datafusion\_push\_events (./models/sdf\_snowflake/staging/datafusion\_push\_events.sql)
             Finished 2 models \[2 succeeded] in 2.526 secs

          Schema sdf\_snowflake.staging.DATAFUSION\_PUSH\_EVENTS
          ┌────────────────────────────────┬────────────────┬────────────┬─────────────┐
          │ column\_name                    ┆ data\_type      ┆ classifier ┆ description │
          ╞════════════════════════════════╪════════════════╪════════════╪═════════════╡
          │ ID                             ┆ varchar        ┆            ┆             │
          │ CREATED\_AT\_TIMESTAMP           ┆ timestamp      ┆            ┆             │
          │ TYPE                           ┆ varchar        ┆            ┆             │
          │ ACTOR\_AVATAR\_URL               ┆ varchar        ┆            ┆             │
          │ ACTOR\_DISPLAY\_LOGIN            ┆ varchar        ┆            ┆             │
          │ ACTOR\_GRAVATAR\_ID              ┆ varchar        ┆            ┆             │
          │ ACTOR\_ID                       ┆ decimal(38, 0) ┆            ┆             │
          │ ACTOR\_LOGIN                    ┆ varchar        ┆            ┆             │
          │ ACTOR\_URL                      ┆ varchar        ┆            ┆             │
          │ REPO\_ID                        ┆ decimal(38, 0) ┆            ┆             │
          │ REPO\_NAME                      ┆ varchar        ┆            ┆             │
          │ REPO\_URL                       ┆ varchar        ┆            ┆             │
          │ ORG\_AVATAR\_URL                 ┆ varchar        ┆            ┆             │
          │ ORG\_GRAVATAR\_ID                ┆ varchar        ┆            ┆             │
          │ ORG\_ID                         ┆ decimal(38, 0) ┆            ┆             │
          │ ORG\_LOGIN                      ┆ varchar        ┆            ┆             │
          │ ORG\_URL                        ┆ varchar        ┆            ┆             │
          │ PAYLOAD                        ┆ variant        ┆            ┆             │
          │ PAYLOAD\_ACTION                 ┆ varchar        ┆            ┆             │
          │ PAYLOAD\_DESCRIPTION            ┆ varchar        ┆            ┆             │
          │ PAYLOAD\_COMMENT                ┆ varchar        ┆            ┆             │
          │ PAYLOAD\_MASTER\_BRANCH          ┆ varchar        ┆            ┆             │
          │ PAYLOAD\_PULL\_REQUEST           ┆ varchar        ┆            ┆             │
          │ PAYLOAD\_PUSHER\_TYPE            ┆ varchar        ┆            ┆             │
          │ PAYLOAD\_PUSH\_ID                ┆ decimal(38, 0) ┆            ┆             │
          │ PAYLOAD\_HEAD                   ┆ varchar        ┆            ┆             │
          │ PAYLOAD\_REF                    ┆ varchar        ┆            ┆             │
          │ PAYLOAD\_REF\_TYPE               ┆ varchar        ┆            ┆             │
          │ PAYLOAD\_ISSUE\_ID               ┆ decimal(38, 0) ┆            ┆             │
          │ PAYLOAD\_ISSUE                  ┆ variant        ┆            ┆             │
          │ PUBLIC                         ┆ boolean        ┆            ┆             │
          │ LOAD\_DATE                      ┆ varchar        ┆            ┆             │
          │ CREATED\_AT                     ┆ varchar        ┆            ┆             │
          │ PAYLOAD\_BODY                   ┆ varchar        ┆            ┆             │
          │ PAYLOAD\_COMMIT\_ID              ┆ varchar        ┆            ┆             │
          │ PAYLOAD\_CREATED\_AT             ┆ varchar        ┆            ┆             │
          │ PAYLOAD\_USER\_ID                ┆ decimal(38, 0) ┆            ┆             │
          │ PAYLOAD\_USER\_LOGIN             ┆ varchar        ┆            ┆             │
          │ ISSUE                          ┆ variant        ┆            ┆             │
          │ ISSUE\_ACTIVE\_LOCK\_REASON       ┆ varchar        ┆            ┆             │
          │ ISSUE\_ASSIGNEE                 ┆ varchar        ┆            ┆             │
          │ ISSUE\_ASSIGNEES                ┆ variant        ┆            ┆             │
          │ ISSUE\_AUTHOR\_ASSOCIATION       ┆ varchar        ┆            ┆             │
          │ ISSUE\_BODY                     ┆ varchar        ┆            ┆             │
          │ ISSUE\_CLOSED\_AT                ┆ varchar        ┆            ┆             │
          │ ISSUE\_COMMENTS                 ┆ decimal(38, 0) ┆            ┆             │
          │ ISSUE\_COMMENTS\_URL             ┆ varchar        ┆            ┆             │
          │ ISSUE\_CREATED\_AT               ┆ varchar        ┆            ┆             │
          │ ISSUE\_DRAFT                    ┆ boolean        ┆            ┆             │
          │ ISSUE\_EVENTS\_URL               ┆ varchar        ┆            ┆             │
          │ ISSUE\_HTML\_URL                 ┆ varchar        ┆            ┆             │
          │ ISSUE\_ID                       ┆ decimal(38, 0) ┆            ┆             │
          │ ISSUE\_LABELS                   ┆ variant        ┆            ┆             │
          │ ISSUE\_LABELS\_URL               ┆ varchar        ┆            ┆             │
          │ ISSUE\_LOCKED                   ┆ boolean        ┆            ┆             │
          │ ISSUE\_MILESTONE                ┆ varchar        ┆            ┆             │
          │ ISSUE\_NODE\_ID                  ┆ varchar        ┆            ┆             │
          │ ISSUE\_NUMBER                   ┆ decimal(38, 0) ┆            ┆             │
          │ ISSUE\_PERFORMED\_VIA\_GITHUB\_APP ┆ varchar        ┆            ┆             │
          │ ISSUE\_PULL\_REQUEST             ┆ variant        ┆            ┆             │
          │ ISSUE\_REACTIONS                ┆ variant        ┆            ┆             │
          │ ISSUE\_REPOSITORY\_URL           ┆ varchar        ┆            ┆             │
          │ ISSUE\_STATE                    ┆ varchar        ┆            ┆             │
          │ ISSUE\_TIMELINE\_URL             ┆ varchar        ┆            ┆             │
          │ ISSUE\_TITLE                    ┆ varchar        ┆            ┆             │
          │ ISSUE\_UPDATED\_AT               ┆ varchar        ┆            ┆             │
          │ ISSUE\_URL                      ┆ varchar        ┆            ┆             │
          │ ISSUE\_USER\_ID                  ┆ decimal(38, 0) ┆            ┆             │
          │ ISSUE\_USER\_LOGIN               ┆ varchar        ┆            ┆             │
          │ ISSUE\_USER\_TYPE                ┆ varchar        ┆            ┆             │
          │ LANGUAGE                       ┆ varchar        ┆            ┆             │
          └────────────────────────────────┴────────────────┴────────────┴─────────────┘
        </code>
      </pre>
    </div>

    Before running our new incremental model, let's inspect the compiled query output to see what exactly will be run against Snowflake. To do so, open up the file `sdftarget/dbg/materialized/sdf_snowflake/staging/datafusion_push_events.sql` and inspect the SQL query.

    It should show this:

    ```sql sdftarget/dbg/materialized/sdf_snowflake/staging/datafusion_push_events.sql theme={null}
    create or replace table sdf_snowflake.staging.datafusion_push_events as (
      SELECT
        *
      FROM
        tech__innovation_essentials.cybersyn.github_events
      WHERE

        created_at_timestamp >= DATEADD(WEEK, -1, CURRENT_TIMESTAMP())

      AND 
        type = 'PushEvent'
      AND
        repo_name = 'apache/datafusion'
    );
    ```

    As you can see, the query was compiled with `builtin.is_incremental_mode` set to `False`. This is because we haven't materialized the table yet.

    <Tip>
      SDF outputs all configurations and settings for the compiled query in comments at the bottom of the file. This is useful for debugging and understanding how SDF is compiling your models.
    </Tip>

    Let's run the model now to materialize the table in non-incremental mode.

    ```shell theme={null}
    sdf run models/sdf_snowflake/staging/datafusion_push_events.sql
    ```

    Nice! The model should have successfully been materialized in Snowflake. Next, we'll try running the model in incremental mode.

    All we need to do is run the model again and SDF will automatically detect that the model has already been materialized and set `builtin.is_incremental_mode` to `True`.

    ```shell theme={null}
    sdf run sdf_snowflake/staging/datafusion_push_events.sql
    ```

    Notice how the model ran much faster this time? That's because SDF only fetched new rows from the `github_events` table that were created after the last materialization.

    Furthermore, you might notice something slightly different in the run output, specifically a line that says `Preloading` like so:

    ```shell theme={null}
    Preloading sdf_snowflake.staging.datafusion_push_events (schema & last_altered)
    ```

    This indicates SDF is preloading the schema and last altered time of the table before running the query. As was previously mentioned, this is to set the `is_incremental_mode` builtin variable.

    Lastly, if you inspect the compiled query output again, you should see `builtin.is_incremental_mode` set to `True` and the query's SQL reflective of that.

    ```sql sdftarget/dbg/materialized/sdf_snowflake/staging/datafusion_push_events.sql theme={null}
    insert into sdf_snowflake.staging.datafusion_push_events
    SELECT
      *
    FROM
      tech__innovation_essentials.cybersyn.github_events
    WHERE

      -- Only fetch rows that are newer than the newest row in the previous materialization of this table
      created_at_timestamp >= (SELECT MAX(created_at_timestamp) FROM sdf_snowflake.staging.datafusion_push_events)

    AND 
      type = 'PushEvent'
    AND
      repo_name = 'apache/datafusion'
    ;
    ```
  </Step>

  <Step title="Utilize the Merge Incremental Strategy">
    Now that we've successfully materialized an incremental model with the `append` strategy, let's explore the `merge` strategy. The `merge` strategy is useful when you want to update existing rows in the target table with new data from the source table.

    <Note>
      For a full reference of all supported merge strategies, see the [SDF Reference](/reference/sdf-yml#nested-element-incrementaloptions) section.
    </Note>

    Let's say we now want to track a running count of the total push events per each unique contributor to the DataFusion repo. Since we want to update the row corresponding to the contributor, or create a new row if the contributor is new, the `merge` strategy is a perfect fit for this use case.

    Let's start by adding a new SQL file to our workspace called `top_datafusion_contributors.sql` in the directory `models/sdf_snowflake/cybersyn_tech_innovation/` with the following SQL:

    ```sql models/sdf_snowflake/cybersyn_tech_innovation/top_datafusion_contributors.sql theme={null}
      SELECT 
        actor_id,
        actor_display_login,
        COUNT(DISTINCT ID) as contribution_count
      FROM sdf_snowflake.staging.datafusion_push_events
      GROUP BY 1,2
      ORDER BY contribution_count DESC;
    ```

    Next, let's update the `workspace.sdf.yml` to set materialization to `incremental-table` for this model.

    ```yml workspace.sdf.yml theme={null}
    ---
    table:
      name: sdf_snowflake.cybersyn_tech_innovation.top_datafusion_contributors
      materialization: incremental-table
      incremental-options: 
        strategy: merge
        unique-key: actor_id
        merge-update-columns:
          - contribution_count
    ```

    In this YML, we've set the `incremental-options` to use the `merge` strategy. We've also specified the `unique-key` as `actor_id` and the `merge-update-columns` as `contribution_count`.
    This tells SDF to update the `contribution_count` column in the target table with the new count from the source table when it finds two rows that match on their `actor_id`.

    Let's compile the workspace with this new model:

    ```shell theme={null}
    sdf compile models/sdf_snowflake/cybersyn_tech_innovation/top_datafusion_contributors.sql
    ```

    <div className="bg-[#0F1117] dark:bg-codeblock rounded-xl dark:ring-1 dark:ring-gray-800/50 relative">
      <pre style={{ fontFamily: 'monospace', backgroundColor: 'transparent' }} className="language-shell">
        <code className="language-shell">
          Working set 5 model files, 1 .sdf file
          Downloading SDF\_SNOWFLAKE.STAGING.GITHUB\_PUSH\_EVENTS (exists\_remotely)
          Downloading SDF\_SNOWFLAKE.CYBERSYN\_TECH\_INNOVATION.TOP\_DATAFUSION\_CONTRIBUTORS (exists\_remotely)
          Downloading TECH\_\_INNOVATION\_ESSENTIALS.CYBERSYN.GITHUB\_EVENTS (schema)
            Compiling sdf\_snowflake.staging.datafusion\_push\_events (./models/sdf\_snowflake/staging/datafusion\_push\_events.sql)
            Compiling sdf\_snowflake.cybersyn\_tech\_innovation.top\_datafusion\_contributors (./models/sdf\_snowflake/cybersyn\_tech\_innovation/top\_datafusion\_contributors.sql)
             Finished 3 models \[3 succeeded] in 2.466 secs

          Schema sdf\_snowflake.cybersyn\_tech\_innovation.TOP\_DATAFUSION\_CONTRIBUTORS
          ┌─────────────────────┬────────────────┬────────────┬─────────────┐
          │ column\_name         ┆ data\_type      ┆ classifier ┆ description │
          ╞═════════════════════╪════════════════╪════════════╪═════════════╡
          │ ACTOR\_ID            ┆ decimal(38, 0) ┆            ┆             │
          │ ACTOR\_DISPLAY\_LOGIN ┆ varchar        ┆            ┆             │
          │ CONTRIBUTION\_COUNT  ┆ decimal(38, 0) ┆            ┆             │
          └─────────────────────┴────────────────┴────────────┴─────────────┘
        </code>
      </pre>
    </div>

    Before running the model, let's inspect the compiled query output to see what exactly will be run against Snowflake. To do so, open up the file `sdftarget/dbg/materialized/sdf_snowflake/cybersyn_tech_innovation/top_datafusion_contributors.sql` and inspect the SQL query.

    It should show this:

    ```sql sdftarget/dbg/materialized/sdf_snowflake/cybersyn_tech_innovation/top_datafusion_contributors.sql theme={null}
    create or replace table sdf_snowflake.cybersyn_tech_innovation.top_datafusion_contributors as (
      SELECT 
        actor_id,
        actor_display_login,
        COUNT(DISTINCT ID) as contribution_count
      from sdf_snowflake.staging.datafusion_push_events
      GROUP BY 1,2
      ORDER BY contribution_count DESC
    );
    ```

    Since this table hasn't materialized yet, `builtin.is_incremental_mode` is set to `False` and the table is simply being created. Let's run the model now to materialize the table in non-incremental mode.

    ```shell theme={null}
    sdf run models/sdf_snowflake/cybersyn_tech_innovation/top_datafusion_contributors.sql
    ```

    The model should have successfully been materialized in Snowflake. Next, let's try running the model in incremental mode. Like before, we'll just run the model again and SDF will automatically detect that the model has already been materialized and set `builtin.is_incremental_mode` to `True`.

    ```shell theme={null}
    sdf run models/sdf_snowflake/cybersyn_tech_innovation/top_datafusion_contributors.sql
    ```

    Let's inspect the compiled query output again to see the SQL query that was run. You should see `builtin.is_incremental_mode` set to `True` and the query's SQL reflective of that.

    ```sql sdftarget/dbg/materialized/sdf_snowflake/cybersyn_tech_innovation/top_datafusion_contributors.sql theme={null}
    merge into sdf_snowflake.cybersyn_tech_innovation.top_datafusion_contributors as SDF_DEST
        using (
    SELECT 
      actor_id,
      actor_display_login,
      COUNT(DISTINCT ID) as contribution_count
    from sdf_snowflake.staging.datafusion_push_events
    GROUP BY 1,2
    ORDER BY contribution_count DESC
    ) as SDF_SRC
            on (
                    SDF_SRC.actor_id = SDF_DEST.actor_id
                )
        when matched then update set SDF_DEST.contribution_count = SDF_SRC.contribution_count
        when not matched then insert ("ACTOR_ID", "ACTOR_DISPLAY_LOGIN", "CONTRIBUTION_COUNT")
        values ("ACTOR_ID", "ACTOR_DISPLAY_LOGIN", "CONTRIBUTION_COUNT");
    ```

    <Note>
      If the file isn't updated, try reopening it. Sometimes, because of VSCode's cache, the file appears to be unchanged.
    </Note>

    As you can see, the query was compiled with `builtin.is_incremental_mode` set to `True` and, as a byproduct, the SQL query is a `merge` statement that updates the `contribution_count` column in the target table with the new count from the source table.

    <Tip>
      The `merge` strategy is a powerful tool for updating existing rows in the target table with new data from the source table. It's especially useful when you want to keep a running count of events or other metrics.
    </Tip>

    For a full list of our supported incremental options and strategies, see the [SDF Reference](/reference/sdf-yml#nested-element-incrementaloptions) section.
  </Step>
</Steps>
