Overview

This guide introduces the incremental materialization of models in BigQuery using SDF. We’ll build on the basic concepts of materializing tables and views, focusing on incremental updates to optimize performance and resource usage.

Prerequisites

This guide should be followed after completing the Getting Started with BigQuery and SDF guide and the Basic Materialization with BigQuery guide.

You’ll also need:

  • A GCP account & project with billing enabled and access to the bigquery-public-data.hacker_news public dataset used in this guide.
  • Valid BigQuery credentials with write access to at least one dataset we can materialize tables to.
  • SDF credentials instantiated as described in the previous guide.

Guide

1

Create a New SDF Project from the SDF BigQuery Incremental Example

Create a new SDF project using the Hacker News public dataset. This will create a new project in your current working directory with the sample project files.

sdf new --sample bigquery_incremental
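Based on the files referenced later in this guide, the generated project should contain something like the following layout (shown for orientation; the sample may include additional files):

bigquery_incremental/
├── workspace.sdf.yml
└── models/
    ├── last_hn_timestamp.sql
    └── popular_articles.sql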
2

Replace Project ID with Your Project ID

Next, let’s replace the project ID in the workspace.sdf.yml file with your own project ID. This is the project ID we’ll read table metadata from and materialize tables to.

On line 14 of the workspace.sdf.yml, replace <REPLACE_WITH_BQ_PROJECT> with the project ID from your BigQuery account.

A catalog in SDF is equivalent to a project in BigQuery.
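Concretely, SDF’s three-part names map onto BigQuery’s hierarchy, as you can see in the fully qualified names used throughout this guide:

SDF:      catalog.schema.table
BigQuery: project.dataset.table

-- e.g. integration-test-sdf.pub.popular_articles
--      project: integration-test-sdf, dataset: pub, table: popular_articles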

3

Compile to Test Credentials

Now that we’ve replaced the project ID, let’s ensure your credentials are working and have read access to the public BigQuery datasets used in this example. We can do this by compiling one of the models.

sdf compile last_hn_timestamp

Working set 2 model files, 1 .sdf file
Downloading "bigquery-public-data".hacker_news."full" (schema)
  Compiling integration-test-sdf.pub.last_hn_timestamp (./models/last_hn_timestamp.sql)
   Finished 2 models [2 succeeded] in 1.715 secs

Schema "integration-test-sdf".pub.last_hn_timestamp
┌─────────────┬───────────┬────────────┬─────────────┐
│ column_name ┆ data_type ┆ classifier ┆ description │
╞═════════════╪═══════════╪════════════╪═════════════╡
│ ts          ┆ timestamp ┆            ┆             │
└─────────────┴───────────┴────────────┴─────────────┘

If you do not see a successful compilation, ensure you’ve followed the Getting Started with BigQuery and SDF guide to authenticate to BigQuery.

4

Materialize an Incremental Model

Now that we’ve confirmed our credentials are working, let’s materialize an incremental model. Our example workspace contains a model called popular_articles.sql, which filters for Hacker News articles with a score over 100. We’ll use this table to demonstrate incremental materialization with the append merge strategy.

Since new Hacker News articles are added frequently, this is a great candidate for an incremental model. There is no need to re-scan all articles with a score over 100; we can use incremental materialization to query only articles published after the last time we ran the model. This significantly reduces compute and optimizes our pipeline. Since we don’t care about updates to existing rows, we can use the append merge strategy, which appends new rows to the target table without updating existing ones.

Let’s investigate the SQL file popular_articles.sql in the models/ directory:

models/popular_articles.sql
SELECT
  *
FROM
  bigquery-public-data.hacker_news.`full`
WHERE
{% if builtin.is_incremental_mode %}
  -- Only fetch rows that are newer than the newest row in the previous materialization of this table
  `timestamp` >= (SELECT MAX(`timestamp`) FROM popular_articles)
{% else %}
  `timestamp` >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
{% endif %}
  AND title IS NOT NULL
  AND (dead IS NULL OR dead = FALSE)
  AND score > 100

Let’s unpack a few things here:

  • This relatively simple query fetches events and filters by their title, dead attribute, and score.
  • The {% if builtin.is_incremental_mode %} block is a Jinja conditional that checks if the model is being materialized incrementally. If it is, we only fetch rows that are newer than the newest row in the previous materialization of this table. If not, we fetch rows from the last week.

In a production scenario, you would likely want a non-incremental run to fetch events from all time, since it represents a full refresh of the data. We add the one-week filter here to avoid significant compute costs in this guide.

You might be wondering: how does SDF know when to set builtin.is_incremental_mode to True? SDF sets this variable to True when the model has already been materialized in the cloud.
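This conditional pattern generalizes to any incremental model. Here’s a minimal sketch, assuming a hypothetical source table events with an updated_at column (not part of this sample):

SELECT
  *
FROM
  events  -- hypothetical source table
WHERE
{% if builtin.is_incremental_mode %}
  -- Incremental run: fetch only rows newer than the previous materialization
  updated_at > (SELECT MAX(updated_at) FROM my_incremental_model)
{% else %}
  -- First run (or full refresh): no incremental filter
  TRUE
{% endif %}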

Before running this model, we’ll need to tell SDF to override the default materialization for this table. We can do this by adding the following to the workspace.sdf.yml file:

workspace.sdf.yml
---
table:
  name: popular_articles
  materialization: incremental-table  

SDF defaults the incremental strategy to append, so we don’t need to specify it in this YML; the explicit equivalent is sketched below (we’ll explore strategies later in the guide).
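For reference, spelling out the default would presumably use the same incremental-options block we’ll use for the merge strategy later in this guide:

workspace.sdf.yml
---
table:
  name: popular_articles
  materialization: incremental-table
  incremental-options:
    strategy: append

Now, let’s compile our workspace with this new model: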

sdf compile popular_articles

Working set 2 model files, 1 .sdf file
Downloading "integration-test-sdf".pub.popular_articles (exists_remotely)
Downloading "bigquery-public-data".hacker_news."full" (schema)
  Compiling integration-test-sdf.pub.popular_articles (./models/popular_articles.sql)
   Finished 2 models [2 succeeded] in 2.299 secs

Schema "integration-test-sdf".pub.popular_articles
┌─────────────┬───────────┬────────────┬─────────────┐
│ column_name ┆ data_type ┆ classifier ┆ description │
╞═════════════╪═══════════╪════════════╪═════════════╡
│ title       ┆ string    ┆            ┆             │
│ url         ┆ string    ┆            ┆             │
│ text        ┆ string    ┆            ┆             │
│ dead        ┆ boolean   ┆            ┆             │
│ by          ┆ string    ┆            ┆             │
│ score       ┆ integer   ┆            ┆             │
│ time        ┆ integer   ┆            ┆             │
│ timestamp   ┆ timestamp ┆            ┆             │
│ type        ┆ string    ┆            ┆             │
│ id          ┆ integer   ┆            ┆             │
│ parent      ┆ integer   ┆            ┆             │
│ descendants ┆ integer   ┆            ┆             │
│ ranking     ┆ integer   ┆            ┆             │
│ deleted     ┆ boolean   ┆            ┆             │
└─────────────┴───────────┴────────────┴─────────────┘

Before running our new incremental model, let’s inspect the compiled query output to see what exactly will be run against BigQuery. To do so, open up the file sdftarget/dbg/materialized/<your-project-id>/pub/popular_articles.sql and inspect the SQL query.

It should show this:

sdftarget/dbg/materialized/<your-project-id>/pub/popular_articles.sql
  create or replace table `integration-test-sdf`.pub.popular_articles as (SELECT
    *
  FROM
    `bigquery-public-data`.hacker_news.`full`
  WHERE

    `timestamp` >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)

    AND title IS NOT NULL
    AND (dead IS NULL OR dead = FALSE)
    AND score > 100);

As you can see, the query was compiled with builtin.is_incremental_mode set to False. This is because we haven’t materialized the table yet.

Let’s run the model now to materialize the table in non-incremental mode.

sdf run models/popular_articles.sql

Nice! The model should have successfully been materialized in BigQuery. Next, we’ll try running the model in incremental mode.

All we need to do is run the model again and SDF will automatically detect that the model has already been materialized and set builtin.is_incremental_mode to True.

sdf run models/popular_articles.sql

Working set 2 model files, 1 .sdf file
Downloading "integration-test-sdf".pub.popular_articles (exists_remotely)
Downloading "bigquery-public-data".hacker_news."full" (schema)
    Running integration-test-sdf.pub.popular_articles (./models/popular_articles.sql)
   Finished 2 models [2 succeeded] in 5.704 secs

Table "integration-test-sdf".pub.popular_articles
│ title ┆ url ┆ text ┆ dead ┆ by ┆ score ┆ time ┆ timestamp ┆ type ┆ id ┆ parent ┆ descendants ┆ ranking ┆ deleted │
│ "Eat What You Kill" ┆ https://www.propublica.org/article/thomas-weiner-montana-st-peters-hospital-oncology ┆ ┆ ┆ ivanech ┆ 108 ┆ 1733604426 ┆ 2024-12-07T20:47:06 ┆ story ┆ 42352756 ┆ ┆ 22 ┆ ┆ │
│ "Hetzner decided to cancel our account and terminate all servers" ┆ https://mastodon.social/@kiwix/113622081750449356 ┆ ┆ ┆ unbelauscht ┆ 383 ┆ 1733744925 ┆ 2024-12-09T11:48:45 ┆ story ┆ 42365295 ┆ ┆ 277 ┆ ┆ │
│ "Nvidia is so far ahead that all the 4090s are nerfed to half speed" ┆ https://twitter.com/realGeorgeHotz/status/1868356459542770087 ┆ ┆ ┆ BIackSwan ┆ 185 ┆ 1734349524 ┆ 2024-12-16T11:45:24 ┆ story ┆ 42430184 ┆ ┆ 159 ┆ ┆ │
│ "Rules" that terminal programs follow ┆ https://jvns.ca/blog/2024/11/26/terminal-rules/ ┆ ┆ ┆ charlieok ┆ 174 ┆ 1734023365 ┆ 2024-12-12T17:09:25 ┆ story ┆ 42401011 ┆ ┆ 94 ┆ ┆ │
│ "This is not a joke, Funko just called my mom" ┆ https://twitter.com/itchio/status/1866239798924763227 ┆ ┆ ┆ haunter ┆ 555 ┆ 1733785004 ┆ 2024-12-09T22:56:44 ┆ story ┆ 42371481 ┆ ┆ 140 ┆ ┆ │
│ ‘Brain rot’ named Oxford Word of the Year 2024 ┆ https://corp.oup.com/news/brain-rot-named-oxford-word-of-the-year-2024/ ┆ ┆ ┆ ChrisArchitect ┆ 127 ┆ 1733103573 ┆ 2024-12-02T01:39:33 ┆ story ┆ 42292294 ┆ ┆ 112 ┆ ┆ │
(the ‘Brain rot’ row appears five times in the output)
10 rows.

Notice how the model ran much faster this time? That’s because SDF only fetched rows from the hacker_news.full source that were created after the last materialization of popular_articles.

Furthermore, you might notice something slightly different in the run output, specifically a Downloading line marked exists_remotely like so:

Downloading <your-project-id>.pub.popular_articles (exists_remotely)

This indicates SDF is preloading the schema and last-altered time of the table before running the query. As mentioned previously, this is how SDF sets the builtin.is_incremental_mode variable.

Lastly, if you inspect the compiled query output again, you should see builtin.is_incremental_mode set to True and the query’s SQL reflective of that.

sdftarget/dbg/materialized/<your-project-id>/pub/popular_articles.sql
insert into `integration-test-sdf`.pub.popular_articles (title, url, text, dead, `by`, score, time, timestamp, type, id, parent, descendants, ranking, deleted) SELECT
  *
FROM
  `bigquery-public-data`.hacker_news.`full`
WHERE

  -- Only fetch rows that are newer than the newest row in the previous materialization of this table
  `timestamp` >= (SELECT MAX(`timestamp`) FROM `integration-test-sdf`.pub.popular_articles)

  AND title IS NOT NULL
  AND (dead IS NULL OR dead = FALSE)
  AND score > 100;
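If you’d like to spot-check the append behavior, you can compare the row count and newest timestamp in BigQuery before and after an incremental run. A hypothetical check, not part of the sample project:

-- Replace <your-project-id> with your BigQuery project ID
SELECT COUNT(*) AS row_count, MAX(`timestamp`) AS newest_row
FROM `<your-project-id>`.pub.popular_articles;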
5

Utilize the Merge Incremental Strategy

Now that we’ve successfully materialized an incremental model with the append strategy, let’s explore the merge strategy. The merge strategy is useful when you want to update existing rows in the target table with new data from the source table.

For a full reference of all supported merge strategies, see the SDF Reference section.

Let’s say we now want to make sure that details about articles, which may have been updated after publication, are reflected in the previously added rows of our incremental model popular_articles. Since we want to update existing rows, not just add new ones as we did with append, we’ll need to use the merge strategy.

Let’s update the workspace.sdf.yml to set the materialization strategy to merge for this model.

workspace.sdf.yml
---
table:
  name: popular_articles
  materialization: incremental-table 
  incremental-options: 
    strategy: merge
    unique-key: id
    merge-update-columns:
      - dead
      - title
      - url
      - text

In this YML, we’ve set the incremental-options to use the merge strategy. We’ve also specified the unique-key as id and the merge-update-columns as four columns: dead, title, url, and text. This tells SDF to update these four columns in the target table with the new data from the source table when it finds two articles that match on their id.

Let’s compile the workspace with this new model:

sdf compile popular_articles

Working set 2 model files, 1 .sdf file
Downloading "integration-test-sdf".pub.popular_articles (exists_remotely)
Downloading "bigquery-public-data".hacker_news."full" (schema)
  Compiling integration-test-sdf.pub.popular_articles (./models/popular_articles.sql)
   Finished 2 models [2 succeeded] in 2.377 secs

Schema "integration-test-sdf".pub.popular_articles
┌─────────────┬───────────┬────────────┬─────────────┐
│ column_name ┆ data_type ┆ classifier ┆ description │
╞═════════════╪═══════════╪════════════╪═════════════╡
│ title       ┆ string    ┆            ┆             │
│ url         ┆ string    ┆            ┆             │
│ text        ┆ string    ┆            ┆             │
│ dead        ┆ boolean   ┆            ┆             │
│ by          ┆ string    ┆            ┆             │
│ score       ┆ integer   ┆            ┆             │
│ time        ┆ integer   ┆            ┆             │
│ timestamp   ┆ timestamp ┆            ┆             │
│ type        ┆ string    ┆            ┆             │
│ id          ┆ integer   ┆            ┆             │
│ parent      ┆ integer   ┆            ┆             │
│ descendants ┆ integer   ┆            ┆             │
│ ranking     ┆ integer   ┆            ┆             │
│ deleted     ┆ boolean   ┆            ┆             │
└─────────────┴───────────┴────────────┴─────────────┘

Before running the model, let’s inspect the compiled query output to see what exactly will be run against BigQuery. To do so, open up the file sdftarget/dbg/materialized/<your-project-id>/pub/popular_articles.sql and inspect the SQL query.

It should show this:

sdftarget/dbg/materialized/<your-project-id>/pub/popular_articles.sql
merge into `integration-test-sdf`.pub.popular_articles as SDF_DEST
using (SELECT
  *
FROM
  `bigquery-public-data`.hacker_news.`full`
WHERE

  -- Only fetch rows that are newer than the newest row in the previous materialization of this table
  `timestamp` >= (SELECT MAX(`timestamp`) FROM `integration-test-sdf`.pub.popular_articles)

  AND title IS NOT NULL
  AND (dead IS NULL OR dead = FALSE)
  AND score > 100) as SDF_SRC
        on SDF_SRC.id = SDF_DEST.id
        when matched then update set SDF_DEST.dead = SDF_SRC.dead, SDF_DEST.title = SDF_SRC.title, SDF_DEST.url = SDF_SRC.url, SDF_DEST.text = SDF_SRC.text
        when not matched then insert (title, url, text, dead, `by`, score, time, timestamp, type, id, parent, descendants, ranking, deleted)
        values (title, url, text, dead, `by`, score, time, timestamp, type, id, parent, descendants, ranking, deleted);

If the file isn’t updated, try reopening it. Sometimes, because of VSCode’s cache, the file appears to be unchanged.

As you can see, the query was compiled with builtin.is_incremental_mode set to True and, as a result, the SQL is a merge statement that updates four columns in the target table with new values from the source table.

Let’s run the model now to materialize the table in incremental mode with the merge strategy.

sdf run models/popular_articles.sql

The model should have successfully been materialized in BigQuery using the merge statement.
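With merge in place, new source rows matching an existing id are updated rather than appended again. If you’d like to inspect any duplicates left over from the earlier append runs, here’s a hypothetical check:

-- Count how many times each article id appears in the target table
SELECT id, COUNT(*) AS occurrences
FROM `<your-project-id>`.pub.popular_articles
GROUP BY id
HAVING occurrences > 1
ORDER BY occurrences DESC;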

The merge strategy is a powerful tool for updating existing rows in the target table with new data from the source table. It’s especially useful when you want to keep a running count of events or other metrics.

For a full list of our supported incremental options and strategies, see the SDF Reference section.