Materialize incremental models in BigQuery to save time and compute.
Create a New SDF Project from the SDF BigQuery Incremental Example
Replace Project ID with Your Project ID
workspace.sdf.yml
file with your own project ID. This is the project ID we’ll read table metadata from and materialize tables to.On line 14 of the workspace.sdf.yml
, replace <REPLACE_WITH_BQ_PROJECT>
with the project ID from your BigQuery account.catalog
in SDF is equivalent to project ID in BigQuery.Compile to Test Credentials
Working set 2 model files, 1 .sdf file
Downloading “bigquery-public-data”.hacker_news.”full” (schema)
Compiling integration-test-sdf.pub.last_hn_timestamp (./models/last_hn_timestamp.sql)
Finished 2 models [2 succeeded] in 2.932 secsSchema “integration-test-sdf”.pub.last_hn_timestamp
┌─────────────┬───────────┬────────────┬─────────────┐
│ column_name ┆ data_type ┆ classifier ┆ description │
╞═════════════╪═══════════╪════════════╪═════════════╡
│ ts ┆ timestamp ┆ ┆ │
└─────────────┴───────────┴────────────┴─────────────┘
Materialize an Increment Model
popular_articles.sql
which filters to hacker news articles with a score over 100.
We’ll use this table to demonstrate incremental materialization with the append
merge strategy.Since many hacker news articles are frequently being added, this is a great candidate for an incremental model. There is not need to re-scan all models with a score over 100. We can use incremental materialization to only query articles published later than the last time we queried. This will significantly save on compute and optimize our pipeline.
Since we don’t care about updates to existing rows, we can use the append
merge strategy. This strategy appends new rows to the target table without updating existing rows.Let’s investigate the SQL file popular_articles.sql
in the models/
directory:title
, dead
attribute, and score
.{% if builtin.is_incremental_mode %}
block is a Jinja conditional that checks if the model is being materialized incrementally. If it is, we only fetch rows that are newer than the newest row in the previous materialization of this table. If not, we fetch rows from the last week.builtin.is_incremental_mode
to True
? SDF sets this variable to True
when the model has already been materialized in the cloud.Before running this model, we’ll need need to tell SDF to overwrite the default materialization for this table. We can do this by adding the following to the workspace.sdf.yml
file:append
, as such we don’t need to specify it in this YML (we’ll explore this later in the guide).
Now, let’s first compile our workspace with this new model:Working set 2 model files, 1 .sdf file
Downloading “integration-test-sdf”.pub.popular_articles (exists_remotely)
Downloading “bigquery-public-data”.hacker_news.”full” (schema)
Compiling integration-test-sdf.pub.popular_articles (./models/popular_articles.sql)
Finished 2 models [2 succeeded] in 3.602 secsSchema “integration-test-sdf”.pub.popular_articles
┌─────────────┬───────────┬────────────┬─────────────┐
│ column_name ┆ data_type ┆ classifier ┆ description │
╞═════════════╪═══════════╪════════════╪═════════════╡
│ title ┆ string ┆ ┆ │
│ url ┆ string ┆ ┆ │
│ text ┆ string ┆ ┆ │
│ dead ┆ boolean ┆ ┆ │
│ by ┆ string ┆ ┆ │
│ score ┆ integer ┆ ┆ │
│ time ┆ integer ┆ ┆ │
│ timestamp ┆ timestamp ┆ ┆ │
│ type ┆ string ┆ ┆ │
│ id ┆ integer ┆ ┆ │
│ parent ┆ integer ┆ ┆ │
│ descendants ┆ integer ┆ ┆ │
│ ranking ┆ integer ┆ ┆ │
│ deleted ┆ boolean ┆ ┆ │
└─────────────┴───────────┴────────────┴─────────────┘
sdftarget/dbg/materialized/<your-project-id>/pub/popular_articles.sql
and inspect the SQL query.It should show this:builtin.is_incremental_mode
set to False
. This is because we haven’t materialized the table yet.Let’s run the model now to materialize the table in non-incremental mode.builtin.is_incremental_mode
to True
.Working set 2 model files, 1 .sdf file
Downloading “integration-test-sdf”.pub.popular_articles (exists_remotely)
Downloading “bigquery-public-data”.hacker_news.”full” (schema)
Running integration-test-sdf.pub.popular_articles (./models/popular_articles.sql)
Finished 2 models [2 succeeded] in 7.467 secsTable “integration-test-sdf”.pub.popular_articles
┌──────────────────────────────────────────────────────────────────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬──────┬──────┬────────────────┬───────┬────────────┬─────────────────────┬───────┬──────────┬────────┬─────────────┬─────────┬─────────┐
│ title ┆ url ┆ text ┆ dead ┆ by ┆ score ┆ time ┆ timestamp ┆ type ┆ id ┆ parent ┆ descendants ┆ ranking ┆ deleted │
╞══════════════════════════════════════════════════════════════════════╪═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╪══════╪══════╪════════════════╪═══════╪════════════╪═════════════════════╪═══════╪══════════╪════════╪═════════════╪═════════╪═════════╡
│ “A Course of Pure Mathematics” – G. H. Hardy (1921) [pdf] ┆ https://www.gutenberg.org/files/38769/38769-pdf.pdf ┆ ┆ ┆ bikenaga ┆ 191 ┆ 1735593652 ┆ 2024-12-30T21:20:52 ┆ story ┆ 42553682 ┆ ┆ 39 ┆ ┆ │
│ “Eat What You Kill” ┆ https://www.propublica.org/article/thomas-weiner-montana-st-peters-hospital-oncology ┆ ┆ ┆ ivanech ┆ 108 ┆ 1733604426 ┆ 2024-12-07T20:47:06 ┆ story ┆ 42352756 ┆ ┆ 22 ┆ ┆ │
│ “Hetzner decided to cancel our account and terminate all servers” ┆ https://mastodon.social/@kiwix/113622081750449356 ┆ ┆ ┆ unbelauscht ┆ 383 ┆ 1733744925 ┆ 2024-12-09T11:48:45 ┆ story ┆ 42365295 ┆ ┆ 277 ┆ ┆ │
│ “Nvidia is so far ahead that all the 4090s are nerfed to half speed” ┆ https://twitter.com/realGeorgeHotz/status/1868356459542770087 ┆ ┆ ┆ BIackSwan ┆ 185 ┆ 1734349524 ┆ 2024-12-16T11:45:24 ┆ story ┆ 42430184 ┆ ┆ 159 ┆ ┆ │
│ “Rules” that terminal programs follow ┆ https://jvns.ca/blog/2024/11/26/terminal-rules/ ┆ ┆ ┆ charlieok ┆ 174 ┆ 1734023365 ┆ 2024-12-12T17:09:25 ┆ story ┆ 42401011 ┆ ┆ 94 ┆ ┆ │
│ “This is not a joke, Funko just called my mom” ┆ https://twitter.com/itchio/status/1866239798924763227 ┆ ┆ ┆ haunter ┆ 555 ┆ 1733785004 ┆ 2024-12-09T22:56:44 ┆ story ┆ 42371481 ┆ ┆ 140 ┆ ┆ │
│ “Twelfth Night Till Candlemas” – A 40-year book-quest ┆ https://davidallengreen.com/2024/12/twelfth-night-till-candlemas-the-story-of-a-forty-year-book-quest-and-of-its-remarkable-ending/ ┆ ┆ ┆ ColinWright ┆ 181 ┆ 1736442460 ┆ 2025-01-09T17:07:40 ┆ story ┆ 42647633 ┆ ┆ 53 ┆ ┆ │
│ “We’re building a new static type checker for Python” ┆ https://twitter.com/charliermarsh/status/1884651482009477368 ┆ ┆ ┆ shlomo_z ┆ 282 ┆ 1738173411 ┆ 2025-01-29T17:56:51 ┆ story ┆ 42868576 ┆ ┆ 140 ┆ ┆ │
│ ‘Brain rot’ named Oxford Word of the Year 2024 ┆ https://corp.oup.com/news/brain-rot-named-oxford-word-of-the-year-2024/ ┆ ┆ ┆ ChrisArchitect ┆ 127 ┆ 1733103573 ┆ 2024-12-02T01:39:33 ┆ story ┆ 42292294 ┆ ┆ 112 ┆ ┆ │
│ ‘Brain rot’ named Oxford Word of the Year 2024 ┆ https://corp.oup.com/news/brain-rot-named-oxford-word-of-the-year-2024/ ┆ ┆ ┆ ChrisArchitect ┆ 127 ┆ 1733103573 ┆ 2024-12-02T01:39:33 ┆ story ┆ 42292294 ┆ ┆ 112 ┆ ┆ │
└──────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴──────┴──────┴────────────────┴───────┴────────────┴─────────────────────┴───────┴──────────┴────────┴─────────────┴─────────┴─────────┘
10 rows.
customers_over_100
table that were created after the last materialization.Furthermore, you might notice something slightly different in the run output, specifically a line that says Preloading
like so:is_incremental_mode
builtin variable.Lastly, if you inspect the compiled query output again, you should see builtin.is_incremental_mode
set to True
and the query’s SQL reflective of that.Utilize the Merge Incremental Strategy
append
strategy, let’s explore the merge
strategy. The merge
strategy is useful when you want to update existing rows in the target table with new data from the source table.popular_articles
.
Since we want to update existing rows, and not only add new ones like we did with append
, we’ll need to use the merge
strategy.Let’s update the workspace.sdf.yml
to set the materialization strategy to merge
for this model.incremental-options
to use the merge
strategy. We’ve also specified the unique-key
as id
and the merge-update-columns
as four columns: dead
, title
, url
, and text
.
This tells SDF to update these four columns in the target table with the new data from the source table when it finds two articles that match on their id
.Let’s compile the workspace with this new model:Working set 2 model files, 1 .sdf file
Downloading “integration-test-sdf”.pub.popular_articles (exists_remotely)
Downloading “bigquery-public-data”.hacker_news.”full” (schema)
Compiling integration-test-sdf.pub.popular_articles (./models/popular_articles.sql)
Finished 2 models [2 succeeded] in 3.114 secsSchema “integration-test-sdf”.pub.popular_articles
┌─────────────┬───────────┬────────────┬─────────────┐
│ column_name ┆ data_type ┆ classifier ┆ description │
╞═════════════╪═══════════╪════════════╪═════════════╡
│ title ┆ string ┆ ┆ │
│ url ┆ string ┆ ┆ │
│ text ┆ string ┆ ┆ │
│ dead ┆ boolean ┆ ┆ │
│ by ┆ string ┆ ┆ │
│ score ┆ integer ┆ ┆ │
│ time ┆ integer ┆ ┆ │
│ timestamp ┆ timestamp ┆ ┆ │
│ type ┆ string ┆ ┆ │
│ id ┆ integer ┆ ┆ │
│ parent ┆ integer ┆ ┆ │
│ descendants ┆ integer ┆ ┆ │
│ ranking ┆ integer ┆ ┆ │
│ deleted ┆ boolean ┆ ┆ │
└─────────────┴───────────┴────────────┴─────────────┘
sdftarget/dbg/materialized/<your-project-id>/pub/popular_articles.sql
and inspect the SQL query.It should show this:builtin.is_incremental_mode
set to True
and, as a byproduct, the SQL query is a merge
statement that updates four columns in the target table with the new values from the source table.Let’s run the model now to materialize the table in incremental mode with the merge
strategy.merge
strategy is a powerful tool for updating existing rows in the target table with new data from the source table. It’s especially useful when you want to keep a running count of events or other metrics.