Materialize snapshot models in Snowflake.
Create a new SDF Project from the Snapshots Sample
Compile to Test Credentials
workspace.sdf.yml
with the following in order to configure our Snowflake integration:Working set 3 model files, 4 .sdf files
Compiling SNAPSHOTS.PUB.A1 (./models/test1/a1.sql)
Compiling SNAPSHOTS.PUB.B (./models/test1/b.sql)
Finished 2 models [2 succeeded] in 3.921 secsSchema SNAPSHOTS.PUB.B
┌────────────────┬───────────────┬────────────┬─────────────┐
│ column_name ┆ data_type ┆ classifier ┆ description │
╞════════════════╪═══════════════╪════════════╪═════════════╡
│ id ┆ decimal(1, 0) ┆ ┆ │
│ name ┆ varchar ┆ ┆ │
│ event time ┆ varchar ┆ ┆ │
│ sdf_hash ┆ varchar ┆ ┆ │
│ sdf_updated_at ┆ varchar ┆ ┆ │
│ sdf_valid_from ┆ varchar ┆ ┆ │
│ sdf_valid_to ┆ varchar ┆ ┆ │
└────────────────┴───────────────┴────────────┴─────────────┘
Initialize the snapshot
id
, name
, and "event time"
) each.name
value to VARCHAR
because otherwise Snowflake will infer VARCHAR(4)
for this column, and SDF
currently does not support constrained VARCHAR
types.builtin.is_snapshot_mode
Jinja variable to indicate whether this is the initial
run of the snapshot or not. On the initial run, we use a1
as the input; on the subsequent runs, we will use a2
.The snapshot model must also have an associated YML configuration. In this first example, we use the following one:timestamp
strategy which determines whether a row is up-to-date based
on the value of a pre-determined datetime column. The name of this column must be specified in
the required parameter updated-at
. The second required parameter unique-key
indicates
the unique key of the input data set.Let’s now run the initial step:Working set 3 model files, 4 .sdf files
Running SNAPSHOTS.PUB.A1 (./models/test1/a1.sql)
Running SNAPSHOTS.PUB.B (./models/test1/b.sql)
Finished 2 models [2 succeeded] in 6.913 secsTable SNAPSHOTS.PUB.B
┌────┬──────┬────────────┬──────────────────────────────────┬────────────────┬────────────────┬──────────────┐
│ id ┆ name ┆ event time ┆ sdf_hash ┆ sdf_updated_at ┆ sdf_valid_from ┆ sdf_valid_to │
╞════╪══════╪════════════╪══════════════════════════════════╪════════════════╪════════════════╪══════════════╡
│ 1 ┆ Jack ┆ 2022-01-01 ┆ 199dc952035c90f47a765784178ae2be ┆ 2022-01-01 ┆ 2022-01-01 ┆ │
│ 2 ┆ Bob ┆ 2022-01-01 ┆ fc6ca1611012246ab16ad73bf7cc0804 ┆ 2022-01-01 ┆ 2022-01-01 ┆ │
│ 3 ┆ Jane ┆ 2022-01-01 ┆ 3d9f0a45a46f2c6be345cc37db0ae26e ┆ 2022-01-01 ┆ 2022-01-01 ┆ │
└────┴──────┴────────────┴──────────────────────────────────┴────────────────┴────────────────┴──────────────┘
3 rows.
Take the next snapshot
a2
:"event time"
); one row was deleted; one new row was added; one new column age
was added, and
the type of id
was widened from NUMERIC(1, 0)
to NUMERIC(2, 0)
.Let’s see how this impacts the snapshot table:Working set 3 model files, 4 .sdf files
Downloading SNAPSHOTS.PUB.B (exists_remotely)
Running SNAPSHOTS.PUB.A2 (./models/test1/a2.sql)
Running SNAPSHOTS.PUB.B (./models/test1/b.sql)
Finished 3 models [2 succeeded, 1 reused] in 7.641 secsTable SNAPSHOTS.PUB.B
┌────┬───────┬─────┬────────────┬──────────────────────────────────┬────────────────┬────────────────┬──────────────┐
│ id ┆ name ┆ age ┆ event time ┆ sdf_hash ┆ sdf_updated_at ┆ sdf_valid_from ┆ sdf_valid_to │
╞════╪═══════╪═════╪════════════╪══════════════════════════════════╪════════════════╪════════════════╪══════════════╡
│ 1 ┆ Jacob ┆ 10 ┆ 2022-01-02 ┆ ea571ed0c11bd37ca976d34c4b16f926 ┆ 2022-01-02 ┆ 2022-01-02 ┆ │
│ 44 ┆ Cloe ┆ 30 ┆ 2022-01-02 ┆ c1b374e761a344daa96718c16264b925 ┆ 2022-01-02 ┆ 2022-01-02 ┆ │
│ 1 ┆ Jack ┆ ┆ 2022-01-01 ┆ 199dc952035c90f47a765784178ae2be ┆ 2022-01-01 ┆ 2022-01-01 ┆ 2022-01-02 │
│ 2 ┆ Bob ┆ ┆ 2022-01-01 ┆ fc6ca1611012246ab16ad73bf7cc0804 ┆ 2022-01-01 ┆ 2022-01-01 ┆ │
│ 3 ┆ Jane ┆ ┆ 2022-01-01 ┆ 3d9f0a45a46f2c6be345cc37db0ae26e ┆ 2022-01-01 ┆ 2022-01-01 ┆ │
└────┴───────┴─────┴────────────┴──────────────────────────────────┴────────────────┴────────────────┴──────────────┘
5 rows.
Jake
was successfully changed to Jakob
because it had an updated value in the updated-at
column "event time"
id
and to add the new column age
age
values are set to NULL
for the records that didn’t changedCloe
) was successfully addedBob
) is still considered valid in the snapshot because SDF doesn’t
yet support invalidating on deletes.Strategy `check`
check
snapshot strategy which uses the contents of the specified columns
to determine whether the row has changed or not. We will use the same two versions of the input data a1
and a2
.
Make sure to drop b
from Snowflake before running the commands below:timestamp
strategy, the check
strategy requires unique-key
to be defined, but, instead of updated-at
,
the second required setting is check-cols
, defining the columns used for
change detection. Two options are available: all
(seen in this example), or an explicit list of column names.Let’s see the result after two runs of this snapshot model:Working set 3 model files, 4 .sdf files
Downloading SNAPSHOTS.PUB.B (exists_remotely)
Running SNAPSHOTS.PUB.A2 (./models/test2/a2.sql)
Running SNAPSHOTS.PUB.B (./models/test2/b.sql)
Finished 3 models [2 succeeded, 1 reused] in 6.921 secsTable SNAPSHOTS.PUB.B
┌────┬───────────┬─────┬────────────┬──────────────────────────────────┬─────────────────────────┬─────────────────────────┬─────────────────────────┐
│ id ┆ name ┆ age ┆ event time ┆ sdf_hash ┆ sdf_updated_at ┆ sdf_valid_from ┆ sdf_valid_to │
╞════╪═══════════╪═════╪════════════╪══════════════════════════════════╪═════════════════════════╪═════════════════════════╪═════════════════════════╡
│ 1 ┆ Jacob ┆ 10 ┆ 2022-01-02 ┆ 0703576642735ca52b9cc99813872e41 ┆ 2024-06-28T02:11:58.147 ┆ 2024-06-28T02:11:58.147 ┆ │
│ 3 ┆ Mary Jane ┆ 20 ┆ 2022-01-01 ┆ 73e49865acff389b13ff40264c02cdb7 ┆ 2024-06-28T02:11:58.147 ┆ 2024-06-28T02:11:58.147 ┆ │
│ 44 ┆ Cloe ┆ 30 ┆ 2022-01-02 ┆ fbe60e2541ca3191878c98d378106908 ┆ 2024-06-28T02:11:58.147 ┆ 2024-06-28T02:11:58.147 ┆ │
│ 2 ┆ Bob ┆ ┆ 2022-01-01 ┆ 075a864f927091575de99e4ff637bb49 ┆ 2024-06-28T02:11:51.004 ┆ 2024-06-28T02:11:51.004 ┆ │
│ 1 ┆ Jack ┆ ┆ 2022-01-01 ┆ ba90dbb448819f622e98b3f114a362f9 ┆ 2024-06-28T02:11:51.004 ┆ 2024-06-28T02:11:51.004 ┆ 2024-06-28T02:11:58.147 │
│ 3 ┆ Jane ┆ ┆ 2022-01-01 ┆ d7677d7bc22a7eb3572ade9a214652db ┆ 2024-06-28T02:11:51.004 ┆ 2024-06-28T02:11:51.004 ┆ 2024-06-28T02:11:58.147 │
└────┴───────────┴─────┴────────────┴──────────────────────────────────┴─────────────────────────┴─────────────────────────┴─────────────────────────┘
6 rows.
timestamp
-based snapshot above except that the change for the Jane
row took effect. This is because SDF compares the content of all columns, not "event time"
, to recognize
there was an update.Finally, let’s look at a check
scenario with explicitly specified check-cols
columns. Make sure to drop
b
again for this example to work."event time"
column should be used in change detection.
Let’s see the result after two runs of this snapshot model:Working set 3 model files, 4 .sdf files
Downloading SNAPSHOTS.PUB.B (exists_remotely)
Running SNAPSHOTS.PUB.A2 (./models/test3/a2.sql)
Running SNAPSHOTS.PUB.B (./models/test3/b.sql)
Finished 3 models [2 succeeded, 1 reused] in 7.444 secsTable SNAPSHOTS.PUB.B
┌────┬───────┬─────┬────────────┬──────────────────────────────────┬─────────────────────────┬─────────────────────────┬─────────────────────────┐
│ id ┆ name ┆ age ┆ event time ┆ sdf_hash ┆ sdf_updated_at ┆ sdf_valid_from ┆ sdf_valid_to │
╞════╪═══════╪═════╪════════════╪══════════════════════════════════╪═════════════════════════╪═════════════════════════╪═════════════════════════╡
│ 1 ┆ Jacob ┆ 10 ┆ 2022-01-02 ┆ 1eea253ca4f8514070e9a40df6de6a9b ┆ 2024-06-28T02:12:16.584 ┆ 2024-06-28T02:12:16.584 ┆ │
│ 44 ┆ Cloe ┆ 30 ┆ 2022-01-02 ┆ e7d967e9000fec8b9603f463d42f72c6 ┆ 2024-06-28T02:12:16.584 ┆ 2024-06-28T02:12:16.584 ┆ │
│ 1 ┆ Jack ┆ ┆ 2022-01-01 ┆ df08b6f3f8220bd51a3443bc5c1db4d2 ┆ 2024-06-28T02:12:08.806 ┆ 2024-06-28T02:12:08.806 ┆ 2024-06-28T02:12:16.584 │
│ 2 ┆ Bob ┆ ┆ 2022-01-01 ┆ 667051c30a92f42cbc7ff2cfb80a6e0f ┆ 2024-06-28T02:12:08.806 ┆ 2024-06-28T02:12:08.806 ┆ │
│ 3 ┆ Jane ┆ ┆ 2022-01-01 ┆ b0b9cc51ca0963dcd7706d3ae6edc8ac ┆ 2024-06-28T02:12:08.806 ┆ 2024-06-28T02:12:08.806 ┆ │
└────┴───────┴─────┴────────────┴──────────────────────────────────┴─────────────────────────┴─────────────────────────┴─────────────────────────┘
5 rows.
timestamp
-based snapshot because the same "event time"
column is looked at in both cases."event time"
decreased as a result of the change, the timestamp
snapshot would not create an updated
record because it only does so if the updated-at
value increases. Conversely, the check
snapshot would
record an update because it looks for any change in the specified columns.