Overview
In the previous tutorial we learned how to create a new model with SDF. As we were working ondim_marketing_campaigns
, the result of the query just didn’t make sense.
In this tutorial we will see how SDF can simplify complicated debugging scenarios and accelerate
their resolution. It can also provide real-time feedback as you author your queries with unprecedented
speed and accuracy.
More specifically, we will use SDF to:
- Identify the issue upstream with our built-in column-level lineage.
- Run impact analysis after fixing the issue.
- Test some syntax static analysis examples.
Prerequisites
Completion of the previous tutorial.Guide
1
Understand Column Lineage
SDF provides column-level lineage out of the box, all the way up to source.To understand the lineage of the column Now we can pinpoint exactly which models we need to check. Out of the entire warehouse, we only need to check 3 models!
total_num_installs
in our new model, all
we need to do is run:moms_flower_shop.analytics.dim_marketing_campaigns.total_num_installs
│
│ copy
└──────┐
moms_flower_shop.staging.stg_installs_per_campaign.total_num_installs
│
│ mod
└──────┐
moms_flower_shop.staging.app_installs.event_id
│
│ copy
└──────┐
moms_flower_shop.raw.raw_marketing_campaign_events.event_id
moms_flower_shop.staging.inapp_events.event_id
│
│ copy
└──────┐
moms_flower_shop.raw.raw_inapp_events.event_id
To learn more about SDF lineage, visit our lineage guide
2
Identify the Issue Upstream
Let’s check the first upstream model — There is not much going on here. We need to keep moving up.In the upstream model
stg_installs_per_campaign.sql
located under the directory models/staging/
.models/staging/stg_installs_per_campaign.sql
app_installs
defined in models/staging/app_installs.sql
,
let’s zoom in on the FROM
statement:models/staging/app_installs.sql
Aha! Who said that inapp_events and marketing events are the same?
3
Compare to Other Models
You remember another model,
customers
, has correct marketing campaign data.
Let’s check out the lineage for the customers model:moms_flower_shop.staging.customers.campaign_id
│
│ copy
└──────┐
moms_flower_shop.staging.app_installs_v2.campaign_id
│
│ copy
└──────┐
moms_flower_shop.raw.raw_marketing_campaign_events.campaign_id
Aha #2! There’s a V2 version of
app_installs
— app_installs_v2
4
Update the Model
Once we identify the issue, it’s an easy fix.
Update the
FROM
statement of models/staging/stg_installs_per_campaign.sql
, to the use app_installs_v2
:models/staging/stg_installs_per_campaign.sql
5
Analyze
Let’s analyze the impact.First, we should make sure our fix compiles:We expect to see a successful run:Next, we can look at all the tables and columns this change will impact. If we add the
flag It should look like this:Only our new model will be impacted.
Working set 12 model files, 15 .sdf files
Compiling moms_flower_shop.staging.stg_installs_per_campaign (./models/staging/stg_installs_per_campaign.sql)
Compiling moms_flower_shop.analytics.dim_marketing_campaigns (./models/analytics/dim_marketing_campaigns.sql)
Finished 11 models [2 succeeded, 9 reused] in 0.891 secs
--forward
to the lineage command, we will see the lineage of each column of
stg_installs_per_campaign
and how it is used by other models.To do that, add a forward
tag to your lineage command:Table: moms_flower_shop.staging.stg_installs_per_campaignmoms_flower_shop.staging.stg_installs_per_campaign.campaign_idmoms_flower_shop.staging.stg_installs_per_campaign.total_num_installs
│
│ copy
├──────┐
│ moms_flower_shop.analytics.dim_marketing_campaigns.total_num_installs
│ mod
└──────┐
moms_flower_shop.analytics.dim_marketing_campaigns.avg_customer_acquisition_cost
moms_flower_shop.analytics.dim_marketing_campaigns.install_duration_ratio
6
Fix it!
To materialize the fix, all we need to do is run the queries:Perfect! We are all set!
Working set 12 model files, 15 .sdf files
Running moms_flower_shop.raw.raw_marketing_campaign_events (./models/raw/raw_marketing_campaign_events.sql)
Running moms_flower_shop.raw.raw_inapp_events (./models/raw/raw_inapp_events.sql)
Running moms_flower_shop.staging.inapp_events (./models/staging/inapp_events.sql)
Running moms_flower_shop.staging.marketing_campaigns (./models/staging/marketing_campaigns.sql)
Running moms_flower_shop.staging.app_installs_v2 (./models/staging/app_installs_v2.sql)
Running moms_flower_shop.staging.stg_installs_per_campaign (./models/staging/stg_installs_per_campaign.sql)
Running moms_flower_shop.analytics.dim_marketing_campaigns (./models/analytics/dim_marketing_campaigns.sql)
Finished 7 models [7 succeeded] in 0.989 secsTable moms_flower_shop.analytics.dim_marketing_campaigns
┌─────────────┬─────────────────────────────────┬────────────────────┬───────────────────────────────┬────────────────────────┐
│ campaign_id ┆ campaign_name ┆ total_num_installs ┆ avg_customer_acquisition_cost ┆ install_duration_ratio │
╞═════════════╪═════════════════════════════════╪════════════════════╪═══════════════════════════════╪════════════════════════╡
│ 3 ┆ instagram_ads_3_3e31b541d8 ┆ 192 ┆ 7.958249909930306 ┆ 0 │
│ 1 ┆ friends_referrals_1_9b24ce4c65 ┆ 160 ┆ 10.391763883087673 ┆ 0 │
│ 0 ┆ instagram_ads_0_8c1fdf7cb0 ┆ 91 ┆ 4.513104055002336 ┆ 0 │
│ 13 ┆ friends_referrals_13_c6e177ff0a ┆ 72 ┆ 22.915686962366962 ┆ 0 │
│ 14 ┆ google_search_14_3775b59a56 ┆ 48 ┆ 54.12730118839374 ┆ 0 │
│ 19 ┆ friends_referrals_19_3f4a9d9fc3 ┆ 26 ┆ 59.38710281622366 ┆ 0 │
│ 11 ┆ google_search_11_56e2a09ed5 ┆ 19 ┆ 58.38329050623466 ┆ 0 │
│ 4 ┆ friends_referrals_4_c554b77114 ┆ 17 ┆ 163.69881288433993 ┆ 1 │
│ 22 ┆ friends_referrals_22_69b6886d4a ┆ 15 ┆ 116.4817634058549 ┆ 0 │
│ 17 ┆ google_search_17_a400715565 ┆ 15 ┆ 147.1283543422433 ┆ 1 │
└─────────────┴─────────────────────────────────┴────────────────────┴───────────────────────────────┴────────────────────────┘
114 rows, showing only 10 rows.
Run with —limit 0 to show all rows.
Bonus
Lineage is not all we can do. SDF provides out of the box static analysis to support the development process as you author your transformations. This capability has two benefits:- Quicker development thanks to the instant feedback on your syntax and semantics
- Compute cost savings since analysis happens on compile, before we send the query to run
1
Open the File
To keep it simple, we will use
models/staging/stg_installs_per_campaign.sql
throughout the following steps.As a reminder, the query looks like this:models/staging/stg_installs_per_campaign.sql
2
Error: No Column Found
Let’s try to change the column campaign_id to a non-existing column.For example, we can change the column name from If we try to compile:The run will fail and the error message will look like this:
campaign_id
to non_existing_column
.
This column definitely doesn’t exist in app_installs_v2
(or app_installs
)models/staging/stg_installs_per_campaign.sql
error: SDF227: No column non_existing_column found. Available are app_installs.install_time, app_installs.campaign_name, app_installs.campaign_id, app_installs.campaign_type, app_installs.event_id, app_installs.platform, app_installs.customer_id
—> models/staging/stg_installs_per_campaign.sql:2:5
If you are using Visual Studio Code, you can easily click into the error location
by CMD-Clicking on the bottom of the error:
--> models/staging/stg_installs_per_campaign.sql:2:5
3
Error: Function Arguments Do Not Match
Similarly, we can write a function incorrectly.For example, the function If we try to compile:The run will fail and the error message will look like this:
ABS
takes one numeric argument. What if we try to give it
a string like campaign_name
? In the file models/staging/stg_installs_per_campaign.sql
,
change the campaign_id
selection with the following:models/staging/stg_installs_per_campaign.sql
error: SDF209: Failed to resolve function ABS: Argument type mismatch: actual: (varchar); candidates: (tinyint); (smallint); (bigint); (double); (decimal(p, s)); (real).
—> models/staging/stg_installs_per_campaign.sql:2:5
4
Error: Un-aggregated columns in aggregation context
We can even try some more complicated logic.For example, we can add another column to the Watch what happens when you try to compile:The run will fail and the error message will look like this:
SELECT
without updating the GROUP BY
.error: SDF209: Failed to resolve function ABS: Argument type mismatch: actual: (varchar); candidates: (tinyint); (smallint); (bigint); (double); (decimal(p, s)); (real).
—> models/staging/stg_installs_per_campaign.sql:2:5
5
Try it Yourself
What is that one syntax mistake you always forget about until you see that
run error? Try it out! See if SDF can save you the wait next time :).
6
Don't Forget to Revert!
While experimenting, we introduced some breaking logic to our data warehouse.
Let’s revert the model to the original SQL code.Update the file
models/staging/stg_installs_per_campaign.sql
with the original SQL:models/staging/stg_installs_per_campaign.sql
Next Steps
Now that we fixed the issue, we want to prevent anything similar from happening again. In our next tutorial, “Deprecating a Model” we will deprecateapp_installs
, the old version of app_installs_v2
that we initially used.