Debugging
The part we like the least, yet spend the most time on. SDF is here to help!
Overview
In the previous tutorial we learned how to create a new model with SDF.
As we were working on dim_marketing_campaigns
, the result of the query just didn’t make sense.
In this tutorial we will see how SDF can simplify complicated debugging scenarios and accelerate their resolution. It can also provide real-time feedback as you author your queries with unprecedented speed and accuracy.
More specifically, we will use SDF to:
- Identify the issue upstream with our built-in column-level lineage.
- Run impact analysis after fixing the issue.
- Test some syntax static analysis examples.
Prerequisites
Completion of the previous tutorial.
Guide
Understand Column Lineage
SDF provides column-level lineage out of the box, all the way up to source.
To understand the lineage of the column total_num_installs
in our new model, all
we need to do is run:
moms_flower_shop.analytics.dim_marketing_campaigns.total_num_installs
│
│ copy
└──────┐
moms_flower_shop.staging.stg_installs_per_campaign.total_num_installs
│
│ mod
└──────┐
moms_flower_shop.staging.app_installs.event_id
│
│ copy
└──────┐
moms_flower_shop.raw.raw_marketing_campaign_events.event_id
moms_flower_shop.staging.inapp_events.event_id
│
│ copy
└──────┐
moms_flower_shop.raw.raw_inapp_events.event_id
Now we can pinpoint exactly which models we need to check. Out of the entire warehouse, we only need to check 3 models!
To learn more about SDF lineage, visit our lineage guide
Identify the Issue Upstream
Let’s check the first upstream model — stg_installs_per_campaign.sql
located under the directory models/staging/
.
There is not much going on here. We need to keep moving up.
In the upstream model app_installs
defined in models/staging/app_installs.sql
,
let’s zoom in on the FROM
statement:
Aha! Who said that inapp_events and marketing events are the same?
Compare to Other Models
You remember another model, customers
, has correct marketing campaign data.
Let’s check out the lineage for the customers model:
moms_flower_shop.staging.customers.campaign_id
│
│ copy
└──────┐
moms_flower_shop.staging.app_installs_v2.campaign_id
│
│ copy
└──────┐
moms_flower_shop.raw.raw_marketing_campaign_events.campaign_id
Aha #2! There’s a V2 version of app_installs
— app_installs_v2
Update the Model
Once we identify the issue, it’s an easy fix.
Update the FROM
statement of models/staging/stg_installs_per_campaign.sql
, to the use app_installs_v2
:
Analyze
Let’s analyze the impact.
First, we should make sure our fix compiles:
We expect to see a successful run:
Working set 12 model files, 15 .sdf files
Compiling moms_flower_shop.staging.stg_installs_per_campaign (./models/staging/stg_installs_per_campaign.sql)
Compiling moms_flower_shop.analytics.dim_marketing_campaigns (./models/analytics/dim_marketing_campaigns.sql)
Finished 11 models [2 succeeded, 9 reused] in 0.745 secs
Next, we can look at all the tables and columns this change will impact. If we add the
flag --forward
to the lineage command, we will see the lineage of each column of
stg_installs_per_campaign
and how it is used by other models.
To do that, add a forward
tag to your lineage command:
It should look like this:
Table: moms_flower_shop.staging.stg_installs_per_campaign
moms_flower_shop.staging.stg_installs_per_campaign.campaign_id
moms_flower_shop.staging.stg_installs_per_campaign.total_num_installs
│
│ copy
├──────┐
│ moms_flower_shop.analytics.dim_marketing_campaigns.total_num_installs
│ mod
└──────┐
moms_flower_shop.analytics.dim_marketing_campaigns.avg_customer_acquisition_cost
moms_flower_shop.analytics.dim_marketing_campaigns.install_duration_ratio
Only our new model will be impacted.
Fix it!
To materialize the fix, all we need to do is run the queries:
Working set 12 model files, 15 .sdf files
Running moms_flower_shop.raw.raw_inapp_events (./models/raw/raw_inapp_events.sql)
Running moms_flower_shop.raw.raw_marketing_campaign_events (./models/raw/raw_marketing_campaign_events.sql)
Running moms_flower_shop.staging.inapp_events (./models/staging/inapp_events.sql)
Running moms_flower_shop.staging.marketing_campaigns (./models/staging/marketing_campaigns.sql)
Running moms_flower_shop.staging.app_installs_v2 (./models/staging/app_installs_v2.sql)
Running moms_flower_shop.staging.stg_installs_per_campaign (./models/staging/stg_installs_per_campaign.sql)
Running moms_flower_shop.analytics.dim_marketing_campaigns (./models/analytics/dim_marketing_campaigns.sql)
Finished 7 models [7 succeeded] in 1.105 secs
Table moms_flower_shop.analytics.dim_marketing_campaigns
┌─────────────┬─────────────────────────────────┬────────────────────┬───────────────────────────────┬────────────────────────┐
│ campaign_id ┆ campaign_name ┆ total_num_installs ┆ avg_customer_acquisition_cost ┆ install_duration_ratio │
╞═════════════╪═════════════════════════════════╪════════════════════╪═══════════════════════════════╪════════════════════════╡
│ 3 ┆ instagram_ads_3_3e31b541d8 ┆ 192 ┆ 7.958249909930306 ┆ 0 │
│ 1 ┆ friends_referrals_1_9b24ce4c65 ┆ 160 ┆ 10.391763883087673 ┆ 0 │
│ 0 ┆ instagram_ads_0_8c1fdf7cb0 ┆ 91 ┆ 4.513104055002336 ┆ 0 │
│ 13 ┆ friends_referrals_13_c6e177ff0a ┆ 72 ┆ 22.915686962366962 ┆ 0 │
│ 14 ┆ google_search_14_3775b59a56 ┆ 48 ┆ 54.12730118839374 ┆ 0 │
│ 19 ┆ friends_referrals_19_3f4a9d9fc3 ┆ 26 ┆ 59.38710281622366 ┆ 0 │
│ 11 ┆ google_search_11_56e2a09ed5 ┆ 19 ┆ 58.38329050623466 ┆ 0 │
│ 4 ┆ friends_referrals_4_c554b77114 ┆ 17 ┆ 163.69881288433993 ┆ 1 │
│ 22 ┆ friends_referrals_22_69b6886d4a ┆ 15 ┆ 116.4817634058549 ┆ 0 │
│ 17 ┆ google_search_17_a400715565 ┆ 15 ┆ 147.1283543422433 ┆ 1 │
└─────────────┴─────────────────────────────────┴────────────────────┴───────────────────────────────┴────────────────────────┘
114 rows, showing only 10 rows.
Run with —limit 0 to show all rows.
Perfect! We are all set!
Bonus
Lineage is not all we can do.
SDF provides out of the box static analysis to support the development process as you author your transformations. This capability has two benefits:
- Quicker development thanks to the instant feedback on your syntax and semantics
- Compute cost savings since analysis happens on compile, before we send the query to run
Let’s try a few things that will fail. In this case, getting an error early is a good thing. Notice how quickly you get the response!
Open the File
To keep it simple, we will use models/staging/stg_installs_per_campaign.sql
throughout the following steps.
As a reminder, the query looks like this:
Error: No Column Found
Let’s try to change the column campaign_id to a non-existing column.
For example, we can change the column name from campaign_id
to non_existing_column
.
This column definitely doesn’t exist in app_installs_v2
(or app_installs
)
If we try to compile:
The run will fail and the error message will look like this:
error: SDF227: No column non_existing_column found. Available are app_installs.install_time, app_installs.campaign_name, app_installs.campaign_id, app_installs.campaign_type, app_installs.event_id, app_installs.platform, app_installs.customer_id
—> models/staging/stg_installs_per_campaign.sql:2:5
If you are using Visual Studio Code, you can easily click into the error location by CMD-Clicking on the bottom of the error:
--> models/staging/stg_installs_per_campaign.sql:2:5
Error: Function Arguments Do Not Match
Similarly, we can write a function incorrectly.
For example, the function ABS
takes one numeric argument. What if we try to give it
a string like campaign_name
? In the file models/staging/stg_installs_per_campaign.sql
,
change the campaign_id
selection with the following:
If we try to compile:
The run will fail and the error message will look like this:
error: SDF209: Failed to resolve function ABS: Argument type mismatch: actual: (varchar); candidates: (tinyint); (smallint); (bigint); (double); (decimal(p, s)); (real).
—> models/staging/stg_installs_per_campaign.sql:2:5
Error: Un-aggregated columns in aggregation context
We can even try some more complicated logic.
For example, we can add another column to the SELECT
without updating the GROUP BY
.
Watch what happens when you try to compile:
The run will fail and the error message will look like this:
error: SDF209: Failed to resolve function ABS: Argument type mismatch: actual: (varchar); candidates: (tinyint); (smallint); (bigint); (double); (decimal(p, s)); (real).
—> models/staging/stg_installs_per_campaign.sql:2:5
Try it Yourself
What is that one syntax mistake you always forget about until you see that run error? Try it out! See if SDF can save you the wait next time :).
Don't Forget to Revert!
While experimenting, we introduced some breaking logic to our data warehouse. Let’s revert the model to the original SQL code.
Update the file models/staging/stg_installs_per_campaign.sql
with the original SQL:
Next Steps
Now that we fixed the issue, we want to prevent anything similar from happening again.
In our next tutorial, “Deprecating a Model” we will
deprecate app_installs
, the old version of app_installs_v2
that we initially used.