Overview

In the previous tutorial, we discovered app_installs was replaced by app_installs_v2.

This tutorial will walk through the deprecation of app_installs and demonstrate how SDF can help overcome the two big hurdles in deprecating a model:

  1. Preventing prod from breaking by tracking and resolving downstream dependencies
  2. Setting up guardrails to prevent future mistakes

Prerequisites

Completion of the previous tutorial.

Setup

Reference to Metadata Files

SDF’s semantic understanding is based on metadata files, defining your custom business logic, and smart propagation of values. To get started, we need to reference the metadata directory by simply including the relevant paths in our workspace configuration file, workspace.sdf.yml.

Open the file workspace.sdf.yml and uncomment the following:

workspace.sdf.yml
...
# Uncomment here to begin the "Deprecating a Model" tutorial >>>>>>
- path: classifications  # Classifiers
    type: metadata
- path: reports  # Reports based on SDF s information schema
    type: report
- path: checks  # Checks against SDF's information schema
    type: check
# <<<<<<<

Note

  • The classifications folder contains one YAML file for table classifiers, and one for column classifiers.
  • The metadata folder is structured based on our models folder.
  • The reports folder is currently empty.

We will talk about all of these a bit later.

Now, the paths are included in our project.

Guide

1

Before Deprecation

Before deprecating a model, we can run an impact analysis on the its downstream dependencies using SDF’s lineage command. We simply run:

sdf lineage staging.app_installs --forward --show result

Table: staging.app_installs

staging.app_installs.campaign_id

staging.app_installs.campaign_name │ │ copy └──────┐        analytics.agg_installs_and_campaigns.campaign_name

staging.app_installs.campaign_type

staging.app_installs.customer_id │ │ mod └──────┐        analytics.agg_installs_and_campaigns.distinct_installs

staging.app_installs.event_id

staging.app_installs.install_time │ │ mod └──────┐        analytics.agg_installs_and_campaigns.install_date

staging.app_installs.platform │ │ copy └──────┐        analytics.agg_installs_and_campaigns.platform

Wow! The model agg_installs_and_campaigns is also using the old version table!

2

Classify the Model as Deprecated

The file classifications/table_classifiers.sdf.yml holds definitions of a table level classifier called TABLE_STATUS. You can see that one of the table status options is deprecated.

To add the classifier to our table, navigate to the table’s metadata file metadata/staging/app_installs.sdf.yml. Remove the commented section assigning the TABLE_STATUS.deprecated classifier to the table:

metadata/staging/app_installs.sdf.yml
# Uncomment here to add a "deprecated" classifier to the table
classifiers:
  - TABLE_STATUS.deprecated

Let’s compile and see how it looks:

sdf compile staging.app_installs --show result

Schema moms_flower_shop.staging.app_installs        This table is a staging table which adds campaign information to app install events

This table is a staging table which adds campaign information to app install events        [TABLE_STATUS.deprecated] ┌───────────────┬───────────┬────────────┬─────────────┐ │ column_name   ┆ data_type ┆ classifier ┆ description │ ╞═══════════════╪═══════════╪════════════╪═════════════╡ │ event_id      ┆ bigint    ┆            ┆             │ │ customer_id   ┆ bigint    ┆            ┆             │ │ install_time  ┆ timestamp ┆            ┆             │ │ platform      ┆ varchar   ┆            ┆             │ │ campaign_id   ┆ bigint    ┆            ┆             │ │ campaign_name ┆ varchar   ┆            ┆             │ │ campaign_type ┆ varchar   ┆            ┆             │ └───────────────┴───────────┴────────────┴─────────────┘

Read more about classifiers in the Classifiers Guide.

3

Create a Report of Deprecated Tables Reference

To receive an instant snapshot all models referencing deprecated tables through the entire data warehouse we can create an SDF report that will display all deprecated table is references.

Create a new SQL file called deprecated_table_reference.sql in the reports/ directory. Copy the following query to your new file:

reports/deprecated_table_reference.sql
WITH
deprecated_tables AS (
    SELECT
        table_id
    FROM sdf.information_schema.tables 
    WHERE 
        CONTAINS(classifiers, 'TABLE_STATUS.deprecated')
)

SELECT 
    to_table_id AS table_id, 
    from_table_id AS upstream_deprecated_table_id
FROM sdf.information_schema.table_lineage
WHERE from_table_id IN (SELECT table_id FROM deprecated_tables)
    AND to_table_id IS NOT NULL

To learn more about SDF’s rich information schema and how you can leverage it to write your custom own reports, read our information schema reference page.

Thanks to SDF’s lineage, we already knew that agg_installs_and_campaigns is referencing the old table. It’s a good test to our check - we expect it to fail.

Let’s check it out (pun intended):

sdf report deprecated_table_reference

Report moms_flower_shop.pub.deprecated_table_reference ┌───────────────────────────────────────────────────────┬───────────────────────────────────────┐ │ table_id                                              ┆ upstream_deprecated_table_id          │ ╞═══════════════════════════════════════════════════════╪═══════════════════════════════════════╡ │ moms_flower_shop.analytics.agg_installs_and_campaigns ┆ moms_flower_shop.staging.app_installs │ └───────────────────────────────────────────────────────┴───────────────────────────────────────┘ 1 rows.

It works! Now, we should fix agg_installs_and_campaigns.

For more examples on getting health reports of your data warehouse and aggregate custom logic and static checks, read our full Reports Guide.

4

Resolve Downstream Dependencies

To make sure we won’t break prod if we delete our table, we can choose to resolve the dependency by either:

  • If the model is not being used, we can just delete it alltogether.
  • Otherwise, we can update the FROM statement of models/analytics/agg_installs_and_campaigns.sql like we did for our previous model:
models/analytics/agg_installs_and_campaigns.sql
...
FROM staging.app_installs_v2  -- Change from app_installs to app_installs_v2
...

Let’s run our report again:

sdf report deprecated_table_reference

All clear!

Report moms_flower_shop.pub.deprecated_table_reference ┌──────────┬──────────────────────────────┐ │ table_id ┆ upstream_deprecated_table_id │ ╞══════════╪══════════════════════════════╡ └──────────┴──────────────────────────────┘ 0 rows.

That was really cool, ha? Let’s explore what other cool things we can do with SDF’s information scehma and semantic understanding in our next tutorial - enriching your warehouse