GitHub
SDF is a powerful tool on its own, but its utility is amplified when integrated into CI/CD workflows.
SDF’s static analyis can supercharge CI/CD workflows for data teams. By compiling your SQL on each pull request, SDF can statically ensure new updates to your models are valid SQL and do not break anything downstream. Additionally, SDF supports the use of checks in CI/CD. This feature allows users to define their own static tests, further enhancing the robustness and reliability of your data models. Unlike other CI/CD data tools, SDF checks everything statically, preventing you from having to incur compute costs to validate SQL and run impact analysis.
This guide will walk you through how to integrate SDF into an example CI/CD workflow. Specifically, we’ll be using the official SDF GitHub Action to run sdf compile
and sdf test
on new pull requests and subsequent updates. This guide assumes you have a basic understanding of GitHub Actions and how to create a workflow file. If you’re new to GitHub Actions, we recommend checking out the official documentation.
If you’d like to skip ahead and see the final result, you can find the example repository here.
Prerequisites
There are two open source GitHub repositories we’ll be using today, to follow along, fork the starting repository and clone it to your local machine.
- A GitHub account
- A GitHub repository cloned locally that can be used for testing. See this guide for how to set this up.
Guide
Create an Example Workspace
To start off, we’ll create a new SDF workspace. We’ll skaffold our GitHub action around this workspace. Today, we’ll be using the pii_saas_platform
example. You can create a new workspace by running the following command:
This command should be run within the Git repository you’d like to test with. We recommend using a fresh repository for this example
Run Compile
Next, let’s confirm our workspace is working as expected by running sdf compile
. This command will compile all of the SQL in your workspace and ensure it is valid. If you have any errors in your SQL, this command will fail.
Add the GitHub Action
Now that we have a working workspace, let’s add a GitHub action to run sdf compile
on each pull request. We’ll need to add a workflow file to our workspace that utilizes the SDF GitHub Action to do this.
Add the following yml
to a file called sdf.yml
in the .github/workflows
directory of your repository. This file will run the trigger the SDF GitHub action on each pull request.
As you can see, we’re specifying sdf-labs/sdf-action@v0
as the action to use. Version v0
gets the latest version under 0.*.*
. If you’d like to use a specific version, you can specify it in the uses
field by specifying the entire semantic version (e.g. sdf-labs/sdf-action@v0.1.0
).
For documentation on each of the allowed inputs, see the README in the SDF GitHub Action repository.
The last step in the workflow is to pretty print the output from the SDF command. This is optional, but will result in the output seen in the screenshot below on the workflow summary page.
(Optional) Use the Action with DBT
If you’re not using DBT alongside SDF, you can skip step. Otherwise, you’ll need to ensure DBT compile is run before SDF compile. This is because SDF relies on the compiled DBT models to run its static analysis on a DBT project.
We recommend accomplishing the dbt compile
using this GitHub Action.
Furthermore, if you don’t have the DBT profiles.yml
configured in your DBT project (i.e. adjacent to your dbt_project.yml
), you’ll need to write the profiles.yml
file in the GitHub action. Use the modified sdf.yml
workflow file (called sdf_dbt.yml
) below to accomplish this.
Note the example below is for a Snowflake configuration.
The snippet below for writing out a profiles.yml
was contributed by Chris Hronek. Thanks Chris!
Notice that, critically, is_dbt
is set to true
in the sdf compile
step. Furthermore, we are specifying the workspace_dir
to be the path to the DBT project, since the SDF workspace must be defined adjacent to the DBT project in order to function properly.
Under the hood, the GitHub action runs sdf dbt refresh
and looks for models in the target/compiled/models/sdf
directory for compilation when is_dbt
is set to true
.
Create a Pull Request on GitHub
Now, we’ll commit out changes to a new branch and push them to GitHub to see our new action in action. Let’s call this branch add-sdf-action
.
This will add all staged changes to the new branch. If there is anything you don’t want to commit, make sure to unstage or stash it before running git add .
Now that we have a new branch in our remote repository, let’s make a pull request on GitHub. Follow this official guide from GitHub to do so, and make sure to select our sdf-add-action
branch as the branch to merge into main
.
Once the pull request is created, you should see our new GitHub action start running and succeed. Congrats! You’ve successfully integrated SDF into your CI/CD workflow.
Test a Breaking Change (Optional)
If you’d like to see the action fail, you can make a breaking change to your SQL and push it to the add-sdf-action
branch. This will trigger the GitHub action to run again, and it should fail.
To demonstrate SDF’s static impact analysis, let’s remove a column that’s being used in a downstream model. This will cause the sdf compile
command to fail, and in a perfect world, would prevent us from merging this update into our main branch.
To do this, remove line 7
from the file found at ddls/payment/public/invoices.sql
. This will remove the payer_user_id
column from the invoices
DDL.
After committing the change, push it to the branch with an open pull request. You should see the GitHub action fail, and the pull request will not be mergeable if the repository is configured in this way.
Here’s an example job summary of an SDF compilation failure in CI/CD
Summary
In this guide, we walked through how to integrate SDF into a CI/CD workflow using GitHub Actions. We created a new SDF workspace, added a GitHub action to run sdf compile
on each pull request, and tested the action by making a breaking change to our SQL. By integrating SDF into your CI/CD workflow, you can ensure that your SQL is always valid and downstream models are compliant with your new changes. This can help prevent costly errors and ensure that your data is always reliable.
For a completed example of this guide, check out the example repository