Overview

SDF integrates seamlessly with your existing Dagster projects, providing a best-in-class transformation layer while letting you schedule, orchestrate, and monitor your DAGs in Dagster.

When it comes time to materialize your Dagster assets, you can be confident that SDF has successfully compiled your workspace, making it safe to execute locally or against your cloud data warehouse.

The dagster-sdf package is currently in alpha. We recommend trying it out; if you run into problems or find bugs, please open an issue against Dagster’s repository.

Setting up your Dagster Project

The following guide assumes you already have an active SDF workspace. If not, you can create one in this SDF Tutorial Series.

1

Set up your environment

Getting started with Dagster and SDF is as easy as installing both with pip:

Note: We strongly recommend installing dagster and the sdf-cli package inside a Python virtualenv.

pip install dagster-sdf dagster-webserver

The dagster-sdf library installs both sdf-cli and dagster as Python dependencies. If you’re starting from scratch, this makes the sdf CLI available to you, which you can use to interact with your SDF workspace.

To validate that you’ve installed the packages correctly, run the following commands and confirm that their output matches:

sdf --version

sdf 0.10.8

dagster-sdf --help
Usage: dagster-sdf [OPTIONS] COMMAND [ARGS]...

CLI tools for working with Dagster and sdf.
                                                                                                                                                                                                                                                                                                                                                                                            
 Options
  --help  -h  Show this message and exit.

 Commands
  workspace   Commands for using an sdf workspace in Dagster.

Note: Version numbers and formatting may vary, but the output should be similar to the above.
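If you want to run this check from a script (for example in CI), the following is a minimal sketch using only the Python standard library. The helper names find_tools and sdf_version are hypothetical and not part of dagster-sdf; the only commands assumed are the sdf and dagster-sdf executables shown above.

```python
import shutil
import subprocess


def find_tools(names=("sdf", "dagster-sdf")):
    """Map each executable name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def sdf_version():
    """Return the first line printed by `sdf --version`, or None if sdf is unavailable."""
    path = shutil.which("sdf")
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND'}")
    print("sdf version:", sdf_version())
```

A missing entry usually means the virtualenv that holds the packages is not activated in the current shell.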

2

Initialize your SDF Workspace with Dagster

Use the approach below to create a Dagster project alongside your SDF workspace. If you don’t have an SDF workspace on hand, create one from our samples using the following command:

  sdf new --sample moms_flower_shop_completed

Create a new project using the dagster-sdf CLI

This approach uses the dagster-sdf CLI to create a new Dagster project that references an existing SDF workspace. The dagster-sdf CLI pre-generates a scaffolded Dagster project with the necessary configuration to interface with your SDF workspace.

Initializing your project is as easy as running the following command:

dagster-sdf workspace scaffold --project-name ${DAGSTER_PROJECT_NAME}

with arguments:

  • --project-name - [required] The name of the Dagster project to be created. This will be the name of the directory containing the project files.
  • --sdf-workspace-dir - [optional] The path to an existing SDF workspace, which should contain a valid workspace.sdf.yml file. Can be omitted when executing from the root of an SDF workspace.

This creates a new directory, named after the --project-name value specified above, containing the files that define a Dagster project (e.g. assets.py, definitions.py) along with the configuration needed to interface with your SDF workspace.

The output should look similar to:

Running with dagster-sdf version: 1.8.0.  
Initializing Dagster project my_project_name in the current working directory for sdf workspace  
directory /home/user/my_sdf_workspace.
Your Dagster project has been initialized. To view your SDF workspace in Dagster, run the following  
commands:  

cd ./my_project_name  
DAGSTER_SDF_COMPILE_ON_LOAD=1 dagster dev
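
If you script project creation, the scaffold command and its two documented arguments can be assembled as in the sketch below. The helper name scaffold_command and the example project name are hypothetical; only the --project-name and --sdf-workspace-dir flags described above are assumed.

```python
import os
import shlex


def scaffold_command(project_name, sdf_workspace_dir=None):
    """Build the argument list for `dagster-sdf workspace scaffold`."""
    if not project_name:
        raise ValueError("project_name is required")
    cmd = ["dagster-sdf", "workspace", "scaffold", "--project-name", project_name]
    # --sdf-workspace-dir can be left out when running from the workspace root.
    if sdf_workspace_dir is not None:
        cmd += ["--sdf-workspace-dir", os.fspath(sdf_workspace_dir)]
    return cmd


if __name__ == "__main__":
    # Hypothetical project name, printed as a shell-ready command line.
    print(shlex.join(scaffold_command("my_project_name")))
```

To actually run it, pass the returned list to subprocess.run(cmd, check=True) from the directory where the project should be created.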
3

Understanding your Dagster project

Great! Now we have a Dagster project configured alongside our SDF workspace.

You may have noticed some new objects in your python code when generating or authoring your Dagster project. These are all made possible by and exposed through the dagster-sdf package installed in Step 1. We’ll briefly break down the key components here:

  • SdfWorkspace: This is a representation of your SDF workspace in Dagster. Its core responsibility is to define the key components of a workspace, for example, the workspace and target output directories as well as which environment to prepare (i.e. compile) the workspace with.
  • SdfCliResource: This is a Dagster resource that provides access to the SDF CLI. It’s used by the @sdf_assets decorator to materialize your SDF workspace by running a command with the sdf-cli. It can be used to run arbitrary SDF commands, but is tested to work with the compile, run, and test commands.
  • @sdf_assets: This is a decorator that builds a multi-asset representation of your SDF workspace from the SDF information schema. Dagster uses it to display your SDF workspace as an asset DAG in the Dagster UI.
4

Start the Dagster Web Server

With your Dagster project set up, you can now materialize your SDF workspace using the Dagster UI.

Depending on the approach you took in Step 2, start the Dagster development server as follows:

From a Dagster project

  1. Change directories to the Dagster project directory:
cd my_dagster_project/
  2. To start Dagster’s UI, run the following:
DAGSTER_SDF_COMPILE_ON_LOAD=1 dagster dev

Your output should look similar to:

Serving dagster-webserver on http://127.0.0.1:3000 in process 70635

Click on the link to open the Dagster web server in your browser.
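
The inline DAGSTER_SDF_COMPILE_ON_LOAD=1 prefix is POSIX shell syntax, so it may not work in every shell. As a hedged alternative, the sketch below sets the same variable from Python before launching dagster dev. The helper names dagster_dev_env and start_dagster_dev are hypothetical; the variable name and command are the ones shown above.

```python
import os
import subprocess


def dagster_dev_env(base_env=None):
    """Return a copy of the environment with compile-on-load enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["DAGSTER_SDF_COMPILE_ON_LOAD"] = "1"
    return env


def start_dagster_dev(project_dir):
    """Run `dagster dev` inside project_dir; blocks until the server exits."""
    return subprocess.run(
        ["dagster", "dev"], cwd=project_dir, env=dagster_dev_env(), check=False
    )
```

Calling start_dagster_dev("my_dagster_project") from the parent directory is then equivalent to the two commands in this step. The variable is set only for the child process, so your own shell environment is left untouched.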

5

Materialize your project in the Dagster UI

With the Dagster web server running, you can now navigate to the Dagster UI to view your SDF workspace as an asset DAG.

From here, we can view the metadata that SDF makes available at compile time for a given asset, such as the asset description, raw SQL, code location, and more. We can also see the Dagster assets that your SDF workspace depends on, and the assets that depend on your SDF workspace.

In Dagster, running an SDF model corresponds to materializing an asset. Each SDF model is represented as an asset in the Dagster UI. Depending on your configuration, running a model will result in it either executing locally using the SDF DB or executing remotely using the relevant configured integrations.

To see this in action, click on the Materialize all button on the asset lineage view. This will materialize all assets in the workspace, including your SDF assets.

If everything is set up correctly, you should see the assets in your SDF workspace materialize successfully. Inspecting a particular asset will reveal additional runtime metadata, such as the execution time, the materialized SQL, and the table schema along with any propagated classifiers.

6

Optional: Defining upstream dependencies

A common use case for Dagster with SDF is a complex data pipeline in which some parts are not authored in SDF. These assets can be upstream dependencies that your SDF workspace expects to be materialized before it runs.

SDF interprets these dependencies as source tables. Take, for example, a Snowflake warehouse whose tables are materialized by different assets in your Dagster project.

If you have the following query in your SDF workspace:

internal_sdf_table.sql
SELECT * FROM upstream_dagster_table

Where upstream_dagster_table is not defined in SDF but is materialized in Dagster via:

assets.py
import snowflake.connector
from dagster import asset
...
@asset
def build_upstream_dagster_table():
    # Load your Snowflake credentials here.
    credential = ...
    conn = snowflake.connector.connect(
        user=credential['username'],
        password=credential['password'],
        account=credential['account-id'],
        warehouse=credential['warehouse'],
        role=credential['role'],
    )
    cursor = conn.cursor()
    # Create the table that internal_sdf_table.sql selects from.
    create_upstream_table = """
    CREATE OR REPLACE TABLE upstream_dagster_table (
        id INTEGER,
        field STRING
    );
    """
    cursor.execute(create_upstream_table)
    cursor.close()
    conn.close()
...

Then you can define the upstream_dagster_table as a Dagster source in your SDF workspace by specifying a dagster-asset-key in the meta section of the table definition:

upstream_dagster_table.sdf.yml
table:
  name: upstream_dagster_table
  meta:
    dagster-asset-key: build_upstream_dagster_table

This will have no impact on SDF’s execution, but will give Dagster the context that the upstream_dagster_table is a Dagster asset that needs to be materialized before the SDF workspace can be run.
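
To make the mapping concrete, here is an illustrative sketch of how a table's meta section relates a table name to a Dagster asset key. It is not dagster-sdf's internal implementation: the function upstream_asset_keys is hypothetical, and plain dictionaries stand in for parsed .sdf.yml table blocks.

```python
def upstream_asset_keys(table_blocks):
    """Map table name -> Dagster asset key for tables whose meta declares one."""
    keys = {}
    for block in table_blocks:
        table = block.get("table", {})
        key = (table.get("meta") or {}).get("dagster-asset-key")
        if key:
            keys[table["name"]] = key
    return keys


# Dictionaries mirroring the YAML above, plus one table with no meta section.
blocks = [
    {
        "table": {
            "name": "upstream_dagster_table",
            "meta": {"dagster-asset-key": "build_upstream_dagster_table"},
        }
    },
    {"table": {"name": "internal_sdf_table"}},
]

print(upstream_asset_keys(blocks))
# prints {'upstream_dagster_table': 'build_upstream_dagster_table'}
```

Tables without a dagster-asset-key entry are simply skipped, which matches the idea that only explicitly annotated tables are treated as Dagster-managed sources.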

Note: In order to read the upstream_dagster_table materialized by Dagster, you will need to define an integration in your SDF workspace that can read from the source table. See Integration Docs for more information.

Congratulations! You’ve successfully set up your SDF workspace with Dagster and materialized your assets in the Dagster UI.