Getting Started with Dagster and SDF
SDF as a best-in-class transformation and authoring layer for Dagster Orchestration
Overview
SDF integrates seamlessly with your existing Dagster projects, providing a best-in-class transformation layer while you schedule, orchestrate, and monitor your DAGs in Dagster.
When it comes time to materialize your Dagster assets, you can be confident that SDF has successfully compiled your workspace, making it safe to execute locally or against your cloud data warehouse.
The dagster-sdf package is currently in alpha. We recommend trying it out; if you run into any issues or find any bugs, please open an issue against Dagster’s repository.
Setting up your Dagster Project
The following guide assumes you already have an active SDF workspace. If not, you can create one in this SDF Tutorial Series.
Set up your environment
Getting started with Dagster and SDF is as easy as installing the dagster-sdf package with pip (pip install dagster-sdf).
Note: We strongly recommend installing dagster and the sdf-cli package inside a Python virtualenv.
The dagster-sdf library installs both sdf-cli and dagster as Python dependencies. If you’re starting from scratch, this will make the sdf CLI available to you, which you can use to interact with your SDF workspace.
To validate that you’ve installed the packages correctly, run the following commands and confirm that their output matches:
sdf 0.10.8
Note: Version numbers and formatting may vary, but the output should be similar to the above.
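The same check can be scripted. The following is a minimal sketch and is not part of dagster-sdf: it looks up each CLI on the PATH and pulls a dotted version number out of whatever the tool prints. It assumes both tools accept a --version flag and print a version such as the sdf 0.10.8 shown above; parse_version and installed_version are hypothetical helper names.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_version(output: str) -> Optional[Tuple[int, ...]]:
    """Extract the first MAJOR.MINOR.PATCH triple from a version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    return tuple(int(part) for part in match.groups()) if match else None


def installed_version(executable: str) -> Optional[Tuple[int, ...]]:
    """Return the parsed version of a CLI, or None if it is not on the PATH."""
    if shutil.which(executable) is None:
        return None
    # Assumption: the tool supports `--version`.
    result = subprocess.run(
        [executable, "--version"], capture_output=True, text=True, check=False
    )
    return parse_version(result.stdout or result.stderr)


if __name__ == "__main__":
    for tool in ("sdf", "dagster"):
        print(tool, installed_version(tool))
```

A result of None for either tool suggests the package is not installed in the active environment.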
Initialize your SDF Workspace with Dagster
Select one of the following approaches to create a Dagster project alongside your SDF workspace. If you don’t have an SDF workspace on hand, create one from our samples using the following command:
Create a new project using the dagster-sdf CLI
This approach uses the dagster-sdf CLI to create a new Dagster project that references an existing SDF workspace. The dagster-sdf CLI pre-generates a scaffolded Dagster project with the necessary configuration to interface with your SDF workspace.
Initializing your project is as easy as running the following command:
with arguments:
- --project-name - [required] The name of the Dagster project to be created. This will be the name of the directory containing the project files.
- --sdf-workspace-dir - [optional] The path to an existing SDF workspace, which should contain a valid workspace.sdf.yml file. Optional if executing from the root of an SDF workspace.
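The rule for --sdf-workspace-dir can be checked before scaffolding. The sketch below is not the dagster-sdf CLI's own code; it only mirrors the behaviour described above, under the assumption that a directory counts as an SDF workspace when it contains a workspace.sdf.yml file. resolve_workspace_dir is a hypothetical helper name.

```python
from pathlib import Path
from typing import Optional

WORKSPACE_FILE = "workspace.sdf.yml"


def resolve_workspace_dir(sdf_workspace_dir: Optional[str] = None) -> Path:
    """Return the SDF workspace directory, defaulting to the current directory.

    Raises FileNotFoundError when the directory has no workspace.sdf.yml.
    """
    candidate = Path(sdf_workspace_dir) if sdf_workspace_dir else Path.cwd()
    candidate = candidate.resolve()
    if not (candidate / WORKSPACE_FILE).is_file():
        raise FileNotFoundError(f"{candidate} does not contain {WORKSPACE_FILE}")
    return candidate
```

Calling resolve_workspace_dir() with no argument corresponds to running from the root of an SDF workspace; passing a path corresponds to supplying --sdf-workspace-dir explicitly.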
The command creates a new directory named after the project-name specified above, containing a set of files that define a Dagster project (e.g. assets.py, definitions.py, etc.) and the configuration for interfacing with your SDF workspace.
The output should look similar to:
Understanding your Dagster project
Great! Now we have a Dagster project configured alongside our SDF workspace.
You may have noticed some new objects in your Python code when generating or authoring your Dagster project. These are all made possible by and exposed through the dagster-sdf package installed in Step 1. We’ll briefly break down the key components here:
- SdfWorkspace: This is a representation of your SDF workspace in Dagster. Its core responsibility is to define the key components of a workspace, for example, the workspace and target output directories as well as which environment to prepare (i.e. compile) the workspace with.
- SdfCliResource: This is a Dagster resource that provides access to the SDF CLI. It’s used by the @sdf_assets decorator to materialize your SDF workspace by running a command with the sdf-cli. It can be used to run arbitrary SDF commands, but is tested to work with the compile, run, and test commands.
- @sdf_assets: This is a decorator that builds a multi-asset representation of your SDF workspace from the SDF information schema. This is used by Dagster to display your SDF workspace as an asset DAG in the Dagster UI.
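To make the idea of "running a command with the sdf-cli" concrete, here is a rough standard-library sketch of invoking the sdf executable from a workspace directory. It is not how SdfCliResource is implemented and it does not use the dagster-sdf API. The helper names (is_tested_command, build_sdf_argv, run_sdf) are hypothetical, and the only facts taken from this guide are the three commands listed as tested.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence

# Commands this guide lists as tested with the Dagster integration.
TESTED_COMMANDS = ("compile", "run", "test")


def is_tested_command(command: str) -> bool:
    """Report whether a command is one of the three listed as tested."""
    return command in TESTED_COMMANDS


def build_sdf_argv(command: str, extra_args: Sequence[str] = ()) -> List[str]:
    """Build the argument vector for an sdf invocation (any command is allowed)."""
    return ["sdf", command, *extra_args]


def run_sdf(command: str, workspace_dir: Path, extra_args: Sequence[str] = ()) -> int:
    """Run sdf from the workspace directory and return its exit code."""
    completed = subprocess.run(
        build_sdf_argv(command, extra_args), cwd=workspace_dir, check=False
    )
    return completed.returncode
```

In a real project you would let SdfCliResource handle the invocation, since it also connects the command's results to Dagster; the sketch only shows the shape of the underlying call.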
Start the Dagster Web Server
With your Dagster project set up, you can now materialize your SDF workspace using the Dagster UI.
Using the project you created in Step 2, start the Dagster development server from the Dagster project directory:
- Change directories to the Dagster project directory:
- To start Dagster’s UI, run dagster dev:
Your output should look similar to:
Click on the link to open the Dagster web server in your browser.
Materialize your project in the Dagster UI
With the Dagster web server running, you can now navigate to the Dagster UI to view your SDF workspace as an asset DAG.
From here, we can view the metadata available at compile time from SDF on a given asset, such as the asset description, raw SQL, code location, and more. We can also see the Dagster assets that your SDF workspace depends on, and the assets that depend on your SDF workspace.
In Dagster, running an SDF model corresponds to materializing an asset. Each SDF model is represented as an asset in the Dagster UI. Depending on your configuration, running a model will result in it either executing locally using the SDF DB or executing remotely using the relevant configured integrations.
To see this in action, click on the Materialize all button on the asset lineage view. This will materialize all assets in the workspace, including your SDF assets.
If everything is set up correctly, you should see the assets in your SDF workspace materialize successfully. Inspecting a particular asset will reveal additional runtime metadata, such as the execution time, the materialized SQL, and the table schema along with any propagated classifiers.
Optional: Defining upstream dependencies
A common use case for Dagster with SDF is a complex data pipeline, parts of which are not authored in SDF. These assets can be upstream dependencies that your SDF workspace expects to have been materialized before it runs.
SDF interprets these dependencies as source tables. Take, for example, a Snowflake warehouse whose tables are materialized by different assets in your Dagster project.
If you have the following query in your SDF workspace:
Where upstream_dagster_table is not defined in SDF but is materialized in Dagster via:
Then you can define the upstream_dagster_table as a Dagster source in your SDF workspace by specifying a dagster-asset-key in the meta section of the table definition:
This will have no impact on SDF’s execution, but will give Dagster the context that the upstream_dagster_table is a Dagster asset that needs to be materialized before the SDF workspace can be run.
Note: In order to read the upstream_dagster_table materialized by Dagster, you will need to define an integration in your SDF workspace that can read from the source table. See Integration Docs for more information.
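As an illustration of how the dagster-asset-key annotation links a source table to an upstream asset, the sketch below collects that mapping from table definitions that have already been parsed into Python dictionaries. This is not dagster-sdf's implementation. The dictionary shape (a name plus a meta mapping), the helper name upstream_asset_keys, and the asset key upstream_dagster_asset are all assumptions made for the example.

```python
from typing import Any, Dict, Iterable, Mapping

META_KEY = "dagster-asset-key"


def upstream_asset_keys(tables: Iterable[Mapping[str, Any]]) -> Dict[str, str]:
    """Map each annotated table name to the Dagster asset key in its meta section."""
    keys: Dict[str, str] = {}
    for table in tables:
        asset_key = (table.get("meta") or {}).get(META_KEY)
        if asset_key is not None:
            keys[table["name"]] = asset_key
    return keys


# Hypothetical parsed table definitions; only the first one is annotated.
tables = [
    {"name": "upstream_dagster_table", "meta": {META_KEY: "upstream_dagster_asset"}},
    {"name": "local_model", "meta": {}},
]
print(upstream_asset_keys(tables))
# prints {'upstream_dagster_table': 'upstream_dagster_asset'}
```

Tables without the annotation are simply skipped, which matches the point above that the key adds context for Dagster without changing how SDF executes.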
Congratulations! You’ve successfully set up your SDF workspace with Dagster and materialized your assets in the Dagster UI.