Getting Started with Dagster and SDF
SDF as a best-in-class transformation and authoring layer for Dagster Orchestration
Overview
SDF can integrate seamlessly with your existing Dagster projects, providing a best-in-class transformation layer while enabling you to schedule, orchestrate, and monitor your DAGs in Dagster.
When it comes time to materialize your Dagster assets, you can be confident that SDF has successfully compiled your workspace, making it safe to execute locally or against your cloud data warehouse.
The dagster-sdf package is currently in alpha. We recommend trying it out; if you experience any issues or find any bugs, please open an issue against Dagster’s repository.
Setting up your Dagster Project
The following guide assumes you already have an active SDF workspace. If not, you can create one by following this SDF Tutorial Series.
Set up your environment
Getting started with Dagster and SDF is as easy as installing both with pip:
Note: We strongly recommend installing dagster and the sdf-cli package inside a Python virtualenv.
The dagster-sdf library installs both sdf-cli and dagster as Python dependencies. If you’re starting from scratch, this will make the sdf CLI available to you, which you can use to interact with your SDF workspace.
To validate that you’ve installed the packages correctly, run the following commands and confirm that their output is similar to:
sdf 0.10.8
Note: Version numbers and formatting may vary, but the output should be similar to the above.
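If you would rather run this check from Python, for example in a CI step, the sketch below uses only the standard library. It assumes the pip distribution names are dagster and dagster-sdf and that the sdf executable should be on your PATH. The helper names installed_version and parse_sdf_version are hypothetical.

```python
import importlib.metadata
import re
import shutil
from typing import Optional, Tuple


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is missing."""
    try:
        return importlib.metadata.version(distribution)
    except importlib.metadata.PackageNotFoundError:
        return None


def parse_sdf_version(output: str) -> Optional[Tuple[int, int, int]]:
    """Parse output such as 'sdf 0.10.8' into (0, 10, 8); return None if no version is found."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


if __name__ == "__main__":
    for dist in ("dagster", "dagster-sdf"):
        print(f"{dist}: {installed_version(dist) or 'not installed'}")
    # shutil.which reports where the sdf executable would be found on PATH, if anywhere.
    print(f"sdf executable: {shutil.which('sdf') or 'not found on PATH'}")
```

Returning an integer tuple lets you compare versions numerically, so (0, 10, 8) correctly sorts after (0, 9, 0). Plain string comparison would get that ordering wrong.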
Initialize your SDF Workspace with Dagster
Choose one of the following approaches to create a Dagster project alongside your SDF workspace. If you don’t have an SDF workspace on hand, create one from our samples using the following command:
Create a new project using the dagster-sdf cli
This approach uses the dagster-sdf CLI to create a new Dagster project that references an existing SDF workspace. The dagster-sdf CLI pre-generates a scaffolded Dagster project with the necessary configuration to interface with your SDF workspace.
Initializing your project is as easy as running the following command:
with arguments:
- --project-name [required]: The name of the Dagster project to be created. This will be the name of the directory containing the project files.
- --sdf-workspace-dir [optional]: The path to an existing SDF workspace, which should contain a valid workspace.sdf.yml file. This argument can be omitted if you run the command from the root of an SDF workspace.
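The rule for --sdf-workspace-dir can be sketched in a few lines of Python: use the given path, fall back to the current directory, and require a workspace.sdf.yml file there. This is a hypothetical helper (resolve_workspace_dir) that mirrors the behavior described above. It is not the dagster-sdf CLI's actual implementation.

```python
from pathlib import Path
from typing import Optional, Union

WORKSPACE_FILE = "workspace.sdf.yml"


def resolve_workspace_dir(sdf_workspace_dir: Optional[Union[str, Path]] = None) -> Path:
    """Return the absolute SDF workspace directory.

    Falls back to the current working directory when no path is given, and raises
    FileNotFoundError if the directory does not contain a workspace.sdf.yml file.
    """
    candidate = Path(sdf_workspace_dir) if sdf_workspace_dir is not None else Path.cwd()
    workspace_dir = candidate.resolve()
    if not (workspace_dir / WORKSPACE_FILE).is_file():
        raise FileNotFoundError(f"{workspace_dir} does not contain {WORKSPACE_FILE}")
    return workspace_dir
```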
Running the command creates a new directory named after the --project-name value. This directory contains a set of files that define a Dagster project (e.g. assets.py, definitions.py, etc.), along with configuration on how to interface with your SDF workspace.
The output should look similar to:
Author a Dagster project in a single file
Compiling and materializing your SDF workspace with Dagster is as easy as authoring a single definition file.
Simply create a Python file in the same directory as your SDF workspace.sdf.yml file with the following content:
Since this file contains all Dagster definitions required for Dagster to execute your workspace, it is common practice to name this file definitions.py.
Ensure that the RELATIVE_PATH_TO_MY_SDF_WORKSPACE variable points to the directory containing your SDF workspace.
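If you launch Dagster from a different working directory, it is safest to resolve a relative path against the location of definitions.py itself rather than the current directory. Below is a minimal sketch. The value of RELATIVE_PATH_TO_MY_SDF_WORKSPACE and the helper name workspace_dir_from are placeholders.

```python
from pathlib import Path

# Placeholder value: the SDF workspace (containing workspace.sdf.yml) sits next to this file.
RELATIVE_PATH_TO_MY_SDF_WORKSPACE = "."


def workspace_dir_from(module_file: str, relative_path: str = RELATIVE_PATH_TO_MY_SDF_WORKSPACE) -> Path:
    """Resolve relative_path against the directory of module_file, not the current directory."""
    workspace_dir = (Path(module_file).resolve().parent / relative_path).resolve()
    if not (workspace_dir / "workspace.sdf.yml").is_file():
        raise FileNotFoundError(f"No workspace.sdf.yml found in {workspace_dir}")
    return workspace_dir


# Inside definitions.py you would call: workspace_dir_from(__file__)
```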
Using an existing Dagster project
If you already have an existing Dagster project, you can add an @sdf_assets asset definition to compile and run your SDF workspace. To do this, you’ll need to:
- Define an SdfWorkspace alongside a new @sdf_assets asset definition.
- Register the new asset definition, along with an SdfCliResource resource that will be accessed by the asset definition to materialize the workspace.
Note: This example assumes that your existing Dagster project includes both assets.py and definitions.py files, among other required files like setup.py and pyproject.toml. For example, your project might look like this:
- Change directories to the Dagster project directory:
- Create a Python file named workspace.py and add the following code:
The SdfWorkspace is a representation of your SDF workspace in Dagster. Its core responsibility is to define the key components of a workspace. These include the workspace and target output directories, as well as which environment to prepare (i.e. compile) the workspace with.
Setting the DAGSTER_SDF_COMPILE_ON_LOAD environment variable to true when starting the Dagster development web server ensures that the SdfWorkspace compiles your workspace. It then generates the asset DAG along with all metadata available at compile time.
- In your project’s assets.py file, add the following code:
The @sdf_assets decorator builds a multi-asset representation of your SDF workspace from the SDF information schema. Dagster uses this to display your SDF workspace as an asset DAG in the Dagster UI. It also defines the set of assets for which Dagster expects the SDF CLI to emit materialization events during a run.
- In your project’s definitions.py file, update the Definitions object to include the newly created asset and workspace, as well as to define the SdfCliResource resource:
With these changes, you’ve added your SDF workspace to your existing project and are ready to materialize your models with Dagster.
Understanding your Dagster project
Great! Now we have a Dagster project configured alongside our SDF workspace.
You may have noticed some new objects in your Python code when generating or authoring your Dagster project. These are all made possible by and exposed through the dagster-sdf package installed in Step 1. We’ll briefly break down the key components here:
- SdfWorkspace: This is a representation of your SDF workspace in Dagster. Its core responsibility is to define the key components of a workspace, for example, the workspace and target output directories as well as which environment to prepare (i.e. compile) the workspace with.
- SdfCliResource: This is a Dagster resource that provides access to the SDF CLI. It’s used by the @sdf_assets asset definition to materialize your SDF workspace by running a command with the sdf-cli. It can be used to run arbitrary SDF commands, but is tested to work with the compile, run, and test commands.
- @sdf_assets: This is a decorator that builds a multi-asset representation of your SDF workspace from the SDF information schema. Dagster uses it to display your SDF workspace as an asset DAG in the Dagster UI.
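To see what running a command with the sdf-cli amounts to outside Dagster, here is a plain subprocess sketch. It is not the SdfCliResource API, and build_sdf_args and run_sdf are hypothetical helpers. The sketch assumes sdf is on your PATH and is invoked from the workspace root. It deliberately limits itself to the compile, run, and test commands named above.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence

# The commands the text above says SdfCliResource is tested with.
TESTED_COMMANDS = {"compile", "run", "test"}


def build_sdf_args(command: str, extra_args: Sequence[str] = ()) -> List[str]:
    """Build an `sdf` argument list, rejecting commands outside the tested set."""
    if command not in TESTED_COMMANDS:
        raise ValueError(f"{command!r} is not one of {sorted(TESTED_COMMANDS)}")
    return ["sdf", command, *extra_args]


def run_sdf(command: str, workspace_dir: Path) -> subprocess.CompletedProcess:
    """Run an sdf command from the workspace root and raise if it exits non-zero."""
    return subprocess.run(build_sdf_args(command), cwd=workspace_dir, check=True)
```

Passing check=True makes a failing compile or run raise subprocess.CalledProcessError. As a result, a broken workspace stops the pipeline instead of passing silently.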
Start the Dagster Web Server
With your Dagster project set up, you can now materialize your SDF workspace using the Dagster UI.
Depending on the approach you took in Step 2, you can start the Dagster development server from either a Dagster project or a single Dagster file:
From a Dagster project
- Change directories to the Dagster project directory:
- To start Dagster’s UI, run the following:
Your output should look similar to:
Click on the link to open the Dagster web server in your browser.
From a Dagster file
- Locate the Dagster file containing your definitions. If you created a single Dagster file in Step 2, this file will be definitions.py.
- To start Dagster’s UI, run the following:
Your output should look similar to:
Click on the link to open the Dagster web server in your browser.
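The two ways of starting the server differ only in the command line, and the DAGSTER_SDF_COMPILE_ON_LOAD setting described earlier applies to either. The minimal sketch below builds, but does not run, the command and environment. `dagster dev` and its -f option are standard Dagster CLI usage, while the helper name dev_server_invocation is hypothetical.

```python
import os
from pathlib import Path
from typing import Dict, List, Mapping, Optional, Tuple


def dev_server_invocation(
    definitions_file: Optional[str] = None,
    compile_on_load: bool = True,
    base_env: Optional[Mapping[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Return the `dagster dev` command and environment for a project or a single file.

    With no file, run the command from the Dagster project directory.
    With a file such as definitions.py, `-f` points Dagster at that file directly.
    """
    command = ["dagster", "dev"]
    if definitions_file is not None:
        if Path(definitions_file).suffix != ".py":
            raise ValueError(f"Expected a Python file, got {definitions_file!r}")
        command += ["-f", definitions_file]
    # Copy the environment so the caller's mapping (or os.environ) is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    if compile_on_load:
        env["DAGSTER_SDF_COMPILE_ON_LOAD"] = "true"
    return command, env
```

To launch the server, pass both values to subprocess from the appropriate directory, for example `subprocess.run(command, env=env)`.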
Materialize your project in the Dagster UI
With the Dagster web server running, you can now navigate to the Dagster UI to view your SDF workspace as an asset DAG.
From here, you can view the metadata that SDF makes available at compile time for a given asset, such as the asset description, raw SQL, code location, and more. You can also see the Dagster assets that your SDF workspace depends on, and the assets that depend on your SDF workspace.
In Dagster, running an SDF model corresponds to materializing an asset. Each SDF model is represented as an asset in the Dagster UI. Depending on your configuration, running a model either executes it locally using the SDF DB or executes it remotely using the relevant configured integrations.
To see this in action, click the Materialize all button in the asset lineage view. This will materialize all assets in the workspace, including your SDF assets.
If everything is set up correctly, you should see the assets in your SDF workspace materialize successfully. Inspecting a particular asset will reveal additional runtime metadata, such as the execution time, the materialized SQL, and the table schema, along with any propagated classifiers.
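Because @sdf_assets declares up front which assets Dagster should expect the SDF CLI to report on, you can think of a partial run as a simple difference: the declared asset keys minus the keys that actually received materialization events. The sketch below shows only that set arithmetic. The tuple-of-strings key shape and the example keys are hypothetical, and the code does not reproduce how Dagster or dagster-sdf tracks materializations.

```python
from typing import Iterable, List, Tuple

# Hypothetical key shape, e.g. ("my_database", "pub", "orders").
AssetKey = Tuple[str, ...]


def missing_materializations(declared: Iterable[AssetKey], materialized: Iterable[AssetKey]) -> List[AssetKey]:
    """Return declared asset keys that have no materialization event, sorted for stable output."""
    seen = set(materialized)
    return sorted(set(declared) - seen)
```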
Optional: Defining upstream dependencies
A common use case for using Dagster with SDF is when you have a complex data pipeline, parts of which are not authored in SDF. These assets can be upstream dependencies that your SDF workspace expects to have materialized prior to run.
SDF interprets these dependencies as source tables. Take, for example, a Snowflake warehouse with tables that are materialized by different assets in your Dagster project.
If you have the following query in your SDF workspace:
Where upstream_dagster_table is not defined in SDF but is materialized in Dagster via:
Then you can define upstream_dagster_table as a Dagster source in your SDF workspace by specifying a dagster-asset-key in the meta section of the table definition:
This will have no impact on SDF’s execution, but will give Dagster the context that upstream_dagster_table is a Dagster asset that needs to be materialized before the SDF workspace can be run.
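Conceptually, the dagster-asset-key entry adds an edge from the Dagster asset to the SDF models that read from it. Any valid execution order therefore places the upstream asset first. The sketch below shows that ordering with Python's standard graphlib module (Python 3.9+) on a small hypothetical graph. The model names are placeholders, and the example illustrates the dependency semantics rather than how Dagster schedules runs internally.

```python
from graphlib import TopologicalSorter

# Hypothetical graph: each node maps to the set of nodes it depends on.
# upstream_dagster_table is materialized by a Dagster asset; the others stand in for SDF models.
dependencies = {
    "upstream_dagster_table": set(),
    "sdf_model_a": {"upstream_dagster_table"},
    "sdf_model_b": {"sdf_model_a"},
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['upstream_dagster_table', 'sdf_model_a', 'sdf_model_b']
```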
Note: In order to read the upstream_dagster_table materialized by Dagster, you will need to define an integration in your SDF workspace that can read from the source table. See Integration Docs for more information.
Congratulations! You’ve successfully set up your SDF workspace with Dagster and materialized your assets in the Dagster UI.