SDF can work alongside an existing DBT project to power column-level lineage, checks, and data classification / governance for DBT models.
This guide requires dbt-core v1.7.0 and above. We'll use the jaffle_shop example project from DBT to demonstrate this, so you'll need dbt-core installed and the Jaffle Shop example project set up locally. You can clone it from here. Note that without a properly configured profiles.yml, the Jaffle Shop example will not be able to compile, and SDF will not work.
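If you don't already have one, a minimal profiles.yml for the Jaffle Shop project targeting Snowflake might look like the sketch below. Every value is a placeholder for your own environment, and any warehouse dbt supports will work equally well.

```yaml
# ~/.dbt/profiles.yml -- minimal sketch for Snowflake.
# All account/user/database/warehouse/role values are placeholders.
jaffle_shop:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: my_account
      user: my_user
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: transformer
      warehouse: transforming
      database: my_db
      schema: public
      threads: 4
```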
Compile DBT

Running dbt compile will compile the DBT project and generate the manifest file. This must be run before SDF can work.

Initialize the SDF DBT Workspace
Running sdf dbt init will create a workspace.sdf.yml based on your DBT project's configuration. This file will be placed adjacent to your dbt_project.yml file and should be committed to your repository. The workspace.sdf.yml includes block points to files within the target directory. This is because SDF deals with raw SQL directly and not DBT models. To accomplish this, SDF copies the necessary compiled DBT models into the sdf directory within target/compiled (or your configured target-path).

This command will also compile DBT snapshots into a format SDF can understand, and port over credentials stored in your profiles.yml file to ~/.sdf/credentials in your home directory. These credentials will be used by SDF later to fetch required table schemas from the cloud warehouse. As new changes to models or snapshots are made, running sdf dbt refresh will refresh the SDF workspace to point to the latest compiled models without requiring a full re-run of sdf dbt init.
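For reference, a freshly generated workspace.sdf.yml might look roughly like the sketch below. The exact keys, edition, and include paths depend on your SDF version and configured target-path, so treat these values as illustrative assumptions rather than exact output.

```yaml
# Illustrative sketch only -- the real file is produced by `sdf dbt init`.
workspace:
  name: jaffle_shop                               # taken from dbt_project.yml
  edition: "1.3"                                  # assumption: whichever edition your SDF version writes
  defaults:
    dialect: snowflake                            # assumption: derived from your profiles.yml target
  includes:
    - path: target/compiled/jaffle_shop/models    # compiled SQL referenced by SDF
```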
Configure the Integrations Block

Next, configure the integrations block in the workspace.sdf.yml file. This block will contain the necessary information to connect to your warehouse. In DBT terms, this block replaces your DBT sources. As such, it enables SDF to pull down the schema information for table dependencies not defined in SQL.
For example, if I had some DBT sources coming from the database my_db in Snowflake, I would use the following configuration to pull them down at compile-time:
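A minimal sketch of such a block is shown below, assuming Snowflake as the provider. The key names follow SDF's integrations syntax as I understand it and may vary across versions, and my_db.*.* is simply a catch-all pattern for every schema and table in my_db.

```yaml
# Sketch only -- add this under the top-level workspace block in workspace.sdf.yml.
workspace:
  name: jaffle_shop
  # ... keys generated by sdf dbt init ...
  integrations:
    - provider: snowflake        # warehouse provider
      type: database             # fetch schemas for objects matching the patterns below
      sources:
        - pattern: my_db.*.*     # every schema and table in my_db
```

With this in place, compiling will fetch the schemas for my_db tables from Snowflake using the credentials ported over by sdf dbt init.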
The integrations block will be generated by the sdf dbt init command in future versions. Stay tuned!

Compile SDF
With the workspace.sdf.yml file configured, we can compile SDF. This will validate your SQL and dependencies, and produce all SDF artifacts (including column-level lineage). These artifacts will be placed in the sdftarget directory local to your DBT project.

Classify a DBT Model
The Jaffle Shop project includes a table named raw_customers. This table contains two columns (first_name, last_name) with personally identifiable information (PII). Let's classify these columns as PII in SDF, and ensure any downstream usage of these columns also inherits this classification. First, let's define our PII classifier in the workspace.sdf.yml file.
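A minimal sketch of such a classifier follows, assuming SDF's classifier block syntax; the single label name is an illustrative choice.

```yaml
# Sketch of a PII classifier, added to workspace.sdf.yml as its own YAML document.
---
classifier:
  name: PII
  labels:
    - name: name     # for columns that contain a person's name
```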
Next, attach the classifier to the first_name and last_name columns of the raw_customers table in the workspace.sdf.yml file.
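A sketch of that attachment is below, assuming a table block keyed by the table's name with per-column classifiers; depending on your setup the table name may need to be fully qualified as database.schema.table.

```yaml
# Sketch: attach PII.name to both name columns of raw_customers.
---
table:
  name: raw_customers          # may need to be fully qualified for your warehouse
  columns:
    - name: first_name
      classifiers:
        - PII.name
    - name: last_name
      classifiers:
        - PII.name
```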
Now, when we compile again, the classification is applied not only to the raw_customers table, but also to the downstream customers table and others. This is because SDF is able to infer the lineage between these two tables and propagate the classification.

The following sdf dbt commands are available for us when developing with SDF and DBT locally.
sdf dbt init
This command creates a workspace.sdf.yml file adjacent to your dbt_project.yml file. It will also configure all DBT seeds and compile DBT snapshots into a format SDF can understand. Lastly, it will copy the compiled models into the sdf directory within target/compiled (or your configured target-path).
```
Initialize a sdf workspace from a dbt project -- best effort

Usage: sdf dbt init [OPTIONS]

Options:
      --target <TARGET>                Use this DBT target over the default target in profiles.yml
      --profiles-dir <PROFILES_DIR>    Use this DBT profile instead of the defaults at ~/.dbt/profile.yml -- (note dbt uses --profile_dir, this CLI uses --profile-dir)
      --workspace-dir <WORKSPACE_DIR>  Specifies the workspace directory where we expect to see manifest and dbt project files. The SDF workspace file will be placed in the same directory. Default: current directory
  -s, --save                           Save and overwrite the workspace file
  -c, --config <CONFIG>                Supply a config yml file or provide config as yml string e.g. '{key: value}'
      --log-level <LOG_LEVEL>          Set log level [possible values: trace, debug, debug-pretty, info, warn, error]
      --log-file <LOG_FILE>            Creates or replaces the log file
      --show-all-errors                Don't suppress errors
  -h, --help                           Print help
```
Note that dbt compile must be run before running sdf dbt init.
sdf dbt refresh
This command refreshes an existing workspace.sdf.yml file. It will recompile the DBT snapshots and move the compiled models into the sdf directory within target/compiled (or your configured target-path).
```
Re-initialize a sdf workspace from a dbt project -- best effort

Usage: sdf dbt refresh [OPTIONS]

Options:
      --target <TARGET>                Use this DBT target over the default target in profiles.yml
      --profiles-dir <PROFILES_DIR>    Use this DBT profile instead of the defaults at ~/.dbt/profile.yml -- (note dbt uses --profile_dir, this CLI uses --profile-dir)
      --workspace-dir <WORKSPACE_DIR>  Specifies the workspace directory where we expect to see manifest and dbt project files. The SDF workspace file will be placed in the same directory. Default: current directory
  -s, --save                           Save and overwrite the workspace file
  -c, --config <CONFIG>                Supply a config yml file or provide config as yml string e.g. '{key: value}'
      --log-level <LOG_LEVEL>          Set log level [possible values: trace, debug, debug-pretty, info, warn, error]
      --log-file <LOG_FILE>            Creates or replaces the log file
      --show-all-errors                Don't suppress errors
  -h, --help                           Print help
```
In some cases, sdf dbt refresh needs to update the workspace.sdf.yml to work with the latest DBT models. For example, maybe you've added your first snapshot, requiring SDF to add a new includes path to the workspace.sdf.yml file.
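As a sketch (both paths below are assumptions based on dbt's default target layout, not actual sdf dbt refresh output), the updated includes block might gain an entry like this:

```yaml
workspace:
  name: jaffle_shop
  includes:
    - path: target/compiled/jaffle_shop/models      # existing entry
    - path: target/compiled/jaffle_shop/snapshots   # new entry for the first snapshot
```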
By default, sdf dbt refresh will not modify your workspace.sdf.yml to reflect updates to your DBT project. However, you can use the --save flag to write these changes to the file.
Note that automated changes to the workspace.sdf.yml are best effort and may result in unintended updates or reformatting.