
# Integrating with DBT

> SDF can work alongside an existing DBT project to power column-level lineage, checks, and data classification / governance for DBT models.

## Overview

This guide will take you through the steps to integrate SDF with an existing DBT project. SDF currently works with DBT `v1.7.0` and above.

* [Example](#example)
* [Commands](#commands)

<a name="example" />

## Example

This first section will walk you through the steps to integrate SDF with an existing DBT project. We'll use the `jaffle_shop` example project from DBT to demonstrate this.

## Prerequisites

Ensure that you have the following installed and configured locally before beginning.

* [DBT](https://docs.getdbt.com/docs/core/installation-overview)
* A valid `profiles.yml` file configured with DBT. See [here](https://docs.getdbt.com/dbt-cli/configure-your-profile) for more information.

Next, you'll need the `dbt-core` Jaffle Shop example project set up locally. You can clone it from [here](https://github.com/dbt-labs/jaffle_shop?tab=readme-ov-file).

<Warning>
  Without a valid `profiles.yml`, the Jaffle Shop example will not be able to compile, and SDF will not work.
</Warning>

<Info>
  We use this example as it doesn't require authentication to a database or existing warehouse. In a production scenario, you'd likely compile DBT with your own models and remote warehouse, requiring authentication. To use SDF in this context, check our [integrations](/guide/setup/integrations) page to see if we support your warehouse. If not, DDLs can be manually added to SDF alongside the DBT project to enable SDF compilation.
</Info>

## Guide

<Steps>
  <Step title="Compile DBT">
    `dbt compile` will compile the DBT project and generate the manifest file. This must be run first before SDF can work.

    ```shell theme={null}
    dbt compile
    ```
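
    Before moving on, it can help to confirm that the compile step actually produced a manifest. The following is a minimal sketch rather than anything shipped with SDF or DBT: `sdf_preflight` is a hypothetical helper name, and it assumes DBT's default `target` output directory (adjust the path if you have configured a different `target-path`).

    ```shell
    # Hypothetical helper: check that a DBT project directory contains the
    # files a later `sdf dbt init` relies on. Assumes the default `target` path.
    sdf_preflight() {
      project_dir="${1:-.}"
      if [ ! -f "$project_dir/dbt_project.yml" ]; then
        echo "no dbt_project.yml in $project_dir" >&2
        return 1
      fi
      if [ ! -f "$project_dir/target/manifest.json" ]; then
        echo "no target/manifest.json; run 'dbt compile' first" >&2
        return 1
      fi
      echo "ok: manifest found"
    }
    ```

    Calling `sdf_preflight .` from the project root prints `ok: manifest found` when both files are present and returns a non-zero status otherwise.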
  </Step>

  <Step title="Initialize the SDF DBT Workspace">
    Using the next command, SDF will create a `workspace.sdf.yml` based on your DBT project's configuration. This file will be placed adjacent to your `dbt_project.yml` file and should be committed to your repository.

    ```shell theme={null}
    sdf dbt init
    ```

    Notice that your `workspace.sdf.yml` [includes block](/reference/sdf-yml#nested-element-includepath) points to files within the `target` directory. This is because SDF deals with raw SQL directly and not DBT models.
    To accomplish this, SDF copies the necessary compiled DBT models into the `sdf` directory within `target/compiled` (or your configured `target-path`).

    This command will also compile DBT snapshots into a format SDF can understand, and port over credentials stored in your `profiles.yml` file to the `~/.sdf/credentials` directory in your home directory. These credentials
    will be used by SDF later to fetch required table schemas from the cloud warehouse.

    As new changes to models or snapshots are made, running `sdf dbt refresh` will refresh the SDF workspace to point to the latest versions.
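
    During development, the recompile-and-refresh loop can be wrapped in a small shell function. This is only a sketch: `sdf_refresh_cycle` is a hypothetical name, and it simply chains commands that appear in this guide, stopping at the first one that fails.

    ```shell
    # Hypothetical helper: recompile DBT, refresh the SDF workspace, then
    # recompile SDF. Each step runs only if the previous one succeeded.
    sdf_refresh_cycle() {
      dbt compile && sdf dbt refresh && sdf compile
    }
    ```

    Run `sdf_refresh_cycle` from the directory that contains your `dbt_project.yml`.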

    <Warning>
      Our YML processor does not support [YML anchors or aliases](https://smcleod.net/2022/11/yaml-anchors-and-aliases/).
      If you have YML anchors or aliases in your DBT project, you may need to remove them and refactor before running `sdf dbt init`.
    </Warning>
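
    To get a rough idea of whether your project uses anchors or aliases, you could scan its YML files before running `sdf dbt init`. The sketch below is a heuristic, not an SDF feature: `find_yaml_anchors` is a hypothetical name, and the pattern only catches `&name` or `*name` tokens that directly follow a `:` or `-`, so it may miss unusual layouts or flag text inside strings.

    ```shell
    # Hypothetical helper: list lines in *.yml / *.yaml files that look like
    # YAML anchors (&name) or aliases (*name, including `<<: *name`).
    # /dev/null is passed so grep always prints the file name.
    find_yaml_anchors() {
      search_dir="${1:-.}"
      find "$search_dir" -type f \( -name '*.yml' -o -name '*.yaml' \) \
        -exec grep -nE '(:|-)[[:space:]]+[&*][A-Za-z0-9_-]+' /dev/null {} +
    }
    ```

    Running `find_yaml_anchors .` prints `file:line:content` for each suspicious line. No output suggests there is nothing to refactor, although the compiled `target` directory and installed packages are scanned as well unless you point the function at a narrower directory.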
  </Step>

  <Step title="Configure the Integrations Block">
    If using DBT with a cloud warehouse, you'll likely need to configure the `integrations` block in the `workspace.sdf.yml` file. This block will contain the necessary information to connect to your warehouse.

    In DBT terms, this block replaces your DBT `sources`. As such, it enables SDF to pull down the schema information for table dependencies not defined in SQL.
    For example, if I had some DBT sources coming from the database `my_db` in Snowflake, I would use the following configuration to pull them down at compile-time:

    ```yaml theme={null}
    integrations:
      - provider: snowflake
        type: database
        sources:
          - pattern: my_db.*.*
    ```

    For more information, or for guides on how to configure this for other warehouses, check out our [Integrations](/integrations/overview) section.

    <Tip>
      This `integrations` block will be generated by the `sdf dbt init` command in future versions. Stay tuned!
    </Tip>
  </Step>

  <Step title="Compile SDF">
    Now that we have our `workspace.sdf.yml` file configured, we can compile SDF. This validates your SQL and dependencies and produces all SDF artifacts (including column-level lineage). These artifacts will be placed in the `sdftarget` directory local to your DBT project.

    ```shell theme={null}
    sdf compile
    ```

    Great, now that we've successfully compiled our models, let's try adding some metadata.
  </Step>

  <Step title="Classify a DBT Model">
    In our jaffle shop example, we have a table created from a dbt seed called `raw_customers`. This table contains two columns (`first_name`, `last_name`) with personally identifiable information (PII). Let's classify these columns as `PII` in SDF, and ensure any downstream usage of these columns also inherits this classification.

    First, let's define our PII classifier in the `workspace.sdf.yml` file.

    ```yaml theme={null}
    ---
    classifier:
      name: pii
      description: Personally Identifiable Information
      labels:
        - name: name
          description: An individual's first, middle, or last name
    ```

    Next, let's attach this to the right columns of the `raw_customers` table in the `workspace.sdf.yml` file.

    ```yaml theme={null}
    ---
    table:
      name: raw_customers
      columns:
        - name: first_name
          classifiers:
            - pii.name
        - name: last_name
          classifiers:
            - pii.name
    ```

    Great, now let's compile SDF again and see what happens.

    ```shell theme={null}
    sdf compile --show all
    ```

    ```yaml theme={null}
    Working set 6 model files, 1 .sdf file
    Finished 8 models [8 reused] in 0.089 secs

    Schema jaffle_shop.dbt_alice.raw_customers
    +-------------+-----------+------------+
    | column_name | data_type | classifier |
    +-------------+-----------+------------+
    | id          | bigint    |            |
    | first_name  | varchar   | pii.name   |
    | last_name   | varchar   | pii.name   |
    +-------------+-----------+------------+

    ...
    ...

    Schema jaffle_shop.dbt_alice.stg_customers
    +-------------+-----------+------------+
    | column_name | data_type | classifier |
    +-------------+-----------+------------+
    | customer_id | bigint    |            |
    | first_name  | varchar   | pii.name   |
    | last_name   | varchar   | pii.name   |
    +-------------+-----------+------------+

    ...
    ...

    Schema jaffle_shop.dbt_alice.customers
    +-------------------------+-----------+------------+
    | column_name             | data_type | classifier |
    +-------------------------+-----------+------------+
    | customer_id             | bigint    |            |
    | first_name              | varchar   | pii.name   |
    | last_name               | varchar   | pii.name   |
    | first_order             | date      |            |
    | most_recent_order       | date      |            |
    | number_of_orders        | bigint    |            |
    | customer_lifetime_value | bigint    |            |
    +-------------------------+-----------+------------+
    ```

    You'll notice the classification is not only attached to the `raw_customers` table, but also to the downstream `customers` table and others. This is because SDF is able to infer the lineage between these two tables and propagate the classification.
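
    On larger projects it can be easier to list only the classified columns. The following sketch filters output shaped like the tables above. `classified_columns` is a hypothetical helper, and it assumes the three-column `column_name | data_type | classifier` layout shown here, which may differ between SDF versions.

    ```shell
    # Hypothetical helper: read `sdf compile --show all` style output on stdin
    # and print "<schema>.<column><TAB><classifier>" for classified columns.
    classified_columns() {
      awk -F'|' '
        # Remember the most recent "Schema <name>" heading.
        /^[ \t]*Schema / { schema = $0; sub(/^[ \t]*Schema /, "", schema) }
        # Table rows split into 5 fields; keep rows with a non-empty classifier.
        NF >= 5 && $4 ~ /[^ \t]/ && $2 !~ /column_name/ {
          col = $2; cls = $4
          gsub(/[ \t]/, "", col); gsub(/[ \t]/, "", cls)
          print schema "." col "\t" cls
        }'
    }
    ```

    For example, `sdf compile --show all | classified_columns` would print one line per classified column across `raw_customers`, `stg_customers`, and `customers`.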
  </Step>
</Steps>

<a name="commands" />

## Commands

Here we lay out the `sdf dbt` commands available when developing with SDF and DBT locally.

### `sdf dbt init`

This command will initialize the SDF workspace for your DBT project. It will create a `workspace.sdf.yml` file adjacent to your `dbt_project.yml` file. It will also configure all DBT seeds and compile DBT snapshots into a format SDF can understand. Lastly, it will copy the compiled models into the `sdf` directory within `target/compiled` (or your configured `target-path`).

```shell theme={null}
sdf dbt init
```

<div className="bg-[#0F1117] dark:bg-codeblock rounded-xl dark:ring-1 dark:ring-gray-800/50 relative">
  <pre style={{ fontFamily: 'monospace', backgroundColor: 'transparent' }} className="language-shell">
    <code className="language-shell">
      Initialize a sdf workspace from a dbt project -- best effort

      Usage: sdf dbt init \[OPTIONS]

      Options:
            --target \<TARGET>                Use this DBT target over the default target in profiles.yml
            --profiles-dir \<PROFILES\_DIR>    Use this DBT profile instead of the defaults at \~/.dbt/profile.yml -- (note dbt uses --profile\_dir, this CLI uses --profile-dir)
            --workspace-dir \<WORKSPACE\_DIR>  Specifies the workspace directory where we expect to see manifest and dbt project files The SDF workspace file will be placed in the same directory. Default: current directory
        -s, --save                           Save and overwrite the workspace file
        -c, --config \<CONFIG>                Supply a config yml file or provide config as yml string e.g. '\{key: value}'
            --log-level \<LOG\_LEVEL>          Set log level \[possible values: trace, debug, debug-pretty, info, warn, error]
            --log-file \<LOG\_FILE>            Creates or replaces the log file
            --show-all-errors                Don't suppress errors
        -h, --help                           Print help
    </code>
  </pre>
</div>

<Warning>
  `dbt compile` must be run before running `sdf dbt init`
</Warning>
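
As an illustration of combining the options listed above, the sketch below initializes a workspace for a specific DBT target and project directory. `sdf_init_for_target` is a hypothetical wrapper name, and the flags are taken from the help text as printed here, so check `sdf dbt init --help` on your installed version.

```shell
# Hypothetical wrapper: initialize the SDF workspace for a named DBT target.
# $1 = DBT target name, $2 = directory holding dbt_project.yml (default: .)
sdf_init_for_target() {
  target_name="$1"
  project_dir="${2:-.}"
  sdf dbt init --target "$target_name" --workspace-dir "$project_dir"
}
```

For example, `sdf_init_for_target prod ./jaffle_shop` would use a target named `prod`, which is a placeholder for whichever target your `profiles.yml` defines.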

### `sdf dbt refresh`

This command will refresh the SDF workspace for your DBT project. This is useful if you make changes to DBT models during development, then would like to ensure SDF works with your latest models without regenerating the `workspace.sdf.yml` file. It will recompile the DBT snapshots and move the compiled models into the `sdf` directory within `target/compiled` (or your configured `target-path`).

```shell theme={null}
sdf dbt refresh
```

<div className="bg-[#0F1117] dark:bg-codeblock rounded-xl dark:ring-1 dark:ring-gray-800/50 relative">
  <pre style={{ fontFamily: 'monospace', backgroundColor: 'transparent' }} className="language-shell">
    <code className="language-shell">
      Re-initialize a sdf workspace from a dbt project -- best effort

      Usage: sdf dbt refresh \[OPTIONS]

      Options:
            --target \<TARGET>                Use this DBT target over the default target in profiles.yml
            --profiles-dir \<PROFILES\_DIR>    Use this DBT profile instead of the defaults at \~/.dbt/profile.yml -- (note dbt uses --profile\_dir, this CLI uses --profile-dir)
            --workspace-dir \<WORKSPACE\_DIR>  Specifies the workspace directory where we expect to see manifest and dbt project files The SDF workspace file will be placed in the same directory. Default: current directory
        -s, --save                           Save and overwrite the workspace file
        -c, --config \<CONFIG>                Supply a config yml file or provide config as yml string e.g. '\{key: value}'
            --log-level \<LOG\_LEVEL>          Set log level \[possible values: trace, debug, debug-pretty, info, warn, error]
            --log-file \<LOG\_FILE>            Creates or replaces the log file
            --show-all-errors                Don't suppress errors
        -h, --help                           Print help
    </code>
  </pre>
</div>

Sometimes, updates are required to the `workspace.sdf.yml` to work with the latest DBT models. For example, maybe you've added your first snapshot, requiring SDF to add a new `includes` path to the `workspace.sdf.yml` file.
By default, `sdf dbt refresh` will not write these changes to your `workspace.sdf.yml`. To save them to the file, pass the `--save` flag.

```shell theme={null}
sdf dbt refresh --save
```

<Warning>
  Auto-updates to the `workspace.sdf.yml` are best effort and may result in unintended updates or reformatting.
</Warning>
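
Because saved updates are best effort, one cautious approach is to keep a copy of the file and review the difference afterwards. This is a minimal sketch under stated assumptions: `refresh_with_review` is a hypothetical name, and it expects to be run from the directory that contains `workspace.sdf.yml`.

```shell
# Hypothetical helper: back up workspace.sdf.yml, run the saving refresh,
# then show what changed as a unified diff.
refresh_with_review() {
  workspace_file="${1:-workspace.sdf.yml}"
  cp "$workspace_file" "$workspace_file.bak" || return 1
  sdf dbt refresh --save || return 1
  # diff exits 0 only when the two files are identical.
  if diff -u "$workspace_file.bak" "$workspace_file"; then
    echo "workspace file unchanged"
  fi
}
```

If the diff shows unintended reformatting, the `.bak` copy can be restored with `mv workspace.sdf.yml.bak workspace.sdf.yml`. Projects tracked in Git can get the same review from `git diff`.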

## Conclusion

SDF can work alongside an existing DBT project to power column-level lineage, SQL compilation and validation, impact analysis, data classification, and much more for DBT models. This integration is actively under development, with lots more coming soon.
