Overview

Migrating from DBT to SDF is a straightforward process. The migration process involves:

  1. Converting your DBT models to SDF models - This is a simple process of copying your SQL files into the SDF workspace and removing any ref, source or config jinja macro references.
  2. Mimicking DBT Configuration in SDF - Take any specific YML configurations from your DBT project and add them in .sdf.yml files. See more details in our YML reference.

SDF’s open source DBT migration tool will be available soon to automate this process.

This guide will walk through migrating DBT’s jaffle_shop example to SDF. It will also migrate away from the DuckDB usage in DBT and use SDF’s built-in database, SDF DB, instead.

The result of this guide, i.e. a fully migrated DBT / DuckDB jaffle_shop is available as an SDF sample. Use sdf new --sample jaffle_shop to create it locally, or check out the public source code for the workspace here.

Prerequisites

The DBT DuckDB jaffle_shop example project should be setup locally. You can clone it from here.

Guide

1

Analyze the DBT Project

Upon first inspection of the dbt_project.yml, you’ll notice a few things:

  1. All models are materialized to the jaffle_shop database
  2. By default, models are materialized as tables, but all models in the staging folde are materialized as views.
  3. The jaffle_shop project uses data locally via seeds. These are therefore required to build and run the project.

With this in mind, this guide will first migrate over the data seeds, then the models.

The end result will be a successul sdf compile and sdf run all locally.

2

Create a New SDF Workspace

Let’s start by creating a simple SDF workspace to provide us with the skaffolding for the migration. We provide a sample called hello_from_dbt which skaffolds some of the basic configuration to match a DBT project.

To do so, run the following command:

sdf new jaffle_shop --sample hello_from_dbt && cd jaffle_shop/

This will create a new SDF workspace with the name jaffle_shop in a subdirectory. The cd places us into this directory so that we’re ready to start migrating.

This sample sets preprocessor to jinja and adds a seeds directory to match a DBT project structure. This is represented by the following yml block:

  defaults:
    preprocessor: jinja
3

Migrate Over the Data

Next, we’ll migrate over the seeds to make sure SDF can interpret the schemas from the data.

To do so, copy the files in the seeds directory from the DBT project to the SDF workspace.

The resulting workspace should look like this:

tmp/jaffle_shop/ ├── models │   └── main.sql ├── seeds │   ├── raw_customers.csv │   ├── raw_orders.csv │   └── raw_payments.csv └── workspace.sdf.yml

2 directories, 5 files

Next, let’s tell SDF to materialize these seeds as tables for use in the workspace.

Add the following file in the directory models/raw and call it seeds.sdf.yml.

models/raw/seeds.sdf.yml
table:
  name: raw_customers
  location: seeds/raw_customers.csv
  with-header: true
  file-format: csv
---
table:
  name: raw_orders
  location: seeds/raw_orders.csv
  with-header: true
  file-format: csv
---
table:
  name: raw_payments
  location: seeds/raw_payments.csv
  with-header: true
  file-format: csv

Note that since we put this yml in the raw directory, we intend to materialize these tables into the raw schema. This is merely an organizational choice. Seed tables can be materialized to any database or schema, and will default to the default database and schema if only the table name is specified.

Location specification for seeds is relative to the root of the SDF workspace.

The resulting SDF YML should like like this:

workspace:   name: jaffle_shop   edition: “1.3”

  defaults:     preprocessor: jinja     materialization: table

  includes:     - path: models        index: schema-table-name     - path: seeds       type: resource

Lastly, let’s run a compile to make sure SDF can successfully process the data.

sdf compile --show all

Working set 3 .sdf files    Finished 3 models [3 succeeded] in 0.736 secs

Schema jaffle_shop.raw.raw_customers ┌─────────────┬───────────┬────────────┬─────────────┐ │ column_name ┆ data_type ┆ classifier ┆ description │ ╞═════════════╪═══════════╪════════════╪═════════════╡ │ id          ┆ bigint    ┆            ┆             │ │ first_name  ┆ varchar   ┆            ┆             │ │ last_name   ┆ varchar   ┆            ┆             │ └─────────────┴───────────┴────────────┴─────────────┘

Schema jaffle_shop.raw.raw_orders ┌─────────────┬───────────┬────────────┬─────────────┐ │ column_name ┆ data_type ┆ classifier ┆ description │ ╞═════════════╪═══════════╪════════════╪═════════════╡ │ id          ┆ bigint    ┆            ┆             │ │ user_id     ┆ bigint    ┆            ┆             │ │ order_date  ┆ date      ┆            ┆             │ │ status      ┆ varchar   ┆            ┆             │ └─────────────┴───────────┴────────────┴─────────────┘

Schema jaffle_shop.raw.raw_payments ┌────────────────┬───────────┬────────────┬─────────────┐ │ column_name    ┆ data_type ┆ classifier ┆ description │ ╞════════════════╪═══════════╪════════════╪═════════════╡ │ id             ┆ bigint    ┆            ┆             │ │ order_id       ┆ bigint    ┆            ┆             │ │ payment_method ┆ varchar   ┆            ┆             │ │ amount         ┆ bigint    ┆            ┆             │ └────────────────┴───────────┴────────────┴─────────────┘

Great! Our workspace compiled, and SDF successfully interpreted the table schemas from the data.

4

Migrate the First Models

Now that we’ve migrated our seed data over, let’s migrate a model that selects from these tables.

Let’s start with the staging layer. Copy the stg_orders.sql, stg_payments.sql, and stg_customers.sql files from the DBT project to the SDF workspace, and place them into a directory called staging, we’ll touch on why later.

Your workspace should now look like this:

tmp/jaffle_shop/ ├── models │   ├── main.sql │   ├── raw │   │   └── seeds.sdf.yml │   └── staging │       ├── stg_customers.sql │       ├── stg_orders.sql │       └── stg_payments.sql ├── seeds │   ├── raw_customers.csv │   ├── raw_orders.csv │   └── raw_payments.csv └── workspace.sdf.yml

4 directories, 9 files

Currently, SDF is including all models in the model directory without an index. However, for organization purposes, it’s best practice to have SDF interpret the schema name from the directory structure. This is why we placed the model into the staging directory. For more on this, see our Indexing Guide.

To enable this indexing, let’s slightly modify the workspace YML includes path to have it interpret the schema from the directory name and the table from the file name.

workspace.sdf.yml
includes:
  - path: models 
    index: schema-table-name

Lastly, since this model is using a DBT-native feature, ref(), we’ll need to replace this with a simple table selection. This is the only modification required!

Since SDF understands SQL natively, we can simply select from the table directly by its name. SDF still supports systematic renaming, lineage, and dependency analysis without the requirement for DBT refs in Jinja.

Below are the before and after SQL files for the stg_orders.sql, stg_payments.sql, and stg_customers.sql models. Scroll right within the filename section to see all files and changes.

Why select raw.raw_orders? Since we set an index to schema-table-name, the default schema for models in the staging folder is staging. Therefore, we need to tell SDF to look in the raw schema for this table.

Let’s run a compile to make sure SDF can successfully process the model.

sdf compile

Working set 3 model files, 3 .sdf files   Compiling jaffle_shop.staging.stg_customers (./models/staging/stg_customers.sql)   Compiling jaffle_shop.staging.stg_payments (./models/staging/stg_payments.sql)   Compiling jaffle_shop.staging.stg_orders (./models/staging/stg_orders.sql)    Finished 6 models [6 succeeded] in 0.748 secs

Fantastic! We’ve just compiled two models selecting from our data.

If we want to take this one step further, we can run these models entirely locally with SDF DB.

Let’s try running our staging models with the following command:

sdf run "staging.*"

Working set 3 model files, 2 .sdf files     Running jaffle_shop.staging.stg_orders (./models/staging/stg_orders.sql)     Running jaffle_shop.staging.stg_customers (./models/staging/stg_customers.sql)     Running jaffle_shop.staging.stg_payments (./models/staging/stg_payments.sql)    Finished 6 models [6 succeeded] in 0.771 secs

Table jaffle_shop.staging.stg_orders ┌──────────┬─────────────┬────────────┬───────────┐ │ order_id ┆ customer_id ┆ order_date ┆ status    │ ╞══════════╪═════════════╪════════════╪═══════════╡ │ 1        ┆ 1           ┆ 2018-01-01 ┆ returned  │ │ 2        ┆ 3           ┆ 2018-01-02 ┆ completed │ │ 3        ┆ 94          ┆ 2018-01-04 ┆ completed │ │ 4        ┆ 50          ┆ 2018-01-05 ┆ completed │ │ 5        ┆ 64          ┆ 2018-01-05 ┆ completed │ │ 6        ┆ 54          ┆ 2018-01-07 ┆ completed │ │ 7        ┆ 88          ┆ 2018-01-09 ┆ completed │ │ 8        ┆ 2           ┆ 2018-01-11 ┆ returned  │ │ 9        ┆ 53          ┆ 2018-01-12 ┆ completed │ │ 10       ┆ 7           ┆ 2018-01-14 ┆ completed │ └──────────┴─────────────┴────────────┴───────────┘ 99 rows, showing only 10 rows.   Run with —limit 0 to show all rows.

Table jaffle_shop.staging.stg_customers ┌─────────────┬────────────┬───────────┐ │ customer_id ┆ first_name ┆ last_name │ ╞═════════════╪════════════╪═══════════╡ │ 1           ┆ Michael    ┆ P.        │ │ 2           ┆ Shawn      ┆ M.        │ │ 3           ┆ Kathleen   ┆ P.        │ │ 4           ┆ Jimmy      ┆ C.        │ │ 5           ┆ Katherine  ┆ R.        │ │ 6           ┆ Sarah      ┆ R.        │ │ 7           ┆ Martin     ┆ M.        │ │ 8           ┆ Frank      ┆ R.        │ │ 9           ┆ Jennifer   ┆ F.        │ │ 10          ┆ Henry      ┆ W.        │ └─────────────┴────────────┴───────────┘ 100 rows, showing only 10 rows.   Run with —limit 0 to show all rows.

Table jaffle_shop.staging.stg_payments ┌────────────┬──────────┬────────────────┬────────┐ │ payment_id ┆ order_id ┆ payment_method ┆ amount │ ╞════════════╪══════════╪════════════════╪════════╡ │ 1          ┆ 1        ┆ credit_card    ┆ 10     │ │ 2          ┆ 2        ┆ credit_card    ┆ 20     │ │ 3          ┆ 3        ┆ coupon         ┆ 1      │ │ 4          ┆ 4        ┆ coupon         ┆ 25     │ │ 5          ┆ 5        ┆ bank_transfer  ┆ 17     │ │ 6          ┆ 6        ┆ credit_card    ┆ 6      │ │ 7          ┆ 7        ┆ credit_card    ┆ 16     │ │ 8          ┆ 8        ┆ credit_card    ┆ 23     │ │ 9          ┆ 9        ┆ gift_card      ┆ 23     │ │ 10         ┆ 9        ┆ bank_transfer  ┆ 0      │ └────────────┴──────────┴────────────────┴────────┘ 113 rows, showing only 10 rows.   Run with —limit 0 to show all rows.

Great! We’ve successfully migrated our first models and run them locally with SDF DB.

5

Migrate Complex Jinja

Moving on, let’s try migrating a model that uses more complex Jinja. SDF supports Jinja natively via the open source MiniJinja processor. As such, even with complex jinja the only thing we need to change is the ref() macro.

Let’s migrate the orders model to demonstrate this. Let’s imagine we want to materialize the layer that selects from staging into the analytics schema.

Since we are using the schema-table-name index, we can move the orders.sql file from the DBT project into a directory called orders in the SDF workspace.

Just like before, we’ll need to modify the SQL to remove the ref() macro. See below for the before and after.

Let’s run a compile to make sure SDF can successfully process the model.

sdf compile

Working set 4 model files, 3 .sdf files   Compiling jaffle_shop.staging.stg_customers (./models/staging/stg_customers.sql)   Compiling jaffle_shop.staging.stg_orders (./models/staging/stg_orders.sql)   Compiling jaffle_shop.staging.stg_payments (./models/staging/stg_payments.sql)   Compiling jaffle_shop.analytics.orders (./models/analytics/orders.sql)    Finished 7 models [7 succeeded] in 0.772 secs

And lastly, let’s run our new model with the following command:

sdf run analytics.orders

Working set 4 model files, 2 .sdf files     Running jaffle_shop.staging.stg_payments (./models/staging/stg_payments.sql)     Running jaffle_shop.staging.stg_orders (./models/staging/stg_orders.sql)     Running jaffle_shop.analytics.orders (./models/analytics/orders.sql)    Finished 5 models [5 succeeded] in 0.808 secs

Table jaffle_shop.analytics.orders ┌──────────┬─────────────┬────────────┬───────────┬────────────────────┬───────────────┬──────────────────────┬──────────────────┬────────┐ │ order_id ┆ customer_id ┆ order_date ┆ status    ┆ credit_card_amount ┆ coupon_amount ┆ bank_transfer_amount ┆ gift_card_amount ┆ amount │ ╞══════════╪═════════════╪════════════╪═══════════╪════════════════════╪═══════════════╪══════════════════════╪══════════════════╪════════╡ │ 1        ┆ 1           ┆ 2018-01-01 ┆ returned  ┆ 10                 ┆ 0             ┆ 0                    ┆ 0                ┆ 10     │ │ 2        ┆ 3           ┆ 2018-01-02 ┆ completed ┆ 20                 ┆ 0             ┆ 0                    ┆ 0                ┆ 20     │ │ 4        ┆ 50          ┆ 2018-01-05 ┆ completed ┆ 0                  ┆ 25            ┆ 0                    ┆ 0                ┆ 25     │ │ 6        ┆ 54          ┆ 2018-01-07 ┆ completed ┆ 6                  ┆ 0             ┆ 0                    ┆ 0                ┆ 6      │ │ 8        ┆ 2           ┆ 2018-01-11 ┆ returned  ┆ 23                 ┆ 0             ┆ 0                    ┆ 0                ┆ 23     │ │ 9        ┆ 53          ┆ 2018-01-12 ┆ completed ┆ 0                  ┆ 0             ┆ 0                    ┆ 23               ┆ 23     │ │ 11       ┆ 99          ┆ 2018-01-14 ┆ completed ┆ 27                 ┆ 0             ┆ 0                    ┆ 0                ┆ 27     │ │ 13       ┆ 84          ┆ 2018-01-17 ┆ completed ┆ 5                  ┆ 0             ┆ 14                   ┆ 0                ┆ 19     │ │ 15       ┆ 25          ┆ 2018-01-17 ┆ completed ┆ 0                  ┆ 22            ┆ 0                    ┆ 0                ┆ 22     │ │ 17       ┆ 71          ┆ 2018-01-18 ┆ completed ┆ 0                  ┆ 0             ┆ 2                    ┆ 0                ┆ 2      │ └──────────┴─────────────┴────────────┴───────────┴────────────────────┴───────────────┴──────────────────────┴──────────────────┴────────┘ 99 rows, showing only 10 rows.   Run with —limit 0 to show all rows.

Nice! We just migrated a more complex model to compile and run entirely with SDF.

6

Configure Materialization

Next, let’s ensure our new models match the materialization patterns from DBT. For more on configuring materialization, see our Materialization Guide.

Firstly, let’s set the default materialization for all models to table. To do so, add the following to your workspace YML:

workspace.sdf.yml
defaults:
  ...
  materialization: table

Next, let’s set the materialization for the staging models to view. To do so, we’ll add one more includes block that overwrites the default materialization for any models in the staging directory:

workspace.sdf.yml
includes:
  ...
  - path: models/staging
    defaults:
      materialization: view

Lastly, let’s run a compile to make sure SDF can successfully process the model.

sdf compile

Working set 4 model files, 4 .sdf files   Compiling jaffle_shop.staging.stg_customers (./models/staging/stg_customers.sql)   Compiling jaffle_shop.staging.stg_payments (./models/staging/stg_payments.sql)   Compiling jaffle_shop.staging.stg_orders (./models/staging/stg_orders.sql)   Compiling jaffle_shop.analytics.orders (./models/analytics/orders.sql)    Finished 7 models [7 succeeded] in 0.771 secs

Sweet! Our models are now materialized as views in the staging layer and as tables everywhere else.

7

Migrate Tests

As a final step, let’s migrate the tests from the DBT project to the SDF. For this example, we’ll migrate all the tests that run on the staging layer in DBT’s Jaffle Shop.

Let’s start by moving the schema.yml file within the staging/ directory in the DBT project to the staging/ directory in the SDF workspace. Next, let’s rename it to tests.sdf.yml.

The resulting workspace should look like this:

tmp/jaffle_shop/ ├── models │   ├── analytics │   │   └── orders.sql │   ├── main.sql │   ├── raw │   │   └── seeds.sdf.yml │   └── staging │       ├── stg_customers.sql │       ├── stg_orders.sql │       ├── stg_payments.sql │       └── tests.sdf.yml ├── seeds │   ├── raw_customers.csv │   ├── raw_orders.csv │   └── raw_payments.csv └── workspace.sdf.yml

5 directories, 11 files

Next, let’s migrate the DBT test YML to SDF test YML. The steps look like this:

  1. Remove the version: 2 at the top of the file.
  2. Make each one of the listed models its own table block. Add --- in between each block.
  3. Migrate the tests to the SDF test syntax.

Unlike DBT tests, SDF tests are functions that accept parameters and a severity property. This increases the flexibility of the tests and allows for more complex testing scenarios. For more on tests, see the SDF Test Guide.

After going through these steps, the resulting tests.sdf.yml should look like this:

To run the tests, simply run the following command:

sdf test

Working set 4 model files, 3 test files, 7 .sdf files     Running jaffle_shop.staging.stg_orders (./models/staging/stg_orders.sql)     Running jaffle_shop.staging.stg_payments (./models/staging/stg_payments.sql)     Running jaffle_shop.staging.stg_customers (./models/staging/stg_customers.sql)     Testing jaffle_shop.staging.test_stg_payments (./sdftarget/dbg/tests/jaffle_shop/staging/test_stg_payments.sql)     Testing jaffle_shop.staging.test_stg_orders (./sdftarget/dbg/tests/jaffle_shop/staging/test_stg_orders.sql)     Testing jaffle_shop.staging.test_stg_customers (./sdftarget/dbg/tests/jaffle_shop/staging/test_stg_customers.sql)    Finished 6 models [6 succeeded], 3 tests [3 passed] in 0.813 secs [Pass] Test jaffle_shop.staging.test_stg_payments [Pass] Test jaffle_shop.staging.test_stg_orders [Pass] Test jaffle_shop.staging.test_stg_customers

Woohoo! The tests ran locally again the data with the SDF DB, and they are passing.

8

(Optional) Migrate the Rest

If the steps above are followed for the rest of the models and tests, you’ll see that the entire DBT project can be migrated to SDF with ease.

The result is a workspace that looks like this:

tmp/jaffle_shop_complete/ ├── models │   ├── analytics │   │   ├── customers.sql │   │   ├── meta.sdf.yml │   │   └── orders.sql │   ├── raw │   │   └── seeds.sdf.yml │   └── staging │       ├── stg_customers.sql │       ├── stg_orders.sql │       ├── stg_payments.sql │       └── tests.sdf.yml ├── seeds │   ├── raw_customers.csv │   ├── raw_orders.csv │   └── raw_payments.csv └── workspace.sdf.yml

5 directories, 12 files

Then compiles:

sdf compile

Working set 5 model files, 10 .sdf files   Compiling jaffle_shop.staging.stg_customers (./models/staging/stg_customers.sql)   Compiling jaffle_shop.staging.stg_payments (./models/staging/stg_payments.sql)   Compiling jaffle_shop.staging.stg_orders (./models/staging/stg_orders.sql)   Compiling jaffle_shop.analytics.customers (./models/analytics/customers.sql)   Compiling jaffle_shop.analytics.orders (./models/analytics/orders.sql)    Finished 8 models [8 succeeded] in 0.788 secs

Runs:

sdf run

Working set 5 model files, 10 .sdf files     Running jaffle_shop.staging.stg_payments (./models/staging/stg_payments.sql)     Running jaffle_shop.staging.stg_customers (./models/staging/stg_customers.sql)     Running jaffle_shop.staging.stg_orders (./models/staging/stg_orders.sql)     Running jaffle_shop.analytics.customers (./models/analytics/customers.sql)     Running jaffle_shop.analytics.orders (./models/analytics/orders.sql)    Finished 8 models [8 succeeded] in 0.822 secs

and tests:

sdf test

Working set 5 model files, 5 test files, 10 .sdf files     Running jaffle_shop.staging.stg_payments (./models/staging/stg_payments.sql)     Running jaffle_shop.staging.stg_customers (./models/staging/stg_customers.sql)     Running jaffle_shop.staging.stg_orders (./models/staging/stg_orders.sql)     Testing jaffle_shop.staging.test_stg_customers (./sdftarget/dbg/tests/jaffle_shop/staging/test_stg_customers.sql)     Testing jaffle_shop.staging.test_stg_orders (./sdftarget/dbg/tests/jaffle_shop/staging/test_stg_orders.sql)     Testing jaffle_shop.staging.test_stg_payments (./sdftarget/dbg/tests/jaffle_shop/staging/test_stg_payments.sql)     Running jaffle_shop.analytics.customers (./models/analytics/customers.sql)     Running jaffle_shop.analytics.orders (./models/analytics/orders.sql)     Testing jaffle_shop.analytics.test_customers (./sdftarget/dbg/tests/jaffle_shop/analytics/test_customers.sql)     Testing jaffle_shop.analytics.test_orders (./sdftarget/dbg/tests/jaffle_shop/analytics/test_orders.sql)    Finished 8 models [8 succeeded], 5 tests [5 passed] in 0.922 secs [Pass] Test jaffle_shop.staging.test_stg_customers [Pass] Test jaffle_shop.staging.test_stg_orders [Pass] Test jaffle_shop.staging.test_stg_payments [Pass] Test jaffle_shop.analytics.test_customers [Pass] Test jaffle_shop.analytics.test_orders

all locally with SDF.

Known Jinja Differences

Due to SDF using a slightly different Jinja processor, there are a few differences in how Jinja is processed. Below are some of the key differences:

  1. Iterating over Dictionaries - Use |items instead of .items()

    {% for key, value in my_dict.items() %}
    

    becomes

    {% for key, value in my_dict|items() %}
    

    For more on this, see the section about iteration in our Jinja Docs.

  2. Ternary if Statements with no else - An else is required in ternary conditional statements.

    {{ 'yes' if boolean_var }}
    

    becomes

    {{ 'yes' if boolean_var else 'no' }}
    

    A real life example can be found in the dbt_utils.group_by implementation

    In DBT’s implementation, the else is not required as seen here:

    group by {% for i in range(1, n + 1) -%}
      {{ i }}{{ ',' if not loop.last }}   
    {%- endfor -%}
    

    SDF’s implementation requires an else after loop.last:

    group by {% for i in range(1, n + 1) -%}
      {{ i }}{{ ',' if not loop.last else "" }}   
    {%- endfor -%}
    

    This achieves the same result, but is more explicit in what happens in the negation case.

If you find any others, please let us know via our Slack Community or GitHub Issues.