Migrating from DBT
SDF is best used as an entire transformation layer in itself, effectively replacing DBT and adding all the benefits of SDF.
Overview
Migrating from DBT to SDF is a straightforward process. The migration process involves:
- Converting your DBT models to SDF models - This is a simple process of copying your SQL files into the SDF workspace and removing any
ref
,source
orconfig
jinja macro references. - Mimicking DBT Configuration in SDF - Take any specific YML configurations from your DBT project and add them in
.sdf.yml
files. See more details in our YML reference.
SDF’s open source DBT migration tool will be available soon to automate this process.
This guide will walk through migrating DBT’s jaffle_shop
example to SDF. It will also migrate away from the DuckDB usage in DBT and use SDF’s built-in database, SDF DB, instead.
The result of this guide, i.e. a fully migrated DBT / DuckDB jaffle_shop is available as an SDF sample. Use sdf new --sample jaffle_shop
to create it locally, or
check out the public source code for the workspace here.
Prerequisites
The DBT DuckDB jaffle_shop
example project should be setup locally. You can clone it from here.
Guide
Analyze the DBT Project
Upon first inspection of the dbt_project.yml
, you’ll notice a few things:
- All models are materialized to the
jaffle_shop
database - By default, models are materialized as tables, but all models in the staging folde are materialized as views.
- The
jaffle_shop
project uses data locally via seeds. These are therefore required to build and run the project.
With this in mind, this guide will first migrate over the data seeds, then the models.
The end result will be a successul sdf compile
and sdf run
all locally.
Create a New SDF Workspace
Let’s start by creating a simple SDF workspace to provide us with the skaffolding for the migration. We provide a sample called hello_from_dbt
which skaffolds some of the basic configuration to match a DBT project.
To do so, run the following command:
This will create a new SDF workspace with the name jaffle_shop
in a subdirectory. The cd
places us into this directory so that we’re ready to start migrating.
This sample sets preprocessor
to jinja
and adds a seeds directory to match a DBT project structure. This is represented by the following yml block:
Migrate Over the Data
Next, we’ll migrate over the seeds to make sure SDF can interpret the schemas from the data.
To do so, copy the files in the seeds
directory from the DBT project to the SDF workspace.
The resulting workspace should look like this:
tmp/jaffle_shop/
├── models
│ └── main.sql
├── seeds
│ ├── raw_customers.csv
│ ├── raw_orders.csv
│ └── raw_payments.csv
└── workspace.sdf.yml
2 directories, 5 files
Next, let’s tell SDF to materialize these seeds as tables for use in the workspace.
Add the following file in the directory models/raw
and call it seeds.sdf.yml
.
Note that since we put this yml in the raw
directory, we intend to materialize these tables into the raw
schema. This is merely an organizational choice. Seed tables can be materialized to any database or schema, and will default to the default database and schema if only the table name is specified.
Location
specification for seeds is relative to the root of the SDF workspace.
The resulting SDF YML should like like this:
workspace:
name: jaffle_shop
edition: “1.3”
defaults:
preprocessor: jinja
materialization: table
includes:
- path: models
index: schema-table-name
- path: seeds
type: resource
Lastly, let’s run a compile to make sure SDF can successfully process the data.
Working set 3 .sdf files
Finished 3 models [3 succeeded] in 0.736 secs
Schema jaffle_shop.raw.raw_customers
┌─────────────┬───────────┬────────────┬─────────────┐
│ column_name ┆ data_type ┆ classifier ┆ description │
╞═════════════╪═══════════╪════════════╪═════════════╡
│ id ┆ bigint ┆ ┆ │
│ first_name ┆ varchar ┆ ┆ │
│ last_name ┆ varchar ┆ ┆ │
└─────────────┴───────────┴────────────┴─────────────┘
Schema jaffle_shop.raw.raw_orders
┌─────────────┬───────────┬────────────┬─────────────┐
│ column_name ┆ data_type ┆ classifier ┆ description │
╞═════════════╪═══════════╪════════════╪═════════════╡
│ id ┆ bigint ┆ ┆ │
│ user_id ┆ bigint ┆ ┆ │
│ order_date ┆ date ┆ ┆ │
│ status ┆ varchar ┆ ┆ │
└─────────────┴───────────┴────────────┴─────────────┘
Schema jaffle_shop.raw.raw_payments
┌────────────────┬───────────┬────────────┬─────────────┐
│ column_name ┆ data_type ┆ classifier ┆ description │
╞════════════════╪═══════════╪════════════╪═════════════╡
│ id ┆ bigint ┆ ┆ │
│ order_id ┆ bigint ┆ ┆ │
│ payment_method ┆ varchar ┆ ┆ │
│ amount ┆ bigint ┆ ┆ │
└────────────────┴───────────┴────────────┴─────────────┘
Great! Our workspace compiled, and SDF successfully interpreted the table schemas from the data.
Migrate the First Models
Now that we’ve migrated our seed data over, let’s migrate a model that selects from these tables.
Let’s start with the staging layer. Copy the stg_orders.sql
, stg_payments.sql
, and stg_customers.sql
files from the DBT project to the SDF workspace, and place them into a directory called staging
, we’ll touch on why later.
Your workspace should now look like this:
tmp/jaffle_shop/
├── models
│ ├── main.sql
│ ├── raw
│ │ └── seeds.sdf.yml
│ └── staging
│ ├── stg_customers.sql
│ ├── stg_orders.sql
│ └── stg_payments.sql
├── seeds
│ ├── raw_customers.csv
│ ├── raw_orders.csv
│ └── raw_payments.csv
└── workspace.sdf.yml
4 directories, 9 files
Currently, SDF is including all models in the model directory without an index. However, for organization purposes, it’s best practice to have SDF interpret the schema name from the directory structure. This is why we placed the model into the staging
directory. For more on this, see our Indexing Guide.
To enable this indexing, let’s slightly modify the workspace YML includes path to have it interpret the schema from the directory name and the table from the file name.
Lastly, since this model is using a DBT-native feature, ref()
, we’ll need to replace this with a simple table selection. This is the only modification required!
Since SDF understands SQL natively, we can simply select from the table directly by its name. SDF still supports systematic renaming, lineage, and dependency analysis without the requirement for DBT refs in Jinja.
Below are the before and after SQL files for the stg_orders.sql
, stg_payments.sql
, and stg_customers.sql
models. Scroll right within the filename section to see all files and changes.
Why select raw.raw_orders
? Since we set an index to schema-table-name
, the default schema for models in the staging
folder is staging
. Therefore, we need to tell SDF to look in the raw
schema for this table.
Let’s run a compile to make sure SDF can successfully process the model.
Working set 3 model files, 3 .sdf files
Compiling jaffle_shop.staging.stg_customers (./models/staging/stg_customers.sql)
Compiling jaffle_shop.staging.stg_payments (./models/staging/stg_payments.sql)
Compiling jaffle_shop.staging.stg_orders (./models/staging/stg_orders.sql)
Finished 6 models [6 succeeded] in 0.748 secs
Fantastic! We’ve just compiled two models selecting from our data.
If we want to take this one step further, we can run these models entirely locally with SDF DB.
Let’s try running our staging models with the following command:
Working set 3 model files, 2 .sdf files
Running jaffle_shop.staging.stg_orders (./models/staging/stg_orders.sql)
Running jaffle_shop.staging.stg_customers (./models/staging/stg_customers.sql)
Running jaffle_shop.staging.stg_payments (./models/staging/stg_payments.sql)
Finished 6 models [6 succeeded] in 0.771 secs
Table jaffle_shop.staging.stg_orders
┌──────────┬─────────────┬────────────┬───────────┐
│ order_id ┆ customer_id ┆ order_date ┆ status │
╞══════════╪═════════════╪════════════╪═══════════╡
│ 1 ┆ 1 ┆ 2018-01-01 ┆ returned │
│ 2 ┆ 3 ┆ 2018-01-02 ┆ completed │
│ 3 ┆ 94 ┆ 2018-01-04 ┆ completed │
│ 4 ┆ 50 ┆ 2018-01-05 ┆ completed │
│ 5 ┆ 64 ┆ 2018-01-05 ┆ completed │
│ 6 ┆ 54 ┆ 2018-01-07 ┆ completed │
│ 7 ┆ 88 ┆ 2018-01-09 ┆ completed │
│ 8 ┆ 2 ┆ 2018-01-11 ┆ returned │
│ 9 ┆ 53 ┆ 2018-01-12 ┆ completed │
│ 10 ┆ 7 ┆ 2018-01-14 ┆ completed │
└──────────┴─────────────┴────────────┴───────────┘
99 rows, showing only 10 rows.
Run with —limit 0 to show all rows.
Table jaffle_shop.staging.stg_customers
┌─────────────┬────────────┬───────────┐
│ customer_id ┆ first_name ┆ last_name │
╞═════════════╪════════════╪═══════════╡
│ 1 ┆ Michael ┆ P. │
│ 2 ┆ Shawn ┆ M. │
│ 3 ┆ Kathleen ┆ P. │
│ 4 ┆ Jimmy ┆ C. │
│ 5 ┆ Katherine ┆ R. │
│ 6 ┆ Sarah ┆ R. │
│ 7 ┆ Martin ┆ M. │
│ 8 ┆ Frank ┆ R. │
│ 9 ┆ Jennifer ┆ F. │
│ 10 ┆ Henry ┆ W. │
└─────────────┴────────────┴───────────┘
100 rows, showing only 10 rows.
Run with —limit 0 to show all rows.
Table jaffle_shop.staging.stg_payments
┌────────────┬──────────┬────────────────┬────────┐
│ payment_id ┆ order_id ┆ payment_method ┆ amount │
╞════════════╪══════════╪════════════════╪════════╡
│ 1 ┆ 1 ┆ credit_card ┆ 10 │
│ 2 ┆ 2 ┆ credit_card ┆ 20 │
│ 3 ┆ 3 ┆ coupon ┆ 1 │
│ 4 ┆ 4 ┆ coupon ┆ 25 │
│ 5 ┆ 5 ┆ bank_transfer ┆ 17 │
│ 6 ┆ 6 ┆ credit_card ┆ 6 │
│ 7 ┆ 7 ┆ credit_card ┆ 16 │
│ 8 ┆ 8 ┆ credit_card ┆ 23 │
│ 9 ┆ 9 ┆ gift_card ┆ 23 │
│ 10 ┆ 9 ┆ bank_transfer ┆ 0 │
└────────────┴──────────┴────────────────┴────────┘
113 rows, showing only 10 rows.
Run with —limit 0 to show all rows.
Great! We’ve successfully migrated our first models and run them locally with SDF DB.
Migrate Complex Jinja
Moving on, let’s try migrating a model that uses more complex Jinja. SDF supports Jinja natively via the open source MiniJinja processor.
As such, even with complex jinja the only thing we need to change is the ref()
macro.
Let’s migrate the orders
model to demonstrate this. Let’s imagine we want to materialize the layer that selects from staging
into the analytics
schema.
Since we are using the schema-table-name
index, we can move the orders.sql
file from the DBT project into a directory called orders
in the SDF workspace.
Just like before, we’ll need to modify the SQL to remove the ref()
macro. See below for the before and after.
Let’s run a compile to make sure SDF can successfully process the model.
Working set 4 model files, 3 .sdf files
Compiling jaffle_shop.staging.stg_customers (./models/staging/stg_customers.sql)
Compiling jaffle_shop.staging.stg_orders (./models/staging/stg_orders.sql)
Compiling jaffle_shop.staging.stg_payments (./models/staging/stg_payments.sql)
Compiling jaffle_shop.analytics.orders (./models/analytics/orders.sql)
Finished 7 models [7 succeeded] in 0.772 secs
And lastly, let’s run our new model with the following command:
Working set 4 model files, 2 .sdf files
Running jaffle_shop.staging.stg_payments (./models/staging/stg_payments.sql)
Running jaffle_shop.staging.stg_orders (./models/staging/stg_orders.sql)
Running jaffle_shop.analytics.orders (./models/analytics/orders.sql)
Finished 5 models [5 succeeded] in 0.808 secs
Table jaffle_shop.analytics.orders
┌──────────┬─────────────┬────────────┬───────────┬────────────────────┬───────────────┬──────────────────────┬──────────────────┬────────┐
│ order_id ┆ customer_id ┆ order_date ┆ status ┆ credit_card_amount ┆ coupon_amount ┆ bank_transfer_amount ┆ gift_card_amount ┆ amount │
╞══════════╪═════════════╪════════════╪═══════════╪════════════════════╪═══════════════╪══════════════════════╪══════════════════╪════════╡
│ 1 ┆ 1 ┆ 2018-01-01 ┆ returned ┆ 10 ┆ 0 ┆ 0 ┆ 0 ┆ 10 │
│ 2 ┆ 3 ┆ 2018-01-02 ┆ completed ┆ 20 ┆ 0 ┆ 0 ┆ 0 ┆ 20 │
│ 4 ┆ 50 ┆ 2018-01-05 ┆ completed ┆ 0 ┆ 25 ┆ 0 ┆ 0 ┆ 25 │
│ 6 ┆ 54 ┆ 2018-01-07 ┆ completed ┆ 6 ┆ 0 ┆ 0 ┆ 0 ┆ 6 │
│ 8 ┆ 2 ┆ 2018-01-11 ┆ returned ┆ 23 ┆ 0 ┆ 0 ┆ 0 ┆ 23 │
│ 9 ┆ 53 ┆ 2018-01-12 ┆ completed ┆ 0 ┆ 0 ┆ 0 ┆ 23 ┆ 23 │
│ 11 ┆ 99 ┆ 2018-01-14 ┆ completed ┆ 27 ┆ 0 ┆ 0 ┆ 0 ┆ 27 │
│ 13 ┆ 84 ┆ 2018-01-17 ┆ completed ┆ 5 ┆ 0 ┆ 14 ┆ 0 ┆ 19 │
│ 15 ┆ 25 ┆ 2018-01-17 ┆ completed ┆ 0 ┆ 22 ┆ 0 ┆ 0 ┆ 22 │
│ 17 ┆ 71 ┆ 2018-01-18 ┆ completed ┆ 0 ┆ 0 ┆ 2 ┆ 0 ┆ 2 │
└──────────┴─────────────┴────────────┴───────────┴────────────────────┴───────────────┴──────────────────────┴──────────────────┴────────┘
99 rows, showing only 10 rows.
Run with —limit 0 to show all rows.
Nice! We just migrated a more complex model to compile and run entirely with SDF.
Configure Materialization
Next, let’s ensure our new models match the materialization patterns from DBT. For more on configuring materialization, see our Materialization Guide.
Firstly, let’s set the default materialization for all models to table
. To do so, add the following to your workspace YML:
Next, let’s set the materialization for the staging models to view
.
To do so, we’ll add one more includes block that overwrites the default materialization for any models in the staging
directory:
Lastly, let’s run a compile to make sure SDF can successfully process the model.
Working set 4 model files, 4 .sdf files
Compiling jaffle_shop.staging.stg_customers (./models/staging/stg_customers.sql)
Compiling jaffle_shop.staging.stg_payments (./models/staging/stg_payments.sql)
Compiling jaffle_shop.staging.stg_orders (./models/staging/stg_orders.sql)
Compiling jaffle_shop.analytics.orders (./models/analytics/orders.sql)
Finished 7 models [7 succeeded] in 0.771 secs
Sweet! Our models are now materialized as views in the staging layer and as tables everywhere else.
Migrate Tests
As a final step, let’s migrate the tests from the DBT project to the SDF. For this example, we’ll migrate all the tests that run on the staging layer in DBT’s Jaffle Shop.
Let’s start by moving the schema.yml
file within the staging/
directory in the DBT project to the staging/
directory in the SDF workspace. Next, let’s rename it to tests.sdf.yml
.
The resulting workspace should look like this:
tmp/jaffle_shop/
├── models
│ ├── analytics
│ │ └── orders.sql
│ ├── main.sql
│ ├── raw
│ │ └── seeds.sdf.yml
│ └── staging
│ ├── stg_customers.sql
│ ├── stg_orders.sql
│ ├── stg_payments.sql
│ └── tests.sdf.yml
├── seeds
│ ├── raw_customers.csv
│ ├── raw_orders.csv
│ └── raw_payments.csv
└── workspace.sdf.yml
5 directories, 11 files
Next, let’s migrate the DBT test YML to SDF test YML. The steps look like this:
- Remove the
version: 2
at the top of the file. - Make each one of the listed models its own table block. Add
---
in between each block. - Migrate the tests to the SDF test syntax.
Unlike DBT tests, SDF tests are functions that accept parameters and a severity property. This increases the flexibility of the tests and allows for more complex testing scenarios. For more on tests, see the SDF Test Guide.
After going through these steps, the resulting tests.sdf.yml
should look like this:
To run the tests, simply run the following command:
Working set 4 model files, 3 test files, 7 .sdf files
Running jaffle_shop.staging.stg_orders (./models/staging/stg_orders.sql)
Running jaffle_shop.staging.stg_payments (./models/staging/stg_payments.sql)
Running jaffle_shop.staging.stg_customers (./models/staging/stg_customers.sql)
Testing jaffle_shop.staging.test_stg_payments (./sdftarget/dbg/tests/jaffle_shop/staging/test_stg_payments.sql)
Testing jaffle_shop.staging.test_stg_orders (./sdftarget/dbg/tests/jaffle_shop/staging/test_stg_orders.sql)
Testing jaffle_shop.staging.test_stg_customers (./sdftarget/dbg/tests/jaffle_shop/staging/test_stg_customers.sql)
Finished 6 models [6 succeeded], 3 tests [3 passed] in 0.813 secs
[Pass] Test jaffle_shop.staging.test_stg_payments
[Pass] Test jaffle_shop.staging.test_stg_orders
[Pass] Test jaffle_shop.staging.test_stg_customers
Woohoo! The tests ran locally again the data with the SDF DB, and they are passing.
(Optional) Migrate the Rest
If the steps above are followed for the rest of the models and tests, you’ll see that the entire DBT project can be migrated to SDF with ease.
The result is a workspace that looks like this:
tmp/jaffle_shop_complete/
├── models
│ ├── analytics
│ │ ├── customers.sql
│ │ ├── meta.sdf.yml
│ │ └── orders.sql
│ ├── raw
│ │ └── seeds.sdf.yml
│ └── staging
│ ├── stg_customers.sql
│ ├── stg_orders.sql
│ ├── stg_payments.sql
│ └── tests.sdf.yml
├── seeds
│ ├── raw_customers.csv
│ ├── raw_orders.csv
│ └── raw_payments.csv
└── workspace.sdf.yml
5 directories, 12 files
Then compiles:
Working set 5 model files, 10 .sdf files
Compiling jaffle_shop.staging.stg_customers (./models/staging/stg_customers.sql)
Compiling jaffle_shop.staging.stg_payments (./models/staging/stg_payments.sql)
Compiling jaffle_shop.staging.stg_orders (./models/staging/stg_orders.sql)
Compiling jaffle_shop.analytics.customers (./models/analytics/customers.sql)
Compiling jaffle_shop.analytics.orders (./models/analytics/orders.sql)
Finished 8 models [8 succeeded] in 0.788 secs
Runs:
Working set 5 model files, 10 .sdf files
Running jaffle_shop.staging.stg_payments (./models/staging/stg_payments.sql)
Running jaffle_shop.staging.stg_customers (./models/staging/stg_customers.sql)
Running jaffle_shop.staging.stg_orders (./models/staging/stg_orders.sql)
Running jaffle_shop.analytics.customers (./models/analytics/customers.sql)
Running jaffle_shop.analytics.orders (./models/analytics/orders.sql)
Finished 8 models [8 succeeded] in 0.822 secs
and tests:
Working set 5 model files, 5 test files, 10 .sdf files
Running jaffle_shop.staging.stg_payments (./models/staging/stg_payments.sql)
Running jaffle_shop.staging.stg_customers (./models/staging/stg_customers.sql)
Running jaffle_shop.staging.stg_orders (./models/staging/stg_orders.sql)
Testing jaffle_shop.staging.test_stg_customers (./sdftarget/dbg/tests/jaffle_shop/staging/test_stg_customers.sql)
Testing jaffle_shop.staging.test_stg_orders (./sdftarget/dbg/tests/jaffle_shop/staging/test_stg_orders.sql)
Testing jaffle_shop.staging.test_stg_payments (./sdftarget/dbg/tests/jaffle_shop/staging/test_stg_payments.sql)
Running jaffle_shop.analytics.customers (./models/analytics/customers.sql)
Running jaffle_shop.analytics.orders (./models/analytics/orders.sql)
Testing jaffle_shop.analytics.test_customers (./sdftarget/dbg/tests/jaffle_shop/analytics/test_customers.sql)
Testing jaffle_shop.analytics.test_orders (./sdftarget/dbg/tests/jaffle_shop/analytics/test_orders.sql)
Finished 8 models [8 succeeded], 5 tests [5 passed] in 0.922 secs
[Pass] Test jaffle_shop.staging.test_stg_customers
[Pass] Test jaffle_shop.staging.test_stg_orders
[Pass] Test jaffle_shop.staging.test_stg_payments
[Pass] Test jaffle_shop.analytics.test_customers
[Pass] Test jaffle_shop.analytics.test_orders
all locally with SDF.
Known Jinja Differences
Due to SDF using a slightly different Jinja processor, there are a few differences in how Jinja is processed. Below are some of the key differences:
-
Iterating over Dictionaries - Use
|items
instead of.items()
becomes
For more on this, see the section about iteration in our Jinja Docs.
-
Ternary
if
Statements with noelse
- Anelse
is required in ternary conditional statements.becomes
A real life example can be found in the
dbt_utils.group_by
implementationIn DBT’s implementation, the
else
is not required as seen here:SDF’s implementation requires an
else
afterloop.last
:This achieves the same result, but is more explicit in what happens in the negation case.
If you find any others, please let us know via our Slack Community or GitHub Issues.