SDF is best used as an entire transformation layer in itself, effectively replacing DBT and adding all the benefits of SDF.
ref
, source
or config
jinja macro references..sdf.yml
files. See more details in our YML reference.jaffle_shop
example to SDF. It will also migrate away from the DuckDB usage in DBT and use SDF’s built-in database, SDF DB, instead.
sdf new --sample jaffle_shop
to create it locally, or
check out the public source code for the workspace here.jaffle_shop
example project should be setup locally. You can clone it from here.
Analyze the DBT Project
dbt_project.yml
, you’ll notice a few things:jaffle_shop
databasejaffle_shop
project uses data locally via seeds. These are therefore required to build and run the project.sdf compile
and sdf run
all locally.Create a New SDF Workspace
hello_from_dbt
which skaffolds some of the basic configuration to match a DBT project.To do so, run the following command:jaffle_shop
in a subdirectory. The cd
places us into this directory so that we’re ready to start migrating.preprocessor
to jinja
and adds a seeds directory to match a DBT project structure. This is represented by the following yml block:Migrate Over the Data
seeds
directory from the DBT project to the SDF workspace.The resulting workspace should look like this:tmp/jaffle_shop/
├── models
│ └── main.sql
├── seeds
│ ├── raw_customers.csv
│ ├── raw_orders.csv
│ └── raw_payments.csv
└── workspace.sdf.yml3 directories, 5 files
models/raw
and call it seeds.sdf.yml
.raw
directory, we intend to materialize these tables into the raw
schema. This is merely an organizational choice. Seed tables can be materialized to any database or schema, and will default to the default database and schema if only the table name is specified.Location
specification for seeds is relative to the root of the SDF workspace.workspace:
name: jaffle_shop
edition: “1.3” defaults:
preprocessor: jinja
materialization: table includes:
- path: models
index: schema-table-name
- path: seeds
type: resource
Working set 3 .sdf files
Finished 3 models [3 succeeded] in 0.912 secsSchema jaffle_shop.raw.raw_customers
┌─────────────┬───────────┬────────────┬─────────────┐
│ column_name ┆ data_type ┆ classifier ┆ description │
╞═════════════╪═══════════╪════════════╪═════════════╡
│ id ┆ bigint ┆ ┆ │
│ first_name ┆ varchar ┆ ┆ │
│ last_name ┆ varchar ┆ ┆ │
└─────────────┴───────────┴────────────┴─────────────┘Schema jaffle_shop.raw.raw_orders
┌─────────────┬───────────┬────────────┬─────────────┐
│ column_name ┆ data_type ┆ classifier ┆ description │
╞═════════════╪═══════════╪════════════╪═════════════╡
│ id ┆ bigint ┆ ┆ │
│ user_id ┆ bigint ┆ ┆ │
│ order_date ┆ date ┆ ┆ │
│ status ┆ varchar ┆ ┆ │
└─────────────┴───────────┴────────────┴─────────────┘Schema jaffle_shop.raw.raw_payments
┌────────────────┬───────────┬────────────┬─────────────┐
│ column_name ┆ data_type ┆ classifier ┆ description │
╞════════════════╪═══════════╪════════════╪═════════════╡
│ id ┆ bigint ┆ ┆ │
│ order_id ┆ bigint ┆ ┆ │
│ payment_method ┆ varchar ┆ ┆ │
│ amount ┆ bigint ┆ ┆ │
└────────────────┴───────────┴────────────┴─────────────┘
Migrate the First Models
stg_orders.sql
, stg_payments.sql
, and stg_customers.sql
files from the DBT project to the SDF workspace, and place them into a directory called staging
, we’ll touch on why later.Your workspace should now look like this:tmp/jaffle_shop/
├── models
│ ├── main.sql
│ ├── raw
│ │ └── seeds.sdf.yml
│ └── staging
│ ├── stg_customers.sql
│ ├── stg_orders.sql
│ └── stg_payments.sql
├── seeds
│ ├── raw_customers.csv
│ ├── raw_orders.csv
│ └── raw_payments.csv
└── workspace.sdf.yml5 directories, 9 files
staging
directory. For more on this, see our Indexing Guide.To enable this indexing, let’s slightly modify the workspace YML includes path to have it interpret the schema from the directory name and the table from the file name.ref()
, we’ll need to replace this with a simple table selection. This is the only modification required!stg_orders.sql
, stg_payments.sql
, and stg_customers.sql
models. Scroll right within the filename section to see all files and changes.raw.raw_orders
? Since we set an index to schema-table-name
, the default schema for models in the staging
folder is staging
. Therefore, we need to tell SDF to look in the raw
schema for this table.Working set 3 model files, 3 .sdf files
Compiling jaffle_shop.staging.stg_orders (./models/staging/stg_orders.sql)
Compiling jaffle_shop.staging.stg_payments (./models/staging/stg_payments.sql)
Compiling jaffle_shop.staging.stg_customers (./models/staging/stg_customers.sql)
Finished 6 models [6 succeeded] in 0.958 secs
Working set 3 model files, 2 .sdf files
Running jaffle_shop.staging.stg_customers (./models/staging/stg_customers.sql)
Running jaffle_shop.staging.stg_orders (./models/staging/stg_orders.sql)
Running jaffle_shop.staging.stg_payments (./models/staging/stg_payments.sql)
Finished 6 models [6 succeeded] in 1.008 secsTable jaffle_shop.staging.stg_orders
┌──────────┬─────────────┬────────────┬───────────┐
│ order_id ┆ customer_id ┆ order_date ┆ status │
╞══════════╪═════════════╪════════════╪═══════════╡
│ 1 ┆ 1 ┆ 2018-01-01 ┆ returned │
│ 2 ┆ 3 ┆ 2018-01-02 ┆ completed │
│ 3 ┆ 94 ┆ 2018-01-04 ┆ completed │
│ 4 ┆ 50 ┆ 2018-01-05 ┆ completed │
│ 5 ┆ 64 ┆ 2018-01-05 ┆ completed │
│ 6 ┆ 54 ┆ 2018-01-07 ┆ completed │
│ 7 ┆ 88 ┆ 2018-01-09 ┆ completed │
│ 8 ┆ 2 ┆ 2018-01-11 ┆ returned │
│ 9 ┆ 53 ┆ 2018-01-12 ┆ completed │
│ 10 ┆ 7 ┆ 2018-01-14 ┆ completed │
└──────────┴─────────────┴────────────┴───────────┘
99 rows, showing only 10 rows.
Run with —limit 0 to show all rows.Table jaffle_shop.staging.stg_customers
┌─────────────┬────────────┬───────────┐
│ customer_id ┆ first_name ┆ last_name │
╞═════════════╪════════════╪═══════════╡
│ 1 ┆ Michael ┆ P. │
│ 2 ┆ Shawn ┆ M. │
│ 3 ┆ Kathleen ┆ P. │
│ 4 ┆ Jimmy ┆ C. │
│ 5 ┆ Katherine ┆ R. │
│ 6 ┆ Sarah ┆ R. │
│ 7 ┆ Martin ┆ M. │
│ 8 ┆ Frank ┆ R. │
│ 9 ┆ Jennifer ┆ F. │
│ 10 ┆ Henry ┆ W. │
└─────────────┴────────────┴───────────┘
100 rows, showing only 10 rows.
Run with —limit 0 to show all rows.Table jaffle_shop.staging.stg_payments
┌────────────┬──────────┬────────────────┬────────┐
│ payment_id ┆ order_id ┆ payment_method ┆ amount │
╞════════════╪══════════╪════════════════╪════════╡
│ 1 ┆ 1 ┆ credit_card ┆ 10 │
│ 2 ┆ 2 ┆ credit_card ┆ 20 │
│ 3 ┆ 3 ┆ coupon ┆ 1 │
│ 4 ┆ 4 ┆ coupon ┆ 25 │
│ 5 ┆ 5 ┆ bank_transfer ┆ 17 │
│ 6 ┆ 6 ┆ credit_card ┆ 6 │
│ 7 ┆ 7 ┆ credit_card ┆ 16 │
│ 8 ┆ 8 ┆ credit_card ┆ 23 │
│ 9 ┆ 9 ┆ gift_card ┆ 23 │
│ 10 ┆ 9 ┆ bank_transfer ┆ 0 │
└────────────┴──────────┴────────────────┴────────┘
113 rows, showing only 10 rows.
Run with —limit 0 to show all rows.
Migrate Complex Jinja
ref()
macro.Let’s migrate the orders
model to demonstrate this. Let’s imagine we want to materialize the layer that selects from staging
into the analytics
schema.Since we are using the schema-table-name
index, we can move the orders.sql
file from the DBT project into a directory called orders
in the SDF workspace.Just like before, we’ll need to modify the SQL to remove the ref()
macro. See below for the before and after.Working set 4 model files, 3 .sdf files
Compiling jaffle_shop.staging.stg_payments (./models/staging/stg_payments.sql)
Compiling jaffle_shop.staging.stg_customers (./models/staging/stg_customers.sql)
Compiling jaffle_shop.staging.stg_orders (./models/staging/stg_orders.sql)
Compiling jaffle_shop.analytics.orders (./models/analytics/orders.sql)
Finished 7 models [7 succeeded] in 1.191 secs
Working set 4 model files, 2 .sdf files
Running jaffle_shop.staging.stg_orders (./models/staging/stg_orders.sql)
Running jaffle_shop.staging.stg_payments (./models/staging/stg_payments.sql)
Running jaffle_shop.analytics.orders (./models/analytics/orders.sql)
Finished 5 models [5 succeeded] in 1.177 secsTable jaffle_shop.analytics.orders
┌──────────┬─────────────┬────────────┬───────────┬────────────────────┬───────────────┬──────────────────────┬──────────────────┬────────┐
│ order_id ┆ customer_id ┆ order_date ┆ status ┆ credit_card_amount ┆ coupon_amount ┆ bank_transfer_amount ┆ gift_card_amount ┆ amount │
╞══════════╪═════════════╪════════════╪═══════════╪════════════════════╪═══════════════╪══════════════════════╪══════════════════╪════════╡
│ 1 ┆ 1 ┆ 2018-01-01 ┆ returned ┆ 10 ┆ 0 ┆ 0 ┆ 0 ┆ 10 │
│ 2 ┆ 3 ┆ 2018-01-02 ┆ completed ┆ 20 ┆ 0 ┆ 0 ┆ 0 ┆ 20 │
│ 4 ┆ 50 ┆ 2018-01-05 ┆ completed ┆ 0 ┆ 25 ┆ 0 ┆ 0 ┆ 25 │
│ 6 ┆ 54 ┆ 2018-01-07 ┆ completed ┆ 6 ┆ 0 ┆ 0 ┆ 0 ┆ 6 │
│ 8 ┆ 2 ┆ 2018-01-11 ┆ returned ┆ 23 ┆ 0 ┆ 0 ┆ 0 ┆ 23 │
│ 9 ┆ 53 ┆ 2018-01-12 ┆ completed ┆ 0 ┆ 0 ┆ 0 ┆ 23 ┆ 23 │
│ 11 ┆ 99 ┆ 2018-01-14 ┆ completed ┆ 27 ┆ 0 ┆ 0 ┆ 0 ┆ 27 │
│ 13 ┆ 84 ┆ 2018-01-17 ┆ completed ┆ 5 ┆ 0 ┆ 14 ┆ 0 ┆ 19 │
│ 15 ┆ 25 ┆ 2018-01-17 ┆ completed ┆ 0 ┆ 22 ┆ 0 ┆ 0 ┆ 22 │
│ 17 ┆ 71 ┆ 2018-01-18 ┆ completed ┆ 0 ┆ 0 ┆ 2 ┆ 0 ┆ 2 │
└──────────┴─────────────┴────────────┴───────────┴────────────────────┴───────────────┴──────────────────────┴──────────────────┴────────┘
99 rows, showing only 10 rows.
Run with —limit 0 to show all rows.
Configure Materialization
table
. To do so, add the following to your workspace YML:view
.
To do so, we’ll add one more includes block that overwrites the default materialization for any models in the staging
directory:Working set 4 model files, 4 .sdf files
Compiling jaffle_shop.staging.stg_orders (./models/staging/stg_orders.sql)
Compiling jaffle_shop.staging.stg_payments (./models/staging/stg_payments.sql)
Compiling jaffle_shop.staging.stg_customers (./models/staging/stg_customers.sql)
Compiling jaffle_shop.analytics.orders (./models/analytics/orders.sql)
Finished 7 models [7 succeeded] in 0.990 secs
Migrate Tests
schema.yml
file within the staging/
directory in the DBT project to the staging/
directory in the SDF workspace. Next, let’s rename it to tests.sdf.yml
.The resulting workspace should look like this:tmp/jaffle_shop/
├── models
│ ├── analytics
│ │ └── orders.sql
│ ├── main.sql
│ ├── raw
│ │ └── seeds.sdf.yml
│ └── staging
│ ├── stg_customers.sql
│ ├── stg_orders.sql
│ ├── stg_payments.sql
│ └── tests.sdf.yml
├── seeds
│ ├── raw_customers.csv
│ ├── raw_orders.csv
│ └── raw_payments.csv
└── workspace.sdf.yml6 directories, 11 files
version: 2
at the top of the file.---
in between each block.tests.sdf.yml
should look like this:Working set 4 model files, 3 test files, 7 .sdf files
Running jaffle_shop.staging.stg_orders (./models/staging/stg_orders.sql)
Running jaffle_shop.staging.stg_payments (./models/staging/stg_payments.sql)
Running jaffle_shop.staging.stg_customers (./models/staging/stg_customers.sql)
Testing jaffle_shop.staging.test_stg_orders (./sdftarget/dbg/tests/jaffle_shop/staging/test_stg_orders.sql)
Testing jaffle_shop.staging.test_stg_payments (./sdftarget/dbg/tests/jaffle_shop/staging/test_stg_payments.sql)
Testing jaffle_shop.staging.test_stg_customers (./sdftarget/dbg/tests/jaffle_shop/staging/test_stg_customers.sql)
Finished 6 models [6 succeeded], 3 tests [3 passed] in 1.050 secs
[Pass] Test jaffle_shop.staging.test_stg_orders
[Pass] Test jaffle_shop.staging.test_stg_payments
[Pass] Test jaffle_shop.staging.test_stg_customers
(Optional) Migrate the Rest
tmp/jaffle_shop_complete/
├── models
│ ├── analytics
│ │ ├── customers.sql
│ │ ├── meta.sdf.yml
│ │ └── orders.sql
│ ├── raw
│ │ └── seeds.sdf.yml
│ └── staging
│ ├── stg_customers.sql
│ ├── stg_orders.sql
│ ├── stg_payments.sql
│ └── tests.sdf.yml
├── seeds
│ ├── raw_customers.csv
│ ├── raw_orders.csv
│ └── raw_payments.csv
└── workspace.sdf.yml6 directories, 12 files
Working set 5 model files, 10 .sdf files
Compiling jaffle_shop.staging.stg_payments (./models/staging/stg_payments.sql)
Compiling jaffle_shop.staging.stg_orders (./models/staging/stg_orders.sql)
Compiling jaffle_shop.staging.stg_customers (./models/staging/stg_customers.sql)
Compiling jaffle_shop.analytics.orders (./models/analytics/orders.sql)
Compiling jaffle_shop.analytics.customers (./models/analytics/customers.sql)
Finished 8 models [8 succeeded] in 1.108 secs
Working set 5 model files, 10 .sdf files
Running jaffle_shop.staging.stg_customers (./models/staging/stg_customers.sql)
Running jaffle_shop.staging.stg_orders (./models/staging/stg_orders.sql)
Running jaffle_shop.staging.stg_payments (./models/staging/stg_payments.sql)
Running jaffle_shop.analytics.customers (./models/analytics/customers.sql)
Running jaffle_shop.analytics.orders (./models/analytics/orders.sql)
Finished 8 models [8 succeeded] in 1.236 secs
Working set 5 model files, 5 test files, 10 .sdf files
Running jaffle_shop.staging.stg_orders (./models/staging/stg_orders.sql)
Running jaffle_shop.staging.stg_payments (./models/staging/stg_payments.sql)
Running jaffle_shop.staging.stg_customers (./models/staging/stg_customers.sql)
Testing jaffle_shop.staging.test_stg_orders (./sdftarget/dbg/tests/jaffle_shop/staging/test_stg_orders.sql)
Testing jaffle_shop.staging.test_stg_payments (./sdftarget/dbg/tests/jaffle_shop/staging/test_stg_payments.sql)
Testing jaffle_shop.staging.test_stg_customers (./sdftarget/dbg/tests/jaffle_shop/staging/test_stg_customers.sql)
Running jaffle_shop.analytics.customers (./models/analytics/customers.sql)
Running jaffle_shop.analytics.orders (./models/analytics/orders.sql)
Testing jaffle_shop.analytics.test_orders (./sdftarget/dbg/tests/jaffle_shop/analytics/test_orders.sql)
Testing jaffle_shop.analytics.test_customers (./sdftarget/dbg/tests/jaffle_shop/analytics/test_customers.sql)
Finished 8 models [8 succeeded], 5 tests [5 passed] in 2.137 secs
[Pass] Test jaffle_shop.staging.test_stg_payments
[Pass] Test jaffle_shop.staging.test_stg_orders
[Pass] Test jaffle_shop.staging.test_stg_customers
[Pass] Test jaffle_shop.analytics.test_customers
[Pass] Test jaffle_shop.analytics.test_orders
|items
instead of .items()
if
Statements with no else
- An else
is required in ternary conditional statements.
dbt_utils.group_by
implementation
In DBT’s implementation, the else
is not required as seen here:
else
after loop.last
: