Overview
The goal of this tutorials series is to provide a guided way for you to explore SDF and understand how it can be integrated into your data workflows. We built SDF to be an intuitive and easy to use and we hope you’ll have fun exploring it. In our series of tutorial we will be working on a single SDF workspace - “Mom’s Flower Shop”. In this page, we will provide a setup guide as well as an overview of this workspace. We will even use SDF to conduct some initial exploration. This project was inspired by Fleurette Studio, one of our co-founders Elias’ mom’s boutique floral design studio located in Los Angeles, CA. Let’s get started!Prerequisites
- A Mac or Linux with a valid installation of the latest SDF version running locally.
- (Recommended) Having gone through our Getting Started Guide.
If using VSCode, SDF’s YML schema is available for type and syntax checking via the Red HAT YAML. This will
add auto-fill, type checking, and YML validation directly inline while editing
sdf.yml
files.Getting Started with Mom’s Flower Shop
1
Create the Sample Workspace
To create a new SDF Workspace containing Mom’s Flower Shop data, run the following command:After running the command, you will see the following output:
That’s it!

2
Workspace Overview
This workspace is powering the data warehouse of Mom’s Flower Shop.First, let’s open our terminal and change the directory. Run:Let’s review the files and folders in the directory:
.
├── checks
│ └── README.txt
├── classifications
│ ├── column_classifiers.sdf.yml
│ └── table_classifiers.sdf.yml
├── metadata
│ ├── analytics
│ │ ├── agg_installs_and_campaigns.sdf.yml
│ │ └── dim_marketing_campaigns.sdf.yml
│ ├── raw
│ │ ├── raw_addresses.sdf.yml
│ │ ├── raw_customers.sdf.yml
│ │ ├── raw_inapp_events.sdf.yml
│ │ └── raw_marketing_campaign_events.sdf.yml
│ └── staging
│ └── app_installs.sdf.yml
├── models
│ ├── analytics
│ │ └── agg_installs_and_campaigns.sql
│ ├── raw
│ │ ├── raw_addresses.sql
│ │ ├── raw_customers.sql
│ │ ├── raw_inapp_events.sql
│ │ └── raw_marketing_campaign_events.sql
│ └── staging
│ ├── app_installs.sql
│ ├── app_installs_v2.sql
│ ├── customers.sql
│ ├── inapp_events.sql
│ ├── marketing_campaigns.sql
│ └── stg_installs_per_campaign.sql
├── reports
│ └── README.txt
├── seeds
│ ├── csv
│ │ └── README.txt
│ └── parquet
│ ├── addresses.parquet
│ ├── customers.parquet
│ ├── inapp_events.parquet
│ └── marketing_campaign_events.parquet
└── workspace.sdf.yml15 directories, 28 files
To follow along, run
ls
or tree
in your terminal.- Raw data seeds are available in the
seeds
folder. - Models (SQL files) are available in the
models
folder. - The workspace is defined in the
workspace.sdf.yml
configuration file.
3
Source Tables
Let’s explore the tables. In your terminal, run:SDF will print out the full description of the raw tables (seeds) that exist in our workspace:We can also get a sample of each table if we run the command:For example, the results for ** Note that these files are randomly generated and do not contain any real data
When we run SDF compile, our engine validates SQL syntax and dependencies correctness.
In the example above, SDF is set to compile models under
models/raw
, but if we
run sdf compile
alone we will instantly guarantee a successful execution of all models
in the warehouse with a single command, running locally with lightning speed.- Customers
- Marketing campaigns
- Mobile in-app events
- Street addresses
raw_customers
look like this:Schema moms_flower_shop.raw.raw_customers
All relevant information related to customers known to mom s flower shop. This information comes from the user input into the mobile app.┌─────────────┬───────────┬────────────┬────────────────────────────────────────────────────────────┐
│ column_name ┆ data_type ┆ classifier ┆ description │
╞═════════════╪═══════════╪════════════╪════════════════════════════════════════════════════════════╡
│ id ┆ bigint ┆ ┆ A unique identifier of a mom s floewr shop customer │
│ first_name ┆ varchar ┆ ┆ The first name of the customer │
│ last_name ┆ varchar ┆ ┆ The last name of the customer │
│ email ┆ varchar ┆ ┆ The email of the customer │
│ gender ┆ varchar ┆ ┆ The gender of the customer │
│ address_id ┆ bigint ┆ ┆ The address_id of the address associated with the customer │
└─────────────┴───────────┴────────────┴────────────────────────────────────────────────────────────┘
raw_customers
look like this:Table moms_flower_shop.raw.raw_customers
┌────┬────────────┬─────────────┬─────────────────────────────┬─────────────┬────────────┐
│ id ┆ first_name ┆ last_name ┆ email ┆ gender ┆ address_id │
╞════╪════════════╪═════════════╪═════════════════════════════╪═════════════╪════════════╡
│ 1 ┆ Willi ┆ Vannuchi ┆ wvannuchi0@engadget.com ┆ Male ┆ 73 │
│ 2 ┆ Opalina ┆ Christophle ┆ ochristophle1@sina.com.cn ┆ Female ┆ 17 │
│ 3 ┆ Parke ┆ Tackley ┆ ptackley2@deliciousdays.com ┆ Male ┆ 89 │
│ 4 ┆ Berton ┆ Klamp ┆ bklamp3@desdev.cn ┆ Male ┆ 99 │
│ 5 ┆ Taryn ┆ Brook ┆ tbrook4@flavors.me ┆ Female ┆ 86 │
│ 6 ┆ Moina ┆ Crake ┆ mcrake5@uol.com.br ┆ Female ┆ 38 │
│ 7 ┆ Robbyn ┆ Chezelle ┆ rchezelle6@themeforest.net ┆ Female ┆ 92 │
│ 8 ┆ Carce ┆ Clendening ┆ cclendening7@chron.com ┆ Male ┆ 64 │
│ 9 ┆ Pennie ┆ Shilton ┆ pshilton8@mapy.cz ┆ Genderqueer ┆ 13 │
│ 10 ┆ Brig ┆ Graal ┆ bgraal9@microsoft.com ┆ Male ┆ 38 │
└────┴────────────┴─────────────┴─────────────────────────────┴─────────────┴────────────┘
1000 rows, showing only 10 rows.
Run with —limit 0 to show all rows.
In this workspace we are using sample data stored locally for ease of use.When setting up your own workspace, you can connect existing data providers to SDF.
Follow the relevant provider’s guide to get started.
4
Other Models
We can also explore the other models that are found under the The output should look like this:Let’s see how it could’ve looked like without SDF’s optimization. First we need
to clean cache. Run:Now, compile again by running:Notice the difference? Imagine running unoptimized compilations on a warehouse of your magnitude.To see the schema of each table, similarly to the source tables in the previous step,
simply add the flag
models
directory.To see the full list of models displayed in the order of execution, we can run:Working set 11 model files, 22 .sdf files
Compiling moms_flower_shop.staging.inapp_events (./models/staging/inapp_events.sql)
Compiling moms_flower_shop.staging.marketing_campaigns (./models/staging/marketing_campaigns.sql)
Compiling moms_flower_shop.staging.app_installs (./models/staging/app_installs.sql)
Compiling moms_flower_shop.staging.app_installs_v2 (./models/staging/app_installs_v2.sql)
Compiling moms_flower_shop.analytics.agg_installs_and_campaigns (./models/analytics/agg_installs_and_campaigns.sql)
Compiling moms_flower_shop.staging.customers (./models/staging/customers.sql)
Compiling moms_flower_shop.staging.stg_installs_per_campaign (./models/staging/stg_installs_per_campaign.sql)
Finished 11 models [7 succeeded, 4 reused] in 0.931 secs
SDF intelligently caches previous compilations. Since we compiled the source models
under
models/raw
in previous steps, those will not be re-compiled in this run.Working set 11 model files, 22 .sdf files
Compiling moms_flower_shop.raw.raw_addresses (./models/raw/raw_addresses.sql)
Compiling moms_flower_shop.raw.raw_customers (./models/raw/raw_customers.sql)
Compiling moms_flower_shop.raw.raw_marketing_campaign_events (./models/raw/raw_marketing_campaign_events.sql)
Compiling moms_flower_shop.raw.raw_inapp_events (./models/raw/raw_inapp_events.sql)
Compiling moms_flower_shop.staging.marketing_campaigns (./models/staging/marketing_campaigns.sql)
Compiling moms_flower_shop.staging.inapp_events (./models/staging/inapp_events.sql)
Compiling moms_flower_shop.staging.app_installs_v2 (./models/staging/app_installs_v2.sql)
Compiling moms_flower_shop.staging.app_installs (./models/staging/app_installs.sql)
Compiling moms_flower_shop.analytics.agg_installs_and_campaigns (./models/analytics/agg_installs_and_campaigns.sql)
Compiling moms_flower_shop.staging.customers (./models/staging/customers.sql)
Compiling moms_flower_shop.staging.stg_installs_per_campaign (./models/staging/stg_installs_per_campaign.sql)
Finished 11 models [11 succeeded] in 0.910 secs
--show all
to the command: