Overview
Explore SDF’s series of tutorials and get to know our tool a little better.
Overview
The goal of this tutorials series is to provide a guided way for you to explore SDF and understand how it can be integrated into your data workflows. We built SDF to be an intuitive and easy to use and we hope you’ll have fun exploring it.
In our series of tutorial we will be working on a single SDF workspace - “Mom’s Flower Shop”. In this page, we will provide a setup guide as well as an overview of this workspace. We will even use SDF to conduct some initial exploration.
This project was inspired by Fleurette Studio, one of our co-founders Elias’ mom’s boutique floral design studio located in Los Angeles, CA.
Let’s get started!
Prerequisites
- A Mac or Linux with a valid installation of the latest SDF version running locally.
- (Recommended) Having gone through our Getting Started Guide.
For the sake of the tutorials, there is no need to connect to a database or to leverage any compute engine other than your own laptop.
If using VSCode, SDF’s YML schema is available for type and syntax checking via the Red HAT YAML. This will
add auto-fill, type checking, and YML validation directly inline while editing sdf.yml
files.
Getting Started with Mom’s Flower Shop
Create the Sample Workspace
To create a new SDF Workspace containing Mom’s Flower Shop data, run the following command:
After running the command, you will see the following output:
That’s it!
Workspace Overview
This workspace is powering the data warehouse of Mom’s Flower Shop.
First, let’s open our terminal and change the directory. Run:
Let’s review the files and folders in the directory:
.
├── checks
│ └── README.txt
├── classifications
│ ├── column_classifiers.sdf.yml
│ └── table_classifiers.sdf.yml
├── metadata
│ ├── analytics
│ │ ├── agg_installs_and_campaigns.sdf.yml
│ │ └── dim_marketing_campaigns.sdf.yml
│ ├── raw
│ │ ├── raw_addresses.sdf.yml
│ │ ├── raw_customers.sdf.yml
│ │ ├── raw_inapp_events.sdf.yml
│ │ └── raw_marketing_campaign_events.sdf.yml
│ └── staging
│ └── app_installs.sdf.yml
├── models
│ ├── analytics
│ │ └── agg_installs_and_campaigns.sql
│ ├── raw
│ │ ├── raw_addresses.sql
│ │ ├── raw_customers.sql
│ │ ├── raw_inapp_events.sql
│ │ └── raw_marketing_campaign_events.sql
│ └── staging
│ ├── app_installs.sql
│ ├── app_installs_v2.sql
│ ├── customers.sql
│ ├── inapp_events.sql
│ ├── marketing_campaigns.sql
│ └── stg_installs_per_campaign.sql
├── reports
│ └── README.txt
├── seeds
│ ├── csv
│ │ └── README.txt
│ └── parquet
│ ├── addresses.parquet
│ ├── customers.parquet
│ ├── inapp_events.parquet
│ └── marketing_campaign_events.parquet
└── workspace.sdf.yml
14 directories, 28 files
To follow along, run ls
or tree
in your terminal.
- Raw data seeds are available in the
seeds
folder. - Models (SQL files) are available in the
models
folder. - The workspace is defined in the
workspace.sdf.yml
configuration file.
Let’s ignore the rest of the directory for now. We will get back to those in later tutorials.
Source Tables
Let’s explore the tables. In your terminal, run:
When we run SDF compile, our engine validates SQL syntax and dependencies correctness.
In the example above, SDF is set to compile models under models/raw
, but if we
run sdf compile
alone we will instantly guarantee a successful execution of all models
in the warehouse with a single command, running locally with lightning speed.
SDF will print out the full description of the raw tables (seeds) that exist in our workspace:
- Customers
- Marketing campaigns
- Mobile in-app events
- Street addresses
For example, the results for raw_customers
look like this:
Schema moms_flower_shop.raw.raw_customers
All relevant information related to customers known to mom s flower shop. This information comes from the user input into the mobile app.
┌─────────────┬───────────┬────────────┬────────────────────────────────────────────────────────────┐
│ column_name ┆ data_type ┆ classifier ┆ description │
╞═════════════╪═══════════╪════════════╪════════════════════════════════════════════════════════════╡
│ id ┆ bigint ┆ ┆ A unique identifier of a mom s floewr shop customer │
│ first_name ┆ varchar ┆ ┆ The first name of the customer │
│ last_name ┆ varchar ┆ ┆ The last name of the customer │
│ email ┆ varchar ┆ ┆ The email of the customer │
│ gender ┆ varchar ┆ ┆ The gender of the customer │
│ address_id ┆ bigint ┆ ┆ The address_id of the address associated with the customer │
└─────────────┴───────────┴────────────┴────────────────────────────────────────────────────────────┘
We can also get a sample of each table if we run the command:
For example, the results for raw_customers
look like this:
Table moms_flower_shop.raw.raw_customers
┌────┬────────────┬─────────────┬─────────────────────────────┬─────────────┬────────────┐
│ id ┆ first_name ┆ last_name ┆ email ┆ gender ┆ address_id │
╞════╪════════════╪═════════════╪═════════════════════════════╪═════════════╪════════════╡
│ 1 ┆ Willi ┆ Vannuchi ┆ wvannuchi0@engadget.com ┆ Male ┆ 73 │
│ 2 ┆ Opalina ┆ Christophle ┆ ochristophle1@sina.com.cn ┆ Female ┆ 17 │
│ 3 ┆ Parke ┆ Tackley ┆ ptackley2@deliciousdays.com ┆ Male ┆ 89 │
│ 4 ┆ Berton ┆ Klamp ┆ bklamp3@desdev.cn ┆ Male ┆ 99 │
│ 5 ┆ Taryn ┆ Brook ┆ tbrook4@flavors.me ┆ Female ┆ 86 │
│ 6 ┆ Moina ┆ Crake ┆ mcrake5@uol.com.br ┆ Female ┆ 38 │
│ 7 ┆ Robbyn ┆ Chezelle ┆ rchezelle6@themeforest.net ┆ Female ┆ 92 │
│ 8 ┆ Carce ┆ Clendening ┆ cclendening7@chron.com ┆ Male ┆ 64 │
│ 9 ┆ Pennie ┆ Shilton ┆ pshilton8@mapy.cz ┆ Genderqueer ┆ 13 │
│ 10 ┆ Brig ┆ Graal ┆ bgraal9@microsoft.com ┆ Male ┆ 38 │
└────┴────────────┴─────────────┴─────────────────────────────┴─────────────┴────────────┘
1000 rows, showing only 10 rows.
Run with —limit 0 to show all rows.
** Note that these files are randomly generated and do not contain any real data
In this workspace we are using sample data stored locally for ease of use.
When setting up your own workspace, you can connect existing data providers to SDF. Follow the relevant provider’s guide to get started.
Other Models
We can also explore the other models that are found under the models
directory.
To see the full list of models displayed in the order of execution, we can run:
The output should look like this:
Working set 11 model files, 22 .sdf files
Compiling moms_flower_shop.staging.marketing_campaigns (./models/staging/marketing_campaigns.sql)
Compiling moms_flower_shop.staging.inapp_events (./models/staging/inapp_events.sql)
Compiling moms_flower_shop.staging.app_installs (./models/staging/app_installs.sql)
Compiling moms_flower_shop.staging.app_installs_v2 (./models/staging/app_installs_v2.sql)
Compiling moms_flower_shop.analytics.agg_installs_and_campaigns (./models/analytics/agg_installs_and_campaigns.sql)
Compiling moms_flower_shop.staging.customers (./models/staging/customers.sql)
Compiling moms_flower_shop.staging.stg_installs_per_campaign (./models/staging/stg_installs_per_campaign.sql)
Finished 11 models [7 succeeded, 4 reused] in 0.937 secs
SDF intelligently caches previous compilations. Since we compiled the source models
under models/raw
in previous steps, those will not be re-compiled in this run.
Let’s see how it could’ve looked like without SDF’s optimization. First we need to clean cache. Run:
Now, compile again by running:
Notice the difference? Imagine running unoptimized compilations on a warehouse of your magnitude.
Working set 11 model files, 22 .sdf files
Compiling moms_flower_shop.raw.raw_addresses (./models/raw/raw_addresses.sql)
Compiling moms_flower_shop.raw.raw_customers (./models/raw/raw_customers.sql)
Compiling moms_flower_shop.raw.raw_marketing_campaign_events (./models/raw/raw_marketing_campaign_events.sql)
Compiling moms_flower_shop.raw.raw_inapp_events (./models/raw/raw_inapp_events.sql)
Compiling moms_flower_shop.staging.marketing_campaigns (./models/staging/marketing_campaigns.sql)
Compiling moms_flower_shop.staging.inapp_events (./models/staging/inapp_events.sql)
Compiling moms_flower_shop.staging.app_installs (./models/staging/app_installs.sql)
Compiling moms_flower_shop.staging.app_installs_v2 (./models/staging/app_installs_v2.sql)
Compiling moms_flower_shop.staging.customers (./models/staging/customers.sql)
Compiling moms_flower_shop.analytics.agg_installs_and_campaigns (./models/analytics/agg_installs_and_campaigns.sql)
Compiling moms_flower_shop.staging.stg_installs_per_campaign (./models/staging/stg_installs_per_campaign.sql)
Finished 11 models [11 succeeded] in 0.913 secs
To see the schema of each table, similarly to the source tables in the previous step,
simply add the flag --show all
to the command:
Next Steps
Let’s continue in our journey to explore SDF: