Overview

The goal of this tutorials series is to provide a guided way for you to explore SDF and understand how it can be integrated into your data workflows. We built SDF to be an intuitive and easy to use and we hope you’ll have fun exploring it.

In our series of tutorial we will be working on a single SDF workspace - “Mom’s Flower Shop”. In this page, we will provide a setup guide as well as an overview of this workspace. We will even use SDF to conduct some initial exploration.

This project was inspired by Fleurette Studio, one of our co-founders Elias’ mom’s boutique floral design studio located in Los Angeles, CA.

Let’s get started!

Prerequisites

For the sake of the tutorials, there is no need to connect to a database or to leverage any compute engine other than your own laptop.

If using VSCode, SDF’s YML schema is available for type and syntax checking via the Red HAT YAML. This will add auto-fill, type checking, and YML validation directly inline while editing sdf.yml files.

Getting Started with Mom’s Flower Shop

1

Create the Sample Workspace

To create a new SDF Workspace containing Mom’s Flower Shop data, run the following command:

sdf new --sample moms_flower_shop

After running the command, you will see the following output:

That’s it!

2

Workspace Overview

This workspace is powering the data warehouse of Mom’s Flower Shop.

First, let’s open our terminal and change the directory. Run:

cd moms_flower_shop

Let’s review the files and folders in the directory:

. ├── checks │   └── README.txt ├── classifications │   ├── column_classifiers.sdf.yml │   └── table_classifiers.sdf.yml ├── metadata │   ├── analytics │   │   ├── agg_installs_and_campaigns.sdf.yml │   │   └── dim_marketing_campaigns.sdf.yml │   ├── raw │   │   ├── raw_addresses.sdf.yml │   │   ├── raw_customers.sdf.yml │   │   ├── raw_inapp_events.sdf.yml │   │   └── raw_marketing_campaign_events.sdf.yml │   └── staging │       └── app_installs.sdf.yml ├── models │   ├── analytics │   │   └── agg_installs_and_campaigns.sql │   ├── raw │   │   ├── raw_addresses.sql │   │   ├── raw_customers.sql │   │   ├── raw_inapp_events.sql │   │   └── raw_marketing_campaign_events.sql │   └── staging │       ├── app_installs.sql │       ├── app_installs_v2.sql │       ├── customers.sql │       ├── inapp_events.sql │       ├── marketing_campaigns.sql │       └── stg_installs_per_campaign.sql ├── reports │   └── README.txt ├── seeds │   ├── csv │   │   └── README.txt │   └── parquet │       ├── addresses.parquet │       ├── customers.parquet │       ├── inapp_events.parquet │       └── marketing_campaign_events.parquet └── workspace.sdf.yml

14 directories, 28 files

To follow along, run ls or tree in your terminal.

  • Raw data seeds are available in the seeds folder.
  • Models (SQL files) are available in the models folder.
  • The workspace is defined in the workspace.sdf.yml configuration file.

Let’s ignore the rest of the directory for now. We will get back to those in later tutorials.

3

Source Tables

Let’s explore the tables. In your terminal, run:

sdf compile models/raw

When we run SDF compile, our engine validates SQL syntax and dependencies correctness. In the example above, SDF is set to compile models under models/raw, but but if we run sdf compile alone we will instantly guarantee a successful execution of all models in the warehouse with a single command, running locally with lightning speed.

SDF will print out the full description of the raw tables (seeds) that exist in our workspace:

  1. Customers
  2. Marketing campaigns
  3. Mobile in-app events
  4. Street addresses

For example, the results for raw_customers look like this:

Schema moms_flower_shop.raw.raw_customers        All relevant information related to customers known to mom s flower shop.  This information comes from the user input into the mobile app.

All relevant information related to customers known to mom s flower shop.  This information comes from the user input into the mobile app.

┌─────────────┬───────────┬────────────┬────────────────────────────────────────────────────────────┐ │ column_name ┆ data_type ┆ classifier ┆ description                                                │ ╞═════════════╪═══════════╪════════════╪════════════════════════════════════════════════════════════╡ │ id          ┆ bigint    ┆            ┆ A unique identifier of a mom s floewr shop customer        │ │             ┆           ┆            ┆ A unique identifier of a mom s floewr shop customer        │ │ first_name  ┆ varchar   ┆            ┆ The first name of the customer                             │ │             ┆           ┆            ┆ The first name of the customer                             │ │ last_name   ┆ varchar   ┆            ┆ The last name of the customer                              │ │             ┆           ┆            ┆ The last name of the customer                              │ │ email       ┆ varchar   ┆            ┆ The email of the customer                                  │ │             ┆           ┆            ┆ The email of the customer                                  │ │ gender      ┆ varchar   ┆            ┆ The gender of the customer                                 │ │             ┆           ┆            ┆ The gender of the customer                                 │ │ address_id  ┆ bigint    ┆            ┆ The address_id of the address associated with the customer │ │             ┆           ┆            ┆ The address_id of the address associated with the customer │ └─────────────┴───────────┴────────────┴────────────────────────────────────────────────────────────┘

We can also get a sample of each table if we run the command:

sdf run models/raw

For example, the results for raw_customers look like this:

Table moms_flower_shop.raw.raw_customers ┌────┬────────────┬─────────────┬─────────────────────────────┬─────────────┬────────────┐ │ id ┆ first_name ┆ last_name   ┆ email                       ┆ gender      ┆ address_id │ ╞════╪════════════╪═════════════╪═════════════════════════════╪═════════════╪════════════╡ │ 1  ┆ Willi      ┆ Vannuchi    ┆ wvannuchi0@engadget.com     ┆ Male        ┆ 73         │ │ 2  ┆ Opalina    ┆ Christophle ┆ ochristophle1@sina.com.cn   ┆ Female      ┆ 17         │ │ 3  ┆ Parke      ┆ Tackley     ┆ ptackley2@deliciousdays.com ┆ Male        ┆ 89         │ │ 4  ┆ Berton     ┆ Klamp       ┆ bklamp3@desdev.cn           ┆ Male        ┆ 99         │ │ 5  ┆ Taryn      ┆ Brook       ┆ tbrook4@flavors.me          ┆ Female      ┆ 86         │ │ 6  ┆ Moina      ┆ Crake       ┆ mcrake5@uol.com.br          ┆ Female      ┆ 38         │ │ 7  ┆ Robbyn     ┆ Chezelle    ┆ rchezelle6@themeforest.net  ┆ Female      ┆ 92         │ │ 8  ┆ Carce      ┆ Clendening  ┆ cclendening7@chron.com      ┆ Male        ┆ 64         │ │ 9  ┆ Pennie     ┆ Shilton     ┆ pshilton8@mapy.cz           ┆ Genderqueer ┆ 13         │ │ 10 ┆ Brig       ┆ Graal       ┆ bgraal9@microsoft.com       ┆ Male        ┆ 38         │ └────┴────────────┴─────────────┴─────────────────────────────┴─────────────┴────────────┘ 1000 rows, showing only 10 rows.   Run with —limit 0 to show all rows.

** Note that these files are randomly generated and do not contain any real data

In this workspace we are using sample data stored locally for ease of use.

When setting up your own workspace, you can connect existing data providers to SDF. Follow the relevant provider’s guide to get started.

4

Other Models

We can also explore the other models that are found under the models directory.

To see the full list of models displayed in the order of execution, we can run:

sdf compile

The output should look like this:

Working set 11 model files, 22 .sdf files   Compiling moms_flower_shop.staging.inapp_events (./models/staging/inapp_events.sql)   Compiling moms_flower_shop.staging.marketing_campaigns (./models/staging/marketing_campaigns.sql)   Compiling moms_flower_shop.staging.app_installs (./models/staging/app_installs.sql)   Compiling moms_flower_shop.staging.app_installs_v2 (./models/staging/app_installs_v2.sql)   Compiling moms_flower_shop.analytics.agg_installs_and_campaigns (./models/analytics/agg_installs_and_campaigns.sql)   Compiling moms_flower_shop.staging.customers (./models/staging/customers.sql)   Compiling moms_flower_shop.staging.stg_installs_per_campaign (./models/staging/stg_installs_per_campaign.sql)    Finished 11 models [7 succeeded, 4 reused] in 0.866 secs

SDF intelligently caches previous compilations. Since we compiled the source models under models/raw in previous steps, those will not be re-compiled in this run.

Let’s see how it could’ve looked like without SDF’s optimization. First we need to clean cache. Run:

sdf clean

Now, compile again by running:

sdf compile

Notice the difference? Imagine running unoptimized compilations on a warehouse of your magnitude.

Working set 11 model files, 22 .sdf files   Compiling moms_flower_shop.raw.raw_inapp_events (./models/raw/raw_inapp_events.sql)   Compiling moms_flower_shop.raw.raw_addresses (./models/raw/raw_addresses.sql)   Compiling moms_flower_shop.raw.raw_marketing_campaign_events (./models/raw/raw_marketing_campaign_events.sql)   Compiling moms_flower_shop.raw.raw_customers (./models/raw/raw_customers.sql)   Compiling moms_flower_shop.staging.inapp_events (./models/staging/inapp_events.sql)   Compiling moms_flower_shop.staging.marketing_campaigns (./models/staging/marketing_campaigns.sql)   Compiling moms_flower_shop.staging.app_installs (./models/staging/app_installs.sql)   Compiling moms_flower_shop.staging.app_installs_v2 (./models/staging/app_installs_v2.sql)   Compiling moms_flower_shop.analytics.agg_installs_and_campaigns (./models/analytics/agg_installs_and_campaigns.sql)   Compiling moms_flower_shop.staging.customers (./models/staging/customers.sql)   Compiling moms_flower_shop.staging.stg_installs_per_campaign (./models/staging/stg_installs_per_campaign.sql)    Finished 11 models [11 succeeded] in 0.869 secs

To see the schema of each table, similarly to the source tables in the previous step, simply add the flag --show all to the command:

sdf compile --show all

Next Steps

Let’s continue in our journey to explore SDF:

  1. Creating a model
  2. Debugging
  3. Deprecating a model
  4. Enriching your warehouse