Overview

In this guide, we’ll materialize our first tables and views in BigQuery with SDF. We’ll be using a public dataset from BigQuery’s marketplace called NYC TLC Trips to demonstrate how to materialize tables and views in BigQuery with SDF.

Prerequisites

This guide should be followed after completing the Getting Started with BigQuery and SDF guide.

Before beginning, we’ll need to make sure we have the following:

  • A GCP account & project with billing enabled
  • Valid BigQuery service account credentials with write access to at least one database we can materialize tables to.
  • Instantiated credentials completed in the previous guide.

Guide

1

Create a New SDF Project from Our BigQuery Starter

Create a new SDF project using the Cybersyn Tech Innovation sample. This will create a new project in your current working directory with the sample project files.

sdf new --sample bigquery_starter 
2

Replace Project ID with Your Project ID

Next, let’s replace the project ID in the workspace.sdf.yml file with your own project ID. This is the project ID we’ll read table metadata from and materialize tables to.

On line 13 of the workspace.sdf.yml, replace <REPLACE_WITH_BQ_PROJECT> with the project ID from your BigQuery account.

The catalog in SDF is equivalent to project ID in BigQuery.

3

Run the Starter Script in BigQuery

Now that we’ve replaced the project ID, let’s run the starter script in BigQuery to create source tables for our models. You can find a script in the bigquery_starter directory called run_me_in_bq.sql.

Open the script and copy the contents. Then, navigate to the BigQuery console and paste the script into the query editor. Make sure you run the script in the same project as the project ID specified in the previous step.

The script will create a new dataset called sdf_ecommerce_example with three source tables: orders, products, and customers.

Click below to see the contents of the script:

4

Compile your Workspace

Let’s try compiling our workspace. This will pull down the table schemas for the tables we created in the previous step with our starter script.

sdf compile 

Working set 2 model files, 1 .sdf file Downloading “integration-test-sdf”.sdf_ecommerce_example.orders (schema) Downloading “integration-test-sdf”.sdf_ecommerce_example.products (schema) Downloading “integration-test-sdf”.sdf_ecommerce_example.customers (schema)   Compiling integration-test-sdf.sdf_ecommerce_example.aggregate_orders (./models/aggregate_orders.sql)   Compiling integration-test-sdf.sdf_ecommerce_example.customers_over_100 (./models/customers_over_100.sql)    Finished 5 models [2 succeeded, 3 downloaded] in 1.978 secs

Great! We’ve compiled our workspace successfully, thereby validating all SQL models in the workspace without running them against the database.

If you do not see a successful compilation, please ensure you’ve followed the Getting Started with BigQuery and SDF guide to authenticate to your BigQuery instance.

5

Materialize the Models in BigQuery

Now that we’ve successful compiled our workspace, it’s time to trying materializing them remotely.

You’ll notice in the workspace.sdf.yml file specifies the following integrations block:

integrations:
  - provider: bigquery
    type: database
    sources:
        - pattern: "*.sdf_ecommerce_example.*"
    targets:
        - pattern: "*.sdf_ecommerce_example.*"

This tells SDF to hydrate missing table schemas from the sdf_ecommerce_example dataset and also materialize tables to the sdf_ecommerce_example dataset. For more information on integration configuration, see the integration documentation.

We have two models defined locally, and since we haven’t set their materialization, they will default to view.

Now, let’s try running the models to materialize them in BigQuery.

sdf run

Working set 2 model files, 1 .sdf file Downloading “integration-test-sdf”.sdf_ecommerce_example.customers (schema) Downloading “integration-test-sdf”.sdf_ecommerce_example.orders (schema) Downloading “integration-test-sdf”.sdf_ecommerce_example.products (schema)     Running integration-test-sdf.sdf_ecommerce_example.aggregate_orders (./models/aggregate_orders.sql)     Running integration-test-sdf.sdf_ecommerce_example.customers_over_100 (./models/customers_over_100.sql)    Finished 5 models [2 succeeded, 3 downloaded] in 3.621 secs

This is great, but what if we want to see the data our views produced?

Let’s add small flag to sdf run that tells SDF to pull down a sample of the data from the views we just materialized.

sdf run --show all

Working set 2 model files, 1 .sdf file Downloading “integration-test-sdf”.sdf_ecommerce_example.products (schema) Downloading “integration-test-sdf”.sdf_ecommerce_example.orders (schema) Downloading “integration-test-sdf”.sdf_ecommerce_example.customers (schema)     Running integration-test-sdf.sdf_ecommerce_example.aggregate_orders (./models/aggregate_orders.sql)     Running integration-test-sdf.sdf_ecommerce_example.customers_over_100 (./models/customers_over_100.sql)    Finished 5 models [2 succeeded, 3 downloaded] in 7.014 secs

Table “integration-test-sdf”.sdf_ecommerce_example.customers ┌─────────────┬───────────────┬───────────────────────┬─────────────────────┐ │ customer_id ┆ customer_name ┆ email                 ┆ signup_date         │ ╞═════════════╪═══════════════╪═══════════════════════╪═════════════════════╡ │ 1           ┆ John Doe      ┆ johndoe@example.com   ┆ 2023-01-15T00:00:00 │ │ 2           ┆ Jane Smith    ┆ janesmith@example.com ┆ 2023-02-10T00:00:00 │ │ 3           ┆ Sam Johnson   ┆ samj@example.com      ┆ 2023-03-05T00:00:00 │ └─────────────┴───────────────┴───────────────────────┴─────────────────────┘ 3 rows.

Table “integration-test-sdf”.sdf_ecommerce_example.products ┌────────────┬──────────────┬────────┬─────────────┐ │ product_id ┆ product_name ┆ price  ┆ category    │ ╞════════════╪══════════════╪════════╪═════════════╡ │ 1          ┆ Laptop       ┆ 999.99 ┆ Electronics │ │ 2          ┆ Headphones   ┆ 199.99 ┆ Electronics │ │ 3          ┆ Coffee Maker ┆ 89.99  ┆ Appliances  │ └────────────┴──────────────┴────────┴─────────────┘ 3 rows.

Table “integration-test-sdf”.sdf_ecommerce_example.orders ┌──────────┬─────────────┬────────────┬─────────────────────┬──────────┐ │ order_id ┆ customer_id ┆ product_id ┆ order_date          ┆ quantity │ ╞══════════╪═════════════╪════════════╪═════════════════════╪══════════╡ │ 1        ┆ 1           ┆ 1          ┆ 2023-06-15T00:00:00 ┆ 1        │ │ 2        ┆ 2           ┆ 3          ┆ 2023-07-01T00:00:00 ┆ 2        │ │ 3        ┆ 3           ┆ 2          ┆ 2023-08-03T00:00:00 ┆ 1        │ └──────────┴─────────────┴────────────┴─────────────────────┴──────────┘ 3 rows.

Table “integration-test-sdf”.sdf_ecommerce_example.aggregate_orders ┌─────────────┬───────────────┬──────────────┬─────────────┐ │ customer_id ┆ customer_name ┆ total_orders ┆ total_spent │ ╞═════════════╪═══════════════╪══════════════╪═════════════╡ │ 1           ┆ John Doe      ┆ 1            ┆ 999.99      │ │ 2           ┆ Jane Smith    ┆ 1            ┆ 179.98      │ │ 3           ┆ Sam Johnson   ┆ 1            ┆ 199.99      │ └─────────────┴───────────────┴──────────────┴─────────────┘ 3 rows.

Table “integration-test-sdf”.sdf_ecommerce_example.customers_over_100 ┌─────────────┬───────────────┬──────────────┬─────────────┐ │ customer_id ┆ customer_name ┆ total_orders ┆ total_spent │ ╞═════════════╪═══════════════╪══════════════╪═════════════╡ │ 1           ┆ John Doe      ┆ 1            ┆ 999.99      │ │ 2           ┆ Jane Smith    ┆ 1            ┆ 179.98      │ │ 3           ┆ Sam Johnson   ┆ 1            ┆ 199.99      │ └─────────────┴───────────────┴──────────────┴─────────────┘ 3 rows.

Congratulations! You’ve just materialized your first tables in BigQuery with SDF, and even got a sneak peek at the data they produced.