Basic Materialization
Materialize tables and views in BigQuery with SDF.
Overview
In this guide, we’ll materialize our first tables and views in BigQuery with SDF.
Prerequisites
This guide should be followed after completing the Getting Started with BigQuery and SDF guide.
Before beginning, we’ll need to make sure we have the following:
- A GCP account & project with billing enabled
- Valid BigQuery service account credentials with write access to at least one database we can materialize tables to.
- Instantiated credentials completed in the previous guide.
Guide
Create a New SDF Project from Our BigQuery Starter
Create a new SDF project using the BigQuery starter sample. This will create a new project in your current working directory with the sample project files.
Replace Project ID with Your Project ID
Next, let’s replace the project ID in the workspace.sdf.yml
file with your own project ID. This is the project ID we’ll read table metadata from and materialize tables to.
On line 13 of the workspace.sdf.yml
, replace <REPLACE_WITH_BQ_PROJECT>
with the project ID from your BigQuery account.
The catalog
in SDF is equivalent to project ID in BigQuery.
Run the Starter Script in BigQuery
Now that we’ve replaced the project ID, let’s run the starter script in BigQuery to create source tables for our models. You can find a script in the bigquery_starter
directory called run_me_in_bq.sql
.
Open the script and copy the contents. Then, navigate to the BigQuery console and paste the script into the query editor. Make sure you run the script in the same project as the project ID specified in the previous step.
The script will create a new dataset called sdf_ecommerce_example
with three source tables: orders
, products
, and customers
.
Click below to see the contents of the script:
Compile your Workspace
Let’s try compiling our workspace. This will pull down the table schemas for the tables we created in the previous step with our starter script.
Working set 2 model files, 1 .sdf file
Downloading “integration-test-sdf”.sdf_ecommerce_example.orders (schema)
Downloading “integration-test-sdf”.sdf_ecommerce_example.customers (schema)
Downloading “integration-test-sdf”.sdf_ecommerce_example.products (schema)
Compiling integration-test-sdf.sdf_ecommerce_example.aggregate_orders (./models/aggregate_orders.sql)
Compiling integration-test-sdf.sdf_ecommerce_example.customers_over_100 (./models/customers_over_100.sql)
Finished 5 models [5 succeeded] in 2.953 secs
Great! We’ve compiled our workspace successfully, thereby validating all SQL models in the workspace without running them against the database.
If you do not see a successful compilation, please ensure you’ve followed the Getting Started with BigQuery and SDF guide to authenticate to your BigQuery instance.
Materialize the Models in BigQuery
Now that we’ve successful compiled our workspace, it’s time to trying materializing them remotely.
You’ll notice in the workspace.sdf.yml
file specifies the following integrations block:
This tells SDF to hydrate missing table schemas from the sdf_ecommerce_example
dataset and also materialize tables to the sdf_ecommerce_example
dataset. For more information on integration configuration, see the integration documentation.
We have two models defined locally, and since we haven’t set their materialization, they will default to view
.
Now, let’s try running the models to materialize them in BigQuery.
Working set 2 model files, 1 .sdf file
Downloading “integration-test-sdf”.sdf_ecommerce_example.customers (schema)
Downloading “integration-test-sdf”.sdf_ecommerce_example.products (schema)
Downloading “integration-test-sdf”.sdf_ecommerce_example.orders (schema)
Running integration-test-sdf.sdf_ecommerce_example.aggregate_orders (./models/aggregate_orders.sql)
Running integration-test-sdf.sdf_ecommerce_example.customers_over_100 (./models/customers_over_100.sql)
Finished 5 models [5 succeeded] in 5.429 secs
This is great, but what if we want to see the data our views produced?
Let’s add small flag to sdf run
that tells SDF to pull down a sample of the data from the views we just materialized.
Working set 2 model files, 1 .sdf file
Downloading “integration-test-sdf”.sdf_ecommerce_example.products (schema)
Downloading “integration-test-sdf”.sdf_ecommerce_example.orders (schema)
Downloading “integration-test-sdf”.sdf_ecommerce_example.customers (schema)
Running integration-test-sdf.sdf_ecommerce_example.aggregate_orders (./models/aggregate_orders.sql)
Running integration-test-sdf.sdf_ecommerce_example.customers_over_100 (./models/customers_over_100.sql)
Finished 5 models [5 succeeded] in 7.706 secs
Table “integration-test-sdf”.sdf_ecommerce_example.orders
┌──────────┬─────────────┬────────────┬─────────────────────┬──────────┐
│ order_id ┆ customer_id ┆ product_id ┆ order_date ┆ quantity │
╞══════════╪═════════════╪════════════╪═════════════════════╪══════════╡
│ 1 ┆ 1 ┆ 1 ┆ 2023-06-15T00:00:00 ┆ 1 │
│ 2 ┆ 2 ┆ 3 ┆ 2023-07-01T00:00:00 ┆ 2 │
│ 3 ┆ 3 ┆ 2 ┆ 2023-08-03T00:00:00 ┆ 1 │
└──────────┴─────────────┴────────────┴─────────────────────┴──────────┘
3 rows.
Table “integration-test-sdf”.sdf_ecommerce_example.customers
┌─────────────┬───────────────┬───────────────────────┬─────────────────────┐
│ customer_id ┆ customer_name ┆ email ┆ signup_date │
╞═════════════╪═══════════════╪═══════════════════════╪═════════════════════╡
│ 1 ┆ John Doe ┆ johndoe@example.com ┆ 2023-01-15T00:00:00 │
│ 2 ┆ Jane Smith ┆ janesmith@example.com ┆ 2023-02-10T00:00:00 │
│ 3 ┆ Sam Johnson ┆ samj@example.com ┆ 2023-03-05T00:00:00 │
└─────────────┴───────────────┴───────────────────────┴─────────────────────┘
3 rows.
Table “integration-test-sdf”.sdf_ecommerce_example.products
┌────────────┬──────────────┬────────┬─────────────┐
│ product_id ┆ product_name ┆ price ┆ category │
╞════════════╪══════════════╪════════╪═════════════╡
│ 1 ┆ Laptop ┆ 999.99 ┆ Electronics │
│ 2 ┆ Headphones ┆ 199.99 ┆ Electronics │
│ 3 ┆ Coffee Maker ┆ 89.99 ┆ Appliances │
└────────────┴──────────────┴────────┴─────────────┘
3 rows.
Table “integration-test-sdf”.sdf_ecommerce_example.aggregate_orders
┌─────────────┬───────────────┬──────────────┬─────────────┐
│ customer_id ┆ customer_name ┆ total_orders ┆ total_spent │
╞═════════════╪═══════════════╪══════════════╪═════════════╡
│ 1 ┆ John Doe ┆ 1 ┆ 999.99 │
│ 2 ┆ Jane Smith ┆ 1 ┆ 179.98 │
│ 3 ┆ Sam Johnson ┆ 1 ┆ 199.99 │
└─────────────┴───────────────┴──────────────┴─────────────┘
3 rows.
Table “integration-test-sdf”.sdf_ecommerce_example.customers_over_100
┌─────────────┬───────────────┬──────────────┬─────────────┐
│ customer_id ┆ customer_name ┆ total_orders ┆ total_spent │
╞═════════════╪═══════════════╪══════════════╪═════════════╡
│ 1 ┆ John Doe ┆ 1 ┆ 999.99 │
│ 2 ┆ Jane Smith ┆ 1 ┆ 179.98 │
│ 3 ┆ Sam Johnson ┆ 1 ┆ 199.99 │
└─────────────┴───────────────┴──────────────┴─────────────┘
3 rows.
Congratulations! You’ve just materialized your first tables in BigQuery with SDF, and even got a sneak peek at the data they produced.