Workspaces
In this document, we’re going to discuss a critical part of the SDF ecosystem - the workspace.
An SDF workspace is a directory which defines the data transformation, quality, and governance processes in your warehouse. It consists of SQL files representing data models or quality controls, configuration files, and additional file types supporting advanced data operations.
A YAML file, workspace.sdf.yml
, defines the configuration for the project,
and will be used by SDF to build and deploy your project. This file is required in your
project and sdf will fail without it.
Our engine supports multiple dialects in the same SDF project. That means even column-level lineage will work across SQL dialects.
sdf.yml
As previously mentioned, the workspace.sdf.yml
is a specific instance of an
sdf.yml
file. High-level, sdf.yml's
are general-purpose metadata descriptors
for describing things like:
- Where your queries are in the directory
- The SQL dialect your query is written in
- Which Checks should run
- The Schema and Catalog defaults
- Which classifiers exist and the columns they should be attached to
- And more…
The workspace.sdf.yml
is actually a special instance of the general sdf.yml.
The sdf.yml
is structured such that there are a set of specific YAML blocks
titled by their purpose in your project. For example, you might have a table
block, a classifier
block, and a function
block all in the same file. Each of
these blocks will contain a set of properties that define the metadata for that
block. The types of metadata differ per block type, but they all share a few
properties.
sdf.yml
files can be placed anywhere within your SDF project, and can be named
whatever you like, as long as the file extension is .sdf.yml
. For example, you
could have a tables.sdf.yml
file, a classifiers.sdf.yml
file, and a
functions.sdf.yml
file. Or you could have an sdf.yml
per table defined in
your project. Maybe I have a table six_hourly_ingest.sql
and a corresponding
six_hourly_ingest.sdf.yml
. The possibilites are truly endless.
Shared Properties Among All Metadata Blocks
name
- The name of the metadata block. Names must be unique within their block
type. For example, two tables cannot have the same name, but a table and a
classifier could share the name pii
.
description
- This is a free-form text field that can be used to describe the
metadata block. This is useful for documentation purposes.
All this metadata is indexed and searchable in the SDF Cloud
Metadata Blocks
Here we’ll briefly review each of the metadata block types. Each serves its
own purpose, and is used in different ways. For a full list of the options
available in the sdf.yml
, see the sdf.yml reference.
Workspace Block
The workspace block is a special metadata block that belongs in your
workspace.sdf.yml
. This defines your workspace configuration, and is used by
the SDF engine to build and deploy your project.
Hello World Workspace
The simplest version of a workspace.sdf.yml
looks like this:
So what’s going on here? The first thing you’ll notice is the workspace
specifies an edition
. This tells the SDF engine which version of the workspace
to expect when compiling your project.
Next we see an includes
block. This tells the SDF engine where to look to find
the code sources it will use to build your project and transform your data.
Simply put, your SQL files should go into the models
directory.
For a full list of the options available in the workspace.sdf.yml
, see the
sdf.yml reference.
Important Properties
name
- The name of the workspace. This is used to create the default catalog for the workspace.edition
- The edition of the workspace. This is used to determine which version of the workspace to expect when compiling your project. By default this is1.2
.includes
- The include block references all metadata, code, and that will be used to build your project and transform your data. Can be a file, folder, or glob pattern. Each includes element can specify additional properties like the IncludeType.excludes
- The excludes block defines where to explicitly ignore code sources or metadata. Can be a file, folder, or glob pattern.defaults
- The defaults block contains all defaults for the workspace. In this case, we’re specifying the default catalog, schema, and dialect.dialect
- The default SQL dialect that will be used to build your project. SDF supports multi-dialect projects, so you can have different SQL dialects in the same project by configuring the dialect in the table metadata block. However, most projects will only use one.
Includes Blocks
The includes sub-property exist only on the workspace block and the environment block. The purpose of this block is to tell SDF where to find the models, metadata specifications, macros, reports, and resources like seed data. The includes property contains a list of includes blocks, each of which are dictionaries that share the same structure. The most commonly used properties on these includes blocks are:
path
- The relative path to the directory or file in the workspace containing the models, metadata, resources, etc. This path is relative to theworkspace.sdf.yml
.type
- The type of assets being included by this block. Some commonly used types aremodel
,spec
, andmacro
.defaults
- This block models the defaults block at the top level of the workspace or environment block. It enables you to overwrite default properties on a per-includes-path basis.
SDF will auto-infer the type of the includes block if the directory name matches a certain naming convention. We consider this a best practice and encourage users to follow these conventions.
Directory Name | Type |
---|---|
models | model |
specs | spec |
reports | report |
checks | check |
macros | macro |
tests | test |
seeds | seed |
resources | resource |
Given the naming conventions mentioned above, the following includes block would work without any type specifications:
Table Block
The table block is used to define the metadata for a table. This includes the table name, the table description, the table’s columns, and even the table’s SQL dialect. This will likely be your most widely used metadata block. For more detailed information on the table block, see the Table Block reference.
Table-Level Classifiers
SDF enables you to tag tables via table-level classifiers, which allows you to quickly identify all
tables under a certain category. Some examples for table-level classifiers can be customer_facing
, debugging
,
sensitive_information
, and more.
Here’s an example of a table definition with a table-level classifier:
Column-Level Classifiers
One of the most important use cases for the table block is defining column-level classifiers. SDF will propagate these classifiers to downstream dependencies automatically. Here’s an example of a basic table definition with a column-level classifier:
This table definition attaches the pii.phone
classifier to the phone
column.
Classifier Block
The classifier block is used to define classifiers in your SDF projects. You can think of these as units of metadata to be reused and measured across your data warehouse. Classifiers have names, descriptions, and labels. For more detailed information on the classifier block, see the Classifier Block reference.
In light of data privacy regulations, a common use case for classifiers is to define PII columns. You can define a classifier for PII columns, then apply that classifier to any column in your project and watch it propagate automatically to downstream queries. This will allow you to easily identify PII columns across your data warehouse, and even enforce data governance checks on them in the future.
Here’s an example of a basic PII classifier:
This classifier can be used across columns and tables alike to tell the SDF
engine where to find PII
. See the Table Block section for an
example of how to apply this classifier to a column.
SDF automatically propagates classifiers to downstream dependencies. This helps you keep your metadata up to date and consistent across your entire data warehouse without having to manually update it everywhere. This makes it easier to stay compliant with data privacy regulations like GDPR and CCPA.
Generated Metadata
SDF represents all the metadata about your project in a local sdftarget
directory. After
running an SDF command, you’ll find this new directory appears in your SDF workspace.
This is automatically added to your .gitignore
and should not be tracked in git.
As the name implies, this folder enables SDF to cache intermediaries as you’re
building your project locally. Along with its caching capability, it also
generates a set of sdf.yml
files after running any SDF command.
These sdf.yml
files represent all the user-defined and generated SDF metadata across your project.
This metadata is used to enable features like column-level lineage, classifier
propagation, and more.
Here’s an example of a full-fledged table definition generated in the sdftarget
:
table:
name: moms_flower_shop.analytics.agg_installs_and_campaigns
dialect: trino
materialization: view
purpose: model
dependencies:
- moms_flower_shop.staging.app_installs_v2
columns:
- name: install_date
datatype: varchar
lineage:
modify:
- moms_flower_shop.staging.app_installs_v2.install_time
- name: campaign_name
datatype: varchar
lineage:
copy:
- moms_flower_shop.staging.app_installs_v2.campaign_name
- name: platform
datatype: varchar
lineage:
copy:
- moms_flower_shop.staging.app_installs_v2.platform
- name: distinct_installs
datatype: bigint
lineage:
modify:
- moms_flower_shop.staging.app_installs_v2.customer_id
classifiers:
- RETENTION.infinity
lineage:
scan:
- moms_flower_shop.staging.app_installs_v2.campaign_name
- moms_flower_shop.staging.app_installs_v2.install_time
- moms_flower_shop.staging.app_installs_v2.platform
source-locations:
- path: metadata/analytics/agg_installs_and_campaigns.sdf.yml
- path: models/analytics/agg_installs_and_campaigns.sql
These generated sdf.yml
files are a powerful representation that can help you understand what
SDF is doing under the hood. You can use these files to understand how SDF
is interpreting your project and debug issues that might arise. We hope developers will take this representation and build their own
tools and functionalities on top of it in the future.
Want to build something cool that utilizes the SDF engine? We’d love to hear about it! Inquire here to let us know what you’re working on and how we can help.