In this document, we’re going to discuss a critical part of the SDF ecosystem: the workspace.
The workspace file, workspace.sdf.yml, defines the configuration for the project
and is used by SDF to build and deploy it. This file is required in your
project, and SDF will fail without it.
workspace.sdf.yml is a specific instance of an sdf.yml file. At a high level,
sdf.yml files are general-purpose metadata descriptors for things like tables,
classifiers, and functions.
An sdf.yml file is structured as a set of YAML blocks, each titled by its
purpose in your project. For example, you might have a table block, a
classifier block, and a function block all in the same file. Each of these
blocks contains a set of properties that define the metadata for that block.
The types of metadata differ per block type, but all blocks share a few
properties.
sdf.yml files can be placed anywhere within your SDF project and can be named
whatever you like, as long as the file extension is .sdf.yml. For example, you
could have a tables.sdf.yml file, a classifiers.sdf.yml file, and a
functions.sdf.yml file. Or you could have one sdf.yml per table defined in
your project: a table six_hourly_ingest.sql could have a corresponding
six_hourly_ingest.sdf.yml. The layout is up to you.
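As a minimal sketch of the one-file-per-table layout, six_hourly_ingest.sdf.yml could hold a single table block describing the model in six_hourly_ingest.sql. The description text is a hypothetical placeholder, and the unqualified table name is an assumption; a real project may need a fully qualified name.

```yaml
# six_hourly_ingest.sdf.yml, placed next to six_hourly_ingest.sql
table:
  name: six_hourly_ingest   # assumption: matches the SQL file name
  description: Events ingested every six hours.   # hypothetical text
```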
.sdf.yml files do not support YAML anchors or aliases due to limitations in our YAML processor.
If you’re unfamiliar with YAML anchors, don’t worry about it. If you’re familiar with them and feel like you really want them,
request them in the Slack!

All block types share the following properties:

name - The name of the metadata block. Names must be unique within their block
type. For example, two tables cannot have the same name, but a table and a
classifier could share the name pii.
description - A free-form text field that can be used to describe the
metadata block. This is useful for documentation purposes.
For a full list of the properties available in sdf.yml, see the sdf.yml reference.
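To make the block structure concrete, here is a hedged sketch of one file holding a table block, a classifier block, and a function block. It uses only the shared name and description properties. The file name, the function name, the description texts, and the use of `---` to separate blocks are assumptions made for illustration, not confirmed by this document.

```yaml
# metadata.sdf.yml (hypothetical file name)
# Assumption: each block is its own YAML document, separated by `---`.
table:
  name: pii                  # a table and a classifier may share a name
  description: Hypothetical table holding contact details.
---
classifier:
  name: pii                  # allowed: names are unique per block type only
  description: Marks personally identifiable information.
---
function:
  name: mask_phone           # hypothetical function name
  description: Masks all but the last four digits of a phone number.
```

Two table blocks named pii in the same project would conflict, while the table and classifier above would not.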
The workspace.sdf.yml file defines your workspace configuration and is used by
the SDF engine to build and deploy your project.
A workspace.sdf.yml declares an edition, which tells the SDF engine which
version of the workspace to expect when compiling your project.
It also contains an includes block, which tells the SDF engine where to find
the code sources it will use to build your project and transform your data.
Simply put, your SQL files should go into the models directory.
For a full list of the options available in workspace.sdf.yml, see the
sdf.yml reference.
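The pieces described above can be sketched as a small workspace.sdf.yml. This is an illustration under stated assumptions, not the canonical file: the top-level workspace key and the workspace name hello are assumed, and the edition is quoted so that YAML reads it as a string rather than a number.

```yaml
workspace:
  name: hello        # hypothetical workspace name
  edition: "1.2"     # version of the workspace the engine should expect
  includes:
    - path: models   # SQL files go in the models directory
```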
name - The name of the workspace. This is used to create the default catalog
for the workspace.
edition - The edition of the workspace. This determines which version of the
workspace to expect when compiling your project. By default this is 1.2.
includes - The includes block references all metadata and code that will be used to build your project and transform your data. Each entry can be a file, folder, or glob pattern, and can specify additional properties like the IncludeType.
excludes - The excludes block defines code sources or metadata to explicitly ignore. Each entry can be a file, folder, or glob pattern.
defaults - The defaults block contains all defaults for the workspace, such as the default catalog, schema, and dialect.
dialect - The default SQL dialect that will be used to build your project.
SDF supports multi-dialect projects, so you can have different SQL dialects in
the same project by configuring the dialect in the
table metadata block. However, most projects will only use one.

Each includes element supports the following properties:

path - The relative path to the directory or file in the workspace containing the models, metadata, resources, etc. This path is relative to the workspace.sdf.yml.
type - The type of assets being included by this block. Some commonly used types are model, spec, and macro.
defaults - This block mirrors the defaults block at the top level of the workspace or environment block. It enables you to override default properties on a per-includes-path basis.

Directory Name | Type |
---|---|
models | model |
specs | spec |
reports | report |
checks | check |
macros | macro |
tests | test |
seeds | seed |
resources | resource |
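The includes, excludes, and defaults properties might combine as in the sketch below. Every path, the workspace name, and the dialect values are hypothetical. The sketch also assumes that the table above lists the type applied when type is omitted, and that excludes entries take a path key like includes entries do; neither is confirmed here.

```yaml
workspace:
  name: hello                 # hypothetical workspace name
  edition: "1.2"
  defaults:
    dialect: snowflake        # hypothetical workspace-wide default dialect
  includes:
    - path: models            # assumed to be treated as type model
    - path: quality           # hypothetical directory not in the table
      type: check             # so the type is stated explicitly
    - path: models/legacy     # hypothetical subfolder
      defaults:
        dialect: trino        # overrides the default for this path only
  excludes:
    - path: models/scratch    # hypothetical folder to ignore
```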
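Tables can be tagged with classifiers such as customer_facing, debugging,
sensitive_information, and more.
A table definition can carry a table-level classifier, and a classifier can
also be attached to an individual column, for example the pii.phone classifier
on the phone column.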
A classifier block defines a classifier, for example one named PII. See the
Table Block section for an example of how to apply this classifier to a column.
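As a hedged sketch of how the two blocks fit together, the file below defines a PII classifier with a phone label and attaches it to a phone column. The labels list, the table name customers, the description text, and the exact casing of the PII.phone reference are assumptions. The columns and classifiers keys follow the shape of the generated table definition shown later in this document.

```yaml
classifier:
  name: PII
  description: Personally identifiable information.   # hypothetical text
  labels:                    # assumption: labels are listed under the classifier
    - name: phone
---
table:
  name: customers            # hypothetical table
  columns:
    - name: phone
      classifiers:
        - PII.phone          # <classifier name>.<label name>
```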
SDF creates an sdftarget directory. After running an SDF command, you’ll find
this new directory in your SDF workspace.
It is automatically added to your .gitignore and should not be tracked in git.
As the name implies, this folder enables SDF to cache intermediaries as you’re
building your project locally. Along with its caching capability, it also
generates a set of sdf.yml files after running any SDF command.
These sdf.yml files represent all the user-defined and generated SDF metadata across your project.
This metadata is used to enable features like column-level lineage, classifier
propagation, and more.
Here’s an example of a full-fledged table definition generated in the sdftarget directory:
table:
  name: moms_flower_shop.analytics.agg_installs_and_campaigns
  dialect: trino
  casing-policy: preserve
  materialization: view
  purpose: model
  dependencies:
    - moms_flower_shop.staging.app_installs_v2
  columns:
    - name: install_date
      datatype: varchar
      lineage:
        modify:
          - moms_flower_shop.staging.app_installs_v2.install_time
    - name: campaign_name
      description: The campaign name associated with the campaign_id
      datatype: varchar
      lineage:
        copy:
          - moms_flower_shop.staging.app_installs_v2.campaign_name
    - name: platform
      description: iOS or Android
      datatype: varchar
      lineage:
        copy:
          - moms_flower_shop.staging.app_installs_v2.platform
    - name: distinct_installs
      datatype: bigint
      lineage:
        modify:
          - moms_flower_shop.staging.app_installs_v2.customer_id
  classifiers:
    - RETENTION.infinity
  lineage:
    scan:
      - moms_flower_shop.staging.app_installs_v2.campaign_name
      - moms_flower_shop.staging.app_installs_v2.install_time
      - moms_flower_shop.staging.app_installs_v2.platform
  source-locations:
    - path: metadata/analytics/agg_installs_and_campaigns.sdf.yml
    - path: models/analytics/agg_installs_and_campaigns.sql
sdf.yml files are a powerful representation that can help you understand what
SDF is doing under the hood. You can use these files to see how SDF
is interpreting your project and to debug issues that arise. We hope developers
will build their own tools and functionality on top of this representation in the future.