1.2 ➡️ 1.3
Workspace edition 1.3 brings a new information schema, improvements to the SDF CLI, and an expansion of what was previously the provider
configuration. This guide will help you migrate your workspace from edition 1.2 to 1.3.
In versions 0.3.0
and above, workspace 1.3
is required. As a first step, please update your workspace block to the following:
Breaking Changes
Providers -> Integrations
SDF can read metadata and data from a variety of sources, including databases, data lakes, and data catalogs.
In the past, we referred to databases as providers
, then had confusing methods for configuring data lakes and data catalogs.
Now, all of these external relationships are managed in a single, unified integrations
block.
This replaces the providers
block in the workspace configuration.
If you have a workspace with a provider block like this:
It should now look like this:
This makes integrations much more extensible as we grow towards more data and metadata sources for local execution. For an in-depth guide on how to configure a certain integration or a description about our new integrations philosophy, see the integration documentation.
The type
field is now required for all integrations. This is to differentiate between database, data lake, and data catalog integrations.
Goodbye Compute
Previously, the compute
property was required in the defaults
block to tell SDF where to run the query, i.e. local
or remote
.
This is now inferred from the integrations block, rendering it unnecessary. It should now be removed from the defaults
block.
Should now look like this:
Information Schema V2
The information schema has been updated to include more metadata about your tables, and has been re-architected to enable performant queries on larger workspaces. If you have any checks or reports, these will need to be rewritten to use the new information schema.
For example, if you have a check that looks like this:
It should now look like this:
As you can see, new functions are now available for local execution, including CONTAINS_ARRAY_VARCHAR
.
For a full reference on the new information schema, see the information schema documentation.
Goodbye .sdfcache, hello sdftarget
Based on some quality user feedback, the .sdfcache
directory has been renamed to sdftarget
. This is where SDF will store all of its metadata and cache files.
This shouldn’t break anything as is, but if you have any scripts or processes that rely on the .sdfcache
directory, you’ll need to update them to use sdftarget
instead.
Furthermore, .gitignore
files should be updated to ignore sdftarget
instead of .sdfcache
.
Support for trino
syntax
We’ve enabled trino
to be used as an alias for trino
for the workspace’s dialect
property.
To see all accepted dialects, see the dialect documentation.
seeded: true
-> cycle-cut-point: true
Due to confusion with configuring seed
models and breaking cycles with the seeded
property, we’d renamed this field to cycle-cut-point
. This now explicitly marks this table as first table to be processed in a cycle.