Local Compilation
Compile your SDF workspace offline without running any queries against your database.
Overview
Do you have a need for speed? Is SDF's Rust-powered performance still not enough for you? Do you want to compile your SDF project without running any queries against your database? If you answered yes to any of these questions, then you're in the right place! This guide will show you how to compile your SDF project entirely locally. The key to accomplishing this is hydrating your SDF workspace with local copies of the schemas for your remote sources.
Managing a locally compilable SDF workspace requires significantly more maintenance than a standard SDF workspace. We recommend that only advanced users with a strong understanding of SDF and general software engineering principles attempt this.
Architecture
The architecture of a locally compilable SDF workspace is similar to a standard SDF workspace. The primary difference is that the locally compilable workspace will have a local copy of the schemas for the remote sources. This local copy of the schemas will be used to compile the workspace without running any queries against the database.
We recommend storing these in a directory called `sources` in the root of your workspace. As such, a typical directory structure might be:
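For example, a sketch of such a layout (the `models` directory and the placeholder names are illustrative, not required):

```
workspace.sdf.yml
models/
  ...
sources/
  <catalog>/
    <schema>/
      <table>.sdf.yml
```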
For faster compilation, we then recommend structuring your source YML declarations using a catalog-schema-table hierarchy. So if you had sources coming from a database called `financials`, in a schema `public`, the file structure would optimally look like:
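A sketch of this layout (the table names `raw_customers` and `raw_orders` are hypothetical):

```
sources/
  financials/
    public/
      raw_customers.sdf.yml
      raw_orders.sdf.yml
```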
Source Declarations
In order to compile locally, we need the column names and datatypes for the tables our queries pull data from (i.e. sources). These can be declared as Local Schema Files: create a YML file for each source table in the `sources` directory. These files should contain the column names and datatypes for the source table (example below).
Since these source declarations are just SDF YML files, descriptions, tests, and other metadata like classifiers can easily be stored alongside the schema.
Let's imagine we have a source table called `raw_customers` (as seen above) with columns like a `customerid` and a `name`. An example of a schema declaration for this source table might look like:
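A hedged sketch of such a declaration follows. The exact YML nesting is an assumption, and the description and classifier values are hypothetical; verify the layout against a file SDF generates under `sdftarget/`:

```yml
# sources/financials/public/raw_customers.sdf.yml
table:
  name: raw_customers
  origin: remote
  columns:
    - name: customerid
      datatype: bigint
      description: Unique identifier for the customer
    - name: name
      datatype: varchar
```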
Let’s take note of a few things about this source declaration:
- The `origin` field is set to `remote` to indicate that this is a remote source. This is critical, as SDF will still fetch the remote schema unless this attribute is set.
- The `columns` field contains a list of column objects, each with a `name` and a `datatype` field. The `datatype` field must be a valid SQL datatype that corresponds to the column's datatype in the remote source.
- Other metadata like classifiers and descriptions can be added alongside the datatype. These will propagate downstream using our column-level lineage.
The easiest way to manually create these is by looking into the `sdftarget` directory after compiling using remote sources. In the example above, you can simply copy the SDF YML file produced by compilation, located at `sdftarget/dbg/table/financials/public/raw_customers.sdf.yml`. Note that if you use this method, it's recommended to remove the extra generated metadata like `purpose`, `case-policy`, `materialization`, and more.
We've seen users take a variety of approaches to generating these YML files programmatically. The most common is to use a protobuf representation of the sources, likely generated from the production database (like Postgres), and convert it into SDF YML as a pre-compile script.
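To make this concrete, here is a minimal sketch of such a generator. The source metadata arrives as plain Python data here rather than protobuf messages, and the rendered YML layout is an assumption — check a file generated under `sdftarget/` for the authoritative shape:

```python
# Sketch of a pre-compile script that renders SDF source declarations.
# In practice the (name, datatype) pairs might be decoded from a protobuf
# export of the production database; here they are plain tuples.

def to_sdf_yml(catalog, schema, table, columns):
    """Render a minimal source declaration with origin: remote."""
    lines = [
        "table:",
        f"  name: {catalog}.{schema}.{table}",
        "  origin: remote",
        "  columns:",
    ]
    for name, datatype in columns:
        lines.append(f"    - name: {name}")
        lines.append(f"      datatype: {datatype}")
    return "\n".join(lines) + "\n"

# Write one declaration per source table into the sources/ hierarchy.
print(to_sdf_yml("financials", "public", "raw_customers",
                 [("customerid", "bigint"), ("name", "varchar")]))
```

A script like this can run before `sdf compile` so the `sources` directory always reflects the production schemas.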
Incremental Models and Snapshots
When SDF compiles incremental models and snapshots, the compilation results are often modified by the incremental or snapshot mode. By default, this mode is set by simply checking whether the model exists in the remote database. Therefore, we need to override this default behavior if we'd like to compile incremental models and snapshots entirely locally.
This can be accomplished by passing the flag `--prefer-local` to the `sdf compile` command. This flag forces SDF to compile the model entirely locally, without checking the remote database.
The `--prefer-local` flag works by simply setting the incremental mode and snapshot mode variables to `true` during compilation, replacing the need to check the remote database.
However, let's say we wanted to compile these models locally but with incremental and snapshot mode off. We can do this by passing extra parameters to the `sdf compile` command, specifically `--no-incremental-mode` and `--no-snapshot-mode`.
Here are four examples and their expected results:

- `sdf compile --prefer-local`: All incremental models and snapshots compile without a database request, with incremental mode set to `true` and snapshot mode set to `true`.
- `sdf compile --prefer-local --no-incremental-mode`: All incremental models and snapshots compile without a database request, with incremental mode set to `false` and snapshot mode set to `true`.
- `sdf compile --prefer-local --no-snapshot-mode`: All incremental models and snapshots compile without a database request, with incremental mode set to `true` and snapshot mode set to `false`.
- `sdf compile --prefer-local --no-incremental-mode --no-snapshot-mode`: All incremental models and snapshots compile without a database request, with incremental mode set to `false` and snapshot mode set to `false`.
`--no-incremental-mode` and `--no-snapshot-mode` will not compile locally without `--prefer-local`; `--prefer-local` is required to prevent a request to the remote database.