Overview
SDF is open core and powered by Apache DataFusion. While certain parts of SDF may be closed source, SDF’s core SQL execution, function typing, and logical plan representation are all open source. SDF workspaces can also depend on one another, which lets you build and share your own open source libraries. This is especially useful for sharing common functions, macros, and models across your organization.
Open Source Libraries
SDF offers a rich open source library ecosystem that you can use to extend the functionality of your SDF workspace. These libraries are maintained by the SDF community and are available for use in your own SDF projects. The SDF binary includes three open source libraries by default. Contributions are welcome, and the source code can be found in the GitHub repos linked below as well as in the sdftarget directory of your workspace.
SDF Materialization
This library contains the macros used to materialize SDF models in cloud data warehouses. It is a great place to start for anyone looking to author their own custom materializations. During preprocessing, a query first has its macros expanded and is then inserted into a materialization template; this library is responsible for that second step. Materialization from these macros happens before the query is compiled.
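To make the two-step order concrete, here is a minimal Python sketch. It is not SDF’s implementation: real SDF macros and materializations are Jinja, and the MACROS dictionary, the {{ name() }} placeholder handling, TABLE_TEMPLATE, and the CREATE OR REPLACE TABLE statement are all illustrative assumptions.

```python
from string import Template

# Hypothetical macro library: macro name -> SQL fragment it expands to.
MACROS = {
    "active_filter": "status = 'active'",
}

# Hypothetical materialization template (a "table" materialization).
TABLE_TEMPLATE = Template("CREATE OR REPLACE TABLE $name AS (\n$body\n)")


def expand_macros(sql: str, macros: dict) -> str:
    """Step 1: replace each '{{ macro_name() }}' call with its SQL fragment."""
    for name, fragment in macros.items():
        sql = sql.replace("{{ " + name + "() }}", fragment)
    return sql


def materialize(name: str, sql: str) -> str:
    """Step 2: fit the already-expanded query into the materialization template."""
    return TABLE_TEMPLATE.substitute(name=name, body=sql)


model_sql = "SELECT id, email FROM users WHERE {{ active_filter() }}"
# Prints a CREATE OR REPLACE TABLE statement wrapping the expanded query.
print(materialize("analytics.active_users", expand_macros(model_sql, MACROS)))
```

Keeping the two steps as separate functions mirrors the division of labor described above: macro expansion knows nothing about warehouses, while the materialization step only wraps a finished query.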
SDF Tests
This library contains the macros used to author simple data tests with SDF (e.g. unique(), not_null()). It also supports writing your own custom data tests through simple wrapper macros.
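As an illustration of what simple data tests like these check, the Python sketch below uses one common convention (assumed here, not taken from SDF): each test is a query that selects violating rows, and the test passes when that query returns nothing. The helper names and the users table are hypothetical, and SQLite stands in for a real warehouse.

```python
import sqlite3


def not_null_test(table: str, column: str) -> str:
    """Hypothetical not_null check: select rows where the column is NULL."""
    # Identifiers are interpolated directly; acceptable only for trusted names.
    return f"SELECT * FROM {table} WHERE {column} IS NULL"


def unique_test(table: str, column: str) -> str:
    """Hypothetical unique check: select values that appear more than once."""
    return (
        f"SELECT {column}, COUNT(*) AS n FROM {table} "
        f"GROUP BY {column} HAVING COUNT(*) > 1"
    )


def run_test(conn: sqlite3.Connection, sql: str) -> bool:
    """A test passes when its query returns no violating rows."""
    return len(conn.execute(sql).fetchall()) == 0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "a@x.com"), (2, None), (2, "b@x.com")],
)

print(run_test(conn, not_null_test("users", "id")))     # True
print(run_test(conn, not_null_test("users", "email")))  # False: one NULL email
print(run_test(conn, unique_test("users", "id")))       # False: id 2 appears twice
```

Expressing a test as "query for violations" makes custom tests easy to add: a wrapper only has to produce another query of the same shape.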
SDF Utils
This library contains a collection of Jinja utilities for working with SDF models. It is also a good example of how to author your own SDF libraries.
Using Open Source Libraries
To use an open source library in your SDF workspace, add it as a dependency in your workspace.sdf.yml file.
Recommended Libraries
SDF’s library ecosystem is still growing, but here’s a library we recommend checking out:
Library | Description |
---|---|
SDF Workspace Evaluator | This library contains some of the most-used reports and can serve as a model for authoring your own reports. Among the most popular are dead_column analysis (finds columns that are never used and waste compute and storage) and column_description_coverage (finds columns that are missing descriptions). |
We’ll continue to add our top picks here as the library ecosystem grows.
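The idea behind the two reports named above can be shown with a small Python sketch. This is not the SDF Workspace Evaluator’s implementation, which works from SDF’s own workspace metadata; the Column class and the referenced set, which stands in for column-level lineage, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Column:
    table: str
    name: str
    description: str = ""


def dead_columns(columns, referenced):
    """Columns never read by any downstream query (per the given lineage)."""
    return [c for c in columns if (c.table, c.name) not in referenced]


def description_coverage(columns):
    """Fraction of columns that have a non-empty description."""
    if not columns:
        return 1.0
    described = sum(1 for c in columns if c.description.strip())
    return described / len(columns)


columns = [
    Column("orders", "id", "Primary key"),
    Column("orders", "amount"),
    Column("orders", "legacy_flag"),
    Column("customers", "id", "Primary key"),
]
# Hypothetical lineage: (table, column) pairs read by some downstream model.
referenced = {("orders", "id"), ("orders", "amount"), ("customers", "id")}

print([f"{c.table}.{c.name}" for c in dead_columns(columns, referenced)])  # ['orders.legacy_flag']
print(description_coverage(columns))  # 0.5
```

Both reports reduce to simple set and count operations once column-level metadata is available, which is why they make good templates for custom reports.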
Open Source Components
SDF is built on a number of open source components that power the SDF CLI and platform. Most notably, local execution, among other features, is powered by Apache DataFusion. Our official fork, which we use internally, is open source and available for contributions here.
One of SDF’s more significant efforts is enabling local compilation and execution of dialect-specific functions and queries. Imagine running a Snowflake-native query locally on your machine against sampled data, then running the same query against your Snowflake warehouse in the cloud. Our ongoing work toward this is powered by our open source SQL Functions Crate.
If you run a query locally and find that a function hasn’t been implemented or is incorrectly typed, you can contribute to the SQL Functions Crate to help improve the local execution experience.
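To illustrate what "function typing" means here, the Python sketch below models each function as a set of overloads mapping argument types to a return type, and reports the two failure cases mentioned above: a missing function and a call with no matching overload. A crate is a Rust package, so the real implementation is in Rust; the SIGNATURES table, its simplified Snowflake-style types, and return_type are hypothetical.

```python
# Hypothetical, simplified signature table: function -> [(arg types, return type)].
SIGNATURES = {
    "UPPER": [(("VARCHAR",), "VARCHAR")],
    "LENGTH": [(("VARCHAR",), "NUMBER"), (("BINARY",), "NUMBER")],
}


class FunctionTypeError(Exception):
    """Raised when a call cannot be typed."""


def return_type(name, arg_types):
    """Resolve the return type of a call, or explain why it cannot be typed."""
    overloads = SIGNATURES.get(name.upper())
    if overloads is None:
        # Case 1: the function has not been implemented.
        raise FunctionTypeError(f"{name} is not implemented")
    for params, ret in overloads:
        if tuple(arg_types) == params:
            return ret
    # Case 2: the function exists, but no overload matches these argument types.
    raise FunctionTypeError(f"no overload of {name} accepts {tuple(arg_types)}")


print(return_type("length", ["VARCHAR"]))  # NUMBER
try:
    return_type("IFF", ["BOOLEAN", "NUMBER", "NUMBER"])
except FunctionTypeError as err:
    print(err)  # IFF is not implemented
```

In this model, a contribution for either case amounts to adding or correcting an entry in the signature table (plus the function's runtime behavior, which the sketch omits).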