Overview
SDF supports three types of integrations: databases (i.e. data warehouses), data sources, and metadata sources.
SDF supports a wide variety of integrations to ensure the engine can seamlessly fit into your existing infrastructure. We define integrations as explicit support for interactions with external systems, often via network requests.
SDF’s support for compiling and locally executing dialects is not covered in this document. See Features for more information on supported dialects.
These integrations can be broken down into three categories:
- Databases (i.e. data warehouses). Think Snowflake, Redshift, BigQuery, etc.
- Data Sources. Think S3, GCS, and Azure Blob Storage.
- Metadata Sources. Think Iceberg or AWS Glue.
These categories are distinct but not mutually exclusive: databases (data warehouses) can also act as data sources and metadata sources. What sets databases apart is that queries authored with SDF can actually be executed on them.
Databases
Databases are the most common type of integration and the primary target for queries authored with SDF. They are also unique in that they can act as all three integration types: databases, data sources, and metadata sources.
Currently, SDF supports the following databases:
| Feature | Snowflake | Redshift | BigQuery |
|---|---|---|---|
| Metadata Source | 🟢 | 🟢 | 🟢 |
| Data Source | 🟢 | 🔴 | 🟢 |
| Materialization | 🟢 | 🔴 | 🟢 |
See these links for official guides on how to get started with each database:
Support for materialization in Redshift, and for BigQuery more generally, is under active development and coming soon.
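To make the support table concrete, here is a minimal Python sketch that encodes the same matrix as a lookup structure and answers questions like "which databases can SDF materialize to?". It is purely illustrative: the `Feature` enum, the `SUPPORT` dictionary, and `databases_supporting` are hypothetical names, not part of any SDF API, and the values are copied from the table as it currently stands.

```python
from enum import Enum


class Feature(Enum):
    """Hypothetical names for the rows of the database support table."""
    METADATA_SOURCE = "Metadata Source"
    DATA_SOURCE = "Data Source"
    MATERIALIZATION = "Materialization"


# Transcribed from the support table in this document (True = 🟢, False = 🔴).
SUPPORT: dict[str, dict[Feature, bool]] = {
    "Snowflake": {Feature.METADATA_SOURCE: True, Feature.DATA_SOURCE: True, Feature.MATERIALIZATION: True},
    "Redshift": {Feature.METADATA_SOURCE: True, Feature.DATA_SOURCE: False, Feature.MATERIALIZATION: False},
    "BigQuery": {Feature.METADATA_SOURCE: True, Feature.DATA_SOURCE: True, Feature.MATERIALIZATION: True},
}


def databases_supporting(feature: Feature) -> list[str]:
    """Return the databases marked 🟢 for `feature`, in table column order."""
    return [db for db, features in SUPPORT.items() if features[feature]]


if __name__ == "__main__":
    print(databases_supporting(Feature.MATERIALIZATION))  # ['Snowflake', 'BigQuery']
```

Keeping the matrix as data rather than prose means a change in support status is a one-cell edit.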
Data Sources
Data sources are used to read data into SDF from external systems. The most common use case is pulling data down for local execution with the SDF DB.
Currently, SDF supports the following data sources:
- S3
- Snowflake
- BigQuery
See our File Formats doc for information on the supported data formats for each data source.
Support for additional data sources like GCS is under active development and coming soon.
Metadata Sources
Metadata sources supply metadata about the tables in your data warehouse. In particular, they provide SDF with the table schemas it needs for compilation and type checking.
Currently, SDF supports the following metadata sources:
- Apache Iceberg (via metadata stores like AWS Glue)
- AWS Glue
- Snowflake
- Redshift
- BigQuery
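Because the categories overlap, it can help to think of each integration as holding a set of roles rather than a single type. The following minimal Python sketch uses `enum.Flag` to derive those role sets from the lists in this document. The names `Role`, `LISTED`, `roles_by_integration`, and `integrations_with` are hypothetical and for illustration only; they are not an SDF API. Per these lists, Snowflake and BigQuery hold all three roles, while Redshift is currently listed only as a database and a metadata source.

```python
from enum import Flag, auto


class Role(Flag):
    """Hypothetical model of the three integration categories as combinable flags."""
    DATABASE = auto()
    DATA_SOURCE = auto()
    METADATA_SOURCE = auto()


# Transcribed from this document: the Databases table and the
# Data Sources and Metadata Sources lists. Apache Iceberg is
# supported via metadata stores like AWS Glue.
LISTED = {
    Role.DATABASE: ["Snowflake", "Redshift", "BigQuery"],
    Role.DATA_SOURCE: ["S3", "Snowflake", "BigQuery"],
    Role.METADATA_SOURCE: ["Apache Iceberg", "AWS Glue", "Snowflake", "Redshift", "BigQuery"],
}


def roles_by_integration() -> dict[str, Role]:
    """Union every category an integration appears in into a single Role value."""
    roles: dict[str, Role] = {}
    for role, names in LISTED.items():
        for name in names:
            roles[name] = roles.get(name, Role(0)) | role
    return roles


def integrations_with(required: Role) -> list[str]:
    """Return integrations holding every role in `required`, in first-seen order."""
    return [name for name, roles in roles_by_integration().items() if required in roles]


if __name__ == "__main__":
    everything = Role.DATABASE | Role.DATA_SOURCE | Role.METADATA_SOURCE
    print(integrations_with(everything))  # ['Snowflake', 'BigQuery']
```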
Others
Beyond the three main categories that enable SDF query compilation and materialization, SDF supports the following integrations for more specialized use cases:
- GitHub - SDF offers an official open-source GitHub Action for running SDF in CI/CD workflows.
- dbt - SDF can provide static impact analysis, column-level lineage, data classification and governance, and more alongside dbt projects.
- Databricks - SDF can ingest and compile Spark Logical Plans from Databricks Spark clusters to power column-level lineage and data classification.
- Dagster - SDF workspaces can be orchestrated with Dagster for better scheduling, monitoring, and execution of data workflows.