Overview

SDF DB was designed with two goals in mind:

  1. Provide an integrated transformation layer and dependency aware database to make getting started with a data warehouse faster, more intuitive, and scalable.
  2. Use SDF’s Executable Semantics for multiple SQL dialects to enable query acceleration for proprietary data warehouses on compute agnostic storage like Iceberg and AWS S3.

SDF DB understands time, code, and data dependencies to optimize execution across many queries in a warehouse. SDF DB only executes parts of the DAG that have changed since the last run. This intelligence is managed by a sophisticated caching layer that fingerprints data, code, metadata to optimally recompute nodes that are out of date after a change.

Use SDF to run fast, in-memory, queries on data locally or in the cloud. The physical plan, optimizer and execution is based on Apache Datafusion for which we have the greatest appreciation.

SDF DB is in alpha with breaking changes, new functionality, and more coming all the time. If you are interested in using the database product, please contact us.

Using SDF Database

It’s simple, SDF allows you to set an execution context per query. In fact, all checks and reports are executed with SDF’s builtin execution engine already!

You can get started with the database in just a couple of commands.

sdf new sample_db && cd sample_db
sdf run --show all

To explicitly set the execution context and dialects, you can configure global, or even per-table settings via SDF’s defaults paradigm.

If you wanted you could execute one query on Snowflake, another query on Redshift, then collect results from both locally on your laptop and then run an aggregation on the results locally on your laptop! (We’re not sure why you would want to though…) You can also

workspace:
  ...
  defaults: 
    dialect: trino

What SDF DB is Not

As the line between “Database”, “Query Engine”, and “Execution Runtime” is being blurred, it’s important to note what SDF DB is not.

SDF is not designed to be a Online Transactional Processing, (OLTP) and as such is not a replacement for MySQL or PostgreSQL.

Data Provider Capabilities

Data may come from many sources, either from a local file system, remote file system, or metadata storage layer.

SDF DB supports the following data locations.

FeaureSDF DB
Local File System🟢
AWS Glue🟢
Iceberg🟢
AWS S3🟢
Delta Lake🟡
Google Cloud Storage🔴
Azure Bulk Storage🔴