Block: Workspace

FieldTypeDescription
edition:stringThe SDF edition, should always be 1.3 (1.2 is deprecated)
name:stringThe name of this workspace (defaults to the workspace directory name if not given). Name must be set for deployment.
description:stringA description of this workspace
repository:stringThe URL of the workspace source repository (defaults to ‘none’ if no repository is given)
includes:arrayAn array of directories and filenames containing .sql and .sdf.yml files
excludes:arrayAn array of directories and filenames to be skipped when resolving includes
dependencies:arrayDependencies of the workspace to other workspaces or to cloud database providers
integrations:arrayThe integrations for this environment
defaults:Defaults?Defaults for this workspace
source-locations:arrayWorkspace defined by this set of files
vars:objectA map of named values for setting SQL variables from your environment. Ex. -dt: dt, used in SQL as @dt, and in Jinja as {{ dt }}
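
As a rough illustration of how these fields fit together, here is a minimal sketch of a workspace block. The top-level `workspace:` key, the file name, the directory names, and every value shown are assumptions made for the example; only the field names come from the table above.

```yaml
# workspace.sdf.yml (file name assumed)
workspace:
  edition: "1.3"             # quoted so YAML reads it as a string
  name: my_workspace         # hypothetical; must be set for deployment
  description: Example workspace for illustration
  includes:
    - path: models           # hypothetical directory of .sql files
  excludes:
    - path: models/scratch   # hypothetical directory to skip
  vars:
    dt: "2024-01-01"         # referenced as @dt in SQL and {{ dt }} in Jinja
```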

Block: Table

FieldTypeDescription
name:string
description:string?
dialect:Dialect?The dialect of this table, defaults to `trino`
materialization:Materialization?The table-type of this table (new version)
purpose:TablePurpose?Specify what kind of table or view this is
origin:TableOriginThe origin of this table <remote> or <local>
exists-remotely:boolean?Whether the table exists in the remote DB (used for is_incremental macro)
table-location:TableLocation?Specify table location, defaults to none if not set
creation-flags:TableCreationFlags?Defines the table creation options, defaults to none if not set
incremental-options:IncrementalOptions?Options governing incremental table evaluation (only for incremental tables)
snapshot-options:SnapshotOptions?Options governing snapshot table evaluation (only for snapshot tables)
dependencies:arrayAll tables that this table depends on (syntax: catalog.schema.table)
depended-on-by:arrayAll tables that depend on this table (syntax: catalog.schema.table)
columns:arrayThe columns of the schema: name, type, metadata
partitioned-by:arrayThe partitioning format of the table
severity:Severity?The default severity for this table's tests and checks
tests:array
schedule:stringThe schedule of the table [expressed as cron]
starting:stringThe first date of the table [expressed by prefixes of RFC 3339]
classifiers:arrayAn array of classifier references
reclassify:arrayArray of reclassify instructions for changing the attached classifier labels
lineage:Lineage?Lineage, a tagged array of column references
location:string?Data is at this location
file-format:FileFormat?Store table in this format [only for external tables]
with-header:boolean?CSV data has a header [only for external tables]
delimiter:string?CSV data is separated by this delimiter [only for external tables]
compression:CompressionType?Json or CSV data is compressed with this method [only for external tables]
cycle-cut-point:boolean?If this table is part of a cyclic dependency then cut the cycle here
source-locations:arrayTable is defined by these .sql and/or .sdf files
sealed:boolean?This table is either backed by a create table ddl or by a table definition in yml that is the table’s complete schema
meta:objectMetadata for this table
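
A table block using a handful of these fields might look like the sketch below. The top-level `table:` key, the three-part name, the column names, and the `owner` metadata key are hypothetical; the field names and enum values are the ones listed in this reference.

```yaml
table:
  name: my_catalog.my_schema.users   # hypothetical catalog.schema.table name
  description: One row per registered user
  dialect: snowflake
  materialization: table
  severity: warning                  # default severity for tests and checks
  columns:
    - name: user_id
      description: Surrogate key
      datatype: bigint
    - name: email
      description: Contact address
      datatype: varchar
  meta:
    owner: data-platform             # hypothetical metadata key
```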

Block: Classifier

FieldTypeDescription
name:stringThe name of the classifier type
description:stringA description of this classifier type
labels:arrayNamed classifier labels
scope:ScopeScope of the classifier: table or column
cardinality:CardinalityCardinality of the classifier: zero-or-one, one, or zero-or-more
propagate:booleanWhether the classifier propagates from scope to scope or is a one-scope marker
source-locations:arrayClassifier defined by this set of .sdf files
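
The sketch below shows one way a classifier block could be written with these fields. The top-level `classifier:` key, the classifier name `PII`, and its labels are illustrative assumptions; `scope` and `cardinality` use values from the Scope and Cardinality enums in this reference.

```yaml
classifier:
  name: PII                          # hypothetical classifier name
  description: Personally identifiable information
  scope: column                      # attach to columns rather than tables
  cardinality: zero-or-more          # a column may carry several labels
  propagate: true                    # labels follow data from scope to scope
  labels:
    - name: email
      description: An email address
    - name: phone
      description: A phone number
```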

Block: Function

FieldTypeDescription
name:stringThe name of the function [syntax: [[catalog.]schema].function]
section:stringThe generic type bounds
dialect:Dialect?The dialect that provides this function
description:stringA description of this function
variadic:VariadicArbitrary number of arguments of a common type out of a list of valid types
kind:FunctionKindThe function kind
parameters:[Parameter]?The arguments of this function
optional-parameters:[OptionalParameter]?The optional arguments of this function
returns:Parameter?The results of this function (can be a tuple)
binds:arrayThe generic type bounds
volatility:VolatilityThe volatility of the function
examples:arrayExample uses of the function (tuples with input/output)
cross-link:stringLink to existing documentation, for example: https://trino.io/docs/current/functions/datetime.html#date_trunc
reclassify:arrayArray of reclassify instructions for changing the attached classifier labels
source-locations:arrayFunction defined by this set of .sdf files
implemented-by:FunctionImplSpec?
special:booleanFunction can be called without parentheses, as if it were a constant, e.g. current_date
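
To make the shape of a function block concrete, here is a hedged sketch of a declaration for a made-up scalar function. The top-level `function:` key, the function `my_schema.mask_email`, its parameter, and the example input/output are all hypothetical; the field names and the `kind` and `volatility` values come from this reference.

```yaml
function:
  name: my_schema.mask_email         # hypothetical function
  dialect: snowflake
  description: Replaces the local part of an email address
  kind: scalar
  volatility: pure                   # same input always yields same output
  parameters:
    - name: email
      datatype: varchar
  returns:
    datatype: varchar
  examples:
    - input: SELECT my_schema.mask_email('a@example.com')
      output: "***@example.com"      # illustrative only
```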

Block: Credential

FieldTypeDescription
name:stringThe name of the credential (default = ‘default’)
description:string?A description of this credential
source-locations:[FilePath]?Credential defined by this set of .sdf files

Options

sdf

FieldTypeDescription
type:string
variant:SdfAuthVariantVariant of the credential
headless-creds:HeadlessCredentials?Headless Credentials

snowflake

FieldTypeDescription
type:string
account-id:stringThe account id of your snowflake account (e.g. <orgname>.<accountname> or <locator>)
username:stringThe user name to connect to snowflake
password:stringThe password to connect to snowflake
role:string?The role to use for the connection
warehouse:string?The warehouse to use for the connection

aws

FieldTypeDescription
type:string
profile:string?The name of the profile to use in the AWS credentials file
default-region:string?The default region to use for this profile
role-arn:string?The arn of the role to assume
external-id:string?The external id to use for the role
use-web-identity:boolean?Whether to use a web identity for authentication
access-key-id:string?The access key id to use for the connection
secret-access-key:string?The secret access key to use for the connection
session-token:string?The session token to use for the connection

openai

FieldTypeDescription
type:string
api-key:stringThe api key to use for the connection

empty

FieldTypeDescription
type:string

Block: Environment

FieldTypeDescription
name:stringThe name of this workspace (defaults to the workspace directory name if not given). Name must be set for deployment.
description:stringA description of this workspace
repository:stringThe URL of the workspace source repository (defaults to ‘none’ if no repository is given)
includes:arrayAn array of directories and filenames containing .sql and .sdf.yml files
excludes:arrayAn array of directories and filenames to be skipped when resolving includes
defaults:Defaults?Defaults for this workspace
dependencies:arrayDependencies of the workspace to other workspaces or to cloud database providers
integrations:arrayThe integrations for this environment
vars:objectA map of named values for setting SQL variables from your environment. Ex. -dt: dt, used in SQL as @dt, and in Jinja as {{ dt }}
source-locations:arrayWorkspace defined by this set of files
preprocessor:Preprocessor?Experimental: This project has jinja

Enum: IncludeType

ValueDescription
modelModels are .sql files (ddls or queries) used for data transformation. To learn more: https://docs.sdf.com/reference/sdf-yml#enum-includetype
testTests are expectations against the data of models. Uses the sdf-test library by default. To learn more: https://docs.sdf.com/reference/sdf-yml#enum-includetype
checkChecks are logical expectations expressed as queries against the information schema of the SDF workspace. To learn more: https://docs.sdf.com/reference/sdf-yml#enum-includetype
reportReports are informational queries against the information schema of the SDF workspace. To learn more: https://docs.sdf.com/reference/sdf-yml#enum-includetype
statStatistics are informational queries against the data of the models. To learn more: https://docs.sdf.com/reference/sdf-yml#enum-includetype
resourceResources are local data files (like parquet, csv, or json) used by models in the workspace. To learn more: https://docs.sdf.com/reference/sdf-yml#enum-includetype
metadataMetadata are .sdf.yml files to add additional metadata. To learn more: https://docs.sdf.com/reference/sdf-yml#enum-includetype
seedSeeds are data files used both locally and in remote data warehouses. To learn more: https://docs.sdf.com/reference/sdf-yml#enum-includetype

Enum: IndexMethod

ValueDescription
scan-dbt
noneFile path is not used in figuring out the full names of the tables defined in the file. Catalog and schema are inferred from the SQL statement if possible, or from the defaults section in the workspace configuration. Table name is inferred from the SQL statement if possible or from the name of the file without the extension. To figure out dependencies between tables and files, SDF will parse every file in the corresponding include path. This is the default option
table-nameTable name is inferred from the file name without the extension. Catalog and schema are inferred from the defaults configuration. SDF assumes that a referenced table will reside in the file with the corresponding name. As a result, only the required subset of files is parsed. This option is fast compared to the default, but it requires catalog and schemas to use the default values
schema-table-nameTable name is inferred from the file name without the extension. Schema is inferred from the directory name in which the file resides. Catalog is inferred from the defaults configuration. SDF assumes that a referenced table will reside in the directory/file with the corresponding schema/table name. As a result, only the required subset of files is parsed. This option is also fast, and it allows the schema to vary based on the directory name, but it still requires catalog to use the default values
catalog-schema-table-nameTable name is inferred from the file name without the extension. Schema is inferred from the directory name in which the file resides. Catalog is inferred from the directory name in which the schema directory resides. Defaults are not used. SDF assumes that a referenced table will reside in the catalog/schema/table file with the corresponding schema/catalog/table names. As a result, only the required subset of files is parsed. This option is also fast, and it provides the most flexibility as catalog, schema and table names can all vary

Enum: Dialect

Supported dialects.

Values
snowflake
trino
bigquery
redshift
spark-lp

Enum: Preprocessor

Values
none
jinja
sql-vars
sql-logic-test
all

Enum: Materialization

ValueDescription
tablePermanent table; e.g. for Snowflake, it will generate CREATE OR REPLACE TABLE AS ...
transient-tableTransient tables are cost-optimized as the 90 day data retention period is not enforced. They also limit Snowflake time travel support. e.g. for Snowflake, it will generate CREATE OR REPLACE TRANSIENT TABLE AS ...
temporary-tableTemporary table; e.g. for Snowflake, it will generate CREATE OR REPLACE TEMPORARY TABLE AS ...
external-tableTable with pre-existing data; it is not populated by SDF. E.g. for Snowflake, it will generate CREATE OR REPLACE EXTERNAL TABLE ...
viewPermanent view (default option); e.g. for Snowflake, it will generate CREATE OR REPLACE VIEW AS ...
materialized-viewMaterialized view; e.g. for Snowflake, it will generate CREATE OR REPLACE MATERIALIZED VIEW AS ...
incremental-tableIncremental tables are permanent tables that are gradually populated and not overwritten from scratch during every SDF run. Different SQL statements will be generated depending on whether this table already exists or not. In the latter case, an equivalent of CREATE OR REPLACE TABLE will be generated; in the former case, one of INSERT INTO ..., MERGE ..., or DELETE FROM ..., followed by INSERT INTO ... will be generated depending on the parameters of incremental-options
snapshot-tableSnapshot tables are permanent tables that capture a snapshot of some underlying data at the time of running the query. Like incremental tables, snapshot tables are not recomputed from scratch during every SDF run
recursive-tableRecursive tables are permanent tables whose queries can have self-references. SDF will generate an equivalent of CREATE RECURSIVE TABLE AS .... Note: in Snowflake, recursive tables are not treated specially---a normal CREATE OR REPLACE TABLE AS... is generated for them
otherOther user-defined materializations; it is up to users to define their own materialization logic

Enum: TableCreationFlags

ValueDescription
create-newAttempts to create a new table; fails if the table already exists.
drop-if-existsDrops the existing table and creates a new one.
skip-if-existsSkips table creation if the table already exists.
create-or-replaceReplaces the existing table with a new one. Implementation may vary by DBMS.
create-if-not-existsCreates the table only if it doesn’t already exist.

Enum: SyncType

ValueDescription
alwaysSynchronizes directory on pull and push
on-pullSynchronizes directory on every pull
on-pushSynchronizes directory on every push
neverNever synchronizes directory

Enum: Severity

Supported severity levels.

Values
warning
error

Enum: CompressionType

ValueDescription
tar
bzip2BZIP2 Compression (.bz2)
gzipGZIP Compression (.gzip)
noneNone (default)

Enum: ExcludeType

ValueDescription
content
pathExcludes this path, can be a glob expression

Enum: IntegrationType

Values
data
metadata
database

Enum: ProviderType

Values
glue
redshift
snowflake
s3
sdf

Enum: TablePurpose

ValueDescription
report
modelA regular table
checkA code contract
testA data contract
statA data report
systemAn SDF system table/view, maintained by the sdf cli
external-systemAn External System table/view, maintained by the database system (e.g. Snowflake, Redshift, etc.)

Enum: TableOrigin

Values
local
remote

Enum: TableLocation

Values
mirror
local
remote
intrinsic

Enum: IncrementalStrategy

Values
append
merge
delete+insert

Enum: OnSchemaChange

ValueDescription
failFail when the new schema of an incremental table does not match the old one
appendKeep all the columns of the old incremental table schema while appending the new columns from the new schema. Initialize all the columns that were deleted in the new schema with NULL
syncDelete all the deleted columns from the incremental table; Append all the new columns

Enum: SnapshotStrategy

Values
timestamp
check

Enum: CheckColsSpec

ValueDescription
colsA list of column names to be used for checking if a snapshot’s source data set was updated
allUse all columns to check whether a snapshot’s source data set was updated

Enum: FileFormat

Store table data in these formats.

Values
parquet
csv
json
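
For external tables, the `file-format` value combines with the `location`, `with-header`, `delimiter`, and `compression` fields of the Table block. The sketch below assumes a gzip-compressed CSV file; the table name and file path are hypothetical.

```yaml
table:
  name: my_catalog.my_schema.raw_events   # hypothetical name
  materialization: external-table         # data already exists; SDF does not populate it
  location: data/raw_events.csv.gz        # hypothetical path
  file-format: csv
  with-header: true                       # first row holds column names
  delimiter: ","
  compression: gzip
```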

Enum: Scope

Values
column
table

Enum: Cardinality

Values
zero-or-one
one
zero-or-more

Enum: Variadic

ValueDescription
non-uniform
uniformAll arguments have the same types
even-oddAll even arguments have one type, odd arguments have another type
anyAny length of arguments, arguments can be different types

Enum: FunctionKind

Values
scalar
aggregate
window
table

Enum: Volatility

ValueDescription
purePure - A pure function will always return the same output when given the same input.
stableStable - A stable function may return different values given the same input across different queries but must return the same value for a given input within a query.
volatileVolatile - A volatile function may change the return value from evaluation to evaluation. Multiple invocations of a volatile function may return different results when used in the same query.

Enum: FunctionImplSpec

ValueDescription
builtinBy a built-in primitive in Datafusion. (Being phased out in favor of UDFs.)
rustBy a UDF in the sdf-functions crate.
datafusionBy a UDF in the datafusion crate.
sqlBy a CREATE FUNCTION. (Not yet supported for evaluation.)

Enum: SdfAuthVariant

Values
interactive
headless
id_token_from_file

Nested Elements

Nested element: IncludePath

FieldTypeDescription
path:stringA filepath
type:IncludeTypeType of included artifacts: model, test, stats, metadata, resource
index:IndexMethodIndex method for this include path: scan, table, schema-table, catalog-schema-table
defaults:Defaults?Defaults for files on this path
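
Include paths are listed under the workspace's `includes` array. The following sketch assumes three hypothetical directories and uses values from the IncludeType and IndexMethod enums as spelled in their own tables (for example `schema-table-name`).

```yaml
workspace:
  name: my_workspace               # hypothetical
  edition: "1.3"
  includes:
    - path: models                 # hypothetical directory
      type: model
      index: schema-table-name     # schema from directory name, table from file name
    - path: checks                 # hypothetical directory
      type: check
    - path: metadata               # hypothetical directory of .sdf.yml files
      type: metadata
```

With `schema-table-name`, a file such as `models/staging/orders.sql` would be read as table `orders` in schema `staging`, with the catalog taken from the defaults.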

Nested element: Defaults

FieldTypeDescription
environment:string?The default environment (can only be set on the level of the workspace)
dialect:Dialect?The dialect of this environment. If not set, defaults to trino
preprocessor:Preprocessor?The preprocessor for this environment. If not set, defaults to local
catalog:string?Defines a default catalog. If not set, defaults to the (catalog/workspace) name in an outer scope
schema:string?Defines a default schema. If not set, defaults to the schema name in an outer scope; if that is not set either, defaults to ‘pub’
materialization:Materialization?Defines the default materialization. If not set, defaults to the materialization in an outer scope; if that is not set either, defaults to base-table
creation-flag:TableCreationFlags?Defines table creation flags, defaults to if not set
utils-lib:string?The default utils library, if set overrides sdf_utils
test-lib:string?The default test library, if set overrides sdf_test
materialization-lib:string?The default materialization library, if set overrides sdf_materialization
index-method:IndexMethod?The default index method for these tables
include-type:IncludeType?The default include type for these tables
sync-method:SyncType?The default sync method for these tables
severity:Severity?The default severity for this table's tests and checks
csv-has-header:boolean?CSV data has a header [only for external tables]
csv-delimiter:string?CSV data is separated by this delimiter [only for external tables]
csv-compression:CompressionType?Json or CSV data is compressed with this method [only for external tables]
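
Defaults can be set once at the workspace level so that individual tables do not have to repeat them. This is a sketch under the assumption that `defaults` nests directly under `workspace`; the catalog and schema names are hypothetical.

```yaml
workspace:
  name: my_workspace          # hypothetical
  edition: "1.3"
  defaults:
    dialect: snowflake        # otherwise trino
    preprocessor: jinja
    catalog: analytics        # hypothetical default catalog
    schema: staging           # hypothetical default schema
    materialization: view
    severity: warning
    csv-has-header: true      # applies to external CSV tables
```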

Nested element: ExcludePath

FieldTypeDescription
path:stringA filepath
exclude-type:ExcludeType?Type of excluded artifacts

Nested element: Dependency

FieldTypeDescription
name:string
path:string?The relative path from this workspace to the referenced workspace, for a Git repo, from the root of the depot to the workspace
environment:string?The chosen workspace environment (none means default)
target:string?The chosen workspace target (none means default)
git:string?The Git repo
rev:string?The Git revision (choose only one of the fields: rev, branch, tag)
branch:string?The Git branch (choose only one of the fields: rev, branch, tag)
tag:string?The Git tag (choose only one of the fields: rev, branch, tag)
imports:arrayWhich models, reports, tests, checks etc. to include from the dependency
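
Dependencies can point at a local workspace by path or at a workspace inside a Git repository. The sketch below shows both forms; the dependency names, paths, repository URL, and tag are placeholders.

```yaml
workspace:
  name: my_workspace                   # hypothetical
  edition: "1.3"
  dependencies:
    - name: shared_models
      path: ../shared_models           # relative path to a local workspace
    - name: common_checks
      git: https://example.com/org/common-checks.git   # placeholder URL
      path: workspaces/checks          # path from the repository root
      tag: v1.0.0                      # choose only one of rev, branch, tag
```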

Nested element: Integration

FieldTypeDescription
type:IntegrationTypeThe type of the integration [e.g.: database, metadata, data]
provider:ProviderTypeThe type of the provider [e.g.: snowflake, redshift, s3]
credential:string?Credential identifier for this provider
cluster-identifier:string?The cluster identifier for redshift server
batch-size:integer?The size of the batch when querying the provider
sources:arrayA list of (possibly remote) sources to read, matched in order, so write specific patterns before more general ones
targets:arrayA list of (possibly remote) targets to build, matched in order, so write specific patterns before more general ones; source patterns are excluded
buckets:arrayA list of remote buckets to target
output-location:string?The remote output location of the integration

Nested element: SourcePattern

FieldTypeDescription
pattern:stringA source that can be read. Sources must be three-part names with globs, e.g. *.*.* matches all catalogs, schemas and tables in scope
time-travel-qualifier:string?Time travel qualifier expression (e.g. `AT (TIMESTAMP => {{ SOME_TIMESTAMP }})`)
preload:boolean?Whether to preload the source
rename-from:string?Renames sources when searching in the remote. The ith {i} matches the ith * of the name, so to prepend _ to every catalog, schema and table, use "_{1}._{2}._{3}"

Nested element: TargetPattern

FieldTypeDescription
pattern:string?A pattern must be a three-part name with globs, e.g. *.*.* matches all catalogs, schemas and tables in scope
patterns:arrayA list of patterns. A pattern must be a three-part name with globs, e.g. *.*.* matches all catalogs, schemas and tables in scope
preload:boolean?Whether to preload the target
rename-as:string?Renames targets. The ith {i} matches the ith * of the name, so to prepend _ to every catalog, schema and table, use "_{1}._{2}._{3}"
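
Putting Integration, SourcePattern, and TargetPattern together, an integration entry might look like the sketch below. The catalog and schema names and the credential name are hypothetical; the `rename-as` value is the one given in the description above. Patterns that begin with `*` are quoted because a bare leading `*` has special meaning in YAML.

```yaml
workspace:
  name: my_workspace                     # hypothetical
  edition: "1.3"
  integrations:
    - type: database
      provider: snowflake
      credential: default
      sources:
        - pattern: raw.events.page_views # specific pattern first
          preload: true
        - pattern: raw.*.*               # more general pattern last
      targets:
        - pattern: "*.*.*"
          rename-as: "_{1}._{2}._{3}"    # prefix catalog, schema and table with _
```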

Nested element: DataBucket

FieldTypeDescription
uri:stringThe uri of the bucket
region:string?The region of the bucket

Nested element: FilePath

FieldTypeDescription
path:stringA filepath

Nested element: SystemTime

FieldTypeDescription
secs_since_epoch:integer
nanos_since_epoch:integer

Nested element: Constant

FieldTypeDescription

Nested element: IncrementalOptions

FieldTypeDescription
strategy:IncrementalStrategyIncremental strategy; may be one of `append`, `merge`, or `delete+insert`
unique-key:string?Expression used for identifying records in Merge and Delete+Insert strategies; May be a column name or an expression combining multiple columns. If left unspecified, Merge and Delete+Insert strategies behave the same as Append
merge-update-columns:arrayList of column names to be updated as part of Merge strategy; Only one of merge-update-columns or merge-exclude-columns may be specified
merge-exclude-columns:arrayList of column names to exclude from updating as part of Merge strategy; Only one of merge-update-columns or merge-exclude-columns may be specified
on-schema-change:OnSchemaChange?Method for reacting to a changing schema in the source of the incremental table. Possible values are `fail`, `append`, and `sync`. If left unspecified, the default behavior is to ignore the change and possibly error out if the schema change is incompatible. `fail` causes a failure whenever any deviation in the schema of the source is detected; `append` adds new columns but does not delete the columns removed from the source; `sync` adds new columns and deletes the columns removed from the source.
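
An incremental table pairs `materialization: incremental-table` with an `incremental-options` section on the Table block. The sketch below assumes a merge keyed on a hypothetical `event_id` column; table and column names are illustrative.

```yaml
table:
  name: my_catalog.my_schema.events    # hypothetical name
  materialization: incremental-table
  incremental-options:
    strategy: merge
    unique-key: event_id               # hypothetical key column
    merge-update-columns:              # do not combine with merge-exclude-columns
      - status
      - updated_at
    on-schema-change: append           # keep old columns, add new ones
```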

Nested element: SnapshotOptions

FieldTypeDescription
strategy:SnapshotStrategySnapshot strategy; may be one of `timestamp` (default), or `check`
unique-key:stringExpression used for identifying records that will be updated according to the snapshot strategy; May be a column name or an expression combining multiple columns
updated-at:string?Name of the timestamp column used to identify the last update time. This option is only required for the `timestamp` snapshot strategy
check-cols:CheckColsSpec?Specification of which columns to check for change (may be a list of column names or `all`). This option is only required for the `check` snapshot strategy
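
A snapshot table is configured in a similar way through `snapshot-options`. This sketch uses the `timestamp` strategy, which needs `updated-at`; the table and column names are hypothetical.

```yaml
table:
  name: my_catalog.my_schema.customers_snapshot   # hypothetical name
  materialization: snapshot-table
  snapshot-options:
    strategy: timestamp
    unique-key: customer_id      # hypothetical key column
    updated-at: updated_at       # hypothetical timestamp column
```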

Nested element: Column

FieldTypeDescription
name:stringThe name of the column
description:stringA description of this column
datatype:string?The type of this column
classifiers:arrayAn array of classifier references
lineage:Lineage?Lineage, a tagged array of column references
forward-lineage:Lineage?Forward Lineage, the columns that this column is used to compute
reclassify:arrayArray of reclassify instructions for changing the attached classifier labels
samples:arrayAn array of representative literals of this column [experimental!]
default-severity:SeverityThe default severity for this table's tests and checks
tests:array

Nested element: Lineage

FieldTypeDescription
copy:arrayThe output column is computed by copying these upstream columns
modify:arrayThe output column is computed by transforming these upstream columns
scan:arrayThese upstream columns are indirectly used to produce the output (e.g. in WHERE or GROUP BY)
apply:arrayThese functions were used to produce the output column

Nested element: Reclassify

FieldTypeDescription
to:string?Target classifier
from:string?Expected source classifier

Nested element: Constraint

FieldTypeDescription
expect:stringThe constraint macro: must have the form lib.macro(args,..), where lib is any of the libs in scope, std is available by default
severity:Severity?The severity of this constraint
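
Constraints appear in the `tests` arrays of tables and columns. The sketch below only illustrates the `lib.macro(args)` form; the macro names `not_null`, `unique`, and `in_range`, and the library `my_lib`, are assumptions and may not match the macros actually shipped with SDF.

```yaml
table:
  name: my_catalog.my_schema.users     # hypothetical name
  severity: warning                    # default for tests without their own severity
  columns:
    - name: user_id
      tests:
        - expect: std.not_null()       # hypothetical macro name
          severity: error              # overrides the table default
        - expect: std.unique()         # hypothetical macro name
    - name: age
      tests:
        - expect: my_lib.in_range(0, 120)   # hypothetical library and macro
```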

Nested element: Partition

FieldTypeDescription
name:stringThe name of the partition column
description:string?A description of the partition column
format:string?The format of the partition column [use strftime format for date/time]. See the guide: https://docs.sdf.com/guide/schedules
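
Partitions are listed under a table's `partitioned-by` array. The following sketch assumes a daily date partition; the table name, column name, and cron expression are illustrative.

```yaml
table:
  name: my_catalog.my_schema.daily_events   # hypothetical name
  partitioned-by:
    - name: dt                              # hypothetical partition column
      description: Event date
      format: "%Y-%m-%d"                    # strftime format
  schedule: "0 0 * * *"                     # cron: once a day at midnight
```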

Nested element: Label

FieldTypeDescription
name:stringThe name of the label, use "*" to allow arbitrary strings as labels
description:string?A description of this classifier element

Nested element: Parameter

FieldTypeDescription
name:string?The name of the parameter
description:string?A description of this parameter
datatype:string?The datatype of this parameter
classifier:[string]?An array of classifier references
constant:string?The required constant value of this parameter
identifiers:[string]?The parameter may appear as an identifier, without quotes

Nested element: OptionalParameter

FieldTypeDescription
name:stringThe name of the parameter
description:string?A description of this parameter
datatype:stringThe datatype of this parameter
classifier:[string]?An array of classifier references
constant:string?The required constant value of this parameter
identifiers:[string]?The parameter may appear as an identifier, without quotes

Nested element: TypeBound

FieldTypeDescription
type-variable:string
datatypes:array

Nested element: Example

FieldTypeDescription
input:stringThe sql string corresponding to the input of this example
output:stringThe output corresponding to running the input string

Nested element: RustFunctionSpec

FieldTypeDescription
name:string?The name attribute of the implementing UDF. None indicates the UDF is named the same as the function.

Nested element: DataFusionSpec

FieldTypeDescription
udf:string?The name attribute of the implementing UDF. None indicates the UDF is named the same as the function.

Nested element: Config

FieldTypeDescription
name:stringThe name of the configuration section
description:string?A description of this configuration section
properties:object?

Nested element: HeadlessCredentials

FieldTypeDescription
access_key:string
secret_key:string