> ## Documentation Index
> Fetch the complete documentation index at: https://docs.sdf.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Workspaces

> In this document, we're going to discuss a critical part of the SDF ecosystem - the workspace.

An SDF workspace is a directory which defines the data transformation, quality,
and governance processes in your warehouse. It consists of SQL files representing [data
models](/guide/basics/lineage_metadata#source-sql-files) or [quality controls](/guide/data-quality/),
[configuration files](/reference/sdf-yml/), and additional file types supporting advanced data operations.

A YAML file, `workspace.sdf.yml`, defines the configuration for the project,
and will be used by SDF to build and deploy your project. This file is required in your
project and sdf will fail without it.

<Tip>
  Our engine supports multiple dialects in the same SDF project. That means even
  column-level lineage will work across SQL dialects.
</Tip>

## sdf.yml

As previously mentioned, the `workspace.sdf.yml` is a specific instance of an
`sdf.yml` file. High-level, `sdf.yml's` are general-purpose metadata descriptors
for describing things like:

* Where your queries are in the directory
* The SQL dialect your query is written in
* Which Checks should run
* The Schema and Catalog defaults
* Which classifiers exist and the columns they should be attached to
* And more...

<Note>
  The `workspace.sdf.yml` is actually a special instance of the general [sdf.yml](/reference/sdf-yml).
</Note>

The `sdf.yml` is structured such that there are a set of specific YAML blocks
titled by their purpose in your project. For example, you might have a `table`
block, a `classifier` block, and a `function` block all in the same file. Each of
these blocks will contain a set of properties that define the metadata for that
block. The types of metadata differ per block type, but they all share a few
properties.

`sdf.yml` files can be placed anywhere within your SDF project, and can be named
whatever you like, as long as the file extension is `.sdf.yml`. For example, you
could have a `tables.sdf.yml` file, a `classifiers.sdf.yml` file, and a
`functions.sdf.yml` file. Or you could have an `sdf.yml` per table defined in
your project. Maybe I have a table `six_hourly_ingest.sql` and a corresponding
`six_hourly_ingest.sdf.yml`. The possibilites are truly endless.

<Warning>
  `.sdf.yml` files *do not support* [YML anchors nor YML aliases](https://smcleod.net/2022/11/yaml-anchors-and-aliases/) due to limitations in our YML processor.
  If you're unfamiliar with YML anchors, don't worry about it. If you're familiar and feel like you really want them,
  request it in the Slack!
</Warning>

#### Shared Properties Among All Metadata Blocks

`name` - The name of the metadata block. Names must be unique within their block
type. For example, two tables cannot have the same name, but a table and a
classifier could share the name `pii`.

`description` - This is a free-form text field that can be used to describe the
metadata block. This is useful for documentation purposes.

<Tip>
  All this metadata is indexed and searchable in the SDF Cloud
</Tip>

## Metadata Blocks

Here we'll briefly review each of the metadata block types. Each serves its
own purpose, and is used in different ways. For a full list of the options
available in the `sdf.yml`, see the [sdf.yml](/reference/sdf-yml) reference.

### Workspace Block

The workspace block is a special metadata block that belongs in your
`workspace.sdf.yml`. This defines your workspace configuration, and is used by
the SDF engine to build and deploy your project.

#### Hello World Workspace

The simplest version of a `workspace.sdf.yml` looks like this:

```yaml theme={null}
workspace:
  name: hello_world
  edition: "1.3"
  includes:
    - path: models
```

So what's going on here? The first thing you'll notice is the workspace
specifies an `edition`. This tells the SDF engine which version of the workspace
to expect when compiling your project.

Next we see an `includes` block. This tells the SDF engine where to look to find
the code sources it will use to build your project and transform your data.
Simply put, your SQL files should go into the `models` directory.

For a full list of the options available in the `workspace.sdf.yml`, see the
[sdf.yml](/reference/sdf-yml) reference.

#### Important Properties

```yml theme={null}
workspace:
  name: moms_flower_shop
  edition: "1.3"
  description: >
    This workspace represents the data warehouse of mom's flower shop. 

    It contains raw data regarding:
    1. Customers
    2. Marketing campaigns
    3. Mobile in-app events
    4. Street addresses

    That data is available in the seeds folder and is referenced in models/raw
    to be loaded and used by SDF. Data transformations are performed and additional 
    models are available in the staging and analytics folders under the models folder.

  includes:
    - path: models
      type: model
      index: schema-table-name
    - path: seeds/parquet
      type: resource
    - path: metadata
      type: metadata
      index: schema-table-name
    - path: classifications
      type: metadata
    - path: reports
      type: report
    - path: checks
      type: check
  
  defaults: 
    preprocessor: jinja
---
environment:
  name: dev
  integrations:
    - provider: sdf
      type: database
      targets:
        - pattern: moms_flower_shop.*.*
          rename-as: moms_workshed.${1}.${2}


```

* `name` - The name of the workspace. This is used to create the default catalog
  for the workspace.
* `edition` - The edition of the workspace. This is used to determine which
  version of the workspace to expect when compiling your project. By default this is `1.2`.
* `includes` - The include block references all metadata, code, and that will be used to build your project and transform your data. Can be a file, folder, or glob pattern.
  Each includes element can specify additional properties like the [IncludeType](/reference/sdf-yml#enum-includetype).
* `excludes` - The excludes block defines where to explicitly ignore code sources or metadata. Can be a file, folder, or glob pattern.
* `defaults` - The defaults block contains all defaults for the workspace. In this case, we're specifying the default catalog, schema, and dialect.
* `dialect` - The default SQL dialect that will be used to build your project.
  SDF supports multi-dialect projects, so you can have different SQL dialects in
  the same project by configuring the dialect in the
  [table metadata block](#table-block). However, most projects will only use
  one.

#### Includes Blocks

The includes sub-property exist only on the [workspace block](#workspace-block) and the [environment block](/guide/setup/environments).
The purpose of this block is to tell SDF where to find the models, metadata specifications, macros, reports, and resources like seed data. The includes property contains a list of
includes blocks, each of which are dictionaries that share the same structure. The most commonly used properties on these includes blocks are:

* `path` - The relative path to the directory or file in the workspace containing the models, metadata, resources, etc. This path is relative to the `workspace.sdf.yml`.
* `type` - The type of assets being included by this block. Some commonly used types are `model`, `spec`, and `macro`.
* `defaults` - This block models the defaults block at the top level of the workspace or environment block. It enables you to overwrite default properties on a per-includes-path basis.

SDF will auto-infer the type of the includes block if the directory name matches a certain naming convention. We consider this a best practice and encourage users to follow these conventions.

| Directory Name | Type       |
| -------------- | ---------- |
| models         | `model`    |
| specs          | `spec`     |
| reports        | `report`   |
| checks         | `check`    |
| macros         | `macro`    |
| tests          | `test`     |
| seeds          | `seed`     |
| resources      | `resource` |

<Tip>
  Given the naming conventions mentioned above, the following includes block would work without any type specifications:

  ```yaml workspace.sdf.yml theme={null}
  includes:
    - path: models
    - path: specs
    - path: reports
    - path: checks
    - path: macros
    - path: seeds
    - path: resources
  ```
</Tip>

### Table Block

The table block is used to define the metadata for a table. This includes the
table name, the table description, the table's columns, and even the table's SQL dialect.
This will likely be your most widely used metadata block. For more detailed
information on the table block, see the [Table Block](/reference/sdf-yml#table) reference.

#### Table-Level Classifiers

SDF enables you to tag tables via table-level classifiers, which allows you to quickly identify all
tables under a certain category. Some examples for table-level classifiers can be `customer_facing`, `debugging`,
`sensitive_information`, and more.

Here's an example of a table definition with a table-level classifier:

```yaml theme={null}
table:
  name: main
  classifiers:
    - purpose.third_party_sharing
```

#### Column-Level Classifiers

One of the most important use cases for the table block is defining column-level
classifiers. SDF will propagate these classifiers to downstream dependencies
automatically. Here's an example of a basic table definition with a column-level
classifier:

```yaml theme={null}
table:
  name: main
  columns:
    - name: phone
      classifiers:
        - pii.phone
```

This table definition attaches the `pii.phone` classifier to the `phone` column.

### Classifier Block

The classifier block is used to define classifiers in your SDF projects. You can
think of these as units of metadata to be reused and measured across your data
warehouse. Classifiers have names, descriptions, and labels.
For more detailed information on the classifier block, see the
[Classifier Block](/reference/sdf-yml#nested-element-classifier) reference.

In light of data privacy regulations, a common use case for classifiers is to
define PII columns. You can define a classifier for PII columns, then apply that
classifier to any column in your project and watch it propagate automatically to
downstream queries. This will allow you to easily identify PII columns across
your data warehouse, and even enforce data governance checks on them in the
future.

Here's an example of a basic PII classifier:

```yaml theme={null}
classifier:
  name: pii
  description: Personally Identifiable Information
  labels:
    - name: email
      description: Email address
    - name: phone
      description: Phone number
    - name: ssn
      description: Social Security Number
```

This classifier can be used across columns and tables alike to tell the SDF
engine where to find `PII`. See the [Table Block](#table-block) section for an
example of how to apply this classifier to a column.

<Tip>
  SDF automatically propagates classifiers to downstream dependencies. This
  helps you keep your metadata up to date and consistent across your entire data
  warehouse without having to manually update it everywhere. This makes it
  easier to stay compliant with data privacy regulations like GDPR and CCPA.
</Tip>

### Generated Metadata

SDF represents all the metadata about your project in a local `sdftarget` directory. After
running an SDF command, you'll find this new directory appears in your SDF workspace.
This is automatically added to your `.gitignore` and should not be tracked in git.
As the name implies, this folder enables SDF to cache intermediaries as you're
building your project locally. Along with its caching capability, it also
generates a set of `sdf.yml` files after running any SDF command.

These `sdf.yml` files represent all the user-defined and generated SDF metadata across your project.
This metadata is used to enable features like column-level lineage, classifier
propagation, and more.

Here's an example of a full-fledged table definition generated in the `sdftarget`:

<div className="bg-[#0F1117] dark:bg-codeblock rounded-xl dark:ring-1 dark:ring-gray-800/50 relative">
  <pre style={{ fontFamily: 'monospace', backgroundColor: 'transparent' }} className="language-yml">
    <code className="language-yml">
      table:
        name: moms\_flower\_shop.analytics.agg\_installs\_and\_campaigns
        dialect: trino
        casing-policy: preserve
        materialization: view
        purpose: model
        dependencies:
        - moms\_flower\_shop.staging.app\_installs\_v2
        columns:
        - name: install\_date
          datatype: varchar
          lineage:
            modify:
            - moms\_flower\_shop.staging.app\_installs\_v2.install\_time
        - name: campaign\_name
          description: The campaign name associated with the campaign\_id
          datatype: varchar
          lineage:
            copy:
            - moms\_flower\_shop.staging.app\_installs\_v2.campaign\_name
        - name: platform
          description: iOS or Android
          datatype: varchar
          lineage:
            copy:
            - moms\_flower\_shop.staging.app\_installs\_v2.platform
        - name: distinct\_installs
          datatype: bigint
          lineage:
            modify:
            - moms\_flower\_shop.staging.app\_installs\_v2.customer\_id
        classifiers:
        - RETENTION.infinity
        lineage:
          scan:
          - moms\_flower\_shop.staging.app\_installs\_v2.campaign\_name
          - moms\_flower\_shop.staging.app\_installs\_v2.install\_time
          - moms\_flower\_shop.staging.app\_installs\_v2.platform
        source-locations:
        - path: metadata/analytics/agg\_installs\_and\_campaigns.sdf.yml
        - path: models/analytics/agg\_installs\_and\_campaigns.sql
    </code>
  </pre>
</div>

These generated `sdf.yml` files are a powerful representation that can help you understand what
SDF is doing under the hood. You can use these files to understand how SDF
is interpreting your project and debug issues that might arise. We hope developers will take this representation and build their own
tools and functionalities on top of it in the future.

<Note>
  Want to build something cool that utilizes the SDF engine? We'd love to hear
  about it! [Inquire here](https://www.sdf.com/support) to let us
  know what you're working on and how we can help.
</Note>
