> ## Documentation Index
> Fetch the complete documentation index at: https://docs.sdf.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Enriching your warehouse

## Overview

In the previous tutorial, we set up guardrails based on checks.
In this tutorial, we will see how SDF's semantic understanding
can help transform your data warehouse from strings and numbers
to real-world business logic:

* Maintain business logic consistency
* Control development environment and minimize mistakes from propagating
* Comply with privacy and GDPR requirements

Let's see how we could've prevented the V1 version of app\_installs in the first place.

## Prerequisites

Completion of the [previous tutorial](/tutorials/deprecating-a-model).

## Guide

<Steps>
  <Step title="Setup">
    If you haven't completed the [previous tutorial](/tutorials/deprecating-a-model),
    uncomment the relevant section to reference the metadata files:

    ```yml workspace.sdf.yml theme={null}
    ...
    # Uncomment here to begin the "Deprecating a Model" tutorial >>>>>>
    - path: classifications  # Classifiers
        type: metadata
    - path: reports  # Reports based on SDF s information schema
        type: report
    - path: checks  # Checks against SDF's information schema
        type: check
    # <<<<<<<
    ```
  </Step>

  <Step title="Intro to SDF Classifiers">
    SDF's has the ability to annotate columns and tables with user defined types which represent
    real-world business logic. Those types enrich the data warehouse and create new SQL types -
    instead of just BIGINTs, we can now have currencies, different types of IDs, zip-codes, and many more.

    SDF **automatically propagates** those types to downstream assets, enriching the entire
    data warehouse with a new layer of semantic understanding.

    <Tip>
      To learn more about SDF's classifiers, read our [classifiers guide](/guide/basics/classifiers).
    </Tip>
  </Step>

  <Step title="Creating New Classifiers">
    Let's focus back on Mom's Flower Shop.

    If you recall, V1 of `app_installs` had an incorrect `JOIN` between
    mobile app in-app events in the `raw_inapp_events` table, and marketing
    campaign events in the `raw_marketing_campaign_events` table:

    ```sql theme={null}
    ...
    FROM inapp_events i 
        LEFT OUTER JOIN raw.raw_marketing_campaign_events m
            ON (i.event_id = m.event_id) 
    ...
    ```

    Essentially, we were joining two elements that are completely different.
    Like comparing Apples to Oranges. These kind of mistakes happen all the time.
    Thankfully, we can leverage SDF's semantic understanding and smart propagation
    to set guardrails which will prevent future similar mistakes.

    Let's use SDF classifiers to add the missing business logic.

    The column classifiers file `classifications/column_classifiers.sdf.yml` already contains
    the event classifiers. Take a look yourself:

    ```yml classifications/column_classifiers.sdf.yml theme={null}
    classifier:
        name: EVENT
        labels:
            - name: inapp
            - name: marketing
    ```
  </Step>

  <Step title="Assign Classifiers to Source Tables">
    To assign the classifiers, uncomment the relevant section in each of the
    files:

    ```yml metadata/raw/raw_inapp_events.sdf.yml theme={null}
    ...
    - name: event_id
        description: A unique identifier of an in-app event within mom's flower shop's mobile app
        # Uncomment to begin the "Enriching your Warehouse" tutorial >>>>>
        classifiers:
            - EVENT.inapp
        # <<<<<
    ...
    ```

    ```yml metadata/raw/raw_marketing_campaign_events.sdf.yml theme={null}
    - name: event_id
        description: A unique identifier of the marketing campaign event
        # Uncomment to begin the "Enriching your Warehouse" tutorial >>>>>
        classifiers:
            - EVENT.marketing
        # <<<<<
    ```

    Notice that we only assigned the classifiers to the source `raw` tables.
  </Step>

  <Step title="Propagate Classifiers">
    <Tip>
      SDF actually propagates the classifiers automatically, so no extra steps
      are required.
    </Tip>

    Let's compile to view our tables metadata:

    ```shell theme={null}
    sdf compile --show result
    ```

    Although we didn't assign the classifiers to `dim_marketing_campaigns`,
    we can see some of its columns inherited the classifier `EVENT.inapp`.

    ```shell theme={null}
    sdf compile analytics.dim_marketing_campaigns --show result
    ```

    <div className="bg-[#0F1117] dark:bg-codeblock rounded-xl dark:ring-1 dark:ring-gray-800/50 relative">
      <pre style={{ fontFamily: 'monospace', backgroundColor: 'transparent' }} className="language-shell">
        <code className="language-shell">
          Schema moms\_flower\_shop.analytics.dim\_marketing\_campaigns
          ┌───────────────────────────────┬───────────┬─────────────┬──────────────────────────────────────────────────────────────────┐
          │ column\_name                   ┆ data\_type ┆ classifier  ┆ description                                                      │
          ╞═══════════════════════════════╪═══════════╪═════════════╪══════════════════════════════════════════════════════════════════╡
          │ campaign\_id                   ┆ bigint    ┆             ┆ The identifier of the campaign that is associated with the event │
          │ campaign\_name                 ┆ varchar   ┆             ┆ The campaign name associated with the campaign\_id                │
          │ total\_num\_installs            ┆ bigint    ┆ EVENT.inapp ┆                                                                  │
          │ avg\_customer\_acquisition\_cost ┆ double    ┆ EVENT.inapp ┆                                                                  │
          │ install\_duration\_ratio        ┆ bigint    ┆ EVENT.inapp ┆                                                                  │
          └───────────────────────────────┴───────────┴─────────────┴──────────────────────────────────────────────────────────────────┘
        </code>
      </pre>
    </div>

    Furthermore, we can see that `app_installs` inherited both `EVENT.inapp`
    and `EVENT.marketing`. This is due to the incorrect `JOIN` we found.

    ```shell theme={null}
    sdf compile staging.app_installs --show result
    ```

    <div className="bg-[#0F1117] dark:bg-codeblock rounded-xl dark:ring-1 dark:ring-gray-800/50 relative">
      <pre style={{ fontFamily: 'monospace', backgroundColor: 'transparent' }} className="language-shell">
        <code className="language-shell">
          Schema moms\_flower\_shop.staging.app\_installs
                 This table is a staging table which adds campaign information to app install events
                 \[TABLE\_STATUS.deprecated]
          ┌───────────────┬───────────┬─────────────────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
          │ column\_name   ┆ data\_type ┆ classifier                  ┆ description                                                                                                                                  │
          ╞═══════════════╪═══════════╪═════════════════════════════╪══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
          │ event\_id      ┆ bigint    ┆ EVENT.inapp EVENT.marketing ┆                                                                                                                                              │
          │ customer\_id   ┆ bigint    ┆                             ┆ The identifier of the customer that performed the event                                                                                      │
          │ install\_time  ┆ timestamp ┆                             ┆                                                                                                                                              │
          │ platform      ┆ varchar   ┆                             ┆ iOS or Android                                                                                                                               │
          │ campaign\_id   ┆ bigint    ┆                             ┆ The identifier of the campaign that is associated with the event                                                                             │
          │ campaign\_name ┆ varchar   ┆                             ┆ The campaign name associated with the campaign\_id                                                                                            │
          │ campaign\_type ┆ varchar   ┆                             ┆ A substring of the campaign name contain the campaign type.  Supported types - instagram\_ads, friends\_referrals, facebook\_ads, google\_search │
          │               ┆           ┆                             ┆                                                                                                                                              │
          └───────────────┴───────────┴─────────────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
        </code>
      </pre>
    </div>
  </Step>

  <Step title="Set Up Guardrails">
    We can create custom checks by querying SDF's information schema.
    To prevent this mistake from repeating, we can set up a check.

    First, create a file called `mixed_event_ids.sql` under the folder
    `checks`. Add the following SQL to the new file:

    ```sql checks/mixed_event_ids.sql theme={null}
    SELECT
        DISTINCT c.table_name as "table_name",
        c.column_name as "column name",
        c.classifiers
    FROM
        sdf.information_schema.columns c
    WHERE
        -- more than one EVENT classifier is assigned
        CAST(c.classifiers AS VARCHAR) LIKE '%EVENT%EVENT%'
    ```

    Let's run it:

    ```shell theme={null}
    sdf check mixed_event_ids --show result
    ```

    This check will fail because `app_installs` still has an
    incorrect `JOIN`.

    <div className="bg-[#0F1117] dark:bg-codeblock rounded-xl dark:ring-1 dark:ring-gray-800/50 relative">
      <pre style={{ fontFamily: 'monospace', backgroundColor: 'transparent' }} className="language-error">
        <code className="language-error">
          \[Fail] Check moms\_flower\_shop.pub.mixed\_event\_ids
          ┌──────────────┬─────────────┬────────────────────────────────┐
          │ table\_name   ┆ column name ┆ classifiers                    │
          ╞══════════════╪═════════════╪════════════════════════════════╡
          │ app\_installs ┆ event\_id    ┆ \[EVENT.inapp, EVENT.marketing] │
          └──────────────┴─────────────┴────────────────────────────────┘
          1 rows.
        </code>
      </pre>
    </div>
  </Step>

  <Step title="Deprecate App Installs">
    In the [previous tutorial](/tutorials/deprecating-a-model), we already resolved
    any downstream dependencies of `app_installs` and we now fully support the newer
    version, `app_installs_v2`. It is safe to deprecate the model - just delete the file.

    Or course, you can run `sdf compile` to validate the change.
  </Step>
</Steps>

## Bonus

Classifiers can enrich your data warehouse in many ways.
The following are just a few examples of added information layers
to your static tables.

With each example, you can create checks and reports
to monitor your warehouse's health and compliance.

<Steps>
  <Step title="Privacy">
    Privacy is critical when storing sensitive information.
    With SDF's smart classifiers propagation, it is easier than
    ever to track PII and other privacy related concerns.

    Open `metadata/raw/raw_customers.sdf.yml` and uncomment
    all classifier sections in the file. They should look like this:

    ```yml metadata/raw/raw_customers.sdf.yml theme={null}
    ...
    # Uncomment to begin the "Enriching your Warehouse" tutorial >>>>>
    classifiers:  <-- uncomment this part
        - name: PII.[name/address/gender/email]  <-- uncomment this part
    # <<<<<
    ...
    ```

    Now, when we compile, we can see the PII across our data warehouse.

    ```shell theme={null}
    sdf compile --show all
    ```

    For example, the `customers` table compile output should look like this:

    ```shell theme={null}
    sdf compile staging.customers --show result
    ```

    <div className="bg-[#0F1117] dark:bg-codeblock rounded-xl dark:ring-1 dark:ring-gray-800/50 relative">
      <pre style={{ fontFamily: 'monospace', backgroundColor: 'transparent' }} className="language-shell">
        <code className="language-shell">
          Schema moms\_flower\_shop.staging.customers
          ┌───────────────┬───────────┬─────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
          │ column\_name   ┆ data\_type ┆ classifier  ┆ description                                                                                                                                  │
          ╞═══════════════╪═══════════╪═════════════╪══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
          │ customer\_id   ┆ bigint    ┆             ┆ A unique identifier of a mom's floewr shop customer                                                                                          │
          │ first\_name    ┆ varchar   ┆ PII.name    ┆ The first name of the customer                                                                                                               │
          │ last\_name     ┆ varchar   ┆ PII.name    ┆ The last name of the customer                                                                                                                │
          │ full\_name     ┆ varchar   ┆ PII.name    ┆                                                                                                                                              │
          │ email         ┆ varchar   ┆ PII.email   ┆ The emial of the customer                                                                                                                    │
          │ gender        ┆ varchar   ┆ PII.gender  ┆ The gender of the customer                                                                                                                   │
          │ campaign\_id   ┆ bigint    ┆             ┆ The identifier of the campaign that is associated with the event                                                                             │
          │ campaign\_name ┆ varchar   ┆             ┆ The campaign name associated with the campaign\_id                                                                                            │
          │ campaign\_type ┆ varchar   ┆             ┆ A substring of the campaign name contain the campaign type.  Supported types - instagram\_ads, friends\_referrals, facebook\_ads, google\_search │
          │               ┆           ┆             ┆                                                                                                                                              │
          │ address\_id    ┆ bigint    ┆ PII.address ┆ The address\_id of the address associated with the customer                                                                                   │
          │ full\_address  ┆ varchar   ┆             ┆ The full address associated with the address\_id                                                                                              │
          │ state         ┆ varchar   ┆             ┆ The address US state associated with the address\_id                                                                                          │
          └───────────────┴───────────┴─────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
        </code>
      </pre>
    </div>
  </Step>

  <Step title="Retention">
    We can set up table level and column level retention classifiers.
    Let's look at a table level example.

    In the `table_classifiers.sdf.yml` file you will find a retention classifier:

    ```yml classifications/table_classifiers.sdf.yml theme={null}
    classifier: 
        name: RETENTION
        labels:
            - name: d7
            - name: d30
            - name: d90
            - name: d180
            - name: infinity
    ```

    We can assign short term retention to our raw tables, while keeping infinite
    retention for any analytics tables.

    For each raw table metadata found in `metadata/raw/*`, add:

    ```yml theme={null}
    ...
    classifiers:
        - name: RETENTION.d7
    ...
    ```

    To each analytics table metadata found in `metadata/analytics/*`, add:

    ```yml theme={null}
    ...
    classifiers:
        - name: RETENTION.infinity
    ...
    ```

    Notice that these classifiers are defined not to propagate downstream
    using the flag `propagate: false`:

    ```yml theme={null}
    classifier:
        ...
        propagate: false
    ...
    ```

    Compile to see the results:

    <CodeGroup>
      ```shell raw theme={null}
      sdf compile models/raw/ --show result
      ```

      ```shell analytics theme={null}
      sdf compile models/analytics/ --show result
      ```
    </CodeGroup>

    For example, we can look at the `raw_addresses` output
    from the first command:

    <div className="bg-[#0F1117] dark:bg-codeblock rounded-xl dark:ring-1 dark:ring-gray-800/50 relative">
      <pre style={{ fontFamily: 'monospace', backgroundColor: 'transparent' }} className="language-shell">
        <code className="language-shell">
          Schema moms\_flower\_shop.raw\.raw\_addresses
                 All relevant information related to street addresses known to mom's flower shop.  This information comes from the user input into the mobile app.
                 \[RETENTION.d7]
          ┌───────────────┬───────────┬────────────┬──────────────────────────────────────────────────────────┐
          │ column\_name   ┆ data\_type ┆ classifier ┆ description                                              │
          ╞═══════════════╪═══════════╪════════════╪══════════════════════════════════════════════════════════╡
          │ index         ┆ bigint    ┆            ┆ Row number                                               │
          │ address\_id    ┆ bigint    ┆            ┆ A unique identifier of an address                        │
          │ full\_address  ┆ varchar   ┆            ┆ The full address associated with the address\_id          │
          │ street\_number ┆ bigint    ┆            ┆ The address street number associated with the address\_id │
          │ street\_name   ┆ varchar   ┆            ┆ The address street name associated with the address\_id   │
          │ state         ┆ varchar   ┆            ┆ The address US state associated with the address\_id      │
          │ city          ┆ varchar   ┆            ┆ The address US city associated with the address\_id       │
          └───────────────┴───────────┴────────────┴──────────────────────────────────────────────────────────┘
        </code>
      </pre>
    </div>

    Or `dim_marketing_campaigns` from the second:

    <div className="bg-[#0F1117] dark:bg-codeblock rounded-xl dark:ring-1 dark:ring-gray-800/50 relative">
      <pre style={{ fontFamily: 'monospace', backgroundColor: 'transparent' }} className="language-shell">
        <code className="language-shell">
          Schema moms\_flower\_shop.analytics.dim\_marketing\_campaigns
                 \[RETENTION.infinity]
          ┌───────────────────────────────┬───────────┬─────────────┬──────────────────────────────────────────────────────────────────┐
          │ column\_name                   ┆ data\_type ┆ classifier  ┆ description                                                      │
          ╞═══════════════════════════════╪═══════════╪═════════════╪══════════════════════════════════════════════════════════════════╡
          │ campaign\_id                   ┆ bigint    ┆             ┆ The identifier of the campaign that is associated with the event │
          │ campaign\_name                 ┆ varchar   ┆             ┆ The campaign name associated with the campaign\_id                │
          │ total\_num\_installs            ┆ bigint    ┆ EVENT.inapp ┆                                                                  │
          │ avg\_customer\_acquisition\_cost ┆ double    ┆ EVENT.inapp ┆                                                                  │
          │ install\_duration\_ratio        ┆ bigint    ┆ EVENT.inapp ┆                                                                  │
          └───────────────────────────────┴───────────┴─────────────┴──────────────────────────────────────────────────────────────────┘
        </code>
      </pre>
    </div>
  </Step>

  <Step title="Reports">
    To learn more on how to generate reports of warehouse health, see
    our extended [reports guide](/guide/data-quality/reports).
  </Step>
</Steps>

## Summary

This tutorial only shows the tip of the iceberg of what you can do with our
semantic understanding. Anything that's possible with SQL is possible as a check or
report against the information schema.

<Tip>
  We created the information schema to support custom checks and reports.
  You can find the full description of SDF's information schema [here](/reference/sdf-information-schema).
</Tip>

## Related Topics

<CardGroup cols={3}>
  <Card title="Classifiers" href="/guide/basics/classifiers">
    Learn all the ways classifiers can help enhance your workspace
  </Card>

  <Card title="Checks" href="/guide/data-quality/checks">
    Learn more about setting up checks for static analysis
  </Card>

  <Card title="Checks" href="/guide/data-quality/checks">
    Learn more about checks to run impact analysis on your data warehouse
  </Card>
</CardGroup>
