> ## Documentation Index
> Fetch the complete documentation index at: https://docs.sdf.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Reports

## Understanding Reports

Reports are a useful way to track & analyze datatypes, metadata, and business logic throughout your data warehouse. Reports are expressed as SQL queries against the SDF Information Schema, similar to [Checks](/guide/data-quality/checks). Since reports are generated with Static Analysis, executing them is fast.
Reports are simply small SQL queries that run against the information schema and enable you to analyze your warehouse and metadata. Some examples of effective reports are:

* Analyze Privacy Compliance: Get a list of all tables and columns that contain PII
* Create a Data Deletion Report: Improve your data deletion capabilities by tracking retention and writing a report to find all tables that contain PII and are past their retention period.
* Run a Dead Column Analysis: Optimize your warehouse by computing a list of all columns that have no dependencies, then remove them to save on storage and compute costs.

<Note>
  A report is very similar to a check. A check is a specialized report where the expected result of the report is empty.
  In this way, reports can be thought of as variables reflecting the state of your warehouse. These variables can then be used to power a check.
</Note>

## Writing Your First Report

Let's write and modify a report, starting with one of the sdf samples.

<Steps>
  <Step title="Create a new SDF Project">
    ```shell theme={null}
    sdf new --sample pii_saas_platform
    ```

    <div className="bg-[#0F1117] dark:bg-codeblock rounded-xl dark:ring-1 dark:ring-gray-800/50 relative">
      <pre style={{ fontFamily: 'monospace', backgroundColor: 'transparent' }} className="language-shell">
        <code className="language-shell">
              Created pii\_saas\_platform/.gitignore
              Created pii\_saas\_platform/checks/no\_pii\_in\_external.sql
              Created pii\_saas\_platform/classification/taxonomy.sdf.yml
              Created pii\_saas\_platform/classification/users.sdf.yml
              Created pii\_saas\_platform/ddls/payment/public/invoices.sql
              Created pii\_saas\_platform/ddls/payment/public/organizations.sql
              Created pii\_saas\_platform/ddls/payment/public/users.sql
              Created pii\_saas\_platform/models/external/invoice\_stats.sql
              Created pii\_saas\_platform/models/external/org\_invoice\_stats.sql
              Created pii\_saas\_platform/models/internal/avg\_invoice\_amt.sql
              Created pii\_saas\_platform/models/internal/invoice\_payment\_delay.sql
              Created pii\_saas\_platform/models/internal/mau\_per\_org.sql
              Created pii\_saas\_platform/models/internal/most\_frequent\_payer.sql
              Created pii\_saas\_platform/models/internal/total\_revenue\_per\_org.sql
              Created pii\_saas\_platform/models/internal/users\_per\_domain.sql
              Created pii\_saas\_platform/models/internal/users\_per\_org.sql
              Created pii\_saas\_platform/reports/tables\_with\_pii.sql
              Created pii\_saas\_platform/workspace.sdf.yml
           
                  Welcome to your new SDF Workspace! To help you on your journey:
                  
                  💡 Join the SDF Community Slack -> [https://sdf.com/join](https://sdf.com/join) 
                  
                  📚 Read the Docs to Get Started -> [https://docs.sdf.com/](https://docs.sdf.com/)
                  
             Finished new in 0.311 secs
        </code>
      </pre>
    </div>

    This sample contains a report and a workspace modeled after a simple SAAS platform.

    Notice the report in the `reports` directory. This report simply lists all tables with PII. It's configured in the `workspace.sdf.yml` file with [includes type](/reference/sdf-yml#enum-includetype) `report` like so:

    ```yml theme={null}
    workspace:
      name: hello
      ...
      includes:
        ...
        - path: reports
          type: report
    ```

    The report itself lists all tables with PII by fetching all columns with classifiers that have `PII`. This is done with the following SQL:

    ```sql theme={null}
    SELECT 
      t.table_id, 
      t.description, 
      t.dialect
    FROM 
      sdf.information_schema.tables t
    JOIN 
      sdf.information_schema.columns c ON t.table_id = c.table_id
    WHERE CONTAINS_ARRAY_VARCHAR(c.classifiers, 'PII')
    GROUP BY 1,2,3;
    ```

    <Info>
      Both checks and reports must be written in the Trino SQL dialect.
    </Info>

    This report will show us tables that contain PII only because we defined a classifier like so:

    ```yml theme={null}
        classifier:
        name: PII
        labels: 
          - name: NAME
          - name: email
          - name: phone
    ```

    As such, this report will show tables that contain columns with any instance of the PII classifier (i.e. `PII.name`, `PII.email`, or `PII.phone`).

    <Note>
      SDF stores the `classifiers` on a column as a string. As such, we can use the SQL `LIKE` operator to search for a substring.
    </Note>
  </Step>

  <Step title="Run & Modify a Report">
    ```shell theme={null}
    sdf report --show all
    ```

    <div className="bg-[#0F1117] dark:bg-codeblock rounded-xl dark:ring-1 dark:ring-gray-800/50 relative">
      <pre style={{ fontFamily: 'monospace', backgroundColor: 'transparent' }} className="language-shell">
        <code className="language-shell">
          Working set 12 model files, 3 .sdf files
            Compiling payment.public.users (./ddls/payment/public/users.sql)
            Compiling payment.public.organizations (./ddls/payment/public/organizations.sql)
            Compiling payment.public.invoices (./ddls/payment/public/invoices.sql)
            Compiling transformations.internal.mau\_per\_org (./models/internal/mau\_per\_org.sql)
            Compiling transformations.internal.avg\_invoice\_amt (./models/internal/avg\_invoice\_amt.sql)
            Compiling transformations.internal.invoice\_payment\_delay (./models/internal/invoice\_payment\_delay.sql)
            Compiling transformations.internal.most\_frequent\_payer (./models/internal/most\_frequent\_payer.sql)
            Compiling transformations.internal.users\_per\_domain (./models/internal/users\_per\_domain.sql)
            Compiling transformations.internal.users\_per\_org (./models/internal/users\_per\_org.sql)
            Compiling transformations.internal.total\_revenue\_per\_org (./models/internal/total\_revenue\_per\_org.sql)
            Compiling transformations.external.invoice\_stats (./models/external/invoice\_stats.sql)
            Compiling transformations.external.org\_invoice\_stats (./models/external/org\_invoice\_stats.sql)
          Working set 1 report file, 1 .sdf file
            Reporting sdf.reports.tables\_with\_pii (./reports/tables\_with\_pii.sql)
             Finished 12 models \[12 succeeded], 1 report \[1 succeeded] in 8.851 secs

          Report sdf.reports.TABLES\_WITH\_PII
          ┌──────────────────────────────────────────────┬─────────────────────────┬───────────┐
          │ table\_id                                     ┆ description             ┆ dialect   │
          ╞══════════════════════════════════════════════╪═════════════════════════╪═══════════╡
          │ payment.public.USERS                         ┆ DDL for the users table ┆ snowflake │
          │ TRANSFORMATIONS.EXTERNAL.INVOICE\_STATS       ┆                         ┆ snowflake │
          │ TRANSFORMATIONS.INTERNAL.USERS\_PER\_DOMAIN    ┆                         ┆ snowflake │
          │ TRANSFORMATIONS.INTERNAL.MOST\_FREQUENT\_PAYER ┆                         ┆ snowflake │
          └──────────────────────────────────────────────┴─────────────────────────┴───────────┘
          4 rows.
        </code>
      </pre>
    </div>

    As you can see from the output, it looks we have three tables containing columns with PII. Let's say we also wanted to show the names of the columns that contain PII. We can do this by modifying the SQL query to include the column name like so:

    ```sql theme={null}
    SELECT 
      t.table_id, 
      c.column_name,
      t.description, 
      t.dialect
    FROM 
      sdf.information_schema.tables t
    JOIN 
      sdf.information_schema.columns c ON t.table_id = c.table_id
    WHERE CONTAINS_ARRAY_VARCHAR(c.classifiers, 'PII')
    GROUP BY 2,1,3,4;
    ```

    Now, when we run the report, we can see the column names as well:

    ```shell theme={null}
    sdf report --show all
    ```

    <div className="bg-[#0F1117] dark:bg-codeblock rounded-xl dark:ring-1 dark:ring-gray-800/50 relative">
      <pre style={{ fontFamily: 'monospace', backgroundColor: 'transparent' }} className="language-shell">
        <code className="language-shell">
          Working set 12 model files, 3 .sdf files
          Working set 1 report file, 1 .sdf file
            Reporting sdf.reports.tables\_with\_pii (./reports/tables\_with\_pii.sql)
             Finished 12 models \[12 reused], 1 report \[1 succeeded] in 4.899 secs

          Report sdf.reports.TABLES\_WITH\_PII
          ┌──────────────────────────────────────────────┬──────────────┬─────────────────────────┬───────────┐
          │ table\_id                                     ┆ column\_name  ┆ description             ┆ dialect   │
          ╞══════════════════════════════════════════════╪══════════════╪═════════════════════════╪═══════════╡
          │ TRANSFORMATIONS.EXTERNAL.INVOICE\_STATS       ┆ NAME         ┆                         ┆ snowflake │
          │ TRANSFORMATIONS.INTERNAL.USERS\_PER\_DOMAIN    ┆ EMAIL\_DOMAIN ┆                         ┆ snowflake │
          │ TRANSFORMATIONS.INTERNAL.MOST\_FREQUENT\_PAYER ┆ EMAIL        ┆                         ┆ snowflake │
          │ payment.public.USERS                         ┆ NAME         ┆ DDL for the users table ┆ snowflake │
          │ TRANSFORMATIONS.EXTERNAL.INVOICE\_STATS       ┆ EMAIL        ┆                         ┆ snowflake │
          │ payment.public.USERS                         ┆ EMAIL        ┆ DDL for the users table ┆ snowflake │
          │ payment.public.USERS                         ┆ PHONE        ┆ DDL for the users table ┆ snowflake │
          └──────────────────────────────────────────────┴──────────────┴─────────────────────────┴───────────┘
          7 rows.
        </code>
      </pre>
    </div>
  </Step>

  <Step title="Add a New Report">
    Let's say we'd like to extend our analysis a bit but still maintain the original report. We can do this by adding a new report to the `reports` directory. Let's add a new report to find the most frequently used datatypes in columns with PII. We can do this with the following SQL:

    ```sql pii_datatypes.sql theme={null}
      SELECT c.datatype, COUNT(*) as frequency
      FROM sdf.information_schema.columns c
      WHERE CONTAINS_ARRAY_VARCHAR(c.classifiers, 'PII')
      GROUP BY c.datatype 
      ORDER BY frequency DESC;
    ```

    Now, when we run the report, we can see the most frequently used datatypes in columns with PII:

    ```shell theme={null}
    sdf report --show all
    ```

    <div className="bg-[#0F1117] dark:bg-codeblock rounded-xl dark:ring-1 dark:ring-gray-800/50 relative">
      <pre style={{ fontFamily: 'monospace', backgroundColor: 'transparent' }} className="language-shell">
        <code className="language-shell">
          Working set 12 model files, 3 .sdf files
          Working set 2 report files, 1 .sdf file
            Reporting sdf.reports.pii\_datatypes (./reports/pii\_datatypes.sql)
             Finished 12 models \[12 reused], 2 reports \[1 succeeded, 1 reused] in 3.772 secs

          Report sdf.reports.TABLES\_WITH\_PII
          ┌──────────────────────────────────────────────┬──────────────┬─────────────────────────┬───────────┐
          │ table\_id                                     ┆ column\_name  ┆ description             ┆ dialect   │
          ╞══════════════════════════════════════════════╪══════════════╪═════════════════════════╪═══════════╡
          │ TRANSFORMATIONS.EXTERNAL.INVOICE\_STATS       ┆ EMAIL        ┆                         ┆ snowflake │
          │ payment.public.USERS                         ┆ EMAIL        ┆ DDL for the users table ┆ snowflake │
          │ payment.public.USERS                         ┆ PHONE        ┆ DDL for the users table ┆ snowflake │
          │ TRANSFORMATIONS.EXTERNAL.INVOICE\_STATS       ┆ NAME         ┆                         ┆ snowflake │
          │ TRANSFORMATIONS.INTERNAL.USERS\_PER\_DOMAIN    ┆ EMAIL\_DOMAIN ┆                         ┆ snowflake │
          │ TRANSFORMATIONS.INTERNAL.MOST\_FREQUENT\_PAYER ┆ EMAIL        ┆                         ┆ snowflake │
          │ payment.public.USERS                         ┆ NAME         ┆ DDL for the users table ┆ snowflake │
          └──────────────────────────────────────────────┴──────────────┴─────────────────────────┴───────────┘
          7 rows.

          Report sdf.reports.PII\_DATATYPES
          ┌──────────┬───────────┐
          │ datatype ┆ frequency │
          ╞══════════╪═══════════╡
          │ varchar  ┆ 7         │
          └──────────┴───────────┘
          1 rows.
        </code>
      </pre>
    </div>
  </Step>

  <Step title="Save the Output (Optional)">
    We can save the output of a report to a file by using the `--format` flag. Let's save the output of our reports to a file called `sdf_reports.csv`:

    ```shell theme={null}
    sdf report --format csv --show result > sdf_reports.csv
    ```

    <Note>
      Note that we used `--show result` to only show the result of the report, not any output regarding the execution of the report. In a scenario where you'd want to programatically generate valid CSVs or JSONs of your reports, this is necessary to avoid any extraneous output. See [our input and output guide](/guide/setup/io) for more on this.
    </Note>
  </Step>
</Steps>

## Advanced Reports

* To learn more about transforming classifiers across functions, see: [UDFs](/guide/advanced/udf)
* To learn more about the SDF internal information schema, see: [Information Schema](/reference/sdf-information-schema)
