Create a new SDF Project
Created pii_saas_platform/.gitignore
Created pii_saas_platform/checks/no_pii_in_external.sql
Created pii_saas_platform/classification/taxonomy.sdf.yml
Created pii_saas_platform/classification/users.sdf.yml
Created pii_saas_platform/ddls/payment/public/invoices.sql
Created pii_saas_platform/ddls/payment/public/organizations.sql
Created pii_saas_platform/ddls/payment/public/users.sql
Created pii_saas_platform/models/external/invoice_stats.sql
Created pii_saas_platform/models/external/org_invoice_stats.sql
Created pii_saas_platform/models/internal/avg_invoice_amt.sql
Created pii_saas_platform/models/internal/invoice_payment_delay.sql
Created pii_saas_platform/models/internal/mau_per_org.sql
Created pii_saas_platform/models/internal/most_frequent_payer.sql
Created pii_saas_platform/models/internal/total_revenue_per_org.sql
Created pii_saas_platform/models/internal/users_per_domain.sql
Created pii_saas_platform/models/internal/users_per_org.sql
Created pii_saas_platform/reports/tables_with_pii.sql
Created pii_saas_platform/workspace.sdf.yml
Welcome to your new SDF Workspace! To help you on your journey:
💡 Join the SDF Community Slack -> https://sdf.com/join
📚 Read the Docs to Get Started -> https://docs.sdf.com/
Finished new in 0.311 secs
reports
directory. This report simply lists all tables with PII. It’s configured in the workspace.sdf.yml
file with includes type report
like so:PII
. This is done with the following SQL:PII.name
, PII.email
, or PII.phone
).classifiers
on a column as a string. As such, we can use the SQL LIKE
operator to search for a substring.Run & Modify a Report
Working set 12 model files, 3 .sdf files
Compiling payment.public.users (./ddls/payment/public/users.sql)
Compiling payment.public.organizations (./ddls/payment/public/organizations.sql)
Compiling payment.public.invoices (./ddls/payment/public/invoices.sql)
Compiling transformations.internal.mau_per_org (./models/internal/mau_per_org.sql)
Compiling transformations.internal.avg_invoice_amt (./models/internal/avg_invoice_amt.sql)
Compiling transformations.internal.invoice_payment_delay (./models/internal/invoice_payment_delay.sql)
Compiling transformations.internal.most_frequent_payer (./models/internal/most_frequent_payer.sql)
Compiling transformations.internal.users_per_domain (./models/internal/users_per_domain.sql)
Compiling transformations.internal.users_per_org (./models/internal/users_per_org.sql)
Compiling transformations.internal.total_revenue_per_org (./models/internal/total_revenue_per_org.sql)
Compiling transformations.external.invoice_stats (./models/external/invoice_stats.sql)
Compiling transformations.external.org_invoice_stats (./models/external/org_invoice_stats.sql)
Working set 1 report file, 1 .sdf file
Reporting sdf.reports.tables_with_pii (./reports/tables_with_pii.sql)
Finished 12 models [12 succeeded], 1 report [1 succeeded] in 8.851 secsReport sdf.reports.TABLES_WITH_PII
┌──────────────────────────────────────────────┬─────────────────────────┬───────────┐
│ table_id ┆ description ┆ dialect │
╞══════════════════════════════════════════════╪═════════════════════════╪═══════════╡
│ payment.public.USERS ┆ DDL for the users table ┆ snowflake │
│ TRANSFORMATIONS.EXTERNAL.INVOICE_STATS ┆ ┆ snowflake │
│ TRANSFORMATIONS.INTERNAL.USERS_PER_DOMAIN ┆ ┆ snowflake │
│ TRANSFORMATIONS.INTERNAL.MOST_FREQUENT_PAYER ┆ ┆ snowflake │
└──────────────────────────────────────────────┴─────────────────────────┴───────────┘
4 rows.
Working set 12 model files, 3 .sdf files
Working set 1 report file, 1 .sdf file
Reporting sdf.reports.tables_with_pii (./reports/tables_with_pii.sql)
Finished 12 models [12 reused], 1 report [1 succeeded] in 4.899 secsReport sdf.reports.TABLES_WITH_PII
┌──────────────────────────────────────────────┬──────────────┬─────────────────────────┬───────────┐
│ table_id ┆ column_name ┆ description ┆ dialect │
╞══════════════════════════════════════════════╪══════════════╪═════════════════════════╪═══════════╡
│ TRANSFORMATIONS.EXTERNAL.INVOICE_STATS ┆ NAME ┆ ┆ snowflake │
│ TRANSFORMATIONS.INTERNAL.USERS_PER_DOMAIN ┆ EMAIL_DOMAIN ┆ ┆ snowflake │
│ TRANSFORMATIONS.INTERNAL.MOST_FREQUENT_PAYER ┆ EMAIL ┆ ┆ snowflake │
│ payment.public.USERS ┆ NAME ┆ DDL for the users table ┆ snowflake │
│ TRANSFORMATIONS.EXTERNAL.INVOICE_STATS ┆ EMAIL ┆ ┆ snowflake │
│ payment.public.USERS ┆ EMAIL ┆ DDL for the users table ┆ snowflake │
│ payment.public.USERS ┆ PHONE ┆ DDL for the users table ┆ snowflake │
└──────────────────────────────────────────────┴──────────────┴─────────────────────────┴───────────┘
7 rows.
Add a New Report
reports
directory. Let’s add a new report to find the most frequently used datatypes in columns with PII. We can do this with the following SQL:Working set 12 model files, 3 .sdf files
Working set 2 report files, 1 .sdf file
Reporting sdf.reports.pii_datatypes (./reports/pii_datatypes.sql)
Finished 12 models [12 reused], 2 reports [1 succeeded, 1 reused] in 3.772 secsReport sdf.reports.TABLES_WITH_PII
┌──────────────────────────────────────────────┬──────────────┬─────────────────────────┬───────────┐
│ table_id ┆ column_name ┆ description ┆ dialect │
╞══════════════════════════════════════════════╪══════════════╪═════════════════════════╪═══════════╡
│ TRANSFORMATIONS.EXTERNAL.INVOICE_STATS ┆ EMAIL ┆ ┆ snowflake │
│ payment.public.USERS ┆ EMAIL ┆ DDL for the users table ┆ snowflake │
│ payment.public.USERS ┆ PHONE ┆ DDL for the users table ┆ snowflake │
│ TRANSFORMATIONS.EXTERNAL.INVOICE_STATS ┆ NAME ┆ ┆ snowflake │
│ TRANSFORMATIONS.INTERNAL.USERS_PER_DOMAIN ┆ EMAIL_DOMAIN ┆ ┆ snowflake │
│ TRANSFORMATIONS.INTERNAL.MOST_FREQUENT_PAYER ┆ EMAIL ┆ ┆ snowflake │
│ payment.public.USERS ┆ NAME ┆ DDL for the users table ┆ snowflake │
└──────────────────────────────────────────────┴──────────────┴─────────────────────────┴───────────┘
7 rows.Report sdf.reports.PII_DATATYPES
┌──────────┬───────────┐
│ datatype ┆ frequency │
╞══════════╪═══════════╡
│ varchar ┆ 7 │
└──────────┴───────────┘
1 rows.
Save the Output (Optional)
--format
flag. Let’s save the output of our reports to a file called sdf_reports.csv
:--show result
to only show the result of the report, not any output regarding the execution of the report. In a scenario where you’d want to programatically generate valid CSVs or JSONs of your reports, this is necessary to avoid any extraneous output. See our input and output guide for more on this.