Setup
Intro to SDF Classifiers
Creating New Classifiers
app_installs had an incorrect JOIN between
mobile app in-app events in the raw_inapp_events table, and marketing
campaign events in the raw_marketing_campaign_events table:classifications/column_classifiers.sdf.yml already contains
the event classifiers. Take a look yourself:Assign Classifiers to Source Tables
raw tables.Propagate Classifiers
dim_marketing_campaigns,
we can see some of its columns inherited the classifier EVENT.inapp.Schema moms_flower_shop.analytics.dim_marketing_campaigns
┌───────────────────────────────┬───────────┬─────────────┬──────────────────────────────────────────────────────────────────┐
│ column_name ┆ data_type ┆ classifier ┆ description │
╞═══════════════════════════════╪═══════════╪═════════════╪══════════════════════════════════════════════════════════════════╡
│ campaign_id ┆ bigint ┆ ┆ The identifier of the campaign that is associated with the event │
│ campaign_name ┆ varchar ┆ ┆ The campaign name associated with the campaign_id │
│ total_num_installs ┆ bigint ┆ EVENT.inapp ┆ │
│ avg_customer_acquisition_cost ┆ double ┆ EVENT.inapp ┆ │
│ install_duration_ratio ┆ bigint ┆ EVENT.inapp ┆ │
└───────────────────────────────┴───────────┴─────────────┴──────────────────────────────────────────────────────────────────┘app_installs inherited both EVENT.inapp
and EVENT.marketing. This is due to the incorrect JOIN we found.Schema moms_flower_shop.staging.app_installs
This table is a staging table which adds campaign information to app install events
[TABLE_STATUS.deprecated]
┌───────────────┬───────────┬─────────────────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ column_name ┆ data_type ┆ classifier ┆ description │
╞═══════════════╪═══════════╪═════════════════════════════╪══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ event_id ┆ bigint ┆ EVENT.inapp EVENT.marketing ┆ │
│ customer_id ┆ bigint ┆ ┆ The identifier of the customer that performed the event │
│ install_time ┆ timestamp ┆ ┆ │
│ platform ┆ varchar ┆ ┆ iOS or Android │
│ campaign_id ┆ bigint ┆ ┆ The identifier of the campaign that is associated with the event │
│ campaign_name ┆ varchar ┆ ┆ The campaign name associated with the campaign_id │
│ campaign_type ┆ varchar ┆ ┆ A substring of the campaign name contain the campaign type. Supported types - instagram_ads, friends_referrals, facebook_ads, google_search │
│ ┆ ┆ ┆ │
└───────────────┴───────────┴─────────────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘Set Up Guardrails
mixed_event_ids.sql under the folder
checks. Add the following SQL to the new file:app_installs still has an
incorrect JOIN.[Fail] Check moms_flower_shop.pub.mixed_event_ids
┌──────────────┬─────────────┬────────────────────────────────┐
│ table_name ┆ column name ┆ classifiers │
╞══════════════╪═════════════╪════════════════════════════════╡
│ app_installs ┆ event_id ┆ [EVENT.inapp, EVENT.marketing] │
└──────────────┴─────────────┴────────────────────────────────┘
1 rows.Deprecate App Installs
app_installs and we now fully support the newer
version, app_installs_v2. It is safe to deprecate the model - just delete the file.Or course, you can run sdf compile to validate the change.Privacy
metadata/raw/raw_customers.sdf.yml and uncomment
all classifier sections in the file. They should look like this:customers table compile output should look like this:Schema moms_flower_shop.staging.customers
┌───────────────┬───────────┬─────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ column_name ┆ data_type ┆ classifier ┆ description │
╞═══════════════╪═══════════╪═════════════╪══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ customer_id ┆ bigint ┆ ┆ A unique identifier of a mom’s floewr shop customer │
│ first_name ┆ varchar ┆ PII.name ┆ The first name of the customer │
│ last_name ┆ varchar ┆ PII.name ┆ The last name of the customer │
│ full_name ┆ varchar ┆ PII.name ┆ │
│ email ┆ varchar ┆ PII.email ┆ The emial of the customer │
│ gender ┆ varchar ┆ PII.gender ┆ The gender of the customer │
│ campaign_id ┆ bigint ┆ ┆ The identifier of the campaign that is associated with the event │
│ campaign_name ┆ varchar ┆ ┆ The campaign name associated with the campaign_id │
│ campaign_type ┆ varchar ┆ ┆ A substring of the campaign name contain the campaign type. Supported types - instagram_ads, friends_referrals, facebook_ads, google_search │
│ ┆ ┆ ┆ │
│ address_id ┆ bigint ┆ PII.address ┆ The address_id of the address associated with the customer │
│ full_address ┆ varchar ┆ ┆ The full address associated with the address_id │
│ state ┆ varchar ┆ ┆ The address US state associated with the address_id │
└───────────────┴───────────┴─────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘Retention
table_classifiers.sdf.yml file you will find a retention classifier:metadata/raw/*, add:metadata/analytics/*, add:propagate: false:raw_addresses output
from the first command:Schema moms_flower_shop.raw.raw_addresses
All relevant information related to street addresses known to mom’s flower shop. This information comes from the user input into the mobile app.
[RETENTION.d7]
┌───────────────┬───────────┬────────────┬──────────────────────────────────────────────────────────┐
│ column_name ┆ data_type ┆ classifier ┆ description │
╞═══════════════╪═══════════╪════════════╪══════════════════════════════════════════════════════════╡
│ index ┆ bigint ┆ ┆ Row number │
│ address_id ┆ bigint ┆ ┆ A unique identifier of an address │
│ full_address ┆ varchar ┆ ┆ The full address associated with the address_id │
│ street_number ┆ bigint ┆ ┆ The address street number associated with the address_id │
│ street_name ┆ varchar ┆ ┆ The address street name associated with the address_id │
│ state ┆ varchar ┆ ┆ The address US state associated with the address_id │
│ city ┆ varchar ┆ ┆ The address US city associated with the address_id │
└───────────────┴───────────┴────────────┴──────────────────────────────────────────────────────────┘dim_marketing_campaigns from the second:Schema moms_flower_shop.analytics.dim_marketing_campaigns
[RETENTION.infinity]
┌───────────────────────────────┬───────────┬─────────────┬──────────────────────────────────────────────────────────────────┐
│ column_name ┆ data_type ┆ classifier ┆ description │
╞═══════════════════════════════╪═══════════╪═════════════╪══════════════════════════════════════════════════════════════════╡
│ campaign_id ┆ bigint ┆ ┆ The identifier of the campaign that is associated with the event │
│ campaign_name ┆ varchar ┆ ┆ The campaign name associated with the campaign_id │
│ total_num_installs ┆ bigint ┆ EVENT.inapp ┆ │
│ avg_customer_acquisition_cost ┆ double ┆ EVENT.inapp ┆ │
│ install_duration_ratio ┆ bigint ┆ EVENT.inapp ┆ │
└───────────────────────────────┴───────────┴─────────────┴──────────────────────────────────────────────────────────────────┘Reports