Understanding Tests

Tests are a commonly used data quality feature allowing you to scan results of queries and set expectations of your data. You can validate that a particular column is never null, or within a certain thresholding range, between a certain set of values, and more.

SDF provides a standard testing library. The functions provided in the library are included natively in SDF.

All SDF Tests for all columns are compiled into table scans. This allows SDF to run multiple tests with a single scan. This dramatically reduces the cost of data tests.

An SDF data test is a simple assertion test that runs a SQL query against your database and, if any data is returned, fails. The simplest example might be a non-null test on a column.

This would look like:

non_null.sql
select * from table where column is null

When sdf test runs this query, it will check if any data is returned and fail if so.

Working set 1 model file, 1 test file, 1 .sdf file     Running hello.pub.main (./models/main.sql)     Testing hello.pub.test_main (./sdftarget/dbg/tests/hello/pub/test_main.sql)    Finished 1 model [1 succeeded], 1 test [1 passed] in 0.754 secs [Pass] Test hello.pub.test_main

There are three types of tests builtin.

  • Scalar Column Tests - validate expectations of individual values in columns (eg. x > 100 )
  • Aggregate Column Tests - validate expectations of all values in a columns (eg. sum(x) > 100 )
  • Table Tests - validate expectations for all columns in a table (eg. columns a and b are unique).

Severity

Each test may be configured with a severity flag. By default all tests have a severity of error meaning that a constraint violation will result in a non-zero exit code for sdf.

Tests may have three results, with text highlighting providing an indication of the output.

Test ResultMeaning
🟡Warning
🔴Error

Using a Test

Tests are added to columns in table blocks. Each column has ‘constraints’ and test functions should be expressed as the expectation you have of the column.

Test libraries are written as SDF workspaces that are imported by your workspace.

Below, we add a simple aggregate uniqueness test, and a scalar null values test to a cool_column.

table:
  name: a_really_great_table
  columns:
    - name: cool_column
      tests:
      - expect: unique()
      - expect: not_null()
        severity: warning

SDF’s tests library is an out-of-the-box optimized testing mechanism. All tests associated with a table will be run within one query.

One query —> one table scan —> significant compute savings compare to other data quality solutions.

SDF Standard Library Tests

Column Tests

Test NameTypeDescription
not_null()ScalarColumn elements must not be null
valid_scalar(condition)ScalarColumn elements all meet scalar condition
valid_aggregate(condition)AggregateColumn elements all meet aggregate condition
unique()AggregateColumn elements must be unique
in_accepted_values([values])AggregateColumn values are all elements of array
minimum(value)AggregateColumn can not have values less than value
maximum(value)AggregateColumn can not have values greater than value
exclusive_minimum(value)AggregateColumn can not have values less or equal to value
exclusive_maximum(value)AggregateColumn can not have values greater than or equal to value
between(lower, upper)AggregateColumn values must be between lower and upper
max_length(value)AggregateColumn values (as strings) are greater than a given length
min_length(value)AggregateColumn values (as strings) are less than a given length
like(string)AggregateColumn elements must contain string
try_cast(type)AggregateColumn elements must include string
primary_key(column)AggregateColumn is primary key of column

Table Tests

Test NameTypeDescription
unique_columns([c1, c2])TableTable with columns c1 and c2 has unique values across these two columns

The full repo of sdf_tests, our open-source tests library, is is available on GitHub here.

Add Your First Tests

1

Prerequisite

SDF’s built-in tests library sdf_tests uses Jinja macros. Therefore, setting Jinja as a preprocessor is required.

Create the sample workspace tests:

sdf new --sample tests

In the file workspace.sdf.yml, we have Jinja set as the default preprocessor:

workspace.sdf.yml
workspace:
  name: Tests workspace
  ...
  defaults: 
    preprocessor: jinja
2

Add Tests

Open the table metadata file src_metadata/raw_inapp_events.sdf.yml. In the file, find the three commented out tests and uncomment them. They should look like this:

3

Run Tests

Run the tests. In the terminal, run:

sdf test 

Working set 1 model file, 1 test file, 2 .sdf files     Running tests_workspace.pub.raw_inapp_events (./models/raw_inapp_events.sql)     Testing tests_workspace.pub.test_raw_inapp_events (./sdftarget/dbg/tests/tests_workspace/pub/test_raw_inapp_events.sql)    Finished 1 model [1 succeeded], 1 test [1 passed] in 0.757 secs [Pass] Test tests_workspace.pub.test_raw_inapp_events

4

Inspect the Test

This step is not mandatory, but sometimes it can be helpful to view the compiled test query. To access the query, navigate to the file located in sdftarget/dbg/preprocessed/sdftarget/dbg/tests/tests_workspace/pub/test_raw_inapp_events.sql.

The file will hold the compiled tests query and will look like this:

WITH raw_inapp_events_constraints AS (     SELECT         CASE WHEN COUNT(DISTINCT event_id) != COUNT(event_id)      THEN       ‘err: column event_id is NOT unique’ ELSE NULL END AS event_id_1         ,         CASE WHEN COUNT(CASE WHEN event_id IS NULL THEN 1 ELSE NULL END) > 0      THEN       ‘err: column event_id has NULL values’ ELSE NULL END AS event_id_2         ,         CASE WHEN COUNT(CASE WHEN event_name NOT IN (‘install’, ‘add_to_cart’, ‘go_to_checkout’, ‘place_order’) THEN 1 ELSE NULL END) > 0      THEN       ‘wrn: column event_name has values outside accepted values ”install”, ”add_to_cart”, ”go_to_checkout”, ”place_order''' ELSE NULL END AS event_name_1         ,         CASE WHEN NOT(MIN(event_value) >= 0) or  MAX(event_value) > 1000 THEN  ‘wrn: column event_value has values outside of 0..1000’ ELSE NULL END AS event_value_1     FROM tests_workspace.pub.raw_inapp_events )

SELECT reason  FROM (         SELECT event_id_1 as reason FROM raw_inapp_events_constraints WHERE event_id_1 IS NOT NULL     UNION ALL             SELECT event_id_2 as reason FROM raw_inapp_events_constraints WHERE event_id_2 IS NOT NULL     UNION ALL              SELECT event_name_1 as reason FROM raw_inapp_events_constraints WHERE event_name_1 IS NOT NULL     UNION ALL              SELECT event_value_1 as reason FROM raw_inapp_events_constraints WHERE event_value_1 IS NOT NULL      ) AS _combined_errors ORDER BY reason;

5

Fail a Test

Let’s see how it’ll look like if the test fails. In src_metadata/raw_inapp_events.sdf.yml, edit the test associated with the column event_value and reduce the maximum in the range to be 10 instead of 1000:

...
- name: event_value
  description: >
    Any numeric value associated with the event. 
    For example, upon placing an order, the event value would be the total order amount in USD.
  # Uncomment the following tests:
  tests:
    - expect: between(0, 10)  <-- Change here - from 1000 to 10
      severity: warning
...

Now let’s run the test again:

sdf test 

warning: SDF2001: tests_workspace.pub.raw_inapp_events: attempting to execute Trino, local execution is currently experimental   —> models/raw_inapp_events.sql:1

… and 1 more warnings not shown

Add a Generic Test

SDF supports the ability to turn any SQL query into a data test. This is useful for more complex tests that are not covered by the standard library.

1

Create a New Workspace

Let’s start by creating a new workspace called generic_test_example

sdf new generic_test_example
2

Write a Generic Test

Next, let’s write a SQL query that will represent our generic test. For our example, we’ll write the most basic of tests which will always fail (since it will always return data).

Create a new file generic_test.sql in a new directory called tests:

tests/generic_test.sql
select 1 as col; 
3

Configure Generic Tests

Next, let’s update the workspace.sdf.yml to include our new generic test. To do so, add - path: tests to your workspace.sdf.yml includes block like so:

workspace.sdf.yml
workspace:
  name: hello
  edition: "1.3"
  description: "A minimal workspace"

  includes:
    - path: models
    - path: tests

The - path: tests includes item does not need the type: test property since naming your directory tests defaults the include type to test. For a list of all default include types, check out our includes block documentation

4

Run the Test

Now that we have a our new generic tests added, let’s trying running it! Note that we are expecting this test to fail, as the SQL query we wrote will always return data.

In the terminal, run:

sdf test

warning: SDF2001: hello.pub.generic_test: attempting to execute Trino, local execution is currently experimental   —> tests/generic_test.sql:1

Great! It’s the first time in your life you’ve been happy to see a test fail.

Add Custom Tests to the Testing Library

SDF supports an easy way to write custom tests that will utilize our built-in testing infrastructure to run custom testing macros.

1

Setup

In our workspace, let’s create a new folder for our custom test macros. We can call it custom_tests.

The new directory will look like this:

    Cleaned sdftarget    Finished clean in 0.215 secs . ├── custom_tests ├── models │   └── raw_inapp_events.sql ├── seeds │   └── inapp_events.csv ├── src_metadata │   └── raw_inapp_events.sdf.yml └── workspace.sdf.yml

4 directories, 4 files

We also need to include the new folder in our workspace.sdf.yml file:

    workspace:
      name: tests_workspace
        ...
        include:
            - path: custom_tests
              type: metadata
        ...

Now we are ready to add some custom tests!

2

Create a Custom Test

To extend SDF’s built-in testing library, add a jinja file named my_test.jinja with the following custom test in the newly created folder:

custom_tests/my_test.jinja
{% macro custom_column_test(severity, column_name) -%}
    COUNT(CASE WHEN {{ column_name }} > 1000 THEN 1 ELSE NULL END) > 0 
    ==> 
    '{{severity}}: column {{ column_name }} is over 1000"' 
{%- endmacro %}

In order to use SDF’s testing infrastructure, we need to maintain a similar macro signature as in the official library.

3

Try It!

First, we need to add the test to our table yml file, just like we did with SDF’s built-in tests. In the file src_metadata/raw_inapp_events.sdf.yml, add our new custom test to the column event_name:

src_metadata/raw_inapp_events.sdf.yml
table:
    ...
    columns:
        - name: index
            description: row_number
            tests:
                - expect: tests_workspace.custom_column_test()  <-- this is our new test!
                  severity: warning
    ... 

Unlike with regular tests, to call a custom test we need to use the syntax {sdf_workspace_name}.{custom_test_name}. In the example above, the workspace name is tests_workspace and the custom_test_name is custom_column_test.

Let’s run the test! In your terminal, execute:

sdf test

The test passed!

Working set 1 model file, 1 test file, 2 .sdf files     Running tests_workspace.pub.raw_inapp_events (./models/raw_inapp_events.sql)     Testing tests_workspace.pub.test_raw_inapp_events (./sdftarget/dbg/tests/tests_workspace/pub/test_raw_inapp_events.sql)    Finished 1 model [1 succeeded], 1 test [1 passed] in 0.754 secs [Pass] Test tests_workspace.pub.test_raw_inapp_events

We can look at the compiled tests query located in the file sdftarget/dbg/preprocessed/sdftarget/dbg/tests/tests_workspace/pub/test_raw_inapp_events.sql. Notice our custom logic in the compiled query:

WITH raw_inapp_events_constraints AS (     SELECT         CASE WHEN COUNT(CASE WHEN index > 1000 THEN 1 ELSE NULL END) > 0      THEN       ‘err: column index is over 1000”’ ELSE NULL END AS index_1     FROM tests_workspace.pub.raw_inapp_events )

SELECT reason  FROM (         SELECT index_1 as reason FROM raw_inapp_events_constraints WHERE index_1 IS NOT NULL      ) AS _combined_errors ORDER BY reason;