Tests

Understanding Tests

Tests are a commonly used data quality feature allowing you to scan results of queries and set expectations of your data. You can validate that a particular column is never null, or within a certain thresholding range, between a certain set of values, and more.

SDF provides a standard testing library. The functions provided in the library are included natively in SDF.

All SDF Tests for all columns are compiled into table scans. This allows SDF to run multiple tests with a single scan. This dramatically reduces the cost of data tests.

An SDF data test is a simple assertion test that runs a SQL query against your database and, if any data is returned, fails. The simplest example might be a non-null test on a column.

This would look like:

non_null.sql
select * from table where column is null

When sdf test runs this query, it will check if any data is returned and fail if so.

Working set 1 model file, 1 test file, 1 .sdf file
    Running hello.pub.main (./models/main.sql)
    Testing hello.pub.test_main (./sdftarget/dbg/tests/hello/pub/test_main.sql)
   Finished 1 model [1 succeeded], 1 test [1 passed] in 0.849 secs
[Pass] Test hello.pub.test_main

There are three types of tests builtin.

Scalar Column Tests - validate expectations of individual values in columns (eg. x > 100 )
Aggregate Column Tests - validate expectations of all values in a columns (eg. sum(x) > 100 )
Table Tests - validate expectations for all columns in a table (eg. columns a and b are unique).

Severity

Each test may be configured with a severity flag. By default all tests have a severity of error meaning that a constraint violation will result in a non-zero exit code for sdf.

Tests may have three results, with text highlighting providing an indication of the output.

Test Result	Meaning
🟡	Warning
🔴	Error

Using a Test

Tests are added to columns in table blocks. Each column has ‘constraints’ and test functions should be expressed as the expectation you have of the column.

Test libraries are written as SDF workspaces that are imported by your workspace.

Below, we add a simple aggregate uniqueness test, and a scalar null values test to a cool_column.

table:
  name: a_really_great_table
  columns:
    - name: cool_column
      tests:
      - expect: unique()
      - expect: not_null()
        severity: warning

SDF’s tests library is an out-of-the-box optimized testing mechanism. All tests associated with a table will be run within one query.

One query —> one table scan —> significant compute savings compare to other data quality solutions.

SDF Standard Library Tests

Column Tests

Test Name	Type	Description
`not_null()`	Scalar	Column elements must not be null
`valid_scalar(condition)`	Scalar	Column elements all meet scalar condition
`valid_aggregate(condition)`	Aggregate	Column elements all meet aggregate condition
`unique()`	Aggregate	Column elements must be unique
`in_accepted_values([values])`	Aggregate	Column values are all elements of array
`minimum(value)`	Aggregate	Column can not have values less than value
`maximum(value)`	Aggregate	Column can not have values greater than value
`exclusive_minimum(value)`	Aggregate	Column can not have values less or equal to value
`exclusive_maximum(value)`	Aggregate	Column can not have values greater than or equal to value
`between(lower, upper)`	Aggregate	Column values must be between lower and upper
`max_length(value)`	Aggregate	Column values (as strings) are greater than a given length
`min_length(value)`	Aggregate	Column values (as strings) are less than a given length
`like(string)`	Aggregate	Column elements must contain string
`try_cast(type)`	Aggregate	Column elements must include string
`primary_key(column)`	Aggregate	Column is primary key of column

Table Tests

Test Name	Type	Description
`unique_columns([c1, c2])`	Table	Table with columns `c1` and `c2` has unique values across these two columns

The full repo of sdf_tests, our open-source tests library, is is available on GitHub here.

Add Your First Tests

Prerequisite

SDF’s built-in tests library sdf_tests uses Jinja macros. Therefore, setting Jinja as a preprocessor is required.

Create the sample workspace tests:

sdf new --sample tests

In the file workspace.sdf.yml, we have Jinja set as the default preprocessor:

workspace.sdf.yml
workspace:
  name: Tests workspace
  ...
  defaults: 
    preprocessor: jinja

Add Tests

Open the table metadata file src_metadata/raw_inapp_events.sdf.yml. In the file, find the three commented out tests and uncomment them. They should look like this:

Run Tests

Run the tests. In the terminal, run:

sdf test

Working set 1 model file, 1 test file, 2 .sdf files
    Running tests_workspace.pub.raw_inapp_events (./models/raw_inapp_events.sql)
    Testing tests_workspace.pub.test_raw_inapp_events (./sdftarget/dbg/tests/tests_workspace/pub/test_raw_inapp_events.sql)
   Finished 1 model [1 succeeded], 1 test [1 passed] in 0.904 secs
[Pass] Test tests_workspace.pub.test_raw_inapp_events

Inspect the Test

This step is not mandatory, but sometimes it can be helpful to view the compiled test query. To access the query, navigate to the file located in sdftarget/dbg/preprocessed/tests_workspace/pub/test_raw_inapp_events.sql.

The file will hold the compiled tests query and will look like this:

WITH
raw_inapp_events_constraints AS (
    SELECT
        CASE WHEN COUNT(DISTINCT event_id) != COUNT(event_id) 
    THEN  
    ‘err: column event_id is NOT unique’ ELSE NULL END AS event_id_1
        ,
        CASE WHEN COUNT(CASE WHEN event_id IS NULL THEN 1 ELSE NULL END) > 0 
    THEN  
    ‘err: column event_id has NULL values’ ELSE NULL END AS event_id_2
        ,
        CASE WHEN COUNT(CASE WHEN event_name NOT IN (‘install’, ‘add_to_cart’, ‘go_to_checkout’, ‘place_order’) THEN 1 ELSE NULL END) > 0 
    THEN  
    ‘wrn: column event_name has values outside accepted values ”install”, ”add_to_cart”, ”go_to_checkout”, ”place_order''' ELSE NULL END AS event_name_1
        ,
        CASE WHEN NOT(MIN(event_value) >= 0) or  MAX(event_value) > 1000
THEN 
‘wrn: column event_value has values outside of 0..1000’ ELSE NULL END AS event_value_1
    FROM tests_workspace.pub.raw_inapp_events
)
SELECT reason 
FROM (
   
    SELECT event_id_1 as reason FROM raw_inapp_events_constraints WHERE event_id_1 IS NOT NULL
    UNION ALL
       
    SELECT event_id_2 as reason FROM raw_inapp_events_constraints WHERE event_id_2 IS NOT NULL
    UNION ALL 
       
    SELECT event_name_1 as reason FROM raw_inapp_events_constraints WHERE event_name_1 IS NOT NULL
    UNION ALL 
       
    SELECT event_value_1 as reason FROM raw_inapp_events_constraints WHERE event_value_1 IS NOT NULL 
    ) AS _combined_errors
ORDER BY reason;

Fail a Test

Let’s see how it’ll look like if the test fails. In src_metadata/raw_inapp_events.sdf.yml, edit the test associated with the column event_value and reduce the maximum in the range to be 10 instead of 1000:

...
- name: event_value
  description: >
    Any numeric value associated with the event. 
    For example, upon placing an order, the event value would be the total order amount in USD.
  # Uncomment the following tests:
  tests:
    - expect: between(0, 10)  <-- Change here - from 1000 to 10
      severity: warning
...

Now let’s run the test again:

sdf test

warning: SDF2001: tests_workspace.pub.raw_inapp_events: attempting to execute Trino, local execution is currently experimental
  —> models/raw_inapp_events.sql:1:1
… and 1 more warnings not shown

Add a Generic Test

SDF supports the ability to turn any SQL query into a data test. This is useful for more complex tests that are not covered by the standard library.

Create a New Workspace

Let’s start by creating a new workspace called generic_test_example

sdf new generic_test_example

Write a Generic Test

Next, let’s write a SQL query that will represent our generic test. For our example, we’ll write the most basic of tests which will always fail (since it will always return data).

Create a new file generic_test.sql in a new directory called tests:

tests/generic_test.sql
select 1 as col; 

Configure Generic Tests

Next, let’s update the workspace.sdf.yml to include our new generic test. To do so, add - path: tests to your workspace.sdf.yml includes block like so:

workspace.sdf.yml
workspace:
  name: hello
  edition: "1.3"
  description: "A minimal workspace"

  includes:
    - path: models
    - path: tests

The - path: tests includes item does not need the type: test property since naming your directory tests defaults the include type to test. For a list of all default include types, check out our includes block documentation

Run the Test

Now that we have a our new generic tests added, let’s trying running it! Note that we are expecting this test to fail, as the SQL query we wrote will always return data.

In the terminal, run:

sdf test

warning: SDF2001: hello.pub.generic_test: attempting to execute Trino, local execution is currently experimental
  —> tests/generic_test.sql:1:1

Great! It’s the first time in your life you’ve been happy to see a test fail.

Add Custom Tests to the Testing Library

SDF supports an easy way to write custom tests that will utilize our built-in testing infrastructure to run custom testing macros.

Setup

In our workspace, let’s create a new folder for our custom test macros. We can call it custom_tests.

The new directory will look like this:

    Cleaned sdftarget
   Finished clean in 0.300 secs
.
├── custom_tests
├── models
│   └── raw_inapp_events.sql
├── seeds
│   └── inapp_events.csv
├── src_metadata
│   └── raw_inapp_events.sdf.yml
└── workspace.sdf.yml
4 directories, 4 files

We also need to include the new folder in our workspace.sdf.yml file:

    workspace:
      name: tests_workspace
        ...
        include:
            - path: custom_tests
              type: metadata
        ...

Now we are ready to add some custom tests!

Create a Custom Test

To extend SDF’s built-in testing library, add a jinja file named my_test.jinja with the following custom test in the newly created folder:

custom_tests/my_test.jinja
{% macro custom_column_test(severity, column_name) -%}
    COUNT(CASE WHEN {{ column_name }} > 1000 THEN 1 ELSE NULL END) > 0 
    ==> 
    '{{severity}}: column {{ column_name }} is over 1000"' 
{%- endmacro %}

In order to use SDF’s testing infrastructure, we need to maintain a similar macro signature as in the official library.

Try It!

First, we need to add the test to our table yml file, just like we did with SDF’s built-in tests. In the file src_metadata/raw_inapp_events.sdf.yml, add our new custom test to the column event_name:

src_metadata/raw_inapp_events.sdf.yml
table:
    ...
    columns:
        - name: index
            description: row_number
            tests:
                - expect: tests_workspace.custom_column_test()  <-- this is our new test!
                  severity: warning
    ... 

Unlike with regular tests, to call a custom test we need to use the syntax {sdf_workspace_name}.{custom_test_name}. In the example above, the workspace name is tests_workspace and the custom_test_name is custom_column_test.

Let’s run the test! In your terminal, execute:

sdf test

The test passed!

/bin/sh: 1: sdf: not found

We can look at the compiled tests query located in the file sdftarget/dbg/preprocessed/tests_workspace/pub/test_raw_inapp_events.sql. Notice our custom logic in the compiled query:

cat: sdftarget/dbg/preprocessed/tests_workspace/pub/test_raw_inapp_events.sql: No such file or directory

Introduction

Tutorials

Guides

Integrations

Understanding Tests

Severity

Using a Test

SDF Standard Library Tests

Column Tests

Table Tests

Add Your First Tests

Add a Generic Test

Add Custom Tests to the Testing Library

Introduction

Tutorials

Guides

Integrations

​Understanding Tests

​Severity

​Using a Test

​SDF Standard Library Tests

​Column Tests

​Table Tests

​Add Your First Tests

​Add a Generic Test

​Add Custom Tests to the Testing Library

Understanding Tests

Severity

Using a Test

SDF Standard Library Tests

Column Tests

Table Tests

Add Your First Tests

Add a Generic Test

Add Custom Tests to the Testing Library