One of the beloved parts of software engineering is how developers have managed to keep their development workflows contained to the terminal. Terminal workflows allow automation via stdin, stdout, and stderr, and cross-compatibility with virtually all operating systems.

As such, SDF is designed to be a command line tool and can ideally keep the large majority of your data development contained to the terminal (and your IDE of choice). For you vim and emacs legends of the world: we want you to develop data workflows in your natural habitat - the terminal. Automate & integrate.

As a byproduct, we take output seriously. This guide will take you through how to configure your terminal environment to get the most out of SDF.

Simply put, SDF allows you to specify

  • display formats, as part of CLI options for command line output
  • input files (format and location), as part of Data Providers
  • output files (format and location), as part of Providers, Data Providers, and Materialization options
  • table schemas through inference, files, or Providers

Display Options

SDF writes progress or results on stdout. It provides the developer with a variety of logging and output options to enable scripting, debugging, and other use cases.

Command-line options

Inputs / Selectors

All SDF SQL processing commands take a list of targets as input.

For example, the compile and run commands can take targets in multiple formats:

  • sdf compile catalog.schema.table_name using fully qualified name
  • sdf compile table_name using table name and inferred fully qualified name from the default catalog and schema
  • sdf compile /path/to/table_name.sql using file path
  • sdf compile catalog.*.* using glob pattern to compile multiple targets
  • sdf compile [No params] using no params compiles the whole workspace
  • sdf compile /path/to/workspace enables compiling a workspace from outside the workspace directory
  • sdf run -q "SELECT * FROM table_name" using a query string

Running SDF commands from the workspace directory, adjacent to the workspace.sdf.yml file, is best practice as error messages and logs will use the relative path from the workspace root in their output.

Outputs

There are 4 commandline options to control the output.

  • --show (all | progress | result | none) - Print desired amount of information, i.e., produced tables.

    • all prints progress, results, and any warnings and errors
    • progress prints the progress of files as they are compiled
    • result prints the tables parsed and the schema assocaited
    • none prints nothing. This will also prevent the Finished in [n] seconds message from displaying at the end of stdout.
  • --format (table | csv | tsv | json | nd-json | yml) - Specifies the table representation on stdout.

    • table (default) is the normal ascii table representation.
    • csv is a CSV representation
    • tsv is a TSV representation
    • json is a JSON representation
    • nd-json is an ndJSON representation
    • yml, is YML representation
  • --limit <number> - Limits the number of shown rows. Run with --limit 0 to show all rows.

Example

Let’s try these options. Start by creating a new SDF workspace with a sample project or change directories into an existing one. The tutorial below uses the lineage sample.

1

Install and setup your workspace

To start with a sample workspace lineage ‘new —sample lineage’ will generate a new workspace folder to walk through this example.

	sdf new --sample lineage

After running the command, you will see the following output:

    Created lineage/checks/check_sink_phone_is_pii.sql     Created lineage/models/knis.sql     Created lineage/models/middle.sql     Created lineage/models/sink.sql     Created lineage/models/source.sql     Created lineage/workspace.sdf.yml    Finished new in 0.102 secs

2

Change Directory

Switch directories into the newly created lineage workspace.

cd lineage/
3

Compile lineage Workspace with --show command

Compile the workspace and review the output.

sdf compile

Reviewing this output, multiple files have been compiled and SDF has statically analyzed the queries.

Working set 4 model files, 1 .sdf file   Compiling lineage.pub.source (./models/source.sql)   Compiling lineage.pub.middle (./models/middle.sql)   Compiling lineage.pub.knis (./models/knis.sql)   Compiling lineage.pub.sink (./models/sink.sql)    Finished 4 models [4 succeeded] in 0.616 secs

Add the --show all command to display all information after compile

sdf compile --show all

Reviewing this output, SDF provides progress and table information. SDF reuses model files that have already been compiled and have not been edited. Alternatively try compiling with various output options:

sdf compile --show [all, none, progress, result]

For execution - you may also modulate table output formats. Let’s take the lineage.pub.middle table.

sdf run lineage.pub.middle --format [nd-json, csv, tsv, yml, table]
4

Display for Individual Tables

To select the output of only one table, use the table name directly after the compile command.

sdf compile models/sink.sql

The compile command can take targets in multiple formats.

  • sdf compile catalog.schema.table_name using fully qualified name
  • sdf compile table_name using table name and inferred fully qualified name
  • sdf compile /path/to/table_name.sql using file path
  • sdf compile catalog.*.* using glob pattern to compile multiple targets
  • sdf compile [No params] using no params compiles the whole workspace

Schema

The schema of an external table must be given explicitly, following the ANSI standard SQL syntax or

External tables are tables that load data from external sources.

An example for an explicit table definition looks like this

create external table events (
    dt date not null,
    action varchar not null,
    user_id smallint not null)
    ...

Besides column names, types and nullability, SDF schema definitions have no other schema descriptions, no foreign keys, no constraints.