Display and IO
One of the beloved parts of software engineering is how developers have managed to keep their development workflows contained to the terminal. Terminal workflows allow automation via stdin, stdout, and stderr, and cross-compatibility with virtually all operating systems.
As such, SDF is designed to be a command line tool and can ideally keep the large majority of your data development contained to the terminal (and your IDE of choice). For you vim and emacs legends of the world: we want you to develop data workflows in your natural habitat - the terminal. Automate & integrate.
As a byproduct, we take output seriously. This guide will take you through how to configure your terminal environment to get the most out of SDF.
Simply put, SDF allows you to specify
- display formats, as part of CLI options for command line output
- input files (format and location), as part of Data Providers
- output files (format and location), as part of Providers, Data Providers, and Materialization options
- table schemas through inference, files, or Providers
Display Options
SDF writes progress or results on stdout. It provides the developer with a variety of logging and output options to enable scripting, debugging, and other use cases.
Command-line options
Inputs / Selectors
All SDF SQL processing commands take a list of targets as input.
For example, the compile
and run
commands can take targets in multiple formats:
sdf compile catalog.schema.table_name
using fully qualified namesdf compile table_name
using table name and inferred fully qualified name from the default catalog and schemasdf compile /path/to/table_name.sql
using file pathsdf compile catalog.*.*
using glob pattern to compile multiple targetssdf compile [No params]
using no params compiles the whole workspacesdf compile /path/to/workspace
enables compiling a workspace from outside the workspace directorysdf run -q "SELECT * FROM table_name"
using a query string
Running SDF commands from the workspace directory, adjacent to the workspace.sdf.yml
file, is best practice as error messages and logs will use the relative path from the workspace root in their output.
Outputs
There are 4 commandline options to control the output.
-
--show (all | progress | result | none)
- Print desired amount of information, i.e., produced tables.all
prints progress, results, and any warnings and errorsprogress
prints the progress of files as they are compiledresult
prints the tables parsed and the schema assocaitednone
prints nothing. This will also prevent theFinished in [n] seconds
message from displaying at the end of stdout.
-
--format (table | csv | tsv | json | nd-json | yml)
- Specifies the table representation on stdout.table
(default) is the normal ascii table representation.csv
is a CSV representationtsv
is a TSV representationjson
is a JSON representationnd-json
is an ndJSON representationyml
, is YML representation
-
--limit <number>
- Limits the number of shown rows. Run with--limit 0
to show all rows.
Example
Let’s try these options. Start by creating a new SDF workspace with a sample project or change directories into an existing one. The tutorial below uses the lineage sample.
Install and setup your workspace
To start with a sample workspace lineage ‘new —sample lineage’ will generate a new workspace folder to walk through this example.
sdf new --sample lineage
After running the command, you will see the following output:
Created lineage/checks/check_sink_phone_is_pii.sql
Created lineage/models/knis.sql
Created lineage/models/middle.sql
Created lineage/models/sink.sql
Created lineage/models/source.sql
Created lineage/workspace.sdf.yml
Welcome to your new SDF Workspace! To help you on your journey:
💡 Join the SDF Community Slack -> https://join.slack.com/t/sdfcommunity/shared_invite/zt-2lkokt7ed-xIR0_pcmUOU3~CebxuqeKg
📚 Read the Docs to Get Started -> https://docs.sdf.com/
Finished new in 0.291 secs
Change Directory
Switch directories into the newly created lineage workspace.
cd lineage/
Compile lineage Workspace with --show command
Compile the workspace and review the output.
sdf compile
Reviewing this output, multiple files have been compiled and SDF has statically analyzed the queries.
Working set 4 model files, 1 .sdf file
Compiling lineage.pub.source (./models/source.sql)
Compiling lineage.pub.middle (./models/middle.sql)
Compiling lineage.pub.sink (./models/sink.sql)
Compiling lineage.pub.knis (./models/knis.sql)
Finished 4 models [4 succeeded] in 0.798 secs
Add the --show all
command to display all information after compile
sdf compile --show all
Reviewing this output, SDF provides progress and table information. SDF reuses model files that have already been compiled and have not been edited. Alternatively try compiling with various output options:
sdf compile --show [all, none, progress, result]
For execution - you may also modulate table output formats. Let’s take the lineage.pub.middle
table.
sdf run lineage.pub.middle --format [nd-json, csv, tsv, yml, table]
Display for Individual Tables
To select the output of only one table, use the table name directly after the compile
command.
sdf compile models/sink.sql
The compile
command can take targets in multiple formats.
sdf compile catalog.schema.table_name
using fully qualified namesdf compile table_name
using table name and inferred fully qualified namesdf compile /path/to/table_name.sql
using file pathsdf compile catalog.*.*
using glob pattern to compile multiple targetssdf compile [No params]
using no params compiles the whole workspace
Schema
The schema
of an external table must be given explicitly, following the ANSI standard SQL syntax or
External tables are tables that load data from external sources.
An example for an explicit table definition looks like this
create external table events (
dt date not null,
action varchar not null,
user_id smallint not null)
...
Besides column names, types and nullability, SDF schema definitions have no other schema descriptions, no foreign keys, no constraints.