Overview

During compilation, SDF creates a comprehensive Information Schema (IF) which contains metadata about your workspace. This metadata includes basic information such as tables, columns, data types, and descriptions along with more complex metadata like column level lineage, and data classifiers.

This workspace IF can be used to enrich data catalogs, and queried to power Checks. Under the hood, SDF stores the analysis output as Parquet.

Information Schema

The below tables represent the SDF Information Schema. They are always accessible, after any compilation, in the sdftarget under sdftarget/dbg/data/system/information_schema::sdf

You may dynamically query the information schema in any valid SDF workspace.

To query your tables:

sdf run -q "select * from sdf.information_schema.tables" 

To query your columns:

sdf run -q "select * from sdf.information_schema.columns" 

To query your column_lineage:

sdf run -q "select * from sdf.information_schema.column_lineage" 

The information schema tables have the following schemas.

$sdf man information-schema --format markdown

Information Schema DDLs

Below are the DDLs representing these tables in SQL.

—information_schemas.sql

CREATE TABLE sdf.information_schema.tables(     “table_id” varchar not null,     “catalog_name” varchar not null,     “schema_name” varchar not null,     “table_name” varchar not null,     “depends_on” array<varchar>,     “depended_on_by” array<varchar>,     “description” varchar,     “dialect” varchar not null,     “materialization” varchar,     “purpose” varchar,     “origin” varchar,     “classifiers” array<varchar>,     “source_locations” array<varchar>,     “meta” map<varchar not null, varchar> );

CREATE TABLE sdf.information_schema.columns(     “table_id” varchar not null,     “catalog_name” varchar not null,     “schema_name” varchar not null,     “table_name” varchar not null,     “depends_on” array<varchar>,     “depended_on_by” array<varchar>,     “description” varchar,     “dialect” varchar not null,     “materialization” varchar,     “purpose” varchar,     “origin” varchar,     “classifiers” array<varchar>,     “source_locations” array<varchar>,     “meta” map<varchar not null, varchar> );

CREATE TABLE sdf.information_schema.table_lineage(     “from_table_id” varchar,     “to_table_id” varchar,     “lineage_kind” varchar );

CREATE TABLE sdf.information_schema.column_lineage(     “from_table_id” varchar,     “from_column_id” varchar,     “to_table_id” varchar,     “to_column_id” varchar,     “lineage_kind” varchar );