This document aims to lay out how SDF’s powerful lineage capabilities create detailed and accessible visibility into your data warehouse.
The lineage sample workspace is used. If this workspace is not already set up, it can be created with the sdf new --sample lineage command.
workspace.sdf.yml file:
lineage sample with the following command:
FROM clause or tables that select from an external source outside of the data warehouse. SDF supports two types of root tables: those that define their data inline using fixed values, and those that refer to a data file in an external store (e.g. a collection of S3 blobs). The root table source is an example of the former.
With the above definitions, we can run sdf compile to list the tables and their
schemas:
sdf compile
as shown above, and perform lineage analysis and label propagation as described below.
Visit our integration guides to learn more.
sink.phone
│
│ copy
└──────┐
       middle.phone
       │
       │ copy
       └──────┐
              source.phone
qty column of the knis table shown above is an example of transform lineage (denoted by mod):
knis.qty
│
│ mod
└──────┐
       middle.qty
       │
       │ mod
       └──────┐
              source.qty
SUM aggregation.
WHERE, GROUP BY, or ON clauses) to influence, but not directly contribute to, a downstream column.
Here is a full dependency graph for sink.phone, which involves all three kinds of dependencies, with inspect dependencies marked by scan for brevity.
sink.phone
│
│ copy
├──────┐
│      middle.phone
│      │
│      │ copy
│      ├──────┐
│      │      source.phone
│      │ scan
│      └──────┐
│             source.txn_date
│             source.user_id
│ scan
└──────┐
       middle.qty
       │
       │ mod
       ├──────┐
       │      source.qty
       │ scan
       └──────┐
              source.txn_date
              source.user_id
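To make the structure of this graph concrete, here is a small Python sketch that encodes the edges shown above and walks them upstream from sink.phone. It is only a toy model for reading the diagram, not how SDF computes lineage; the names EDGES, build_index, and walk_upstream are made up for this illustration, and the graph is assumed to be acyclic.

```python
from collections import defaultdict

# Edges transcribed from the sink.phone graph: (downstream, kind, upstream).
# "copy", "mod" and "scan" are the three labels used in the diagram.
EDGES = [
    ("sink.phone", "copy", "middle.phone"),
    ("middle.phone", "copy", "source.phone"),
    ("middle.phone", "scan", "source.txn_date"),
    ("middle.phone", "scan", "source.user_id"),
    ("sink.phone", "scan", "middle.qty"),
    ("middle.qty", "mod", "source.qty"),
    ("middle.qty", "scan", "source.txn_date"),
    ("middle.qty", "scan", "source.user_id"),
]


def build_index(edges):
    """Map each downstream column to its list of (kind, upstream) pairs."""
    index = defaultdict(list)
    for downstream, kind, upstream in edges:
        index[downstream].append((kind, upstream))
    return index


def walk_upstream(column, index, depth=1):
    """Yield (depth, kind, upstream_column) in depth-first order.

    Assumes the graph has no cycles.
    """
    for kind, parent in index.get(column, []):
        yield depth, kind, parent
        yield from walk_upstream(parent, index, depth + 1)


INDEX = build_index(EDGES)

if __name__ == "__main__":
    print("sink.phone")
    for depth, kind, column in walk_upstream("sink.phone", INDEX):
        print("  " * depth + f"{kind} -> {column}")
```

Each yielded tuple corresponds to one labelled branch in the diagram, so filtering on kind is enough to separate direct data flow (copy, mod) from columns that only influence the result (scan).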
sdf lineage supports the --forward flag to display all the downstream dependencies of a given column.
For a deeper dive on this command, check out the Lineage CLI Command reference.
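Conceptually, forward lineage is the same graph read in the opposite direction. The Python sketch below inverts the edges from the diagrams in this guide and collects every column reachable downstream of a starting column. It is an illustration of the idea only: it does not reproduce the output of sdf lineage --forward, and the names EDGES and downstream are assumptions made for this example.

```python
from collections import defaultdict, deque

# Edges transcribed from the diagrams in this guide: (downstream, kind, upstream).
EDGES = [
    ("sink.phone", "copy", "middle.phone"),
    ("middle.phone", "copy", "source.phone"),
    ("middle.phone", "scan", "source.txn_date"),
    ("middle.phone", "scan", "source.user_id"),
    ("sink.phone", "scan", "middle.qty"),
    ("knis.qty", "mod", "middle.qty"),
    ("middle.qty", "mod", "source.qty"),
    ("middle.qty", "scan", "source.txn_date"),
    ("middle.qty", "scan", "source.user_id"),
]


def downstream(column, edges=EDGES):
    """Return the set of all columns that depend on `column`, directly or not."""
    # Invert the edges: upstream column -> columns that read from it.
    children = defaultdict(list)
    for down, _kind, up in edges:
        children[up].append(down)

    seen = set()
    queue = deque([column])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:  # visit each column once
                seen.add(child)
                queue.append(child)
    return seen


if __name__ == "__main__":
    for name in sorted(downstream("source.qty")):
        print(name)
```

A breadth-first walk with a seen set is used here so that columns reachable along several paths, such as sink.phone, are reported once.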
workspace.sdf.yml file where we added an instruction to track the SUM transformation explicitly.
knis.qty column will be as follows:
knis.qty
│
│ sum
└──────┐
       middle.qty
       │
       │ sum
       └──────┐
              source.qty
mod label. By specifying a wildcard, the user may instruct SDF to track all function transformations explicitly.
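The effect of explicit tracking and of the wildcard can be pictured with a short Python sketch. This is a hypothetical model of the labelling rule, not SDF's implementation and not the workspace.sdf.yml syntax: the function edge_label and the tracked_patterns list are invented for illustration, and matching uses shell-style patterns from Python's fnmatch module.

```python
from fnmatch import fnmatchcase


def edge_label(function_name, tracked_patterns):
    """Return the lineage label for a transform performed by `function_name`.

    Hypothetical rule: if the function matches any tracked pattern, the edge
    is labelled with the function's own name; otherwise it gets the generic
    "mod" label.
    """
    name = function_name.lower()
    for pattern in tracked_patterns:
        if fnmatchcase(name, pattern.lower()):
            return name
    return "mod"


if __name__ == "__main__":
    print(edge_label("SUM", ["sum"]))  # SUM is tracked explicitly
    print(edge_label("AVG", ["sum"]))  # AVG is not tracked
    print(edge_label("AVG", ["*"]))    # wildcard tracks every function
```

With only SUM tracked, other functions fall back to the generic label; with a wildcard pattern, every function name becomes its own label.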
We’re actively working on this functionality and will notify the community when
it is available.