June 2023
V0.1.16
This document contains the release notes for version 0.1.16 of SDF.
June 1st, 2023
This release of the CLI brings many updates, bug fixes, and other improvements to SDF. The most important addition is PySpark support.
With this release, one can now run `sdf describe` and `sdf lineage` on workspaces with PySpark pipelines with the help of the new PySpark plugin. The plugin is enabled in the workspace definition file, where `path` contains Python pipeline files with the `.py` extension. The PySpark plugin relies on the `analyze_pyspark_pipeline.py` script, which accesses the user’s Databricks cluster and downloads the metadata of all the tables in the specified cluster.
This is an alpha release. Known limitations include:
- Classifier propagation is not yet enabled for PySpark tables
- The `analyze_pyspark_pipeline.py` script accesses all tables in a catalog sequentially, which is slow. We may parallelize this or allow the user to specify a subset of schemas/databases to access.
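The two mitigations named in the last limitation could look roughly like the sketch below. This is not the code of `analyze_pyspark_pipeline.py`: the functions `fetch_table_metadata`, `fetch_all_sequential`, `fetch_all_parallel`, and `filter_by_schema` are hypothetical, the metadata request is replaced by a stub, and the `schema.table` naming is an assumption made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_table_metadata(table_name):
    """Hypothetical stand-in for one metadata request to the cluster."""
    # A real implementation would query the cluster here. This stub returns
    # a placeholder record so the sketch runs without any external service.
    return {"table": table_name, "columns": []}


def fetch_all_sequential(table_names):
    """Fetch metadata one table at a time (the behavior described above)."""
    return [fetch_table_metadata(name) for name in table_names]


def fetch_all_parallel(table_names, max_workers=8):
    """Fetch metadata for several tables concurrently using a thread pool."""
    # Threads suit this workload because each request mostly waits on I/O.
    # executor.map returns results in the same order as the input names.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fetch_table_metadata, table_names))


def filter_by_schema(table_names, schemas):
    """Keep only tables whose schema is listed (assumes 'schema.table' names)."""
    wanted = set(schemas)
    return [name for name in table_names if name.split(".", 1)[0] in wanted]


if __name__ == "__main__":
    tables = ["sales.orders", "sales.items", "hr.people"]
    subset = filter_by_schema(tables, ["sales"])
    for record in fetch_all_parallel(subset):
        print(record["table"])
```

Restricting the table list first and then fetching in parallel combines both ideas. Whether a thread pool is appropriate in practice depends on how the cluster rate-limits metadata requests.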
What’s New?
- PySpark support 😄
- Introduced support for super types
- Bug fixes and stability improvements
Latest Builds
Architecture | Status | Version | Download |
---|---|---|---|
Linux Intel X86-64 | ✅ | 0.1.16 | Download |
Linux Arm ARM-64 | ✅ | 0.1.16 | Download |
Apple Intel X86-64 | ✅ | 0.1.16 | Download |
Apple Arm AARCH-64 | ✅ | 0.1.16 | Download |