SDF as a best-in-class transformation layer for BigQuery
Collect Required Information
Connect SDF to BigQuery
Run `sdf auth login bigquery --help` to see all login options. If the JSON key-file path does not work or cannot be specified, the values can instead be passed in directly as command flags.

Once you have logged in, SDF stores the credential in the `~/.sdf/` directory in your home directory. This credential will be used to authenticate with BigQuery services. By default, the credential's name is `default`. As such, the credential does not need to be explicitly referenced in the integration configuration below.

Add BigQuery Provider in Workspace
Next, configure your `workspace.sdf.yml` to use BigQuery as a provider. The first thing we'll need to do is set the default catalog to our BigQuery project ID. This tells SDF to use this project as the default project for all queries.
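As a sketch, assuming the standard `workspace.sdf.yml` layout (exact keys and edition value may differ across SDF versions):

```yaml
workspace:
  edition: "1.3"            # edition value is illustrative
  defaults:
    dialect: bigquery
    catalog: <PROJECT_ID>   # your BigQuery project ID becomes the default catalog
```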
Then add a BigQuery provider block to `workspace.sdf.yml`. This tells SDF to use BigQuery to hydrate missing table schemas. Replace `<PROJECT_ID>` with the name of the project you want to pull table metadata from. Note this is configurable and can be changed to any project you have access to; you can also list multiple projects, for example two projects called `proj1` and `proj2`. Likewise, if you logged in with a credential named something other than `default`, for example `bq-creds`, reference that credential name in the provider configuration.
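A hedged sketch of the provider configuration, assuming SDF's `integrations`/`sources` layout (`proj1`, `proj2`, and `bq-creds` are placeholder names from the text; keys may differ by SDF version):

```yaml
workspace:
  integrations:
    - provider: bigquery
      type: database
      credential: bq-creds      # omit when using the default credential
      sources:
        - pattern: proj1.*.*    # pull table metadata from proj1
        - pattern: proj2.*.*    # and from proj2
```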
, you would write:Try it out!
```shell
sdf compile -q "select * from <PROJECT_ID>.<DATASET>.<TABLE>" --show all
```
If the connection is successful, you will see the schema for the table you selected.

SDF can also create the project on `sdf run` if the project does not already exist. This is useful for creating a new project in BigQuery programmatically on execution of new tables. If you want to enable this feature, the service account used to authenticate SDF to BigQuery will need additional permissions beyond the defaults. Furthermore, this feature requires the Cloud Resource Manager API to be enabled. If it is not already, please enable this API in your GCP project.

Note: the term PROJECT in BigQuery is interchangeable with the term CATALOG in SDF. Similarly, the term DATASET in BigQuery is interchangeable with SCHEMA in SDF.