Getting Started with BigQuery and SDF

Overview

SDF can hydrate table schemas and execute queries against a BigQuery DB. This guide will walk you through the steps of creating a BigQuery integration. By the end, you will be able to reference tables in your BigQuery instance in SDF.

SDF will use remote BigQuery schema information to do type bindings and other static analysis checks on your workspace.

Prerequisites

In order to connect to BigQuery, you will need to have on hand the following:

A BigQuery account and project with billing set up
Valid credentials with read access to table metadata in your BigQuery account.

Guide

Collect Required Information

To connect to BigQuery, you’ll need to have created a service account, and a corresponding key for the service account. The following values are needed to authenticate SDF to BigQuery:

Project ID - the project ID of the project you want to pull table metadata from
Client Email - the email address of the service account
Private Key - the private key of the service account

The easiest way to extract all the necessary information is to download the service account’s key as JSON, and fill in the values.

To create a service account key and download the JSON, follow this official GCP guide.

Your BigQuery service account will need the following permissions:

Bigquery Admin

Connect SDF to BigQuery

To authenticate SDF to BigQuery, run the following command with the values from the previous step filled in:

sdf auth login bigquery --json-path <PATH_TO_DOWNLOADED_JSON_FILE>

Run sdf auth login bigquery --help to see all login options. If the JSON path does not work or cannot be specific, the values can be copied in as command flags like so:

sdf auth login bigquery --project-id <PROJECT_ID> --client-email <CLIENT_EMAIL> --private-key <PRIVATE_KEY>

After running this command, SDF will validate that it has access to all the necessary permissions for BigQuery with these credentials. A warning will show if SDF does not have the necessary permissions, or if there was a different issue authenticating to GCP.

This will create a new credential in a ~/.sdf/ directory in the root of your system. This credential will be used to authenticate with BigQuery services. By default, the credential’s name is default. As such, the credential does not need to be explicitly referenced in the integrations configuration below.

To validate the connection, run:

sdf auth status

Add BigQuery Provider in Workspace

Now that we’re authenticated, we can configure our workspace.sdf.yml to use BigQuery as a provider.

The first thing we’ll need to do is set the default catalog to our BigQuery project ID. This tells SDF to use this project as the default project for all queries.

workspace.sdf.yml

workspace:
    ...
    defaults:
        catalog: <PROJECT_ID>

Next, let’s add an integration block to the workspace.sdf.yml. This tells SDF to use BigQuery to hydrate missing table schemas.

workspace:
    ...
    integrations:
        - provider: bigquery
          type: database
          sources: 
            - pattern: <PROJECT_ID>.*.*

Replace <PROJECT_ID> with the name of the project you want to pull table metadata from. Note this is configurable and can be changed to any project you have access to. For example, if I wanted SDF to pull from two projects, called proj1 and proj2, I would write:

workspace:
    ...
    integrations:
        - provider: BigQuery
          type: database
          sources: 
            - pattern: proj1.*.*
            - pattern: proj2.*.*

Integrations can also be configured to use specific credentials by referencing the credential name in the integration block. For example, if you wanted to use a credential called bq-creds, you would write:

workspace:
    ...
    integrations:
        - provider: BigQuery
          type: database
          credential: bq-creds
          sources: 
            - pattern: proj1.dataset1.mytable
            - pattern: proj2.dataset2.*

For more information on integration configuration, see the integration documentation.

Try it out!

Now that you’re connected, let’s make sure SDF can pull the schema information it needs.

Run sdf compile -q "select * from <PROJECT_ID>.<DATASET>.<TABLE>" --show all

If the connection is successful, you will see the schema for the table you selected.

SDF is capable of creating projects in BigQuery on sdf run if the project does not already exist. This is useful for creating a new project in BigQuery programmatically on execution of new tables.

If you want to enable this feature, the service account used to authenticate SDF to BigQuery will need to following additional permissions:

Furthermore, this also requires the Cloud Resource Manager API to be enabled. If not already, please enable this API in your GCP project.

The term PROJECT in BigQuery is interchangeable with the term CATALOG in SDF. Similarly, the term DATASET in BigQuery is interchangeable with SCHEMA in SDF.

Introduction

Tutorials

Guides

Integrations

Getting Started with BigQuery and SDF

Overview

Prerequisites

Guide

Introduction

Tutorials

Guides

Integrations

​Overview

​Prerequisites

​Guide

Overview

Prerequisites

Guide