BigQuery

Read from and write to BigQuery.

Getting Started

To create a BigQuery-based workflow, you will need:

  1. A source BigQuery connection.

  2. (optional) A list of tables OR SQL queries.

  3. (optional) A destination BigQuery connection.

Do not use your input data warehouse connection as an output connection. Doing so can unintentionally overwrite existing data.

Create a BigQuery Connection

Google BigQuery-related actions require creating a bigquery connection. The connection must be configured with the correct permissions for each Gretel Workflow Action.

For specific permissions, please refer to the Minimum Permissions section under each corresponding action.

Gretel bigquery connections require the following fields:

Connection Creation Parameters

name
Display name of your choosing, used to identify your connection within Gretel.
Example: my-bigquery-connection

connection_target_type
source or destination, depending on whether you want to read data from or write data to the connection.
Example: source

project_id
ID of the Google Cloud project containing your dataset.
Example: my-project-id

service_account_email
The service account email associated with your private key.
Example: service-account-name@my-project-id.iam.gserviceaccount.com

dataset
Name of the dataset to connect to.
Example: my-dataset-name

private_key_json
Private key JSON blob used to authenticate Gretel.
Example: { "type": "service_account", "project_id": "my-project-id", "private_key_id": "Oabc1def2345678g90123h456789012h34561718", "private_key": "-----BEGIN PRIVATE KEY----- ... }

Create a Service Account

To generate a private key, first create a service account, then download a key for that service account.
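As a minimal sketch using the gcloud CLI (the service account name gretel-bigquery-source, the project ID my-project-id, and the key filename are placeholders for your own values):

# Create a service account for Gretel to authenticate as
gcloud iam service-accounts create gretel-bigquery-source \
    --project=my-project-id \
    --display-name="Gretel BigQuery source"

# Download a JSON key for that service account; the contents of key.json
# become the private_key_json field of the Gretel connection
gcloud iam service-accounts keys create key.json \
    --iam-account=gretel-bigquery-source@my-project-id.iam.gserviceaccount.com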

Configure Dataset IAM Permissions

After the service account has been created, you can attach dataset-specific permissions to the service account.

Please see each action's Minimum Permissions section for a list of permissions to attach to the service account.
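As a rough sketch, one way to grant access is a project-level IAM binding with the gcloud CLI. The role below (roles/bigquery.dataViewer, suitable for a source connection) is an assumption; choose whichever role covers the Minimum Permissions for the action you are using, or scope the grant to a single dataset via the BigQuery console or the bq tool instead:

# Grant a BigQuery role to the service account at the project level
gcloud projects add-iam-policy-binding my-project-id \
    --member="serviceAccount:gretel-bigquery-source@my-project-id.iam.gserviceaccount.com" \
    --role="roles/bigquery.dataViewer"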

Creating Connections

First, create a file on your local computer containing the connection credentials. This file should include type, name, connection_target_type, config, and credentials. The config and credentials fields contain the fields specific to the connection being created.

Below is an example BigQuery connection credential file:

{
    "type": "bigquery",
    "name": "my-bigquery-connection",
    "connection_target_type": "source",
    "config": {
        "project_id": "my-project-id",
        "service_account_email": "service-account-name@my-project-id.iam.gserviceaccount.com",
        "dataset": "my-dataset"
    },
    "credentials": {
        "private_key_json": "..."
    }
}

Now that you've created the credentials file, use the CLI to create the connection:

gretel connections create --project [project id] --from-file [credential_file.json]

BigQuery Source Action

Type: bigquery_source

Connection: bigquery

The bigquery_source action reads data from your BigQuery dataset. It can be used to extract:

  • the entire dataset, OR

  • selected tables from the dataset, OR

  • the results of one or more SQL queries against the dataset.

Each time the workflow is run, the source action extracts the most recent data from the source dataset.

When combined in a workflow, the data extracted by the bigquery_source action is used to train models and generate synthetic data with the gretel_tabular action, and can be written to an output database with the bigquery_destination action. Your generated data can also be written to object storage connections; for more information, see Writing to Object Storage.

Inputs

The bigquery_source action takes slightly different inputs depending on the type of data you wish to extract. The sections below show the input config parameters and an example action YAML for each type of extraction.

Entire Dataset

sync.mode

full - extracts all records from tables in the dataset

(coming soon) subset - extracts a percentage of records from tables in the dataset

Example Source Action YAML

actions:
  - name: bigquery-read
    type: bigquery_source
    connection: conn_1
    config:
      sync:
        mode: full

Outputs

Whether you are extracting an entire dataset, selected tables, or querying against a dataset, the bigquery_source action always provides a single output, dataset.

dataset

A reference to the data extracted from the database, including tables and (if defined) relationships/schema.

The output of a bigquery_source action can be used as the input to a gretel_tabular action in order to transform and/or synthesize a dataset.
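For example, the source output can be referenced from a gretel_tabular action as shown below (the action names, Gretel project ID, and model are placeholders, mirroring the full workflow example later on this page):

  - name: synthesize
    type: gretel_tabular
    input: bigquery-read
    config:
      project_id: proj_1
      train:
        model: "synthetics/tabular-actgan"
        dataset: {outputs.bigquery-read.dataset}
      run:
        num_records_multiplier: 1.0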

Minimum Permissions

The associated service account must have the following permissions for the configured dataset:

  • bigquery.datasets.get

BigQuery Destination Action

Type: bigquery_destination

Connection: bigquery

The bigquery_destination action can be used to write gretel_tabular action outputs to BigQuery destination datasets.

Inputs

Whether you are writing an entire dataset, selected tables, or table(s) created via SQL query, the bigquery_destination action always takes the same input, dataset.

dataset

A reference to the table(s) generated by Gretel and (if applicable) the relationship schema extracted from the source database.

sync.mode

truncate - overwrites existing data in destination table(s); the table(s) must already exist in the destination dataset

replace - overwrites any existing data in table(s) at the destination, creating the table(s) if necessary

append - adds generated data to existing table(s); only supported for query-created tables without primary keys

Example Destination Action YAML

actions:
...
  - name: bigquery-write
    type: bigquery_destination
    connection: conn_2
    input: synthesize
    config:
      sync:
        mode: replace
      dataset: {outputs.synthesize.dataset}

Sync Modes

The BigQuery destination action uses a load job to write records into destination tables.

There are multiple strategies for writing records into the destination database. These strategies are configured from the sync.mode field on a destination config.

sync.mode may be one of truncate, replace, or append. Each sync mode configures a write disposition and a create disposition that determine how rows are inserted and how destination tables are created.

When sync.mode is configured with truncate

  • Records are written with WRITE_TRUNCATE

  • The destination table must already exist in the destination dataset.

When sync.mode is configured with replace

  • Records are written with WRITE_TRUNCATE

  • The destination table is created if necessary with CREATE_IF_NEEDED

When sync.mode is configured with append

  • Records are appended with WRITE_APPEND

  • The destination table is created if necessary with CREATE_IF_NEEDED

For more information on how job dispositions behave, please reference writeDisposition and createDisposition in the BigQuery JobConfigurationLoad docs.
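For instance, writing into a table that already exists in the destination dataset (preserving its schema) can use truncate. This sketch mirrors the destination example above with only the sync mode changed:

actions:
...
  - name: bigquery-write
    type: bigquery_destination
    connection: conn_2
    input: synthesize
    config:
      sync:
        mode: truncate
      dataset: {outputs.synthesize.dataset}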

Minimum Permissions

The associated service account must have the following permissions for the configured dataset:

  • bigquery.datasets.create

  • bigquery.datasets.delete (supports replacing an existing table in the dataset)

Writing to Object Storage

You can also write your output dataset to an object storage connection like Google Cloud Storage. Whether you are writing an entire dataset, selected tables, or table(s) created via SQL query, the {object_storage}_destination action always takes the same inputs: filename, input, and path. Additionally, S3 and GCS take bucket, while Azure Blob takes container.

Example Destination Action YAML

actions:
...
  - name: gcs-write
    type: gcs_destination
    connection: conn_2
    input: <action-name>
    config:
      bucket: my-gretel-bucket
      filename: {outputs.<action-name>.dataset.files.filename}
      input: {outputs.<action-name>.dataset.files.data}
      path: gretel

Example Workflow Configs

Create a synthetic version of your BigQuery dataset.

The following config will extract the entire BigQuery dataset, train and run a synthetic model, then write the outputs of the model back to a destination BigQuery dataset while maintaining referential integrity.

name: sample-bigquery-workflow-full-dataset

actions:
  - name: bigquery-read
    type: bigquery_source
    connection: conn_1
    config:
      sync:
        mode: full

  - name: synthesize
    type: gretel_tabular
    input: bigquery-read
    config:
      project_id: proj_1
      train:
        model: "synthetics/tabular-actgan"
        dataset: {outputs.bigquery-read.dataset}
      run:
        num_records_multiplier: 1.0

  - name: bigquery-write
    type: bigquery_destination
    connection: conn_2
    input: synthesize
    config:
      sync:
        mode: replace
      dataset: {outputs.synthesize.dataset}
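To submit this workflow, you can use the Gretel CLI. The exact subcommand and flags below are an assumption (and the config filename is a placeholder), so check gretel workflows --help for the current syntax:

gretel workflows create --project [project id] --config sample-bigquery-workflow.yaml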
