Databricks

Read from and write to Databricks.

Getting Started

To create a Databricks-based workflow, you will need:

  1. A source Databricks connection.

  2. (optional) A list of tables OR SQL queries.

  3. (optional) A destination Databricks connection.

Create a Connection

A Databricks connection is created using the following parameters:

Connection Creation Parameters

name

Display name of your choosing, used to identify your connection within Gretel.

Example: my-databricks-connection

server_hostname

Fully qualified domain name (FQDN) used to establish a connection to the database server.

Example: account_identifier.cloud.databricks.com

http_path

The HTTP path of the cluster or SQL warehouse.

Example: /sql/1.0/warehouses/foo

personal_access_token

Security credential used to authenticate your Databricks account (36 characters).

Example: dapi....

catalog

Name of the catalog to connect to.

Example: MY_CATALOG

schema

Name of the schema to connect to.

Example: MY_SCHEMA

(optional) params

Optional JDBC URL parameters that can be used for advanced configuration.

Example: role=MY_ROLE

Create a Service Principal & Personal Access Token

To generate a personal access token, you will first need to create a service principal, and then generate a personal access token on behalf of that service principal.
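If you prefer to script this step, below is a minimal sketch using the Databricks REST API via curl. DATABRICKS_HOST, ADMIN_TOKEN, and the display name are placeholders (ADMIN_TOKEN must belong to a workspace admin), and your workspace must allow token management for service principals; consult the Databricks documentation for the authoritative procedure.

# Sketch: create a service principal, then mint a token on its behalf.
# DATABRICKS_HOST and ADMIN_TOKEN are placeholders for your workspace
# hostname and a workspace-admin personal access token.

# 1. Create the service principal (SCIM 2.0 API). Note the
#    "applicationId" field in the response.
curl -s -X POST "https://${DATABRICKS_HOST}/api/2.0/preview/scim/v2/ServicePrincipals" \
  -H "Authorization: Bearer ${ADMIN_TOKEN}" \
  -H "Content-Type: application/scim+json" \
  -d '{
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"],
        "displayName": "gretel-connector"
      }'

# 2. Create a personal access token on behalf of the service principal
#    (Token Management API). The "token_value" field in the response is
#    the dapi... value used as personal_access_token in the connection.
curl -s -X POST "https://${DATABRICKS_HOST}/api/2.0/token-management/on-behalf-of/tokens" \
  -H "Authorization: Bearer ${ADMIN_TOKEN}" \
  -d '{
        "application_id": "<applicationId from step 1>",
        "comment": "gretel-connection-token",
        "lifetime_seconds": 7776000
      }'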

Creating Connections

First, create a file on your local computer containing the connection credentials. This file must include the type, name, config, and credentials fields. The config and credentials fields should contain the fields specific to the connection type being created.

Below is an example Databricks connection:

{
    "type": "databricks",
    "name": "my-databricks-connection",
    "config": {
        "server_hostname": "account_identifier.cloud.databricks.com",
        "http_path": "/sql/1.0/warehouses/foo",
        "catalog": "MY_WAREHOUSE",
        "schema": "MY_SCHEMA",
        "params": "role=MY_ROLE"
    },
    "credentials": {
        "personal_access_token": "dapi..."
    }
}

Now that you've created the credentials file, use the CLI to create the connection:

gretel connections create --project [project id] --from-file [credential_file.json]
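For example, assuming the JSON above was saved as databricks_connection.json and your project ID is proj_1 (both hypothetical values):

gretel connections create --project proj_1 --from-file databricks_connection.json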

Permissions

Source Connection Permissions

The Databricks source action requires sufficient permissions to read from tables and to access schema metadata.

Add the following permissions to the Service Principal created above so that it can read data.

Databricks Source Permissions
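The exact privilege names depend on your governance setup. As one hedged example, with Unity Catalog a read-only grant for the service principal might look like the following curl calls against the permissions API (MY_CATALOG, MY_SCHEMA, DATABRICKS_HOST, ADMIN_TOKEN, and the application ID are placeholders):

# Allow the service principal to see and use the catalog.
curl -s -X PATCH "https://${DATABRICKS_HOST}/api/2.1/unity-catalog/permissions/catalog/MY_CATALOG" \
  -H "Authorization: Bearer ${ADMIN_TOKEN}" \
  -d '{"changes": [{"principal": "<service principal application id>",
                    "add": ["USE_CATALOG"]}]}'

# Allow it to use the schema and read the tables and metadata within it.
curl -s -X PATCH "https://${DATABRICKS_HOST}/api/2.1/unity-catalog/permissions/schema/MY_CATALOG.MY_SCHEMA" \
  -H "Authorization: Bearer ${ADMIN_TOKEN}" \
  -d '{"changes": [{"principal": "<service principal application id>",
                    "add": ["USE_SCHEMA", "SELECT"]}]}'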

Destination Connection Permissions

The Databricks destination action requires sufficient permissions to write to the destination schema.

Add the following permissions to the Service Principal created above so that it can write data.

Databricks Destination Permissions
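As with the source permissions, the following is only a sketch under the same Unity Catalog assumptions: the same catalog-level grant, plus schema-level privileges that allow creating and modifying tables.

# Allow the service principal to see and use the catalog.
curl -s -X PATCH "https://${DATABRICKS_HOST}/api/2.1/unity-catalog/permissions/catalog/MY_CATALOG" \
  -H "Authorization: Bearer ${ADMIN_TOKEN}" \
  -d '{"changes": [{"principal": "<service principal application id>",
                    "add": ["USE_CATALOG"]}]}'

# Allow it to create, modify, and read tables in the destination schema.
curl -s -X PATCH "https://${DATABRICKS_HOST}/api/2.1/unity-catalog/permissions/schema/MY_CATALOG.MY_SCHEMA" \
  -H "Authorization: Bearer ${ADMIN_TOKEN}" \
  -d '{"changes": [{"principal": "<service principal application id>",
                    "add": ["USE_SCHEMA", "CREATE_TABLE", "MODIFY", "SELECT"]}]}'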
