Databricks
Read from and write to Databricks.
Getting Started
Prerequisites to create a Databricks-based workflow. You will need:
A source Databricks connection.
(optional) A list of tables OR SQL queries.
(optional) A destination Databricks connection.
Do not use your input Databricks connection as an output connector; doing so can unintentionally overwrite existing data.
Create a Connection
Before creating the Databricks connection in Gretel, ensure that the compute resource (e.g. a Spark cluster or SQL warehouse) has been started, so that connection validation does not time out.
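If you script your environment setup, you can verify the warehouse state before creating the connection. A minimal sketch, assuming the Databricks SQL Warehouses REST API; the host, token, and warehouse ID values are placeholders:

```python
# Minimal sketch: confirm a Databricks SQL warehouse is running before
# creating the Gretel connection, so validation does not time out.
# Host, token, and warehouse ID below are placeholders.
import requests

DATABRICKS_HOST = "https://account_identifier.cloud.databricks.com"
TOKEN = "dapi..."        # personal access token
WAREHOUSE_ID = "foo"     # trailing segment of http_path

headers = {"Authorization": f"Bearer {TOKEN}"}
warehouse = requests.get(
    f"{DATABRICKS_HOST}/api/2.0/sql/warehouses/{WAREHOUSE_ID}",
    headers=headers,
)
warehouse.raise_for_status()

if warehouse.json()["state"] != "RUNNING":
    # Starting is asynchronous; poll the state until RUNNING before validating.
    requests.post(
        f"{DATABRICKS_HOST}/api/2.0/sql/warehouses/{WAREHOUSE_ID}/start",
        headers=headers,
    ).raise_for_status()
```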
A Databricks connection is created using the following parameters:
Connection Creation Parameters

| Parameter | Description | Example |
| --- | --- | --- |
| name | Display name of your choosing used to identify your connection within Gretel. | my-databricks-connection |
| server_hostname | Fully qualified domain name (FQDN) used to establish a connection to the database server. | account_identifier.cloud.databricks.com |
| http_path | The HTTP path of the cluster. | /sql/1.0/warehouses/foo |
| personal_access_token | Security credential used to authenticate to your Databricks account (36 characters). | dapi.... |
| catalog | Name of the catalog to connect to. | MY_CATALOG |
| schema | Name of the schema to connect to. | MY_SCHEMA |
| params (optional) | Optional JDBC URL parameters that can be used for advanced configuration. | role=MY_ROLE |
Create a Service Principal & Personal Access Token
To generate a personal access token, you will first need to create a service principal, and then generate a personal access token for that service principal.
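If you prefer to script this, both steps can be done with the Databricks REST API. A minimal sketch, assuming the workspace SCIM and Token Management endpoints; the host and admin token are placeholders, and the display name, comment, and lifetime are arbitrary example values:

```python
# Minimal sketch: create a service principal, then mint a personal access
# token on its behalf. Requires a workspace admin token; host and token
# values are placeholders.
import requests

DATABRICKS_HOST = "https://account_identifier.cloud.databricks.com"
ADMIN_TOKEN = "dapi..."  # token of a workspace admin

headers = {"Authorization": f"Bearer {ADMIN_TOKEN}"}

# Step 1: create the service principal (workspace SCIM API).
sp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/preview/scim/v2/ServicePrincipals",
    headers=headers,
    json={"displayName": "gretel-workflow-sp"},
)
sp.raise_for_status()
application_id = sp.json()["applicationId"]

# Step 2: generate a token on behalf of the service principal
# (Token Management API).
token = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/token-management/on-behalf-of/tokens",
    headers=headers,
    json={
        "application_id": application_id,
        "lifetime_seconds": 90 * 24 * 3600,  # 90 days
        "comment": "Gretel Databricks connection",
    },
)
token.raise_for_status()

# Use this value as personal_access_token in the Gretel connection.
print(token.json()["token_value"])
```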
Creating Connections
First, create a file on your local computer containing the connection credentials. This file should also include `type`, `name`, `config`, and `credentials`. The `config` and `credentials` fields should contain the fields specific to the connection being created.
Below is an example Databricks connection:
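This is a minimal sketch assembled from the parameter table above; every value is a placeholder to replace with your own:

```json
{
    "type": "databricks",
    "name": "my-databricks-connection",
    "config": {
        "server_hostname": "account_identifier.cloud.databricks.com",
        "http_path": "/sql/1.0/warehouses/foo",
        "catalog": "MY_CATALOG",
        "schema": "MY_SCHEMA"
    },
    "credentials": {
        "personal_access_token": "dapi..."
    }
}
```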
Now that you've created the credentials file, use the CLI to create the connection:
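Assuming the file above is saved as databricks_connection.json (the filename and project ID here are placeholders; check `gretel connections create --help` for the exact flags in your CLI version):

```
gretel connections create --project <project-id> --from-file databricks_connection.json
```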
Permissions
Source Connection Permissions
The Databricks source action requires enough access to read from tables and access schema metadata.
Add the following permissions to the Service Principal created above so that it can read data.
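If you manage grants in SQL, a minimal Unity Catalog read grant typically looks like the sketch below; the catalog, schema, and principal identifiers are placeholders, and your setup may require a different privilege set:

```sql
-- Placeholder identifiers: substitute your catalog, schema, and the
-- service principal's application ID. Reading needs catalog/schema
-- usage plus SELECT on the data Gretel will read.
GRANT USE CATALOG ON CATALOG MY_CATALOG TO `<service-principal-application-id>`;
GRANT USE SCHEMA ON SCHEMA MY_CATALOG.MY_SCHEMA TO `<service-principal-application-id>`;
GRANT SELECT ON SCHEMA MY_CATALOG.MY_SCHEMA TO `<service-principal-application-id>`;
```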
Destination Connection Permissions
Ensure that the user/service principal is part of the ownership group for the destination catalog or schema.
The Databricks destination action requires enough permissions to write to the destination schema.
Add the following permissions to the Service Principal created above so that it can write data.
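As with the source grants, a minimal Unity Catalog write grant typically looks like this sketch; the identifiers are placeholders, and your setup may require different privileges:

```sql
-- Placeholder identifiers: substitute your catalog, schema, and the
-- service principal's application ID. Writing typically needs usage
-- on the catalog and schema plus the ability to create and modify tables.
GRANT USE CATALOG ON CATALOG MY_CATALOG TO `<service-principal-application-id>`;
GRANT USE SCHEMA ON SCHEMA MY_CATALOG.MY_SCHEMA TO `<service-principal-application-id>`;
GRANT CREATE TABLE ON SCHEMA MY_CATALOG.MY_SCHEMA TO `<service-principal-application-id>`;
GRANT MODIFY ON SCHEMA MY_CATALOG.MY_SCHEMA TO `<service-principal-application-id>`;
GRANT SELECT ON SCHEMA MY_CATALOG.MY_SCHEMA TO `<service-principal-application-id>`;
```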