Databricks

Read from and write to Databricks.

Getting Started

Prerequisites for creating a Databricks-based workflow. You will need:

  1. A source Databricks connection.

  2. (optional) A list of tables OR SQL queries.

  3. (optional) A destination Databricks connection.

Do not use your input Databricks connection as an output connector. Doing so can unintentionally overwrite existing data.

Create a Connection

Before creating the Databricks connection in Gretel, ensure that the compute resource (Spark cluster or SQL warehouse) has been started, so that connection validation doesn't time out.
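If you'd like to script this check, the Databricks SQL Warehouses REST API reports a warehouse's state and can start a stopped warehouse. The sketch below is a minimal example, not part of Gretel's tooling; the host, warehouse ID, and DATABRICKS_ADMIN_TOKEN environment variable are placeholders, and the endpoints should be verified against the current Databricks API documentation.

import os
import requests

# Sketch: confirm the SQL warehouse is RUNNING before creating the
# Gretel connection, and ask Databricks to start it if it isn't.
host = "https://account_identifier.cloud.databricks.com"
warehouse_id = "foo"  # the trailing segment of your http_path
headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_ADMIN_TOKEN']}"}

resp = requests.get(f"{host}/api/2.0/sql/warehouses/{warehouse_id}",
                    headers=headers, timeout=30)
resp.raise_for_status()

if resp.json()["state"] != "RUNNING":
    # Starting is asynchronous; wait for the warehouse to come up
    # before creating the connection in Gretel.
    requests.post(f"{host}/api/2.0/sql/warehouses/{warehouse_id}/start",
                  headers=headers, timeout=30)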

A Databricks connection is created using the following parameters:

Connection Creation Parameters

name: Display name of your choosing, used to identify your connection within Gretel. Example: my-databricks-connection

server_hostname: Fully qualified domain name (FQDN) used to establish a connection to the database server. Example: account_identifier.cloud.databricks.com

http_path: The HTTP path of the compute cluster or SQL warehouse. Example: /sql/1.0/warehouses/foo

personal_access_token: Security credential used to authenticate to the Databricks account (36 characters). Example: dapi....

catalog: Name of the catalog to connect to. Example: MY_CATALOG

schema: Name of the schema to connect to. Example: MY_SCHEMA

params (optional): JDBC URL parameters for advanced configuration. Example: role=MY_ROLE

Create a Service Principal & Personal Access Token

To generate a personal access token, first create a service principal, then generate a personal access token for that service principal. See "Manage service principals" in the Databricks documentation for details.
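Token creation can also be scripted. The sketch below uses the Databricks Token Management API's on-behalf-of endpoint to mint a token for a service principal; it is a minimal example rather than Gretel-provided tooling, the application ID and DATABRICKS_ADMIN_TOKEN environment variable are placeholders, and you should confirm the endpoint against the current Databricks documentation.

import os
import requests

# Sketch: create a personal access token on behalf of a service
# principal via the Databricks Token Management API. Requires a
# workspace admin token.
host = "https://account_identifier.cloud.databricks.com"
resp = requests.post(
    f"{host}/api/2.0/token-management/on-behalf-of/tokens",
    headers={"Authorization": f"Bearer {os.environ['DATABRICKS_ADMIN_TOKEN']}"},
    json={
        "application_id": "00000000-0000-0000-0000-000000000000",  # service principal application ID
        "lifetime_seconds": 90 * 24 * 3600,  # choose a lifetime that fits your rotation policy
        "comment": "Gretel Databricks connector",
    },
    timeout=30,
)
resp.raise_for_status()
personal_access_token = resp.json()["token_value"]  # value starting with "dapi"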

Creating Connections

First, create a file on your local computer containing the connection credentials. This file must include the type, name, config, and credentials fields; the config and credentials objects contain the fields specific to the connection being created.

Below is an example Databricks connection:

{
    "type": "databricks",
    "name": "my-databricks-connection",
    "config": {
        "server_hostname": "account_identifier.cloud.databricks.com",
        "http_path": "/sql/1.0/warehouses/foo",
        "catalog": "MY_WAREHOUSE",
        "schema": "MY_SCHEMA",
        "params": "role=MY_ROLE"
    },
    "credentials": {
        "personal_access_token": "dapi..."
    }
}

Now that you've created the credentials file, use the CLI to create the connection:

gretel connections create --project [project id] --from-file [credential_file.json]
To create the connection from the Gretel Console instead:

  • Navigate to the Connections page using the menu item in the left sidebar.

  • Click the New Connection button.

  • Step 1, choose the Type for the Connection - Databricks.

  • Step 2, choose the Project for your Connection.

  • Step 3, fill in the credentials and select Add Connection.

You can also create the connection with the Python SDK:

from gretel_client import create_or_get_unique_project
from gretel_client.config import get_session_config
from gretel_client.rest_v1.api.connections_api import ConnectionsApi
from gretel_client.rest_v1.models import CreateConnectionRequest

session = get_session_config()
connection_api = session.get_v1_api(ConnectionsApi)

project = create_or_get_unique_project(name="databricks-workflow")

connection = connection_api.create_connection(
    CreateConnectionRequest(
        name="my-databricks-connection",
        project_id=project.project_guid,
        type="databricks",
        config={
            "server_hostname": "account_identifier.cloud.databricks.com",
            "http_path": "/sql/1.0/warehouses/foo",
            "catalog": "MY_WAREHOUSE",
            "schema": "MY_SCHEMA",
            "params": "role=MY_ROLE"
        },
        # Note: best practice is to read credentials from a file or
        # secret instead of directly embedding sensitive values in
        # Python code.
        credentials={
            "personal_access_token": "dapi...",
        },
    )
)
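As the comment in the example notes, avoid embedding the token directly in code. A minimal alternative, assuming the token has been exported in a hypothetical DATABRICKS_PAT environment variable:

import os

# Read the personal access token from the environment instead of
# hard-coding it; DATABRICKS_PAT is a placeholder variable name.
credentials = {"personal_access_token": os.environ["DATABRICKS_PAT"]}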

Permissions

Source Connection Permissions

The Databricks source action requires enough access to read from tables and access schema metadata.

Add the following permissions to the service principal created above so that it can read data.

[Image: Databricks Source Permissions]
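The exact privilege set is shown in the screenshot above. As a rough sketch only, typical Unity Catalog read grants look like the following, executed here with the databricks-sql-connector package; the catalog, schema, and service principal application ID are placeholders, and the privileges should be checked against the screenshot.

from databricks import sql  # pip install databricks-sql-connector

# Sketch: grant typical Unity Catalog read privileges to the service
# principal, which is referenced by its application ID in backticks.
sp = "`00000000-0000-0000-0000-000000000000`"
with sql.connect(
    server_hostname="account_identifier.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/foo",
    access_token="dapi...",  # a token with authority to grant, not the service principal's
) as conn, conn.cursor() as cursor:
    cursor.execute(f"GRANT USE CATALOG ON CATALOG MY_CATALOG TO {sp}")
    cursor.execute(f"GRANT USE SCHEMA ON SCHEMA MY_CATALOG.MY_SCHEMA TO {sp}")
    cursor.execute(f"GRANT SELECT ON SCHEMA MY_CATALOG.MY_SCHEMA TO {sp}")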

Destination Connection Permissions

Ensure that the user or service principal is part of the ownership group for the destination catalog or schema.

The Databricks destination action requires enough permissions to write to the destination schema.

Add the following permissions to the service principal created above so that it can write data.

[Image: Databricks Destination Permissions]
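Again as a sketch only, and following the same connector setup as the read example, typical Unity Catalog write grants might look like this; confirm the required set against the screenshot.

from databricks import sql

# Sketch: grant typical Unity Catalog write privileges on the
# destination schema. Placeholders as in the read example above.
sp = "`00000000-0000-0000-0000-000000000000`"
with sql.connect(
    server_hostname="account_identifier.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/foo",
    access_token="dapi...",
) as conn, conn.cursor() as cursor:
    cursor.execute(f"GRANT USE CATALOG ON CATALOG MY_CATALOG TO {sp}")
    cursor.execute(f"GRANT USE SCHEMA ON SCHEMA MY_CATALOG.MY_SCHEMA TO {sp}")
    cursor.execute(f"GRANT CREATE TABLE ON SCHEMA MY_CATALOG.MY_SCHEMA TO {sp}")
    cursor.execute(f"GRANT MODIFY ON SCHEMA MY_CATALOG.MY_SCHEMA TO {sp}")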
