Connectors in Hybrid

Integrate Gretel with your existing data services using Workflows in a hybrid environment

Gretel Workflows provide an easy-to-use, config-driven API for automating and operationalizing Gretel. Using Workflows, you can connect to data sources such as S3 or MySQL and schedule recurring jobs, making it easy to secure and share data across your organization.

Workflows are composed of Workflow Actions. Each Workflow Action integrates with a service, processes its inputs, and/or produces outputs. These services can be external data stores (e.g. for reading source data or writing synthetic data) or Gretel itself (e.g. for training and running models).

Connections are used to authenticate a Gretel Action to an external service such as GCS or Snowflake. Each action is tied to at most one external service, and needs to be configured with a connection for the appropriate service.

For more information about Gretel Workflows and Connectors, please see Gretel Workflows. For reference documentation covering the different connector types, see Connectors.

How Hybrid Connectors Work

When Gretel Hybrid is deployed, an encryption key is created in AWS KMS, Azure Key Vault, or GCP KMS (depending on your cloud provider). This key is used to encrypt your connection credentials, and the encrypted credentials are passed to the Gretel API when a hybrid connection is created. When a workflow run is scheduled within your Gretel Hybrid deployment, the Kubernetes pod responsible for interacting with your data source retrieves the encrypted connection credentials and then uses your cloud provider's SDK to decrypt them.

Gretel's control plane cannot encrypt or decrypt data with this encryption key, and unencrypted credentials are never passed to the Gretel API. The only identity that may access your encryption key and decrypt credentials is the IAM Role associated with your Gretel Workflow Worker pods.
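The split of responsibilities described above can be sketched in Python for the AWS case. This is a minimal illustration, not Gretel's actual implementation: `encrypt_credentials` and `decrypt_credentials` are hypothetical helper names, and `kms_client` is assumed to be a boto3 KMS client (or any object exposing the same `encrypt` and `decrypt` calls). Only the base64 ciphertext would ever leave your environment.

```python
import base64
import json


def encrypt_credentials(kms_client, key_arn: str, credentials: dict) -> str:
    """Client side: encrypt credentials before anything is sent to an API."""
    plaintext = json.dumps(credentials).encode("utf-8")
    # The KMS Encrypt call returns the ciphertext under "CiphertextBlob".
    response = kms_client.encrypt(KeyId=key_arn, Plaintext=plaintext)
    return base64.b64encode(response["CiphertextBlob"]).decode("ascii")


def decrypt_credentials(kms_client, encrypted: str) -> dict:
    """Worker side: only an identity allowed to use the key can do this."""
    blob = base64.b64decode(encrypted)
    response = kms_client.decrypt(CiphertextBlob=blob)
    return json.loads(response["Plaintext"].decode("utf-8"))


# Assumed usage with boto3 (needs AWS credentials, so it is left commented out):
#   import boto3
#   kms = boto3.client("kms")
#   token = encrypt_credentials(kms, "<your KMS key ARN>", {"password": "..."})
```

In this sketch the party that stores `token` never needs access to the key, which mirrors why the control plane cannot read your credentials.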

Creating Connections

Prerequisites

Gretel Client Installation

Install and configure the Gretel CLI following our guide here. Be sure you install the hybrid client dependencies for your cloud provider and configure Gretel authentication as outlined in that guide.

Cloud Provider Authentication

The Gretel Client installation guide mentioned above includes a section covering cloud provider authentication. Make sure your CLI or SDK environment is set up to authenticate with your cloud provider.

CLI Walkthrough

Step 1 - Create a JSON file with connection configuration

Each connector type has its own configuration schema. Refer to the connector documentation for information on the connector type you wish to create. In this example we are creating an MSSQL (Microsoft SQL Server) connector, and the JSON snippet below was copied directly from the documentation.

Create a local JSON file with the connection configuration. For this example the file is named hybrid-connector.json. Customize the configuration parameters as required for the data source you are connecting to. In the example below, you would customize the parameters in the "config" section and set the password in the "credentials" section.

These sensitive credentials are encrypted before being sent to the Gretel API, and Gretel's control plane cannot decrypt them. Be sure to clean up this file once you finish this guide; Step 4 covers this after the connection is created.

hybrid-connector.json

{
  "type": "mssql",
  "name": "my-mssql-connection",
  "config": {
    "username": "john",
    "host": "myserver.example.com",
    "port": 1443,
    "database": "mydatabase",
    "schema": "dbo",
    "params": "TrustServerCertificate=True"
  },
  "credentials": {
    "password": "your-password"
  }
}
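If you would rather not type the password into the file by hand, the file can be generated by a short script. The following is a sketch under stated assumptions: `write_connection_file` is a hypothetical helper, not part of the Gretel client, and the config values are the placeholders from the example above. On POSIX systems it creates the file readable only by the current user.

```python
import json
import os


def write_connection_file(path: str, password: str) -> None:
    """Write the example connection config with the given password."""
    connection = {
        "type": "mssql",
        "name": "my-mssql-connection",
        "config": {
            "username": "john",
            "host": "myserver.example.com",
            "port": 1443,
            "database": "mydatabase",
            "schema": "dbo",
            "params": "TrustServerCertificate=True",
        },
        "credentials": {"password": password},
    }
    # Mode 0o600 applies when the file is newly created (POSIX only);
    # an already existing file keeps its current permissions.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(connection, handle, indent=2)


# Example usage (prompts for the password instead of hard-coding it):
#   import getpass
#   write_connection_file("hybrid-connector.json", getpass.getpass("Password: "))
```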

Step 2 - Create a Gretel Project

Create a Gretel Project to contain the connector we're going to create. Anyone you share the project with can use the connector in any of their own Gretel Projects. We'll use the --set-default flag so that we don't have to pass the project as an input when creating the connection in the following step. For more information about sharing connections with other users, please see the section covering Connection Sharing.

gretel projects create --name "Gretel-Hybrid-Connections" --display-name "Gretel Hybrid Connections" --set-default

Step 3 - Create the Connection

Create the connection using the below command. Your credentials will be encrypted in memory using your cloud provider's Python SDK before the connection details are sent to the Gretel API.

This example targets an AWS deployment. Make sure your KMS Key ARN points to the key provisioned during the Gretel Hybrid deployment process. You can retrieve the Key ARN using the AWS Console or the AWS CLI.

gretel connections create --from-file hybrid-connector.json --aws-kms-key-arn "arn:aws:kms:us-west-2:012345678912:key/12345678-726d-4cd9-ab8a-123456789012"
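A mistyped ARN is an easy mistake to make here, so a quick local shape check before running the command can help. The sketch below is an assumption-laden convenience, not part of the Gretel CLI: `looks_like_kms_key_arn` is a hypothetical helper, and the pattern only checks the general form of a key ARN. It does not confirm that the key exists or that it is the one provisioned for your deployment.

```python
import re

# Assumed shape: arn:<partition>:kms:<region>:<12-digit account>:key/<key id>
KMS_KEY_ARN = re.compile(
    r"arn:(aws|aws-cn|aws-us-gov):kms:[a-z0-9-]+:\d{12}:key/[A-Za-z0-9-]+"
)


def looks_like_kms_key_arn(value: str) -> bool:
    """Return True if the value has the general shape of a KMS key ARN."""
    return KMS_KEY_ARN.fullmatch(value) is not None
```

Alias ARNs (ending in `alias/...`) are deliberately rejected by this pattern, since the text above asks for the key ARN.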

Step 4 - Clean up your sensitive credentials

Now that the Gretel Connection has been created, it can be referenced and used with Gretel Workflows. Clean up your sensitive data by editing the JSON file and redacting the password, or by deleting the file entirely.

Example 1 - Redact credentials

Redact the credentials if you may need to refer back to the connection configuration later.

{
  "type": "mssql",
  "name": "my-mssql-connection",
  "config": {
    "username": "john",
    "host": "myserver.example.com",
    "port": 1443,
    "database": "mydatabase",
    "schema": "dbo",
    "params": "TrustServerCertificate=True"
  },
  "credentials": {
    "password": "REDACTED"
  }
}
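The redaction can also be scripted rather than done by hand. This is a minimal sketch, assuming the file follows the layout shown above with a top-level "credentials" object; `redact_credentials` is a hypothetical helper name, not a Gretel client function.

```python
import json


def redact_credentials(path: str, placeholder: str = "REDACTED") -> None:
    """Replace every value under "credentials" with a placeholder, in place."""
    with open(path, encoding="utf-8") as handle:
        connection = json.load(handle)
    # Keep the credential keys so the file still documents what is required.
    connection["credentials"] = {
        key: placeholder for key in connection.get("credentials", {})
    }
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(connection, handle, indent=2)
        handle.write("\n")


# Example usage:
#   redact_credentials("hybrid-connector.json")
```

Note that overwriting a file does not guarantee the old contents are unrecoverable from disk, so rotating the password remains the safer option if the file was ever exposed.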

Example 2 - Delete the file

rm -f hybrid-connector.json

SDK Walkthrough

import getpass

from gretel_client import (
    aws_hybrid,
    configure_hybrid_session,
    create_or_get_unique_project,
)
from gretel_client.config import get_session_config
from gretel_client.rest_v1.api.connections_api import ConnectionsApi
from gretel_client.rest_v1.models import (
    CreateConnectionRequest,
    UpdateConnectionRequest,
)

# The KMS key is created as part of the Terraform setup for the AWS hybrid install.
# Its ARN has the form:
# arn:aws:kms:us-east-1:123456789010:key/aaaaaaaa-1234-1234-1234-aaaaaaaaaaaa
KEY_ARN = "..."

creds_encryption = aws_hybrid.KMSEncryption(KEY_ARN)

# The user who is associated with your hybrid deployment
DEPLOYMENT_USER = "...@email.com"

# This sets up a hybrid session, using our previously created credentials encrypter
# and the configured deployment user. All SDK functions that don't receive an explicit
# session will use this hybrid session.
configure_hybrid_session(
    api_key=getpass.getpass("Enter API Key:"),
    creds_encryption=creds_encryption,
    deployment_user=DEPLOYMENT_USER,
)

session = get_session_config()
connection_api = session.get_v1_api(ConnectionsApi)

project = create_or_get_unique_project(name="workflow-testing")

connection = connection_api.create_connection(
    CreateConnectionRequest(
        name="my-s3-conn",
        project_id=project.project_guid,
        type="s3",
        # note: best practice is to read in credentials from a file
        # or secret instead of directly embedding sensitive values
        # in python code.
        credentials={
            "access_key_id": "...",
            "secret_access_key": "...",
        },
    )
)
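The comment in the example above recommends reading credentials from a file or secret instead of embedding them in code. One way to do that is to read them from environment variables, as in the sketch below. `s3_credentials_from_env` is a hypothetical helper, and the variable names are the standard AWS SDK ones, used here only for illustration; the key pair your connection needs may well be a dedicated one stored elsewhere.

```python
import os


def s3_credentials_from_env(environ=os.environ) -> dict:
    """Build the S3 connection credentials dict from environment variables."""
    try:
        return {
            "access_key_id": environ["AWS_ACCESS_KEY_ID"],
            "secret_access_key": environ["AWS_SECRET_ACCESS_KEY"],
        }
    except KeyError as missing:
        # Fail early with the name of the missing variable, never its value.
        raise RuntimeError(
            f"Missing environment variable: {missing.args[0]}"
        ) from None
```

The returned dict has the same keys as the `credentials` argument in the example above, so it could be passed as `credentials=s3_credentials_from_env()`.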

Managing Connections
