Connectors in Hybrid

Integrate Gretel with your existing data services using Workflows in a hybrid environment

Gretel Workflows provide an easy-to-use, config-driven API for automating and operationalizing Gretel. Using Workflows, you can connect to data sources such as S3 or MySQL and schedule recurring jobs, making it easy to secure and share data across your organization.

Workflows are composed of many Workflow Actions. Each Workflow Action is responsible for integrating with some service and performing some processing on its set of inputs and/or producing outputs. These services could be external data stores (e.g. for reading source data or writing synthetic data), or Gretel (e.g. for training and running models).

Connections are used to authenticate a Gretel Action to an external service such as GCS or Snowflake. Each action is tied to at most one external service, and needs to be configured with a connection for the appropriate service.

For more information about Gretel Workflows and Connectors, please see Gretel Workflows. For reference documentation covering the different connector types see Connectors.

How Hybrid Connectors Work

When Gretel Hybrid is deployed, an encryption key is created in AWS KMS, Azure Key Vault, or GCP KMS (depending on your cloud provider). This key is used to encrypt your connection credentials; with asymmetric encryption, only the public key is used for encryption. The encrypted credentials are passed to the Gretel API when a hybrid connection is created. When a workflow run is scheduled within your Gretel Hybrid deployment, the Kubernetes pod responsible for interacting with your data source retrieves the encrypted connection credentials and then uses your cloud provider's SDK to decrypt them.

Gretel's control plane cannot encrypt or decrypt data with this encryption key, and unencrypted credentials are never passed to the Gretel API. The only identity that may access your encryption key and decrypt credentials is the IAM Role associated with your Gretel Workflow Worker pods.

Enabling Asymmetric Encryption

To enable asymmetric encryption (i.e. encryption using the public key of a customer-managed key), configure your Helm installation of Gretel Hybrid with the following fields:

gretelConfig:
  asymmetricEncryption:
    ## An identifier of the key to be used. This is cloud-provider specific; valid ID schemes are:
    ## - aws-kms:<arn> for an AWS KMS key.
    ## - gcp-kms:<resource name> for a GCP KMS key.
    ## - azure-keyvault:<vault-uri>/<key-name>[/<key-version>] for an Azure Keyvault key.
    keyId: <key_id_with_prefix>
    ## The asymmetric encryption algorithm to use. If asymmetric encryption is used, the only supported
    ## algorithm is currently RSA_4096_OAEP_SHA256.
    algorithm: RSA_4096_OAEP_SHA256
    ## The PEM-encoded public key. This should be a PEM block of type "RSA PUBLIC KEY".
    publicKeyPem: |
        <public_key_pem>
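
Before installing, it can help to sanity-check the keyId value against the prefixes listed in the comments above. The following is a minimal sketch, not part of Gretel's tooling: parse_key_id and KeyId are hypothetical names, and the check only covers the scheme prefix, not whether the key exists or is asymmetric.

```python
from typing import NamedTuple

# Scheme prefixes copied from the comments in the values snippet above.
KNOWN_SCHEMES = ("aws-kms", "gcp-kms", "azure-keyvault")


class KeyId(NamedTuple):
    scheme: str
    identifier: str


def parse_key_id(key_id: str) -> KeyId:
    """Split a prefixed key ID into (scheme, identifier). Hypothetical helper."""
    # Only the first colon separates the scheme; ARNs and URIs contain more.
    scheme, sep, identifier = key_id.partition(":")
    if not sep or scheme not in KNOWN_SCHEMES:
        raise ValueError(
            f"unknown key ID scheme in {key_id!r}; expected one of {KNOWN_SCHEMES}"
        )
    if not identifier:
        raise ValueError("key ID has a scheme prefix but no identifier")
    return KeyId(scheme, identifier)
```

A bare ARN without the aws-kms: prefix is rejected by this check, which is an easy mistake to make when copying the value from the AWS Console.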

If you're using our Terraform modules, pull the latest version of the gretel-hybrid repo and you'll see the public key mappings already added.

If you aren't using our Terraform modules, you can create the keys manually and pass them in, making sure that the workflow Kubernetes service account has access to decrypt using the private key (see our Terraform modules for examples).

Please be sure your KMS Key ARN points to an asymmetric key that the Kubernetes Service Account running workflows can access.

Use the following command to retrieve the public key. Alternatively, you can copy it from the UI.

key_id="arn:aws:kms:us-east-1:12345678901:key/a852c401-21f0-4340-8786-029e1d3142ed"
echo "-----BEGIN PUBLIC KEY-----"
# The following sed command wraps at 67 characters
aws kms get-public-key --key-id "$key_id" --query PublicKey --output text | sed -e "s/.\{67\}/&\n/g"
echo "-----END PUBLIC KEY-----"
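
The newline escape in a sed replacement is not handled the same way by every sed implementation, so the wrapping step can also be sketched in Python. This is an illustrative helper, not something shipped with Gretel: der_to_pem is a hypothetical name, and the default width of 64 follows the usual PEM convention (RFC 7468) rather than the 67 used in the shell snippet above.

```python
import base64


def der_to_pem(der_b64: str, width: int = 64) -> str:
    """Wrap base64-encoded DER public key text in PUBLIC KEY armor.

    Hypothetical helper; der_b64 is the text printed by
    `aws kms get-public-key --query PublicKey --output text`.
    """
    # Drop any whitespace, then decode once to confirm the input is valid base64.
    compact = "".join(der_b64.split())
    base64.b64decode(compact, validate=True)
    # Slice into fixed-width lines; the final line may be shorter.
    lines = [compact[i:i + width] for i in range(0, len(compact), width)]
    body = "\n".join(lines)
    return f"-----BEGIN PUBLIC KEY-----\n{body}\n-----END PUBLIC KEY-----\n"
```

The returned string can be written to a file or pasted under publicKeyPem in the values file.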

The result can be passed as a file to the Helm install, or inlined in the values.yaml:

gretelConfig:
  asymmetricEncryption:
    keyId: aws-kms:arn:aws:kms:us-east-1:12345678901:key/a852c401-21f0-4340-8786-029e1d3142ed
    algorithm: RSA_4096_OAEP_SHA256
    publicKeyPem: |
        -----BEGIN PUBLIC KEY-----
        MIIBITANBgkqhkiG9w0BAQEFAAOCAQ4AMIIBCQKCAQBZnMm/gv3GP+viz5sToVGK
        H/x7W1ZF9isDwTOcW24jHQFelm7jyL7R5qj5P6uuYHiFQz5hfZE3WUrsUcUX2agt
        Z5LJ6gZQOMhtqR++ZonzW6rqBHssvdaa9ApdUGOmkz1uxn7eRQNv38yh6tluSfvk
        P1uvQOxLZBTVRIteBPoD3T9PGw1kJ/4CRZ3wS6z9ESEOIur5rzBs56NmQqeCVP08
        EDRuJqdCNW+pcWzp4/d7gXRdPvXgITuMW1Ly38y/Q/C9X6wTUyHjdka0JPIZ2GyP
        VEiEpHimBNvXocCw5HhHK+Lz4WdkvtpAeWnvAGKpX0RH2q9Zm6ox6qi2zwhmHNNb
        AgMBAAE=
        -----END PUBLIC KEY-----

Creating Connections

Prerequisites

Gretel Client Installation

Install and configure the Gretel CLI following our guide here. Be sure you install the hybrid client dependencies for your cloud provider and configure Gretel authentication as outlined in that guide.

Cloud Provider Authentication

Within the previously mentioned Gretel Client installation guide there is a specific section covering cloud provider authentication. Make sure your CLI or SDK environment is set up to authenticate with your cloud provider.

CLI Walkthrough

Step 1 - Create a JSON file with connection configuration

Each connector type has its own configuration schema. Refer to the connector documentation for information on the connector type you wish to create. In this example we are creating an MSSQL connector, and the JSON snippet below was copied directly from the documentation.

Create a local JSON file with the connection configuration. For this example the file is named hybrid-connector.json. Customize the configuration parameters as required for the data source you are connecting to. In the example below we would need to customize the parameters in the "config" section and we would also need to set the password in the "credentials" section.

These sensitive credentials will be encrypted before being sent to the Gretel API and Gretel's control plane will not be able to decrypt these credentials. Be sure you clean up this file after following along with this guide. This will be covered in an explicit step after we finish creating our connection.

hybrid-connector.json

{
  "type": "mssql",
  "name": "my-mssql-connection",
  "config": {
    "username": "john",
    "host": "myserver.example.com",
    "port": 1443,
    "database": "mydatabase",
    "schema": "dbo",
    "params": "TrustServerCertificate=True"
  },
  "credentials": {
    "password": "your-password"
  }
}
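
If you would rather not type the password into an editor, the same file can be generated from a script. The sketch below is one possible approach under stated assumptions: build_connection_config and write_private_json are hypothetical helper names, the field values are the placeholders from the example above, and the 0o600 mode only takes effect when the file is newly created on a POSIX system.

```python
import json
import os
from pathlib import Path


def build_connection_config(password: str) -> dict:
    """Return the example MSSQL connection config with the given password."""
    return {
        "type": "mssql",
        "name": "my-mssql-connection",
        "config": {
            "username": "john",
            "host": "myserver.example.com",
            "port": 1443,
            "database": "mydatabase",
            "schema": "dbo",
            "params": "TrustServerCertificate=True",
        },
        "credentials": {"password": password},
    }


def write_private_json(path: Path, payload: dict) -> None:
    """Write payload as JSON, requesting owner-only permissions on creation."""
    # The mode argument is applied only if os.open creates the file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2)
        handle.write("\n")
```

A caller could pass the result of getpass.getpass() as the password so that it never appears in shell history, then call write_private_json(Path("hybrid-connector.json"), config).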

Step 2 - Create a Gretel Project

Create a Gretel Project which will contain the connector we're going to create. Anyone that you share the project with will have access to use the connector in any of their own existing Gretel Projects. We'll use the --set-default flag so that we don't have to pass the project as an input when creating the connection in the following step. For more information about sharing connections with other users please see the section covering Connection Sharing.

gretel projects create --name "Gretel-Hybrid-Connections" --display-name "Gretel Hybrid Connections" --set-default

Step 3 - Create the Connection

Create the connection using the below command. Your credentials will be encrypted in memory using your cloud provider's Python SDK before the connection details are sent to the Gretel API.

Please be sure your KMS Key ARN points to the key provisioned during the Gretel Hybrid deployment process. You can retrieve the Key ARN using the AWS Console or the AWS CLI.

gretel connections create --from-file hybrid-connector.json --aws-kms-key-arn "arn:aws:kms:us-west-2:012345678912:key/12345678-726d-4cd9-ab8a-123456789012"

Step 4 - Clean up your sensitive credentials

Now that the Gretel Connection has been created, it can be referenced and used with Gretel Workflows. Clean up your sensitive data by editing the JSON file and redacting the password, or by deleting the file entirely.

Example 1 - Redact credentials

Redact the credentials if you may need to refer back to the connection configuration later.

{
  "type": "mssql",
  "name": "my-mssql-connection",
  "config": {
    "username": "john",
    "host": "myserver.example.com",
    "port": 1443,
    "database": "mydatabase",
    "schema": "dbo",
    "params": "TrustServerCertificate=True"
  },
  "credentials": {
    "password": "REDACTED"
  }
}
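
The redaction can also be scripted rather than done by hand. This is a minimal sketch assuming the flat "credentials" object shown above; redact_credentials is a hypothetical helper and is not part of the Gretel CLI or SDK.

```python
import copy


def redact_credentials(config: dict, placeholder: str = "REDACTED") -> dict:
    """Return a copy of config with every value under "credentials" replaced."""
    redacted = copy.deepcopy(config)
    if "credentials" in redacted:
        # Keep the key names so the file still documents which fields are needed.
        redacted["credentials"] = {
            key: placeholder for key in redacted["credentials"]
        }
    return redacted
```

To apply it to the file from this walkthrough, load hybrid-connector.json with json.load, pass the result through redact_credentials, and write it back with json.dump.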

Example 2 - Delete the file

rm -f hybrid-connector.json

SDK Walkthrough

import getpass

from gretel_client import (
    aws_hybrid,
    configure_hybrid_session,
    create_or_get_unique_project,
)
from gretel_client.config import get_session_config
from gretel_client.rest_v1.api.connections_api import ConnectionsApi
from gretel_client.rest_v1.models import (
    CreateConnectionRequest,
    UpdateConnectionRequest,
)

# The KMS key is created as part of the Terraform setup for the AWS hybrid install.
# Its ARN has the form:
# arn:aws:kms:us-east-1:123456789010:key/aaaaaaaa-1234-1234-1234-aaaaaaaaaaaa
KEY_ARN = "..."

creds_encryption = aws_hybrid.KMSEncryption(KEY_ARN)

# The user who is associated with your hybrid deployment
DEPLOYMENT_USER = "...@email.com"

# This sets up a hybrid session, using our previously created credentials encrypter
# and the configured deployment user. All SDK functions that don't receive an explicit
# session will use this hybrid session.
configure_hybrid_session(
    api_key=getpass.getpass("Enter API Key:"),
    creds_encryption=creds_encryption,
    deployment_user=DEPLOYMENT_USER,
)

session = get_session_config()
connection_api = session.get_v1_api(ConnectionsApi)

project = create_or_get_unique_project(name="workflow-testing")

connection = connection_api.create_connection(
    CreateConnectionRequest(
        name="my-s3-conn",
        project_id=project.project_guid,
        type="s3",
        # note: best practice is to read in credentials from a file
        # or secret instead of directly embedding sensitive values
        # in python code.
        credentials={
            "access_key_id": "...",
            "secret_access_key": "...",
        },
    )
)
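
Following the note in the code above about not embedding secrets, the credentials dictionary could be built from environment variables instead. This is a sketch under assumptions: load_s3_credentials is a hypothetical helper, and the variable names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are the conventional AWS ones, which your environment may or may not use.

```python
import os
from typing import Mapping

# Maps the credential field names used in the example above to the
# environment variables assumed to hold them.
ENV_NAMES = {
    "access_key_id": "AWS_ACCESS_KEY_ID",
    "secret_access_key": "AWS_SECRET_ACCESS_KEY",
}


def load_s3_credentials(env: Mapping[str, str] = os.environ) -> dict:
    """Build the S3 credentials dict from environment variables."""
    missing = [name for name in ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {field: env[name] for field, name in ENV_NAMES.items()}
```

The result can then be passed as credentials=load_s3_credentials() in the CreateConnectionRequest shown above.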

Managing Connections
