Integrate Gretel with your existing data services using Workflows in a hybrid environment
Gretel Workflows provide an easy-to-use, config-driven API for automating and operationalizing Gretel. Using Workflows, you can connect to data sources such as S3 or MySQL and schedule recurring jobs, making it easy to secure and share data across your organization.
Workflows are composed of many Workflow Actions. Each Workflow Action is responsible for integrating with some service and performing some processing on its set of inputs and/or producing outputs. These services could be external data stores (e.g. for reading source data or writing synthetic data), or Gretel (e.g. for training and running models).
Connections are used to authenticate a Gretel Action to an external service such as GCS or Snowflake. Each action is tied to at most one external service, and needs to be configured with a connection for the appropriate service.
For more information about Gretel Workflows and Connectors, please see Gretel Workflows. For reference documentation covering the different connector types, see Connectors.
How Hybrid Connectors Work
When Gretel Hybrid is deployed, an encryption key is created in AWS KMS, Azure Key Vault, or GCP KMS (depending on your cloud provider). This key is used to encrypt your connection credentials; with asymmetric encryption, the public key is used for encryption. The encrypted credentials are passed to the Gretel API when a hybrid connection is created. When a workflow run is scheduled within your Gretel Hybrid deployment, the Kubernetes pod responsible for interacting with your data source retrieves the encrypted connection credentials and then uses your cloud provider's SDK to decrypt them.
Gretel's control plane does not have access to encrypt or decrypt data with this encryption key, and unencrypted credentials will never be passed to the Gretel API. The only identity that may access your encryption key and decrypt credentials is the IAM Role associated with your Gretel Workflow Worker pods.
Enabling Asymmetric Encryption
To enable asymmetric encryption (i.e., allowing encryption using a public key from a customer-managed key), configure your Helm installation of Gretel Hybrid with the following fields:
gretelConfig:
  asymmetricEncryption:
    ## An identifier of the key to be used. This is cloud-provider specific; valid ID schemes are:
    ##  - aws-kms:<arn> for an AWS KMS key.
    ##  - gcp-kms:<resource name> for a GCP KMS key.
    ##  - azure-keyvault:<vault-uri>/<key-name>[/<key-version>] for an Azure Keyvault key.
    keyId: <key_id_with_prefix>
    ## The asymmetric encryption algorithm to use. If asymmetric encryption is used, the only supported
    ## algorithm is currently RSA_4096_OAEP_SHA256.
    algorithm: RSA_4096_OAEP_SHA256
    ## The PEM-encoded public key. This should be a PEM block of type "RSA PUBLIC KEY".
    publicKeyPem: |
      <public_key_pem>
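The keyId prefixes listed in the comments above lend themselves to a quick sanity check before running the Helm install. The following is a minimal sketch only: parse_key_id and KeyId are hypothetical helper names, not part of Gretel's tooling, and the check covers just the prefix, not whether the key exists or is asymmetric.

```python
from typing import NamedTuple

# Scheme prefixes taken from the comments in the Helm values above.
KNOWN_SCHEMES = ("aws-kms", "gcp-kms", "azure-keyvault")


class KeyId(NamedTuple):
    scheme: str      # e.g. "aws-kms"
    identifier: str  # e.g. the KMS key ARN


def parse_key_id(key_id: str) -> KeyId:
    """Split '<scheme>:<identifier>' and reject unknown or missing schemes."""
    scheme, sep, identifier = key_id.partition(":")
    if not sep or scheme not in KNOWN_SCHEMES or not identifier:
        raise ValueError(f"unrecognized keyId: {key_id!r}")
    return KeyId(scheme, identifier)


if __name__ == "__main__":
    print(parse_key_id("azure-keyvault:https://mykey.vault.azure.net/keys/gretel-hybrid-key"))
```

A bare ARN without the aws-kms: prefix is rejected, which is the mistake this check is meant to catch.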
If you're using our Terraform modules, pull the latest version of the gretel-hybrid repo; the public key mappings are already included.
If you aren't using our Terraform modules, you can create the keys manually and pass them in, making sure that the workflow Kubernetes service account has access to decrypt using the private key (see our Terraform modules for examples). Examples for AWS KMS and Azure Key Vault follow.
Please be sure your KMS Key ARN points to an asymmetric key that the Kubernetes Service Account running workflows can access.
Here's a command you can use to get the public key; otherwise, you can copy it from the UI:
key_id="arn:aws:kms:us-east-1:12345678901:key/a852c401-21f0-4340-8786-029e1d3142ed"
echo "-----BEGIN PUBLIC KEY-----"
# The following sed command wraps at 67 characters
aws kms get-public-key --key-id "$key_id" --query PublicKey --output text | sed -e "s/.\{67\}/&\n/g"
echo "-----END PUBLIC KEY-----"
The result can be passed as a file to the helm install or inlined in values.yaml:
gretelConfig:
  asymmetricEncryption:
    keyId: aws-kms:arn:aws:kms:us-east-1:12345678901:key/a852c401-21f0-4340-8786-029e1d3142ed
    algorithm: RSA_4096_OAEP_SHA256
    publicKeyPem: |
      -----BEGIN PUBLIC KEY-----
      MIIBITANBgkqhkiG9w0BAQEFAAOCAQ4AMIIBCQKCAQBZnMm/gv3GP+viz5sToVGK
      H/x7W1ZF9isDwTOcW24jHQFelm7jyL7R5qj5P6uuYHiFQz5hfZE3WUrsUcUX2agt
      Z5LJ6gZQOMhtqR++ZonzW6rqBHssvdaa9ApdUGOmkz1uxn7eRQNv38yh6tluSfvk
      P1uvQOxLZBTVRIteBPoD3T9PGw1kJ/4CRZ3wS6z9ESEOIur5rzBs56NmQqeCVP08
      EDRuJqdCNW+pcWzp4/d7gXRdPvXgITuMW1Ly38y/Q/C9X6wTUyHjdka0JPIZ2GyP
      VEiEpHimBNvXocCw5HhHK+Lz4WdkvtpAeWnvAGKpX0RH2q9Zm6ox6qi2zwhmHNNb
      AgMBAAE=
      -----END PUBLIC KEY-----
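If sed is not available, the wrapping done by the AWS command above can be reproduced in a few lines of Python. This is a sketch under assumptions: to_pem is a hypothetical helper, and its input is assumed to be the single-line base64 text returned by the get-public-key call. The default width of 64 matches the line length of the sample key shown above; pass width=67 to mirror the sed command exactly.

```python
def to_pem(base64_der: str, width: int = 64, label: str = "PUBLIC KEY") -> str:
    """Wrap single-line base64 key material in PEM BEGIN/END markers."""
    # Remove any whitespace or newlines the CLI output may carry.
    body = "".join(base64_der.split())
    # Slice the body into fixed-width lines; the last line may be shorter.
    lines = [body[i:i + width] for i in range(0, len(body), width)]
    return "\n".join([f"-----BEGIN {label}-----", *lines, f"-----END {label}-----"]) + "\n"


if __name__ == "__main__":
    import sys

    # Example: aws kms get-public-key ... --output text | python to_pem.py
    sys.stdout.write(to_pem(sys.stdin.read()))
```

The output can be saved to a file for the helm install or pasted under publicKeyPem in values.yaml.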
Please be sure your Azure Key Vault ID points to a key that the Kubernetes Service Account running workflows can access.
You can retrieve the public key with the az CLI or copy it from the UI.
The result can be passed as a file to the helm install or inlined in values.yaml:
gretelConfig:
  asymmetricEncryption:
    keyId: azure-keyvault:https://mykey.vault.azure.net/keys/gretel-hybrid-key
    algorithm: RSA_4096_OAEP_SHA256
    publicKeyPem: |
      -----BEGIN PUBLIC KEY-----
      MIIBITANBgkqhkiG9w0BAQEFAAOCAQ4AMIIBCQKCAQBZnMm/gv3GP+viz5sToVGK
      H/x7W1ZF9isDwTOcW24jHQFelm7jyL7R5qj5P6uuYHiFQz5hfZE3WUrsUcUX2agt
      Z5LJ6gZQOMhtqR++ZonzW6rqBHssvdaa9ApdUGOmkz1uxn7eRQNv38yh6tluSfvk
      P1uvQOxLZBTVRIteBPoD3T9PGw1kJ/4CRZ3wS6z9ESEOIur5rzBs56NmQqeCVP08
      EDRuJqdCNW+pcWzp4/d7gXRdPvXgITuMW1Ly38y/Q/C9X6wTUyHjdka0JPIZ2GyP
      VEiEpHimBNvXocCw5HhHK+Lz4WdkvtpAeWnvAGKpX0RH2q9Zm6ox6qi2zwhmHNNb
      AgMBAAE=
      -----END PUBLIC KEY-----
Creating Connections
Prerequisites
Gretel Client Installation
Install and configure the Gretel CLI following our guide here. Be sure you install the hybrid client dependencies for your cloud provider and configure Gretel authentication as outlined in that guide.
Cloud Provider Authentication
The Gretel Client installation guide mentioned above includes a section covering cloud provider authentication. Make sure your CLI or SDK environment is set up to authenticate with your cloud provider.
CLI Walkthrough
Step 1 - Create a JSON file with connection configuration
Each individual connector type has a specific configuration schema defined. Refer to the connector documentation for information on the connector type you wish to create. In this example we are creating a MySQL connector and the below JSON snippet was copied directly from the documentation.
Create a local JSON file with the connection configuration. For this example the file is named hybrid-connector.json. Customize the configuration parameters as required for the data source you are connecting to. In the example below we would need to customize the parameters in the "config" section and we would also need to set the password in the "credentials" section.
These sensitive credentials will be encrypted before being sent to the Gretel API and Gretel's control plane will not be able to decrypt these credentials. Be sure you clean up this file after following along with this guide. This will be covered in an explicit step after we finish creating our connection.
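Because the file holds a plaintext password until the connection is created, it can help to create it with restrictive permissions. The sketch below writes hybrid-connector.json readable only by the current user (on POSIX systems). The helper name write_connection_file and every value in example_connection are hypothetical placeholders; the "config" and "credentials" sections follow the description above, but the exact field names must come from the connector documentation.

```python
import json
import os
from pathlib import Path


def write_connection_file(path, connection: dict) -> Path:
    """Write the connection configuration as JSON with owner-only permissions."""
    path = Path(path)
    # Mode 0o600 applies when the file is created; it limits access to the owner.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(connection, fh, indent=4)
    return path


# Hypothetical MySQL values, for illustration only.
example_connection = {
    "type": "mysql",
    "name": "my-mysql-connection",
    "config": {
        "host": "myserver.example.com",
        "port": 3306,
        "username": "john",
        "database": "mydatabase",
    },
    "credentials": {"password": "..."},
}

if __name__ == "__main__":
    write_connection_file("hybrid-connector.json", example_connection)
```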
Create a Gretel Project which will contain the connector we're going to create. Anyone that you share the project with will have access to use the connector in any of their own existing Gretel Projects. We'll use the --set-default flag so that we don't have to pass the project as an input when creating the connection in the following step. For more information about sharing connections with other users please see the section covering Connection Sharing.
Create the connection using the below command. Your credentials will be encrypted in memory using your cloud provider's Python SDK before the connection details are sent to the Gretel API.
Please be sure your KMS Key ARN points to the key provisioned during the Gretel Hybrid deployment process. You can retrieve the Key ARN using the AWS Console or the AWS CLI.
Please be sure your Azure Key Vault URL and Key ID parameters are referencing the Azure Key Vault Key provisioned during the Gretel Hybrid deployment process. You can retrieve these values using the Azure Console or the az CLI.
Please be sure your GCP KMS Key resource name points to the key provisioned during the Gretel Hybrid deployment process. You can retrieve this value using the GCP Console or the gcloud CLI.
Now that the Gretel Connection has been created, it may be referenced and used with Gretel Workflows. You should clean up your sensitive data by editing the JSON file and redacting the password, or by deleting the file entirely.
Example 1 - Redact credentials
Redact the credentials in case you may need to refer back to the connection configuration for future reference.
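One way to perform the redaction is sketched below. It assumes the file has a top-level "credentials" object, as described in Step 1; redact_credentials is a hypothetical helper name. Every value under "credentials" is replaced with a placeholder, while the "config" section is left intact for future reference.

```python
import json
from pathlib import Path


def redact_credentials(path, placeholder: str = "REDACTED") -> dict:
    """Overwrite every value under "credentials" in the JSON file with a placeholder."""
    path = Path(path)
    data = json.loads(path.read_text(encoding="utf-8"))
    # Keep the credential keys so the file still documents which secrets are needed.
    data["credentials"] = {key: placeholder for key in data.get("credentials", {})}
    path.write_text(json.dumps(data, indent=4) + "\n", encoding="utf-8")
    return data


if __name__ == "__main__":
    redact_credentials("hybrid-connector.json")
```

Note that rewriting a file does not guarantee the old contents are erased from disk; if that matters in your environment, delete the file and rotate the password instead.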
import getpass

from gretel_client import (
    aws_hybrid,
    configure_hybrid_session,
    create_or_get_unique_project,
)
from gretel_client.config import get_session_config
from gretel_client.rest_v1.api.connections_api import ConnectionsApi
from gretel_client.rest_v1.models import (
    CreateConnectionRequest,
    UpdateConnectionRequest,
)

# The vault and key will be created as part of the terraform setup
# for the AWS hybrid install
# of the form arn:aws:kms:us-east-1:123456789010:key/aaaaaaaa-1234-1234-1234-aaaaaaaaaaaa
KEY_ARN = "..."
creds_encryption = aws_hybrid.KMSEncryption(KEY_ARN)

# The user who is associated with your hybrid deployment
DEPLOYMENT_USER = "...@email.com"

# This sets up a hybrid session, using our previously created credentials encrypter
# and the configured deployment user. All SDK functions that don't receive an explicit
# session will use this hybrid session.
configure_hybrid_session(
    api_key=getpass.getpass("Enter API Key:"),
    creds_encryption=creds_encryption,
    deployment_user=DEPLOYMENT_USER,
)

session = get_session_config()
connection_api = session.get_v1_api(ConnectionsApi)

project = create_or_get_unique_project(name="workflow-testing")

connection = connection_api.create_connection(
    CreateConnectionRequest(
        name="my-s3-conn",
        project_id=project.project_guid,
        type="s3",
        # note: best practice is to read in credentials from a file
        # or secret instead of directly embedding sensitive values
        # in python code.
        credentials={
            "access_key_id": "...",
            "secret_access_key": "...",
        },
    )
)
import getpass

from gretel_client import (
    azure_hybrid,
    configure_hybrid_session,
    create_or_get_unique_project,
)
from gretel_client.config import get_session_config
from gretel_client.rest_v1.api.connections_api import ConnectionsApi
from gretel_client.rest_v1.models import (
    CreateConnectionRequest,
    UpdateConnectionRequest,
)

# The vault and key will be created as part of the terraform setup
# for the Azure hybrid install
vault_url = "https://....vault.azure.net"
key_id = "..."
creds_encryption = azure_hybrid.KeyVaultEncryption(
    vault_url=vault_url,
    key_id=key_id,
)

# The user who is associated with your hybrid deployment
DEPLOYMENT_USER = "...@email.com"

# This sets up a hybrid session, using our previously created credentials encrypter
# and the configured deployment user. All SDK functions that don't receive an explicit
# session will use this hybrid session.
configure_hybrid_session(
    api_key=getpass.getpass("Enter API Key:"),
    creds_encryption=creds_encryption,
    deployment_user=DEPLOYMENT_USER,
)

session = get_session_config()
connection_api = session.get_v1_api(ConnectionsApi)

project = create_or_get_unique_project(name="workflow-testing")

azure_conn = connection_api.create_connection(
    CreateConnectionRequest(
        name="my-azure-conn",
        project_id=project.project_guid,
        type="azure",
        # note: best practice is to read in credentials from a file
        # or secret instead of directly embedding sensitive values
        # in python code.
        credentials={
            "access_key": "...",
            # "sas_token": sas_token,
            # "entra_password": entra_password
        },
        config={
            "account_name": "...",
            "default_container": "...",
            # "entra_config": {
            #     "client_id": client_id,
            #     "tenant_id": tenant_id,
            #     "username": username,
            # },
        },
    )
)
import getpass

from gretel_client import (
    configure_hybrid_session,
    create_or_get_unique_project,
    gcp_hybrid,
)
from gretel_client.config import get_session_config
from gretel_client.rest_v1.api.connections_api import ConnectionsApi
from gretel_client.rest_v1.models import (
    CreateConnectionRequest,
    UpdateConnectionRequest,
)

# The vault and key will be created as part of the terraform setup
# for the GCP hybrid install
KEY_ARN = "..."
creds_encryption = gcp_hybrid.KMSEncryption(
    key_resource_name=KEY_ARN,
)

# The user who is associated with your hybrid deployment
DEPLOYMENT_USER = "...@email.com"

# This sets up a hybrid session, using our previously created credentials encrypter
# and the configured deployment user. All SDK functions that don't receive an explicit
# session will use this hybrid session.
configure_hybrid_session(
    api_key=getpass.getpass("Enter API Key:"),
    creds_encryption=creds_encryption,
    deployment_user=DEPLOYMENT_USER,
)

session = get_session_config()
connection_api = session.get_v1_api(ConnectionsApi)

project = create_or_get_unique_project(name="workflow-testing")

connection = connection_api.create_connection(
    CreateConnectionRequest(
        name="my-gcs-conn",
        project_id=project.project_guid,
        type="gcs",
        # note: best practice is to read in credentials from a file
        # or secret instead of directly embedding sensitive values
        # in python code.
        credentials={
            "private_key_json": "...",
        },
    )
)
import getpass

from gretel_client import (
    configure_hybrid_session,
    create_or_get_unique_project,
    gcp_hybrid,
)
from gretel_client.config import get_session_config
from gretel_client.rest_v1.api.clusters_api import ClustersApi
from gretel_client.rest_v1.api.connections_api import ConnectionsApi
from gretel_client.rest_v1.models import (
    CreateConnectionRequest,
    UpdateConnectionRequest,
)

# The following will work if you create a project and associate it with
# a cluster that has asymmetric encryption configured

# The user who is associated with your hybrid deployment
DEPLOYMENT_USER = "...@email.com"

# This sets up a hybrid session, using the asymmetric encrypter by default.
# All SDK functions that don't receive an explicit
# session will use this hybrid session.
configure_hybrid_session(
    api_key=getpass.getpass("Enter API Key:"),
    deployment_user=DEPLOYMENT_USER,
)

session = get_session_config()
connection_api = session.get_v1_api(ConnectionsApi)
clusters_api = session.get_v1_api(ClustersApi)

# This will suggest the first cluster the user has access to.
# You can narrow the query or select by name as needed
clusters = clusters_api.list_clusters().clusters
if not clusters:
    raise Exception("No clusters found for user")

project = create_or_get_unique_project(
    name="workflow-testing",
    hybrid_environment_guid=clusters[0].guid,
)

connection = connection_api.create_connection(
    CreateConnectionRequest(
        name="my-gcs-conn",
        project_id=project.project_guid,
        type="gcs",
        # note: best practice is to read in credentials from a file
        # or secret instead of directly embedding sensitive values
        # in python code.
        credentials={
            "private_key_json": "...",
        },
    )
)
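The code comments above recommend reading credentials from a file or secret rather than embedding them in Python code. A minimal sketch of that practice follows. The helper load_credentials and the environment variable name CONNECTION_CREDENTIALS_FILE are assumptions made for illustration, not part of the Gretel SDK; the file is assumed to hold a JSON object whose keys match the connector's credential fields.

```python
import json
import os
from pathlib import Path


def load_credentials(env_var: str = "CONNECTION_CREDENTIALS_FILE") -> dict:
    """Read a JSON credentials mapping from the file named by an environment variable."""
    file_path = os.environ.get(env_var)
    if not file_path:
        raise RuntimeError(f"set {env_var} to the path of a JSON credentials file")
    creds = json.loads(Path(file_path).read_text(encoding="utf-8"))
    # Fail early on an empty or malformed file rather than sending bad credentials.
    if not isinstance(creds, dict) or not creds:
        raise ValueError("credentials file must contain a non-empty JSON object")
    return creds
```

In the examples above, the literal dictionary could then be replaced with credentials=load_credentials(), keeping the secret values out of source control.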