Running Gretel Hybrid
Gretel Hybrid provides customers with the flexibility to deploy their own data plane within their preferred cloud tenant. When you choose this option, Gretel Cloud's only role is job and workflow orchestration, ensuring all your data and models remain in your own tenant.
We will walk you through the following steps to deploy Gretel Hybrid.
Please note that once Hybrid is set up, job management tasks, including training and running Gretel models, are available exclusively through the SDK/CLI interfaces. You will still be able to view your projects and model activity in the Gretel Console.
When using Gretel Hybrid, you must configure your SDK to work in Hybrid mode and have a valid bucket to store output artifacts.
The Gretel Console will only give you viewing functionality for the models created in Hybrid mode. When running in Hybrid mode, the following data will be stored in Gretel Cloud:
- Project names and descriptions
- Model configuration (The YAML configuration created for each model)
- Model name and ID
- Model status (`created`, `active`, `completed`, etc.)
- Model run ID (when using a model to create more data)
- Model run status (`created`, `active`, `completed`, etc.)
- The email address of the user that created a model
- The email address of the user that ran a model
- Model creation and model run logs. These logs only include metadata and error information.
- Names of data source and results (file names only, no data is stored)
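To make the split concrete, the metadata above can be pictured as a small record. The field names below are illustrative only, not the actual Gretel Cloud schema:

```python
# Hypothetical sketch of the model metadata retained in Gretel Cloud
# when running in Hybrid mode. Field names are illustrative only.
hybrid_cloud_metadata = {
    "project_name": "my-hybrid-project",        # project name and description
    "model_id": "64a1b2c3d4e5f6",               # model ID
    "model_name": "transform-pii",              # model name
    "model_status": "completed",                # created, active, completed, etc.
    "model_config": "schema_version: '1.0'\n",  # the YAML configuration
    "created_by": "analyst@example.com",        # user that created the model
    "data_source_name": "customers.csv",        # file name only, no data
}

# Training data and generated outputs are never part of this record;
# they live only in your own object storage.
assert "training_data" not in hybrid_cloud_metadata
```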
An example of viewing a hybrid job using the Gretel Transform API:

Logs are the only artifacts stored in Gretel Cloud. Data source and generated result names can be viewed, but data is not stored in Gretel Cloud.
The following data is not stored in Gretel Cloud when using Hybrid mode:
- Model training data. This will be stored in and accessed from your own object storage (buckets you create).
- Model training artifacts. These will be written to your object storage (buckets you create) instead. This includes:
  - The trained model archive / weights
  - Quality and privacy reports
  - Sample data generated during training
- Model run artifacts. These will be written to your object storage instead. This includes:
  - Generated data
  - Model run reports (if applicable)
These instructions will walk you through the following for each cloud provider:
- Installing necessary command line tooling
- Setting up your data source and sink buckets
- Creating and managing your Kubernetes cluster with the required configuration and access controls
- Testing your deployment with sample jobs
Before getting started, you’ll need to install some tools on your system. If you’re using macOS, we recommend that you install Homebrew.
You’ll need `kubectl` to communicate with your Kubernetes cluster.

```shell
brew update
brew install kubectl
kubectl version --client
```
You’ll need `helm` to configure your Kubernetes cluster.

```shell
brew install helm
helm version
```

You should be on at least Helm version `v3.10.2` to make sure you don’t run into any issues.
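The version floor can be checked mechanically. A minimal sketch (a plain tuple comparison, assuming a standard `vX.Y.Z` string as reported by `helm version`):

```python
def helm_version_ok(version: str, minimum: str = "v3.10.2") -> bool:
    """Return True if a helm version string like 'v3.12.0' meets the minimum."""
    parse = lambda v: tuple(int(part) for part in v.lstrip("v").split("."))
    return parse(version) >= parse(minimum)

print(helm_version_ok("v3.12.0"))  # True
print(helm_version_ok("v3.9.4"))   # False -- 3.9 is below the 3.10.2 floor
```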
Install and configure your client before proceeding further, and ensure session configuration is set as follows. The hybrid environment configuration will apply to everything run with the Gretel client, including libraries like Gretel Trainer and Gretel Relational.
The system that you are running Gretel SDKs from should have access to the `artifact_endpoint` below, which is an object storage bucket. This bucket should be the `SINK_BUCKET` that you configure in the respective cloud-specific setup.

```python
from gretel_client import configure_session

configure_session(
    api_key="prompt",  # for Notebook environments
    validate=True,
    default_runner="hybrid",
    artifact_endpoint="s3://my-sink-bucket",  # or gcs://, azure://
)
```
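A misconfigured `artifact_endpoint` is a common source of failed jobs. A hypothetical pre-flight check (not part of the Gretel Client) for the three URI schemes shown above might look like:

```python
from urllib.parse import urlparse

# Per the configure_session example above; illustrative, not an official list.
SUPPORTED_SCHEMES = {"s3", "gcs", "azure"}

def validate_artifact_endpoint(endpoint: str) -> str:
    """Hypothetical helper: reject endpoints that are not s3://, gcs://, or azure:// URIs."""
    parsed = urlparse(endpoint)
    if parsed.scheme not in SUPPORTED_SCHEMES or not parsed.netloc:
        raise ValueError(f"Unsupported artifact endpoint: {endpoint!r}")
    return endpoint

print(validate_artifact_endpoint("s3://my-sink-bucket"))  # s3://my-sink-bucket
```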
The Gretel Client uses cloud provider-specific libraries to interact with the underlying object storage via the `smart_open` library.

S3
When using S3, the Gretel Client will look for default credentials already configured on your system. Docs for configuring S3 credentials can be found here.
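For example, one common way to satisfy the default AWS credential chain is through standard environment variables. The values below are placeholders:

```shell
# Placeholder values -- picked up by the default AWS credential chain.
export AWS_ACCESS_KEY_ID="<your-access-key-id>"
export AWS_SECRET_ACCESS_KEY="<your-secret-access-key>"
export AWS_DEFAULT_REGION="<your-bucket-region>"
```

A shared credentials file (`~/.aws/credentials`) works equally well; the client does not care which mechanism supplies the credentials.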
GCS
When using GCS, the Gretel Client will look for default credentials already configured on your machine. Docs for configuring GCS credentials can be found here.
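For example, Application Default Credentials can be established locally with the gcloud CLI:

```shell
# Creates Application Default Credentials that Google client
# libraries (and therefore the Gretel Client) discover automatically.
gcloud auth application-default login
```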
Azure
There is no standard way to configure credentials for Azure. The Gretel Client will look for credentials under the `AZURE_STORAGE_CONNECTION_STRING` or `OAUTH_STORAGE_ACCOUNT_NAME` environment variable. To fetch a connection string for `AZURE_STORAGE_CONNECTION_STRING`, you can run the following command from your terminal using the Azure CLI:

```shell
az storage account show-connection-string \
  --name "${STORAGE_ACCOUNT_NAME}" \
  --resource-group "${RESOURCE_GROUP}" --query="connectionString"
```

Be sure to replace `STORAGE_ACCOUNT_NAME` and `RESOURCE_GROUP` with the appropriate values for your storage container.

`OAUTH_STORAGE_ACCOUNT_NAME` may be used to configure the Gretel Client with system-assigned managed identities. `OAUTH_STORAGE_ACCOUNT_NAME` should contain the name of the storage account associated with your storage container.
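Putting the pieces together, the connection string returned by the `az` command above can be exported directly. `STORAGE_ACCOUNT_NAME` and `RESOURCE_GROUP` are the same placeholders used above:

```shell
# Fetch the connection string and expose it to the Gretel Client in one step.
# --output tsv strips the surrounding quotes from the query result.
export AZURE_STORAGE_CONNECTION_STRING="$(az storage account show-connection-string \
  --name "${STORAGE_ACCOUNT_NAME}" \
  --resource-group "${RESOURCE_GROUP}" \
  --query connectionString --output tsv)"
```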