Architecture
Last updated
When running in Hybrid mode, the following data will be stored in Gretel's control plane and may be passed between your Gretel Hybrid environment and the Gretel API.
Project names and descriptions
Model configuration (the YAML configuration created for each model)
Model name and ID
Model status (created, active, completed, etc.)
Model run ID (when using a model to create more data)
Model run status (created, active, completed, etc.)
Workflow IDs, Workflow Run IDs and Workflow Task IDs
Workflow Task Statuses and overall Workflow Run Status
The email address of the user that created a model
The email address of the user that ran a model
Model creation and model run logs. These logs only include metadata and error information.
Workflow Task logs. These logs include metadata and error information, and allow users to view logs in the Console.
Names of data sources and results (file names only; no data is stored)
The following data is not stored in Gretel's control plane when using Hybrid mode.
Model training data. This will be stored and accessed from your own object storage (buckets you create).
Model training artifacts. These will be written to your object storage (buckets you create) instead. This includes:
The trained model archive / weights
Quality and privacy reports
Sample data generated during training
Model run artifacts. These will be written to your object storage instead. This includes:
Generated data
Model run reports (if applicable)
[Figure: An example of viewing a Hybrid job using the Gretel Transform API.]
Gretel Hybrid relies on outbound connections to reach the Gretel API and pull container images. No inbound network connections are required for Gretel Hybrid to function. The following endpoints must be reachable from the network associated with the Kubernetes cluster hosting Gretel Hybrid.
api.gretel.cloud (HTTPS / TCP 443)
- The Gretel API. This must be reachable by all Gretel pods running within your Kubernetes cluster for the purposes of job scheduling and orchestration.
artifacts.gretel.cloud (HTTPS / TCP 443)
- This endpoint provides presigned S3 URLs for pulling certain base model artifacts when a model training job starts. This must be reachable by all Gretel pods running within your Kubernetes cluster.
074762682575.dkr.ecr.us-west-2.amazonaws.com (HTTPS / TCP 443)
- Gretel's container registry, hosted on AWS ECR. This must be reachable by Kubernetes nodes so that pod container images can be pulled.
s3.amazonaws.com (HTTPS / TCP 443)
- AWS S3 is the persistent storage that backs ECR and this endpoint must be reachable by Kubernetes nodes so that they can pull Gretel container images.
s3-us-west-2.amazonaws.com (HTTPS / TCP 443)
- AWS S3 is the persistent storage that backs ECR and this endpoint must be reachable by Kubernetes nodes so that they can pull Gretel container images.
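Because every requirement above is an outbound HTTPS connection on TCP 443, a quick way to sanity-check a cluster's egress rules is to attempt a TCP connection to each endpoint. The sketch below is a hypothetical helper, not a Gretel tool: the names `ENDPOINTS`, `tcp_connect`, and `check_endpoints` are illustrative, the "pods"/"nodes" labels are taken from the descriptions above, and the check only confirms that a TCP handshake succeeds. It does not verify TLS, HTTP proxies, or ECR authentication.

```python
import socket
from typing import Callable, Dict, List, Tuple

# Hypothetical list mirroring the endpoints above; all use HTTPS on TCP 443.
# "pods"  = must be reachable from every Gretel pod.
# "nodes" = must be reachable from Kubernetes nodes (container image pulls).
ENDPOINTS: List[Tuple[str, int, str]] = [
    ("api.gretel.cloud", 443, "pods"),
    ("artifacts.gretel.cloud", 443, "pods"),
    ("074762682575.dkr.ecr.us-west-2.amazonaws.com", 443, "nodes"),
    ("s3.amazonaws.com", 443, "nodes"),
    ("s3-us-west-2.amazonaws.com", 443, "nodes"),
]


def tcp_connect(host: str, port: int, timeout: float) -> None:
    """Open and immediately close a TCP connection; raises OSError on failure."""
    with socket.create_connection((host, port), timeout=timeout):
        pass


def check_endpoints(
    endpoints: List[Tuple[str, int, str]] = ENDPOINTS,
    connect: Callable[[str, int, float], None] = tcp_connect,
    timeout: float = 5.0,
) -> Dict[str, bool]:
    """Return {host: reachable} for each endpoint (TCP-level check only)."""
    results: Dict[str, bool] = {}
    for host, port, _scope in endpoints:
        try:
            connect(host, port, timeout)
            results[host] = True
        except OSError:  # covers DNS failures, refused connections, and timeouts
            results[host] = False
    return results
```

Since the two groups of endpoints are used from different places, the "pods" entries are best checked from a pod inside the cluster and the "nodes" entries from a node. `check_endpoints()` returns a dict mapping each host to `True` or `False`. The `connect` parameter is injectable so the logic can be exercised without network access.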