Operations
Day-to-day connector operations.

Monitoring

The S3 connector publishes a CloudWatch dashboard named [StackName]-Dashboard, where StackName is the name of the connector's CloudFormation stack.
This dashboard can be used to monitor the health and throughput of the Connector.
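
The dashboard can also be read programmatically. The following is a minimal sketch, assuming a boto3 CloudWatch client is passed in as cloudwatch; the helper names are illustrative and not part of the connector.

```python
import json


def dashboard_name(stack_name: str) -> str:
    """Build the dashboard name from the CloudFormation stack name."""
    return f"{stack_name}-Dashboard"


def fetch_dashboard_widgets(cloudwatch, stack_name: str) -> list:
    """Return the widget definitions of the connector dashboard.

    `cloudwatch` is assumed to be a boto3 CloudWatch client,
    for example boto3.client("cloudwatch").
    """
    response = cloudwatch.get_dashboard(DashboardName=dashboard_name(stack_name))
    # DashboardBody is a JSON document with a top-level "widgets" list.
    body = json.loads(response["DashboardBody"])
    return body.get("widgets", [])
```

Listing the widgets this way is one option for confirming that the dashboard exists after a stack is deployed.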

Handling Failures

Failed Jobs

Depending on the data source or model configuration, a Gretel Job might fail. Failed jobs are not retried automatically, because they usually require intervention: either the connector's model config or the data source must be updated.
You can view failures by navigating to the application's CloudWatch dashboard and viewing the "Errors" section of the dashboard.
After you have remedied the failure, retry the job by re-pushing the failed job's source object to S3.
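
One way to re-push an object is to upload it over itself. The sketch below assumes a boto3 S3 client passed in as s3, and assumes the connector picks up the resulting object-created event; the function name is hypothetical. It reads the whole object into memory and does not preserve object metadata, so treat it as a starting point only.

```python
def repush_object(s3, bucket: str, key: str) -> int:
    """Re-upload an object over itself so S3 records a fresh write.

    `s3` is assumed to be a boto3 S3 client, for example boto3.client("s3").
    The object is read fully into memory, so this only suits objects
    that fit in RAM. Returns the number of bytes re-uploaded.
    """
    data = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    s3.put_object(Bucket=bucket, Key=key, Body=data)
    return len(data)
```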

Lost Jobs

A Gretel Job may become "lost" if a worker loses connectivity to Gretel's control plane, or is shut down before the job completes. In these scenarios, the connector will automatically retry the job.
You can find lost jobs by searching through the connector application logs or navigating to the application's CloudWatch dashboard and viewing the "Errors" section of the dashboard.

Failed Writes to the Destination Bucket

S3 destination write failures are retried a maximum of two times before failing. To view failed or retried writes, check the connector application logs in CloudWatch, or navigate to the application's CloudWatch dashboard and view the "Errors" section.

Backups

No data is stored on connector or worker EC2 instances. Model and connector logs are stored in CloudWatch for 90 days. Model artifacts are stored in an intermediate S3 bucket and removed automatically after 24 hours.
Gretel-specific model configurations are stored in SSM Parameter Store and must be backed up manually.
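
A manual backup can be scripted. The sketch below assumes a boto3 SSM client passed in as ssm and uses the parameter paths listed under "Updating a Connector Config, Model or Project"; the function names are illustrative. Values are read with decryption enabled, so the resulting JSON should be stored somewhere appropriately protected.

```python
import json

# Config parameter suffixes provisioned by the stack.
CONFIG_NAMES = ("artifact-endpoint", "connector", "model", "project")


def config_parameter_names(stack_name: str) -> list:
    """Return the full Parameter Store paths for a given stack."""
    return [f"/gretel/{stack_name}/config/{name}" for name in CONFIG_NAMES]


def backup_parameters(ssm, stack_name: str) -> str:
    """Read each config parameter and return the set as a JSON string.

    `ssm` is assumed to be a boto3 SSM client, for example boto3.client("ssm").
    """
    backup = {}
    for name in config_parameter_names(stack_name):
        param = ssm.get_parameter(Name=name, WithDecryption=True)["Parameter"]
        backup[name] = {"Type": param["Type"], "Value": param["Value"]}
    return json.dumps(backup, indent=2, sort_keys=True)
```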

Runbook

Running Gretel Connectors and Workers on-prem sometimes requires maintenance. These guides will help you perform common operations on your Gretel S3 connector and on-prem worker instances.

Rolling Instances

Many configuration updates require rolling either connector or worker instances. Both connectors and workers are managed under auto scaling groups. This means you can roll instances by terminating individual hosts from the EC2 console, or initiating an instance refresh from the Auto Scaling console.
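
An instance refresh can also be started from a script. This is a minimal sketch, assuming a boto3 Auto Scaling client passed in as autoscaling; the group name must be looked up from your stack, and the 50 percent minimum healthy threshold is an arbitrary example value.

```python
def roll_instances(autoscaling, group_name: str, min_healthy_percentage: int = 50) -> str:
    """Start a rolling instance refresh and return its refresh ID.

    `autoscaling` is assumed to be a boto3 Auto Scaling client,
    for example boto3.client("autoscaling").
    """
    response = autoscaling.start_instance_refresh(
        AutoScalingGroupName=group_name,
        Strategy="Rolling",
        Preferences={"MinHealthyPercentage": min_healthy_percentage},
    )
    return response["InstanceRefreshId"]
```

A higher minimum healthy percentage keeps more capacity in service during the roll, at the cost of a slower refresh.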

Updating a Connector Config, Model or Project

Connector, model, and project configs are managed in Parameter Store. The stack provisions the following parameters:
  • /gretel/{stack-name}/config/artifact-endpoint
  • /gretel/{stack-name}/config/connector
  • /gretel/{stack-name}/config/model
  • /gretel/{stack-name}/config/project
After updating any of these parameters, be sure to roll instances that are dependent on the updated parameter.
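
Updating one of these parameters from a script might look like the following sketch. It assumes a boto3 SSM client passed in as ssm; the function name and the validation step are illustrative, and the format of each value depends on your connector setup.

```python
ALLOWED_CONFIGS = {"artifact-endpoint", "connector", "model", "project"}


def update_config(ssm, stack_name: str, config_name: str, value: str) -> int:
    """Overwrite one config parameter and return its new version number.

    `ssm` is assumed to be a boto3 SSM client, for example boto3.client("ssm").
    """
    if config_name not in ALLOWED_CONFIGS:
        raise ValueError(f"unknown config parameter: {config_name}")
    response = ssm.put_parameter(
        Name=f"/gretel/{stack_name}/config/{config_name}",
        Value=value,
        Overwrite=True,  # the parameter already exists, so replace its value
    )
    return response["Version"]
```

The new value takes effect only after the dependent instances have been rolled.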

Updating the Gretel API Key

The Gretel API Key for the pipeline is stored using Secrets Manager.
  • /gretel/{stack-name}/secrets/api-key
If this key is updated, be sure to roll any of the affected instances.
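
As a sketch, the key could be replaced as shown below. This assumes a boto3 Secrets Manager client passed in as secrets, that the path above is the secret's name, and that the secret holds the key as a plain string; check these assumptions against your stack before using it.

```python
def update_api_key(secrets, stack_name: str, new_api_key: str) -> str:
    """Store a new API key value and return the new secret version ID.

    `secrets` is assumed to be a boto3 Secrets Manager client,
    for example boto3.client("secretsmanager").
    """
    response = secrets.put_secret_value(
        SecretId=f"/gretel/{stack_name}/secrets/api-key",
        SecretString=new_api_key,
    )
    return response["VersionId"]
```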

Troubleshooting Connector or Worker EC2 Instances

Both connector and worker instances publish application and system logs to CloudWatch. The CloudFormation stack creates the following log groups:
  • /gretel/{stack-name}/application/connectors
  • /gretel/{stack-name}/application/workers
  • /gretel/{stack-name}/system/connectors
  • /gretel/{stack-name}/system/workers
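
These log groups can be searched from a script. The sketch below assumes a boto3 CloudWatch Logs client passed in as logs, and assumes that error lines contain the literal text ERROR, which may not match the actual log format. It returns only the first page of results.

```python
import time


def log_group(stack_name: str, kind: str, role: str) -> str:
    """Build a log group name; kind is 'application' or 'system',
    role is 'connectors' or 'workers'."""
    return f"/gretel/{stack_name}/{kind}/{role}"


def recent_errors(logs, stack_name, role="connectors", minutes=60,
                  pattern="ERROR", now=None):
    """Return messages matching `pattern` from the last `minutes` minutes.

    `logs` is assumed to be a boto3 CloudWatch Logs client,
    for example boto3.client("logs").
    """
    now = time.time() if now is None else now
    start_ms = int((now - minutes * 60) * 1000)  # the API expects milliseconds
    response = logs.filter_log_events(
        logGroupName=log_group(stack_name, "application", role),
        filterPattern=pattern,
        startTime=start_ms,
        limit=100,
    )
    return [event["message"] for event in response.get("events", [])]
```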

Upgrading the CloudFormation Stack

Occasionally there may be updates to the CloudFormation stack. To apply these updates, navigate to the S3 Connector stack from the CloudFormation Console, and select "Update".
On the next screen, enter the updated CloudFormation template, and click "Next".
The next screen will ask to update or confirm your stack's parameters. If necessary, update any params and then press "Next".
The next step will ask you to configure additional stack options. Update these accordingly and then press "Next".
The final step in the wizard will ask you to review the changes to your stack. After these changes have been reviewed, click "Update Stack" to apply the update.
If any changes were made to worker or connector configurations, be sure to roll those instances to pick up any new changes.
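
The same update can be scripted instead of using the console wizard. The following is a minimal sketch, assuming a boto3 CloudFormation client passed in as cloudformation and a template hosted at a URL you supply. It keeps every listed parameter at its previous value; the parameter keys and any required capabilities depend on the template and are not specified here.

```python
def update_stack_keep_parameters(cloudformation, stack_name, template_url,
                                 parameter_keys, capabilities=()):
    """Apply a new template while reusing the stack's existing parameter values.

    `cloudformation` is assumed to be a boto3 CloudFormation client,
    for example boto3.client("cloudformation"). Returns the stack ID.
    """
    kwargs = {
        "StackName": stack_name,
        "TemplateURL": template_url,
        "Parameters": [
            {"ParameterKey": key, "UsePreviousValue": True}
            for key in parameter_keys
        ],
    }
    if capabilities:
        # For example ["CAPABILITY_IAM"], if the template creates IAM resources.
        kwargs["Capabilities"] = list(capabilities)
    response = cloudformation.update_stack(**kwargs)
    return response["StackId"]
```

Unlike the console wizard, this call applies the update without a review step, so a change set may be a better fit when the changes need to be inspected first.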

Upgrading the Connector Service

The connector service is made up of two upgradeable components: the connector service and the connector container. The connector container version is managed via the connector config. If the config is set to use the latest container, roll the connector instance to pick up the latest version.

Upgrading On-Prem Gretel Workers

Two upgradeable components make up the Gretel Worker stack: the Gretel Agent and the Gretel Worker containers. Worker containers are updated automatically as Gretel publishes new releases.
The Gretel Agent runs as a systemd service on each instance and is responsible for launching workers as new jobs come in from the control plane. To update the agent, roll your worker instances using one of the techniques described in Rolling Instances.

Rightsizing your Worker Cluster

Depending on expected throughput, you may need to adjust the size of your worker cluster. SQS is used to buffer new objects before they are pushed through the pipeline. The queue's "Number Of Messages Received" and "Approximate Number Of Messages Visible" metrics give a good indication of how many worker instances the auto scaling group should be configured with.
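
As an illustration of how the queue backlog might inform cluster size, the sketch below reads the current backlog from SQS and applies a simple heuristic. The heuristic, the jobs_per_worker figure, and the worker bounds are all hypothetical and are not a Gretel recommendation. It assumes a boto3 SQS client passed in as sqs.

```python
import math


def visible_messages(sqs, queue_url: str) -> int:
    """Return the approximate number of messages waiting in the queue.

    `sqs` is assumed to be a boto3 SQS client, for example boto3.client("sqs").
    """
    attrs = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )["Attributes"]
    return int(attrs["ApproximateNumberOfMessages"])


def suggest_worker_count(backlog: int, jobs_per_worker: int,
                         min_workers: int = 1, max_workers: int = 10) -> int:
    """Hypothetical heuristic: one worker per `jobs_per_worker` queued objects,
    clamped to the range [min_workers, max_workers]."""
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    wanted = math.ceil(backlog / jobs_per_worker)
    return max(min_workers, min(max_workers, wanted))
```

For example, with a backlog of 23 objects and an assumed 5 jobs per worker, the heuristic suggests 5 workers.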

Service Limit Considerations

The connector is subject to the service limits of EC2, SQS, and S3. Depending on throughput and compute requirements, these limits may need to be raised.

Support

Need help working with the connector? Please contact us at [email protected].