AWS Setup

Overview

We will deploy Gretel Hybrid's required cloud infrastructure using Terraform (an infrastructure as code tool). This guide will walk you through all the steps necessary to install and configure Terraform, even if you haven't used it before.

Prerequisites

Cloud Infrastructure Requirements

Some Gretel Models require GPUs (the models are listed here). We recommend setting up an Amazon EKS cluster with the instance types listed below.

g5.xlarge for GPU-based jobs

m5.xlarge for CPU-based jobs (or any CPU node type with at least 16 GiB of memory and 4 vCPUs)

Choose a region. We have tested the following regions for GPU-based clusters. If you need to use another region, verify GPU availability there before deploying.

  • us-east-1

  • us-east-2

  • us-west-2
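
One way to verify GPU availability in a region outside this list is to ask EC2 which Availability Zones offer the GPU instance type. The following is a minimal sketch, assuming the AWS CLI is installed and authenticated; the helper name gpu_zones and the region eu-west-1 are illustrative only.

```shell
# List the Availability Zones in a region that offer a given instance type.
# gpu_zones is a hypothetical helper name, not part of any Gretel or AWS tooling.
gpu_zones() {
  region="$1"
  instance_type="${2:-g5.xlarge}"
  aws ec2 describe-instance-type-offerings \
    --region "$region" \
    --location-type availability-zone \
    --filters "Name=instance-type,Values=$instance_type" \
    --query 'InstanceTypeOfferings[].Location' \
    --output text
}

# Example (requires AWS credentials): gpu_zones eu-west-1
```

Empty output suggests the instance type is not offered in that region. An offering alone does not guarantee that your account's service quotas allow you to launch the instances.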

Install Common CLI Tools

If you’re missing brew, helm or kubectl, refer to the Prerequisites.

Install Gretel CLI

Even if you have already installed the Gretel CLI by following the documentation, the AWS libraries must also be installed so that you can test the deployment once it is finished. Run the command below to install the latest version of the Gretel CLI with the AWS dependencies.

pip install -U "gretel-client[aws]"

AWS CLI

The official AWS CLI installation documentation is located here.

macOS

brew install awscli

Linux

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install

Windows

Navigate to the Windows installation section of the official documentation here.

Configure AWS CLI

For testing purposes, the easiest way to deploy Gretel Hybrid is to use an admin user in a sandbox account.

For production use cases and other scenarios where unrestricted permissions are not feasible, you will need at minimum the ec2:*, iam:*, eks:*, and s3:* permissions. These allow you to create and manage an EKS cluster, create the IAM roles and policies needed for EKS Service Accounts, and create and manage the required Gretel Hybrid S3 Buckets.
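
As an illustration, the four permission sets named above could be expressed as a single IAM policy document. This is a sketch only: the file name is made up for the example, and the wildcard Resource is an assumption that you may need to narrow to meet your organization's requirements.

```shell
# Write an IAM policy document granting the four permission sets named above.
# The file name and the wildcard Resource are assumptions for illustration;
# scope Resource down where your organization requires it.
cat > gretel-hybrid-deployer-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["ec2:*", "iam:*", "eks:*", "s3:*"],
      "Resource": "*"
    }
  ]
}
EOF
```

A document like this can be passed to aws iam create-policy with --policy-document file://gretel-hybrid-deployer-policy.json and then attached to the user or role that will run Terraform.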

Once you have appropriate access configured for your AWS Account, follow the official AWS documentation to authenticate to the proper AWS Account with the AWS CLI.

You can verify that your AWS CLI session is configured as expected by running the command below.

aws sts get-caller-identity
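
If you work with several AWS accounts, it can help to confirm that the session points at the account you intend to deploy into. The sketch below compares the account ID reported by the AWS CLI with an expected value; the helper name in_expected_account and the account ID in the example are hypothetical.

```shell
# Succeed only if the current AWS CLI session belongs to the expected account.
# in_expected_account is a hypothetical helper name used only for this example.
in_expected_account() {
  actual="$(aws sts get-caller-identity --query Account --output text)"
  [ "$actual" = "$1" ]
}

# Example (requires AWS credentials):
# in_expected_account 012345678912 && echo "Correct account"
```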

Terraform Based Installation

The Terraform CLI will utilize the authenticated session from the AWS CLI to deploy and manage your Gretel Hybrid infrastructure. Here is a link to our provided Terraform modules.

Install Terraform

The official Terraform installation instructions are located here. After navigating to the linked documentation there is a dropdown menu where you are able to select your OS and installation method.
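
Before continuing, it may be worth confirming that the command line tools used in this guide are all on your PATH. The sketch below prints any that are missing; the helper name missing_tools is hypothetical, and the tool list in the example reflects the tools mentioned in this guide.

```shell
# Print the name of every listed tool that is not found on PATH.
# missing_tools is a hypothetical helper name used only for this example.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example: missing_tools aws terraform kubectl helm git
```

No output means every listed tool was found. Running terraform version afterwards shows which Terraform release is installed.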

Clone and Setup the Gretel Hybrid Repository

The Gretel Hybrid git repository is located here. You may clone the repository by running the below command.

git clone https://github.com/gretelai/gretel-hybrid.git

Now that you have a local copy of the repository, let's enter the proper working directory for deploying a full Gretel Hybrid environment with Terraform.

cd gretel-hybrid/terraform-v2/aws/examples/full_deployment

Here is what our working directory looks like.

full_deployment
├── main.tf
├── outputs.tf
├── terraform.tfvars
└── variables.tf

(Optional) Configure Terraform Backend and Provider

Terraform stores information about currently deployed resources in the Terraform State. By default Terraform stores this information in a local file within the current working directory. You can store the Terraform State in an AWS S3 Bucket instead. This provides two benefits.

  1. It is much harder to accidentally delete a state file from your cloud storage bucket.

  2. Team members with access to the shared state can all manage the same infrastructure from their own machines, allowing for collaboration within your team or organization.

If you're deploying this Gretel Hybrid Cluster for a production environment or for any sort of extended test, you should create an AWS S3 Bucket to keep your Terraform State so that you do not lose state information.

We provide a script to create this S3 Bucket for you. Simply run the command below, setting the flags as desired.

# The script is located in gretel-hybrid/terraform-v2/aws/scripts
../../scripts/bootstrap_state_backend.sh
Usage: ../../scripts/bootstrap_state_backend.sh --aws-region <AWS_REGION> --bucket-name <BUCKET_NAME>

# Create the AWS resources for your Terraform Backend State Store
../../scripts/bootstrap_state_backend.sh --aws-region us-west-2 --bucket-name my-unique-tf-state-bucket-name
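
After the script finishes, you may want to confirm that the bucket exists and is reachable from your session before pointing Terraform at it. A minimal sketch, assuming the bucket name from the example above; the helper name state_bucket_ready is hypothetical.

```shell
# Succeed if the named S3 bucket exists and the current session can access it.
# state_bucket_ready is a hypothetical helper name used only for this example.
state_bucket_ready() {
  aws s3api head-bucket --bucket "$1" >/dev/null 2>&1
}

# Example (requires AWS credentials):
# state_bucket_ready my-unique-tf-state-bucket-name && echo "Bucket is reachable"
```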

Now that the AWS S3 Bucket is created, we need to tell Terraform to use it. Edit the main.tf file and uncomment the backend block within the terraform block at the beginning of the file. It should look something like this after you uncomment it. Make sure to enter the appropriate bucket and region that you used when running the provided script. Save these changes to the main.tf file.

terraform {
  backend "s3" {
    bucket = "my-unique-tf-state-bucket-name"
    key    = "state/aws-gretel-hybrid/terraform.tfstate"
    region = "us-west-2"
  }
}

In the main.tf file you should also uncomment the AWS provider and configure it for your use case. An example is shown below.

provider "aws" {
  region = local.region
  # Uncomment the below block if you are authenticated in Account A and need to assume a role and deploy to Account B
  # assume_role {
  #   role_arn = "arn:aws:iam::012345678912:role/TerraformExecution"
  # }
}

Configure Variables

The next step is to review the variables in the terraform.tfvars file and configure them as desired. Here is what they look like by default.

region                    = "us-west-2"
deployment_name           = "gretel-hybrid-env"
kubernetes_version        = "1.27"
gretel_source_bucket_name = "gretel-hybrid-source"
gretel_sink_bucket_name   = "gretel-hybrid-sink"
# Provide any IAM users or roles which should be allowed to run "aws eks update-kubeconfig" to gain access to the cluster
cluster_admin_roles = {
  # Format: "<name_for_kubernetes_rbac>" = "<iam_role_arn>"
  # Example: "adminrole" = "arn:aws:iam::012345678912:role/cloud_team_admin_role"
}
cluster_admin_users = {
  # Format: "<name_for_kubernetes_rbac>" = "<iam_user_arn>"
  # Example: "poweruser" = "arn:aws:iam::012345678912:user/cloud_team_admin_user"
}

You will need to change the gretel_source_bucket_name and gretel_sink_bucket_name variables, since S3 bucket names must be globally unique.

Make any desired changes to the provided variables. These will be used to create the AWS and Kubernetes resources that are part of the Gretel Hybrid deployment.
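
When choosing new bucket names, a quick local check against the basic S3 naming rules can catch mistakes before Terraform runs. The sketch below is a partial check only and cannot tell you whether a name is already taken; the helper name valid_bucket_name and the example name are hypothetical.

```shell
# Rough check of a candidate S3 bucket name against the basic naming rules:
# 3 to 63 characters, lowercase letters, digits, dots, and hyphens only,
# starting and ending with a letter or digit. It does not cover every rule
# (for example, it does not reject names formatted like IP addresses).
# valid_bucket_name is a hypothetical helper name used only for this example.
valid_bucket_name() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$'
}

# Example: valid_bucket_name gretel-hybrid-source-acme-1234 && echo "Looks valid"
```

Adding a suffix specific to your organization or account is a common way to make the names unique.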

Setup Gretel API Key

Since we do not want to statically define sensitive credentials in clear text, we will pass your Gretel API Key to Terraform using an environment variable in the format TF_VAR_<terraform_variable_name>.

You can get your Gretel API Key from the Gretel Console by clicking the drop-down menu in the top right-hand corner and selecting "API Key" under the "Account Settings" section. Here is a direct link to this page. If you haven't logged into the Gretel Console or set up an account, follow our documentation here.

After retrieving your API Key, export the necessary variable using the below command. The variable name must match TF_VAR_gretel_api_key exactly.

export TF_VAR_gretel_api_key="<insert_key_here>"
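
A forgotten or placeholder key is an easy mistake to make, so a small guard before running Terraform can help. The sketch below only checks the value in the current shell and does not validate the key against the Gretel API; the helper name api_key_is_set is hypothetical.

```shell
# Succeed only if TF_VAR_gretel_api_key is set in the current shell, non-empty,
# and not the placeholder text from the export example.
# api_key_is_set is a hypothetical helper name used only for this example.
api_key_is_set() {
  [ -n "${TF_VAR_gretel_api_key:-}" ] &&
    [ "$TF_VAR_gretel_api_key" != "<insert_key_here>" ]
}

# Example: api_key_is_set || echo "Export TF_VAR_gretel_api_key first"
```

Remember that the variable must be exported, as in the command above, for Terraform to see it.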

Deploy Gretel Hybrid

Run these Terraform commands from the full_deployment directory.

Initialize Terraform. This is an idempotent operation and is always safe to run (no resources will be created or destroyed).

terraform init

View the changes Terraform will make upon deployment. Run this any time you make changes to see exactly what will be created, modified, or destroyed.

terraform plan

Deploy the module. This will require user confirmation so don't walk away from your shell until you confirm by typing "yes" to start the deployment.

terraform apply
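
As an optional alternative to the plan and apply commands above, Terraform can write the plan to a file and later apply exactly that file, so that what gets deployed is what you reviewed. The sketch below wraps the two steps in helpers; the helper names and the file name tfplan are arbitrary choices for this example.

```shell
# Write the plan to a file so it can be reviewed and applied later.
# save_plan and apply_plan are hypothetical helper names; "tfplan" is an
# arbitrary default file name chosen for this example.
save_plan() {
  terraform plan -out="${1:-tfplan}"
}

# Apply exactly the plan that was saved earlier.
apply_plan() {
  terraform apply "${1:-tfplan}"
}

# Example (run from the full_deployment directory):
# save_plan && terraform show tfplan   # review the saved plan
# apply_plan
```

Note that applying a saved plan file does not prompt for confirmation, so review the plan before running the apply step.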

It will take 10-20 minutes for all the necessary resources to be deployed. Congratulations! You've deployed everything necessary to run Gretel Hybrid within your own cloud tenant.
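
Once the deployment finishes, an IAM user or role listed in cluster_admin_roles or cluster_admin_users can fetch cluster credentials with aws eks update-kubeconfig and take a quick look at the nodes. The sketch below is illustrative: the helper name is hypothetical, and the cluster name in the example is an assumption (that the cluster is named after deployment_name). Running aws eks list-clusters in your region shows the actual name.

```shell
# Fetch kubeconfig credentials for an EKS cluster, then list its nodes.
# connect_to_cluster is a hypothetical helper name used only for this example.
connect_to_cluster() {
  aws eks update-kubeconfig --region "$1" --name "$2" &&
    kubectl get nodes
}

# Example (requires AWS credentials). The cluster name below is an assumption;
# check the real name with: aws eks list-clusters --region us-west-2
# connect_to_cluster us-west-2 gretel-hybrid-env
```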

Test Your Deployment

Follow our Test Your Deployment guide to test your deployment by running a model training job.

(Optional) Tear Down Your Deployment

If you would like to clean up your provisioned AWS resources, the following command will delete all provisioned resources. The command will ask for confirmation before proceeding.

terraform destroy
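
If you would like to see what would be removed before committing to it, Terraform can produce a plan in destroy mode without changing anything. A minimal sketch; the helper name preview_destroy is hypothetical.

```shell
# Show which resources a destroy would remove, without deleting anything.
# preview_destroy is a hypothetical helper name used only for this example.
preview_destroy() {
  terraform plan -destroy
}

# Example (run from the full_deployment directory): preview_destroy
```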
