GCP Setup

Overview

We will deploy Gretel Hybrid's required cloud infrastructure using Terraform (an infrastructure as code tool). This guide will walk you through all the steps necessary to install and configure Terraform, even if you haven't used it before.

Prerequisites

Cloud Infrastructure Requirements

Gretel Hybrid must be deployed within a GKE cluster. You may choose to use a standard GKE cluster or an autopilot cluster.

For CPU and GPU based Gretel Jobs, we recommend configuring node types with at least 16 GiB of memory and 4 vCPUs. Each worker run requires only one GPU device.

The specific node types we recommend are:

  • CPU Gretel Model Workers -> n2-standard-4

  • GPU Gretel Model Workers -> g2-standard-4

Please consult the GPU Quotas section in the appendix for help requesting a GPU quota increase if you run into any resource constraints when deploying GPU nodes.

GPU instances are only available in certain GCP Regions. Please consult the Compute Engine GPU Regions and Zones Availability documentation when selecting a region in which to deploy Gretel Hybrid.
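As a command-line alternative to reading the availability table, the sketch below lists the zones that offer a given accelerator type. It assumes the gcloud CLI (installed later in this guide) is available and authenticated. The helper name list_gpu_zones is hypothetical, and the default accelerator type nvidia-l4 is chosen because G2 machine types such as g2-standard-4 use NVIDIA L4 GPUs.

```shell
# Hypothetical helper (not part of Gretel's tooling): list the zones that
# offer an accelerator type, limited to zones whose name matches a region.
# Usage: list_gpu_zones [ACCELERATOR_TYPE] [REGION]
list_gpu_zones() {
  accel="${1:-nvidia-l4}"
  region="${2:-us-central1}"
  if ! command -v gcloud >/dev/null 2>&1; then
    echo "gcloud not found; install the gcloud CLI first" >&2
    return 1
  fi
  # "zone~REGION" is a regular-expression match on the zone field.
  gcloud compute accelerator-types list \
    --filter="name=${accel} AND zone~${region}"
}
```

For example, list_gpu_zones nvidia-l4 us-central1 should print one row per zone in us-central1 that offers L4 GPUs. An empty result suggests choosing a different region.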

Install Common CLI Tools

If you’re missing brew, helm or kubectl, refer to the Prerequisites.
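To see quickly which of these tools are already installed, a small check like the one below can help. This is a minimal sketch; the function name check_tools is illustrative and not something Gretel ships.

```shell
# Report which of the given command-line tools are on PATH.
# Returns non-zero if at least one tool is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# brew applies to macOS; drop it from the list on other platforms.
check_tools brew helm kubectl || echo "Install the missing tools before continuing."
```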

Install Gretel CLI

Even if you have already followed the documentation to install the Gretel CLI, we need to make sure that the GCP libraries are installed for testing our deployment once it is finished. Run the below command to install the latest version of the Gretel CLI with the GCP dependencies.

pip install -U "gretel-client[gcp]"

Install gcloud (GCP) CLI

The official gcloud CLI installation documentation is located here.

MacOS

You can install gcloud CLI using brew.

brew install --cask google-cloud-sdk

Linux

Please follow the installation instructions referenced here.

Windows

Please follow the installation instructions referenced here.

Install gke-gcloud-auth-plugin

After installing the gcloud CLI you will need to install the GKE authentication plugin for kubectl. Please run the below command.

gcloud components install gke-gcloud-auth-plugin

Configure GCP CLI

If you're using kubectl version 1.25 or earlier, you'll need to follow some extra steps to use the gke-gcloud-auth-plugin. These steps are located here.
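If you are unsure which kubectl version you have, the sketch below reads the client version and reports whether the extra steps apply. The helper name kubectl_is_1_25_or_earlier is hypothetical, and extracting gitVersion with sed assumes the JSON layout printed by kubectl version --client -o json.

```shell
# Hypothetical helper: succeed (exit 0) when a version string such as
# "v1.25.4" is 1.25 or earlier, which is when the extra plugin steps apply.
kubectl_is_1_25_or_earlier() {
  version="${1#v}"            # strip a leading "v"
  major="${version%%.*}"      # text before the first dot
  rest="${version#*.}"
  minor="${rest%%.*}"         # text between the first and second dots
  minor="${minor%%[!0-9]*}"   # drop any non-numeric suffix
  [ "$major" -lt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -le 25 ]; }
}

if command -v kubectl >/dev/null 2>&1; then
  client_version=$(kubectl version --client -o json 2>/dev/null \
    | sed -n 's/.*"gitVersion": *"\([^"]*\)".*/\1/p' | head -n 1)
  if [ -z "$client_version" ]; then
    echo "Could not determine the kubectl client version"
  elif kubectl_is_1_25_or_earlier "$client_version"; then
    echo "kubectl $client_version: follow the extra plugin steps"
  else
    echo "kubectl $client_version: no extra steps needed"
  fi
fi
```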

Login with the gcloud CLI using the following commands.

# Login as normal
gcloud auth login

# Terraform uses the application default login for authentication
gcloud auth application-default login

If you have not done so for your GCP Project, you will need to enable the compute and container APIs using the following commands.

gcloud services enable compute.googleapis.com
gcloud services enable container.googleapis.com
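These commands act on the project currently configured in gcloud (set with gcloud config set project). To confirm that both APIs are active, you could run a check along the lines of the sketch below. The helper name check_required_apis and the REQUIRED_APIS variable are illustrative.

```shell
# The two APIs this guide asks you to enable.
REQUIRED_APIS="compute.googleapis.com container.googleapis.com"

# Hypothetical helper: print each required API and whether it is enabled
# in the currently configured gcloud project.
check_required_apis() {
  enabled=$(gcloud services list --enabled --format="value(config.name)")
  for api in $REQUIRED_APIS; do
    if printf '%s\n' "$enabled" | grep -qx "$api"; then
      echo "enabled:     $api"
    else
      echo "not enabled: $api"
    fi
  done
}
```

Run check_required_apis after logging in. Any API reported as not enabled can be enabled with the gcloud services enable command above.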

Terraform Based Installation

The Terraform CLI will utilize the authenticated session from the gcloud CLI to deploy and manage your Gretel Hybrid infrastructure. Here is a link to our provided Terraform modules.

Install Terraform

The official Terraform installation instructions are located here. After navigating to the linked documentation there is a dropdown menu where you are able to select your OS and installation method.

Clone and Setup the Gretel Hybrid Repository

The Gretel Hybrid git repository is located here. You may clone the repository by running the below command.

git clone https://github.com/gretelai/gretel-hybrid.git

Now that you have a local copy of the repository, let's enter the proper working directory for deploying a full Gretel Hybrid environment with Terraform.

cd gretel-hybrid/terraform-v2/gcp/examples/full_deployment

Here is what our working directory looks like.

full_deployment
├── backend.tf.example
├── main.tf
├── outputs.tf
├── terraform.tfvars.example
└── variables.tf

(Optional) Configure Terraform Backend

Terraform stores information about currently deployed resources in the Terraform State. By default Terraform stores this information in a local file within the current working directory. You can store the Terraform State in a GCS Bucket instead. This provides two benefits.

  1. It is much harder to accidentally delete a state file from your bucket.

  2. Anyone with access to the bucket works from the same state file, so members of your team or organization can collaborate on the same deployment.

If you're deploying this Gretel Hybrid Cluster for a production environment or for any sort of extended test, you should create a GCS Bucket to keep your Terraform State so that you do not lose state information.

We provide a script to create this Bucket for you. Simply run the command below, setting the flags as desired.

# The script is located in gretel-hybrid/terraform-v2/gcp/scripts
$ ../../scripts/bootstrap_state_backend.sh --help

Usage: ../../scripts/bootstrap_state_backend.sh -b|--bucket-name BUCKET_NAME -l|--location LOCATION

# Create the GCP Bucket Terraform Backend State Store
$ ../../scripts/bootstrap_state_backend.sh -b example-state-backend -l us-central1 
Creating gs://example-state-backend/...

Bucket 'example-state-backend' created successfully in location 'us-central1'.

Example Terraform Backend Configuration for GCS:
----------------------------------------------
terraform {
  backend "gcs" {
    bucket  = "example-state-backend"
    prefix  = "terraform/state"
  }
}

Now that the GCS Bucket is created, we need to tell Terraform to use it. Rename the backend.tf.example file to backend.tf, then edit the backend block at the beginning of the file so that bucket matches the name of the bucket you created. Save your changes to the backend.tf file.

terraform {
  backend "gcs" {
    bucket = "example-terraform-state-bucket"
    prefix = "terraform/state"
  }
}
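If you prefer to script this edit, the sketch below writes backend.tf from the example file with the bucket name filled in. The helper name set_backend_bucket is hypothetical. Unlike a rename, it leaves backend.tf.example in place.

```shell
# Hypothetical helper: copy a backend file, replacing the value of the
# "bucket" setting. Other lines (such as "prefix") are left unchanged.
# Usage: set_backend_bucket SOURCE_FILE TARGET_FILE BUCKET_NAME
set_backend_bucket() {
  sed "s|^\([[:space:]]*bucket[[:space:]]*=[[:space:]]*\)\"[^\"]*\"|\1\"$3\"|" "$1" > "$2"
}

# Example, run from the full_deployment directory:
#   set_backend_bucket backend.tf.example backend.tf example-state-backend
```

After the file is in place, terraform init (covered in the Deploy Gretel Hybrid section) picks up the backend configuration.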

Configure Variables

The next step is to configure the variables Terraform will use to create the resources for Gretel Hybrid. First, rename the terraform.tfvars.example file to terraform.tfvars with the mv terraform.tfvars.example terraform.tfvars command. Then review the variables inside the file and configure them as desired. Here is what they look like by default.

# Provide an existing GCP Project ID
project_id      = "your-project"

# The deployment name is used to name created resources.
# eg. the GKE cluster will be: gretel-hybrid-cluster
deployment_name = "gretel-hybrid"

# Set the below variables as desired.
location        = "us-central1"
node_locations  = ["us-central1-a", "us-central1-b", "us-central1-c"]
bucket_location = "US"
gke_version     = "1.27.3-gke.100"

# GCS bucket names need to be globally unique. Change these. 
# These buckets will be created for you.
gretel_source_bucket_name = "gretel-hybrid-source"
gretel_sink_bucket_name   = "gretel-hybrid-sink"

# Provide a list of IAM principal identifiers (users, service accounts, etc.) that 
# are allowed to perform an encrypt operation using the KMS key used for 
# credentials encryption. See: https://cloud.google.com/iam/docs/principal-identifiers
# example: ["user:ben@gretel.ai"]
gretel_credentials_encryption_key_users = []

# Uncomment these two lines for a test/sandbox deployment you plan to destroy afterward
# cluster_prevent_destroy = false
# gretel_credentials_encryption_key_prevent_destroy = false

Make any desired changes to the provided variables. These will be used to create the GCP and Kubernetes resources that are part of the Gretel Hybrid deployment.
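Before deploying, it can be worth confirming that the project ID and bucket names no longer hold the example values shown above. The sketch below is one way to do that. The helper name find_default_values is hypothetical, and it only looks for the three defaults listed in this guide.

```shell
# Hypothetical helper: print (with line numbers) any lines of a tfvars file
# that still use the example project ID or bucket names.
# Returns 0 when no example value is found, 1 otherwise.
find_default_values() {
  if grep -nE '^[[:space:]]*(project_id|gretel_source_bucket_name|gretel_sink_bucket_name)[[:space:]]*=[[:space:]]*"(your-project|gretel-hybrid-source|gretel-hybrid-sink)"' "$1"; then
    echo "The lines above still use example values; edit $1 before deploying." >&2
    return 1
  fi
  return 0
}

# Example, run from the full_deployment directory:
#   find_default_values terraform.tfvars
```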

Setup Gretel API Key

You can get your Gretel API key from the console by clicking the drop down menu in the top right hand corner of the console and selecting "API Key" under the "Account Settings" section. Here is a direct link to this page. If you haven't logged into the Gretel Console or set up an account, follow our documentation here.

After retrieving your API Key, export the necessary variable using the below command. The variable name must match TF_VAR_gretel_api_key exactly.

export TF_VAR_gretel_api_key="<insert_key_here>"
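Terraform reads this value from the environment, so the variable must be exported and not just set in the current shell. The sketch below checks this without printing the key. The helper name check_gretel_api_key is illustrative.

```shell
# Hypothetical helper: confirm TF_VAR_gretel_api_key is exported and is not
# the placeholder, printing only the key's length and never its value.
check_gretel_api_key() {
  # A child shell only sees exported variables.
  if ! sh -c '[ -n "${TF_VAR_gretel_api_key:-}" ]'; then
    echo "TF_VAR_gretel_api_key is not exported"
    return 1
  fi
  if [ "$TF_VAR_gretel_api_key" = "<insert_key_here>" ]; then
    echo "TF_VAR_gretel_api_key still holds the placeholder"
    return 1
  fi
  echo "TF_VAR_gretel_api_key is exported (${#TF_VAR_gretel_api_key} characters)"
}
```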

Deploy Gretel Hybrid

Run these terraform commands from the full_deployment directory.

Initialize Terraform. This is an idempotent operation and is always safe to run (resources will not be created or destroyed).

terraform init

View the changes Terraform will make upon deployment. Run this any time you change the configuration to see exactly what will be created, modified, or destroyed.

terraform plan
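As an optional variation, you can save the plan to a file and later apply exactly that plan. The sketch below wraps the three standard Terraform commands in a function. The name plan_then_apply and the file name tfplan are illustrative. Note that applying a saved plan file does not ask for confirmation, so review the plan output first.

```shell
# Hypothetical helper: save a plan, print it for review, then apply that
# exact plan. Stops at the first command that fails.
plan_then_apply() {
  terraform plan -out=tfplan &&
  terraform show tfplan &&
  terraform apply tfplan
}
```

In practice you may prefer to run the three commands one at a time so that you can read the output of terraform show before applying.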

Deploy the module. This will require user confirmation so don't walk away from your shell until you confirm by typing "yes" to start the deployment.

terraform apply

It will take 5-10 minutes for all the necessary resources to be deployed. Congratulations! You've deployed everything necessary to run Gretel Hybrid within your own cloud tenant.
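To take a first look at the new cluster, you can fetch credentials for kubectl and list its nodes and pods. The sketch below assumes the default values from terraform.tfvars and the <deployment_name>-cluster naming described in that file's comments. The helper name connect_and_inspect is hypothetical, and pods are listed across all namespaces because this guide does not name the namespace the deployment uses.

```shell
# Values from terraform.tfvars; adjust to match your own settings.
DEPLOYMENT_NAME="gretel-hybrid"
LOCATION="us-central1"
PROJECT_ID="your-project"

# The tfvars comment gives the cluster name as <deployment_name>-cluster.
CLUSTER_NAME="${DEPLOYMENT_NAME}-cluster"

# Hypothetical helper: fetch kubeconfig credentials, then list nodes and pods.
connect_and_inspect() {
  gcloud container clusters get-credentials "$CLUSTER_NAME" \
    --region "$LOCATION" --project "$PROJECT_ID" &&
  kubectl get nodes &&
  kubectl get pods --all-namespaces
}
```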

Test Your Deployment

Follow our Test Your Deployment guide to test your deployment by running a model training job.

(Optional) Tear Down Your Deployment

If you would like to clean up your provisioned GCP resources, the following command will delete all provisioned resources. The command will ask for confirmation before proceeding.

terraform destroy
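The comments in terraform.tfvars indicate that cluster_prevent_destroy and gretel_credentials_encryption_key_prevent_destroy should be uncommented and set to false for a deployment you plan to destroy. Assuming the destroy is blocked while they keep their defaults, the sketch below checks the file before you run the command. The helper name destroy_is_unblocked is hypothetical.

```shell
# Hypothetical helper: succeed only when both prevent_destroy overrides are
# uncommented and set to false in the given tfvars file.
destroy_is_unblocked() {
  for var in cluster_prevent_destroy gretel_credentials_encryption_key_prevent_destroy; do
    if ! grep -qE "^[[:space:]]*${var}[[:space:]]*=[[:space:]]*false" "$1"; then
      echo "$var is not set to false in $1"
      return 1
    fi
  done
  echo "prevent_destroy overrides are set to false"
}

# Example, run from the full_deployment directory:
#   destroy_is_unblocked terraform.tfvars && terraform destroy
```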

Appendix

GPU Quotas

For Gretel models that utilize GPUs, you’ll need to request GPU quota increases. Visit the Quotas page.

Using the filters up top, you can search for GPU quotas in the region you are operating in.

First, filter using the gpu string, then choose your GPU type.

Once you select the GPU type you want to request an increase for, filter on your region.

Finally, select the specific quota, click Edit Quotas in the top right, and enter the quota you require. We recommend using L4, T4, or A100 GPUs. Next, make sure the Global GPU quota is set for your project.

You can filter on Quota: GPUs (all regions) and increase the quota.
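The current limits can also be inspected from the command line. The sketch below is one way to do so with the gcloud CLI; the helper names are hypothetical, and it assumes that GPU quotas appear in the quotas list of the region and project descriptions with metric names containing GPU (for example GPUS_ALL_REGIONS for the global quota). Quota increases themselves are still requested through the Quotas page.

```shell
# Hypothetical helper: print GPU-related quota metrics, limits and usage
# for a region (defaults to us-central1).
show_gpu_quotas() {
  gcloud compute regions describe "${1:-us-central1}" \
    --flatten="quotas[]" \
    --filter="quotas.metric~GPU" \
    --format="table(quotas.metric,quotas.limit,quotas.usage)"
}

# Hypothetical helper: print the project-wide "GPUs (all regions)" quota.
show_global_gpu_quota() {
  gcloud compute project-info describe \
    --flatten="quotas[]" \
    --filter="quotas.metric=GPUS_ALL_REGIONS" \
    --format="table(quotas.metric,quotas.limit,quotas.usage)"
}
```

A limit of 0 for the GPU type you plan to use, or for the all-regions quota, means GPU nodes cannot be created until an increase is approved.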
