Azure Setup

Overview

We will deploy Gretel Hybrid's required cloud infrastructure using Terraform, an infrastructure-as-code tool. This guide walks you through every step needed to install and configure Terraform, even if you haven't used it before.

Prerequisites

Cloud Infrastructure Requirements

Some Gretel models require GPUs (the models are listed here). We recommend NCasT4_v3 series or ND A100 nodes for GPU-based jobs. Other Azure GPU node types are detailed here.

For CPU-based jobs, we recommend node types with at least 16 GiB of memory and 4 vCPUs.
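The CPU recommendation comes down to two numeric thresholds. The sketch below is a hypothetical helper (not part of the Gretel tooling) that checks a VM size's vCPU count and memory against them; you supply the numbers from the Azure VM size documentation, with memory given in MiB (16 GiB = 16384 MiB).

```shell
# Hypothetical helper: succeeds (exit 0) if a VM size meets the recommended
# minimum for CPU-based Gretel jobs: at least 4 vCPUs and 16 GiB of memory.
# Memory is passed in MiB, so the threshold is 16 * 1024 = 16384 MiB.
meets_cpu_job_minimum() {
  vcpus=$1
  memory_mib=$2
  [ "$vcpus" -ge 4 ] && [ "$memory_mib" -ge $((16 * 1024)) ]
}

# Example: 4 vCPUs with 16 GiB passes; 2 vCPUs with 8 GiB does not.
if meets_cpu_job_minimum 4 16384; then echo "4 vCPU / 16 GiB: OK"; fi
if ! meets_cpu_job_minimum 2 8192; then echo "2 vCPU / 8 GiB: too small"; fi
```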

In addition to these GPU/CPU job nodes, the Gretel Agent deployment also needs to run on a "regular" node that is not dedicated to Gretel jobs. This node has no specific resource requirements, since the Gretel Agent is lightweight.

Only certain Azure regions support GPU nodes. We have tested the following regions for GPU-based clusters:

  • East US

  • East US 2

  • South Central US

Please consult the GPU Quotas section in the appendix for help requesting a GPU quota increase if you run into any resource constraints when deploying GPU nodes.

If you need to use a different region, verify GPU availability before deploying. GPU instance types are documented here, and the "Azure Products by Region" tool can help you check the capabilities of other regions.
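One way to check a region from the command line is the Azure CLI's az vm list-skus command. The function below is a hypothetical wrapper around it, assuming an authenticated Azure CLI session; it only tells you whether the size is listed for the region. A listed size can still carry subscription restrictions, and being offered says nothing about your quota.

```shell
# Hypothetical helper: succeeds if the given VM size is listed in the region.
# Requires the Azure CLI to be installed and logged in (az login).
gpu_size_offered() {
  region=$1
  size=$2
  # --size accepts partial names, so match the exact name in the output.
  az vm list-skus --location "$region" --size "$size" \
    --resource-type virtualMachines --query "[].name" --output tsv \
    | grep -qx "$size"
}

# Example usage (requires an authenticated Azure CLI session):
# gpu_size_offered southcentralus Standard_NC4as_T4_v3 && echo "listed"
```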

Install Common CLI tools

If you’re missing brew, helm or kubectl, refer to the Prerequisites.

Install Gretel CLI

Even if you have already installed the Gretel CLI by following the documentation, make sure the Azure libraries are installed too; they are needed to test the deployment once it is finished. Run the following command to install the latest version of the Gretel CLI with the Azure dependencies:

pip install -U "gretel-client[azure]"

Azure CLI

The official Azure CLI installation documentation is located here.

macOS

brew update && brew install azure-cli

Linux

# Debian/Ubuntu
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash

Windows

Follow the Azure CLI for Windows installation guide.

Configure Azure CLI

Log in from the CLI with the following command:

az login

Then finish signing in through your browser with the appropriate account.

You can verify that you have logged in with the correct Azure subscription by running the following command.

az account show
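If your account has access to several subscriptions, it helps to fail fast when the wrong one is active. The sketch below is a hypothetical helper built on az account show and az account set; the subscription ID you pass in is your own, and the all-zero ID in the example is a placeholder.

```shell
# Hypothetical helper: fails unless the active Azure CLI subscription
# matches the subscription ID you intend to deploy into.
require_subscription() {
  expected_id=$1
  active_id=$(az account show --query id --output tsv) || return 1
  if [ "$active_id" != "$expected_id" ]; then
    echo "Active subscription is $active_id, expected $expected_id." >&2
    echo "Switch with: az account set --subscription $expected_id" >&2
    return 1
  fi
}

# Example usage (replace the placeholder with your own subscription ID):
# require_subscription "00000000-0000-0000-0000-000000000000"
```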

Terraform Based Installation

The Terraform CLI will utilize the authenticated session from the Azure CLI to deploy and manage your Gretel Hybrid infrastructure. Here is a link to our provided Terraform modules.

Install Terraform

The official Terraform installation instructions are located here. The linked page has a dropdown menu where you can select your OS and installation method.
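Before moving on, you may want to confirm that every tool this guide relies on is on your PATH. The following is a minimal preflight sketch (a hypothetical helper, not shipped with the Gretel Hybrid repository) that uses the POSIX command -v builtin and reports anything missing.

```shell
# Hypothetical preflight check: prints each tool that is not on PATH and
# returns non-zero if any of them is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage with the tools used in this guide:
# check_tools az terraform kubectl helm git
```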

Clone and Setup the Gretel Hybrid Repository

The Gretel Hybrid git repository is located here. Clone it by running the following command:

git clone https://github.com/gretelai/gretel-hybrid.git

Now that you have a local copy of the repository, change into the working directory for deploying a full Gretel Hybrid environment with Terraform:

cd gretel-hybrid/terraform-v2/azure/examples/full_deployment

Here is what our working directory looks like.

full_deployment/
├── backend.tf.example
├── main.tf
├── outputs.tf
├── terraform.tfvars.example
└── variables.tf

(Optional) Configure Terraform Backend

Terraform stores information about currently deployed resources in the Terraform State. By default, Terraform keeps this information in a local file in the current working directory. You can store the Terraform State in an Azure Storage Container instead, which provides two benefits.

  1. It is much harder to accidentally delete a state file from a cloud storage container than from your local disk.

  2. Anyone on your team with access to the container can run Terraform against the same state, which makes it possible to collaborate on the infrastructure within your team or organization.

If you're deploying this Gretel Hybrid Cluster for a production environment or for any sort of extended test, you should create an Azure Storage Container to keep your Terraform State so that you do not lose state information.

We provide a script that creates this Storage Container for you. Run it with the flags set as desired; the --help flag lists the available options.

# The script is located in gretel-hybrid/terraform-v2/azure/scripts
../../scripts/bootstrap_state_backend.sh --help

Usage: ../../scripts/bootstrap_state_backend.sh [OPTIONS]
Options:
  -h, --help                 Show this help message
  -r, --resource-group NAME  Specify the resource group name. This resource group should not exist yet and will be created. (default: tfstate)
  -s, --storage-account NAME Specify the storage account name. This storage account should not exist yet and will be created. (default: tfstategretel)
  -c, --container NAME       Specify the blob container name. (default: tfstate)
  -l, --location NAME        Specify the location/region (default: southcentralus)

Description:
  This script creates an Azure resource group, a storage account, and a blob container for managing Terraform state.

# Create the Azure resources for your Terraform Backend State Store
../../scripts/bootstrap_state_backend.sh --location "southcentralus" --resource-group "tfstate" --storage-account "tfstategretel" --container "tfstate"
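To confirm that the bootstrap script created the blob container, you can query it with az storage container exists. The helper below is a hypothetical sketch; it uses --auth-mode login, which assumes your signed-in identity holds a data-plane role on the storage account (for example Storage Blob Data Contributor). Without such a role the command may fail with an authorization error.

```shell
# Hypothetical helper: succeeds if the Terraform state container exists.
state_container_exists() {
  account=$1
  container=$2
  [ "$(az storage container exists \
        --account-name "$account" --name "$container" \
        --auth-mode login --query exists --output tsv)" = "true" ]
}

# Example usage with the names passed to the bootstrap script above:
# state_container_exists tfstategretel tfstate && echo "state container ready"
```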

Now that the Azure Blob Container is created, we need to tell Terraform to use it. Rename the backend.tf.example file to backend.tf and edit the backend block at the beginning of the file. Save your changes to the backend.tf file.

terraform {
  backend "azurerm" {
    resource_group_name  = "tfstate"
    storage_account_name = "tfstategretel"
    container_name       = "tfstate"
    key                  = "gretel-hybrid-env.tfstate"
  }
}
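If you script your setup, you could generate backend.tf from the same values you passed to the bootstrap script instead of editing it by hand. This is a hypothetical sketch that assumes the backend block shown above is the only content you need in backend.tf; compare the result with backend.tf.example before relying on it, because the function overwrites any existing backend.tf.

```shell
# Hypothetical helper: writes backend.tf into a directory using the
# resource group, storage account, container, and state key you choose.
write_backend_tf() {
  dir=$1 rg=$2 account=$3 container=$4 key=$5
  cat > "$dir/backend.tf" <<EOF
terraform {
  backend "azurerm" {
    resource_group_name  = "$rg"
    storage_account_name = "$account"
    container_name       = "$container"
    key                  = "$key"
  }
}
EOF
}

# Example usage from the full_deployment directory:
# write_backend_tf . tfstate tfstategretel tfstate gretel-hybrid-env.tfstate
```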

Configure Variables

The next step is to configure the variables Terraform will use to create the resources for Gretel Hybrid. First, rename the terraform.tfvars.example file to terraform.tfvars with the mv terraform.tfvars.example terraform.tfvars command. Then review the variables inside the file and configure them as desired. Here is what they look like by default.

resource_group_name                  = "gretel-hybrid-env"
region                               = "South Central US"
deployment_name                      = "gretel-hybrid-env"
kubernetes_version                   = "1.27"
gretel_storage_account_name          = "gretelhybrid"  # This will need to be changed
gretel_source_storage_container_name = "gretel-hybrid-source"
gretel_sink_storage_container_name   = "gretel-hybrid-sink"
gretel_key_vault_name                = "gretel-hybrid"  # This will need to be changed

Make any desired changes to the provided variables. Note that storage account and key vault names must be globally unique, so gretel_storage_account_name and gretel_key_vault_name need new values. These variables are used to create the Azure and Kubernetes resources that are part of the Gretel Hybrid deployment.
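Azure also enforces naming formats that you can check locally before running Terraform. The helpers below are a hypothetical sketch encoding the documented rules as we understand them: storage account names are 3 to 24 lowercase letters and digits; key vault names are 3 to 24 letters, digits, and hyphens, start with a letter, end with a letter or digit, and contain no consecutive hyphens. They check format only, not global uniqueness.

```shell
# Hypothetical helpers: local format checks for Azure resource names.

# Storage accounts: 3-24 characters, lowercase letters and digits only.
valid_storage_account_name() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9]{3,24}$'
}

# Key vaults: 3-24 characters of letters, digits, and hyphens, starting
# with a letter, ending with a letter or digit, no consecutive hyphens.
valid_key_vault_name() {
  printf '%s\n' "$1" | grep -Eq '^[A-Za-z][A-Za-z0-9-]{1,22}[A-Za-z0-9]$' &&
    ! printf '%s\n' "$1" | grep -q -- '--'
}
```

For storage accounts, az storage account check-name --name <name> asks Azure directly whether a name is still available.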

Setup Gretel API Key

Since we do not want to statically define sensitive credentials in clear text, we will pass your Gretel API Key to Terraform using an environment variable in the format TF_VAR_<terraform_variable_name>.

You can get your Gretel API key from the console by opening the drop-down menu in the top-right corner and selecting "API Key" under the "Account Settings" section. Here is a direct link to this page. If you haven't logged into the Gretel Console or set up an account, follow our documentation here.

After retrieving your API key, export it with the following command. The variable name must be exactly TF_VAR_gretel_api_key.

export TF_VAR_gretel_api_key="<insert_key_here>"
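Because the key arrives through the environment, a forgotten export only shows up once Terraform runs. The guard below is a hypothetical sketch you can call first; the commented read line is a bash-specific alternative to the export above that keeps the key out of your terminal output and shell history.

```shell
# Hypothetical guard: fails if TF_VAR_gretel_api_key is unset or empty.
require_gretel_api_key() {
  if [ -z "${TF_VAR_gretel_api_key:-}" ]; then
    echo "TF_VAR_gretel_api_key is not set; export it before running terraform." >&2
    return 1
  fi
}

# In bash, you can enter the key without echoing it or saving it to history:
#   read -rs -p "Gretel API key: " TF_VAR_gretel_api_key && export TF_VAR_gretel_api_key
```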

Deploy Gretel Hybrid

Run these Terraform commands from the full_deployment directory.

Initialize Terraform. This operation is idempotent and always safe to run; it does not create or destroy any resources.

terraform init

Preview the changes Terraform will make on deployment. Run this whenever you change the configuration to review exactly what will happen.

terraform plan

Deploy the module. Terraform asks for confirmation, so don't walk away from your shell until you have typed "yes" to start the deployment.

terraform apply
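If you want the deployment to apply exactly the changes you reviewed, Terraform can save a plan to a file and apply that file. The wrapper below is a hypothetical sketch of that workflow; the plan file name tfplan is our choice. Note that applying a saved plan does not prompt for "yes", since the review happens at the plan step.

```shell
# Hypothetical wrapper: initialize, save the plan to a file, then apply
# exactly that saved plan. Stops at the first failing step.
deploy_with_saved_plan() {
  terraform init &&
    terraform plan -out=tfplan &&
    terraform apply tfplan
}
```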

It will take 5-10 minutes for all the necessary resources to be deployed. Congratulations! You've deployed everything necessary to run Gretel Hybrid within your own cloud tenant.

Test Your Deployment

Follow our guide to test your deployment by running a model training job.

(Optional) Tear Down Your Deployment

If you would like to clean up your provisioned Azure resources, the following command deletes all resources provisioned by this Terraform configuration. It asks for confirmation before proceeding.

terraform destroy
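terraform destroy does not remove the Terraform State backend, since the bootstrap script created it outside this configuration. If you used that script and no longer need the state, the hypothetical sketch below destroys the deployment first and only then deletes the state resource group with az group delete. The order matters: Terraform needs the state to know what to destroy, and deleting the resource group removes the state permanently.

```shell
# Hypothetical cleanup: destroy the Gretel Hybrid resources, then delete
# the resource group that holds the Terraform State backend.
teardown_all() {
  state_rg=$1
  terraform destroy || return 1
  # Only reached if terraform destroy succeeded.
  az group delete --name "$state_rg" --yes
}

# Example usage with the bootstrap script's default resource group name:
# teardown_all tfstate
```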

Appendix

GPU Quotas

If you plan to use Gretel Hybrid to run or test any GPU-based models, you will need to deploy GPU instances in your AKS cluster. You may run into the vCPU quota limit for the recommended GPU instance type, Standard_NC4as_T4_v3. Each instance of this type has 4 vCPUs, so deploying a single GPU node requires a quota of at least 4 vCPUs.
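Because quotas are counted in vCPUs rather than nodes, the quota to request is the node count multiplied by the vCPUs per node. The helper below is a hypothetical sketch of that arithmetic, defaulting to 4 vCPUs per node for Standard_NC4as_T4_v3. To see your current usage and limits in a region, az vm list-usage --location <region> --output table lists them.

```shell
# Hypothetical helper: vCPU quota needed for N GPU nodes.
# Defaults to 4 vCPUs per node (Standard_NC4as_T4_v3).
required_vcpu_quota() {
  nodes=$1
  vcpus_per_node=${2:-4}
  echo $((nodes * vcpus_per_node))
}

required_vcpu_quota 1   # prints 4
required_vcpu_quota 3   # prints 12
```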

The attached screenshot shows the Azure Quotas screen, where we increased our vCPU quota for T4 instances to support at least one node (quotas are counted in vCPUs, and the smallest NCasT4_v3 size, Standard_NC4as_T4_v3, has 4 vCPUs).

To request a quota increase for Standard_NC4as_T4_v3 vCPU count, consult the Azure documentation here.

Deploying Gretel Hybrid for testing purposes? You can use our Amplify and Tabular DP models without any GPU nodes. Follow this deployment guide and choose the "CPU Example" in the Test Your Deployment section.
