Deploying an LLM

Certain Gretel models can use an online LLM (Large Language Model) to improve their functionality. This guide walks you through the steps to deploy an LLM in your Gretel Hybrid environment.

Prerequisites

Ensure you have completed the general prerequisites for deploying Gretel Hybrid, found in the Deployment guide.

  • You'll need to have already installed Gretel Hybrid.

  • This guide uses helm to install a chart within your Kubernetes cluster.
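
You can confirm that both tools are available before you begin; for example (assuming helm and kubectl are already on your PATH):

helm version
kubectl version --client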

Apply the helm chart

The Gretel Inference LLM chart is available in the Gretel Helm repository.

  1. To add the repository to your local helm installation, run the following commands:

helm repo add gretel https://gretel-blueprints-pub.s3.us-west-2.amazonaws.com/helm-charts/stable/
helm repo update
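
To confirm the chart is now available locally, you can search the repository:

helm search repo gretel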
  2. Create a values.yml file:

gretelConfig:
  # This should match the secret ref that was created as part of Gretel Hybrid
  apiKeySecretRef: "gretel-api-key"

gretelLLMConfig:
  modelName: "mistral-7b"

# Ensure the tolerations allow the LLM pods to run on GPU nodes.
# For example, these tolerations will allow the pod to run if you
# used our terraform modules to create your cluster.
tolerations:
  - effect: NoSchedule
    key: gretel-worker
    operator: Equal
    value: gpu-model
  - effect: NoSchedule
    key: nvidia.com/gpu
    operator: Exists
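
The apiKeySecretRef value must point to an existing Kubernetes secret. You can verify that it exists with the following (assuming the gretel-hybrid namespace and the secret name shown above):

kubectl --namespace gretel-hybrid get secret gretel-api-key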
  3. Ensure your kubectl context is set to the correct cluster where you're already running Gretel Hybrid.
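
For example, you can check (and, if needed, switch) the active context; the context name below is a placeholder:

kubectl config current-context
kubectl config use-context <your-cluster-context>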

  4. Apply the chart to your Kubernetes cluster:

helm upgrade --namespace gretel-hybrid \
    --install gretel-inference-llm gretel/gretel-inference-llm \
    --values values.yml
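
You can also confirm that the release installed successfully; the release name and namespace match the command above:

helm --namespace gretel-hybrid status gretel-inference-llm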
  5. After giving the pod a few minutes to spin up, ensure that it is in a healthy state:

kubectl --namespace gretel-hybrid get pods -l app.kubernetes.io/name=gretel-inference-llm
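
If the pod is not in a Running state, its events and logs are the first places to look; for example:

kubectl --namespace gretel-hybrid describe pods -l app.kubernetes.io/name=gretel-inference-llm
kubectl --namespace gretel-hybrid logs -l app.kubernetes.io/name=gretel-inference-llm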

Usage

Transform v2 can use the Gretel Inference LLM service for classification. For an example of configuring a hybrid Transform v2 job to use classification, see the Transform v2 guide.
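
To confirm that the inference service is reachable from within the cluster, you can list it using the same label selector as above (a sketch; the label and service name may differ depending on the chart version):

kubectl --namespace gretel-hybrid get svc -l app.kubernetes.io/name=gretel-inference-llm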
