Custom Model Configurations

Control Your AI Generation Parameters

Data Designer allows you to customize the AI models used for text generation through custom model configurations. This feature gives you precise control over model selection and generation parameters, enabling you to fine-tune the behavior of your AI-generated data.

Model Suites Overview

Data Designer offers different model suites with pre-configured model aliases:

Apache-2.0 Suite

suite_name: apache-2.0
license: Apache-2.0
license_url: https://www.apache.org/licenses/LICENSE-2.0
model_aliases:
  text: gretel/mistralai/Mistral-Small-24B-Instruct-2501
  code: gretel/Qwen/Qwen2.5-Coder-32B-Instruct
  judge: gretel/mistralai/Mistral-Small-24B-Instruct-2501-judge
models:
  - gretel/Qwen/Qwen2.5-Coder-32B-Instruct
  - gretel/mistralai/Mistral-Small-24B-Instruct-2501
  - gretel/mistralai/Mistral-Small-24B-Instruct-2501-judge

Llama-3.x Suite

suite_name: llama-3.x
license: Llama 3.x
license_url: https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE
model_aliases:
  text: bedrock/meta-llama/Llama-3.1-70B-Instruct
  code: bedrock/meta-llama/Llama-3.1-70B-Instruct
  judge: bedrock/meta-llama/Llama-3.1-70B-Instruct
models:
  - bedrock/meta-llama/Llama-3.1-70B-Instruct

Basic Usage

To use custom model configurations, first define your model configs and then pass them when initializing your Data Designer instance:

from gretel_client.navigator_client import Gretel
from gretel_client.workflows.configs.workflows import ModelConfig, GenerationParameters
from gretel_client.data_designer import columns as C

# Instantiate the Gretel client used below (assumes your Gretel
# credentials are already configured, e.g. via `gretel configure`)
gretel = Gretel()

# Define custom model configurations
model_configs = [
    ModelConfig(
        alias="precise-model",
        model_name="gretel/mistralai/Mistral-Small-24B-Instruct-2501",
        generation_parameters=GenerationParameters(
            temperature=0.2, 
            top_p=0.9
        )
    ),
    ModelConfig(
        alias="creative-model",
        model_name="gretel/Qwen/Qwen2.5-Coder-32B-Instruct",
        generation_parameters=GenerationParameters(
            temperature=0.8, 
            top_p=0.95
        )
    )
]

# Initialize Data Designer with custom model configs
aidd = gretel.data_designer.new(
    model_suite="apache-2.0",
    model_configs=model_configs
)

# Reference models by alias when adding LLM-generated columns
aidd.add_column(
    C.LLMTextColumn(
        name="factual_description",
        prompt="Write a factual description of a {{product_type}}.",
        model_alias="precise-model"  # Reference by alias
    )
)

aidd.add_column(
    C.LLMTextColumn(
        name="creative_description",
        prompt="Write a creative, engaging description of a {{product_type}}.",
        model_alias="creative-model"  # Reference by alias
    )
)

Using Default Model Aliases

You can also use the pre-configured model aliases from your chosen model suite:

# Initialize Data Designer with a specific model suite
aidd = gretel.data_designer.new(model_suite="apache-2.0")

# Use the pre-configured 'text' model alias
aidd.add_column(
    C.LLMTextColumn(
        name="product_description",
        prompt="Write a description for a {{product_type}}.",        model_alias="text"  # Uses gretel/mistralai/Mistral-Small-24B-Instruct-2501
    )
)

# Use the pre-configured 'code' model alias
aidd.add_column(
    C.LLMCodeColumn(
        name="product_code",
        prompt="Write a Python class to represent a {{product_type}}.",
    )
)

Model Configuration Components

ModelConfig Class

The ModelConfig class takes the following parameters:

  • alias: A unique identifier you'll use to reference this model configuration

  • model_name: The fully qualified model name (depends on your Gretel deployment)

  • generation_parameters: Controls how the model generates text

Generation Parameters

The GenerationParameters class supports various parameters (a combined ModelConfig example follows this list):

  • temperature: Controls randomness in the output

    • Higher values (0.7-1.0) produce more creative, diverse outputs

    • Lower values (0.0-0.3) produce more deterministic, focused outputs

    • Fixed value: temperature=0.75

    • Variable range: temperature={"type": "uniform", "params": {"low": 0.50, "high": 0.90}}

  • top_p: Controls nucleus sampling (the cumulative probability cutoff for candidate tokens); lower values narrow word selection

  • Additional parameters may be available depending on the model
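
Putting these components together, here is a minimal sketch of a single, fixed-parameter configuration (the alias "factual" is arbitrary, and the model name must be one available in your chosen suite):

from gretel_client.workflows.configs.workflows import ModelConfig, GenerationParameters

# One configuration with fixed (reproducible) generation parameters
factual_config = ModelConfig(
    alias="factual",  # arbitrary name you later pass as model_alias
    model_name="gretel/mistralai/Mistral-Small-24B-Instruct-2501",  # available in the apache-2.0 suite
    generation_parameters=GenerationParameters(
        temperature=0.2,  # low temperature for deterministic, focused output
        top_p=0.9
    )
)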

Using Variable Parameters

You can define parameters with dynamic ranges for increased variety in your generated data:

ModelConfig(
    alias="variable-temp-model",
    model_name="gretel/mistralai/Mistral-Small-24B-Instruct-2501",
    generation_parameters=GenerationParameters(
        temperature={"type": "uniform", "params": {"low": 0.50, "high": 0.90"}},
        top_p=0.9
    )
)

This configuration will use a different temperature value for each record, sampled from a uniform distribution between 0.5 and 0.9.

Best Practices

  • Purpose-specific models: Create different model configurations for different types of content (factual vs. creative)

  • Parameter tuning: Start with default parameters and adjust based on your specific needs

  • Consistent naming: Use descriptive aliases that indicate the model's purpose

  • Testing: Preview your results with different configurations before large-scale generation (see the preview sketch after this list)

  • Reproducibility: Use fixed parameters when consistency is important

  • Variety: Use variable parameters when diversity is desired

  • License compliance: Choose model suites according to your organization's licensing requirements
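
As a quick testing sketch (assuming aidd is the Data Designer instance from Basic Usage above, with a product_type column or seed already defined so the prompts can resolve), you can preview a small sample before launching a full generation run:

# Generate a small in-memory sample to compare model configurations
preview = aidd.preview()

# Inspect one sampled record
preview.display_sample_record()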

Choosing Between Model Suites

  • Apache-2.0 suite: Provides specialized models for different tasks (text, code, judging)

  • Llama-3.x suite: Uses Meta's Llama 3.1 model via Amazon Bedrock for all tasks

When selecting a model suite, consider your specific needs (see the example after this list):

  • Apache-2.0 suite offers task-specific models

  • Llama-3.x suite provides consistent behavior across tasks

  • Licensing requirements may dictate which suite you can use
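
For example, switching suites is a single argument at initialization (a minimal sketch, reusing the gretel client from Basic Usage; model availability depends on your deployment):

# Use the Llama-3.x suite; its 'text', 'code', and 'judge' aliases all resolve
# to bedrock/meta-llama/Llama-3.1-70B-Instruct
aidd = gretel.data_designer.new(model_suite="llama-3.x")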
