Getting Started with Data Designer


Installation

First, ensure you have the latest version of the Gretel SDK installed:

pip install -U gretel_client

Initializing Data Designer

To create a new Data Designer instance, use the data_designer attribute of your Gretel client:

from gretel_client.navigator_client import Gretel

# Initialize Gretel client
gretel = Gretel(api_key="YOUR_API_KEY")

# Create a new Data Designer instance
aidd = gretel.data_designer.new(model_suite="apache-2.0")

Model Suites

When initializing Data Designer, you can specify which model suite to use:

  • apache-2.0: Uses models with Apache 2.0 licenses (default)

  • llama-3.x: Uses Llama 3 models (requires appropriate permissions)

Choose a suite based on your compliance and licensing requirements. Sign up for Gretel Enterprise if you want access to more model suites. To learn more, see the Model Suites page in the Reference section.

Basic Workflow

The general workflow when using Data Designer includes:

  1. Initialize: Create a new Data Designer instance

  2. Define Columns: Add columns with various types and parameters

  3. Preview: Generate a small dataset for inspection

  4. Create Full Dataset: Run a batch job to create a larger dataset

Example: Simple Data Generation

Here's a minimal example to get you started:

from gretel_client.navigator_client import Gretel
from gretel_client.data_designer import columns as C
from gretel_client.data_designer import params as P

# Initialize Gretel client
gretel = Gretel(api_key="YOUR_API_KEY")

# Create a new Data Designer instance
aidd = gretel.data_designer.new(model_suite="apache-2.0")

# Add a category column
aidd.add_column(
    C.SamplerColumn(
        name="product_category",
        type=P.SamplerType.CATEGORY,
        params=P.CategorySamplerParams(
            values=["Electronics", "Clothing", "Home & Kitchen", "Books", "Toys"],
            weights=[0.3, 0.25, 0.2, 0.15, 0.1]  # Optional: control the distribution
        )
    )
)

# Add an LLM-generated column
aidd.add_column(
    C.LLMTextColumn(
        name="product_description",
        system_prompt="You are an expert at writing product descriptions. Your writing style is concise, short, informative.",
        prompt="Generate a detailed description for a product in the {{product_category}} category."
    )
)

# Preview the results
preview = aidd.preview()
preview.display_sample_record()
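
To build intuition for what the two columns above do, the following is a local sketch using only the Python standard library. It does not call Gretel and is not how Data Designer is implemented: it draws categories with the same values and weights, then fills the {{product_category}} placeholder in the prompt for each record. The render_prompt helper is hypothetical and handles only simple {{name}} placeholders.

```python
import random
import re
from collections import Counter

# Values and weights copied from the example above.
values = ["Electronics", "Clothing", "Home & Kitchen", "Books", "Toys"]
weights = [0.3, 0.25, 0.2, 0.15, 0.1]


def render_prompt(template: str, record: dict) -> str:
    """Replace each {{column}} placeholder with the record's value (hypothetical helper)."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(record[m.group(1)]), template)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
template = "Generate a detailed description for a product in the {{product_category}} category."

# Draw one category per record, then fill the prompt for that record.
records = [{"product_category": c} for c in rng.choices(values, weights=weights, k=1000)]
prompts = [render_prompt(template, r) for r in records]

# Count how often each category was drawn and show one rendered prompt.
counts = Counter(r["product_category"] for r in records)
print(counts.most_common())
print(prompts[0])
```

With enough records, the category counts should roughly follow the weights, and every prompt sent to the LLM column refers to the category sampled for that same record.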

Submitting a Batch Job

# Generate a full dataset
workflow_run = aidd.create(
    num_records=100,
    name="product_descriptions"
)

workflow_run.wait_until_done()

# Grab your dataset
df = workflow_run.dataset.df

# Poll the job
run = gretel.workflows.get_workflow_run(workflow_run_id=workflow_run.id)
run.poll()
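
Conceptually, waiting on or polling a batch job means checking its status repeatedly until it reaches a final state. The sketch below illustrates that pattern in plain Python under stated assumptions: wait_for, the status names, and the stand-in status source are all hypothetical and are not part of the Gretel SDK.

```python
import time
from typing import Callable


def wait_for(get_status: Callable[[], str], interval: float = 1.0, timeout: float = 60.0) -> str:
    """Call get_status until it returns a terminal state or the timeout elapses."""
    terminal = {"completed", "error", "cancelled"}  # hypothetical status names
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still '{status}' after {timeout} seconds")
        time.sleep(interval)


# Stand-in for a real status lookup: reports "active" twice, then "completed".
fake_statuses = iter(["active", "active", "completed"])
final = wait_for(lambda: next(fake_statuses), interval=0.01)
print(final)
```

In practice, prefer the SDK's own waiting and polling methods shown above; a loop like this is only useful for understanding the behavior or for wrapping a status check with a custom timeout.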