Getting Started with Data Designer

Installation

First, ensure you have the latest version of the Gretel SDK installed:

pip install -U gretel_client
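To check which SDK version is installed in your environment, you can read the package metadata with Python's standard library. This is a minimal sketch. It assumes the distribution is published as `gretel-client` (pip treats `gretel_client` and `gretel-client` as the same project), and the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "gretel-client" is assumed to be the distribution name on PyPI
    found = installed_version("gretel-client")
    print(found if found else "gretel-client is not installed")
```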

Initializing Data Designer

To create a new Data Designer instance, call the new() method on the data_designer attribute of your Gretel client:

from gretel_client.navigator_client import Gretel

# Initialize Gretel client
gretel = Gretel(api_key="YOUR_API_KEY")

# Create a new Data Designer instance
aidd = gretel.data_designer.new(model_suite="apache-2.0")
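A literal "YOUR_API_KEY" placeholder is fine for a quick test. In shared code, you will usually read the key from the environment instead. This is a minimal sketch that uses only the standard library. It assumes the key is exported under a variable named `GRETEL_API_KEY`, which is an assumption, so use whatever name your setup expects. The helper `require_api_key` is hypothetical.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None,
                    var_name: str = "GRETEL_API_KEY") -> str:
    """Read an API key from the environment, failing loudly if it is absent.

    GRETEL_API_KEY is an assumed variable name, not one the SDK requires.
    """
    source = os.environ if env is None else env
    key = source.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key
```

You would then pass the result to the client, for example `Gretel(api_key=require_api_key())`, instead of the placeholder string.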

Model Suites

When initializing Data Designer, use the model_suite parameter to choose which model suite to use:

  • apache-2.0: Uses models with Apache 2.0 licenses (default)

  • llama-3.x: Uses Llama 3 models (requires appropriate permissions)

Choose a suite based on your compliance and licensing requirements. Gretel Enterprise provides access to additional model suites.

Basic Workflow

A typical Data Designer workflow consists of four steps:

  1. Initialize: Create a new Data Designer instance

  2. Define Columns: Add columns with various types and parameters

  3. Preview: Generate a small dataset for inspection

  4. Create Full Dataset: Run a batch job to create a larger dataset
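The following toy model illustrates the four steps in plain Python without calling the service. Every name in it is hypothetical, including `ToyDesigner`, its methods, and the `fake_llm` stub, and none of it is the Data Designer implementation. The toy generates columns in the order they are added, so a later column's prompt can reference an earlier column through simple Jinja-style `{{ name }}` substitution. It also treats a preview as a small run of the same column definitions. The real service handles column dependencies and templating on its own.

```python
import random
import re
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Column = Tuple[str, Callable[[Record], str]]  # (column name, value generator)


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call: echoes the prompt predictably."""
    return f"[generated for: {prompt}]"


def render(template: str, record: Record) -> str:
    """Replace {{column_name}} placeholders with values from the current record.

    A KeyError means the template references a column not generated yet.
    """
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: record[m.group(1)], template)


class ToyDesigner:
    """Hypothetical model of the Initialize -> Define -> Preview -> Create flow."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.columns: List[Column] = []  # kept in insertion order

    def add_category(self, name: str, values: List[str], weights: List[float]) -> None:
        # Weighted random choice among the listed values
        self.columns.append(
            (name, lambda rec: self.rng.choices(values, weights=weights, k=1)[0])
        )

    def add_llm_text(self, name: str, prompt: str) -> None:
        # Fill the prompt template from the record, then "call" the stub model
        self.columns.append((name, lambda rec: fake_llm(render(prompt, rec))))

    def _generate(self, num_records: int) -> List[Record]:
        records = []
        for _ in range(num_records):
            rec: Record = {}
            for name, make_value in self.columns:
                rec[name] = make_value(rec)  # later columns see earlier values
            records.append(rec)
        return records

    def preview(self, num_records: int = 3) -> List[Record]:
        return self._generate(num_records)

    def create(self, num_records: int) -> List[Record]:
        return self._generate(num_records)
```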

Example: Simple Data Generation

Here's a minimal example to get you started:

from gretel_client.navigator_client import Gretel
from gretel_client.data_designer import columns as C
from gretel_client.data_designer import params as P

# Initialize Gretel client
gretel = Gretel(api_key="YOUR_API_KEY")

# Create a new Data Designer instance
aidd = gretel.data_designer.new(model_suite="apache-2.0")

# Add a category column
aidd.add_column(
    C.SamplerColumn(
        name="product_category",
        type=P.SamplerType.CATEGORY,
        params=P.CategorySamplerParams(
            values=["Electronics", "Clothing", "Home & Kitchen", "Books", "Toys"],
            weights=[0.3, 0.25, 0.2, 0.15, 0.1]  # Optional: control the distribution
        )
    )
)

# Add an LLM-generated column
aidd.add_column(
    C.LLMTextColumn(
        name="product_description",
        system_prompt="You are an expert at writing product descriptions. Your writing style is concise and informative.",
        prompt="Generate a short description for a product in the {{product_category}} category."
    )
)

# Preview the results
preview = aidd.preview()
preview.display_sample_record()
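The weights on the category column determine how often each value should appear. This small helper is hypothetical and independent of the SDK. It checks that each value has a non-negative weight, normalizes the weights into proportions, and converts them into the average number of records you would expect per category for a given job size.

```python
from typing import Dict, Sequence


def expected_counts(values: Sequence[str], weights: Sequence[float],
                    num_records: int) -> Dict[str, float]:
    """Average number of records per category implied by sampling weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    # Normalize so the weights act as proportions, then scale to the job size
    return {v: num_records * w / total for v, w in zip(values, weights)}
```

With the weights from the example and `num_records=100`, this gives roughly 30 Electronics and 10 Toys records on average. Counts from an actual run will vary because categories are sampled at random.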

Submitting a Batch Job

Once the preview looks right, submit a batch job to generate the full dataset:

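# Submit a batch job to generate the full dataset
workflow_run = aidd.create(
    num_records=100,
    name="product_descriptions"
)

# Block until the job has finished
workflow_run.wait_until_done()

# Load the generated records as a DataFrame
df = workflow_run.dataset.df

# Optionally, look up the same run by its ID (for example, from a new
# session) and poll it for status updates
run = gretel.workflows.get_workflow_run(workflow_run_id=workflow_run.id)
run.poll()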
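In the example above, `wait_until_done()` blocks until the job finishes. If you need your own timeout when waiting on a long-running job, the general pattern is a loop that checks the status and sleeps between checks. The sketch below is a generic, SDK-independent illustration. `wait_for` and its default status strings are hypothetical and do not reflect the values the Gretel API reports.

```python
import time
from typing import Callable, Iterable


def wait_for(get_status: Callable[[], str],
             done_states: Iterable[str] = ("completed", "error", "cancelled"),
             interval: float = 5.0,
             timeout: float = 3600.0,
             sleep: Callable[[float], None] = time.sleep) -> str:
    """Call get_status() until it returns a terminal state or the timeout expires.

    The default terminal state names are placeholders, not real API values.
    """
    terminal = set(done_states)
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout} seconds")
        sleep(interval)  # back off before checking again
```

Injecting `sleep` as a parameter keeps the loop testable without real delays.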
