Getting Started with Data Designer
Installation
First, ensure you have the latest version of the Gretel SDK installed:
pip install -U gretel_client
Initializing Data Designer
To create a new Data Designer instance, use the data_designer attribute of your Gretel client:
from gretel_client.navigator_client import Gretel
# Initialize Gretel client
gretel = Gretel(api_key="YOUR_API_KEY")
# Create a new Data Designer instance
aidd = gretel.data_designer.new(model_suite="apache-2.0")
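Hardcoding the key works for a quick test, but you may prefer to read it from the environment. The following is a minimal sketch under that assumption. The helper load_api_key and the variable name GRETEL_API_KEY are chosen here for illustration and do not come from this guide:

```python
import os


def load_api_key(env_var: str = "GRETEL_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise if it is unset or blank.

    Both this helper and the default variable name are illustrative assumptions.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

The returned value can then be passed to the client as Gretel(api_key=load_api_key()).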
Model Suites
When initializing Data Designer, you can specify which model suite to use:
apache-2.0: Uses models with Apache 2.0 licenses (default)
llama-3.x: Uses Llama 3 models (requires appropriate permissions)
Choose a suite based on your compliance and licensing requirements. Sign up for Gretel Enterprise if you want access to more model suites. See the model suites documentation to learn more.
Basic Workflow
The general workflow when using Data Designer includes:
1. Initialize: Create a new Data Designer instance
2. Define Columns: Add columns with various types and parameters
3. Preview: Generate a small dataset for inspection
4. Create Full Dataset: Run a batch job to create a larger dataset
Example: Simple Data Generation
Here's a minimal example to get you started:
from gretel_client.navigator_client import Gretel
from gretel_client.data_designer import columns as C
from gretel_client.data_designer import params as P
# Initialize Gretel client
gretel = Gretel(api_key="YOUR_API_KEY")
# Create a new Data Designer instance
aidd = gretel.data_designer.new(model_suite="apache-2.0")
# Add a category column
aidd.add_column(
    C.SamplerColumn(
        name="product_category",
        type=P.SamplerType.CATEGORY,
        params=P.CategorySamplerParams(
            values=["Electronics", "Clothing", "Home & Kitchen", "Books", "Toys"],
            weights=[0.3, 0.25, 0.2, 0.15, 0.1]  # Optional: control the distribution
        )
    )
)
# Add an LLM-generated column
aidd.add_column(
    C.LLMTextColumn(
        name="product_description",
        system_prompt="You are an expert at writing product descriptions. Your writing style is concise, short, informative.",
        prompt="Generate a detailed description for a product in the {{product_category}} category."
    )
)
# Preview the results
preview = aidd.preview()
preview.display_sample_record()
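To make the two column types concrete, here is a local sketch that does not call Gretel at all. It assumes that the category sampler draws each value with probability proportional to its weight, and that {{product_category}} is replaced with the sampled value for that record before the prompt reaches the model. The helper names sample_category and render_prompt are hypothetical, and the real service may render templates differently:

```python
import random
import re

VALUES = ["Electronics", "Clothing", "Home & Kitchen", "Books", "Toys"]
WEIGHTS = [0.3, 0.25, 0.2, 0.15, 0.1]
PROMPT = "Generate a detailed description for a product in the {{product_category}} category."


def sample_category(rng: random.Random) -> str:
    """Draw one category; higher-weighted values are drawn more often."""
    return rng.choices(VALUES, weights=WEIGHTS, k=1)[0]


def render_prompt(template: str, record: dict) -> str:
    """Replace each {{name}} placeholder with the value of record[name]."""
    def substitute(match: re.Match) -> str:
        return str(record[match.group(1)])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# Build one illustrative record and show the prompt that would result from it.
rng = random.Random(0)
record = {"product_category": sample_category(rng)}
print(render_prompt(PROMPT, record))
```

Under these assumptions, each generated record gets its own category first, and the description prompt is then filled in from that record, which is why the LLM column can refer to the sampler column by name.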
Submitting a batch job
# Generate a full dataset
workflow_run = aidd.create(
    num_records=100,
    name="product_descriptions"
)
workflow_run.wait_until_done()
# Grab your dataset
df = workflow_run.dataset.df
# Poll the job
run = gretel.workflows.get_workflow_run(workflow_run_id=workflow_run.id)
run.poll()
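If you want more control over waiting than a single blocking call gives you, a generic polling loop is one option. The sketch below is not how gretel_client implements wait_until_done() or poll(). It is a plain-Python pattern in which get_status is a hypothetical callable and the status strings are made up, standing in for whatever your run object reports:

```python
import time
from typing import Callable, FrozenSet


def wait_for(
    get_status: Callable[[], str],
    done_states: FrozenSet[str] = frozenset({"completed", "error"}),
    interval: float = 5.0,
    timeout: float = 600.0,
) -> str:
    """Call get_status() every `interval` seconds until it returns a state in
    `done_states`; raise TimeoutError once `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in done_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still '{status}' after {timeout} seconds")
        time.sleep(interval)
```

A timeout keeps a notebook or script from hanging indefinitely if a job stalls.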