If you prefer not to use YAML, you can define your Data Designer workflow with the Gretel SDK. Here is a simple example.
Step 1: Define your Model Suite and System Prompt
from gretel_client.navigator import DataDesigner

session_kwargs = {
    "api_key": "<YOUR_API_KEY>",
    "endpoint": "https://api.gretel.cloud",
}
model_suite = "apache-2.0"
special_system_instructions = """
You are an expert conversation designer and domain specialist. Your job is to
produce realistic user-assistant dialogues for fine-tuning a model. Always ensure:
- Responses are factually correct and contextually appropriate.
- Communication is clear, helpful, and matches the complexity level.
- Avoid disallowed content and toxicity.
- After the two-turn conversation, provide a single toxicity assessment for the user's messages in the entire conversation.
"""
# `data_designer` is the DataDesigner object built from the settings above.
data_designer.add_generated_data_column(
    name="timestamp",
    generation_prompt=(
        # Adjacent string literals are concatenated, so keep the trailing space.
        "Generate a realistic timestamp for an {event_type} event within the last 24 hours. "
        "Format: YYYY-MM-DD HH:MM:SS"
    )
)
data_designer.add_generated_data_column(
    name="event_details",
    generation_prompt=(
        "Create event details for a {event_type} by a {user_type} user on {device_type} with status {action_status}. "
        "Include basic information like device ID and session duration if applicable."
    )
)
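The `{event_type}`-style placeholders in the prompts above refer to other columns by name. As a rough illustration of that idea, the sketch below fills a prompt for one row using plain `str.format`. This is an assumption made for demonstration only: Data Designer's own template rendering may work differently, and the `render_prompt` helper and the seed values in `row` are hypothetical.

```python
# Illustration only: str.format stands in for Data Designer's own
# template rendering, which may behave differently.
timestamp_prompt = (
    "Generate a realistic timestamp for an {event_type} event within the last 24 hours. "
    "Format: YYYY-MM-DD HH:MM:SS"
)

# Hypothetical seed values for a single row; real values come from your seed columns.
row = {
    "event_type": "login",
    "user_type": "premium",
    "device_type": "mobile",
    "action_status": "success",
}


def render_prompt(template: str, row: dict) -> str:
    """Fill each {column} placeholder with that row's value (hypothetical helper)."""
    return template.format(**row)


rendered = render_prompt(timestamp_prompt, row)
print(rendered)
```

Rendering a prompt by hand like this can be a quick way to catch a misspelled column name or a missing space between concatenated string literals before you spend time on a preview.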
Generate Data
Once you have defined a DataDesigner object, you can generate your dataset.
Preview Data
You can generate a quick preview of your dataset, assess the data generated, and adjust your config if needed.
preview = data_designer.generate_dataset_preview()
[17:11:23] [INFO] 🚀 Generating dataset preview
[17:11:24] [INFO] 📥 Step 1: Load data seeds
[17:11:24] [INFO] 🎲 Step 2: Sample data seeds
[17:11:24] [INFO] 🦜 Step 3: Generate column from template >> generating timestamp
[17:11:25] [INFO] 🦜 Step 4: Generate column from template >> generating event details
[17:11:27] [INFO] 👀 Your dataset preview is ready for a peek!
Batch jobs can take a while to complete, depending on how much data you generate. Each batch job creates a Gretel Workflow with an ID, which you can use to fetch your dataset.
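Because a batch job can run for some time, you may want to poll for completion rather than block interactively. The sketch below is a generic polling loop, not part of the Gretel SDK: `wait_for_completion`, the `get_status` callable, and the status strings are all hypothetical placeholders for whatever status lookup you use with your workflow ID.

```python
import time
from typing import Callable, Iterable


def wait_for_completion(
    get_status: Callable[[], str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
    terminal_states: Iterable[str] = ("completed", "error", "cancelled"),
) -> str:
    """Call get_status() until it returns a terminal state or the timeout passes.

    The status names are assumptions; replace them with the values your
    status lookup actually returns.
    """
    terminal = set(terminal_states)
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in state {status!r} after {timeout_seconds}s")
        time.sleep(poll_seconds)


# Stand-in status source for demonstration; a real one would look up the workflow by ID.
fake_statuses = iter(["pending", "active", "completed"])
final_status = wait_for_completion(lambda: next(fake_statuses), poll_seconds=0)
print(final_status)
```

Passing the status lookup in as a callable keeps the loop independent of any particular client, so the same helper can be exercised with a stub as shown here.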