Data Designer SDK
If you prefer not to use YAML, you can define your Data Designer workflow with the Gretel SDK instead. Here is a simple example.
Step 1: Define your Model Suite and System Prompt
from gretel_client.navigator import DataDesigner

# Connection settings for your Gretel session; replace the placeholder with your API key.
session_kwargs = {
    "api_key": "<YOUR_API_KEY>",
    "endpoint": "https://api.gretel.cloud",
}

model_suite = 'apache-2.0'
special_system_instructions = """
You are an expert conversation designer and domain specialist. Your job is to
produce realistic user-assistant dialogues for fine-tuning a model. Always ensure:
- Responses are factually correct and contextually appropriate.
- Communication is clear, helpful, and matches the complexity level.
- Avoid disallowed content and toxicity.
- After the two-turn conversation, provide a single toxicity assessment for the user's messages in the entire conversation.
"""
Step 2: Add your Seed Columns
# Create the DataDesigner object from the settings defined in Step 1.
data_designer = DataDesigner(
    model_suite=model_suite,
    special_system_instructions=special_system_instructions,
    **session_kwargs
)
data_designer.add_categorical_seed_column(
    name="event_type",
    values=["Login", "Logout", "PageView"]
)

data_designer.add_categorical_seed_column(
    name="user_type",
    values=["Anonymous", "Registered"]
)

data_designer.add_categorical_seed_column(
    name="device_type",
    values=["Mobile", "Desktop", "Tablet"]
)

data_designer.add_categorical_seed_column(
    name="action_status",
    values=["Success", "Failure"]
)
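Each categorical seed column contributes one value per record, so the four columns above define 3 × 2 × 3 × 2 = 36 possible seed combinations. The sketch below enumerates them and draws one combination to show what a single record's seeds look like. The uniform random choice is an assumption for illustration; it is not necessarily how Data Designer samples seeds internally.

```python
import itertools
import random

# Same seed values as the add_categorical_seed_column calls above.
SEED_COLUMNS = {
    "event_type": ["Login", "Logout", "PageView"],
    "user_type": ["Anonymous", "Registered"],
    "device_type": ["Mobile", "Desktop", "Tablet"],
    "action_status": ["Success", "Failure"],
}


def all_seed_combinations(columns: dict) -> list:
    """Return every combination of seed values as a list of dicts."""
    names = list(columns)
    return [dict(zip(names, values)) for values in itertools.product(*columns.values())]


def sample_seeds(columns: dict, rng: random.Random) -> dict:
    """Pick one value per column uniformly at random (illustrative assumption)."""
    return {name: rng.choice(values) for name, values in columns.items()}


combos = all_seed_combinations(SEED_COLUMNS)
print(len(combos))  # 36
print(sample_seeds(SEED_COLUMNS, random.Random(0)))
```

Because there are only 36 distinct seed combinations, a batch of 100 records necessarily repeats some of them; variety beyond that has to come from the generated columns.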
Step 3: Add your Data Columns
# Adjacent string literals are joined with no separator,
# so each piece except the last ends with a space.
data_designer.add_generated_data_column(
    name="timestamp",
    generation_prompt=(
        "Generate a realistic timestamp for a {event_type} event within the last 24 hours. "
        "Format: YYYY-MM-DD HH:MM:SS"
    )
)

data_designer.add_generated_data_column(
    name="event_details",
    generation_prompt=(
        "Create event details for a {event_type} by a {user_type} user on {device_type} with status {action_status}. "
        "Include basic information like device ID and session duration if applicable."
    )
)
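The {event_type}-style placeholders in each generation_prompt refer to seed columns by name. A minimal sketch of that substitution using Python's str.format_map on one seed record. It only illustrates the idea and is an assumption about the mechanism, not Data Designer's actual templating code; render_prompt is a hypothetical helper.

```python
# Prompt template based on the "timestamp" column above.
TIMESTAMP_PROMPT = (
    "Generate a realistic timestamp for a {event_type} event within the last 24 hours. "
    "Format: YYYY-MM-DD HH:MM:SS"
)


def render_prompt(template: str, record: dict) -> str:
    """Fill {column_name} placeholders with the values from one record.

    Raises KeyError if the template names a column the record lacks,
    which surfaces typos in column names early.
    """
    return template.format_map(record)


seed_record = {
    "event_type": "PageView",
    "user_type": "Registered",
    "device_type": "Tablet",
    "action_status": "Failure",
}
print(render_prompt(TIMESTAMP_PROMPT, seed_record))
# Generate a realistic timestamp for a PageView event within the last 24 hours. Format: YYYY-MM-DD HH:MM:SS
```

Rendering a prompt this way is also a quick check that the two literal pieces are separated by a space, since Python joins adjacent string literals with nothing in between.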
Generate Data
Once you have defined a DataDesigner object, you can generate your dataset.
Preview Data
Generate a quick preview of your dataset to inspect the generated data and adjust your configuration if needed.
preview = data_designer.generate_dataset_preview()
Example output:
[17:11:23] [INFO] Generating dataset preview
[17:11:24] [INFO] Step 1: Load data seeds
[17:11:24] [INFO] Step 2: Sample data seeds
[17:11:24] [INFO] Step 3: Generate column from template >> generating timestamp
[17:11:25] [INFO] Step 4: Generate column from template >> generating event details
[17:11:27] [INFO] Your dataset preview is ready for a peek!
Display a record
preview.display_sample_record()
Example output:
Categorical Seed Columns
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Name          ┃ Value      ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ event_type    │ PageView   │
├───────────────┼────────────┤
│ user_type     │ Registered │
├───────────────┼────────────┤
│ device_type   │ Tablet     │
├───────────────┼────────────┤
│ action_status │ Failure    │
└───────────────┴────────────┘
Generated Data Columns
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Name          ┃ Value                                              ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ timestamp     │ 2023-11-28 14:35:42                                │
├───────────────┼────────────────────────────────────────────────────┤
│ event_details │ device_id: 1234567890, session_duration: 00:03:45, │
│               │ user_type: Registered, device_type: Tablet,        │
│               │ action_status: Failure, page_viewed: home_page     │
└───────────────┴────────────────────────────────────────────────────┘
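The timestamp prompt asks for the YYYY-MM-DD HH:MM:SS format, but a language model is not guaranteed to follow it, so it can help to spot-check preview values before submitting a batch job. A minimal sketch using datetime.strptime on plain strings, independent of the Gretel SDK. The helper check_timestamp is hypothetical, and the 24-hour bound is checked against a reference time now that you supply yourself.

```python
from datetime import datetime, timedelta

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # matches the YYYY-MM-DD HH:MM:SS instruction


def check_timestamp(value: str, now: datetime, window: timedelta = timedelta(hours=24)) -> list:
    """Return a list of problems with a generated timestamp (empty if none)."""
    try:
        ts = datetime.strptime(value.strip(), TIMESTAMP_FORMAT)
    except ValueError:
        return [f"not in YYYY-MM-DD HH:MM:SS format: {value!r}"]
    problems = []
    # The prompt asks for a time within the window ending at the reference time.
    if not (now - window <= ts <= now):
        problems.append(f"outside the requested time window: {value}")
    return problems


# Value from the sample record above; the reference time is arbitrary, for illustration.
reference = datetime(2023, 11, 29, 9, 0, 0)
print(check_timestamp("2023-11-28 14:35:42", now=reference))  # []
print(check_timestamp("28/11/2023 14:35", now=reference))
```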
Submit a Batch Job
Once you are happy with your configuration, submit a batch job to generate the full dataset. The num_records argument sets how many records to generate.
batch_job = data_designer.submit_batch_workflow(num_records=100)