Data Designer Configuration
Data Designer provides a high-level YAML interface for declaratively defining your dataset.
Note: This is a simple configuration that may not yield high-quality data. It is provided for illustration purposes only.
```yaml
model_suite: apache-2.0

special_system_instructions: >-
  You are an expert at generating consistent event log entries. Your job is to create realistic event data.

categorical_seed_columns:
  - name: event_type
    values: [Login, Logout, PageView]
  - name: user_type
    values: [Anonymous, Registered]
  - name: device_type
    values: [Mobile, Desktop, Tablet]
  - name: action_status
    values: [Success, Failure]

generated_data_columns:
  - name: timestamp
    generation_prompt: >-
      Generate a realistic timestamp for an {event_type} event within the last 24 hours.
      Format: YYYY-MM-DD HH:MM:SS
  - name: event_details
    generation_prompt: >-
      Create event details for a {event_type} by a {user_type} user on {device_type} with status {action_status}.
      Include basic information like device ID and session duration if applicable.
    columns_to_list_in_prompt: all_categorical_seed_columns
```
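To see how the seed columns and prompt templates relate, here is a minimal sketch of the idea: pick one value per categorical seed column, then substitute those values into each `{placeholder}` in a generation prompt. This is not Gretel's implementation. The `CONFIG` dict simply mirrors the YAML above (prompts shortened), the helpers `find_placeholders`, `sample_seeds`, and `render_prompts` are hypothetical, and uniform sampling of seed values is an assumption. The sketch also rejects placeholders that do not name a seed column, which is a cheap way to catch typos before submitting a config.

```python
import random
import string

# Plain-Python mirror of the YAML config above (prompts shortened for brevity).
CONFIG = {
    "categorical_seed_columns": [
        {"name": "event_type", "values": ["Login", "Logout", "PageView"]},
        {"name": "user_type", "values": ["Anonymous", "Registered"]},
        {"name": "device_type", "values": ["Mobile", "Desktop", "Tablet"]},
        {"name": "action_status", "values": ["Success", "Failure"]},
    ],
    "generated_data_columns": [
        {
            "name": "timestamp",
            "generation_prompt": (
                "Generate a realistic timestamp for an {event_type} event "
                "within the last 24 hours. Format: YYYY-MM-DD HH:MM:SS"
            ),
        },
        {
            "name": "event_details",
            "generation_prompt": (
                "Create event details for a {event_type} by a {user_type} user "
                "on {device_type} with status {action_status}."
            ),
        },
    ],
}


def find_placeholders(template: str) -> set[str]:
    """Return the {field} names referenced in a prompt template."""
    return {field for _, field, _, _ in string.Formatter().parse(template) if field}


def sample_seeds(config: dict, rng: random.Random) -> dict[str, str]:
    """Pick one value per categorical seed column (uniform choice is an assumption)."""
    return {
        col["name"]: rng.choice(col["values"])
        for col in config["categorical_seed_columns"]
    }


def render_prompts(config: dict, seeds: dict[str, str]) -> dict[str, str]:
    """Fill every generation prompt with the sampled seed values."""
    rendered = {}
    for col in config["generated_data_columns"]:
        template = col["generation_prompt"]
        # In this sketch, placeholders may only name seed columns.
        missing = find_placeholders(template) - set(seeds)
        if missing:
            raise KeyError(f"{col['name']} references undefined columns: {sorted(missing)}")
        rendered[col["name"]] = template.format(**seeds)
    return rendered


rng = random.Random(0)
seeds = sample_seeds(CONFIG, rng)
for name, prompt in render_prompts(CONFIG, seeds).items():
    print(f"{name}: {prompt}")
```

Each record in the real service presumably gets its own seed draw; the fixed `random.Random(0)` here only makes the sketch reproducible.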
Model Suites: To learn more about model suites, see the Model Suites page of the Gretel documentation.
Load your Config
Once you define the configuration in YAML, you can use the Gretel SDK to load the configuration and then generate data.
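```python
from pathlib import Path
from gretel_client.navigator import DataDesigner

# Hypothetical file name: the YAML config above, saved to disk.
config_string = Path("data_designer_config.yaml").read_text()

session_kwargs = {
    "api_key": "<YOUR_API_KEY>",
    "endpoint": "https://api.gretel.cloud",
}
# Build a DataDesigner object from the YAML config string.
data_designer = DataDesigner.from_config(config_string, **session_kwargs)
```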
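Rather than pasting your API key into source code, you may prefer to read it from an environment variable. The sketch below assumes a variable named `GRETEL_API_KEY`; that name is chosen here for illustration, so check the Gretel SDK documentation for any variable it reads natively. The helper `build_session_kwargs` is hypothetical and returns a dict with the same keys as `session_kwargs` above.

```python
import os


def build_session_kwargs(env_var: str = "GRETEL_API_KEY") -> dict[str, str]:
    """Build session settings for DataDesigner.from_config from the environment.

    GRETEL_API_KEY is an assumed variable name, not one confirmed by this page.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Gretel API key.")
    return {
        "api_key": api_key,
        "endpoint": "https://api.gretel.cloud",
    }
```

With the variable exported in your shell, `DataDesigner.from_config(config_string, **build_session_kwargs())` would replace the hard-coded dictionary. Failing early with a clear message is easier to debug than an authentication error later in the workflow.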
Generate Data
Once you have defined a DataDesigner object, you can generate your dataset.
Preview Data
You can generate a quick preview of your dataset, assess the data generated, and adjust your config if needed.
```python
preview = data_designer.generate_dataset_preview()
```
```text
[17:11:23] [INFO] Generating dataset preview
[17:11:24] [INFO] Step 1: Load data seeds
[17:11:24] [INFO] Step 2: Sample data seeds
[17:11:24] [INFO] Step 3: Generate column from template >> generating timestamp
[17:11:25] [INFO] Step 4: Generate column from template >> generating event details
[17:11:27] [INFO] Your dataset preview is ready for a peek!
```
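One way to assess a preview is to check generated values against what the prompt asked for. The `timestamp` prompt requests the format `YYYY-MM-DD HH:MM:SS` within the last 24 hours, so a check like the sketch below can flag values that miss either requirement. It works on plain strings; extracting the column from the preview object is not covered on this page and is left out. Because the model has no clock, "the last 24 hours" is judged against a reference time you supply; the reference time and the sample values other than `2023-11-28 14:35:42` are made up for illustration, and `check_timestamp` is a hypothetical helper.

```python
from datetime import datetime, timedelta

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # strptime spelling of YYYY-MM-DD HH:MM:SS


def check_timestamp(value: str, reference: datetime) -> list[str]:
    """Return the problems found in one generated timestamp (empty list if it passes)."""
    try:
        ts = datetime.strptime(value.strip(), TIMESTAMP_FORMAT)
    except ValueError:
        return [f"{value!r} does not match YYYY-MM-DD HH:MM:SS"]
    problems = []
    # "Within the last 24 hours" relative to the supplied reference time.
    if not (reference - timedelta(hours=24) <= ts <= reference):
        problems.append(f"{value!r} is not within 24 hours before {reference}")
    return problems


reference = datetime(2023, 11, 29, 9, 0, 0)  # illustrative reference time
for value in ["2023-11-28 14:35:42", "2023-11-27 08:00:00", "11/28/2023 2:35 PM"]:
    print(value, "->", check_timestamp(value, reference) or "ok")
```

If many preview values fail, that is a signal to tighten the `generation_prompt` before submitting a batch job.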
Display a record
```python
preview.display_sample_record()
```
Categorical Seed Columns

| Name | Value |
|---|---|
| event_type | PageView |
| user_type | Registered |
| device_type | Tablet |
| action_status | Failure |

Generated Data Columns

| Name | Value |
|---|---|
| timestamp | 2023-11-28 14:35:42 |
| event_details | device_id: 1234567890, session_duration: 00:03:45, user_type: Registered, device_type: Tablet, action_status: Failure, page_viewed: home_page |
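In this sample, the model echoes several seed values (`user_type`, `device_type`, `action_status`) inside `event_details`. Since the config aims for consistent log entries, a quick check is to parse those `key: value` pairs and compare them with the seed columns. This sketch assumes the comma-separated `key: value` layout seen in this one record; the prompt does not require that layout, so other records may need different parsing. `parse_details` and `seed_mismatches` are hypothetical helpers.

```python
def parse_details(details: str) -> dict[str, str]:
    """Parse 'key: value, key: value' text into a dict.

    Splits pairs on ': ' (colon plus space) so values such as 00:03:45 stay intact.
    Assumes values themselves contain no commas.
    """
    pairs = {}
    for chunk in details.split(","):
        key, sep, value = chunk.strip().partition(": ")
        if sep:  # skip fragments that are not key: value pairs
            pairs[key] = value
    return pairs


def seed_mismatches(details: str, seeds: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {column: (seed_value, value_in_details)} for every disagreement."""
    parsed = parse_details(details)
    return {
        name: (seed, parsed[name])
        for name, seed in seeds.items()
        if name in parsed and parsed[name] != seed
    }


record_seeds = {"event_type": "PageView", "user_type": "Registered",
                "device_type": "Tablet", "action_status": "Failure"}
details = ("device_id: 1234567890, session_duration: 00:03:45, user_type: Registered, "
           "device_type: Tablet, action_status: Failure, page_viewed: home_page")
print(parse_details(details)["session_duration"])  # 00:03:45
print(seed_mismatches(details, record_seeds))       # {}
```

Seed columns that the model did not mention (here `event_type`) are skipped rather than counted as mismatches.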
Submit a Batch Job
Once you are happy with your configuration, you can submit a batch job to generate as many records as you want!
```python
batch_job = data_designer.submit_batch_workflow(num_records=100)
```
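When choosing `num_records`, it helps to know how many distinct seed combinations the config can produce. The four categorical seed columns have 3, 2, 3, and 2 values, so there are 3 × 2 × 3 × 2 = 36 combinations, and a 100-record batch must repeat some of them. The sketch below computes this. The per-combination average assumes each seed value is sampled uniformly and independently, which this page does not state.

```python
from itertools import product
from math import prod

# Seed values copied from categorical_seed_columns in the config above.
SEED_VALUES = {
    "event_type": ["Login", "Logout", "PageView"],
    "user_type": ["Anonymous", "Registered"],
    "device_type": ["Mobile", "Desktop", "Tablet"],
    "action_status": ["Success", "Failure"],
}

# Count combinations directly and by enumeration; both should agree.
num_combinations = prod(len(values) for values in SEED_VALUES.values())
combinations = list(product(*SEED_VALUES.values()))
assert len(combinations) == num_combinations

num_records = 100
# Expected records per combination, assuming uniform, independent sampling.
expected_per_combination = num_records / num_combinations

print(num_combinations)                    # 36
print(round(expected_per_combination, 2))  # 2.78
```

Adding a value to any seed column multiplies the combination count, so larger batches pair naturally with richer seed lists.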