Generating Data
Data Generation in Data Designer
Bringing Your Data Design to Life
Once you've set up your Data Designer with appropriate seeds and column definitions, you're ready for the exciting part: generating data! This guide explains how to preview your design, create full datasets, and access your generated data.
The Data Generation Process
Data Designer follows this straightforward workflow when generating data:
Design Phase: Define your data schema by adding columns and establishing their relationships
Preview Phase: Generate a small sample for validation
Iteration Phase: Refine your design based on preview results
Batch Generation: Scale up to create large datasets
Understanding the Generation Workflow
1. Design Phase
During this first phase, you define what data you want to generate by adding columns, setting up relationships, and establishing constraints.
Key activities:
Adding columns of various types (sampling-based, LLM-based)
Setting up person samplers
Defining constraints between columns
Creating templates that reference other columns
Data Designer automatically analyzes your column definitions to determine the correct generation order based on how columns reference each other.
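Dependency-based ordering of this kind can be sketched as a topological sort. The example below is only an illustration and not Data Designer's implementation: the column names, the `{{ column }}` reference syntax, and the `generation_order` helper are all assumptions made for the sketch.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical column definitions: name -> template text.
# None marks a sampled column that references nothing.
columns = {
    "review": "Write a review of {{ product }} by {{ customer_name }}.",
    "product": None,
    "customer_name": None,
    "summary": "Summarize this review: {{ review }}",
}

# Assumed reference syntax: {{ column_name }}
REFERENCE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def generation_order(columns):
    """Return column names ordered so each column comes after the ones it references."""
    graph = {}
    for name, template in columns.items():
        refs = set(REFERENCE.findall(template or ""))
        unknown = refs - set(columns)
        if unknown:
            raise ValueError(f"{name!r} references undefined columns: {sorted(unknown)}")
        graph[name] = refs  # node -> set of predecessors
    # static_order() raises graphlib.CycleError if columns reference each other in a loop
    return list(TopologicalSorter(graph).static_order())


order = generation_order(columns)
print(order)
```

In this sketch, `product` and `customer_name` always appear before `review`, and `review` before `summary`; the relative order of independent columns is not significant.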
2. Preview Phase
The preview phase generates a small dataset (typically 10 records) to help you validate your design:
This quick process lets you see your design in action without waiting for a full dataset to be generated.
Inspecting Preview Results
Data Designer provides several ways to examine your preview results:
These inspection methods help you assess whether your design is producing the expected data. You'll often go through multiple design-preview-iterate cycles before you're ready to generate a full dataset.
3. Iteration Phase
Based on preview results, you can refine your design by modifying columns, adjusting parameters, or changing templates:
This iterative cycle helps you optimize your design before generating a full dataset.
4. Batch Generation
Once your design meets your requirements, you can scale up to create a full dataset:
Parameters for Batch Generation
num_records: The number of records to generate
workflow_run_name: A descriptive name for your job (helps with identification later)
wait_for_completion:
  True: The function will block until the job completes
  False: The function will return immediately, and you can check status later
Checking Job Status
If you didn't wait for completion, you can check the status later:
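A common pattern for checking later is a polling loop with a timeout. The sketch below is library-agnostic and makes several assumptions: `wait_until_done` is a hypothetical helper, the status strings are invented for illustration, and `get_status` stands in for whatever call your client exposes to report a job's state.

```python
import time


def wait_until_done(get_status, poll_interval=5.0, timeout=3600.0, sleep=time.sleep):
    """Poll get_status() until it reports a terminal state or the timeout elapses."""
    terminal = {"completed", "error", "cancelled"}  # assumed status names
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        sleep(poll_interval)


# Stand-in for a real status call: reports "active" twice, then "completed".
_fake_statuses = iter(["active", "active", "completed"])
final_status = wait_until_done(lambda: next(_fake_statuses), poll_interval=0.01)
print(final_status)  # completed
```

Bounding the wait with a timeout keeps a stalled job from blocking a script indefinitely.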
After successful generation, you can access your data as follows:
If you didn't wait for completion or need to reconnect to a previous job:
Saving your Data Designer Object
You can save your Data Designer object as a configuration by running the following code:
You can create a new Data Designer object from an existing config as follows:
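Independently of the library's own calls, the round trip behind saving and reloading a design can be illustrated with plain data. This is a minimal sketch under stated assumptions: the `design` dictionary, its field names, the JSON file format, and the `save_design` and `load_design` helpers are all hypothetical and do not reflect Data Designer's actual configuration schema or API.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical design expressed as plain data (not Data Designer's real schema).
design = {
    "columns": [
        {"name": "product", "type": "category", "values": ["laptop", "phone"]},
        {"name": "review", "type": "llm-text", "prompt": "Review the {{ product }}."},
    ]
}


def save_design(design, path):
    """Write the design to disk as human-readable JSON."""
    Path(path).write_text(json.dumps(design, indent=2), encoding="utf-8")


def load_design(path):
    """Read a previously saved design back into a dictionary."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "design.json"
    save_design(design, config_path)
    restored = load_design(config_path)

print(restored == design)  # True
```

Keeping a design as a file makes it easy to version alongside code and to recreate the same design in a later session.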
Best Practices for Data Generation
Always Preview First: Validate your design with a preview before generating a full dataset.
Start Small: Begin with a small number of records to test your design before scaling up.
Name Jobs Clearly: Use descriptive workflow run names to help identify your jobs later.
Monitor Performance: For large datasets, monitor the job status and resources.
Process in Batches: For very large datasets, consider generating and processing in smaller batches.
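The batching advice above can be planned with a small helper that splits a target record count into per-job sizes. This is a sketch only: `batch_sizes` is a hypothetical helper, and the totals are example values.

```python
def batch_sizes(total_records, batch_size):
    """Yield the record count for each batch so the counts sum to total_records."""
    if total_records < 0 or batch_size <= 0:
        raise ValueError("total_records must be >= 0 and batch_size must be > 0")
    remaining = total_records
    while remaining > 0:
        size = min(batch_size, remaining)
        yield size
        remaining -= size


sizes = list(batch_sizes(2500, 1000))
print(sizes)  # [1000, 1000, 500]
```

Each size could then be passed as `num_records` to a separate generation job, with a distinct `workflow_run_name` per batch so the jobs are easy to tell apart later.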
Note: If you're looking for a more automated approach to creating data designs with less configuration, check out the Magic SDK documentation.