Generate Realistic Personal Details
Person Objects in Data Designer
Generate Realistic Person Data
Data Designer can generate realistic person data: synthetic individuals with complete demographic profiles, including names, contact information, addresses, and more. These synthetic personas suit a wide range of uses, from testing user databases to building realistic sample data for applications.
Creating Person Samplers
Person samplers generate realistic person entities with various attributes. You create them with the with_person_samplers method.
Each sampler creates a different person object that you can reference throughout your data design.
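As a minimal sketch, assuming with_person_samplers takes a mapping from a sampler name to that sampler's options (the argument shape and the `designer` object name are assumptions, not confirmed by this page), a set of sampler definitions could be laid out like this:

```python
# Hypothetical sampler definitions: each key is the name used to reference
# the person object later; each value holds that sampler's options.
person_samplers = {
    "customer": {"locale": "en_US", "sex": "Female"},
    "support_agent": {"locale": "en_US"},
}

# Assumed call shape, not verified against the library:
#   designer.with_person_samplers(person_samplers)
for name, options in person_samplers.items():
    print(f"{name}: {options}")
```

Keeping the definitions in a plain dictionary makes it easy to review which personas exist before handing them to Data Designer.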
Configuration Options
Person samplers accept these configuration parameters:
sex: Specify "Male" or "Female" (optional)
locale: Language and region code (optional, e.g., "en_US", "fr_FR", "de_DE")
city: City within the specified locale (optional)
age_range: Age range for filtering (default: ages above 18 only)
state: US state code, only valid when locale is set to "en_US" (optional)
keep_persons_columns: When set to False, all person columns will be dropped from the final dataset (default: False)
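Two of the rules above can be checked before any data is generated. The helper below is a hypothetical pre-flight check written for illustration; it is not part of Data Designer, and it assumes that a missing locale should be treated as "not en_US" because this page does not state a default locale.

```python
from typing import Any

ALLOWED_SEX = {"Male", "Female"}


def validate_sampler_options(options: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one sampler's options (empty if none)."""
    problems = []
    if "sex" in options and options["sex"] not in ALLOWED_SEX:
        problems.append('sex must be "Male" or "Female"')
    # state is documented as valid only when locale is set to "en_US".
    # Assumption: a missing locale counts as "not en_US".
    if "state" in options and options.get("locale") != "en_US":
        problems.append('state requires locale="en_US"')
    return problems


print(validate_sampler_options({"locale": "fr_FR", "state": "CA"}))
print(validate_sampler_options({"locale": "en_US", "state": "CA", "sex": "Male"}))
```

The first call reports the state/locale mismatch; the second returns an empty list.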
Locale Support and Data Quality
Important Quality Difference Between Locales:
US Locale (en_US): For locale="en_US", Data Designer uses Gretel's proprietary Probabilistic Generative Model (PGM) trained on US census demographic data. This provides extremely high-quality, realistic, and demographically accurate person data. The relationships between attributes (e.g., age, occupation, education level) are preserved, resulting in coherent and plausible person profiles.
Other Locales: For non-US locales, Data Designer uses the Faker library as a fallback. While Faker provides decent data for basic attributes like names and addresses, it doesn't maintain the same level of demographic accuracy or attribute relationships as the PGM. The data quality is notably lower than for US-based personas.
If demographic accuracy and realism are important for your use case, consider using the en_US locale whenever possible.
Examples
US-Based Realistic Personas
This will generate high-quality, demographically accurate US-based person data using the PGM.
International Personas (Faker-based)
These will use Faker to generate person data for the respective locales.
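The two example groups differ only in the locale each sampler requests. A minimal sketch of that split, using a hypothetical helper and made-up sampler names (neither comes from Data Designer), labels each sampler with the generator this section says would serve it:

```python
def generation_backend(locale: str) -> str:
    """Name the generator this section describes for a given locale."""
    return "PGM" if locale == "en_US" else "Faker"


# Hypothetical sampler definitions mixing US and international locales.
samplers = {
    "us_customer": {"locale": "en_US", "state": "CA"},
    "french_customer": {"locale": "fr_FR"},
    "german_customer": {"locale": "de_DE"},
}

backends = {
    name: generation_backend(options["locale"])
    for name, options in samplers.items()
}
print(backends)
```

A table like this is a quick way to spot which personas in a design will carry the lower-fidelity Faker attributes.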
Accessing Person Attributes
Person objects have many attributes you can reference in your data generation:
first_name (str): Person's first name
middle_name (str | None): Person's middle name
last_name (str): Person's last name
sex (Sex): Person's sex (enum type)
age (int): Person's age
zipcode (str): Zipcode/Postal Code
street_number (int | str): Street number (can be numeric or alphanumeric)
street_name (str): Name of the street
unit (str): Unit/apartment number (US locale only)
city (str): City name
state (str | None): State (US locale only)
county (str | None): County (US locale only)
country (str): Country name
ethnic_background (str | None): Ethnic background (US locale only)
marital_status (str | None): Marital status
education_level (str | None): Education level
bachelors_field (str | None): Field of bachelor's degree
occupation (str | None): Occupation
uuid (str | None): Unique identifier
locale (str): Locale setting
phone_number (str | None): Generated phone number based on location (None for age < 18)
email_address (str | None): Generated email address (None for age < 18)
birth_date (date): Calculated birth date based on age
ssn (str | None): SSN (US locale only)
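To make the types concrete, the sketch below models a small subset of these attributes as a Python dataclass and checks the documented rule that phone_number and email_address are None for persons under 18. The class and helper are illustrative stand-ins with made-up values, not the library's own person type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersonSketch:
    """Illustrative subset of the listed attributes; not the library's class."""
    first_name: str
    last_name: str
    age: int
    locale: str
    middle_name: Optional[str] = None
    state: Optional[str] = None          # US locale only
    phone_number: Optional[str] = None   # None for age < 18
    email_address: Optional[str] = None  # None for age < 18


def contact_fields_consistent(person: PersonSketch) -> bool:
    """Check the documented rule that minors have no phone number or email."""
    if person.age < 18:
        return person.phone_number is None and person.email_address is None
    return True


# Made-up example person.
minor = PersonSketch(first_name="Ana", last_name="Lopez", age=16, locale="en_US")
print(contact_fields_consistent(minor))
```

Downstream code that reads the optional attributes should handle None in the same way.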
Using Person Data in Columns
There are two main ways to use person data in your dataset:
1. Creating Columns from Person Attributes
Extract specific attributes from a person into separate columns:
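The idea can be illustrated in plain Python, independent of Data Designer's own column API (which this sketch does not attempt to reproduce). The person objects, values, and column names below are made up:

```python
from types import SimpleNamespace

# Made-up stand-ins for sampled person objects.
customers = [
    SimpleNamespace(first_name="Ana", last_name="Lopez", age=34, city="Austin"),
    SimpleNamespace(first_name="Ben", last_name="Kim", age=52, city="Denver"),
]

# Map each output column name to the person attribute it is taken from.
column_sources = {
    "customer_first_name": "first_name",
    "customer_age": "age",
    "customer_city": "city",
}

# Build one row per person, with one column per selected attribute.
rows = [
    {column: getattr(person, attr) for column, attr in column_sources.items()}
    for person in customers
]
print(rows[0])
```

Each selected attribute becomes its own column, while attributes left out of the mapping (here last_name) stay on the person object only.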
2. Referencing Person Attributes in Prompts
Use person attributes in prompt templates for LLM-generated columns:
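As a rough illustration of the pattern, the sketch below fills a prompt with person attributes using Python's built-in str.format. Data Designer's actual template syntax may differ; the person object, its values, and the prompt wording are made up.

```python
from types import SimpleNamespace

# Made-up stand-in for a sampled person object named "customer".
customer = SimpleNamespace(first_name="Ana", age=34, city="Austin", occupation="nurse")

# Placeholders use dotted attribute access on the named person object.
prompt_template = (
    "Write a short product review from the perspective of {customer.first_name}, "
    "a {customer.age}-year-old {customer.occupation} living in {customer.city}."
)

prompt = prompt_template.format(customer=customer)
print(prompt)
```

Referencing attributes by name keeps each generated prompt tied to one coherent persona rather than to unrelated random values.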
Complete Example
Here's a full example showing person sampler usage with locale differences highlighted:
Best Practices for Person Samplers
Use en_US for Maximum Quality: When demographic accuracy is important, prefer the US locale to leverage the high-quality PGM.
Create Multiple Personas: Generate different personas for different roles in your data scenarios (e.g., customers, employees, support agents).
Use Filters: Narrow person objects with the sex, city, state, and age_range parameters.
Test Different Locales: If you need international data, test the Faker-generated attributes to ensure they meet your quality requirements.