Generate Realistic Personal Details
Person Objects in Data Designer
Generate Realistic Person Data
Data Designer can generate realistic person data: synthetic individuals with complete demographic profiles, including names, contact information, addresses, and more. These synthetic personas suit a wide range of uses, from testing user databases to building realistic sample data for applications.
Creating Person Samplers
Person samplers generate realistic person entities with various attributes. You can create them using the with_person_samplers method:
aidd.with_person_samplers(
    {
        "customer": {"sex": "Female", "locale": "en_US"},
        "employee": {"sex": "Female", "locale": "en_GB"},
        "random_person": {},  # Default settings
    },
    keep_person_columns=True,  # False by default
)
Each sampler creates a different person object that you can reference throughout your data design.
Configuration Options
Person samplers accept these configuration parameters:
sex: "Male" or "Female" (optional)
locale: Language and region code (optional, e.g., "en_US", "fr_FR", "de_DE")
city: City within the specified locale (optional)
age_range: Age range for filtering (default: ages above 18 only)
state: US state code, only valid when locale is set to "en_US" (optional)
keep_person_columns (argument to with_person_samplers, default: False): When set to False, all person columns will be dropped from the final dataset.
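The rules above can be checked locally before a configuration is submitted. The following is a minimal sketch, not part of the Data Designer API: `check_sampler_config`, `ALLOWED_KEYS`, and `ALLOWED_SEX` are hypothetical names, and the sketch assumes that a missing locale should be treated as "not en_US", which may be stricter than the service itself.

```python
from __future__ import annotations

# Hypothetical helper: these names are illustrative, not Data Designer API.
ALLOWED_KEYS = {"sex", "locale", "city", "age_range", "state"}
ALLOWED_SEX = {"Male", "Female"}


def check_sampler_config(config: dict) -> list[str]:
    """Return a list of problems found in one person-sampler config."""
    problems = []
    unknown = sorted(set(config) - ALLOWED_KEYS)
    if unknown:
        problems.append(f"unknown keys: {unknown}")
    if "sex" in config and config["sex"] not in ALLOWED_SEX:
        problems.append('sex must be "Male" or "Female"')
    # Assumption: a config without an explicit locale is treated as non-US here.
    if "state" in config and config.get("locale") != "en_US":
        problems.append('state is only valid when locale is "en_US"')
    return problems


samplers = {
    "customer": {"sex": "Female", "locale": "en_US", "state": "CA"},
    "employee": {"locale": "en_GB", "state": "CA"},
}
for name, config in samplers.items():
    print(name, check_sampler_config(config))
```

An empty list means the config passed these checks; it does not guarantee that the service will accept it.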
Locale Support and Data Quality
Important Quality Difference Between Locales:
US Locale (en_US): For locale="en_US", Data Designer uses Gretel's proprietary Probabilistic Generative Model (PGM) trained on US census demographic data. This provides extremely high-quality, realistic, and demographically accurate person data. The relationships between attributes (e.g., age, occupation, education level) are preserved, resulting in coherent and plausible person profiles.
Other Locales: For non-US locales, Data Designer uses the Faker library as a fallback. While Faker provides decent data for basic attributes like names and addresses, it doesn't maintain the same level of demographic accuracy or attribute relationships as the PGM. The data quality is notably lower than for US-based personas.
If demographic accuracy and realism are important for your use case, consider using the en_US locale whenever possible.
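When a design mixes several locales, it can help to label which generator each sampler will rely on. This is a small sketch of the rule described above; `person_data_source` is a hypothetical helper, and it only covers samplers that set a locale explicitly.

```python
def person_data_source(locale: str) -> str:
    """Name the generator described above for a given locale."""
    return "PGM" if locale == "en_US" else "Faker"


samplers = {
    "us_customer": {"locale": "en_US"},
    "french_customer": {"locale": "fr_FR"},
    "german_customer": {"locale": "de_DE"},
}

# Map each sampler name to the generator its locale implies.
sources = {name: person_data_source(cfg["locale"]) for name, cfg in samplers.items()}
print(sources)
```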
Examples
US-Based Realistic Personas
aidd.with_person_samplers({
    "us_customer": {"locale": "en_US", "sex": "Female"}
})
This will generate high-quality, demographically accurate US-based person data using the PGM.
International Personas (Faker-based)
aidd.with_person_samplers({
    "french_customer": {"locale": "fr_FR"},
    "german_customer": {"locale": "de_DE"},
    "spanish_customer": {"locale": "es_ES"}
})
These will use Faker to generate person data for the respective locales.
Accessing Person Attributes
Person objects have many attributes you can reference in your data generation:
Attribute          Type        Description
first_name         str         Person's first name
middle_name        str | None  Person's middle name
last_name          str         Person's last name
sex                Sex         Person's sex (enum type)
age                int         Person's age
zipcode            str         Zipcode/postal code
street_number      int | str   Street number (can be numeric or alphanumeric)
street_name        str         Name of the street
unit               str         Unit/apartment number (US locale only)
city               str         City name
state              str | None  State (US locale only)
county             str | None  County (US locale only)
country            str         Country name
ethnic_background  str | None  Ethnic background (US locale only)
marital_status     str | None  Marital status
education_level    str | None  Education level
bachelors_field    str | None  Field of bachelor's degree
occupation         str | None  Occupation
uuid               str | None  Unique identifier
locale             str         Locale setting
phone_number       str | None  Generated phone number based on location (None for age < 18)
email_address      str | None  Generated email address (None for age < 18)
birth_date         date        Calculated birth date based on age
ssn                str | None  SSN (US locale only)
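Several of the attributes above are typed `str | None`, so downstream code should tolerate missing values (for example, contact details for anyone under 18). The sketch below illustrates that guard with a hypothetical `PersonSketch` dataclass holding a small subset of the attributes; it is not the actual person class used by Data Designer.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class PersonSketch:
    """Illustrative subset of the attributes listed above (hypothetical class)."""

    first_name: str
    last_name: str
    age: int
    phone_number: Optional[str] = None
    email_address: Optional[str] = None
    state: Optional[str] = None


def contact_summary(person: PersonSketch) -> str:
    """Build a contact line, skipping attributes that are None."""
    name = f"{person.first_name} {person.last_name}"
    details = [d for d in (person.email_address, person.phone_number) if d is not None]
    if not details:
        return f"{name} (no contact details)"
    return f"{name} ({', '.join(details)})"


adult = PersonSketch("Ada", "Lovelace", 36, "555-0100", "ada@example.com")
minor = PersonSketch("Sam", "Smith", 15)
print(contact_summary(adult))
print(contact_summary(minor))
```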
Using Person Data in Columns
There are two main ways to use person data in your dataset:
1. Creating Columns from Person Attributes
Extract specific attributes from a person into separate columns:
aidd.add_column(
    name="first_name",
    type="expression",
    expr="{{customer.first_name}}"
)
aidd.add_column(
    name="last_name",
    type="expression",
    expr="{{customer.last_name}}"
)
aidd.add_column(
    name="email",
    type="expression",
    expr="{{customer.email_address}}"
)
2. Referencing Person Attributes in Prompts
Use person attributes in prompt templates for LLM-generated columns:
aidd.add_column(
    name="customer_profile",
    prompt="""
Create a customer profile summary for:
Name: {{customer.first_name}} {{customer.last_name}}
Age: {{customer.age}}
Occupation: {{customer.occupation}}
Education: {{customer.education_level}}
The summary should be professional and highlight their background and potential needs.
"""
)
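Both approaches rely on double-brace placeholders such as `{{customer.first_name}}` being replaced with attribute values. To make that substitution concrete, here is a deliberately simplified stand-in written with the standard library only. It is an assumption-laden sketch: `render` is a hypothetical function, and the real templating used by Data Designer may support more syntax and behave differently.

```python
import re
from types import SimpleNamespace

# Matches {{name}} or {{name.attr.attr}} with optional inner whitespace.
_PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)\s*\}\}")


def render(template: str, context: dict) -> str:
    """Replace each double-brace placeholder with the value it names."""

    def resolve(match: re.Match) -> str:
        parts = match.group(1).split(".")
        value = context[parts[0]]          # e.g. the "customer" person object
        for attr in parts[1:]:
            value = getattr(value, attr)   # e.g. .first_name
        return str(value)

    return _PLACEHOLDER.sub(resolve, template)


# Stand-in for a sampled person; the values are made up for illustration.
customer = SimpleNamespace(first_name="Ada", last_name="Lovelace", age=36)
print(render("{{customer.first_name}} {{customer.last_name}}", {"customer": customer}))
# prints "Ada Lovelace"
```

In this sketch, text that is not wrapped in double braces is left untouched.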
Complete Example
Here's a full example showing person sampler usage with locale differences highlighted:
from gretel_client.navigator_client import Gretel

# Initialize Gretel client
gretel = Gretel(api_key="YOUR_API_KEY")

# Create a new Data Designer instance
aidd = gretel.data_designer.new(model_suite="apache-2.0")

# Create person samplers - note the different locales
aidd.with_person_samplers({
    "us_customer": {"sex": "Female", "locale": "en_US"},  # Uses PGM for high-quality data
    "intl_customer": {"sex": "Male", "locale": "fr_FR"}  # Uses Faker as fallback
})
# Add a customer ID column
aidd.add_column(
    name="customer_id",
    type="uuid",
    params={"prefix": "CUST-"}
)

# US customer (PGM-based)
aidd.add_column(
    name="us_customer_name",
    type="expression",
    expr="{{us_customer.first_name}} {{us_customer.last_name}}"
)
aidd.add_column(
    name="us_customer_email",
    type="expression",
    expr="{{us_customer.email_address}}"
)
aidd.add_column(
    name="us_customer_location",
    type="expression",
    expr="{{us_customer.city}}, {{us_customer.state}}"
)
aidd.add_column(
    name="us_customer_demographics",
    type="expression",
    expr="{{us_customer.education_level}}/{{us_customer.occupation}}"
)
# International customer (Faker-based)
aidd.add_column(
    name="intl_customer_name",
    type="expression",
    expr="{{intl_customer.first_name}} {{intl_customer.last_name}}"
)
aidd.add_column(
    name="intl_customer_location",
    type="expression",
    expr="{{intl_customer.city}}, {{intl_customer.country}}"
)

# Add a support scenario category
aidd.add_column(
    name="support_scenario",
    type="category",
    params={
        "values": ["Account Access", "Billing Issue", "Technical Problem", "Feature Request"]
    }
)
# Generate a comparative customer support interaction
aidd.add_column(
    name="support_conversation",
    prompt="""
Generate a support conversation snippet between two customers and a support agent.
US Customer: {{us_customer.first_name}} {{us_customer.last_name}}
US Customer Location: {{us_customer.city}}, {{us_customer.state}}
US Customer Demographics: {{us_customer_demographics}}
International Customer: {{intl_customer.first_name}} {{intl_customer.last_name}}
International Customer Location: {{intl_customer.city}}, {{intl_customer.country}}
Support Scenario: {{support_scenario}}
Write a realistic support conversation where both customers experience the same {{support_scenario}}
but have slightly different needs based on their backgrounds and locations.
"""
)

# Preview the results
preview = aidd.preview()
preview.display_sample_record()
Best Practices for Person Samplers
Use en_US for Maximum Quality: When demographic accuracy is important, prefer the US locale to leverage the high-quality PGM.
Create Multiple Personas: Generate different personas for different roles in your data scenarios (e.g., customers, employees, support agents).
Use Filters: Filter person objects based on sex, location, and age.
Test Different Locales: If you need international data, test the Faker-generated attributes to ensure they meet your quality requirements.