Column Types
Data Designer supports various column types that determine how data is generated. This guide explains the different column types available and how to use them.
Two Ways to Define Columns
Data Designer offers two approaches to define columns:
Simplified API: Direct parameter passing with string type names
Typed API: More verbose but provides better type checking and IDE support
Both approaches offer the same functionality - choose the style that works best for your needs.
Simplified API Example
The simplified approach is concise and easy to use:
# Simplified API approach
aidd.add_column(
name="product_category",
type="category",
params={"values": ["Electronics", "Clothing", "Home Goods"]}
)
Typed API Example
The typed API provides better code completion and type checking:
from gretel_client.data_designer import columns as C
from gretel_client.data_designer import params as P
# Typed API approach
aidd.add_column(
C.SamplerColumn(
name="product_category",
type=P.SamplerType.CATEGORY,
params=P.CategorySamplerParams(values=["Electronics", "Clothing", "Home Goods"])
)
)
When to Use Each Approach
Choose the Simplified API when:
You prefer concise, readable code
You're working on quick prototypes or simple designs
You don't need IDE autocompletion for parameters
Choose the Typed API when:
You want code completion and type checking in your IDE
You're working on complex designs where type safety helps prevent errors
You need clarity about available parameters and their types
You're collaborating with a team and want more self-documenting code
Both approaches use the same underlying implementation, so you can mix and match them as needed.
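The trade-off above can be illustrated with plain Python, independent of Data Designer. This is a minimal sketch: `CategoryParamsSketch` is a hypothetical stand-in, not a class from the library, and the real simplified API may well validate dictionary parameters too.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CategoryParamsSketch:
    """Hypothetical stand-in for a typed params class (not the library's own)."""

    values: List[str]
    weights: Optional[List[float]] = None

    def __post_init__(self) -> None:
        # Validate eagerly so mistakes surface where the column is defined.
        if self.weights is not None and len(self.weights) != len(self.values):
            raise ValueError("weights must have one entry per value")


# Dict style: a misspelled key is just another key until something reads it.
loose = {"values": ["Electronics", "Clothing"], "wieghts": [0.7, 0.3]}

# Typed style: the same typo is rejected by the constructor with a TypeError.
try:
    CategoryParamsSketch(values=["Electronics", "Clothing"], wieghts=[0.7, 0.3])
    typo_caught = False
except TypeError:
    typo_caught = True
```

An IDE or static type checker can flag the misspelled keyword before the code runs, which is the main practical benefit of the typed style.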
Column Type Categories
Data Designer columns fall into these main categories:
Sampling-based columns: Generate data through statistical sampling methods
Expression columns: Generate data by evaluating expressions
LLM-based columns: Generate data using large language models
Sampling-Based Column Types
Category
Creates categorical values from a defined set of options.
Simplified API:
aidd.add_column(
name="product_category",
type="category",
params={
"values": ["Electronics", "Clothing", "Home Goods", "Books"],
"weights": [0.4, 0.3, 0.2, 0.1], # Optional: probability weights
"description": "Product category classification" # Optional
}
)
Typed API:
aidd.add_column(
C.SamplerColumn(
name="product_category",
type=P.SamplerType.CATEGORY,
params=P.CategorySamplerParams(
values=["Electronics", "Clothing", "Home Goods", "Books"],
weights=[0.4, 0.3, 0.2, 0.1], # Optional: probability weights
description="Product category classification" # Optional
)
)
)
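To make the effect of `weights` concrete, here is a minimal sketch of weighted categorical sampling using only the standard library. The helper `sample_category` is hypothetical and is not how Data Designer implements the sampler; it only shows what the parameters mean.

```python
import random
from collections import Counter


def sample_category(values, weights=None, rng=random):
    """Draw one value; with weights=None every value is equally likely."""
    return rng.choices(values, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
values = ["Electronics", "Clothing", "Home Goods", "Books"]
weights = [0.4, 0.3, 0.2, 0.1]

draws = [sample_category(values, weights, rng) for _ in range(10_000)]
counts = Counter(draws)

# Observed share of each category; these should sit close to the weights.
shares = {value: counts[value] / len(draws) for value in values}
```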
Subcategory
Creates values associated with a parent category.
Simplified API:
aidd.add_column(
name="product_subcategory",
type="subcategory",
params={
"category": "product_category", # Parent category column
"values": {
"Electronics": ["Smartphones", "Laptops", "Headphones"],
"Clothing": ["Shirts", "Pants", "Dresses", "Shoes"],
"Home Goods": ["Kitchen", "Bathroom", "Bedroom"]
}
}
)
Typed API:
aidd.add_column(
C.SamplerColumn(
name="product_subcategory",
type=P.SamplerType.SUBCATEGORY,
params=P.SubcategorySamplerParams(
category="product_category", # Parent category column
values={
"Electronics": ["Smartphones", "Laptops", "Headphones"],
"Clothing": ["Shirts", "Pants", "Dresses", "Shoes"],
"Home Goods": ["Kitchen", "Bathroom", "Bedroom"]
}
)
)
)
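The dependency on the parent column can be sketched as a lookup followed by a draw. This is an illustrative stand-in using the standard library; `sample_subcategory` is a hypothetical helper, and uniform choice among the listed subcategories is an assumption.

```python
import random

SUBCATEGORIES = {
    "Electronics": ["Smartphones", "Laptops", "Headphones"],
    "Clothing": ["Shirts", "Pants", "Dresses", "Shoes"],
    "Home Goods": ["Kitchen", "Bathroom", "Bedroom"],
}


def sample_subcategory(row, parent_column, mapping, rng):
    """Pick a subcategory from the list that belongs to the row's parent value."""
    parent_value = row[parent_column]
    return rng.choice(mapping[parent_value])  # assumption: uniform choice


rng = random.Random(0)
row = {"product_category": "Clothing"}
row["product_subcategory"] = sample_subcategory(
    row, "product_category", SUBCATEGORIES, rng
)
```

Because the subcategory reads the parent value from the same row, the parent column has to be generated first.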
UUID
Generates unique identifiers.
Simplified API:
aidd.add_column(
name="order_id",
type="uuid",
params={
"prefix": "ORD-", # Optional: adds a prefix
"short_form": True, # Optional: uses a shorter format
"uppercase": True # Optional: uses uppercase letters
}
)
Typed API:
aidd.add_column(
C.SamplerColumn(
name="order_id",
type=P.SamplerType.UUID,
params=P.UUIDSamplerParams(
prefix="ORD-", # Optional: adds a prefix
short_form=True, # Optional: uses a shorter format
uppercase=True # Optional: uses uppercase letters
)
)
)
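The three options can be pictured with the standard `uuid` module. This is a sketch under assumptions: `make_id` is a hypothetical helper, and treating "short form" as the first 8 hexadecimal characters is a guess, not documented behavior.

```python
import uuid


def make_id(prefix="", short_form=False, uppercase=False):
    """Build an identifier string from a random UUID."""
    token = uuid.uuid4().hex  # 32 hex characters, no dashes
    if short_form:
        token = token[:8]  # assumption: "short" means the first 8 characters
    if uppercase:
        token = token.upper()
    return f"{prefix}{token}"


order_id = make_id(prefix="ORD-", short_form=True, uppercase=True)
```

Note that shortening a UUID raises the chance of collisions, so a short form is best suited to small datasets or display purposes.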
Numerical Samplers
Uniform Distribution
Simplified API:
aidd.add_column(
name="product_rating",
type="uniform",
params={"low": 1, "high": 5},
convert_to="int" # Optional: converts to integer
)
Typed API:
aidd.add_column(
C.SamplerColumn(
name="product_rating",
type=P.SamplerType.UNIFORM,
params=P.UniformSamplerParams(low=1, high=5),
convert_to="int" # Optional: converts to integer
)
)
Gaussian Distribution
Simplified API:
aidd.add_column(
name="item_weight",
type="gaussian",
params={"mean": 50, "stddev": 10}
)
Typed API:
aidd.add_column(
C.SamplerColumn(
name="item_weight",
type=P.SamplerType.GAUSSIAN,
params=P.GaussianSamplerParams(mean=50, stddev=10)
)
)
Poisson Distribution
Simplified API:
aidd.add_column(
name="number_of_pets",
type="poisson",
params={"mean": 2}
)
Typed API:
aidd.add_column(
C.SamplerColumn(
name="number_of_pets",
type=P.SamplerType.POISSON,
params=P.PoissonSamplerParams(mean=2)
)
)
Bernoulli Distribution
Simplified API:
aidd.add_column(
name="is_in_stock",
type="bernoulli",
params={"p": 0.8}
)
Typed API:
aidd.add_column(
C.SamplerColumn(
name="is_in_stock",
type=P.SamplerType.BERNOULLI,
params=P.BernoulliSamplerParams(p=0.8)
)
)
Bernoulli Mixture Distribution
Simplified API:
aidd.add_column(
name="bern_exp",
type="bernoulli_mixture",
params={"p": 0.4, "dist_name": "expon", "dist_params": {"scale": 10}}
)
Typed API:
aidd.add_column(
C.SamplerColumn(
name="bern_exp",
type=P.SamplerType.BERNOULLI_MIXTURE,
params=P.BernoulliMixtureSamplerParams(p=0.4, dist_name="expon", dist_params={"scale": 10})
)
)
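One way to read the Bernoulli mixture parameters is as a two-step draw: a coin flip with probability `p`, then a draw from the named distribution on success and zero otherwise. That reading is an assumption, and `sample_bernoulli_mixture` below is a hypothetical standard-library sketch rather than the library's implementation.

```python
import random


def sample_bernoulli_mixture(p, scale, rng):
    """With probability p draw from an exponential distribution, else return 0.0."""
    if rng.random() < p:
        # scipy's expon(scale=s) has mean s, which matches a rate of 1 / s here.
        return rng.expovariate(1.0 / scale)
    return 0.0  # assumption: the "failure" outcome is exactly zero


rng = random.Random(0)
samples = [sample_bernoulli_mixture(p=0.4, scale=10, rng=rng) for _ in range(10_000)]

# Share of non-zero values; this should be close to p.
nonzero_share = sum(1 for x in samples if x > 0) / len(samples)
```

This shape is useful for zero-inflated quantities, such as a discount amount that is zero for most orders.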
Binomial Distribution
Simplified API:
aidd.add_column(
name="items_returned",
type="binomial",
params={"n": 10, "p": 0.1}
)
Typed API:
aidd.add_column(
C.SamplerColumn(
name="items_returned",
type=P.SamplerType.BINOMIAL,
params=P.BinomialSamplerParams(n=10, p=0.1)
)
)
Scipy Sampler
Use this sampler to access any distribution available in scipy.stats.
Simplified API:
aidd.add_column(
name="log_gaussian",
type="scipy",
params={
"dist_name": "lognorm",
"dist_params": {
"s": 0.9, # sigma
"scale": 8, # exp(mean)
}
}
)
Typed API:
aidd.add_column(
C.SamplerColumn(
name="log_gaussian",
type=P.SamplerType.SCIPY,
params=P.ScipySamplerParams(dist_name="lognorm", dist_params={"s": 0.9, "scale": 8})
)
)
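The comments in the example map scipy's `lognorm` parameters to the underlying normal distribution: `s` is sigma and `scale` is exp(mean). The sketch below checks that mapping with the standard library's `random.lognormvariate`; it does not call scipy or Data Designer.

```python
import math
import random
import statistics

sigma = 0.9          # corresponds to "s" in the example above
scale = 8            # corresponds to "scale", i.e. exp(mu)
mu = math.log(scale)  # mean of the underlying normal distribution

rng = random.Random(0)
samples = [rng.lognormvariate(mu, sigma) for _ in range(20_000)]

# The median of a log-normal distribution is exp(mu), which equals scale.
median = statistics.median(samples)
```

Because the distribution is right-skewed, the sample mean will be noticeably larger than the median.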
Date and Time
DateTime
Simplified API:
aidd.add_column(
name="order_date",
type="datetime",
params={"start": "2023-01-01", "end": "2023-12-31"}
)
Typed API:
aidd.add_column(
C.SamplerColumn(
name="order_date",
type=P.SamplerType.DATETIME,
params=P.DatetimeSamplerParams(start="2023-01-01", end="2023-12-31")
)
)
TimeDelta
Simplified API:
aidd.add_column(
name="delivery_date",
type="timedelta",
params={
"dt_min": 1, # Minimum days
"dt_max": 7, # Maximum days
"reference_column_name": "order_date" # Reference date column
}
)
Typed API:
aidd.add_column(
C.SamplerColumn(
name="delivery_date",
type=P.SamplerType.TIMEDELTA,
params=P.TimedeltaSamplerParams(
dt_min=1, # Minimum days
dt_max=7, # Maximum days
reference_column_name="order_date" # Reference date column
)
)
)
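Conceptually, a timedelta column adds a random offset to the date in the reference column. The following is a minimal standard-library sketch; `sample_timedelta_date` is hypothetical, and treating `dt_max` as an exclusive upper bound with whole-day offsets is an assumption.

```python
import random
from datetime import date, timedelta


def sample_timedelta_date(reference, dt_min, dt_max, rng):
    """Return reference plus a random whole number of days in [dt_min, dt_max)."""
    offset_days = rng.randrange(dt_min, dt_max)  # assumption: dt_max is exclusive
    return reference + timedelta(days=offset_days)


rng = random.Random(0)
order_date = date(2023, 1, 1)
delivery_dates = [sample_timedelta_date(order_date, 1, 7, rng) for _ in range(1000)]
```

Since `dt_min` is positive here, every delivery date falls strictly after its order date.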
Person
Defines person samplers that create realistic person entities.
Simplified API:
# Define person samplers
aidd.with_person_samplers(
{
"customer": {
"sex": "Female", # Optional
"locale": "en_US" # Optional
}
}
)
Typed API:
# Define person samplers
aidd.with_person_samplers(
{
"customer": P.PersonSamplerParams(
sex="Female", # Optional
locale="en_US", # Optional
)
}
)
Expression Columns
The Expression column type computes values using expressions involving other columns.
Basic Expressions
Simplified API:
aidd.add_column(
name="final_price",
type="expression",
expr="{{ base_price }} * {{ new_price }}" # Or "{{ base_price * new_price }}"
)
Typed API:
aidd.add_column(
C.ExpressionColumn(
name="total_price",
expr="{{quantity}} * {{unit_price}}" # Or "{{ quantity * unit_price }}"
)
)
Person Attribute Expressions
Simplified API:
aidd.with_person_samplers(
{
"customer": P.PersonSamplerParams(),
},
)
aidd.add_column(
name="customer_full_name",
type="expression",
params={"expr": "{{ customer.first_name }} {{ customer.last_name }}"}
)
Typed API:
aidd.add_column(
C.SamplerColumn(
name="person", # This creates a nested object with all person attributes
type=P.SamplerType.PERSON,
params=P.PersonSamplerParams(
locale="en_US",
age_range=[22, 65],
state="CA"
)
)
)
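A person attribute expression such as "{{ customer.first_name }} {{ customer.last_name }}" fills placeholders from nested fields of the row. The sketch below imitates only that substitution step with a regular expression. `render` is a hypothetical helper, the sample row is made up, and the library presumably relies on a full template engine that also evaluates expressions, which this sketch does not.

```python
import re

# Matches placeholders such as {{ name }} or {{ customer.first_name }}.
_PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template, row):
    """Replace each placeholder with the value found by walking the dotted path."""

    def lookup(match):
        value = row
        for part in match.group(1).split("."):
            value = value[part]  # raises KeyError if the field is missing
        return str(value)

    return _PLACEHOLDER.sub(lookup, template)


# Made-up row for illustration; real rows come from the person sampler.
row = {"customer": {"first_name": "Ada", "last_name": "Lovelace"}}
full_name = render("{{ customer.first_name }} {{ customer.last_name }}", row)
```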
LLM-Based Column Types
LLM Generated Content
Generates text data using large language models based on prompts.
There are three types of LLM columns: llm-text, llm-code, and llm-structured. The default type is llm-text. If you are generating code with an LLM, use the type llm-code and set output_format to the code language used for formatting. If you are defining structured outputs for the LLM responses, use llm-structured and pass a Pydantic model or JSON schema to the output_format argument.
Simplified API:
aidd.add_column(
name="product_description",
type="llm-text", # "llm-code", "llm-structured"
model_alias="text", # Optional (default: text)
prompt="Generate a detailed description for a {{product_category}} product.",
system_prompt="You are a professional product copywriter.", # Optional
# output_format=".." # Optional
)
Typed API:
aidd.add_column(
C.LLMGenColumn(
name="product_description",
output_type="text", # "code", "structured"
model_alias="text",
prompt="Generate a detailed description for a {{product_category}} product.",
system_prompt="You are a professional product copywriter.", # Optional
# output_format=".." # Optional
)
)
For details on using Structured Outputs for LLM generated content, read this section.
Data Designer supports text, code, and judge as default model aliases. An llm-judge column uses the judge alias by default. You can define custom model aliases with the generation parameters you want; see the model configuration section to learn how.
LLM Judge
Evaluates data quality using large language models.
Simplified API:
from gretel_client.data_designer.judge_rubrics import TEXT_TO_PYTHON_LLM_JUDGE_PROMPT_TEMPLATE, PYTHON_RUBRICS
aidd.add_column(
name="code_quality",
type="llm-judge",
prompt=TEXT_TO_PYTHON_LLM_JUDGE_PROMPT_TEMPLATE,
rubrics=PYTHON_RUBRICS
)
Typed API:
aidd.add_column(
C.LLMJudgeColumn(
name="code_quality",
prompt=TEXT_TO_PYTHON_LLM_JUDGE_PROMPT_TEMPLATE,
rubrics=PYTHON_RUBRICS
)
)
Code Validation
Validates code in another column.
Simplified API:
aidd.add_column(
name="code_validation_result",
type="code-validation",
code_lang="python", # Language to validate
target_column="code_implementation" # Column containing code
)
Typed API:
aidd.add_column(
C.CodeValidationColumn(
name="code_validation_result",
code_lang="python", # Language to validate
target_column="code_implementation" # Column containing code
)
)
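As a rough picture of what validating Python code can involve, the sketch below checks only whether a snippet parses. `validate_python` and its result fields are hypothetical; the actual code-validation column may run linters or other checks and report different fields.

```python
import ast


def validate_python(code):
    """Return a small report saying whether the code is syntactically valid."""
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return {"is_valid": False, "error": f"line {exc.lineno}: {exc.msg}"}
    return {"is_valid": True, "error": None}


good = validate_python("def add(a, b):\n    return a + b\n")
bad = validate_python("def add(a, b) return a + b")
```

A syntax check like this catches malformed output early, but it says nothing about whether the code is correct or follows style conventions.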
Using Conditional Parameters
Data Designer supports conditional parameters, which replace a column's default parameters when a condition on another column's value is true:
Simplified API:
aidd.add_column(
name="pet_type",
type="category",
params={"values": ["dog", "cat", "fish"], "weights": [0.5, 0.3, 0.2]},
conditional_params={
"number_of_pets == 0": {"values": ["none"]}
}
)
Typed API:
aidd.add_column(
C.SamplerColumn(
name="pet_type",
type=P.SamplerType.CATEGORY,
params=P.CategorySamplerParams(values=["dog", "cat", "fish"], weights=[0.5, 0.3, 0.2]),
conditional_params={
"number_of_pets == 0": P.CategorySamplerParams(values=["none"])
}
)
)
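The selection logic behind conditional parameters can be sketched as: check each condition against the row, use the first matching parameter set, and fall back to the defaults. The helpers below are hypothetical, support only simple "column == value" conditions, and do not reflect how the library parses conditions.

```python
import random

DEFAULT_PARAMS = {"values": ["dog", "cat", "fish"], "weights": [0.5, 0.3, 0.2]}
CONDITIONAL_PARAMS = {"number_of_pets == 0": {"values": ["none"]}}


def pick_params(row, default_params, conditional_params):
    """Return the params of the first condition that holds, else the defaults."""
    for condition, params in conditional_params.items():
        # Only "column == value" is understood in this sketch.
        column, expected = [part.strip() for part in condition.split("==")]
        if str(row[column]) == expected:
            return params
    return default_params


def sample_pet_type(row, rng):
    params = pick_params(row, DEFAULT_PARAMS, CONDITIONAL_PARAMS)
    return rng.choices(params["values"], weights=params.get("weights"), k=1)[0]


rng = random.Random(0)
no_pets = sample_pet_type({"number_of_pets": 0}, rng)
some_pets = sample_pet_type({"number_of_pets": 2}, rng)
```

As with subcategories, the column referenced in the condition must already exist in the row when the conditional column is generated.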
Reference Table
| Simplified API type | Typed API | Description |
| --- | --- | --- |
| "category" | P.SamplerType.CATEGORY | Categorical values |
| "subcategory" | P.SamplerType.SUBCATEGORY | Dependent categories |
| "uuid" | P.SamplerType.UUID | Unique identifiers |
| "uniform" | P.SamplerType.UNIFORM | Uniform distribution |
| "gaussian" | P.SamplerType.GAUSSIAN | Normal distribution |
| "poisson" | P.SamplerType.POISSON | Poisson distribution |
| "bernoulli" | P.SamplerType.BERNOULLI | Binary outcomes |
| "binomial" | P.SamplerType.BINOMIAL | Number of successes |
| "datetime" | P.SamplerType.DATETIME | Date/time values |
| "timedelta" | P.SamplerType.TIMEDELTA | Time intervals |
| "expression" | C.ExpressionColumn | Computed expressions |
| "llm-text" | C.LLMTextColumn | LLM-generated text content |
| "llm-structured" | C.LLMStructuredColumn | LLM-generated structured content |
| "llm-code" | C.LLMCodeColumn | LLM-generated code content |
| "llm-judge" | C.LLMJudgeColumn | LLM-based evaluation |
| "code-validation" | C.CodeValidationColumn | Code validation |
Choosing the Right Approach
Data Designer offers flexibility in how you define your columns. Both approaches are fully supported, so you can choose the style that best fits your needs.
Key points to remember:
Same functionality: Both approaches provide access to the same features
Interchangeable: You can mix both styles in the same project
Simplified == concise: The simplified API is more concise
Typed == safer: The typed API offers better IDE support and type checking
For quick experiments, the simplified API might be more convenient. For larger projects, the additional safety of the typed API can help prevent errors.