Data Designer supports various column types that determine how data is generated. This guide explains the different column types available and how to use them.
Two Ways to Define Columns
Data Designer offers two approaches to define columns:
Simplified API: Direct parameter passing with string type names
Typed API: More verbose but provides better type checking and IDE support
Both approaches offer the same functionality, so choose the style that works best for your needs.
Simplified API Example
The simplified approach is concise and easy to use:
Typed API Example
The typed API provides better code completion and type checking:
from gretel_client.data_designer import columns as C
from gretel_client.data_designer import params as P
# Typed API approach
aidd.add_column(
C.SamplerColumn(
name="product_category",
type=P.SamplerType.CATEGORY,
params=P.CategorySamplerParams(values=["Electronics", "Clothing", "Home Goods"])
)
)
When to Use Each Approach
Choose the Simplified API when:
You prefer concise, readable code
You're working on quick prototypes or simple designs
You don't need IDE autocompletion for parameters
Choose the Typed API when:
You want code completion and type checking in your IDE
You're working on complex designs where type safety helps prevent errors
You need clarity about available parameters and their types
You're collaborating with a team and want more self-documenting code
Both approaches use the same underlying implementation, so you can mix and match them as needed.
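The idea that both styles share one implementation can be pictured with a toy builder: keyword arguments with a string type name are converted into the same typed object that the typed API passes in directly. Everything here (`ToyDesigner`, the stand-in `SamplerType` and `SamplerColumn` classes, and the `"category"` type string) is a hypothetical plain-Python illustration, not Data Designer's actual classes or internals.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Optional


class SamplerType(str, Enum):
    # Hypothetical stand-in for a typed enum of sampler names.
    CATEGORY = "category"
    PERSON = "person"


@dataclass
class SamplerColumn:
    # Hypothetical stand-in for a typed column definition.
    name: str
    type: SamplerType
    params: dict = field(default_factory=dict)


class ToyDesigner:
    """Toy builder that accepts either definition style."""

    def __init__(self) -> None:
        self.columns: list = []

    def add_column(self, column: Optional[SamplerColumn] = None, **kwargs: Any) -> None:
        if column is None:
            # Simplified style: build the typed object from keyword arguments,
            # converting the string type name into the enum member.
            column = SamplerColumn(
                name=kwargs["name"],
                type=SamplerType(kwargs["type"]),
                params=kwargs.get("params", {}),
            )
        # Both styles end up stored as the same typed object.
        self.columns.append(column)


designer = ToyDesigner()
# Simplified style and typed style, mixed in one design.
designer.add_column(name="product_category", type="category",
                    params={"values": ["Electronics", "Clothing", "Home Goods"]})
designer.add_column(SamplerColumn(name="person", type=SamplerType.PERSON))
```

Normalizing to one internal representation is what makes mixing styles safe: downstream code only ever sees the typed object.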
Column Type Categories
Data Designer columns fall into these main categories:
Sampling-based columns: Generate data through statistical sampling methods
Expression columns: Generate data by evaluating expressions
LLM-based columns: Generate data using large language models
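To make the three categories concrete, the following plain-Python sketch fills each record column by column: a sampler ignores other columns, an expression derives a value from an earlier column, and a stub function stands in for an LLM call. It is an illustration only; the helper names, the `category_code` column, and the assumption that columns are evaluated in declaration order are hypothetical, not Data Designer's documented behavior.

```python
import random

Row = dict  # one generated record: column name -> value


def generate(columns: list, num_rows: int) -> list:
    """Fill each row column by column, in the order the columns were declared."""
    rows = []
    for _ in range(num_rows):
        row: Row = {}
        for name, produce in columns:
            # Each generator sees the columns already filled for this row.
            row[name] = produce(row)
        rows.append(row)
    return rows


def fake_llm(prompt: str) -> str:
    # Stub standing in for a model call; it just echoes the prompt.
    return f"[LLM output for: {prompt}]"


rng = random.Random(0)
CATEGORIES = ["Electronics", "Clothing", "Home Goods"]

columns = [
    # Sampling-based: ignores other columns and draws from a distribution.
    ("product_category", lambda row: rng.choice(CATEGORIES)),
    # Expression: derived deterministically from an earlier column.
    ("category_code", lambda row: row["product_category"][:3].upper()),
    # LLM-based: the prompt is built from earlier columns, then sent to a model.
    ("product_description",
     lambda row: fake_llm(f"Describe a {row['product_category']} product.")),
]

records = generate(columns, num_rows=3)
for record in records:
    print(record)
```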
Sampling-Based Column Types
Category
Creates categorical values from a defined set of options.
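Conceptually, a category sampler picks each row's value from the allowed list. The sketch below assumes uniform sampling with a seeded generator; `sample_category` is a hypothetical helper, and whether Data Designer supports weighted or otherwise non-uniform options is not covered here.

```python
import random
from collections import Counter
from typing import List, Optional


def sample_category(values: List[str], num_rows: int, seed: Optional[int] = None) -> List[str]:
    """Draw num_rows values uniformly at random from the allowed options."""
    if not values:
        raise ValueError("values must contain at least one option")
    rng = random.Random(seed)  # seeded generator for reproducible samples
    return [rng.choice(values) for _ in range(num_rows)]


samples = sample_category(["Electronics", "Clothing", "Home Goods"], num_rows=1000, seed=42)
print(Counter(samples))  # tally how often each option was drawn
```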
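Person
Generates synthetic person data. The column holds a nested object containing all person attributes; in this example the sampler is configured with a locale, an age range, and a state:
aidd.add_column(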
C.SamplerColumn(
name="person", # This creates a nested object with all person attributes
type=P.SamplerType.PERSON,
params=P.PersonSamplerParams(
locale="en_US",
age_range=[22, 65],
state="CA"
)
)
)
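Because the person column holds a nested object, each row stores a group of related attributes under a single column name. The sketch below mimics that shape with a dataclass converted to a dict. The `Person` fields, the placeholder name list, the `sample_person` helper, and the treatment of `age_range` as an inclusive `[min, max]` pair are all assumptions, not the sampler's documented behavior.

```python
import random
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass
class Person:
    # Hypothetical subset of attributes; the real sampler's fields may differ.
    first_name: str
    age: int
    state: str
    locale: str


def sample_person(age_range: Tuple[int, int], state: str, locale: str,
                  rng: random.Random) -> Person:
    low, high = age_range
    if low > high:
        raise ValueError("age_range must be [min, max]")
    return Person(
        first_name=rng.choice(["Ana", "Ben", "Chloe"]),  # placeholder names
        age=rng.randint(low, high),  # randint includes both endpoints
        state=state,
        locale=locale,
    )


rng = random.Random(7)
# Store the whole person as one nested value under a single column name.
row = {"person": asdict(sample_person((22, 65), state="CA", locale="en_US", rng=rng))}
print(row["person"])
```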
LLM-Based Column Types
LLM Generated Content
Generates text data using large language models based on prompts.
There are three types of LLM columns: llm-text, llm-code, and llm-structured.
The default type is llm-text. If you are generating code with an LLM, use llm-code and pass the code language to output_format so the output can be formatted correctly. If you are defining structured outputs for the LLM responses, use llm-structured and pass a Pydantic model or JSON schema to the output_format argument.
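For llm-structured columns, the schema passed to output_format describes the fields each response must contain. The sketch below uses a hypothetical product schema written as a JSON schema dict, plus a minimal hand-rolled check of required keys and basic types. It is not how Data Designer validates responses; a real project would more likely rely on Pydantic or a JSON Schema library.

```python
import json

# Hypothetical JSON schema for a structured product record.
product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
    },
    "required": ["name", "price"],
}

PY_TYPES = {"string": str, "number": (int, float), "object": dict}


def matches(value, json_type: str) -> bool:
    # bool is a subclass of int in Python, so handle it explicitly.
    if json_type == "boolean":
        return isinstance(value, bool)
    if isinstance(value, bool):
        return False
    return isinstance(value, PY_TYPES[json_type])


def check_response(raw: str, schema: dict) -> dict:
    """Parse an LLM response and check required keys and basic field types."""
    data = json.loads(raw)
    for key in schema.get("required", []):
        if key not in data:
            raise ValueError(f"missing required field: {key}")
    for key, spec in schema["properties"].items():
        if key in data and not matches(data[key], spec["type"]):
            raise ValueError(f"field {key!r} should be of type {spec['type']}")
    return data


record = check_response('{"name": "Desk Lamp", "price": 24.99, "in_stock": true}', product_schema)
print(record["name"], record["price"])
```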
Simplified API:
aidd.add_column(
name="product_description",
type="llm-text",  # or "llm-code", "llm-structured"
model_alias="text",  # Optional (default: text)
prompt="Generate a detailed description for a {{product_category}} product.",
system_prompt="You are a professional product copywriter.", # Optional
# output_format=".." # Optional
)
Typed API:
aidd.add_column(
C.LLMGenColumn(
name="product_description",
output_type="text",  # or "code", "structured"
model_alias="text",
prompt="Generate a detailed description for a {{product_category}} product.",
system_prompt="You are a professional product copywriter.", # Optional
# output_format=".." # Optional
)
)
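The prompts above reference the product_category column with double-brace placeholders, so each row's prompt is filled in from values already generated for that row. Assuming Jinja-style substitution, the rendering step can be sketched with a regular expression. `render_prompt` is a hypothetical helper, Data Designer's own templating engine likely supports more syntax than this toy handles, and the dotted-path lookup for nested objects (such as a person column) is an assumption.

```python
import re

# Matches {{ name }} or {{ name.attr }}, tolerating spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render_prompt(template: str, row: dict) -> str:
    """Replace each placeholder with the matching value from the current row."""

    def lookup(match):
        value = row
        # Walk dotted paths so attributes of nested objects can be referenced.
        for part in match.group(1).split("."):
            value = value[part]
        return str(value)

    return PLACEHOLDER.sub(lookup, template)


row = {"product_category": "Electronics"}
print(render_prompt("Generate a detailed description for a {{product_category}} product.", row))
# prints: Generate a detailed description for a Electronics product.
```

A placeholder naming a column that does not exist raises `KeyError` here, which surfaces typos in prompt templates early.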
Data Designer provides text, code, and judge as the default model aliases. An llm-judge column uses the judge alias by default. You can also define custom model aliases with your own generation parameters; learn how in the model configuration section.
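The alias mechanism amounts to a lookup from a short name to a set of generation parameters, with a default that depends on the column type. The sketch below models that idea in plain Python. The `ModelConfig` fields, the model names, the `creative` alias, and the `resolve_alias` helper are all hypothetical; the actual configuration format is described in the model configuration section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelConfig:
    # Hypothetical generation parameters attached to an alias.
    model_name: str
    temperature: float


# Hypothetical registry: the three default aliases plus one custom alias.
MODEL_ALIASES = {
    "text": ModelConfig("example-text-model", temperature=0.7),
    "code": ModelConfig("example-code-model", temperature=0.2),
    "judge": ModelConfig("example-judge-model", temperature=0.0),
    "creative": ModelConfig("example-text-model", temperature=1.0),  # custom alias
}


def resolve_alias(column_type: str, model_alias: Optional[str] = None) -> str:
    """Return the alias a column uses, applying a default when none is given."""
    if model_alias is None:
        # llm-judge defaults to "judge"; falling back to "text" for every
        # other type is an assumption based on the llm-text default.
        model_alias = "judge" if column_type == "llm-judge" else "text"
    if model_alias not in MODEL_ALIASES:
        raise KeyError(f"unknown model alias: {model_alias}")
    return model_alias


config = MODEL_ALIASES[resolve_alias("llm-judge")]
print(config)
```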
LLM Judge
Evaluates data quality using large language models.
Summary
Data Designer offers flexibility in how you define your columns. Both approaches are fully supported, so you can choose the style that best fits your needs.
Key points to remember:
Same functionality: Both approaches provide access to the same features
Interchangeable: You can mix both styles in the same project
Simplified == concise: The simplified API needs less code to define the same column
Typed == safer: The typed API offers better IDE support and type checking
For quick experiments, the simplified API might be more convenient. For larger projects, the additional safety of the typed API can help prevent errors.