Data Designer allows you to apply constraints to columns, ensuring that generated values meet specific criteria. This guide explains the types of constraints available and how to use them.
Overview of Column Constraints
Constraints are rules applied to columns that restrict the range or type of values they can contain. They are particularly useful for:
Ensuring numerical values stay within specific ranges
Enforcing relationships between columns
Validating that generated data meets business rules
The primary way to establish constraints in Data Designer is the add_constraint method, which registers an explicit constraint rule on a column.
Constraints Work Only with Numerical Samplers
Constraints in Data Designer are currently only supported for numerical sampler methods (such as gaussian, uniform, poisson). Constraints do not work with expression columns or categorical samplers.
For non-numerical columns or more complex logic, use conditional logic in Jinja templates instead, as shown in the examples below.
Adding Explicit Constraints
To add a constraint to a column, use the add_constraint method:
aidd.add_constraint(
    target_column="column_name",  # The column to constrain
    type="constraint_type",       # Type of constraint
    params={}                     # Constraint parameters
)
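Because add_constraint takes only keyword arguments, several constraints can be kept as plain dictionaries and registered in a loop. The sketch below is illustrative only: ConstraintRecorder is a hypothetical stand-in for the aidd object so the snippet runs on its own, and it simply records what it is given rather than enforcing anything.

```python
class ConstraintRecorder:
    """Hypothetical stand-in for `aidd`; it only stores the calls it receives."""

    def __init__(self):
        self.constraints = []

    def add_constraint(self, target_column, type, params):
        # Mirrors the keyword arguments shown in the guide.
        self.constraints.append(
            {"target_column": target_column, "type": type, "params": params}
        )


# Constraint specifications kept as data, one dictionary per rule.
specs = [
    {"target_column": "age", "type": "scalar_inequality",
     "params": {"operator": "ge", "rhs": 18}},
    {"target_column": "price", "type": "scalar_inequality",
     "params": {"operator": "lt", "rhs": 1000}},
]

aidd = ConstraintRecorder()
for spec in specs:
    aidd.add_constraint(**spec)

print(len(aidd.constraints))  # 2
```

With the real aidd object in place of the recorder, the loop would stay the same.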
Types of Constraints
Scalar Inequality Constraints
These constraints enforce that a column's values meet an inequality comparison with a fixed value:
# Ensure 'age' is at least 18
aidd.add_constraint(
    target_column="age",
    type="scalar_inequality",
    params={"operator": "ge", "rhs": 18}
)
# Ensure 'price' is less than 1000
aidd.add_constraint(
    target_column="price",
    type="scalar_inequality",
    params={"operator": "lt", "rhs": 1000}
)
Supported operators:
lt: Less than
le: Less than or equal to
gt: Greater than
ge: Greater than or equal to
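The four operator codes share their names with the comparison functions in Python's standard operator module, which makes their boundary behaviour easy to check. This is a minimal sketch of the semantics only; the OPERATORS table and the satisfies helper are hypothetical names, not part of Data Designer.

```python
import operator

# Hypothetical lookup table: operator code -> comparison function.
OPERATORS = {
    "lt": operator.lt,  # value <  rhs
    "le": operator.le,  # value <= rhs
    "gt": operator.gt,  # value >  rhs
    "ge": operator.ge,  # value >= rhs
}


def satisfies(value, op, rhs):
    """Return True when `value <op> rhs` holds."""
    return OPERATORS[op](value, rhs)


print(satisfies(18, "ge", 18))      # True: "ge" includes the boundary
print(satisfies(1000, "lt", 1000))  # False: "lt" excludes the boundary
```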
Column Inequality Constraints
These constraints enforce relationships between two columns:
# Ensure 'end_date' is after 'start_date'
aidd.add_constraint(
    target_column="end_date",
    type="column_inequality",
    params={"operator": "gt", "rhs": "start_date"}  # Name of the column
)
# Ensure 'discount_price' is less than 'original_price'
aidd.add_constraint(
    target_column="discount_price",
    type="column_inequality",
    params={"operator": "lt", "rhs": "original_price"}  # Name of the column
)
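Once data has been generated, a column relationship like the ones above can be spot-checked row by row. The following sketch assumes the records are available as a list of dictionaries; the violations helper and the sample rows are hypothetical and exist only to illustrate the comparison.

```python
import operator

OPERATORS = {
    "lt": operator.lt,
    "le": operator.le,
    "gt": operator.gt,
    "ge": operator.ge,
}


def violations(rows, target_column, op, rhs_column):
    """Return the rows where `row[target_column] <op> row[rhs_column]` is false."""
    compare = OPERATORS[op]
    return [
        row for row in rows
        if not compare(row[target_column], row[rhs_column])
    ]


# Hypothetical sample records.
rows = [
    {"original_price": 100.0, "discount_price": 80.0},
    {"original_price": 50.0, "discount_price": 50.0},  # equal, so strict "lt" fails
]

print(violations(rows, "discount_price", "lt", "original_price"))
# [{'original_price': 50.0, 'discount_price': 50.0}]
```

An empty result means every row satisfies the relationship.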
Practical Examples
Age Constraints for Different Customer Segments
# Add customer segment column
aidd.add_column(
    name="customer_segment",
    type="category",
    params={"values": ["Youth", "Adult", "Senior"]}
)
# Add age column based on gaussian distribution
aidd.add_column(
    name="age",
    type="gaussian",
    params={"mean": 40, "stddev": 15},
    convert_to="int"
)
# Add constraints based on segment
aidd.add_constraint(
    target_column="age",
    type="scalar_inequality",
    params={"operator": "ge", "rhs": 1}  # Base constraint - age must be positive
)
# We'll handle the segment-specific constraints in the LLM generation step
aidd.add_column(
    name="customer_profile",
    type="llm-text",
    prompt="""
Create a brief, realistic customer profile for a {{customer_segment}} who is {{age}} years old.
The profile should include:
- Typical interests and hobbies
- General spending habits
- Technology usage patterns
- Shopping preferences
"""
)
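The base constraint matters because a Gaussian with mean 40 and standard deviation 15 occasionally yields values below 1. The sketch below simulates this with the standard library. It assumes the constraint is enforced by resampling and that integer conversion truncates; both are assumptions made for illustration, not documented Data Designer behaviour, and sample_age is a hypothetical helper.

```python
import random


def sample_age(rng, mean=40, stddev=15, minimum=1, max_tries=1000):
    """Draw an integer age, resampling until it is at least `minimum`."""
    for _ in range(max_tries):
        # Truncation to int is an assumption about convert_to="int".
        age = int(rng.gauss(mean, stddev))
        if age >= minimum:
            return age
    raise RuntimeError("no sample satisfied the constraint")


rng = random.Random(42)  # fixed seed so the run is repeatable
ages = [sample_age(rng) for _ in range(10_000)]

print(min(ages) >= 1)  # True
```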
Date Range Constraints
# Add order date column
aidd.add_column(
    name="order_date",
    type="datetime",
    params={"start": "2023-01-01", "end": "2023-12-31"}
)
# Add delivery date column
aidd.add_column(
    name="delivery_date",
    type="datetime",
    params={"start": "2023-01-01", "end": "2024-12-31"}
)
# Ensure the delivery date is after the order date
aidd.add_constraint(
    target_column="delivery_date",
    type="column_inequality",
    params={"operator": "gt", "rhs": "order_date"}
)
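The two date ranges overlap for all of 2023, so without the constraint a delivery date could fall on or before its order date. The sketch below reproduces the intended outcome using only the standard library. Resampling the delivery date is an assumed mechanism chosen for illustration, and random_date and sample_order_and_delivery are hypothetical helpers.

```python
import random
from datetime import date, timedelta


def random_date(rng, start, end):
    """Pick a uniformly random date between start and end, inclusive."""
    return start + timedelta(days=rng.randint(0, (end - start).days))


def sample_order_and_delivery(rng, max_tries=1000):
    """Draw an order date, then resample the delivery date until it is later."""
    order = random_date(rng, date(2023, 1, 1), date(2023, 12, 31))
    for _ in range(max_tries):
        delivery = random_date(rng, date(2023, 1, 1), date(2024, 12, 31))
        if delivery > order:  # same check as operator "gt"
            return order, delivery
    raise RuntimeError("no delivery date satisfied the constraint")


rng = random.Random(0)  # fixed seed so the run is repeatable
pairs = [sample_order_and_delivery(rng) for _ in range(1000)]

print(all(delivery > order for order, delivery in pairs))  # True
```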
Inventory Management Example
# Add inventory level column
aidd.add_column(
    name="inventory_level",
    type="poisson",
    params={"mean": 100}
)
# Add reorder threshold column
aidd.add_column(
    name="reorder_threshold",
    type="gaussian",
    params={"mean": 25, "stddev": 5},
    convert_to="int"
)
# Add reorder amount column
aidd.add_column(
    name="reorder_amount",
    type="gaussian",
    params={"mean": 50, "stddev": 10},
    convert_to="int"
)
# Ensure reorder threshold is positive
aidd.add_constraint(
    target_column="reorder_threshold",
    type="scalar_inequality",
    params={"operator": "gt", "rhs": 0}
)
# Ensure reorder amount is positive
aidd.add_constraint(
    target_column="reorder_amount",
    type="scalar_inequality",
    params={"operator": "gt", "rhs": 0}
)
# Add inventory status based on constraints
aidd.add_column(
    name="inventory_status",
    type="llm-text",
    prompt="""
Inventory Level: {{inventory_level}}
Reorder Threshold: {{reorder_threshold}}
{% if inventory_level < reorder_threshold %}
Please generate a reorder notice indicating that inventory is below the threshold.
Recommend ordering {{reorder_amount}} units.
{% elif inventory_level < reorder_threshold * 2 %}
Inventory is adequate but approaching the reorder threshold.
No immediate action needed, but monitor closely.
{% else %}
Inventory levels are healthy and well above the reorder threshold.
{% endif %}
"""
)
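The three branches of the template above can be mirrored in plain Python to check which instruction a given row would receive. This is only a sketch of the branching logic; inventory_branch and its return labels are hypothetical names.

```python
def inventory_branch(inventory_level, reorder_threshold):
    """Return which template branch applies: 'reorder', 'monitor', or 'healthy'."""
    if inventory_level < reorder_threshold:
        return "reorder"   # below the threshold
    elif inventory_level < reorder_threshold * 2:
        return "monitor"   # adequate but approaching the threshold
    return "healthy"       # at least twice the threshold


print(inventory_branch(10, 25))   # reorder
print(inventory_branch(40, 25))   # monitor
print(inventory_branch(100, 25))  # healthy
```

Note that a level of exactly twice the threshold falls into the final branch, because the middle comparison is strict.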
Deleting Constraints
If you need to remove a constraint, use the delete_constraint method: