Add Constraints to Columns

Column Constraints in Data Designer

Data Designer allows you to apply constraints to columns, ensuring that generated values meet specific criteria. This guide explains the types of constraints available and how to use them.

Overview of Column Constraints

Constraints are rules applied to columns that restrict the range or type of values they can contain. They are particularly useful for:

  • Ensuring numerical values stay within specific ranges

  • Enforcing relationships between columns

  • Validating that generated data meets business rules

The primary way to establish constraints in Data Designer is the add_constraint method, which defines explicit constraint rules.

Constraints Work Only with Numerical Samplers

Constraints in Data Designer are currently only supported for numerical sampler methods (such as gaussian, uniform, poisson). Constraints do not work with expression columns or categorical samplers.

For non-numerical columns or complex logic, you should use conditional logic in expressions using Jinja templates as shown in the examples below.

Adding Explicit Constraints

To add a constraint to a column, use the add_constraint method:

aidd.add_constraint(
    target_column="column_name",  # The column to constrain
    type="constraint_type",       # Type of constraint
    params={}                     # Constraint parameters
)
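Because add_constraint takes string identifiers, a typo in the type or operator is easy to miss. The sketch below is a hypothetical helper (constraint_kwargs is not part of the SDK) that checks the arguments against the constraint types and operators named in this guide, then returns a dictionary in the shape shown above. It assumes rhs is a number for scalar_inequality and a column name for column_inequality.

```python
from numbers import Real

# Constraint types and operators named in this guide.
CONSTRAINT_TYPES = {"scalar_inequality", "column_inequality"}
OPERATORS = {"lt", "le", "gt", "ge"}


def constraint_kwargs(target_column, constraint_type, operator, rhs):
    """Hypothetical helper: validate arguments and build add_constraint kwargs."""
    if constraint_type not in CONSTRAINT_TYPES:
        raise ValueError(f"unknown constraint type: {constraint_type!r}")
    if operator not in OPERATORS:
        raise ValueError(f"unknown operator: {operator!r}")
    if constraint_type == "scalar_inequality":
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(rhs, bool) or not isinstance(rhs, Real):
            raise TypeError("scalar_inequality expects a numeric rhs")
    elif not isinstance(rhs, str):
        raise TypeError("column_inequality expects a column name as rhs")
    return {
        "target_column": target_column,
        "type": constraint_type,
        "params": {"operator": operator, "rhs": rhs},
    }


kwargs = constraint_kwargs("age", "scalar_inequality", "ge", 18)
print(kwargs)
# With a Data Designer object in scope: aidd.add_constraint(**kwargs)
```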

Types of Constraints

Scalar Inequality Constraints

These constraints enforce that a column's values meet an inequality comparison with a fixed value:

# Ensure 'age' is at least 18
aidd.add_constraint(
    target_column="age",
    type="scalar_inequality",
    params={"operator": "ge", "rhs": 18}
)

# Ensure 'price' is less than 1000
aidd.add_constraint(
    target_column="price",
    type="scalar_inequality",
    params={"operator": "lt", "rhs": 1000}
)

Supported operators:

  • gt: Greater than

  • ge: Greater than or equal to

  • lt: Less than

  • le: Less than or equal to
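The four operator names match the comparison functions in Python's standard operator module, which makes their boundary behavior easy to check. The sketch below is only an illustration of the semantics; the satisfies helper is hypothetical and is not part of Data Designer.

```python
import operator

# Map the documented operator names to Python's comparison functions.
OPERATORS = {
    "lt": operator.lt,
    "le": operator.le,
    "gt": operator.gt,
    "ge": operator.ge,
}


def satisfies(value, op_name, rhs):
    """Return True if `value <op> rhs` holds for the named operator."""
    return OPERATORS[op_name](value, rhs)


print(satisfies(18, "ge", 18))        # True: 18 >= 18 holds
print(satisfies(18, "gt", 18))        # False: 18 > 18 does not hold
print(satisfies(999.99, "lt", 1000))  # True: 999.99 < 1000 holds
```

The difference between ge and gt (and between le and lt) only matters at the boundary value itself.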

Column Inequality Constraints

These constraints enforce relationships between two columns:

# Ensure 'end_date' is after 'start_date'
aidd.add_constraint(
    target_column="end_date",
    type="column_inequality",
    params={"operator": "gt", "rhs": "start_date"} # Name of the column
)

# Ensure 'discount_price' is less than 'original_price'
aidd.add_constraint(
    target_column="discount_price",
    type="column_inequality",
    params={"operator": "lt", "rhs": "original_price"} # Name of the column
)
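After generating a dataset, it can be useful to confirm that a column relationship actually holds. The sketch below is a plain-Python check over a list of row dictionaries; the violations helper and the sample rows are hypothetical and stand in for whatever records you export from a preview or job.

```python
import operator

OPERATORS = {
    "lt": operator.lt,
    "le": operator.le,
    "gt": operator.gt,
    "ge": operator.ge,
}


def violations(records, target_column, op_name, rhs_column):
    """Return the rows where `target_column <op> rhs_column` does not hold."""
    compare = OPERATORS[op_name]
    return [
        row for row in records
        if not compare(row[target_column], row[rhs_column])
    ]


# Hypothetical sample rows for illustration.
records = [
    {"original_price": 100.0, "discount_price": 80.0},
    {"original_price": 50.0, "discount_price": 50.0},  # equal, so strict "lt" fails
    {"original_price": 20.0, "discount_price": 15.0},
]

bad = violations(records, "discount_price", "lt", "original_price")
print(bad)  # [{'original_price': 50.0, 'discount_price': 50.0}]
```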

Practical Examples

Age Constraints for Different Customer Segments

# Add customer segment column
aidd.add_column(
    name="customer_segment",
    type="category",
    params={"values": ["Youth", "Adult", "Senior"]}
)

# Add age column based on gaussian distribution
aidd.add_column(
    name="age",
    type="gaussian",
    params={"mean": 40, "stddev": 15},
    convert_to="int"
)

# Add constraints based on segment
aidd.add_constraint(
    target_column="age",
    type="scalar_inequality",
    params={"operator": "ge", "rhs": 1}  # Base constraint - age must be positive
)

# We'll handle the segment-specific constraints in the LLM generation step
aidd.add_column(
    name="customer_profile",
    type="llm-text",
    prompt="""
    Create a brief, realistic customer profile for a {{customer_segment}} who is {{age}} years old.

    The profile should include:
    - Typical interests and hobbies
    - General spending habits
    - Technology usage patterns
    - Shopping preferences
    """
)
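To get a feel for what the base constraint on age does to a Gaussian sampler, the sketch below mimics it with rejection sampling in plain Python. This guide does not describe how Data Designer enforces constraints internally, so treat the redraw loop and the rounding step as assumptions made for illustration.

```python
import random


def sample_age(rng, mean=40, stddev=15, minimum=1):
    """Draw from a Gaussian, redrawing until the rounded value is >= minimum."""
    while True:
        # Rounding stands in for convert_to="int" (an assumption).
        age = int(round(rng.gauss(mean, stddev)))
        if age >= minimum:
            return age


rng = random.Random(0)  # fixed seed so the run is repeatable
ages = [sample_age(rng) for _ in range(1000)]
print(min(ages) >= 1)  # True: every kept draw passed the check
```

With a mean of 40 and a standard deviation of 15, only a small share of draws fall below 1, so the constraint trims the lower tail without changing the bulk of the distribution.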

Date Range Constraints

# Add order date column
aidd.add_column(
    name="order_date",
    type="datetime",
    params={"start": "2023-01-01", "end": "2023-12-31"}
)

# Add delivery date column
aidd.add_column(
    name="delivery_date",
    type="datetime",
    params={"start": "2023-01-01", "end": "2024-12-31"}
)

# Ensure the delivery date is after the order date
aidd.add_constraint(
    target_column="delivery_date",
    type="column_inequality",
    params={"operator": "gt", "rhs": "order_date"}
)

Inventory Management Example

# Add inventory level column
aidd.add_column(
    name="inventory_level",
    type="poisson",
    params={"mean": 100}
)

# Add reorder threshold column
aidd.add_column(
    name="reorder_threshold",
    type="gaussian",
    params={"mean": 25, "stddev": 5},
    convert_to="int"
)

# Add reorder amount column
aidd.add_column(
    name="reorder_amount",
    type="gaussian",
    params={"mean": 50, "stddev": 10},
    convert_to="int"
)

# Ensure reorder threshold is positive
aidd.add_constraint(
    target_column="reorder_threshold",
    type="scalar_inequality",
    params={"operator": "gt", "rhs": 0}
)

# Ensure reorder amount is positive
aidd.add_constraint(
    target_column="reorder_amount",
    type="scalar_inequality",
    params={"operator": "gt", "rhs": 0}
)

# Add inventory status that branches on the sampled values
aidd.add_column(
    name="inventory_status",
    type="llm-text",
    prompt="""
    Inventory Level: {{ inventory_level }}
    Reorder Threshold: {{ reorder_threshold }}

    {% if inventory_level < reorder_threshold %}
    Please generate a reorder notice indicating that inventory is below the threshold.
    Recommend ordering {{ reorder_amount }} units.
    {% elif inventory_level < reorder_threshold * 2 %}
    Inventory is adequate but approaching the reorder threshold.
    No immediate action needed, but monitor closely.
    {% else %}
    Inventory levels are healthy and well above the reorder threshold.
    {% endif %}
    """
)
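The branching in the prompt template can be hard to reason about at the boundaries. The sketch below restates the same three-way condition as a plain Python function so you can check which branch a given row would take; inventory_branch and its return labels are hypothetical names used only for illustration.

```python
def inventory_branch(inventory_level, reorder_threshold):
    """Return which branch of the prompt template applies to a row."""
    if inventory_level < reorder_threshold:
        return "reorder"
    elif inventory_level < reorder_threshold * 2:
        return "approaching"
    return "healthy"


print(inventory_branch(20, 25))   # reorder
print(inventory_branch(40, 25))   # approaching
print(inventory_branch(50, 25))   # healthy: 50 is not below 2 * 25
print(inventory_branch(100, 25))  # healthy
```

With inventory_level sampled from a Poisson distribution with mean 100 and reorder_threshold centered on 25, most rows would be expected to land in the last branch.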

Deleting Constraints

If you need to remove a constraint, use the delete_constraint method:

aidd.delete_constraint(target_column="column_name")

Constraint Limitations

  • Constraints are currently primarily supported for numerical samplers

  • Complex constraints may require handling in LLM prompts using Jinja templates

  • If a constraint is impossible to satisfy, the data generation may fail
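One way to catch an impossible constraint before starting a job is to compare the constraint with the sampler's range. The sketch below does this for a sampler bounded by a closed interval; can_satisfy is a hypothetical helper, and the low and high bounds are assumed to describe the sampler's range (for example, a uniform sampler).

```python
def can_satisfy(low, high, op_name, rhs):
    """Return True if some value in [low, high] satisfies `value <op> rhs`."""
    if low > high:
        raise ValueError("low must not exceed high")
    if op_name == "lt":
        return low < rhs
    if op_name == "le":
        return low <= rhs
    if op_name == "gt":
        return high > rhs
    if op_name == "ge":
        return high >= rhs
    raise ValueError(f"unknown operator: {op_name!r}")


print(can_satisfy(0, 10, "gt", 5))    # True: values in (5, 10] qualify
print(can_satisfy(0, 10, "gt", 10))   # False: nothing in [0, 10] exceeds 10
print(can_satisfy(0, 10, "lt", -1))   # False: nothing in [0, 10] is below -1
```

A True result only means the constraint is not logically impossible. A very narrow feasible region can still make generation slow or cause it to fail.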

Last updated 29 days ago