Define your Data Columns
Introduction to Column Definition
In Data Designer, columns are the fundamental building blocks that determine what data you'll generate and how it will be structured. This guide introduces the key concepts for defining columns that produce high-quality synthetic data.
The Column Definition Process
The Data Designer workflow revolves around defining columns that work together to produce realistic data. Each column definition specifies:
What type of data to generate (statistical distributions, categories, AI-generated text, etc.)
How to generate the data (parameters, prompts, dependencies)
Relationships with other columns (constraints, dependencies)
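As a rough illustration of these three parts, the sketch below models a column definition as a plain Python dataclass. This is not the Data Designer API: `ColumnSpec`, its fields, and the column names are hypothetical and exist only to show how type, generation parameters, and dependencies fit together.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ColumnSpec:
    """Hypothetical stand-in for a column definition (not the Data Designer API)."""
    name: str
    kind: str  # what type of data to generate
    params: dict[str, Any] = field(default_factory=dict)  # how to generate it
    depends_on: list[str] = field(default_factory=list)   # relationships with other columns


columns = [
    ColumnSpec("region", "category", {"values": ["north", "south", "east", "west"]}),
    ColumnSpec("order_total", "uniform", {"low": 5.0, "high": 500.0}),
    ColumnSpec(
        "review",
        "llm-text",
        {"prompt": "Write a short review from a customer in {region} who spent {order_total}."},
        depends_on=["region", "order_total"],
    ),
]

# Sanity check: every dependency must refer to a column that is actually defined.
known = {c.name for c in columns}
missing = {dep for c in columns for dep in c.depends_on if dep not in known}
print("undefined dependencies:", missing)
```

Checking for undefined dependencies up front is a cheap way to catch typos in column names before any data is generated.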
Key Concepts in Column Definition
Column Types
Data Designer supports a rich variety of column types, from simple statistical distributions to complex AI-generated content:
Sampling-based columns: Generate data through statistical methods (categories, numbers, dates)
LLM-based columns: Generate realistic text and structured content using large language models
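The difference between the two families can be sketched in plain Python. The example below is an assumption-laden illustration, not Data Designer code: the sampling-based columns are drawn with the standard `random` module, and the LLM-based column is represented only by the prompt that would be sent to a model (no model is called). The column names and the prompt text are hypothetical.

```python
import random

rng = random.Random(0)  # seeded so repeated runs draw the same values


def sample_row():
    """Sampling-based columns: values come from statistical draws."""
    return {
        "plan": rng.choice(["free", "pro", "enterprise"]),
        "seats": rng.randint(1, 50),
    }


# LLM-based column: a prompt template that references the sampled columns.
PROMPT = "Write a one-sentence support ticket from a {plan} customer with {seats} seats."


def render_prompt(row):
    """Fill the template with one row's sampled values."""
    return PROMPT.format(**row)


rows = [sample_row() for _ in range(3)]
prompts = [render_prompt(r) for r in rows]
for p in prompts:
    print(p)
```

The sampled columns give the generated text its variety, while the prompt controls its style and content.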
Learn more about column types →
Column Constraints
Constraints allow you to control the values your columns can contain, enforcing business rules and maintaining data consistency:
Scalar constraints: Restrict numerical values to specific ranges
Column relationships: Ensure logical relationships between columns
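To make the two constraint kinds concrete, here is a minimal sketch that enforces them by rejection sampling: draw a candidate row, keep it only if every constraint holds. Rejection sampling is an assumption chosen for illustration and may not be how Data Designer enforces constraints internally. The columns `age` and `years_employed` and the specific rules are hypothetical.

```python
import random

rng = random.Random(42)


def scalar_ok(row):
    """Scalar constraint: age must fall within a fixed range."""
    return 18 <= row["age"] <= 90


def relation_ok(row):
    """Column relationship: nobody has worked longer than their age minus 16."""
    return row["years_employed"] <= row["age"] - 16


def sample_candidate():
    """Draw an unconstrained candidate row."""
    return {"age": rng.randint(0, 100), "years_employed": rng.randint(0, 50)}


def sample_valid(max_tries=1000):
    """Redraw until a candidate satisfies every constraint."""
    for _ in range(max_tries):
        row = sample_candidate()
        if scalar_ok(row) and relation_ok(row):
            return row
    raise RuntimeError("no valid row found; constraints may be too strict")


rows = [sample_valid() for _ in range(5)]
for r in rows:
    print(r)
```

The `max_tries` guard matters: constraints that contradict each other would otherwise loop forever instead of failing with a clear error.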
Learn more about column constraints →
Designing an Effective Column Strategy
Best Practices for Column Definition
Start with seed data: Define your categorical and structural columns first
Build relationships: Create dependencies between related columns
Layer complexity: Begin with basic columns, then add more sophisticated ones
Preview frequently: Use aidd.preview() to validate your design iteratively
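The first three practices amount to ordering columns so that each one is defined after the columns it depends on. The sketch below shows one way to reason about that order with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). It is independent of Data Designer, and the column names and dependency map are hypothetical.

```python
from graphlib import TopologicalSorter

# Map each column to the set of columns it depends on.
dependencies = {
    "region": set(),                            # seed column
    "product": set(),                           # seed column
    "price": {"product"},                       # built on a seed column
    "review": {"region", "product", "price"},   # most complex, added last
}

# static_order() yields every column after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

If the dependency map contains a cycle, `TopologicalSorter` raises `graphlib.CycleError`, which flags a design in which two columns each require the other to exist first.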