Define your Data Columns

Introduction to Column Definition

In Data Designer, columns are the fundamental building blocks that determine what data you'll generate and how it will be structured. This guide introduces the key concepts for defining columns that produce high-quality synthetic data.

The Column Definition Process

The Data Designer workflow revolves around defining columns that work together to produce realistic data. Each column definition specifies:

  • What type of data to generate (statistical distributions, categories, AI-generated text, etc.)

  • How to generate the data (parameters, prompts, dependencies)

  • Relationships with other columns (constraints, dependencies)
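
As a rough illustration of those three parts, the sketch below models a column definition as a plain Python dataclass. `ColumnSpec` and its field names are hypothetical and exist only for this example; they are not Data Designer classes or parameters.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ColumnSpec:
    """Hypothetical container for illustration; not part of Data Designer."""
    name: str
    kind: str                                              # what type of data to generate
    params: dict[str, Any] = field(default_factory=dict)   # how to generate it
    depends_on: list[str] = field(default_factory=list)    # relationships with other columns


# A categorical column with no dependencies.
region = ColumnSpec(
    name="region",
    kind="category",
    params={"values": ["north", "south", "east", "west"]},
)

# A text column whose prompt refers to the column above.
review = ColumnSpec(
    name="review",
    kind="llm-text",
    params={"prompt": "Write a short review from a customer in the {region} region."},
    depends_on=["region"],
)
```

Keeping the "what", "how", and "relationships" in separate fields makes it easy to see at a glance which columns can be generated independently and which need other columns first.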

Key Concepts in Column Definition

Column Types

Data Designer supports column types ranging from simple statistical distributions to AI-generated content. They fall into two broad groups:

  • Sampling-based columns: Generate data through statistical methods (categories, numbers, dates)

  • LLM-based columns: Generate realistic text and structured content using large language models
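
To make the distinction concrete, here is a minimal sketch that uses only the Python standard library rather than the Data Designer API. The column names, category weights, and prompt text are assumptions chosen for illustration. Sampled columns are drawn from statistical rules, and the text column is represented only by the prompt that would be sent to a language model.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable


def sample_row(rng: random.Random) -> dict:
    """Sampling-based columns: values come from statistical rules."""
    return {
        # Categorical column with assumed weights.
        "plan": rng.choices(["free", "pro", "enterprise"], weights=[0.6, 0.3, 0.1])[0],
        # Numeric column drawn uniformly from an assumed range.
        "seats": rng.randint(1, 50),
        # Date-like column expressed as a day of the year.
        "signup_day": rng.randint(1, 365),
    }


rows = [sample_row(rng) for _ in range(5)]

# LLM-based column: the sampled values fill a prompt template.
# No model is called here; only the prompts are built.
prompt_template = (
    "Write a one-sentence support request from a customer "
    "on the {plan} plan with {seats} seats."
)
prompts = [prompt_template.format(**row) for row in rows]
```

The sampled columns give the generated text something specific to vary on, which is why they are usually defined before the text columns that reference them.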

Learn more about column types →

Column Constraints

Constraints allow you to control the values your columns can contain, enforcing business rules and maintaining data consistency:

  • Scalar constraints: Restrict numerical values to specific ranges

  • Column relationships: Ensure logical relationships between columns
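
The two kinds of constraint can be sketched in plain Python. This example uses rejection sampling (resample until every rule holds) as a stand-in technique; it is not a description of how Data Designer enforces constraints, and the column names and limits are assumptions.

```python
import random


def sample_priced_row(rng: random.Random, max_tries: int = 1000) -> dict:
    """Draw a row, retrying until both constraints are satisfied."""
    for _ in range(max_tries):
        row = {
            "list_price": round(rng.uniform(0, 200), 2),
            "sale_price": round(rng.uniform(0, 200), 2),
        }
        # Scalar constraint: restrict a numeric column to a range.
        scalar_ok = row["list_price"] >= 5
        # Column relationship: one column must not exceed another.
        relation_ok = row["sale_price"] <= row["list_price"]
        if scalar_ok and relation_ok:
            return row
    raise RuntimeError("no row satisfied the constraints within max_tries")


rng = random.Random(11)
priced_rows = [sample_priced_row(rng) for _ in range(20)]
```

Every row in `priced_rows` has a list price of at least 5 and a sale price no higher than its list price, so downstream columns can rely on those rules.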

Learn more about column constraints →

Designing an Effective Column Strategy

Best Practices for Column Definition

  1. Start with seed data: Define your categorical and structural columns first

  2. Build relationships: Create dependencies between related columns

  3. Layer complexity: Begin with basic columns, then add more sophisticated ones

  4. Preview frequently: Use aidd.preview() to validate your design iteratively
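
One way to reason about the first three practices is to write down which columns depend on which, then derive a build order from that. The sketch below does this with `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+). The column names are hypothetical, and this is a planning aid rather than part of the Data Designer API.

```python
from graphlib import TopologicalSorter

# Map each column to the set of columns it depends on.
dependencies = {
    "region": set(),                          # seed column
    "product": set(),                         # seed column
    "price": {"product"},                     # depends on one seed column
    "review": {"region", "product", "price"}, # most sophisticated column, added last
}

# static_order() yields every column after all of its dependencies.
build_order = list(TopologicalSorter(dependencies).static_order())
```

Seed columns appear early in `build_order` and the most dependent columns appear last, which matches the "start simple, then layer complexity" approach. A circular dependency raises `graphlib.CycleError`, which is a useful early signal that the design needs rethinking.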
