Structured Outputs

Data Designer can generate structured, complex data objects in addition to plain text. This guide explains how to use structured outputs in your data generation workflows.

What Are Structured Outputs?

Structured outputs allow you to generate data with specific formats, schemas, and nested relationships. Instead of generating free-form text, you can generate JSON objects, Python data structures, or any other data conforming to a specific schema.

Use cases include:

  • Complex nested records (e.g., orders with line items)

  • Data with specific validation rules

  • Nested arrays and objects

  • Structured conversation data

Defining Data Models with Pydantic

The most common way to define structured outputs is using Pydantic models:

from pydantic import BaseModel, Field

# Define a simple product model
class Product(BaseModel):
    name: str = Field(..., description="Name of the product")
    price: float = Field(..., description="Price in USD")
    category: str = Field(..., description="Product category")
    in_stock: bool = Field(..., description="Whether the product is in stock")

# Define an order with nested products
class Order(BaseModel):
    order_id: str = Field(..., description="Unique order identifier")
    customer_name: str = Field(..., description="Name of the customer")
    order_date: str = Field(..., description="Date the order was placed")
    total_amount: float = Field(..., description="Total order amount")
    products: list[Product] = Field(..., description="List of products in the order")
    shipping_address: dict = Field(..., description="Shipping address")
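To see what these models enforce, it can help to validate a hand-written record locally before wiring the model into a column. The sketch below repeats the two models so it runs on its own; the `sample` record is hypothetical and written for illustration, not produced by Data Designer.

```python
from pydantic import BaseModel, Field, ValidationError


class Product(BaseModel):
    name: str = Field(..., description="Name of the product")
    price: float = Field(..., description="Price in USD")
    category: str = Field(..., description="Product category")
    in_stock: bool = Field(..., description="Whether the product is in stock")


class Order(BaseModel):
    order_id: str = Field(..., description="Unique order identifier")
    customer_name: str = Field(..., description="Name of the customer")
    order_date: str = Field(..., description="Date the order was placed")
    total_amount: float = Field(..., description="Total order amount")
    products: list[Product] = Field(..., description="List of products in the order")
    shipping_address: dict = Field(..., description="Shipping address")


# Hypothetical record, written by hand for illustration (not model output).
sample = {
    "order_id": "ORD-0001",
    "customer_name": "Ada Example",
    "order_date": "2023-06-15",
    "total_amount": 24.5,
    "products": [
        {"name": "Notebook", "price": 4.5, "category": "Stationery", "in_stock": True},
        {"name": "Desk lamp", "price": 20.0, "category": "Home", "in_stock": False},
    ],
    "shipping_address": {"city": "Springfield", "state": "IL"},
}

# Constructing the model validates the record; nested dicts become Product instances.
order = Order(**sample)
first_product = order.products[0]

# Dropping a required field makes validation fail.
incomplete = {key: value for key, value in sample.items() if key != "products"}
try:
    Order(**incomplete)
    missing_field_rejected = False
except ValidationError:
    missing_field_rejected = True
```

Every field declared with `Field(...)` is required, so a record without `products` is rejected rather than silently accepted.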

Using Structured Outputs in Data Designer

Once you've defined your data models, you can use them in your Data Designer columns:

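# Generate structured order data.
# Assumes `aidd` is an existing Data Designer instance, `C` is
# gretel_client.data_designer.columns, and a `customer` column already exists.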
aidd.add_column(
    C.LLMStructuredColumn(
        name="order_data",
        prompt="""
            Generate a realistic order for a customer named {{customer.first_name}} {{customer.last_name}}
            from {{customer.city}}, {{customer.state}}.
            Include between 1 and 5 products in the order.
            The order should be placed on a date between 2023-01-01 and 2023-12-31.
        """,
        output_format=Order
    )
)

Alternative: Using JSON Schema

If you prefer, you can also use JSON Schema directly:

# Using JSON schema directly
aidd.add_column(
    C.LLMStructuredColumn(
        name="order_data_from_schema",  # illustrative column name
        prompt="Generate a realistic customer order with products.",
        output_format={
            "type": "structured", 
            "params": {
                "json_schema": {
                    "type": "object",
                    "properties": {
                        "order_id": {"type": "string"},
                        "customer_name": {"type": "string"},
                        "order_date": {"type": "string", "format": "date"},
                        "total_amount": {"type": "number"},
                        "products": {
                            "type": "array",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "name": {"type": "string"},
                                    "price": {"type": "number"},
                                    "category": {"type": "string"},
                                    "in_stock": {"type": "boolean"}
                                },
                                "required": ["name", "price", "category", "in_stock"]
                            }
                        },
                        "shipping_address": {"type": "object"}
                    },
                    "required": ["order_id", "customer_name", "order_date", "total_amount", "products"]
                }
            }
        }
    )
)
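If you already have Pydantic models, you may not need to write the schema by hand. The sketch below derives a JSON Schema from the `Order` model, assuming Pydantic v2, where the method is `model_json_schema()` (Pydantic v1 exposes `schema()` instead). Nested models appear under `$defs` and are referenced with `$ref`. Whether Data Designer accepts a schema in that shape is not confirmed by this guide, so treat this as a way to inspect the schema, not as a drop-in replacement for the example above.

```python
from pydantic import BaseModel, Field


class Product(BaseModel):
    name: str = Field(..., description="Name of the product")
    price: float = Field(..., description="Price in USD")
    category: str = Field(..., description="Product category")
    in_stock: bool = Field(..., description="Whether the product is in stock")


class Order(BaseModel):
    order_id: str = Field(..., description="Unique order identifier")
    customer_name: str = Field(..., description="Name of the customer")
    order_date: str = Field(..., description="Date the order was placed")
    total_amount: float = Field(..., description="Total order amount")
    products: list[Product] = Field(..., description="List of products in the order")
    shipping_address: dict = Field(..., description="Shipping address")


# Assumption: Pydantic v2. Returns a plain dict describing the model.
schema = Order.model_json_schema()

required_fields = sorted(schema["required"])
products_schema = schema["properties"]["products"]
nested_definitions = sorted(schema["$defs"])
```

Note that the Pydantic model marks `shipping_address` as required, while the hand-written schema above leaves it out of `required`; comparing the two is a quick way to spot that kind of drift.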

Example: Generate a Fruit Salad

from gretel_client.navigator_client import Gretel
from gretel_client.data_designer import columns as C
from gretel_client.data_designer import params as P

# Create the Data Designer instance.
# api_key="prompt" asks for the API key interactively.
gretel = Gretel(api_key="prompt")

aidd = gretel.data_designer.new(
    model_suite="apache-2.0"  # Use apache-2.0 or llama-3.x based on your licensing needs
)

Add a category column for diversity

aidd.add_column(
    C.SamplerColumn(
        name="region",
        type=P.SamplerType.CATEGORY,
        params=P.CategorySamplerParams(
            values=["Thailand", "France", "South Africa"]
        )
    )
)

Define the Pydantic structure

from pydantic import BaseModel, Field

class Fruit(BaseModel):
    name: str = Field(..., description="Name of the fruit.")
    cost: float = Field(..., description="Dollar value of the fruit.")
    weight: float = Field(..., description="Weight in lbs.")
    flavor: str = Field(..., description="Primary flavor profile of the fruit.")
    preparation: str = Field(..., description="How to prepare the fruit for a fruit salad.")


class FruitSalad(BaseModel):
    total_cost: float = Field(..., description="Total cost of all fruits.")
    name: str = Field(..., description="Name of this unique fruit salad.")
    haiku: str = Field(..., description="A beautiful haiku about this fruit salad.")
    fruits: list[Fruit]

Define our fruit salad column

# Tell Data Designer to generate some fruit salads
aidd.add_column(
    C.LLMStructuredColumn(
        name="fruit_salad",
        prompt=(
            "Create a description of fruits to go in a regional fruit salad from {{region}}!"
        ),
        output_format=FruitSalad
    )
)

Preview the data!

preview = aidd.preview()
preview.display_sample_record() 
                                                                                                                 
                                                 Generated Columns                                                 
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Name        ┃ Value                                                                                             ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ region      │ France                                                                                            │
├─────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
│ fruit_salad │ {                                                                                                 │
│             │     'total_cost': 12.5,                                                                           │
│             │     'name': 'French Regional Fruit Salad',                                                        │
│             │     'haiku': 'Peaches and plums bloom,\nApricots whisper in sun,\nFlavors of Provence.',          │
│             │     'fruits': [                                                                                   │
│             │         {                                                                                         │
│             │             'name': 'Peach',                                                                      │
│             │             'cost': 3.5,                                                                          │
│             │             'weight': 2,                                                                          │
│             │             'flavor': 'Sweet and juicy',                                                          │
│             │             'preparation': 'Slice the peaches into thin rounds and gently toss them with a bit of │
│             │ lemon juice to prevent browning.'                                                                 │
│             │         },                                                                                        │
│             │         {                                                                                         │
│             │             'name': 'Plum',                                                                       │
│             │             'cost': 4,                                                                            │
│             │             'weight': 1.5,                                                                        │
│             │             'flavor': 'Slightly tart and sweet',                                                  │
│             │             'preparation': 'Halve the plums and remove the pits before adding them to the salad.  │
│             │ Sprinkle with a touch of sugar for balance.'                                                      │
│             │         },                                                                                        │
│             │         {                                                                                         │
│             │             'name': 'Apricot',                                                                    │
│             │             'cost': 5,                                                                            │
│             │             'weight': 1,                                                                          │
│             │             'flavor': 'Rich and tangy',                                                           │
│             │             'preparation': 'Quarter the apricots and remove the pits. They can be added to the    │
│             │ salad whole or sliced.'                                                                           │
│             │         }                                                                                         │
│             │     ]                                                                                             │
│             │ }                                                                                                 │
└─────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────┘
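A schema constrains field names and types, but it does not by itself guarantee that related fields agree with each other. A minimal sketch of one such check follows, using an abridged copy of the record shown in the preview above; the helper name `cost_is_consistent` and the tolerance are illustrative choices, not part of Data Designer.

```python
import math

# Abridged copy of the fruit_salad value from the preview above
# (haiku, flavor, and preparation fields omitted for brevity).
fruit_salad = {
    "total_cost": 12.5,
    "name": "French Regional Fruit Salad",
    "fruits": [
        {"name": "Peach", "cost": 3.5, "weight": 2},
        {"name": "Plum", "cost": 4, "weight": 1.5},
        {"name": "Apricot", "cost": 5, "weight": 1},
    ],
}


def cost_is_consistent(salad: dict, tolerance: float = 0.01) -> bool:
    """Return True if total_cost matches the sum of the per-fruit costs."""
    summed = sum(fruit["cost"] for fruit in salad["fruits"])
    return math.isclose(salad["total_cost"], summed, abs_tol=tolerance)


consistent = cost_is_consistent(fruit_salad)
```

In this sample the per-fruit costs (3.5, 4, and 5) add up to the reported `total_cost` of 12.5, so the check passes. Running a check like this across a full preview is one way to see how often generated totals match.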

Best Practices

  1. Choose the Right Format: Pick Pydantic models or a raw JSON Schema, whichever fits your workflow better

  2. Use Validation: Define explicit schemas with typed fields and a description for each field

  3. Provide Clear Prompts: Be specific about the expected format and content

  4. Reference Previous Columns: Use Jinja templates such as {{region}} to reference data from other columns

  5. Test Incrementally: Preview your results to ensure the output matches expectations
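As one way to act on the validation advice above, Pydantic fields can carry constraints such as minimum values or minimum lengths. The sketch below shows a trimmed-down `Fruit` model with constraints and a hypothetical helper, `is_valid_fruit`, for checking generated records locally. This guide does not say whether Data Designer passes such constraints on to the model during generation, so this only demonstrates validation after the fact.

```python
from pydantic import BaseModel, Field, ValidationError


class Fruit(BaseModel):
    # Constraints are illustrative additions to the Fruit model used earlier.
    name: str = Field(..., min_length=1, description="Name of the fruit.")
    cost: float = Field(..., ge=0, description="Dollar value of the fruit.")
    weight: float = Field(..., gt=0, description="Weight in lbs.")


def is_valid_fruit(record: dict) -> bool:
    """Return True if the record satisfies the Fruit schema and its constraints."""
    try:
        Fruit(**record)
        return True
    except ValidationError:
        return False


good = is_valid_fruit({"name": "Peach", "cost": 3.5, "weight": 2})
negative_cost = is_valid_fruit({"name": "Peach", "cost": -1, "weight": 2})
empty_name = is_valid_fruit({"name": "", "cost": 1, "weight": 1})
```

Records that break a constraint, such as a negative cost or an empty name, fail validation and can be filtered out or regenerated.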


Last updated 29 days ago
