Structured Outputs

Data Designer can generate structured, nested data objects in addition to plain text. This guide explains how to use structured outputs in your data generation workflows.

What Are Structured Outputs?

Structured outputs allow you to generate data with specific formats, schemas, and nested relationships. Instead of generating free-form text, you can generate JSON objects, Python data structures, or any other data conforming to a specific schema.

Use cases include:

  • Complex nested records (e.g., orders with line items)

  • Data with specific validation rules

  • Nested arrays and objects

  • Structured conversation data

Defining Data Models with Pydantic

The most common way to define structured outputs is using Pydantic models:

from pydantic import BaseModel, Field

# Define a simple product model
class Product(BaseModel):
    name: str = Field(..., description="Name of the product")
    price: float = Field(..., description="Price in USD")
    category: str = Field(..., description="Product category")
    in_stock: bool = Field(..., description="Whether the product is in stock")

# Define an order with nested products
class Order(BaseModel):
    order_id: str = Field(..., description="Unique order identifier")
    customer_name: str = Field(..., description="Name of the customer")
    order_date: str = Field(..., description="Date the order was placed")
    total_amount: float = Field(..., description="Total order amount")
    products: list[Product] = Field(..., description="List of products in the order")
    shipping_address: dict = Field(..., description="Shipping address")
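Before wiring these models into a column, it can help to confirm that they accept and reject the records you expect. The following is a minimal sketch, assuming Pydantic v2 (`model_validate`) and Python 3.9+; the sample record is hypothetical and not produced by Data Designer.

```python
from pydantic import BaseModel, Field, ValidationError


class Product(BaseModel):
    name: str = Field(..., description="Name of the product")
    price: float = Field(..., description="Price in USD")
    category: str = Field(..., description="Product category")
    in_stock: bool = Field(..., description="Whether the product is in stock")


class Order(BaseModel):
    order_id: str = Field(..., description="Unique order identifier")
    customer_name: str = Field(..., description="Name of the customer")
    order_date: str = Field(..., description="Date the order was placed")
    total_amount: float = Field(..., description="Total order amount")
    products: list[Product] = Field(..., description="List of products in the order")
    shipping_address: dict = Field(..., description="Shipping address")


# Hypothetical record, shaped like the output the schema describes
sample = {
    "order_id": "ORD-0001",
    "customer_name": "Ada Nguyen",
    "order_date": "2023-06-14",
    "total_amount": 24.99,
    "products": [
        {"name": "Desk lamp", "price": 24.99, "category": "Home", "in_stock": True}
    ],
    "shipping_address": {"city": "Austin", "state": "TX"},
}

# A well-formed record parses into nested model instances
order = Order.model_validate(sample)
print(order.products[0].name)

# A product missing required fields is rejected, and the error names each one
incomplete = dict(sample, products=[{"name": "Desk lamp"}])
missing_fields = []
try:
    Order.model_validate(incomplete)
except ValidationError as exc:
    missing_fields = sorted(
        err["loc"][-1] for err in exc.errors() if err["type"] == "missing"
    )
print(missing_fields)
```

Because every field uses `Field(...)` with no default, all fields are required, including those of each nested `Product`.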

Using Structured Outputs in Data Designer

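Once you've defined your data models, you can use them in your Data Designer columns. In the snippets below, `aidd` is a Data Designer instance and `C` is the columns module; both are set up in the fruit salad example later in this guide.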

# Generate structured order data
aidd.add_column(
    C.LLMStructuredColumn(
        name="order_data",
        prompt="""
            Generate a realistic order for a customer named {{customer.first_name}} {{customer.last_name}}
            from {{customer.city}}, {{customer.state}}.
            Include between 1 and 5 products in the order.
            The order should be placed on a date between 2023-01-01 and 2023-12-31.
        """,
        output_format=Order
    )
)
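The `{{customer.first_name}}` placeholders in the prompt above are Jinja expressions that are filled in per row from other columns. As a rough illustration of how such a template resolves, here is a sketch that renders a similar prompt with the `jinja2` library directly. It assumes standard Jinja2 semantics; the row values are hypothetical, and Data Designer's own rendering may differ in detail.

```python
from jinja2 import Template

# Same placeholders as the prompt above, shortened to one sentence
prompt = Template(
    "Generate a realistic order for a customer named "
    "{{ customer.first_name }} {{ customer.last_name }} "
    "from {{ customer.city }}, {{ customer.state }}."
)

# Hypothetical row: a "customer" column holding a nested record
row = {
    "customer": {
        "first_name": "Ada",
        "last_name": "Nguyen",
        "city": "Austin",
        "state": "TX",
    }
}

# Dotted access such as customer.city resolves against the nested dict
rendered = prompt.render(**row)
print(rendered)
```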

Alternative: Using JSON Schema

If you prefer, you can also use JSON Schema directly:

# Using JSON schema directly
aidd.add_column(
    C.LLMStructuredColumn(
        name="order_data",
        prompt="Generate a realistic customer order with products.",
        output_format={
            "type": "structured",
            "params": {
                "json_schema": {
                    "type": "object",
                    "properties": {
                        "order_id": {"type": "string"},
                        "customer_name": {"type": "string"},
                        "order_date": {"type": "string", "format": "date"},
                        "total_amount": {"type": "number"},
                        "products": {
                            "type": "array",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "name": {"type": "string"},
                                    "price": {"type": "number"},
                                    "category": {"type": "string"},
                                    "in_stock": {"type": "boolean"}
                                },
                                "required": ["name", "price", "category", "in_stock"]
                            }
                        },
                        "shipping_address": {"type": "object"}
                    },
                    "required": ["order_id", "customer_name", "order_date", "total_amount", "products"]
                }
            }
        }
    )
)
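If you already have a Pydantic model, you do not have to write the JSON Schema by hand: Pydantic can derive one. The sketch below assumes Pydantic v2 (`model_json_schema`) and uses a trimmed-down version of the order model. Note that Pydantic places nested models under `$defs` and refers to them with `$ref` instead of inlining them as in the hand-written schema above; whether the `json_schema` parameter accepts that form is not covered in this guide, so treat it as something to verify.

```python
import json

from pydantic import BaseModel, Field


class Product(BaseModel):
    name: str = Field(..., description="Name of the product")
    price: float = Field(..., description="Price in USD")


class Order(BaseModel):
    order_id: str = Field(..., description="Unique order identifier")
    products: list[Product] = Field(..., description="List of products in the order")


# Derive a JSON Schema dict from the model definition
schema = Order.model_json_schema()
print(json.dumps(schema, indent=2))

# Nested models are emitted once under "$defs" and referenced by "$ref"
items_ref = schema["properties"]["products"]["items"]["$ref"]
print(items_ref)
```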

Example: Generate a Fruit Salad

from gretel_client.navigator_client import Gretel
from gretel_client.data_designer import columns as C
from gretel_client.data_designer import params as P

# Create the Data Designer instance
gretel = Gretel(api_key="prompt")

aidd = gretel.data_designer.new(
    model_suite="apache-2.0"  # Use apache-2.0 or llama-3.x based on your licensing needs
)

Add a category column for diversity

aidd.add_column(
    C.SamplerColumn(
        name="region",
        type=P.SamplerType.CATEGORY,
        params=P.CategorySamplerParams(
            values=["Thailand", "France", "South Africa"]
        )
    )
)

Define the Pydantic structure

from pydantic import BaseModel, Field

class Fruit(BaseModel):
    name: str = Field(..., description="Name of the fruit.")
    cost: float = Field(..., description="Dollar value of the fruit.")
    weight: float = Field(..., description="Weight in lbs.")
    flavor: str = Field(..., description="Primary flavor profile of the fruit.")
    preparation: str = Field(..., description="How to prepare the fruit for a fruit salad.")


class FruitSalad(BaseModel):
    total_cost: float = Field(..., description="Total cost of all fruits.")
    name: str = Field(..., description="Name of this unique fruit salad.")
    haiku: str = Field(..., description="A beautiful haiku about this fruit salad.")
    fruits: list[Fruit]

Define our fruit salad column

# Tell Data Designer to generate some fruit salads
aidd.add_column(
    C.LLMStructuredColumn(
        name="fruit_salad",
        prompt=(
            "Create a description of fruits to go in a regional fruit salad from {{region}}!"
        ),
        output_format=FruitSalad
    )
)

Preview the data!

preview = aidd.preview()
preview.display_sample_record()
                                                                                                                 
                                                 Generated Columns                                                 
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Name        ┃ Value                                                                                             ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ region      │ France                                                                                            │
├─────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────┤
│ fruit_salad │ {                                                                                                 │
│             │     'total_cost': 12.5,                                                                           │
│             │     'name': 'French Regional Fruit Salad',                                                        │
│             │     'haiku': 'Peaches and plums bloom,\nApricots whisper in sun,\nFlavors of Provence.',          │
│             │     'fruits': [                                                                                   │
│             │         {                                                                                         │
│             │             'name': 'Peach',                                                                      │
│             │             'cost': 3.5,                                                                          │
│             │             'weight': 2,                                                                          │
│             │             'flavor': 'Sweet and juicy',                                                          │
│             │             'preparation': 'Slice the peaches into thin rounds and gently toss them with a bit of │
│             │ lemon juice to prevent browning.'                                                                 │
│             │         },                                                                                        │
│             │         {                                                                                         │
│             │             'name': 'Plum',                                                                       │
│             │             'cost': 4,                                                                            │
│             │             'weight': 1.5,                                                                        │
│             │             'flavor': 'Slightly tart and sweet',                                                  │
│             │             'preparation': 'Halve the plums and remove the pits before adding them to the salad.  │
│             │ Sprinkle with a touch of sugar for balance.'                                                      │
│             │         },                                                                                        │
│             │         {                                                                                         │
│             │             'name': 'Apricot',                                                                    │
│             │             'cost': 5,                                                                            │
│             │             'weight': 1,                                                                          │
│             │             'flavor': 'Rich and tangy',                                                           │
│             │             'preparation': 'Quarter the apricots and remove the pits. They can be added to the    │
│             │ salad whole or sliced.'                                                                           │
│             │         }                                                                                         │
│             │     ]                                                                                             │
│             │ }                                                                                                 │
└─────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────┘
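A schema guarantees the shape of the output, not its internal consistency. In the record above, `total_cost` happens to equal the sum of the individual fruit costs, but nothing in the schema enforces that. The following sketch shows one way to check such a relationship after generation. The `total_matches` helper is hypothetical, and the record is copied from the sample output with the text fields omitted.

```python
import math

# Numeric fields copied from the sample record shown above
record = {
    "total_cost": 12.5,
    "name": "French Regional Fruit Salad",
    "fruits": [
        {"name": "Peach", "cost": 3.5, "weight": 2},
        {"name": "Plum", "cost": 4, "weight": 1.5},
        {"name": "Apricot", "cost": 5, "weight": 1},
    ],
}


def total_matches(salad: dict, tolerance: float = 0.01) -> bool:
    """Return True if total_cost equals the sum of fruit costs within tolerance."""
    computed = sum(fruit["cost"] for fruit in salad["fruits"])
    return math.isclose(computed, salad["total_cost"], abs_tol=tolerance)


print(total_matches(record))
```

Records that fail a check like this can be filtered out or flagged for review before the dataset is used.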

Best Practices

  1. Choose the Right Format: Use a Pydantic model or a JSON Schema, whichever fits how your schema is already defined

  2. Define Clear Schemas: Give every field a type and a description so the expected structure is unambiguous

  3. Provide Clear Prompts: Be specific about the expected format and content

  4. Reference Previous Columns: Use Jinja templates such as {{region}} to reference data from other columns

  5. Test Incrementally: Preview your results to ensure the output matches expectations
