Data Designer can generate structured, nested data objects, not just free-form text. This guide explains how to use structured outputs in your data generation workflows.
What Are Structured Outputs?
Structured outputs allow you to generate data with specific formats, schemas, and nested relationships. Instead of generating free-form text, you can generate JSON objects, Python data structures, or any other data conforming to a specific schema.
Use cases include:
Complex nested records (e.g., orders with line items)
Data with specific validation rules
Nested arrays and objects
Structured conversation data
Defining Data Models with Pydantic
The most common way to define structured outputs is using Pydantic models:
```python
from pydantic import BaseModel, Field


# Define a simple product model
class Product(BaseModel):
    name: str = Field(..., description="Name of the product")
    price: float = Field(..., description="Price in USD")
    category: str = Field(..., description="Product category")
    in_stock: bool = Field(..., description="Whether the product is in stock")


# Define an order with nested products
class Order(BaseModel):
    order_id: str = Field(..., description="Unique order identifier")
    customer_name: str = Field(..., description="Name of the customer")
    order_date: str = Field(..., description="Date the order was placed")
    total_amount: float = Field(..., description="Total order amount")
    products: list[Product] = Field(..., description="List of products in the order")
    shipping_address: dict = Field(..., description="Shipping address")
```
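Because these are ordinary Pydantic models, they can also act as validators, which covers the "data with specific validation rules" use case. The sketch below assumes Pydantic v2 (`model_validate`, list `min_length`/`max_length`) and checks a hand-written sample record, not real Data Designer output. The `gt=0` and 1-to-5 product constraints are illustrative additions, not part of the models above.

```python
from pydantic import BaseModel, Field, ValidationError


class Product(BaseModel):
    name: str = Field(..., description="Name of the product")
    # Illustrative constraint (not in the original model): prices must be positive
    price: float = Field(..., gt=0, description="Price in USD")
    category: str = Field(..., description="Product category")
    in_stock: bool = Field(..., description="Whether the product is in stock")


class Order(BaseModel):
    order_id: str = Field(..., description="Unique order identifier")
    customer_name: str = Field(..., description="Name of the customer")
    order_date: str = Field(..., description="Date the order was placed")
    total_amount: float = Field(..., description="Total order amount")
    # Illustrative constraint: 1 to 5 products, mirroring the order prompt's request
    products: list[Product] = Field(
        ..., min_length=1, max_length=5, description="List of products in the order"
    )
    shipping_address: dict = Field(..., description="Shipping address")


# Hypothetical record shaped like a structured LLM response (made-up values)
record = {
    "order_id": "ORD-0001",
    "customer_name": "Ada Lovelace",
    "order_date": "2023-06-15",
    "total_amount": 42.5,
    "products": [
        {"name": "Notebook", "price": 12.5, "category": "Stationery", "in_stock": True},
        {"name": "Desk Lamp", "price": 30.0, "category": "Home", "in_stock": False},
    ],
    "shipping_address": {"street": "1 Main St", "city": "Austin", "state": "TX"},
}

# Valid input is parsed into nested Order/Product objects
order = Order.model_validate(record)
print(len(order.products), order.products[0].price)

# A negative price violates gt=0, so validation raises ValidationError
bad = dict(record, products=[dict(record["products"][0], price=-1.0)])
try:
    Order.model_validate(bad)
except ValidationError as exc:
    print("rejected:", exc.error_count(), "error(s)")
```

Constraints like these are enforced when you validate records yourself; whether Data Designer also passes them to the model during generation is not covered here.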
Using Structured Outputs in Data Designer
Once you've defined your data models, pass one as the `output_format` of an `LLMStructuredColumn`:
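```python
# Assumes `aidd` (a Data Designer instance) and `C` (the columns module) are set up
# as in the setup code in the next section, and that a `customer` column with
# first_name, last_name, city, and state fields already exists.

# Generate structured order data that conforms to the Order model
aidd.add_column(
    C.LLMStructuredColumn(
        name="order_data",
        prompt="""
        Generate a realistic order for a customer named {{customer.first_name}} {{customer.last_name}}
        from {{customer.city}}, {{customer.state}}.
        Include between 1 and 5 products in the order.
        The order should be placed on a date between 2023-01-01 and 2023-12-31.
        """,
        output_format=Order,
    )
)
```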
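Because `Order` nests a list of `Product` objects, each generated `order_data` value is itself nested. If you want one row per line item (for example, to load into a spreadsheet or SQL table), you can flatten it yourself. A minimal sketch in plain Python, assuming each value arrives as a dict shaped like the `Order` model; `flatten_order` and the sample record are hypothetical.

```python
from typing import Any


def flatten_order(order: dict[str, Any]) -> list[dict[str, Any]]:
    """Hypothetical helper: turn one Order-shaped dict into one flat row per product."""
    # Order-level fields repeated on every row
    header = {
        "order_id": order["order_id"],
        "customer_name": order["customer_name"],
        "order_date": order["order_date"],
    }
    rows = []
    for line_number, product in enumerate(order["products"], start=1):
        row = dict(header, line_number=line_number)
        # Prefix product fields so they can't collide with order-level fields
        row.update({f"product_{key}": value for key, value in product.items()})
        rows.append(row)
    return rows


# Hypothetical generated value (made-up data)
order_data = {
    "order_id": "ORD-0001",
    "customer_name": "Ada Lovelace",
    "order_date": "2023-06-15",
    "total_amount": 42.5,
    "products": [
        {"name": "Notebook", "price": 12.5, "category": "Stationery", "in_stock": True},
        {"name": "Desk Lamp", "price": 30.0, "category": "Home", "in_stock": False},
    ],
    "shipping_address": {"street": "1 Main St", "city": "Austin", "state": "TX"},
}

for row in flatten_order(order_data):
    print(row)
```

Order-level fields such as `total_amount` and `shipping_address` are deliberately left out of the rows; add them to `header` if you need them on every line item.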
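End-to-End Example: Generating Fruit Salads
The following example puts the pieces together: it creates a Data Designer instance, defines nested Pydantic models, and adds a structured column that generates one fruit salad per record based on that record's region: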
```python
from pydantic import BaseModel, Field

from gretel_client.navigator_client import Gretel
from gretel_client.data_designer import columns as C
from gretel_client.data_designer import params as P

# Create the Data Designer instance
gretel = Gretel(api_key="prompt")  # "prompt" asks for your API key interactively

aidd = gretel.data_designer.new(
    model_suite="apache-2.0"  # Use apache-2.0 or llama-3.x based on your licensing needs
)


class Fruit(BaseModel):
    name: str = Field(..., description="Name of the fruit.")
    cost: float = Field(..., description="Dollar value of the fruit.")
    weight: float = Field(..., description="Weight in lbs.")
    flavor: str = Field(..., description="Primary flavor profile of the fruit.")
    preparation: str = Field(..., description="How to prepare the fruit for a fruit salad.")


class FruitSalad(BaseModel):
    total_cost: float = Field(..., description="Total cost of all fruits.")
    name: str = Field(..., description="Name of this unique fruit salad.")
    haiku: str = Field(..., description="A beautiful haiku about this fruit salad.")
    fruits: list[Fruit]


# Define the fruit salad column.
# The {{region}} template variable assumes a `region` column has already been added.
aidd.add_column(
    C.LLMStructuredColumn(
        name="fruit_salad",
        prompt=(
            "Create a description of fruits to go in a regional fruit salad from {{region}}!"
        ),
        output_format=FruitSalad,
    )
)
```
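If you'd rather keep schemas as plain JSON, or just want to see exactly what structure a column asks for, Pydantic v2 can export any model as JSON Schema with `model_json_schema()`. Whether `LLMStructuredColumn` accepts such a dict directly as `output_format` is an assumption here; check the column reference before relying on it. The sketch also adds a hypothetical post-generation check, `costs_add_up`, for something the schema cannot express: that `total_cost` matches the sum of the fruit costs, since a model is not guaranteed to get that arithmetic right. The sample salad is made-up data.

```python
import json
import math

from pydantic import BaseModel, Field


class Fruit(BaseModel):
    name: str = Field(..., description="Name of the fruit.")
    cost: float = Field(..., description="Dollar value of the fruit.")
    weight: float = Field(..., description="Weight in lbs.")
    flavor: str = Field(..., description="Primary flavor profile of the fruit.")
    preparation: str = Field(..., description="How to prepare the fruit for a fruit salad.")


class FruitSalad(BaseModel):
    total_cost: float = Field(..., description="Total cost of all fruits.")
    name: str = Field(..., description="Name of this unique fruit salad.")
    haiku: str = Field(..., description="A beautiful haiku about this fruit salad.")
    fruits: list[Fruit]


# JSON Schema for the nested model: Fruit is placed under "$defs" and referenced via "$ref"
schema = FruitSalad.model_json_schema()
print(json.dumps(schema, indent=2))


def costs_add_up(salad: FruitSalad, tol: float = 0.01) -> bool:
    """Hypothetical check: does total_cost match the sum of the fruit costs within tol?"""
    return math.isclose(salad.total_cost, sum(f.cost for f in salad.fruits), abs_tol=tol)


# Hypothetical generated value (made-up data)
salad = FruitSalad.model_validate({
    "total_cost": 5.25,
    "name": "Orchard Morning",
    "haiku": "Crisp apples at dawn / pears soften in morning light / the bowl holds autumn",
    "fruits": [
        {"name": "Apple", "cost": 2.0, "weight": 0.5, "flavor": "tart",
         "preparation": "Core and dice."},
        {"name": "Pear", "cost": 3.25, "weight": 0.6, "flavor": "sweet",
         "preparation": "Peel and slice."},
    ],
})
print(costs_add_up(salad))
```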