Magic Assistance

Overview

The Magic interface within Data Designer allows you to interactively define columns, preview samples, and refine data generation through natural language.

Key Benefits

Automatic prompt generation for LLM columns that reference other columns with correct formatting
Automatic structured output configuration for complex JSON schema definitions
Simplified categorical data creation with automatic inference of appropriate values
Interactive refinement through a conversational interface

Creating and Editing Columns with Magic

Magic SDK offers multiple ways to create columns.

Sampling Columns

Automatically create columns based on distributions or categories without manual configuration.

dd = Gretel().data_designer.new()

# Create a categorical weather column
dd.magic.add_sampling_column(
    "weather",
    "Possible weather types for Tokyo."
)

"""
SamplerColumn(
    name='weather',
    type='category',
    params={
    "values": [
        "Sunny",
        "Cloudy",
        "Rainy",
        "Snowy",
        "Windy"
    ],
    "weights": [
        0.4,
        0.3,
        0.2,
        0.05,
        0.05
    ]
}
)
"""

## Preview samples from the updated Data Designer object
dd.preview().dataset.df["weather"]
"""
0     Rainy
1    Cloudy
2    Cloudy
3    Cloudy
4    Cloudy
5     Sunny
6    Cloudy
7     Sunny
8     Sunny
9     Sunny
Name: weather, dtype: object
"""

Behind the scenes, Magic is doing the following for you:

Infers that a categorical column is appropriate.
Generates relevant values (e.g., "Sunny", "Cloudy", "Rainy", "Snowy", "Windy").
Configures any relevant distribution parameters (here, the likelihood of each weather type).
Updates the Data Designer object (dd) with the new column definition.
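
To make the generated config concrete, here is a minimal sketch of what sampling from a weighted categorical column amounts to. It uses only Python's standard library and the values and weights from the config above. sample_category is a hypothetical helper, not part of the Data Designer SDK, and the sampler Magic configures may work differently.

```python
import random

# Values and weights copied from the generated SamplerColumn config above.
values = ["Sunny", "Cloudy", "Rainy", "Snowy", "Windy"]
weights = [0.4, 0.3, 0.2, 0.05, 0.05]


def sample_category(values, weights, n, seed=None):
    """Draw n values, each with probability proportional to its weight."""
    rng = random.Random(seed)
    return rng.choices(values, weights=weights, k=n)


samples = sample_category(values, weights, n=10, seed=0)
print(samples)
```

With these weights, "Sunny" and "Cloudy" should dominate a large sample, which is consistent with the preview shown earlier.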

If you aren't satisfied with the result, or you'd like to adjust it, call the same function again with the same column name. Subsequent calls edit the existing column in place.

dd = Gretel().data_designer.new()

dd.magic.add_sampling_column(
    "weather",
    "Possible weather types for Tokyo."
)
# values: ["Sunny", "Cloudy", "Rainy", "Snowy", "Windy"]

dd.magic.add_sampling_column(
    "weather",
    "All values should be in Japanese."
)
# "values": ["晴れ", "曇り", "雨", "雪", "風"]

This function can even be used to edit pre-existing, manually created columns.

dd = Gretel().data_designer.new()

## Manually define a uniform sampler column...
dd.add_column(
    name="temperature",
    type="uniform",
    params={"low": 32.0, "high": 212.0}
)

## ...and edit it with Magic.
dd.magic.add_sampling_column(
    "temperature",
    "Change from F to C"
)

"""
SamplerColumn(
    name='temperature',
    type='uniform',
    params={
    "low": 0.0,
    "high": 100.0
})
"""

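The edited bounds follow from the standard Fahrenheit-to-Celsius conversion, C = (F - 32) × 5/9. The short check below works through the arithmetic for the original bounds; fahrenheit_to_celsius is an illustrative helper, not an SDK function.

```python
def fahrenheit_to_celsius(f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (f - 32.0) * 5.0 / 9.0


# Original uniform bounds from the manually defined column.
low_c = fahrenheit_to_celsius(32.0)
high_c = fahrenheit_to_celsius(212.0)
print(low_c, high_c)
```
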
Magic can create columns for a wide range of possible sampling types.

dd.magic.add_sampling_column("person_1", "An older man from upstate NY.")

Generated Config for person_1

SamplerColumn(
    name='person_1',
    type='person',
    params={
    "locale": "en_US",
    "sex": null,
    "city": null,
    "age_range": [
        50,
        114
    ],
    "state": [
        "NY"
    ]
}
)

dd.magic.add_sampling_column("product_id", "A product ID starting with 'INV/'")

Generated Config for product_id

SamplerColumn(
    name='product_id',
    type='uuid',
    params={
    "prefix": "INV/",
    "short_form": false,
    "uppercase": false
}
)

Extending Categorical Columns

A common way to increase the diversity of a dataset is to add more possible values to a sampling column. Magic offers a tool specifically for this pattern.

dd = Gretel().data_designer.new()

dd.add_column(
    name="programming_concepts",
    type="category",
    params={"values": ["Linked Lists"]}
)

dd.magic.extend_category("programming_concepts")
# values: ['Linked Lists', 'Trees', 'Graphs', 'Hash Tables', 'Heaps', 'Stacks']


dd.magic.extend_category("programming_concepts", n=1)
# values: ['Linked Lists', ..., 'Queues']
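
Conceptually, extending a category means appending new, non-duplicate values to the existing list. The sketch below illustrates that idea only; extend_values and the proposed list are hypothetical, and in Magic the new values are generated for you rather than supplied by hand.

```python
def extend_values(existing, candidates, n):
    """Append up to n candidates that are not already present, keeping order."""
    extended = list(existing)
    seen = set(existing)
    for candidate in candidates:
        if len(extended) - len(existing) >= n:
            break
        if candidate not in seen:
            extended.append(candidate)
            seen.add(candidate)
    return extended


existing = ["Linked Lists"]
# Hypothetical proposals; note the duplicate, which is skipped.
proposed = ["Trees", "Linked Lists", "Graphs", "Hash Tables"]
extended = extend_values(existing, proposed, n=2)
print(extended)
```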

We can also extend categories created by Magic, not just hand-crafted ones.

dd = Gretel().data_designer.new()

## Generate a category and then boost its values for extra diversity
dd.magic.add_sampling_column("japan_city", "Cities in Japan.")
dd.magic.extend_category("japan_city", n=3)
dd.magic.extend_category("japan_city", n=3)

Generated Sampler Column Config

SamplerColumn(
    name='japan_city',
    type='category',
    params={
    "values": [
        "Tokyo",
        "Kyoto",
        "Nagasaki",
        "Osaka",
        "Fukuoka",
        "Sapporo",
        "Hiroshima",
        "Kobe",
        "Saitama",
        "Yokohama",
        "Nara",
        "Kumamoto",
        "Niigata",
        "Fukushima",
        "Kochi",
        "Matsuyama",
        "Kagoshima",
        "Sendai",
        "Hamamatsu"
    ],
    "weights": null
}
)

LLM Generation Columns

Magic can also help create configurations for LLM generation columns by drafting Jinja prompt templates and even defining structured output schemas. To get started, let's build on the previous weather examples.

dd = Gretel().data_designer.new()

dd.magic.add_sampling_column("japan_city", "Cities in Japan")
dd.magic.add_sampling_column("weather", "Possible weather types for Japan")
dd.magic.add_sampling_column("temperature", "Possible temperature in Japan, in C")

# Create a text description that depends on the weather column
dd.magic.add_column(
    "forecast",
    "A realistic weather forecast, as would be written in a newspaper. Two to three sentences."
)

Generated Config for Column forecast

LLMGenColumn(
    model_suite='apache-2.0',
    error_rate=0.2,
    model_configs=None,
    model_alias='text',
    prompt='Write a realistic weather forecast for {{ japan_city }} as it would appear in a newspaper. Include the weather
conditions ({{ weather }}) and the temperature ({{ temperature }}°C) in your forecast. Keep it to two to three sentences.',
    name='forecast',
    system_prompt=None,
    output_type='text',
    output_format=None,
    description='A realistic weather forecast for a Japanese city, as it would appear in a newspaper, including weather conditions
and temperature, written in two to three sentences.'
)

dd.preview() Output

Column       Value
japan_city   Kyoto
weather      Sunny
temperature  22.137...
forecast     Tomorrow in Kyoto will be sunny with clear skies throughout the day. The temperature will reach a high of 22°C.

Similar to sampling columns, we can edit an existing LLM generation column in place by calling the same function with new instructions. We can also specify exactly which existing columns the prompt template must depend on.

dd.magic.add_column(
  "forecast", 
  "The forecast should be two detailed paragraphs.",
  must_depend_on=["weather", "japan_city", "temperature"]
)

Updated Config for Column forecast

LLMGenColumn(
    model_suite='apache-2.0',
    error_rate=0.2,
    model_configs=None,
    model_alias='text',
    prompt='Write a realistic weather forecast for {{ japan_city }} as it would appear in
a reputable Japanese newspaper. The forecast should include two detailed paragraphs:\n\n1.
The first paragraph should provide a comprehensive description of the current weather
conditions ({{ weather }}) and temperature ({{ temperature }}°C). Include any notable
atmospheric phenomena, local impacts, and how the current weather affects daily life in
the city.\n\n2. The second paragraph should offer a detailed outlook for the next day,
including any expected changes in weather and temperature, potential impacts on daily
activities, and any precautions residents should take. Use specific examples to illustrate
your points.',
    name='forecast',
    system_prompt=None,
    output_type='text',
    output_format=None,
    description='A realistic weather forecast for a Japanese city, as it would appear in a
reputable newspaper. The forecast includes the city name, weather conditions, and
temperature in two detailed paragraphs.'
)

First Prompt Template

Write a realistic weather forecast for {{ japan_city }} as it would appear in a newspaper. Include the weather conditions ({{ weather }}) and the temperature ({{ temperature }}°C) in your forecast. Keep it to two to three sentences.

Updated Prompt Template

Write a realistic weather forecast for {{ japan_city }} as it would appear in a reputable Japanese newspaper. The forecast should include two detailed paragraphs:

1. The first paragraph should provide a comprehensive description of the current weather conditions ({{ weather }}) and temperature ({{ temperature }}°C). Include any notable atmospheric phenomena, local impacts, and how the current weather affects daily life in the city.
2. The second paragraph should offer a detailed outlook for the next day, including any expected changes in weather and temperature, potential impacts on daily activities, and any precautions residents should take. Use specific examples to illustrate your points.
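
When reviewing a generated prompt, it can help to confirm that every column you listed in must_depend_on actually appears in the template. The following is a minimal sketch using a regular expression. referenced_columns and missing_dependencies are hypothetical helpers, not SDK functions, and the pattern only looks at {{ ... }} expressions, so references inside {% ... %} blocks are not detected.

```python
import re

# Matches the leading name inside {{ ... }}, e.g. "weather", or "farmer"
# in "{{ farmer.first_name }}".
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)")


def referenced_columns(template):
    """Return the set of column names referenced in {{ ... }} expressions."""
    return set(PLACEHOLDER.findall(template))


def missing_dependencies(template, must_depend_on):
    """Return the required columns that the template never references."""
    referenced = referenced_columns(template)
    return [name for name in must_depend_on if name not in referenced]


prompt = (
    "Write a realistic weather forecast for {{ japan_city }} as it would appear "
    "in a newspaper. Include the weather conditions ({{ weather }}) and the "
    "temperature ({{ temperature }}°C) in your forecast."
)
print(missing_dependencies(prompt, ["weather", "japan_city", "temperature"]))
```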

Structured Outputs. Magic isn't limited to generating text columns. It can also generate configurations for structured output columns, so you don't need to know JSON Schema or Pydantic.

dd.magic.add_column(
    "hourly_weather_data",
    "Structured data with hourly temperature, humidity, and wind speed predictions.",
    must_depend_on=["forecast", "japan_city"]
)

Generated Config for Column hourly_weather_data

LLMGenColumn(
    model_suite='apache-2.0',
    error_rate=0.2,
    model_configs=None,
    model_alias='text',
    prompt="Given the following weather forecast for {{ japan_city }}: '{{ forecast }}'. Generate structured hourly predictions for
temperature, humidity, and wind speed for the next 24 hours. Format the data as follows: Hour, Temperature (°C), Humidity (%), Wind
Speed (km/h).",
    name='hourly_weather_data',
    system_prompt=None,
    output_type='structured',
    output_format={
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "Hour": {
                "type": "string",
                "description": "The hour of the day in 24-hour format (e.g., '00:00', '01:00', ..., '23:00')."
            },
            "Temperature": {
                "type": "number",
                "description": "The predicted temperature in Celsius for the given hour."
            },
            "Humidity": {
                "type": "number",
                "description": "The predicted humidity percentage for the given hour."
            },
            "Wind_Speed": {
                "type": "number",
                "description": "The predicted wind speed in kilometers per hour for the given hour."
            }
        },
        "required": [
            "Hour",
            "Temperature",
            "Humidity",
            "Wind_Speed"
        ]
    }
}
)

In this configuration, Magic has chosen the "structured" output type and included a JSON Schema definition for the output structure. This structure is followed during data generation, as shown below.

Generated Sample of hourly_weather_data using dd.preview()

[
  {
    "Hour": "00:00",
    "Temperature": 18,
    "Humidity": 75,
    "Wind_Speed": 10
  },
  {
    "Hour": "01:00",
    "Temperature": 17,
    "Humidity": 78,
    "Wind_Speed": 9
  },
  {
    "Hour": "02:00",
    "Temperature": 16,
    "Humidity": 80,
    "Wind_Speed": 8
  },
  {
    "Hour": "03:00",
    "Temperature": 16,
    "Humidity": 82,
    "Wind_Speed": 7
  },
  {
    "Hour": "04:00",
    "Temperature": 16,
    "Humidity": 83,
    "Wind_Speed": 6
  },
  {
    "Hour": "05:00",
    "Temperature": 16,
    "Humidity": 84,
    "Wind_Speed": 5
  },
  {
    "Hour": "06:00",
    "Temperature": 17,
    "Humidity": 82,
    "Wind_Speed": 5
  },
  {
    "Hour": "07:00",
    "Temperature": 18,
    "Humidity": 80,
    "Wind_Speed": 6
  },
  {
    "Hour": "08:00",
    "Temperature": 19,
    "Humidity": 78,
    "Wind_Speed": 7
  },
  {
    "Hour": "09:00",
    "Temperature": 20,
    "Humidity": 75,
    "Wind_Speed": 8
  },
  {
    "Hour": "10:00",
    "Temperature": 21,
    "Humidity": 72,
    "Wind_Speed": 9
  },
  {
    "Hour": "11:00",
    "Temperature": 22,
    "Humidity": 70,
    "Wind_Speed": 10
  },
  {
    "Hour": "12:00",
    "Temperature": 22,
    "Humidity": 68,
    "Wind_Speed": 11
  },
  {
    "Hour": "13:00",
    "Temperature": 22,
    "Humidity": 66,
    "Wind_Speed": 12
  },
  {
    "Hour": "14:00",
    "Temperature": 22,
    "Humidity": 65,
    "Wind_Speed": 13
  },
  {
    "Hour": "15:00",
    "Temperature": 21,
    "Humidity": 67,
    "Wind_Speed": 12
  },
  {
    "Hour": "16:00",
    "Temperature": 20,
    "Humidity": 70,
    "Wind_Speed": 11
  },
  {
    "Hour": "17:00",
    "Temperature": 19,
    "Humidity": 72,
    "Wind_Speed": 10
  },
  {
    "Hour": "18:00",
    "Temperature": 18,
    "Humidity": 75,
    "Wind_Speed": 9
  },
  {
    "Hour": "19:00",
    "Temperature": 17,
    "Humidity": 78,
    "Wind_Speed": 8
  },
  {
    "Hour": "20:00",
    "Temperature": 17,
    "Humidity": 80,
    "Wind_Speed": 7
  },
  {
    "Hour": "21:00",
    "Temperature": 17,
    "Humidity": 82,
    "Wind_Speed": 6
  },
  {
    "Hour": "22:00",
    "Temperature": 17,
    "Humidity": 83,
    "Wind_Speed": 5
  },
  {
    "Hour": "23:00",
    "Temperature": 17,
    "Humidity": 84,
    "Wind_Speed": 4
  }
]
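
If you want to sanity-check generated records against the schema yourself, a small hand-written check is often enough. The sketch below hard-codes the required fields and types from the output_format above and uses only the standard library. validate_records is a hypothetical helper, and it covers only this schema; a general-purpose JSON Schema validator would be the more robust choice in practice.

```python
import json

# Required fields and their expected Python types, taken from the
# output_format above ("string" -> str, "number" -> int or float).
REQUIRED_FIELDS = {
    "Hour": str,
    "Temperature": (int, float),
    "Humidity": (int, float),
    "Wind_Speed": (int, float),
}


def validate_records(records):
    """Return a list of problems; an empty list means the data conforms."""
    if not isinstance(records, list):
        return ["top-level value is not an array"]
    problems = []
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"item {index}: not an object")
            continue
        for field, expected in REQUIRED_FIELDS.items():
            if field not in record:
                problems.append(f"item {index}: missing {field}")
            # bool is a subclass of int in Python, so reject it explicitly.
            elif isinstance(record[field], bool) or not isinstance(record[field], expected):
                problems.append(f"item {index}: {field} has the wrong type")
    return problems


sample = json.loads(
    '[{"Hour": "00:00", "Temperature": 18, "Humidity": 75, "Wind_Speed": 10}]'
)
print(validate_records(sample))
```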

The output structure definition can also be edited with successive calls. For instance, we can rename fields or require additional ones.

dd.magic.add_column(
    "hourly_weather_data",
    "Make the fields lowercase. Also require an additional field, air_quality_index."
)

Generated Sample of hourly_weather_data using dd.preview()

Refining Prompts

Because tweaking prompt templates is a common task when designing data generation steps, Magic also has a tool specifically for refining and varying prompts (and nothing else). Consider the following example.

dd = Gretel().data_designer.new()

dd.add_column(name="farmer", type="person")
dd.add_column(
    name="question",
    prompt="Ask {{ farmer.first_name }} about the price of apples today."
)
dd.magic.refine_prompt("question", "Use the farmer's full name.")
dd.magic.refine_prompt("question", "Ask about pears instead.")
dd.magic.refine_prompt("question", "Ask about pears if the farmer is a man and oranges if not.")

Step         Prompt Template
Original     Ask {{ farmer.first_name }} about the price of apples today.
First Edit   Ask {{ farmer.first_name }} {{ farmer.last_name }} about the price of apples today.
Second Edit  Ask {{ farmer.first_name }} {{ farmer.last_name }} about the price of pears today.
Third Edit   Ask {{ farmer.first_name }} {{ farmer.last_name }} about the price of {% if farmer.sex == 'Male' %}pears{% else %}oranges{% endif %} today.
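
To see what the final template produces for a given row, here is a plain-Python equivalent of its logic. This is only an illustration: render_question and the sample farmer records are hypothetical, and the actual rendering is done by Jinja during generation.

```python
def render_question(farmer):
    """Plain-Python equivalent of the final prompt template above."""
    # Mirrors {% if farmer.sex == 'Male' %}pears{% else %}oranges{% endif %}
    fruit = "pears" if farmer["sex"] == "Male" else "oranges"
    return (
        f"Ask {farmer['first_name']} {farmer['last_name']} "
        f"about the price of {fruit} today."
    )


# Hypothetical person records, for illustration only.
print(render_question({"first_name": "Ken", "last_name": "Mori", "sex": "Male"}))
print(render_question({"first_name": "Aiko", "last_name": "Sato", "sex": "Female"}))
```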

This use of refine_prompt is close to add_column's ability to edit columns. However, refine_prompt guarantees that no other part of the LLM generation config is changed.

Sometimes, though, you simply want to rephrase a prompt. A bare call to refine_prompt on the target column, with no instructions, produces a variation of that prompt template.

for _ in range(3):
    dd.magic.refine_prompt("question")

Prompt Template Variations

You are a customer at a local market. Ask {{ farmer.first_name }} {{ farmer.last_name }} about the price of {% if farmer.sex == 'Male' %}pears{% else %}oranges{% endif %} today. Your question should sound natural and polite, as if you are having a conversation with a neighbor or friend.

You are a customer at a local market. Politely ask {{ farmer.first_name }} {{ farmer.last_name }} about the price of {% if farmer.sex == 'Male' %}pears{% else %}oranges{% endif %} today. Your question should sound natural and friendly, as if you are having a conversation with a neighbor or friend.

You are a customer at a local market. Please politely ask {{ farmer.first_name }} {{ farmer.last_name }} about the price of {% if farmer.sex == 'Male' %}pears{% else %}oranges{% endif %} today. Your question should sound natural and friendly, as if you are having a conversation with a neighbor or friend. Begin your question with 'Hi' or 'Hello' and use a casual, friendly tone.

Interactive Workflow

Sometimes you might want to see possible outputs before requesting edits to a column configuration. Magic supports an interactive, chat-like interface, which is enabled by setting interactive=True:

magic.add_sampling_column(..., interactive=True)
magic.add_column(..., interactive=True)

While in this mode, you will be able to view Magic-proposed column configurations, optionally preview the outputs of that column for the current state of the Data Designer object, and then optionally accept those changes or request further edits via text commands.

dd = Gretel().data_designer.new()

dd.add_column(name="farmer", type="person")
dd.add_column(
    name="question",
    prompt="Ask {{ farmer.first_name }} about the price of apples today.",
)
dd.magic.add_column(
    "question",
    "Use the farmer's full name.",
    interactive=True
)

Command                 Description
accept                  Save the current configuration
cancel                  Discard changes
start-over              Reset to initial state
retry                   Generate configuration again
preview                 Generate sample data
preview-on/preview-off  Toggle automatic previews

Best Practices

For most projects, combine Magic SDK with the standard Data Designer SDK:

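# Imports (module paths are assumed here; adjust them to your installed Gretel client)
from gretel_client.navigator_client import Gretel
from gretel_client.data_designer import columns as C
from gretel_client.data_designer import params as P

# Initialize
gretel = Gretel(api_key="YOUR_API_KEY")
dd = gretel.data_designer.new(model_suite="apache-2.0")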

# Use explicit configuration for well-defined base columns
dd.add_column(
    C.SamplerColumn(
        name="employee_id",
        type=P.SamplerType.UUID,
        params=P.UUIDSamplerParams(
            prefix="GRETEL_", 
            short_form=True, 
            uppercase=True
        )
    )
) 

# Use Magic for more complex columns
dd.magic.add_sampling_column(
    "product_category",
    "Main product categories in an e-commerce store"
)
# Create the LLM column before refining its prompt
dd.magic.add_column(
    "product_description",
    "A short description of a product in the given category."
)

# Refine and extend
dd.magic.refine_prompt("product_description", "Add more technical specifications")
dd.magic.extend_category("product_category", n=10)

# Preview a sample of the dataset
preview = dd.preview()
preview.display_sample_record()
