Magic Assistance
Overview
The Magic interface within Data Designer allows you to interactively define columns, preview samples, and refine data generation through natural language.
Key Benefits
Automatic prompt generation for LLM columns that reference other columns with correct formatting
Automatic structured output configuration for complex JSON schema definitions
Simplified categorical data creation with automatic inference of appropriate values
Interactive refinement through a conversational interface
Creating and Editing Columns with Magic
The Magic SDK offers several ways to create and edit columns.
Sampling Columns
Automatically create columns based on distributions or categories without manual configuration.
dd = Gretel().data_designer.new()
# Create a categorical weather column
dd.magic.add_sampling_column(
"weather",
"Possible weather types for Tokyo."
)
"""
SamplerColumn(
name='weather',
type='category',
params={
"values": [
"Sunny",
"Cloudy",
"Rainy",
"Snowy",
"Windy"
],
"weights": [
0.4,
0.3,
0.2,
0.05,
0.05
]
}
)
"""
## Preview samples from the updated Data Designer object
dd.preview().dataset.df["weather"]
"""
0 Rainy
1 Cloudy
2 Cloudy
3 Cloudy
4 Cloudy
5 Sunny
6 Cloudy
7 Sunny
8 Sunny
9 Sunny
Name: weather, dtype: object
"""
Behind the scenes, Magic does the following for you:
Infers that a categorical column is appropriate.
Generates relevant values (e.g., "Sunny", "Cloudy", "Rainy", "Snowy", "Windy").
Configures any relevant distribution parameters (here, the likelihood of each weather type).
Updates the Data Designer object (dd) with this new column definition.
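To make the role of the generated values and weights concrete, here is a minimal standard-library sketch of weighted categorical sampling. It is not Data Designer's implementation; the helper name sample_category is hypothetical, and the values and weights are copied from the column definition shown earlier.

```python
import random
from collections import Counter

# Values and weights copied from the SamplerColumn output shown earlier.
values = ["Sunny", "Cloudy", "Rainy", "Snowy", "Windy"]
weights = [0.4, 0.3, 0.2, 0.05, 0.05]


def sample_category(values, weights, n, seed=None):
    """Draw n values with probability proportional to weights (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.choices(values, weights=weights, k=n)


samples = sample_category(values, weights, n=1000, seed=0)
counts = Counter(samples)

# Higher-weight categories should appear more often than lower-weight ones.
print(counts.most_common())
```

With weights like these, "Sunny" and "Cloudy" dominate a preview while "Snowy" and "Windy" appear only occasionally, which matches the ten-row preview above.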
If you aren't satisfied with the result, or you'd like to adjust it, call the same function again. Subsequent calls to the same function edit the existing column in place.
dd = Gretel().data_designer.new()
dd.magic.add_sampling_column(
"weather",
"Possible weather types for Tokyo."
)
# values: ["Sunny", "Cloudy", "Rainy", "Snowy", "Windy"]
dd.magic.add_sampling_column(
"weather",
"All values should be in Japanese."
)
# "values": ["晴れ", "曇り", "雨", "雪", "風"]
This function can even be used to edit pre-existing, manually created columns.
dd = Gretel().data_designer.new()
## Manually define a uniform sampler column...
dd.add_column(
name="temperature",
type="uniform",
params={"low": 32.0, "high": 212.0}
)
## ...and edit it with Magic.
dd.magic.add_sampling_column(
"temperature",
"Change from F to C"
)
"""
SamplerColumn(
name='temperature',
type='uniform',
params={
"low": 0.0,
"high": 100.0
})
"""
Magic can create columns for a wide range of possible sampling types.
dd.magic.add_sampling_column("person_1", "An older man from upstate NY.")
dd.magic.add_sampling_column("product_id", "A product ID starting with 'INV/'")
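Returning to the temperature edit above, the new bounds (0.0 and 100.0) are what the standard Fahrenheit-to-Celsius formula gives for 32.0 and 212.0. The sketch below reproduces that arithmetic by hand; the helper names are hypothetical and not part of the SDK.

```python
def fahrenheit_to_celsius(f):
    """Standard conversion: C = (F - 32) * 5 / 9."""
    return (f - 32.0) * 5.0 / 9.0


def convert_uniform_params(params):
    """Convert every bound of a uniform sampler's params dict (hypothetical helper)."""
    return {name: fahrenheit_to_celsius(value) for name, value in params.items()}


# Bounds from the manually defined "temperature" column.
celsius_params = convert_uniform_params({"low": 32.0, "high": 212.0})
print(celsius_params)
```

A manual check like this is a quick way to confirm that a Magic edit changed the parameters the way you intended.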
Extending Categorical Columns
A common way to increase the diversity of a dataset is to add more possible values to a sampling column. Magic offers tools specifically for this pattern.
dd = Gretel().data_designer.new()
dd.add_column(
name="programming_concepts",
type="category",
params={"values": ["Linked Lists"]}
)
dd.magic.extend_category("programming_concepts")
# values: ['Linked Lists', 'Trees', 'Graphs', 'Hash Tables', 'Heaps', 'Stacks']
dd.magic.extend_category("programming_concepts", n=1)
# values: ['Linked Lists', ..., 'Queues']
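Conceptually, extending a category means appending new, non-duplicate values to the existing list, optionally capped at n additions. The following sketch illustrates that merge step only; extend_values is a hypothetical helper, and the candidates list stands in for suggestions that Magic would generate.

```python
def extend_values(existing, candidates, n=None):
    """Append up to n new candidates, skipping case-insensitive duplicates.

    Hypothetical helper: `candidates` stands in for model-generated suggestions.
    """
    seen = {value.casefold() for value in existing}
    extended = list(existing)
    added = 0
    for candidate in candidates:
        if n is not None and added >= n:
            break
        key = candidate.casefold()
        if key not in seen:
            seen.add(key)
            extended.append(candidate)
            added += 1
    return extended


suggestions = ["Trees", "linked lists", "Graphs", "Hash Tables"]
print(extend_values(["Linked Lists"], suggestions))
print(extend_values(["Linked Lists"], suggestions, n=1))
```

Existing values keep their order, so hand-written entries stay at the front of the list.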
We can also extend categories created by Magic, not just hand-crafted ones.
dd = Gretel().data_designer.new()
## Generate a category and then boost its values for extra diversity
dd.magic.add_sampling_column("japan_city", "Cities in Japan.")
dd.magic.extend_category("japan_city", n=3)
dd.magic.extend_category("japan_city", n=3)
LLM Generation Columns
Magic can also help with creating configurations for LLM generation columns, including drafting Jinja prompt templates and defining structured output schemas. To get started, let's return to our previous weather examples.
dd = Gretel().data_designer.new()
dd.magic.add_sampling_column("japan_city", "Cities in Japan")
dd.magic.add_sampling_column("weather", "Possible weather types for Japan")
dd.magic.add_sampling_column("temperature", "Possible temperature in Japan, in C")
# Create a text description that depends on the weather column
dd.magic.add_column(
"forecast",
"A realistic weather forecast, as would be written in a newspaper. Two to three sentences."
)
As with sampling columns, we can edit an existing LLM generation column in place by calling the same function with new instructions. Furthermore, we can specify exactly which pre-existing columns the prompt template must depend on.
dd.magic.add_column(
"forecast",
"The forecast should be two detailed paragraphs.",
must_depend_on=["weather", "japan_city", "temperature"]
)
Original Prompt Template
Write a realistic weather forecast for {{ japan_city }} as it would appear in a newspaper. Include the weather conditions ({{ weather }}) and the temperature ({{ temperature }}°C) in your forecast. Keep it to two to three sentences
Edited Prompt Template
Write a realistic weather forecast for {{ japan_city }} as it would appear in a reputable Japanese newspaper. The forecast should include two detailed paragraphs:
The first paragraph should provide a comprehensive description of the current weather conditions ({{ weather }}) and temperature ({{ temperature }}°C). Include any notable atmospheric phenomena, local impacts, and how the current weather affects daily life in the city.
The second paragraph should offer a detailed outlook for the next day, including any expected changes in weather and temperature, potential impacts on daily activities, and any precautions residents should take. Use specific examples to illustrate your points.
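One way to sanity-check a generated template against a must_depend_on list is to extract the variables it references. The sketch below does this with a regular expression rather than a real Jinja parser, so it only recognizes simple {{ name }} expressions; both function names are hypothetical.

```python
import re

# Matches the root variable of a simple Jinja expression, e.g. "{{ weather }}"
# or "{{ farmer.first_name }}" (captures "weather" and "farmer").
_VARIABLE = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)")


def referenced_columns(template):
    """Return the set of root variable names used in {{ ... }} expressions."""
    return set(_VARIABLE.findall(template))


def missing_dependencies(template, must_depend_on):
    """List required columns that the template never references."""
    found = referenced_columns(template)
    return [column for column in must_depend_on if column not in found]


template = (
    "Write a realistic weather forecast for {{ japan_city }} as it would appear "
    "in a newspaper. Include the weather conditions ({{ weather }}) and the "
    "temperature ({{ temperature }}°C) in your forecast."
)
required = ["weather", "japan_city", "temperature"]
print(missing_dependencies(template, required))
print(missing_dependencies("Forecast for {{ japan_city }}.", required))
```

An empty list means every required column appears in the template. Variables used only inside {% ... %} blocks are not detected by this simplified pattern.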
Structured Outputs. Magic isn't limited to generating text columns. It can also generate configurations for structured output columns, so you don't need to know JSON Schema or Pydantic.
dd.magic.add_column(
"hourly_weather_data",
"Structured data with hourly temperature, humidity, and wind speed predictions.",
must_depend_on=["forecast", "japan_city"]
)
The resulting configuration uses the "structured" output type and includes a JSON Schema definition for the output structure. This structure is followed during data generation.
The output structure definition can itself be edited with successive calls. For instance, we can rename fields or require new ones.
dd.magic.add_column(
"hourly_weather_data",
"Make the fields lowercase. Also require an additional field, air_quality_index."
)
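To illustrate what an edit like this means at the schema level, the sketch below applies the same two changes to a small JSON Schema dictionary by hand. The starting field names (Temperature, Humidity, WindSpeed) and the helper lowercase_and_require are assumptions for illustration; the schema Magic actually generates may look different.

```python
import copy


def lowercase_and_require(schema, new_field, new_field_schema):
    """Lowercase all property names and add one new required property.

    Hypothetical helper; returns a new schema and leaves the input untouched.
    """
    updated = copy.deepcopy(schema)
    updated["properties"] = {
        name.lower(): spec for name, spec in updated["properties"].items()
    }
    updated["required"] = [name.lower() for name in updated.get("required", [])]
    updated["properties"][new_field] = new_field_schema
    if new_field not in updated["required"]:
        updated["required"].append(new_field)
    return updated


# Assumed starting schema for one hourly reading (field names are illustrative).
schema = {
    "type": "object",
    "properties": {
        "Temperature": {"type": "number"},
        "Humidity": {"type": "number"},
        "WindSpeed": {"type": "number"},
    },
    "required": ["Temperature", "Humidity", "WindSpeed"],
}

edited = lowercase_and_require(schema, "air_quality_index", {"type": "integer"})
print(sorted(edited["properties"]))
print(edited["required"])
```

Comparing the schema before and after an edit in this way makes it easy to confirm that only the requested fields changed.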
Refining Prompts
Because tweaking prompt templates is a common task when designing data generation steps, Magic also has tools specifically for refining and varying prompts (and nothing else). Consider the following example.
dd = Gretel().data_designer.new()
dd.add_column(name="farmer", type="person")
dd.add_column(
name="question",
prompt="Ask {{ farmer.first_name }} about the price of apples today."
)
dd.magic.refine_prompt("question", "Use the farmer's full name.")
dd.magic.refine_prompt("question", "Ask about pears instead.")
dd.magic.refine_prompt("question", "Ask about pears if the farmer is a man and oranges if not.")
Original
Ask {{ farmer.first_name }} about the price of apples today.
First Edit
Ask {{ farmer.first_name }} {{ farmer.last_name }} about the price of apples today.
Second Edit
Ask {{ farmer.first_name }} {{ farmer.last_name }} about the price of pears today.
Third Edit
Ask {{ farmer.first_name }} {{ farmer.last_name }} about the price of {% if farmer.sex == 'Male' %}pears{% else %}oranges{% endif %} today.
This way of using refine_prompt is quite close to add_column's capability to edit columns. However, refine_prompt ensures that there can be no spurious changes to any other parts of the LLM generation config.
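To see what the final edited template evaluates to for a given row, here is a plain-Python equivalent of its conditional logic. It does not use Jinja; the function render_question and the sample farmer records are hypothetical, and the field names (first_name, last_name, sex) are taken from the templates above.

```python
def render_question(farmer):
    """Plain-Python equivalent of the final template's if/else branch."""
    fruit = "pears" if farmer["sex"] == "Male" else "oranges"
    return (
        f"Ask {farmer['first_name']} {farmer['last_name']} "
        f"about the price of {fruit} today."
    )


# Hypothetical sample rows, shaped like the person fields the template reads.
farmers = [
    {"first_name": "Kenji", "last_name": "Sato", "sex": "Male"},
    {"first_name": "Yuki", "last_name": "Tanaka", "sex": "Female"},
]

for farmer in farmers:
    print(render_question(farmer))
```

Each row takes exactly one branch, so the rendered prompt mentions pears or oranges but never both.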
Sometimes, though, you simply want to rephrase a prompt. A bare call to refine_prompt on the target column, with no instruction, produces a variation of that prompt template. The loop below applies three such variations in a row.
for _ in range(3):
dd.magic.refine_prompt("question")
You are a customer at a local market. Ask {{ farmer.first_name }} {{ farmer.last_name }} about the price of {% if farmer.sex == 'Male' %}pears{% else %}oranges{% endif %} today. Your question should sound natural and polite, as if you are having a conversation with a neighbor or friend.
You are a customer at a local market. Politely ask {{ farmer.first_name }} {{ farmer.last_name }} about the price of {% if farmer.sex == 'Male' %}pears{% else %}oranges{% endif %} today. Your question should sound natural and friendly, as if you are having a conversation with a neighbor or friend.
You are a customer at a local market. Please politely ask {{ farmer.first_name }} {{ farmer.last_name }} about the price of {% if farmer.sex == 'Male' %}pears{% else %}oranges{% endif %} today. Your question should sound natural and friendly, as if you are having a conversation with a neighbor or friend. Begin your question with 'Hi' or 'Hello' and use a casual, friendly tone.
Interactive Workflow
Sometimes you might want to see possible outputs before requesting edits to a column configuration. Magic supports an interactive, chat-like interface, which you enable by passing interactive=True to either of the following calls:
magic.add_sampling_column(..., interactive=True)
magic.add_column(..., interactive=True)
While in this mode, you will be able to view Magic-proposed column configurations, optionally preview the outputs of that column for the current state of the Data Designer object, and then optionally accept those changes or request further edits via text commands.
dd = Gretel().data_designer.new()
dd.add_column(name="farmer", type="person")
dd.add_column(
name="question",
prompt="Ask {{ farmer.first_name }} about the price of apples today.",
)
dd.magic.add_column(
"question",
"Use the farmer's full name.",
interactive=True
)
accept: Save the current configuration
cancel: Discard changes
start-over: Reset to initial state
retry: Generate configuration again
preview: Generate sample data
preview-on / preview-off: Toggle automatic previews
Best Practices
For most projects, combine Magic SDK with the standard Data Designer SDK:
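# Imports (module paths assume the gretel_client package layout; adjust for your installed version)
from gretel_client.navigator_client import Gretel
from gretel_client.data_designer import columns as C
from gretel_client.data_designer import params as P
# Initialize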
gretel = Gretel(api_key="YOUR_API_KEY")
dd = gretel.data_designer.new(model_suite="apache-2.0")
# Use explicit configuration for well-defined base columns
dd.add_column(
C.SamplerColumn(
name="employee_id",
type=P.SamplerType.UUID,
params=P.UUIDSamplerParams(
prefix="GRETEL_",
short_form=True,
uppercase=True
)
)
)
# Use Magic for more complex columns
dd.magic.add_sampling_column(
"product_category",
"Main product categories in an e-commerce store"
)
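# Add an LLM generation column with Magic so there is a prompt to refine
dd.magic.add_column(
"product_description",
"A short product description for the given product category.",
must_depend_on=["product_category"]
)
# Refine and extend
dd.magic.refine_prompt("product_description", "Add more technical specifications")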
dd.magic.extend_category("product_category", n=10)
# Preview a sample of the dataset
preview = dd.preview()
preview.display_sample_record()