Upload Files as Seeds
Upload a Dataset
You can use an existing dataset (CSV, DataFrame, etc.) as seed data in Data Designer to add diversity to your LLM-generated columns:
import pandas as pd
from gretel_client.navigator_client import Gretel
# Initialize Data Designer
gretel = Gretel(api_key="YOUR_API_KEY")
aidd = gretel.data_designer.new(model_suite="apache-2.0")
# Load your dataset
df_seed = pd.read_csv("https://gretel-datasets.s3.us-west-2.amazonaws.com/calmcode-datasets/pokemon_descriptive_columns.csv")
# Use it as seed data
aidd.with_seed_dataset(
    df_seed,
    sampling_strategy="shuffle",  # "ordered" or "shuffle"
    with_replacement=True  # Set to True if you want to generate more records than in seed
)
Sampling Strategies
When using a seed dataset, you can control how it's used with these parameters:
sampling_strategy:
    "ordered": Maintains the original order of records (default)
    "shuffle": Randomly shuffles the records
with_replacement:
    False: Each seed record is used only once (default)
    True: Records can be reused, necessary when generating more records than in your seed
# Example: Using ordered sampling without replacement
aidd.with_seed_dataset(
    df_seed,
    sampling_strategy="ordered",
    with_replacement=False
)
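To make the four combinations of these parameters concrete, the following is a minimal sketch in plain Python. It is not Data Designer's implementation: the function `sample_seed_rows` and its `num_records` and `seed` arguments are hypothetical, and two behaviors are assumptions made for illustration only (ordered sampling with replacement cycles back to the first record, and requesting more records than the seed holds without replacement raises an error).

```python
import random

def sample_seed_rows(rows, num_records, sampling_strategy="ordered",
                     with_replacement=False, seed=None):
    """Illustrative only: pick seed rows for num_records generated records."""
    if not rows:
        raise ValueError("seed dataset is empty")
    if sampling_strategy not in ("ordered", "shuffle"):
        raise ValueError(f"unknown sampling_strategy: {sampling_strategy!r}")
    if not with_replacement and num_records > len(rows):
        # Assumption: without replacement, each row can be used at most once.
        raise ValueError("not enough seed rows; set with_replacement=True")

    if sampling_strategy == "ordered":
        # Assumption: walk the rows in order and wrap around when reuse is allowed.
        return [rows[i % len(rows)] for i in range(num_records)]

    rng = random.Random(seed)
    if with_replacement:
        # Independent random draws, so the same row may appear more than once.
        return [rng.choice(rows) for _ in range(num_records)]
    # Random order, each row used at most once.
    return rng.sample(rows, num_records)

# Hypothetical seed rows standing in for a real seed dataset.
seed_rows = [
    {"pokemon_name": "Bulbasaur"},
    {"pokemon_name": "Charmander"},
    {"pokemon_name": "Squirtle"},
]
picked = sample_seed_rows(seed_rows, 5, "ordered", with_replacement=True)
print([row["pokemon_name"] for row in picked])
# ['Bulbasaur', 'Charmander', 'Squirtle', 'Bulbasaur', 'Charmander']
```

The size check shows why with_replacement=True matters when you generate more records than the seed contains: without reuse there are simply not enough seed rows to pair with every generated record.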
Referencing Columns
Once you've added a seed dataset, you can reference its columns by name in your Data Designer prompts using {{column_name}} placeholders:
# Generate a new column from a prompt that references seed columns
aidd.add_column(
    name="product_description",
    prompt="Write a detailed story about {{pokemon_name}} based on {{pokemon_type}}."
)
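As a rough mental model of how the placeholders work, each {{column}} reference is filled with the value from the seed record paired with the row being generated. The sketch below imitates that substitution with a regular expression. It is an assumption-laden illustration, not Data Designer's actual template engine: `render_prompt` is a hypothetical helper, and the row values are made up rather than taken from the dataset.

```python
import re

# Matches {{ column_name }} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def render_prompt(template: str, row: dict) -> str:
    """Illustrative only: replace each {{column}} with the row's value."""
    def substitute(match: re.Match) -> str:
        column = match.group(1)
        if column not in row:
            # Fail loudly when the prompt names a column the seed lacks.
            raise KeyError(f"prompt references unknown column: {column}")
        return str(row[column])
    return PLACEHOLDER.sub(substitute, template)

template = "Write a detailed story about {{pokemon_name}} based on {{pokemon_type}}."
# Hypothetical seed record; the values are examples only.
row = {"pokemon_name": "Pikachu", "pokemon_type": "Electric"}
print(render_prompt(template, row))
# Write a detailed story about Pikachu based on Electric.
```

A practical takeaway is to check that every placeholder in a prompt matches a column name in your seed dataset exactly, for example by comparing against df_seed.columns before adding the column.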