Upload Files as Seeds
Upload a Dataset
You can use an existing dataset (for example, a CSV file or a pandas DataFrame) as seed data in Data Designer to add diversity to your LLM-generated columns:
import pandas as pd
from gretel_client.navigator_client import Gretel
# Initialize Data Designer
gretel = Gretel(api_key="YOUR_API_KEY")
aidd = gretel.data_designer.new(model_suite="apache-2.0")
# Load your dataset
df_seed = pd.read_csv("https://gretel-datasets.s3.us-west-2.amazonaws.com/calmcode-datasets/pokemon_descriptive_columns.csv")
# Use it as seed data
aidd.with_seed_dataset(
    df_seed,
    sampling_strategy="shuffle",  # "ordered" or "shuffle"
    with_replacement=True,  # Set to True to generate more records than the seed contains
)
Sampling Strategies
When using a seed dataset, you can control how it's used with these parameters:
sampling_strategy:
"ordered": Maintains the original order of records (default)
"shuffle": Randomly shuffles the records
with_replacement:
False: Each seed record is used at most once (default)
True: Records can be reused; this is required when generating more records than your seed contains
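The page does not spell out exactly how records are drawn for each combination of these two options, so the following is a hypothetical, stdlib-only model of the semantics, not Gretel's implementation. It assumes that "ordered" with replacement cycles through the seed from the top, that "shuffle" with replacement draws each record uniformly at random, and that requesting more records than the seed holds without replacement is an error. `sample_seed_records` and its `random_seed` parameter are illustrative names, not part of `gretel_client`.

```python
import random
from typing import Any, Dict, List

Record = Dict[str, Any]


def sample_seed_records(
    records: List[Record],
    num_records: int,
    sampling_strategy: str = "ordered",
    with_replacement: bool = False,
    random_seed: int = 0,
) -> List[Record]:
    """Hypothetical model of seed sampling options (assumption, not Gretel's code)."""
    if sampling_strategy not in ("ordered", "shuffle"):
        raise ValueError(f"unknown sampling_strategy: {sampling_strategy!r}")
    if not records:
        raise ValueError("seed dataset is empty")
    if not with_replacement and num_records > len(records):
        # Without replacement, each seed record can be used at most once.
        raise ValueError(
            f"requested {num_records} records but the seed has only {len(records)}; "
            "set with_replacement=True to reuse seed records"
        )

    if sampling_strategy == "ordered":
        # Walk the seed in its original order, wrapping around when reuse is allowed.
        return [records[i % len(records)] for i in range(num_records)]

    rng = random.Random(random_seed)
    if with_replacement:
        # Draw each record independently and uniformly, so repeats are possible.
        return [rng.choice(records) for _ in range(num_records)]
    # Shuffle without replacement: a random permutation truncated to num_records.
    return rng.sample(records, num_records)


# Toy seed with three records, used only for illustration.
toy_seed = [{"id": i} for i in range(3)]
print([r["id"] for r in sample_seed_records(toy_seed, 5, "ordered", True)])  # [0, 1, 2, 0, 1]
```

Under this model, the same five-record request with `with_replacement=False` raises `ValueError`, which mirrors the note above that replacement is needed when you generate more records than the seed contains.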
# Example: Using ordered sampling without replacement
# Each seed record is used at most once, so generate no more records than the seed contains
aidd.with_seed_dataset(
    df_seed,
    sampling_strategy="ordered",
    with_replacement=False,
)
Referencing Columns
Once you've added a seed dataset, you can reference its columns by name in your column prompts using {{column_name}} placeholders:
# Generate a text column whose prompt references the seed columns pokemon_name and pokemon_type
aidd.add_column(
    name="product_description",
    prompt="Write a detailed story about {{pokemon_name}} based on {{pokemon_type}}.",
)
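Conceptually, each {{column_name}} placeholder is filled with that column's value from the seed record being used for the row. The sketch below is a simplified, stdlib-only stand-in for that substitution; Data Designer's actual templating is likely richer (it is assumed here to be Jinja-style), and `render_prompt` is a hypothetical helper, not part of `gretel_client`. It also fails loudly when a prompt names a column the seed does not have, which is a useful pre-flight check on your prompts.

```python
import re
from typing import Any, Dict

# Matches {{column}} or {{ column }}: an identifier inside double braces.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render_prompt(template: str, record: Dict[str, Any]) -> str:
    """Fill {{column}} placeholders from one seed record (simplified stand-in)."""
    missing = [name for name in PLACEHOLDER.findall(template) if name not in record]
    if missing:
        raise KeyError(f"prompt references columns not in the seed: {missing}")
    return PLACEHOLDER.sub(lambda m: str(record[m.group(1)]), template)


template = "Write a detailed story about {{pokemon_name}} based on {{pokemon_type}}."
# Toy seed record, for illustration only.
record = {"pokemon_name": "Pikachu", "pokemon_type": "Electric"}
print(render_prompt(template, record))  # Write a detailed story about Pikachu based on Electric.
```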

