Upload Files as Seeds

Upload a Dataset

You can use an existing dataset (a CSV file, a pandas DataFrame, etc.) as seed data in Data Designer to add diversity to your LLM-generated columns:

import pandas as pd
from gretel_client.navigator_client import Gretel

# Initialize Data Designer
gretel = Gretel(api_key="YOUR_API_KEY")
aidd = gretel.data_designer.new(model_suite="apache-2.0")

# Load your dataset
df_seed = pd.read_csv("https://gretel-datasets.s3.us-west-2.amazonaws.com/calmcode-datasets/pokemon_descriptive_columns.csv")

# Use it as seed data
aidd.with_seed_dataset(
    df_seed,
    sampling_strategy="shuffle",  # "ordered" (default) or "shuffle"
    with_replacement=True  # Set to True to generate more records than the seed contains
)
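
Whether with_replacement needs to be True depends on how many records you plan to generate relative to the size of the seed. A minimal sketch of that check follows; seed_sampling_kwargs is a hypothetical helper, not part of the Gretel SDK, and it only builds the two keyword arguments shown above:

```python
def seed_sampling_kwargs(num_seed_rows: int, num_records: int, shuffle: bool = True) -> dict:
    """Hypothetical helper: build the sampling keyword arguments for a seed dataset."""
    if num_seed_rows <= 0:
        raise ValueError("seed dataset is empty")
    return {
        "sampling_strategy": "shuffle" if shuffle else "ordered",
        # Reuse seed rows only when more records are requested than the seed holds.
        "with_replacement": num_records > num_seed_rows,
    }


kwargs = seed_sampling_kwargs(num_seed_rows=800, num_records=1000)
print(kwargs)
```

With num_seed_rows=len(df_seed), the result could then be passed along as aidd.with_seed_dataset(df_seed, **kwargs).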

Sampling Strategies

When using a seed dataset, two parameters control how its records are drawn:

  • sampling_strategy:

    • "ordered": Maintains the original order of records (default)

    • "shuffle": Randomly shuffles the records

  • with_replacement:

    • False: Each seed record is used only once (default)

    • True: Records can be reused; required when generating more records than your seed contains

# Example: Using ordered sampling without replacement
aidd.with_seed_dataset(
    df_seed,
    sampling_strategy="ordered",
    with_replacement=False
)
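
To make the parameter combinations concrete, here is a small pure-Python sketch of one plausible reading of them. It is an illustration only, not Data Designer's implementation: sample_seed_rows is a hypothetical function, and details such as wrapping around in "ordered" mode or drawing independently in "shuffle" mode with replacement are assumptions.

```python
import random
from itertools import cycle, islice


def sample_seed_rows(rows, n, sampling_strategy="ordered", with_replacement=False, seed=None):
    """Illustrative only: one plausible reading of the two seed-sampling parameters."""
    if sampling_strategy not in ("ordered", "shuffle"):
        raise ValueError(f"unknown sampling_strategy: {sampling_strategy!r}")
    if not with_replacement and n > len(rows):
        raise ValueError("n exceeds the seed size; set with_replacement=True")

    rng = random.Random(seed)
    if sampling_strategy == "ordered":
        # Walk the seed in its original order, wrapping around only if reuse is allowed.
        return list(islice(cycle(rows), n))
    if with_replacement:
        # Independent random draws, so a row may appear more than once.
        return [rng.choice(rows) for _ in range(n)]
    # A random selection of n distinct rows, in random order.
    return rng.sample(rows, n)


seed_rows = ["a", "b", "c"]
print(sample_seed_rows(seed_rows, 2))                         # ['a', 'b']
print(sample_seed_rows(seed_rows, 5, with_replacement=True))  # ['a', 'b', 'c', 'a', 'b']
```

In this sketch, asking for more rows than the seed holds without replacement raises an error, which mirrors the guidance above to set with_replacement=True in that case.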

Referencing Columns

Once you've added a seed dataset, you can reference its columns by name in the prompts of your Data Designer columns, using {{column_name}} placeholders:

# Generate a story for each record, using values from the seed columns
aidd.add_column(
    name="pokemon_story",
    prompt="Write a detailed story about {{pokemon_name}} based on {{pokemon_type}}."
)
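
The sketch below illustrates the idea behind these references: each {{column}} placeholder is filled with the value from one seed row. It is a simplified stand-in written with a regular expression, not the templating Data Designer actually uses. render_prompt is a hypothetical helper, and the row values are made up for illustration.

```python
import re

# Matches placeholders such as {{pokemon_name}} or {{ pokemon_name }}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, row: dict) -> str:
    """Fill {{column}} placeholders from one seed row (simplified stand-in for real templating)."""
    missing = sorted({name for name in PLACEHOLDER.findall(template) if name not in row})
    if missing:
        raise KeyError(f"prompt references columns missing from the seed row: {missing}")
    return PLACEHOLDER.sub(lambda m: str(row[m.group(1)]), template)


template = "Write a detailed story about {{pokemon_name}} based on {{pokemon_type}}."
row = {"pokemon_name": "Pikachu", "pokemon_type": "Electric"}  # hypothetical seed row
print(render_prompt(template, row))
# Write a detailed story about Pikachu based on Electric.
```

A check like the missing-column test here can catch a typo in a placeholder name before any generation is run.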
