Upload Files as Seeds

Upload a Dataset

You can use an existing dataset (a CSV file, a pandas DataFrame, etc.) as seed data in Data Designer to add diversity to your LLM-generated columns:

import pandas as pd
from gretel_client.navigator_client import Gretel

# Initialize Data Designer
gretel = Gretel(api_key="YOUR_API_KEY")
aidd = gretel.data_designer.new(model_suite="apache-2.0")

# Load your dataset
df_seed = pd.read_csv("https://gretel-datasets.s3.us-west-2.amazonaws.com/calmcode-datasets/pokemon_descriptive_columns.csv")

# Use it as seed data
aidd.with_seed_dataset(
    df_seed,
    sampling_strategy="shuffle",  # "ordered" or "shuffle"
    with_replacement=True  # Set to True to generate more records than the seed contains
)

Sampling Strategies

When using a seed dataset, you can control how it's used with these parameters:

  • sampling_strategy:

    • "ordered": Maintains the original order of records (default)

    • "shuffle": Randomly shuffles the records

  • with_replacement:

    • False: Each seed record is used only once (default)

    • True: Records can be reused. Required when generating more records than your seed dataset contains

# Example: Using ordered sampling without replacement
aidd.with_seed_dataset(
    df_seed,
    sampling_strategy="ordered",
    with_replacement=False
)
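The rule above (reuse is only needed when you request more records than the seed holds) can be sketched as a small helper. This is a minimal illustration, not part of the Gretel SDK: the function name `seed_sampling_kwargs` and its parameters are hypothetical, and only the two keyword names it returns come from the parameters described in this section.

```python
def seed_sampling_kwargs(seed_size: int, num_records: int, shuffle: bool = False) -> dict:
    """Hypothetical helper: choose with_seed_dataset() keyword arguments.

    seed_size   -- number of rows in the seed dataset, e.g. len(df_seed)
    num_records -- number of records you plan to generate
    shuffle     -- True for "shuffle", False for "ordered"
    """
    if seed_size <= 0:
        raise ValueError("seed dataset must contain at least one record")
    if num_records < 0:
        raise ValueError("num_records must be non-negative")
    return {
        "sampling_strategy": "shuffle" if shuffle else "ordered",
        # Reuse is only required when more records are requested than the seed holds.
        "with_replacement": num_records > seed_size,
    }


# Example: a 3-row seed and a request for 10 records requires replacement.
kwargs = seed_sampling_kwargs(seed_size=3, num_records=10, shuffle=True)

# With `aidd` and `df_seed` set up as shown earlier, the result could be passed on:
# aidd.with_seed_dataset(df_seed, **kwargs)
```

Computing the flag from the sizes avoids hard-coding `with_replacement=True` when the seed is already large enough to cover the request.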

Referencing Columns

Once you've added a seed dataset, you can reference its columns by name in the prompts of other Data Designer columns, using {{column_name}} placeholders:

# Generate a column whose prompt references seed columns
aidd.add_column(
    name="product_description",
    prompt="Write a detailed story about {{pokemon_name}} based on {{pokemon_type}}."
)
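
Because a prompt refers to seed columns by name, a misspelled placeholder is easy to miss. The sketch below checks a prompt against a list of available column names before you add the column. It is a hypothetical helper, not part of the Gretel SDK, and it assumes only simple `{{name}}` placeholders; it does not parse full Jinja expressions such as filters or attribute access. The `seed_columns` list is illustrative; in practice you might pass `list(df_seed.columns)` plus any columns you have already defined.

```python
import re

# Matches simple placeholders such as {{pokemon_name}} or {{ pokemon_name }}.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def missing_prompt_columns(prompt: str, available_columns) -> list:
    """Return placeholder names used in `prompt` that are not in `available_columns`.

    Names are returned in order of first appearance, without duplicates.
    """
    available = set(available_columns)
    referenced = PLACEHOLDER.findall(prompt)
    return [name for name in dict.fromkeys(referenced) if name not in available]


prompt = "Write a detailed story about {{pokemon_name}} based on {{pokemon_type}}."
seed_columns = ["pokemon_name", "description"]  # illustrative column names

missing = missing_prompt_columns(prompt, seed_columns)
if missing:
    print(f"Prompt references unknown columns: {missing}")
```

An empty result means every placeholder in the prompt matched an available column name.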