Gretel Playground [Legacy] Inference API

Real-time data generation with Gretel Playground

from gretel_client import Gretel

# api_key="prompt" asks for the API key interactively instead of hard-coding it
gretel = Gretel(api_key="prompt")

Real-time vs batch data generation

This section introduces the Playground inference API, which generates high-quality synthetic tabular and text data in real time with just a few lines of code.

Playground currently supports two data generation modes: tabular and natural_language. In both modes, you can choose the backend model that powers the generation, which we'll describe in more detail below.

Tabular data generation

The Gretel object has a factories attribute that provides helper methods for creating new objects that interact with Gretel's non-project-based APIs. Let's use the factories attribute to fetch the available backend models that power Playground's tabular data generation:

print(gretel.factories.get_navigator_model_list("tabular"))

This prints the list of available models. The first entry is gretelai/auto, which automatically selects the current default model. That default will change over time as models continue to evolve.
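Because the default behind gretelai/auto changes over time, you may want to request a specific model when it is available and fall back to the default otherwise. The following is a minimal sketch of that selection logic. It assumes the model list is a list of model-name strings. The helper `pick_backend_model` and the `example/...` names are hypothetical and are not part of the Gretel client.

```python
def pick_backend_model(available, preferred, fallback="gretelai/auto"):
    """Return the first preferred model that is available, else the fallback."""
    available_set = set(available)
    for name in preferred:
        if name in available_set:
            return name
    return fallback


# Hypothetical model names, used only for illustration.
available = ["gretelai/auto", "example/model-a", "example/model-b"]

# The first preferred name is not available, so the second one is chosen.
print(pick_backend_model(available, ["example/model-c", "example/model-b"]))

# None of the preferred names are available, so the fallback is returned.
print(pick_backend_model(available, ["example/model-z"]))
```

In practice, `available` would come from `gretel.factories.get_navigator_model_list("tabular")`, and the result would be passed as the `backend_model` argument.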

To initialize the Playground Tabular inference API, we use the initialize_navigator_api method. Then, we can generate synthetic data in real time using its generate method:

# the `backend_model` argument is optional and defaults to "gretelai/auto"
tabular = gretel.factories.initialize_navigator_api("tabular", backend_model="gretelai/auto")

prompt = """\
Generate customer bank transaction data. Include the following columns:
- customer_name
- customer_id
- transaction_date
- transaction_amount
- transaction_type
- transaction_category
- account_balance
"""

# generate tabular data from a natural language prompt
df = tabular.generate(prompt, num_records=25)
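Since the table is produced from a free-text prompt, it can be worth checking that the columns you asked for actually appear in the result. The sketch below uses only the standard library. The helpers `requested_columns` and `missing_columns` are hypothetical, and the check assumes the generated column names match the bullet names in the prompt exactly.

```python
import re


def requested_columns(prompt):
    """Extract column names from '- name' bullet lines in a prompt."""
    return re.findall(r"^[ \t]*-[ \t]*(\w+)[ \t]*$", prompt, flags=re.MULTILINE)


def missing_columns(actual, expected):
    """Return the expected column names that are absent from `actual`."""
    actual_set = {str(c) for c in actual}
    return [c for c in expected if c not in actual_set]


prompt = """\
Generate customer bank transaction data. Include the following columns:
- customer_name
- customer_id
- transaction_date
- transaction_amount
- transaction_type
- transaction_category
- account_balance
"""

expected = requested_columns(prompt)

# Stand-in for the column names of a generated table.
actual = ["customer_name", "customer_id", "transaction_date"]
print(missing_columns(actual, expected))
```

Assuming `generate` returns a pandas DataFrame, as the variable name `df` suggests, the same check would be `missing_columns(df.columns, expected)`.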

You can augment an existing dataset using the edit method:

# add column to the generated table using the `edit` method

edit_prompt = """\
Add the following column to the provided table:

- customer_address
"""

df_edited = tabular.edit(edit_prompt, seed_data=df)

Finally, Playground's tabular mode supports streaming data generation. To enable streaming, simply set the stream parameter to True:

prompt = """\
Generate positive and negative reviews for common household products purchased online.

Columns are: the product name, number of stars (1-5), review and customer id
"""

for record in tabular.generate(
    prompt=prompt,
    num_records=150,
    stream=True,
    sample_buffer_size=5
):
    print(record)

Note that Playground always streams data behind the scenes, in batches of at most 100 records. If you request more than 100 records, the data is therefore generated in several batches, and correlations are not maintained across the entire generated dataset. To help maintain correlations, you can pass records as context from one batch to the next using the sample_buffer_size parameter. For example, above we set sample_buffer_size=5, which means that the last 5 records from the previous batch are passed as context to the next batch.
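The batching behavior described above can be illustrated with a small simulation. This is a conceptual sketch only, not Gretel's implementation: `generate_in_batches` and `fake_batch` are hypothetical stand-ins, and the batch size of 100 is taken from the limit mentioned in the text.

```python
def generate_in_batches(generate_batch, num_records, batch_size=100, sample_buffer_size=5):
    """Generate records batch by batch, passing the tail of the
    previous output as context to each new batch."""
    records = []
    while len(records) < num_records:
        n = min(batch_size, num_records - len(records))
        # The last `sample_buffer_size` records generated so far become the context.
        context = records[-sample_buffer_size:] if sample_buffer_size > 0 else []
        records.extend(generate_batch(n, context))
    return records


context_sizes = []


def fake_batch(n, context):
    """Stand-in generator: continues numbering from the last context record."""
    context_sizes.append(len(context))
    start = context[-1]["id"] + 1 if context else 0
    return [{"id": start + i} for i in range(n)]


rows = generate_in_batches(fake_batch, num_records=150)
print(len(rows), context_sizes)  # 150 [0, 5]
```

Requesting 150 records yields two batches (100, then 50). The first batch receives no context, and the second receives the last 5 records of the first, which is what lets it continue the sequence consistently.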

Natural language generation

Playground's natural_language mode gives you access to state-of-the-art LLMs for generating text data. Let's fetch the available backend models that power Playground's natural_language data generation:

print(gretel.factories.get_navigator_model_list("natural_language"))

Similar to the tabular mode, this will print the list of available models, the first of which will be gretelai/gpt-auto, which automatically selects the current default model.

To initialize the Playground Natural Language inference API, we again use the initialize_navigator_api method. Then, we can generate synthetic text data in real time using its generate method:

llm = gretel.factories.initialize_navigator_api("natural_language")

text = llm.generate("Please tell me a funny joke about data scientists.")
print(text)

# let's see if the llm is funnier with a higher temperature
text_higher_temp = llm.generate("Please tell me a funny joke about data scientists.", temperature=2)
print(text_higher_temp)
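As general background on the temperature parameter: in LLM sampling, temperature typically rescales the model's token scores before they are turned into probabilities, so higher values flatten the distribution and make less likely tokens more probable. The sketch below illustrates that general idea with made-up scores. It is not a description of how Gretel's backend models apply the parameter, and `softmax_with_temperature` is a hypothetical helper.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate tokens.
logits = [2.0, 1.0, 0.0]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature rises, the probability of the top-scoring token falls and the others rise, which is why higher temperatures tend to produce more varied, and sometimes less coherent, text.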

