Gretel Navigator Inference API

Real-time data generation with Gretel Navigator

from gretel_client import Gretel

# api_key="prompt" asks you to enter your Gretel API key interactively
gretel = Gretel(api_key="prompt")

Real-time vs batch data generation

The previous sections on the Gretel SDK focused on running batch jobs, which are project-based and do not support real-time interaction. This section introduces the Navigator inference API, which lets you generate high-quality synthetic tabular and text data in real time with just a few lines of code.

Navigator currently supports two data generation modes: tabular and natural_language. In both modes, you can choose the backend model that powers the generation, which we'll describe in more detail below.

Tabular data generation

The Gretel object has a factories attribute that provides helper methods for creating new objects that interact with Gretel's non-project-based APIs. Let's use the factories attribute to fetch the available backend models that power Navigator's tabular data generation:

print(gretel.factories.get_navigator_model_list("tabular"))

This prints the list of available models. The first entry is gretelai/auto, which automatically selects the current default model; that default will change over time as models evolve.
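If you prefer a specific model but want to fall back to automatic selection when it is not offered, a small helper can pick from the returned list. This is a minimal sketch that assumes get_navigator_model_list returns a list of model-name strings; choose_backend_model is a hypothetical helper, not part of the SDK, and "gretelai/example-model" is a made-up name.

```python
def choose_backend_model(available, preferred, fallback="gretelai/auto"):
    """Return the first preferred model that is available, else the fallback.

    Hypothetical helper. Assumes `available` is a list of model-name strings,
    e.g. the output of gretel.factories.get_navigator_model_list("tabular").
    """
    available_set = set(available)
    for name in preferred:
        if name in available_set:
            return name
    return fallback


# Made-up model list for illustration; real names come from the API.
models = ["gretelai/auto", "gretelai/example-model"]
print(choose_backend_model(models, ["gretelai/missing", "gretelai/example-model"]))
```

The returned name can then be passed as backend_model to initialize_navigator_api.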

To initialize the Navigator Tabular inference API, we use the initialize_navigator_api method. Then, we can generate synthetic data in real time using its generate method:

# the `backend_model` argument is optional and defaults to "gretelai/auto"
tabular = gretel.factories.initialize_navigator_api("tabular", backend_model="gretelai/auto")

prompt = """\
Generate customer bank transaction data. Include the following columns:
- customer_name
- customer_id
- transaction_date
- transaction_amount
- transaction_type
- transaction_category
- account_balance
"""

# generate tabular data from a natural language prompt
df = tabular.generate(prompt, num_records=25)
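Because the prompt is plain text, you can also assemble it from a list of column names and, since the output comes from an LLM, confirm afterwards that every requested column came back. A minimal sketch: build_tabular_prompt and missing_columns are hypothetical helpers (not SDK functions), and the column check assumes generate returns a pandas DataFrame whose columns attribute lists the column names.

```python
def build_tabular_prompt(description, columns):
    # Hypothetical helper: a one-line description followed by one "- column"
    # bullet per line, ending with a newline, like the prompt above.
    lines = [f"{description} Include the following columns:"]
    lines.extend(f"- {col}" for col in columns)
    return "\n".join(lines) + "\n"


def missing_columns(expected, actual):
    # Return the expected column names absent from `actual` (e.g. df.columns).
    actual_set = set(actual)
    return [col for col in expected if col not in actual_set]


columns = [
    "customer_name", "customer_id", "transaction_date", "transaction_amount",
    "transaction_type", "transaction_category", "account_balance",
]
prompt = build_tabular_prompt("Generate customer bank transaction data.", columns)

# With the real API (assuming df is a pandas DataFrame):
# df = tabular.generate(prompt, num_records=25)
# print(missing_columns(columns, df.columns))
```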

You can augment an existing dataset, such as the table generated above, by passing it as seed_data to the edit method:

# add column to the generated table using the `edit` method

edit_prompt = """\
Add the following column to the provided table:

- customer_address
"""

df_edited = tabular.edit(edit_prompt, seed_data=df)
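To check what an edit actually changed, you can compare the column names before and after. This is a sketch with a hypothetical describe_edit helper; it works on any iterable of column names, so with real data you would pass df.columns and df_edited.columns (assuming both are pandas DataFrames).

```python
def describe_edit(before_columns, after_columns):
    # Hypothetical helper: compare column names before and after an edit.
    before = list(before_columns)
    after = list(after_columns)
    added = [col for col in after if col not in before]
    dropped = [col for col in before if col not in after]
    return added, dropped


# Illustrative column lists; with real data, pass df.columns and df_edited.columns.
added, dropped = describe_edit(
    ["customer_name", "customer_id"],
    ["customer_name", "customer_id", "customer_address"],
)
print("added:", added)
print("dropped:", dropped)
```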

Finally, Navigator's tabular mode supports streaming data generation. To enable streaming, set the stream parameter to True; generate then yields records one at a time, which you can consume in a loop:

prompt = """\
Generate positive and negative reviews for common household products purchased online.

Columns are: the product name, number of stars (1-5), review and customer id
"""

for record in tabular.generate(
    prompt=prompt,
    num_records=150,
    stream=True,
    sample_buffer_size=5
):
    print(record)
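One benefit of streaming is that you can persist records as they arrive instead of waiting for the whole batch. The sketch below writes each record to CSV immediately. It assumes every streamed record is a dict mapping column name to value, which you should confirm against your SDK version; stream_to_csv is a hypothetical helper, and fake_stream is an offline stand-in with made-up values so the example runs without an API key.

```python
import csv
import io


def stream_to_csv(records, fileobj, fieldnames=None):
    """Write each record to CSV as soon as it arrives; return the row count.

    Assumes every record is a dict mapping column name -> value.
    If fieldnames is None, the first record's keys define the header.
    """
    writer = None
    count = 0
    for record in records:
        if writer is None:
            writer = csv.DictWriter(fileobj, fieldnames=fieldnames or list(record))
            writer.writeheader()
        writer.writerow(record)
        count += 1
    return count


# Stand-in for the streamed records (made-up values, not real Navigator output).
fake_stream = iter([
    {"product_name": "kettle", "stars": 5},
    {"product_name": "toaster", "stars": 2},
])
buf = io.StringIO()
n = stream_to_csv(fake_stream, buf)
print(n)
print(buf.getvalue())
```

With the real API, you would open a file with newline="" (as the csv module recommends) and pass tabular.generate(prompt=prompt, num_records=150, stream=True) as records.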

Natural language generation

Navigator's natural_language mode gives you access to state-of-the-art LLMs for generating text data. Let's fetch the available backend models that power Navigator's natural_language data generation:

print(gretel.factories.get_navigator_model_list("natural_language"))

As in tabular mode, this prints the list of available models. The first entry is gretelai/gpt-auto, which automatically selects the current default model.

To initialize the Navigator Natural Language inference API, we again use the initialize_navigator_api method. Then, we can generate synthetic text data in real time using its generate method:

llm = gretel.factories.initialize_navigator_api("natural_language")

text = llm.generate("Please tell me a funny joke about data scientists.")
print(text)
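To build a small text dataset, you can call generate once per prompt and keep each prompt next to its output. A minimal sketch: generate_text_records is a hypothetical helper that takes the generate function as an argument (so you would pass llm.generate) and forwards any extra keyword arguments unchanged; fake_generate is an offline stand-in so the example runs without an API key.

```python
def generate_text_records(generate_fn, prompts, **generate_kwargs):
    # Hypothetical helper: call generate_fn once per prompt and pair each prompt
    # with its output. Extra keyword arguments are passed through unchanged.
    return [
        {"prompt": p, "text": generate_fn(p, **generate_kwargs)}
        for p in prompts
    ]


# Offline stand-in for llm.generate so the sketch runs without an API key.
def fake_generate(prompt, **kwargs):
    return f"[temperature={kwargs.get('temperature')}] reply to: {prompt}"


records = generate_text_records(
    fake_generate,
    ["Tell me a joke about data scientists.", "Tell me a joke about statisticians."],
    temperature=0.7,
)
for r in records:
    print(r["text"])
```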

# let's see if the LLM is funnier with a higher temperature
text_higher_temp = llm.generate("Please tell me a funny joke about data scientists.", temperature=2)
print(text_higher_temp)
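Temperature controls how random the model's sampling is. The sketch below shows the textbook temperature-scaled softmax, p_i = exp(z_i / T) / sum_j exp(z_j / T), applied to a made-up list of logits: a higher T flattens the distribution, so less likely tokens are chosen more often. This illustrates the general technique only; how Navigator's backend models apply temperature, and which values they accept, is not specified here.

```python
import math


def softmax_with_temperature(logits, temperature):
    # Textbook temperature-scaled softmax (not Gretel's implementation).
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
low = softmax_with_temperature(logits, 0.5)
high = softmax_with_temperature(logits, 2.0)
print([round(p, 3) for p in low])   # sharper: most mass on the top token
print([round(p, 3) for p in high])  # flatter: mass spread more evenly
```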
