Gretel Navigator Inference API

Real-time data generation with Gretel Navigator

from gretel_client import Gretel

# api_key="prompt" tells the SDK to prompt you for your Gretel API key
gretel = Gretel(api_key="prompt")

Real-time vs batch data generation

The previous sections on the Gretel SDK focused on running batch jobs, which are project-based and do not support real-time interaction. In this section, we introduce the Navigator inference API, which makes it easy to generate high-quality synthetic tabular and text data in real time, with just a few lines of code, powered by Gretel Navigator.

Navigator currently supports two data generation modes: tabular and natural_language. In both modes, you can choose the backend model that powers the generation, which we'll describe in more detail below.

Tabular data generation

The Gretel object has a factories attribute that provides helper methods for creating new objects that interact with Gretel's non-project-based APIs. Let's use the factories attribute to fetch the available backend models that power Navigator's tabular data generation:

print(gretel.factories.get_navigator_model_list("tabular"))

This will print the list of available models. The first is gretelai/auto, which automatically selects the current default model; the default will change over time as models continue to evolve.

To initialize the Navigator Tabular inference API, we use the initialize_navigator_api method. Then, we can generate synthetic data in real time using its generate method:

# the `backend_model` argument is optional and defaults to "gretelai/auto"
tabular = gretel.factories.initialize_navigator_api("tabular", backend_model="gretelai/auto")

prompt = """\
Generate customer bank transaction data. Include the following columns:
- customer_name
- customer_id
- transaction_date
- transaction_amount
- transaction_type
- transaction_category
- account_balance
"""

# generate tabular data from a natural language prompt
df = tabular.generate(prompt, num_records=25)
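
Assuming the returned df is a pandas DataFrame (as the name suggests), you can inspect the generated records with the usual pandas tools:

# preview the first few generated records
print(df.head())

# sanity-check that the requested columns were produced
print(df.columns.tolist())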

You can augment an existing dataset using the edit method:

# add a column to the generated table using the `edit` method

edit_prompt = """\
Add the following column to the provided table:

- customer_address
"""

df_edited = tabular.edit(edit_prompt, seed_data=df)
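
A quick way to verify the edit, again assuming a pandas DataFrame, is to list the columns of the edited table:

# confirm the new column is present alongside the originals
print(df_edited.columns.tolist())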

Finally, Navigator's tabular mode supports streaming data generation. To enable streaming, simply set the stream parameter to True:

prompt = """\
Generate positive and negative reviews for common household products purchased online.

Columns are: the product name, number of stars (1-5), review and customer id
"""

for record in tabular.generate(
    prompt=prompt,
    num_records=150,
    stream=True,
    sample_buffer_size=5
):
    print(record)

Note that Navigator always streams data behind the scenes, with a maximum stream size of 100 records. If you request more than 100 records, correlations will therefore not be maintained across the entire generated dataset. To help maintain correlations, you can pass records as context from one batch to the next using the sample_buffer_size parameter. For example, above we set sample_buffer_size=5, which means the last 5 records of each batch are passed as context to the next one.
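
If you want a single table at the end of a streamed run, you can collect records as they arrive. Here is a minimal sketch that assumes each streamed record is a dict-like row that pandas can consume:

import pandas as pd

# accumulate streamed records, then build a DataFrame once the stream ends
records = []
for record in tabular.generate(prompt=prompt, num_records=150, stream=True, sample_buffer_size=5):
    records.append(record)

df_stream = pd.DataFrame(records)
print(len(df_stream))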

Natural language generation

Navigator's natural_language mode gives you access to state-of-the-art LLMs for generating text data. Let's fetch the available backend models that power Navigator's natural_language data generation:

print(gretel.factories.get_navigator_model_list("natural_language"))

As with the tabular mode, this will print the list of available models. The first is gretelai/gpt-auto, which automatically selects the current default model.

To initialize the Navigator Natural Language inference API, we again use the initialize_navigator_api method. Then, we can generate synthetic text data in real time using its generate method:

# `backend_model` is optional here too and defaults to "gretelai/gpt-auto"
llm = gretel.factories.initialize_navigator_api("natural_language")

text = llm.generate("Please tell me a funny joke about data scientists.")
print(text)

# let's see if the llm is funnier with a higher temperature
text_higher_temp = llm.generate("Please tell me a funny joke about data scientists.", temperature=2)
print(text_higher_temp) 
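
Since generate is an ordinary method call, it is easy to script. As an illustrative sketch (using only the temperature parameter shown above), you could compare the same prompt across several temperature settings:

# compare the same prompt across a few temperature settings
for temp in (0.5, 1.0, 2.0):
    joke = llm.generate("Please tell me a funny joke about data scientists.", temperature=temp)
    print(f"temperature={temp}: {joke}")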
