Batch Job SDK

This page documents the batch job SDK, which is intended for running Navigator at scale.

1. Initialize Navigator with a model config:

model_config = """
schema_version: 1.0
models:
  - navigator:
      model_id: "gretelai/auto"
      output_format: "jsonl"
"""
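The same config can also be written as a plain Python dict, which makes it easier to vary `model_id` or `output_format` per job. The sketch below is illustrative only: `build_navigator_config` is a hypothetical helper, not part of the SDK, and the top-level `models` key is an assumption inferred from the list-style `navigator` entry in the config.

```python
import json


def build_navigator_config(model_id: str = "gretelai/auto",
                           output_format: str = "jsonl") -> dict:
    """Return a Navigator model config as a dict (hypothetical helper)."""
    if not model_id:
        raise ValueError("model_id must be a non-empty string")
    return {
        "schema_version": 1.0,
        # Assumption: the navigator entry lives under a top-level "models" list.
        "models": [
            {"navigator": {"model_id": model_id, "output_format": output_format}}
        ],
    }


config = build_navigator_config()
print(json.dumps(config, indent=2))
```

The resulting dict has the same shape a YAML parser would produce from the config string, so it can be used wherever the parsed config is expected.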

2. Use these helper functions to integrate the batch SDK into your own workflows:

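import pandas as pd
from gretel_client.helpers import poll

def submit_generate(model, prompt: str, params: dict, ref_data=None) -> pd.DataFrame:
    """Generate or augment data from the Navigator model.

    Args:
        model: The model object that will process the prompt.
        prompt (str): The text prompt to generate data from.
        params (dict): Parameters for data generation.
        ref_data: Optional existing dataset to edit or augment.

    Returns:
        pd.DataFrame: The generated data.
    """
    # The arguments after data_source and the submit call were missing from
    # this listing; they are restored here as an assumption.
    data_processor = model.create_record_handler_obj(
        data_source=pd.DataFrame({"prompt": [prompt]}),
        params=params,
        ref_data=ref_data,
    )
    data_processor.submit_cloud()
    poll(data_processor, verbose=False)  # wait for the job to finish
    return pd.read_json(data_processor.get_artifact_link("data"), lines=True, compression="gzip")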
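Because each call to `submit_generate` handles a single prompt, running at scale usually means calling it once per prompt and collecting the results. The following is a minimal sketch of that pattern, not an SDK feature: `run_batch` and `fake_submit` are hypothetical names, and `fake_submit` stands in for a real submission so the example runs without credentials. Whether concurrent submissions are appropriate depends on account limits that this page does not state.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Optional, Tuple

# (prompt, result or None, exception or None)
Outcome = Tuple[str, Any, Optional[BaseException]]


def run_batch(submit_fn: Callable[[str], Any],
              prompts: List[str],
              max_workers: int = 4) -> List[Outcome]:
    """Call submit_fn once per prompt on a thread pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(submit_fn, prompt) for prompt in prompts]
    # Leaving the with-block waits for every future to finish.
    outcomes = []
    for prompt, future in zip(prompts, futures):
        error = future.exception()
        result = future.result() if error is None else None
        outcomes.append((prompt, result, error))
    return outcomes


def fake_submit(prompt: str) -> List[dict]:
    """Stand-in for a real submission; echoes the prompt as one record."""
    if not prompt.strip():
        raise ValueError("empty prompt")
    return [{"prompt": prompt}]


outcomes = run_batch(fake_submit, ["users in France", "", "users in Spain"])
for prompt, result, error in outcomes:
    status = "ok" if error is None else f"failed: {error}"
    print(repr(prompt), status)
```

With the real helper, `submit_fn` could be something like `lambda p: submit_generate(model=model, prompt=p, params=params)`. Failures are returned alongside successes so that one bad prompt does not discard the rest of the batch.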


# Generate mock dataset
prompt = """\
Generate a mock dataset for users from the Foo company based in France.

Each user should have the following columns:
* first_name: traditional French first names.
* last_name: traditional French surnames.
* email: formatted as the first letter of their first name followed by their last name (e.g.,
* gender: Male/Female/Non-binary.
* city: a city in France.
* country: always 'France'.
"""

params = {
    "num_records": 10,
    "temperature": 0.8,
    "top_p": 1,
    "top_k": 50,
}

df = submit_generate(model=model, prompt=prompt, params=params)
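Generated data does not always follow the prompt exactly, so it can be worth checking the result before using it downstream. The sketch below checks the columns requested in the prompt and the fixed `country` value. `check_records` is a hypothetical helper, and `sample_rows` are hand-written stand-ins rather than real model output.

```python
from typing import Dict, List

# Column names taken from the prompt above.
EXPECTED_COLUMNS = ["first_name", "last_name", "email", "gender", "city", "country"]


def check_records(records: List[Dict], expected_columns=EXPECTED_COLUMNS) -> List[str]:
    """Return a list of human-readable problems found in the records."""
    problems = []
    for i, row in enumerate(records):
        missing = [col for col in expected_columns if col not in row]
        if missing:
            problems.append(f"row {i}: missing columns {missing}")
        # The prompt asks for country to always be 'France'.
        if "country" in row and row["country"] != "France":
            problems.append(f"row {i}: country is {row['country']!r}, expected 'France'")
    return problems


# Hand-written rows for illustration only: the second row is deliberately wrong.
sample_rows = [
    {"first_name": "Camille", "last_name": "Martin", "email": "cmartin@example.com",
     "gender": "Female", "city": "Lyon", "country": "France"},
    {"first_name": "Louis", "last_name": "Bernard",
     "gender": "Male", "city": "Paris", "country": "Belgium"},
]

problems = check_records(sample_rows)
for problem in problems:
    print(problem)
```

For the DataFrame returned above, the rows could be passed in as `check_records(df.to_dict(orient="records"))`.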
