Gretel Benchmark

Gretel Benchmark is a source-available framework for evaluating multiple synthetic data models on any selection of datasets.

Tip: You can use Benchmark to compare and analyze multiple synthetic generation algorithms (including, but not limited to, Gretel models) on multiple datasets. The Benchmark report provides the Synthetic Data Quality Score (SQS) for each generated synthetic dataset, as well as train time, generate time, and total runtime (in seconds).

Be sure to sign up for a free Gretel account to use Benchmark!


To use Gretel Benchmark, install Benchmark through Gretel Trainer.

! pip install -Uqq gretel-trainer
from gretel_trainer.benchmark import *

You can also use the quickstart notebook to get started without a local installation.

Quick Start

Use this quickstart notebook to start comparing multiple models with a few lines of code. These are the three steps for running Benchmark:

  • Use your local datasets or choose Gretel-curated datasets by name, datatype, or tag

  • Load Gretel models or create your own model class

  • Start a session to run each model with each dataset

At the highest level, starting Benchmark looks like this:

session = compare(datasets=[...], models=[...])

The sections below walk through the different ways you can set up datasets and models to run in Benchmark.


Each dataset in a Benchmark session is categorized with a Datatype. There are three supported variants:

  • Datatype.TABULAR | “tabular” :: most data falls in this category, like ratings, labels, or measurements

  • Datatype.TIME_SERIES | “time_series” :: data with a time or date column, like stocks or price histories

  • Datatype.NATURAL_LANGUAGE | “natural_language” :: data that is free or natural text, like reviews, conversations, and tweets. Note: natural language datasets should be single-column if using the Gretel GPT model.

Generally speaking, functions expecting a Datatype argument can accept either the enum variant object or its string representation (in lower_snake_case).
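As a rough illustration of what accepting either form means, the sketch below maps both spellings to the same enum member. The Datatype enum here is a local stand-in and normalize_datatype is a hypothetical helper; neither is Benchmark's actual implementation.

```python
from enum import Enum
from typing import Union


class Datatype(str, Enum):
    """Local stand-in mirroring the three variants listed above."""

    TABULAR = "tabular"
    TIME_SERIES = "time_series"
    NATURAL_LANGUAGE = "natural_language"


def normalize_datatype(value: Union[Datatype, str]) -> Datatype:
    """Return the enum member for either an enum variant or its string form."""
    if isinstance(value, Datatype):
        return value
    # Lookup by value raises ValueError for strings that match no variant.
    return Datatype(value)


from_enum = normalize_datatype(Datatype.TIME_SERIES)
from_string = normalize_datatype("time_series")
```

Both calls resolve to the same member, so downstream code only has to handle the enum.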

Using Your Data

To use your own data for evaluating synthetic models in Benchmark, use the create_dataset function.

def create_dataset(
    source: Union[str, pd.DataFrame], 
    datatype: Union[Datatype, str],
    name: str, 
    delimiter: str = ",", 
) -> Dataset
  • source (required): A string path to a local CSV file, or a Pandas DataFrame object.

  • datatype (required): A Benchmark Datatype.

  • name (required): A name used to identify the dataset. Each dataset in a Benchmark session must have a unique name.

  • delimiter (optional): The delimiter character used in local CSV files. Defaults to comma.
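The following sketch shows one way the parameters above might fit together for a semicolon-delimited file. The file contents, the dataset name "ratings", and the helper names write_semicolon_csv and load_ratings are invented for illustration; the Benchmark import is deferred inside load_ratings so the file-writing part runs even where gretel-trainer is not installed.

```python
import csv
import tempfile
from pathlib import Path


def write_semicolon_csv(directory: Path) -> Path:
    """Write a tiny semicolon-delimited file to stand in for your own data."""
    path = directory / "ratings.csv"
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle, delimiter=";")
        writer.writerow(["user_id", "rating"])
        writer.writerows([[1, 4], [2, 5], [3, 3]])
    return path


def load_ratings(path: Path):
    """Wrap the file as a Benchmark dataset (requires gretel-trainer)."""
    from gretel_trainer.benchmark import Datatype, create_dataset

    return create_dataset(
        str(path),                  # source: string path to the local CSV
        datatype=Datatype.TABULAR,  # or the string "tabular"
        name="ratings",             # must be unique within the session
        delimiter=";",              # overrides the comma default
    )


csv_path = write_semicolon_csv(Path(tempfile.mkdtemp()))
# dataset = load_ratings(csv_path)  # uncomment where gretel-trainer is installed
```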

Using Gretel Datasets

Gretel provides a collection of popular, publicly available datasets. You can look them up by name, or find the ones that best match your use case by filtering on datatype or tags.

To access Gretel datasets, start by creating a repository instance:

repo = GretelDatasetRepo()

The following instance methods can be called on the repository.

def get_dataset(self, name: str) -> Dataset
# Fetches a Gretel-curated dataset from Gretel’s S3 bucket
  • name (required): The name of the dataset.

  • This function will raise an exception if no dataset exists with the supplied name

def list_datasets(
    self,
    datatype: Optional[Union[Datatype, str]] = None, 
    tags: Optional[list[str]] = None,
) -> list[Dataset]
# Returns a list of Gretel-curated datasets matching the specified datatype and tags. Uses "and" semantics—i.e. only returns datasets that match all supplied values. Providing neither datatype nor tags returns all datasets.
  • datatype (optional): Datatype to filter on

  • tags (optional): Tags to filter on. Various tags are applied to Gretel-curated datasets, see below

def list_dataset_tags(self) -> list[str]
# List all unique tags across all Gretel-curated datasets

Some tags include:

  • Data size: e.g. small, medium, large

  • Industry or category: e.g. finance, healthcare, e-commerce, marketing, ads, energy, government, environment, entertainment, telecom, employment, food
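To make the "and" semantics of list_datasets concrete, here is a small stand-alone sketch of that kind of filter. CatalogEntry, filter_catalog, and the three entries are invented for illustration and do not reflect how the Gretel repository is implemented or which datasets it contains.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical record describing one curated dataset."""

    name: str
    datatype: str
    tags: frozenset


def filter_catalog(
    catalog: List[CatalogEntry],
    datatype: Optional[str] = None,
    tags: Optional[List[str]] = None,
) -> List[CatalogEntry]:
    """Keep entries matching the datatype (if given) and every requested tag."""
    wanted = set(tags or [])
    return [
        entry
        for entry in catalog
        if (datatype is None or entry.datatype == datatype)
        and wanted <= entry.tags  # subset test: all requested tags must be present
    ]


# Invented entries, used only to exercise the filter.
catalog = [
    CatalogEntry("alpha", "tabular", frozenset({"small", "finance"})),
    CatalogEntry("beta", "tabular", frozenset({"large", "finance"})),
    CatalogEntry("gamma", "time_series", frozenset({"small", "energy"})),
]

small_finance = filter_catalog(catalog, datatype="tabular", tags=["small", "finance"])
everything = filter_catalog(catalog)
```

Adding more tags can only narrow the result, and passing no filters returns the whole catalog.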


Gretel Models

You can use Gretel models to get Benchmark results on any dataset. Some models are better than others for synthesizing different types of data. To get the best results, follow this guide:

  • Gretel LSTM

    • This model works for a variety of synthetic data tasks including time-series, tabular, and text data. It is generally useful for datasets of a few thousand records and upward that contain a mix of categorical, continuous, and numerical values.

    • Data requirements: Source data should have <150 columns. We recommend using Gretel ACTGAN for high dimensional data.

  • Gretel ACTGAN

    • This model works well for high dimensional, largely numeric data. Use for datasets with more than 20 columns and/or 50,000 rows.

    • Data requirements: Not ideal if the dataset contains free-text fields.

  • Gretel GPT

    • This model is useful for natural language or plain text datasets such as reviews, tweets, and conversations.

    • Data requirements: Dataset must be single-column

  • Gretel Amplify

    • This model is great for generating lots of data quickly.

    • Note: Gretel Amplify is not a neural network model, but instead uses statistical means to generate lots of data from an input dataset. The SQS for data generated using Gretel Amplify may be lower.

    For more on using Gretel models, refer to our blueprints and example notebooks for popular use cases.
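The rules of thumb in this guide can be written down as a small helper. This is only a sketch: suggest_models and its parameters are hypothetical, the thresholds are copied from the guide above, and the output is a starting point for which models to include in a Benchmark session rather than an official recommendation.

```python
from typing import List


def suggest_models(
    n_rows: int,
    n_columns: int,
    has_free_text: bool = False,
    is_single_text_column: bool = False,
    needs_bulk_speed: bool = False,
) -> List[str]:
    """Return candidate model names based on the rules of thumb in the guide."""
    if is_single_text_column:
        # Natural language data in a single column is the Gretel GPT case.
        return ["Gretel GPT"]

    candidates = []
    if (n_columns > 20 or n_rows > 50_000) and not has_free_text:
        # High-dimensional or large, largely numeric data.
        candidates.append("Gretel ACTGAN")
    if n_columns < 150:
        # General-purpose choice within the stated column limit.
        candidates.append("Gretel LSTM")
    if needs_bulk_speed:
        # Statistical model aimed at fast, high-volume generation.
        candidates.append("Gretel Amplify")
    return candidates


wide_numeric = suggest_models(n_rows=200_000, n_columns=300)
small_mixed = suggest_models(n_rows=5_000, n_columns=12, has_free_text=True)
reviews = suggest_models(n_rows=10_000, n_columns=1, is_single_text_column=True)
```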

Out of the box, Benchmark exports objects wrapping several of our models' default templates. These class objects can be included in the list of models used to start a Benchmark session. For example:

from gretel_trainer.benchmark import GretelACTGAN, GretelLSTM, compare

session = compare(datasets=[...], models=[GretelACTGAN, GretelLSTM])

Specific Gretel Model Configurations

If you want to get more fine-grained with Gretel model configurations than the default templates above, you can create customized Gretel models. If you have a valid, standalone model configuration you want to use, create a subclass of Benchmark’s GretelModel class and specify the config property of the class.

class CustomizedLSTM(GretelModel):
    config = {...} # define configuration here or set string path to local config file

Custom Models

You can run Benchmark on any algorithm for synthetic data, not just Gretel models. To provide your own model implementation, define a Python class that meets this interface:

import pandas as pd

class MyCustomModel:
    def train(self, source: str, **kwargs) -> None:
        # your training code here
        ...

    def generate(self, **kwargs) -> pd.DataFrame:
        # your generation code here
        ...

Implement your custom model in Python; any third-party libraries you use must be installed as dependencies wherever you are running Benchmark.

Note: Custom model implementations run synchronously and sequentially—Benchmark will not parallelize the work in background threads like it does for Gretel models.

You can include either classes or instances in the models list argument. If a class is provided, Benchmark will initialize a unique instance of the class for each run with that model. If an instance is provided, the same instance will be shared across runs, so any internal state will accumulate.
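The difference between passing a class and passing an instance can be seen in a toy sketch. CountingModel, resolve, and run_all are hypothetical and only imitate the behavior described above; Benchmark's real runner is not shown here, and the toy model omits generate for brevity.

```python
import inspect
from typing import List


class CountingModel:
    """Toy stand-in for a custom model; it only counts train() calls."""

    def __init__(self) -> None:
        self.train_calls = 0

    def train(self, source: str, **kwargs) -> None:
        self.train_calls += 1


def resolve(model):
    """Return a fresh instance for a class, or the object itself for an instance."""
    return model() if inspect.isclass(model) else model


def run_all(model, sources: List[str]) -> List[int]:
    """Train once per source and record the call count seen after each run."""
    counts = []
    for source in sources:
        instance = resolve(model)
        instance.train(source)
        counts.append(instance.train_calls)
    return counts


class_counts = run_all(CountingModel, ["a.csv", "b.csv"])       # fresh instance per run
instance_counts = run_all(CountingModel(), ["a.csv", "b.csv"])  # one shared instance
```

With a class, every run starts from a count of zero; with a shared instance, the count carries over from one run to the next. Pass a class unless you deliberately want state to persist across runs.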


Start a Benchmark session by combining datasets and models.

def compare(
    datasets: List[Dataset], 
    models: List[Model], 
    config: Optional[BenchmarkConfig] = None,
) -> Session
# Begin execution of supplied datasets against supplied models. Returns a Session object.
  • datasets (required): List of datasets acquired via create_dataset, repo.get_dataset, and/or repo.list_datasets.

  • models (required): List of models. This list can include both Gretel and custom models, both of which are documented above.

  • config (optional): The BenchmarkConfig object supports customizing various aspects of the Benchmark session.

The Session will start immediately. You can call the following properties and methods on the Session object:

session.results -> pd.DataFrame
# Returns a Pandas DataFrame showing the progress and/or results of all runs.

session.export_results(destination: str)
# Exports results as a CSV
# -- `destination` (required): where to write the CSV file

session.wait()
# Blocks until all runs are complete


You can define your own BenchmarkConfig to override default settings about your Benchmark session. Every field is optional.

  • project_display_name (string): The display name to use for the project that gets automatically created in Gretel Console and under which all work is performed.

  • refresh_interval (integer): How often to poll the Gretel API for updates on jobs.

  • trainer (boolean): Whether to use Gretel Trainer instead of the standard Gretel Python client.

  • working_dir (string or Path): Where to store working files (e.g. datasets, output synthetic data).

  • additional_report_scores (list[string]): Fields from Gretel Evaluate to include in the final results dataframe. (SQS will always be included.)
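A configuration overriding a few of these fields might look like the sketch below. It assumes BenchmarkConfig is importable from gretel_trainer.benchmark and accepts the fields above as keyword arguments; the values and the build_config helper are placeholders. The import is deferred so the snippet can be read and loaded without gretel-trainer installed.

```python
from pathlib import Path


def build_config():
    """Build a BenchmarkConfig overriding a few defaults (requires gretel-trainer)."""
    # Assumption: BenchmarkConfig is exported here and takes keyword arguments.
    from gretel_trainer.benchmark import BenchmarkConfig

    return BenchmarkConfig(
        project_display_name="benchmark-comparison",  # placeholder name
        refresh_interval=30,                          # placeholder polling value
        working_dir=Path("benchmark_work"),           # placeholder directory
    )


# session = compare(datasets=[...], models=[...], config=build_config())
```

Fields left out of the call keep their defaults, since every field is optional.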


If a run shows Skipped for a given model on a given dataset, the data format was unsuitable for that model. If you were using a Gretel model, see the documentation on Gretel models for more information on acceptable data formats.

A model may fail to train or generate for a variety of reasons. For best results when using a Gretel model, check out our documentation on models and these blueprints!

Some jobs may take a while to finish running, so don't despair! If your input data is a particularly large file, models may take multiple hours to run. You can check the status of each job using session.results to see whether any jobs have failed.

Tip: Every job kicked off in Benchmark can also be viewed in the Gretel Console while the job is running. In the Gretel Console, you can find more info about the projects including: (1) whether the model is active or stuck in pending, (2) what epoch training is in, and more.


While Benchmark is running, you can check the status of each job at any time with session.results, in addition to the Gretel Trainer and Client SDK logs.

Benchmark Results

The Benchmark results report provides: rows, columns, Synthetic Data Quality Score (SQS), train time (sec), generate time (sec), and total runtime (sec).

  • The data shape (rows and columns) of the generated data will match the input dataset

  • Learn more about interpreting the Synthetic Data Quality Score (SQS)

  • Runtimes are reported as train time (sec) for model training, generate time (sec) for generating synthetic data, and total runtime (sec)
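Once results have been exported with session.export_results, the CSV can be post-processed with ordinary tools. The sketch below picks the highest-scoring model per dataset. The column names "Input data", "Model", "SQS", and "Total time (sec)" are assumptions about the exported file, and the rows are invented placeholders rather than real benchmark results; check the header of your own export before reusing this.

```python
import csv
import io
from typing import Dict, Tuple

# Invented rows standing in for a file written by session.export_results();
# the column names are assumptions and the numbers are not real results.
SAMPLE_CSV = """Input data,Model,SQS,Total time (sec)
ratings,model_a,78,120.5
ratings,model_b,84,300.0
prices,model_a,66,95.2
prices,model_b,,
"""


def best_model_per_dataset(csv_text: str) -> Dict[str, str]:
    """Map each dataset name to the model with the highest SQS."""
    best: Dict[str, Tuple[str, float]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["SQS"]:
            continue  # no score recorded for this run
        score = float(row["SQS"])
        dataset = row["Input data"]
        if dataset not in best or score > best[dataset][1]:
            best[dataset] = (row["Model"], score)
    return {dataset: model for dataset, (model, _) in best.items()}


winners = best_model_per_dataset(SAMPLE_CSV)
```

Runs without a score are ignored, so a dataset only appears in the result if at least one model produced an SQS for it.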

For more, check out the Benchmark Report of Gretel models on some popular publicly available ML datasets, categorized by industry.

