Gretel Benchmark

Gretel Benchmark is a source-available framework to evaluate multiple synthetic data models on any selection of datasets.
Tip: You can use Benchmark to easily compare and analyze multiple synthetic generation algorithms (including, but not limited to, Gretel models) on multiple datasets. The Benchmark report provides Synthetic Data Quality Score (SQS) for each generated synthetic dataset, as well as train time, generate time, and total runtime (in secs).
Be sure to sign up for a free Gretel account to use Benchmark!


To use Gretel Benchmark, install Benchmark through Gretel Trainer.
! pip install -Uqq gretel-trainer
from gretel_trainer import benchmark
You can also use this notebook to get started without local installation:

Quick Start

Use this quickstart notebook to start comparing multiple models with a few lines of code. These are the three steps for running Benchmark:
  • Use your local datasets or choose ones by name, datatype, or tag
  • Load Gretel models or create your own model class
  • Run comparison to get the results.
    • While jobs are running, you can check on the status with comparison.results
    • When everything is finished, you can download a CSV of the Benchmark report using comparison.export_results
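The three steps above can be sketched in a few lines. This is an illustrative sketch, not a vetted notebook: the CSV path is hypothetical, and a configured Gretel account is required for the jobs to run.

```python
from gretel_trainer.benchmark import (
    GretelACTGAN,
    GretelLSTM,
    compare,
    make_dataset,
)

# Step 1: wrap a local CSV (hypothetical path) as a Benchmark dataset
my_data = make_dataset(["/path/to/my_data.csv"], datatype="tabular_mixed")

# Step 2: choose the models to evaluate
models = [GretelLSTM, GretelACTGAN]

# Step 3: run the comparison
comparison = compare(datasets=[my_data], models=models)

# Check progress while jobs run, then export the final report as CSV
print(comparison.results())
comparison.export_results("benchmark-report.csv")
```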

Functions and Types

Datatype (Enum)
# Qualitative description of a dataset. Public functions documented below generally accept either a Datatype variant or its string representation.
Valid variants for datatype are:
  • TABULAR_MIXED | “tabular_mixed”
  • TABULAR_NUMERIC | “tabular_numeric”
  • TIME_SERIES | “time_series”
  • NATURAL_LANGUAGE | “natural_language”
def make_dataset(
    sources: Union[List[str], List[pd.DataFrame]],
    datatype: Union[Datatype, str],
    delimiter: str = ",",
    namespace: Optional[str] = None) -> Dataset
# Creates a dataset from local data.
  • sources (required): A list of string paths to local CSV files, or a list of in-memory Pandas DataFrame objects. Raises an exception if:
    • List is empty.
    • List includes incorrect types.
    • List is not homogeneous. All values in the list must be of the same type; mixing and matching files and DataFrames is not allowed. To include both files and DataFrames in a single comparison, make two separate datasets.
  • datatype (required): Must be either a Datatype enum variant or a valid string representation.
  • delimiter (optional): The delimiter character used in local CSV files. Defaults to comma.
  • namespace (optional): A prefix to add to input data names in the eventual results output. (Particularly useful for DataFrame datasets, as those objects do not have “names.”)
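Both source styles look like this in practice (a minimal sketch; the file paths are hypothetical, and gretel-trainer must be installed):

```python
import pandas as pd

from gretel_trainer.benchmark import make_dataset

# From in-memory DataFrames: the namespace labels them in the results,
# since DataFrame objects have no file names
df = pd.DataFrame({"age": [25, 32, 47], "income": [40_000, 65_000, 90_000]})
numeric_data = make_dataset([df], datatype="tabular_numeric", namespace="demo")

# From local CSV files (hypothetical paths); all sources in one call
# must be the same type -- either all paths or all DataFrames
file_data = make_dataset(
    ["train_a.csv", "train_b.csv"],
    datatype="tabular_mixed",
    delimiter=",",
)
```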
def get_gretel_dataset(name: str) -> Dataset
# Fetches a Gretel-curated dataset from Gretel’s S3 bucket
  • name (required): The name of the dataset.
  • This function will raise an exception if no dataset exists with the supplied name
def list_gretel_datasets(
    datatype: Optional[Union[Datatype, str]] = None,
    tags: Optional[List[str]] = None) -> List[Dataset]
# Returns a list of Gretel-curated datasets matching the specified datatype and tags. Uses "and" semantics—i.e. only returns datasets that match all supplied values.
  • datatype (optional): Datatype to filter on
  • tags (optional): Tags to filter on. Various tags are applied to Gretel-curated datasets, see below
def list_gretel_dataset_tags() -> List[str]
# List all unique tags across all Gretel-curated datasets
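Together, these three functions support a browse-then-fetch workflow. A sketch, assuming gretel-trainer is installed; the dataset name and the `.name` attribute on Dataset are illustrative assumptions, not confirmed identifiers:

```python
from gretel_trainer.benchmark import (
    get_gretel_dataset,
    list_gretel_dataset_tags,
    list_gretel_datasets,
)

# Discover which tags exist across the curated datasets
print(list_gretel_dataset_tags())

# "and" semantics: only datasets matching the datatype AND all tags
candidates = list_gretel_datasets(datatype="tabular_mixed", tags=["small", "finance"])
print([d.name for d in candidates])  # `.name` attribute is an assumption

# Fetch one curated dataset by name; raises if no dataset has that name
# ("iris" here is an illustrative guess, not a confirmed dataset name)
dataset = get_gretel_dataset("iris")
```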
def compare(
    datasets: List[Dataset],
    models: List[Model],
    auto_clean: bool = True) -> Comparison
# Begin execution of supplied datasets against supplied models. Returns a Comparison object, documented below.
  • datasets (required): List of datasets acquired via make_dataset, get_gretel_dataset, and/or list_gretel_datasets.
  • models (required): List of models.
    • For Gretel models, provide the class name, e.g. GretelACTGAN, GretelLSTM. All available model classes are exported from the main Benchmark module: from gretel_trainer.benchmark import GretelACTGAN
    • For custom model implementations, if your model does not take any initialization arguments, pass the class name. Alternatively, create an instance of your model elsewhere and pass it wrapped in a lambda to compare.
  • auto_clean (optional): If set to False, Benchmark will neither delete the projects it created in Gretel Cloud nor remove the local cache directory (.benchmark). Defaults to True.
comparison.is_complete -> bool
# Returns True when all runs (dataset × model) in the comparison reach a terminal state (Completed, Failed, or Skipped)
comparison.results() -> pd.DataFrame
# Returns a Pandas DataFrame showing the progress and/or results of all runs.
comparison.export_results(destination: str)
# Exports results as a CSV
# -- `destination` (required): where to write the CSV file
# Blocks until all runs are complete
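Since is_complete is documented above as a plain attribute and export_results blocks until completion, a simple polling helper can wrap the two. A sketch under those assumptions:

```python
import time


def wait_and_export(comparison, destination: str, poll_seconds: int = 60) -> None:
    """Poll a Benchmark comparison until every run reaches a terminal
    state, then write the report CSV to `destination`."""
    while not comparison.is_complete:  # documented as a bool attribute above
        time.sleep(poll_seconds)
    comparison.export_results(destination)
```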

Using Your Data

To use your own data for evaluating synthetic models in Benchmark, use the make_dataset function:
def make_dataset(
    sources: Union[List[str], List[pd.DataFrame]],
    datatype: Union[Datatype, str],
    delimiter: str = ",",
    namespace: Optional[str] = None) -> Dataset

Tip: Connecting your data

You can also use S3 connectors to connect your data. Check out these tutorials for how to connect data from Snowflake and more.

Using Gretel Datasets

You can select from popular, publicly available datasets provided by Gretel, and find the ones that best match your use case by name, datatype, or tag.
datatype is one of:
  • tabular numeric: data that is only numeric, like ratings or measurements
  • tabular mixed: data that is a mix of numeric and text, like labels along with ratings or measurements
  • natural language: data that is free or natural text, like reviews, conversations, and tweets. Note: natural language datasets should be single-column if using the Gretel GPT model.
  • time series: data with a time or date column, like stocks or price histories
tags include:
  • Data size: e.g. small, medium, large
  • Industry or category: e.g. finance, healthcare, e-commerce, marketing, ads, energy, government, environment, entertainment, telecom, employment, food
To list all the available datasets, use:
list_gretel_datasets(datatype: Optional[Union[Datatype, str]] = None, tags: Optional[List[str]] = None)
To select a dataset, use:
get_gretel_dataset(name: str)

Gretel Models

You can use Gretel models to get Benchmark results on any dataset. Some models are better than others for synthesizing different types of data. To get the best results, follow this guide:
  • GretelAuto
    • This model automatically picks the best solution between Gretel LSTM and Gretel ACTGAN for your dataset (see more below on the two models). This can be helpful if you want the Gretel engine to select the best model based on the characteristics of your dataset.
  • GretelLSTM
    • This model works for a variety of synthetic data tasks including time-series, tabular, and text data. Generally useful for a few thousand records and upward. Datasets generally have a mix of categorical, continuous, and numerical values.
    • Data requirements: Source data should have <150 columns. We recommend using Gretel ACTGAN for high-dimensional data.
  • GretelACTGAN
    • This model works well for high-dimensional, largely numeric data. Use for datasets with more than 20 columns and/or 50,000 rows.
    • Data requirements: Not ideal if the dataset contains free-text fields.
  • GretelGPT
    • This model is useful for natural language or plain text datasets such as reviews, tweets, and conversations.
    • Data requirements: Dataset must be single-column.
  • GretelAmplify
    • This model is great for generating lots of data quickly.
    • Note: Gretel Amplify is not a neural network model; instead, it uses statistical means to generate lots of data from an input dataset. The SQS for data generated using Gretel Amplify may be lower.
For more on using Gretel models, refer to blueprints for popular use cases, notebooks, and documentation.

Custom Gretel Model Configurations

The Gretel model classes above are wrappers around several of our default templates. To include a Gretel model with a non-default configuration, create a subclass of Benchmark’s GretelModel class and specify the config property of the class.
class CustomizedLSTM(GretelModel):
    config = {...}  # define configuration here, or set a string path to a local config file
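A filled-in sketch of both options follows. The import path and config keys are assumptions based on the docs above and the general synthetics config schema, not a vetted template:

```python
from gretel_trainer.benchmark import GretelModel  # import path assumed


class CustomizedLSTM(GretelModel):
    # Inline dict config; keys are illustrative, not a verified template
    config = {
        "schema_version": "1.0",
        "models": [
            {
                "synthetics": {
                    "params": {
                        "epochs": 50,
                        "learning_rate": 0.001,
                    },
                }
            }
        ],
    }


class TunedLSTM(GretelModel):
    # Alternatively, point at a local config file (hypothetical path)
    config = "./configs/my_lstm.yml"
```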

Custom Models

You can run Benchmark on any algorithm for synthetic data, not just Gretel models. To provide your own model implementation, define a Python class that meets this interface:
class MyCustomModel:
    def train(self, source: str, **kwargs) -> None:
        # your training code here
        ...

    def generate(self, **kwargs) -> pd.DataFrame:
        # your generation code here
        ...
Implement your custom model in Python; any third-party libraries you use must be installed as dependencies wherever you are running Benchmark.
Note: Custom model implementations run synchronously and sequentially—Benchmark will not parallelize the work in background threads like it does for Gretel models.
The compare function expects a list of “zero-argument constructors” for its models argument, and calls these constructors under the hood. For an implementation like the one above, the class itself is a valid constructor—you pass the MyCustomModel class to compare, and compare calls MyCustomModel() to get an instance of the model. If your model class has an __init__ function that requires arguments, you should instantiate your model first and pass it wrapped in a lambda to compare:
class AnotherCustomModel:
    def __init__(self, count: int):
        self.count = count

    def train(self, source: str, **kwargs) -> None:
        ...

    def generate(self, **kwargs) -> pd.DataFrame:
        ...

my_model = AnotherCustomModel(42)
compare(datasets=[...], models=[lambda: my_model])


If you encounter Skipped for a given model on a given dataset, the data format was unsuitable for that model. If you were using a Gretel model, see the documentation on Gretel models for more information on acceptable data formats.
A model may fail to train or generate for a variety of reasons. For best results when using a Gretel model, check out our documentation on models and these blueprints!
Some jobs may take a while to finish running - don't despair! If your input data is a particularly large file, models may take multiple hours to run. You can check on the status of each job using comparison.results to evaluate whether any jobs have failed.
Tip: Every job kicked off in Benchmark can also be viewed in the Gretel Console while the job is running. In the Gretel Console, you can find more info about the projects including: (1) whether the model is active or stuck in pending, (2) what epoch training is in, and more.


When you’re running Benchmark, in addition to Gretel Trainer and Client SDK logs, you can check the status of each job at any time with comparison.results.

Benchmark Results

The Benchmark results report provides: rows, columns, Synthetic Data Quality Score (SQS), train time (sec), generate time (sec), total runtime (sec)
  • The data shape (rows and columns) of the generated data will match the input dataset
  • Learn more about interpreting the Synthetic Data Quality Score (SQS)
  • Runtimes are reported as train time (sec) for model training, generate time (sec) for generating synthetic data, and total runtime (sec)
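Because the report is an ordinary Pandas DataFrame, standard analysis applies once runs are complete. A sketch using an illustrative frame; the column names below are stand-ins, not necessarily the report's exact headers:

```python
import pandas as pd

# Illustrative results frame; column names are assumptions
results = pd.DataFrame({
    "Input data": ["ds1", "ds1", "ds2", "ds2"],
    "Model": ["GretelLSTM", "GretelACTGAN", "GretelLSTM", "GretelACTGAN"],
    "SQS": [82, 88, 75, 91],
    "Total time (sec)": [610, 240, 980, 310],
})

# Pick the highest-SQS model for each dataset
best = results.loc[results.groupby("Input data")["SQS"].idxmax()]
print(best[["Input data", "Model", "SQS"]])
```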
For more, check out the Benchmark report of Gretel models on some popular publicly available ML datasets, categorized by industry.