Gretel Benchmark
Gretel Benchmark is a source-available framework to evaluate multiple synthetic data models on any selection of datasets
Tip: You can use Benchmark to easily compare and analyze multiple synthetic data generation algorithms (including, but not limited to, Gretel models) on multiple datasets. The Benchmark report provides a Synthetic Data Quality Score (SQS) for each generated synthetic dataset, as well as train time, generate time, and total runtime (in seconds).
To use Gretel Benchmark, install Benchmark via the gretel-trainer Python package.
You can also use the getting-started notebook to try Benchmark without a local installation.
Use this to start comparing multiple models with a few lines of code. These are the three steps for running Benchmark:
Use your local datasets or choose Gretel-curated datasets by name, datatype, or tag
Load Gretel models or create your own model class
Start a session to run each model with each dataset
At the highest level, starting Benchmark looks like this:
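For illustration, here is a minimal sketch of a full session, assuming Benchmark is installed as part of the gretel-trainer Python package and that compare() is the session entry point described later on this page (the CSV path, dataset name, and wait() helper are assumptions):

```python
from gretel_trainer.benchmark import (
    Datatype,
    GretelACTGAN,
    GretelTabularDP,
    compare,
    create_dataset,
)

# 1. Use your own dataset (or pick Gretel-curated datasets by name, datatype, or tag)
my_data = create_dataset("my_data.csv", datatype=Datatype.TABULAR, name="my-data")

# 2. Choose the models to evaluate (Gretel models and/or your own model classes)
models = [GretelACTGAN, GretelTabularDP]

# 3. Start a session that runs each model against each dataset
session = compare(datasets=[my_data], models=models)
session.wait()            # block until all jobs complete (assumed helper)
print(session.results)    # pandas DataFrame with SQS and runtime columns
```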
The sections below walk through the different ways you can set up datasets and models to run in Benchmark.
Each dataset in a Benchmark session is categorized with a Datatype. There are three supported variants:
Datatype.TABULAR ("tabular"): most data falls in this category, like ratings, labels, or measurements
Datatype.TIME_SERIES ("time_series"): data with a time or date column, like stocks or price histories
Datatype.NATURAL_LANGUAGE ("natural_language"): data that is free or natural text, like reviews, conversations, and tweets. Note: natural language datasets should be single-column if using the Gretel GPT model.
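As a sketch, the datatype can be passed as the enum member or, assuming the string aliases above are accepted interchangeably, as the corresponding string (the file path is hypothetical):

```python
from gretel_trainer.benchmark import Datatype, create_dataset

# Enum member
tweets = create_dataset("tweets.csv", datatype=Datatype.NATURAL_LANGUAGE, name="tweets")

# String alias (assumed to be equivalent to the enum member above)
tweets_alt = create_dataset("tweets.csv", datatype="natural_language", name="tweets-alt")
```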
To use your own data for evaluating synthetic models in Benchmark, use the create_dataset function.
source (required): A string path to a local CSV file, or a Pandas DataFrame object.
datatype (required): A Benchmark Datatype.
name (required): A name used to identify the dataset. Each dataset in a Benchmark session must have a unique name.
delimiter (optional): The delimiter character used in local CSV files. Defaults to comma.
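A short sketch exercising each argument (paths, names, and columns are hypothetical):

```python
import pandas as pd
from gretel_trainer.benchmark import Datatype, create_dataset

# From a local file that uses a tab delimiter
stocks = create_dataset(
    "data/stocks.tsv",
    datatype=Datatype.TIME_SERIES,
    name="stocks",
    delimiter="\t",
)

# From an in-memory pandas DataFrame (delimiter is not needed)
df = pd.DataFrame({"rating": [1, 5, 3], "label": ["a", "b", "a"]})
ratings = create_dataset(df, datatype=Datatype.TABULAR, name="ratings")
```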
You can select popular, publicly available datasets provided by Gretel and find the datasets that best match your use case by name, datatype, or tags.
To access Gretel datasets, start by creating a repository instance:
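For example (a sketch; the GretelDatasetRepo class name is an assumption, since this page does not spell out the import):

```python
from gretel_trainer.benchmark import GretelDatasetRepo  # class name assumed

repo = GretelDatasetRepo()
```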
The following instance methods can be called on the repository.
get_gretel_dataset
name (required): The name of the dataset. This method raises an exception if no dataset exists with the supplied name.
list_gretel_datasets
datatype (optional): Datatype to filter on.
tags (optional): Tags to filter on. Various tags are applied to Gretel-curated datasets; see below.
Some tags include:
Data size: e.g. small, medium, large
Industry or category: e.g. finance, healthcare, e-commerce, marketing, ads, energy, government, environment, entertainment, telecom, employment, food
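Putting both instance methods together (a sketch; the dataset name and tags shown are illustrative):

```python
from gretel_trainer.benchmark import Datatype

# Fetch a single Gretel-curated dataset by name
# (raises an exception if no dataset exists with that name)
iris = repo.get_gretel_dataset("iris")  # illustrative dataset name

# Find all small, tabular finance datasets
finance_small = repo.list_gretel_datasets(
    datatype=Datatype.TABULAR,
    tags=["small", "finance"],
)
```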
The list of Gretel models available in Benchmark includes:
Auto: GretelAuto
Tabular Fine-Tuning: GretelNavigatorFT
Text Fine-Tuning: GretelGPTX
Tabular GAN: GretelACTGAN
Tabular DP: GretelTabularDP
Timeseries DGAN: GretelDGAN
If you want to get more fine-grained with Gretel model configurations than the default templates above, you can create customized Gretel models. If you have a valid, standalone model configuration you want to use, create a subclass of Benchmark's GretelModel class and specify the config property of the class.
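For example, a sketch of a customized Gretel model whose config points at a standalone configuration file (the path is hypothetical; a config dict should work equally well):

```python
from gretel_trainer.benchmark import GretelModel

class MyCustomizedACTGAN(GretelModel):
    # Any valid, standalone Gretel model configuration (file path or dict)
    config = "configs/my-actgan.yml"
```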
You can run Benchmark on any algorithm for synthetic data, not just Gretel models. To provide your own model implementation, define a Python class that meets this interface:
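A sketch of that interface, assuming Benchmark calls train() with a path to the source CSV and expects generate() to return a pandas DataFrame of synthetic records:

```python
import pandas as pd

class MyCustomModel:
    def train(self, source: str, **kwargs) -> None:
        # `source` is a path to the training CSV; fit your model here
        self._training_data = pd.read_csv(source)

    def generate(self, **kwargs) -> pd.DataFrame:
        # Return synthetic records as a pandas DataFrame.
        # Placeholder logic: resample rows from the training data.
        return (
            self._training_data
            .sample(frac=1.0, replace=True)
            .reset_index(drop=True)
        )
```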
Implement your custom model in Python; any third-party libraries you use must be installed as dependencies wherever you are running Benchmark.
You can include either classes or instances in the models list argument. If a class is provided, Benchmark will initialize a unique instance of the class for each run with that model. If an instance is provided, the same instance will be shared across runs, so any internal state will accumulate.
Start a Benchmark session by combining datasets and models.
datasets (required): List of datasets acquired via create_dataset, repo.get_gretel_dataset, and/or repo.list_gretel_datasets.
models (required): List of models. This list can include both Gretel and custom models, both of which are documented further below.
config (optional): The BenchmarkConfig object supports customizing various aspects of the Benchmark session.
The Session will start immediately. You can call the following properties and methods on the Session object:
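For example (results is documented below; wait() and export_results() are assumptions about the Session API, not confirmed by this page):

```python
# Block until every model/dataset combination has finished (assumed helper)
session.wait()

# Results so far, as a pandas DataFrame (includes SQS and runtimes)
print(session.results)

# Persist the results table for later analysis (assumed helper)
session.export_results("benchmark-results.csv")
```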
You can define your own BenchmarkConfig to override default settings for your Benchmark session. Every field is optional.
project_display_name (string): The display name to use for the project that gets automatically created in Gretel Console and under which all work is performed.
refresh_interval (integer): How often to poll the Gretel API for updates on jobs.
trainer (boolean): Whether to use Gretel Trainer instead of the standard Gretel Python client.
working_dir (string or Path): Where to store working files (e.g. datasets, output synthetic data).
additional_report_scores (list[string]): Fields from Gretel Evaluate to include in the final results dataframe. (SQS will always be included.)
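A sketch constructing a config from the fields above and passing it to the session entry point (values are illustrative; the additional Evaluate field name is a hypothetical example):

```python
from pathlib import Path
from gretel_trainer.benchmark import BenchmarkConfig, compare

config = BenchmarkConfig(
    project_display_name="benchmark-experiments",
    refresh_interval=30,
    trainer=False,
    working_dir=Path("./benchmark_artifacts"),
    additional_report_scores=["field_distribution_stability"],  # hypothetical field name
)

session = compare(datasets=datasets, models=models, config=config)
```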
If you encounter Skipped for a given model on a given dataset, this indicates that the data format was unsuitable for that model. If you were using a Gretel model, please see the documentation on Gretel models for more information on acceptable data formats.
Some jobs may take a while to finish running - don't despair! If your input data is a particularly large file, models may take multiple hours to run. While Benchmark is running, in addition to Gretel Trainer and Client SDK logs, you can check on the status of each job at any time with session.results and see whether any jobs have failed.
The Benchmark results report provides: rows, columns, Synthetic Data Quality Score (SQS), train time (sec), generate time (sec), and total runtime (sec).
The data shape (rows and columns) of the generated data will match the input dataset. Runtimes are reported as train time (sec) for model training, generate time (sec) for generating synthetic data, and total runtime (sec) for the two combined.
For more, check out the Benchmark Report of Gretel models on some popular publicly available ML datasets, categorized by industry.
You can use Gretel models to get Benchmark results on any dataset. Some models are better than others for synthesizing different types of data. See the Gretel models documentation to learn about the strengths, limitations, and requirements for each model.
Out of the box, Benchmark exports objects wrapping several of our models' default configurations. These class objects can be included in the list of models used to start a Benchmark session. For example:
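A sketch, assuming the classes listed above are importable directly from the benchmark module:

```python
from gretel_trainer.benchmark import (
    GretelACTGAN,
    GretelGPTX,
    GretelTabularDP,
    compare,
)

session = compare(
    datasets=[my_data],  # datasets created or fetched earlier
    models=[GretelACTGAN, GretelGPTX, GretelTabularDP],
)
```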
A model may fail to either train or generate for a variety of reasons. For best results when using a Gretel model, check out the Gretel model documentation and troubleshooting tips!
Tip: Every job kicked off in Benchmark can also be viewed in the Gretel Console while the job is running. In the Gretel Console, you can find more info about the projects including: (1) whether the model is active or stuck in pending, (2) what epoch training is in, and more.
Learn more about interpreting the Synthetic Data Quality Score (SQS) in the Gretel documentation.