Gretel Benchmark
Gretel Benchmark is a source-available framework for evaluating multiple synthetic data models on any selection of datasets.
Tip: You can use Benchmark to easily compare and analyze multiple synthetic generation algorithms (including, but not limited to, Gretel models) on multiple datasets. The Benchmark report provides the Synthetic Data Quality Score (SQS) for each generated synthetic dataset, as well as train time, generate time, and total runtime (in seconds).
Be sure to sign up for a free Gretel account to use Benchmark!
Installation
To use Gretel Benchmark, install Benchmark through Gretel Trainer.
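For example, from PyPI (Benchmark ships inside the gretel-trainer package):

```bash
pip install -U gretel-trainer
```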
You can also use this notebook to get started without local installation.
Quick Start
Use this quickstart notebook to start comparing multiple models with a few lines of code. These are the three steps for running Benchmark:
1. Use your local datasets or choose Gretel-curated datasets by name, datatype, or tag
2. Load Gretel models or create your own model class
3. Start a session to run each model with each dataset
At the highest level, starting Benchmark looks like this:
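A minimal sketch (assuming compare, create_dataset, Datatype, and the model wrappers below are exported from gretel_trainer.benchmark, as in recent gretel-trainer releases; session.wait() is an assumed helper):

```python
from gretel_trainer.benchmark import (
    Datatype,
    GretelACTGAN,
    GretelLSTM,
    compare,
    create_dataset,
)

# 1. Register a dataset (local CSV or DataFrame) with a datatype and a unique name.
my_data = create_dataset(
    "path/to/my_data.csv",
    datatype=Datatype.TABULAR,
    name="my-data",
)

# 2. Pick the models to evaluate (Gretel wrappers and/or custom model classes).
models = [GretelLSTM, GretelACTGAN]

# 3. Start the session; jobs kick off immediately.
session = compare(datasets=[my_data], models=models)
session.wait()          # assumed helper: block until all jobs complete
print(session.results)  # per-run results as a pandas DataFrame
```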
The sections below walk through the different ways you can set up datasets and models to run in Benchmark.
Datasets
Each dataset in a Benchmark session is categorized with a Datatype. There are three supported variants:
Datatype.TABULAR | "tabular" :: most data falls in this category, like ratings, labels, or measurements
Datatype.TIME_SERIES | "time_series" :: data with a time or date column, like stocks or price histories
Datatype.NATURAL_LANGUAGE | "natural_language" :: data that is free or natural text, like reviews, conversations, and tweets. Note: natural language datasets should be single-column if using the Gretel GPT model.
Generally speaking, functions expecting a Datatype argument can accept either the enum variant object or its string representation (in lower_snake_case).
Using Your Data
To use your own data for evaluating synthetic models in Benchmark, use the create_dataset function.
source (required): A string path to a local CSV file, or a Pandas DataFrame object.
datatype (required): A Benchmark Datatype.
name (required): A name used to identify the dataset. Each dataset in a Benchmark session must have a unique name.
delimiter (optional): The delimiter character used in local CSV files. Defaults to comma.
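For example, a sketch using both source types (file paths and column names are illustrative):

```python
import pandas as pd
from gretel_trainer.benchmark import Datatype, create_dataset

# From a local CSV file (pipe-delimited in this hypothetical example)
reviews = create_dataset(
    "data/reviews.csv",
    datatype=Datatype.NATURAL_LANGUAGE,
    name="reviews",
    delimiter="|",
)

# From an in-memory DataFrame; the lower_snake_case string form of the datatype also works
df = pd.DataFrame({"age": [34, 55, 29], "income": [52000, 71000, 48000]})
customers = create_dataset(df, datatype="tabular", name="customers")
```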
Using Gretel Datasets
You can select popular, publicly available datasets provided by Gretel, and find the datasets that best match your use case by name, datatype, or tags.
To access Gretel datasets, start by creating a repository instance:
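A sketch, assuming the repository class is exported as GretelDatasetRepo (check your installed gretel-trainer version for the exact name):

```python
from gretel_trainer.benchmark import GretelDatasetRepo

repo = GretelDatasetRepo()
```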
The following instance methods can be called on the repository.
get_gretel_dataset
name (required): The name of the dataset. This function will raise an exception if no dataset exists with the supplied name.
list_gretel_datasets
datatype (optional): Datatype to filter on.
tags (optional): Tags to filter on. Various tags are applied to Gretel-curated datasets; see below.
Some tags include:
Data size: e.g. small, medium, large
Industry or category: e.g. finance, healthcare, e-commerce, marketing, ads, energy, government, environment, entertainment, telecom, employment, food
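For example (the dataset name below is illustrative; repo is the repository instance created above):

```python
from gretel_trainer.benchmark import Datatype

# Fetch a single curated dataset by name (raises if no dataset has that name)
grocery = repo.get_gretel_dataset("grocery_orders")

# List curated datasets matching a datatype and/or tags
small_finance = repo.list_gretel_datasets(
    datatype=Datatype.TABULAR,
    tags=["small", "finance"],
)
```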
Models
Gretel Models
You can use Gretel models to get Benchmark results on any dataset. Some models are better than others for synthesizing different types of data. To get the best results, follow this guide:
Gretel LSTM
This model works for a variety of synthetic data tasks, including time-series, tabular, and text data. It is generally useful from a few thousand records upward, and for datasets with a mix of categorical, continuous, and numerical values.
Data requirements: Source data should have <150 columns. We recommend using Gretel ACTGAN for high dimensional data.
Gretel ACTGAN
This model works well for high dimensional, largely numeric data. Use it for datasets with more than 20 columns and/or 50,000 rows.
Data requirements: Not ideal if the dataset contains free text fields.
Gretel GPT
This model is useful for natural language or plain text datasets such as reviews, tweets, and conversations.
Data requirements: Dataset must be single-column.
Gretel Amplify
This model is great for generating lots of data quickly.
Note: Gretel Amplify is not a neural network model; instead, it uses statistical means to generate lots of data from an input dataset. The SQS for data generated using Gretel Amplify may be lower.
For more on using Gretel models, refer to our blueprints and example notebooks for popular use cases.
Out of the box, Benchmark exports objects wrapping several of our models' default templates. These class objects can be included in the list of models used to start a Benchmark session. For example:
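A sketch, assuming the wrapper class names exported by recent gretel-trainer releases (the exact set may vary by version):

```python
from gretel_trainer.benchmark import (
    GretelACTGAN,
    GretelAmplify,
    GretelLSTM,
    compare,
)

session = compare(
    datasets=[my_data],  # created earlier via create_dataset or fetched from the repo
    models=[GretelLSTM, GretelACTGAN, GretelAmplify],  # classes, not instances
)
```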
Specific Gretel Model Configurations
If you want to get more fine-grained with Gretel model configurations than the default templates above, you can create customized Gretel models. If you have a valid, standalone model configuration you want to use, create a subclass of Benchmark's GretelModel class and specify the config property of the class.
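For example, a minimal sketch (the config path is hypothetical):

```python
from gretel_trainer.benchmark import GretelModel

class TunedActgan(GretelModel):
    # Path to a standalone Gretel model configuration (hypothetical file);
    # a config dict should also work here.
    config = "configs/my_tuned_actgan.yml"
```

The subclass can then be included in the models list just like the built-in wrappers.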
Custom Models
You can run Benchmark on any algorithm for synthetic data, not just Gretel models. To provide your own model implementation, define a Python class that meets this interface:
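A sketch of the expected shape (the train/generate method names and keyword arguments reflect the custom-model protocol as we understand it; verify against your installed gretel-trainer version):

```python
import pandas as pd

class MyCustomModel:
    def train(self, source: str, **kwargs) -> None:
        # `source` is a path to a CSV of training data. Fit your model here
        # and keep any state you need on `self`.
        self.training_data = pd.read_csv(source)

    def generate(self, **kwargs) -> pd.DataFrame:
        # Return synthetic data as a pandas DataFrame. A `num_records` kwarg
        # (assumed) can be used to request a specific number of rows.
        num_records = kwargs.get("num_records", len(self.training_data))
        return self.training_data.sample(n=num_records, replace=True)
```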
Implement your custom model in Python; any third-party libraries you use must be installed as dependencies wherever you are running Benchmark.
Note: Custom model implementations run synchronously and sequentially—Benchmark will not parallelize the work in background threads like it does for Gretel models.
You can include either classes or instances in the models list argument. If a class is provided, Benchmark will initialize a unique instance of the class for each run with that model. If an instance is provided, the same instance will be shared across runs, so any internal state will accumulate.
Session
Start a Benchmark session by combining datasets and models.
datasets (required): List of datasets acquired via create_dataset, repo.get_gretel_dataset, and/or repo.list_gretel_datasets.
models (required): List of models. This list can include both Gretel and custom models, both of which are documented above.
config (optional): The BenchmarkConfig object supports customizing various aspects of the Benchmark session.
The Session will start immediately. You can check on jobs and collect output via properties and methods on the Session object, such as session.results.
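A sketch of typical usage (session.results is the results dataframe referenced throughout this page; wait() and export_results() are assumed helpers):

```python
from gretel_trainer.benchmark import GretelLSTM, compare

session = compare(datasets=[my_data], models=[GretelLSTM])  # my_data from create_dataset above

session.wait()                          # assumed helper: block until all jobs finish
print(session.results)                  # per-run results as a pandas DataFrame
session.export_results("results.csv")   # assumed helper: write the results to CSV
```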
Config
You can define your own BenchmarkConfig to override default settings for your Benchmark session. Every field is optional.
project_display_name (string): The display name to use for the project that gets automatically created in the Gretel Console and under which all work is performed.
refresh_interval (integer): How often to poll the Gretel API for updates on jobs.
trainer (boolean): Whether to use Gretel Trainer instead of the standard Gretel Python client.
working_dir (string or Path): Where to store working files (e.g. datasets, output synthetic data).
additional_report_scores (list[string]): Fields from Gretel Evaluate to include in the final results dataframe. (SQS is always included.)
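For example, a sketch (assuming BenchmarkConfig is importable from gretel_trainer.benchmark; the values, including the additional report score name, are illustrative):

```python
from pathlib import Path
from gretel_trainer.benchmark import BenchmarkConfig, GretelACTGAN, compare

config = BenchmarkConfig(
    project_display_name="benchmark-actgan-run",
    refresh_interval=30,
    trainer=False,
    working_dir=Path("benchmark_artifacts"),
    additional_report_scores=["field_correlation_stability"],  # illustrative field name
)

session = compare(datasets=[my_data], models=[GretelACTGAN], config=config)
```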
Errors
If you encounter Skipped for a given model on a given dataset, this indicates that the data format was unsuitable for that model. If you were using a Gretel model, please see the documentation on Gretel models for more information on acceptable data formats.
A model may have failed either to train or to generate for a variety of reasons. For best results when using a Gretel model, check out our documentation on models and these blueprints!
Some jobs may take a while to finish running - don't despair! If your input data is a particularly large file, models may take multiple hours to run. You can check on the status of each job using session.results to evaluate whether any jobs have failed.
Tip: Every job kicked off in Benchmark can also be viewed in the Gretel Console while the job is running. In the Gretel Console, you can find more info about the projects including: (1) whether the model is active or stuck in pending, (2) what epoch training is in, and more.
Logs
When you're running Benchmark, in addition to Gretel Trainer and Client SDK logs, you can also check the status of each job at any time with session.results.
Benchmark Results
The Benchmark results report provides: rows, columns, Synthetic Data Quality Score (SQS), train time (sec), generate time (sec), and total runtime (sec).
The data shape (rows and columns) of the generated data will match the input dataset.
Learn more about interpreting the Synthetic Data Quality Score (SQS).
Runtimes are reported as train time (sec) for model training, generate time (sec) for generating synthetic data, and total runtime (sec) for the combined total.
For more, check out the Benchmark Report of Gretel models on some popular publicly available ML datasets, categorized by industry.