Benchmark
Gretel Benchmark is a source-available framework for evaluating multiple synthetic data models on any selection of datasets.
Tip: You can use Benchmark to easily compare and analyze multiple synthetic generation algorithms (including, but not limited to, Gretel models) on multiple datasets. The Benchmark report provides Synthetic Data Quality Score (SQS) for each generated synthetic dataset, as well as train time, generate time, and total runtime (in secs).
! pip install -Uqq gretel-trainer
from gretel_trainer import benchmark
You can also use this quickstart notebook to get started without local installation and begin comparing multiple models with a few lines of code. These are the three steps for running Benchmark:
- Use your local datasets or choose ones by name, datatype, or tag
- Load Gretel models or create your own model class
- Run the comparison to get the results (a minimal end-to-end sketch follows below)
- While jobs are running, you can check on the status with comparison.results
- When everything is finished, you can download a CSV of the Benchmark report using comparison.export_results
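For example, a minimal end-to-end run might look like the sketch below. The CSV path is hypothetical, and the imports assume these functions are exported from gretel_trainer.benchmark alongside the model classes, as shown later in this page.

from gretel_trainer.benchmark import GretelACTGAN, GretelLSTM, compare, make_dataset

# Point Benchmark at a local CSV (hypothetical path) and describe its datatype
dataset = make_dataset(["data/my_table.csv"], datatype="tabular_mixed")

# Kick off every (dataset x model) run
comparison = compare(datasets=[dataset], models=[GretelLSTM, GretelACTGAN])

# Block until all runs finish, then download the report as a CSV
comparison.wait()
comparison.export_results("benchmark-report.csv")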
class Datatype(Enum)
# Qualitative description of a dataset. Public functions documented below generally accept either a Datatype variant or its string representation.
Valid variants for datatype are:
- TABULAR_MIXED | "tabular_mixed"
- TABULAR_NUMERIC | "tabular_numeric"
- TIME_SERIES | "time_series"
- NATURAL_LANGUAGE | "natural_language"
def make_dataset(
sources: Union[List[str], List[pd.DataFrame]],
*,
datatype: Union[Datatype, str],
delimiter: str = ",",
namespace: Optional[str] = None) -> Dataset
# Creates a dataset from local data.
sources (required): A list of string paths to local CSV files, or a list of in-memory Pandas DataFrame objects. Raises an exception if:
- The list is empty.
- The list includes incorrect types.
- The list is not homogeneous. All values in the list must be of the same type; mixing and matching files and DataFrames is not allowed. To include both files and DataFrames in a single comparison, make two separate datasets.
datatype (required): Must be either a Datatype enum variant or a valid string representation.
delimiter (optional): The delimiter character used in local CSV files. Defaults to comma.
namespace (optional): A prefix to add to input data names in the eventual results output. (Particularly useful for DataFrame datasets, as those objects do not have "names.")
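For instance, a DataFrame-backed dataset might be created as sketched below. The DataFrame contents are illustrative, and the import assumes Datatype and make_dataset are exported from gretel_trainer.benchmark.

import pandas as pd
from gretel_trainer.benchmark import Datatype, make_dataset

df = pd.DataFrame({"age": [31, 42, 57], "rating": [4.5, 3.8, 4.9]})

# The enum variant and its string representation are interchangeable
dataset = make_dataset([df], datatype=Datatype.TABULAR_NUMERIC, namespace="demo")
# equivalent: make_dataset([df], datatype="tabular_numeric", namespace="demo")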
def get_gretel_dataset(name: str) -> Dataset
# Fetches a Gretel-curated dataset from Gretel’s S3 bucket
name (required): The name of the dataset. This function raises an exception if no dataset exists with the supplied name.
def list_gretel_datasets(
datatype: Optional[Union[Datatype, str]] = None,
tags: Optional[List[str]] = None) -> List[Dataset]
# Returns a list of Gretel-curated datasets matching the specified datatype and tags. Uses "and" semantics—i.e. only returns datasets that match all supplied values.
datatype (optional): Datatype to filter on.
tags (optional): Tags to filter on. Various tags are applied to Gretel-curated datasets; see below.
def list_gretel_dataset_tags() -> List[str]
# List all unique tags across all Gretel-curated datasets
def compare(
*,
datasets: List[Dataset],
models: List[Model],
auto_clean: bool=True) -> Comparison
# Begin execution of supplied datasets against supplied models. Returns a Comparison object, documented below.
datasets (required): List of datasets acquired via make_dataset, get_gretel_dataset, and/or list_gretel_datasets.
models (required): List of models.
- For Gretel models, provide the class name, e.g. GretelACTGAN, GretelLSTM. All available model classes are exported from the main Benchmark module: from gretel_trainer.benchmark import GretelACTGAN
- For custom model implementations, if your model does not take any initialization arguments, pass the class name. Alternatively, create an instance of your model elsewhere and pass it wrapped in a lambda to compare.
auto_clean (optional): If set to False, Benchmark neither deletes the projects it created in Gretel Cloud nor removes the local cache directory (.benchmark). Defaults to True.
comparison.is_complete -> bool
# Returns true when all runs (dataset X model) in the comparison reach a terminal state (Completed, Failed, or Skipped)
comparison.results() -> pd.DataFrame
# Returns a Pandas DataFrame showing the progress and/or results of all runs.
comparison.export_results(destination: str)
# Exports results as a CSV
# -- `destination` (required): where to write the CSV file
comparison.wait()
# Blocks until all runs are complete
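For example, a simple monitoring loop might look like the sketch below. It assumes is_complete is a property (as its signature above suggests) and that results() can be called while runs are still in flight.

import time

# comparison is the object returned by compare(...)
while not comparison.is_complete:
    print(comparison.results())   # progress so far, as a Pandas DataFrame
    time.sleep(60)

comparison.export_results("benchmark-report.csv")

Alternatively, comparison.wait() blocks until all runs are complete if you do not need to inspect intermediate progress.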
To use your own data for evaluating synthetic models in Benchmark, use the make_dataset function:
make_dataset(
sources: Union[List[str], List[pd.DataFrame]],
*,
datatype: Union[Datatype, str],
delimiter: str=",",
namespace: Optional[str]=None)
You can also use S3 connectors to connect your data. Check out these tutorials for how to connect data from Snowflake and more.
You can select popular publicly available datasets provided by Gretel and find the ones that best match your use case by name, datatype, or tags.
datatype is one of:
- tabular_numeric: data that is only numeric, like ratings or measurements
- tabular_mixed: data that is a mix of numeric and text, like labels along with ratings or measurements
- natural_language: data that is free or natural text, like reviews, conversations, and tweets. Note: natural language datasets should be single-column if using the Gretel GPT model.
- time_series: data with a time or date column, like stocks or price histories
tags include:
- Data size: e.g. small, medium, large
- Industry or category: e.g. finance, healthcare, e-commerce, marketing, ads, energy, government, environment, entertainment, telecom, employment, food
To list all the available datasets, use:
list_gretel_datasets(datatype: Optional[Union[Datatype, str]] = None, tags: Optional[List[str]] = None)
To select a dataset, use:
get_gretel_dataset(name: str)
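For example, you can browse the curated catalog by tag and then fetch a specific dataset. The dataset name below is a placeholder; use a name returned by list_gretel_datasets. The imports assume these functions are exported from gretel_trainer.benchmark.

from gretel_trainer.benchmark import (
    get_gretel_dataset,
    list_gretel_dataset_tags,
    list_gretel_datasets,
)

print(list_gretel_dataset_tags())   # all unique tags across curated datasets
candidates = list_gretel_datasets(datatype="tabular_mixed", tags=["healthcare"])

# Fetch one dataset by name (placeholder; pick a name from the candidates above)
dataset = get_gretel_dataset("some_dataset_name")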
You can use Gretel models to get Benchmark results on any dataset. Some models are better than others for synthesizing different types of data. To get the best results, follow this guide:
- GretelAuto
  - This model will automatically pick the best solution between Gretel LSTM and Gretel ACTGAN for your dataset (see more below on the two models). This can be helpful if you want the Gretel engine to select the best model based on the characteristics of your dataset.
- GretelLSTM
  - This model works for a variety of synthetic data tasks including time-series, tabular, and text data. It is generally useful for a few thousand records and upward, and for datasets with a mix of categorical, continuous, and numerical values.
  - Data requirements: Source data should have <150 columns. We recommend using Gretel ACTGAN for high dimensional data.
- GretelACTGAN
  - This model works well for high dimensional, largely numeric data. Use it for datasets with more than 20 columns and/or 50,000 rows.
  - Data requirements: Not ideal if the dataset contains free text fields.
- GretelGPT
  - This model is useful for natural language or plain text datasets such as reviews, tweets, and conversations.
  - Data requirements: Dataset must be single-column.
- GretelAmplify
  - This model is great for generating lots of data quickly.
  - Note: Gretel Amplify is not a neural network model, but instead uses statistical means to generate lots of data from an input dataset. The SQS for data generated using Gretel Amplify may be lower.
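Following this guide, you might, for example, let the Gretel engine choose between LSTM and ACTGAN while also benchmarking ACTGAN explicitly. This sketch assumes GretelAuto is exported from gretel_trainer.benchmark like the other model classes, and that a dataset has already been created.

from gretel_trainer.benchmark import GretelACTGAN, GretelAuto, compare

comparison = compare(
    datasets=[dataset],                 # e.g. a dataset created via make_dataset
    models=[GretelAuto, GretelACTGAN],  # pass model classes uninstantiated
)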
For more on using Gretel models, refer to blueprints for popular use cases, notebooks and documentation.
The Gretel model classes above are wrappers around several of our default templates. To include a Gretel model with a non-default configuration, create a subclass of Benchmark's GretelModel class and specify the config property of the class.
class CustomizedLSTM(GretelModel):
    config = {...}  # define configuration here, or set a string path to a local config file
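For example, the config property can point at a local config file. The filename below is hypothetical, and the import assumes GretelModel is exported from gretel_trainer.benchmark alongside the model classes.

from gretel_trainer.benchmark import GretelModel

class CustomizedACTGAN(GretelModel):
    # Path to a local Gretel config file; an in-memory config dict works as well
    config = "configs/my_actgan.yml"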
You can run Benchmark on any algorithm for synthetic data, not just Gretel models. To provide your own model implementation, define a Python class that meets this interface:
class MyCustomModel:
    def train(self, source: str, **kwargs) -> None:
        # your training code here
        ...

    def generate(self, **kwargs) -> pd.DataFrame:
        # your generation code here
        ...
Implement your custom model in Python; any third-party libraries you use must be installed as dependencies wherever you are running Benchmark.
Note: Custom model implementations run synchronously and sequentially—Benchmark will not parallelize the work in background threads like it does for Gretel models.
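As a concrete, purely illustrative example, a trivial baseline satisfying this interface could bootstrap-sample rows from the training data. It assumes source is a path to a local CSV file, as the str type suggests.

import pandas as pd

class RowSamplerBaseline:
    """Toy custom model: memorizes the source table, then samples rows with replacement."""

    def train(self, source: str, **kwargs) -> None:
        # Load the training data; source is assumed to be a local CSV path
        self._df = pd.read_csv(source)

    def generate(self, **kwargs) -> pd.DataFrame:
        # Return a "synthetic" table with the same number of rows as the input
        return self._df.sample(n=len(self._df), replace=True).reset_index(drop=True)

Because this class takes no initialization arguments, it can be passed directly, e.g. models=[RowSamplerBaseline], as described next.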
The compare function expects a list of "zero-argument constructors" for its models argument, and calls these constructors under the hood. For an implementation like the one above, the class itself is a valid constructor: you pass the MyCustomModel class to compare, and compare calls MyCustomModel() to get an instance of the model. If your model class has an __init__ function that requires arguments, you should instantiate your model first and pass it wrapped in a lambda to compare:

class AnotherCustomModel:
    def __init__(self, count: int):
        self.count = count

    def train(self, source: str, **kwargs) -> None:
        pass

    def generate(self, **kwargs) -> pd.DataFrame:
        pass

my_model = AnotherCustomModel(42)
compare(datasets=[...], models=[lambda: my_model])
If you encounter Skipped for a given model on a given dataset, this indicates that the data format was unsuitable for that model. If you were using a Gretel model, please see the documentation on Gretel models for more information on acceptable data formats. A model may have failed to train or generate for a variety of reasons. For best results when using a Gretel model, check out our documentation on models and these blueprints!
Some jobs may take a while to finish running - don't despair! If your input data is a particularly large file, models may take multiple hours to run. You can check on the status of each job using comparison.results to evaluate whether any jobs have failed.
Tip: Every job kicked off in Benchmark can also be viewed in the Gretel Console while the job is running. In the Gretel Console, you can find more info about the projects, including: (1) whether the model is active or stuck in pending, (2) what epoch training is in, and more.
When you're running Benchmark, in addition to Gretel Trainer and Client SDK logs, you can check on the status of each job at any time with comparison.results.
The Benchmark results report provides: rows, columns, Synthetic Data Quality Score (SQS), train time (sec), generate time (sec), and total runtime (sec).
- The data shape (rows and columns) of the generated data will match the input dataset.
- Runtimes are reported as train time (sec) for model training, generate time (sec) for generating synthetic data, and total runtime (sec) for the two combined.
For more, check out the Benchmark Report of Gretel models on some popular publicly available ML datasets, categorized by industry.