Evaluate synthetic data vs. real-world data on classification models

The classification Evaluate task will generate a Gretel Synthetic Data Utility Report.

Learn more about the sections of the Data Utility Report

Customers frequently ask whether synthetic data is of high enough quality to train downstream machine learning tasks. Classifiers, for example, require highly accurate data before they can be usefully deployed.

The Gretel Evaluate Classification task uses the open-source PyCaret AutoML library under the hood to evaluate the quality of your generated synthetic data on commonly used downstream machine learning classifiers, and delivers the results in an easy-to-understand HTML report.

Low-code using Gretel Console

You can kick off this evaluation directly in the Gretel Console. Start by using this example: Generate synthetic data + evaluate ML performance

This example includes a sample dataset (the publicly available bank marketing dataset) and the default blueprint:

  • Gretel LSTM model to generate synthetic data

  • classification Evaluate task with default parameters

        task: classification
        # holdout: null       # Default train-test split for model training = 0.2
        # metric: null        # Default metric = "acc"
        # models: null        # By default all available models are trained 

You can leave the config as is and simply click "Begin training" or edit the configuration with the synthetic model and optional classification parameters best suited for your use case.
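For example, to hold out more data for testing and rank classifiers by AUC instead of accuracy, the Evaluate section of the config could look like the sketch below. The parameter values are illustrative, and the model identifiers follow PyCaret's model abbreviations:

```yaml
task: classification
holdout: 0.3                     # Hold out 30% of the data for testing
metric: "auc"                    # Rank classifiers by Area Under the ROC Curve
models: ["lr", "rf", "xgboost"]  # Train only this subset of classifiers
```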

Supported models and metrics

By default, all models will be used in the classifier model training. You can select specific models to use by passing in a list of strings from the following set:

# All classification models
classification_models = [
    "lr",        # Logistic Regression
    "knn",       # K Neighbors Classifier
    "nb",        # Naive Bayes
    "dt",        # Decision Tree Classifier
    "svm",       # SVM - Linear Kernel
    "rbfsvm",    # SVM - Radial Kernel
    "gpc",       # Gaussian Process Classifier
    "mlp",       # MLP Classifier
    "ridge",     # Ridge Classifier
    "rf",        # Random Forest Classifier
    "qda",       # Quadratic Discriminant Analysis
    "ada",       # Ada Boost Classifier
    "gbc",       # Gradient Boosting Classifier
    "lda",       # Linear Discriminant Analysis
    "et",        # Extra Trees Classifier
    "xgboost",   # Extreme Gradient Boosting
    "lightgbm",  # Light Gradient Boosting Machine
    "catboost",  # CatBoost Classifier
]

If you want to change the metric that the classifiers will use to optimize for, you can select one metric from classification_metrics below. The default metric is "acc" (accuracy).

# Select a metric
classification_metrics = [
    "acc",        # Accuracy (default)
    "auc",        # Area Under the ROC Curve
    "recall",     # Recall
    "precision",  # Precision
    "f1",         # F1 score
    "kappa",      # Cohen's Kappa
    "mcc",        # Matthews Correlation Coefficient
]


You can use the classification Evaluate task in two ways:

1. As a parameter of a Gretel synthetics model, or

2. As a standalone task that compares two datasets directly: a synthetic dataset and a real-world dataset

Option 1: Train and generate synthetic data, then evaluate on classification models

Here's a basic example generating synthetic data using Gretel ACTGAN and the real-world bank marketing dataset, then adding classification evaluation to create the Data Utility Report:

from gretel_client.helpers import poll
from gretel_client.projects.models import read_model_config
from gretel_client.projects import create_or_get_unique_project

# Create a project with a name that describes this use case
project = create_or_get_unique_project(name="bank-classification-docs-example")

# Import the bank_marketing_small dataset from Gretel's public S3 bucket
dataset_path = ""

# Add Evaluate task to Gretel ACTGAN config
config = read_model_config("synthetics/tabular-actgan")

config["models"][0]["actgan"]["evaluate"] = {
    "task": "classification",
    "target": "y",
}

You can then run the model and save the report using:

## Train and run the model
model = project.create_model_obj(
    model_config=config,
    data_source=dataset_path,
)
model.submit_cloud()
poll(model)

# Save all artifacts, including the Data Utility Report
model.download_artifacts("artifacts/")

Even when using the SDK, you can find model details and report download options in the Gretel Console -- simply navigate to the bank-classification-docs-example project.

Option 2: BYO synthetic and real data to compare

If you already have generated synthetic data as a CSV or JSON(L) file or a pandas DataFrame, you can use this Evaluate task to compare the two datasets directly.
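Since both data sources can be pandas DataFrames rather than file paths, you can pass in-memory data directly. A minimal self-contained sketch, where the column names (borrowed from the bank marketing dataset) and values are purely illustrative:

```python
import pandas as pd

# Hypothetical in-memory datasets; in practice these would be your
# generated synthetic data and the original real-world data
synthetic_df = pd.DataFrame(
    {"age": [34, 51], "job": ["admin", "services"], "y": ["no", "yes"]}
)
real_df = pd.DataFrame(
    {"age": [29, 47], "job": ["technician", "admin"], "y": ["no", "yes"]}
)

# Either a file path ("synthetic.csv") or a DataFrame works here
data_source = synthetic_df
ref_data = real_df
```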

The Gretel SDK provides Python classes specifically for running reports. The DownstreamClassificationReport() class uses Evaluate with the classification task to generate a Data Utility Report. Basic usage is below:

# Use Evaluate SDK
from gretel_client.evaluation.downstream_classification_report import DownstreamClassificationReport

# Create a project 
from gretel_client.projects import create_or_get_unique_project
project = create_or_get_unique_project(name="evaluate-bank-classification-example-2") 

# Params
# NOTE: These data sources may also be Pandas DataFrames!
data_source = "synthetic.csv"
ref_data = "real.csv"

# Target to predict, REQUIRED for evaluate model
target = 'y' 

# Default holdout value
# test_holdout = 0.2

# Supply a subset if you do not want all of these, default is to use all of them
# models = classification_models

# Metric to use for ordering results, default is "acc" (Accuracy) for classification
# metric = "acc"

# Create a downstream classification report
# Create a downstream classification report
evaluate = DownstreamClassificationReport(
    project=project,
    data_source=data_source,
    ref_data=ref_data,
    target=target,
    # holdout=test_holdout,
    # models=models,
    # metric=metric,
    # runner_mode="cloud",
)

# Submit the evaluation and wait for the results
evaluate.run()
For more examples, see the accompanying Jupyter notebook, which you can also open in Google Colab.

Gretel Synthetic Data Utility Report

The Evaluate task creates a Data Utility Report with the results of the analysis. You'll see a high-level ML Quality Score (MQS) which gives you an at-a-glance understanding of how your synthetic dataset performed. For more information, check out this page describing each section of the report.

Logs and Results

You can view logs in the SDK environment, or go to the project in the Gretel Console to follow along with model training progress and download the results of the evaluation.
