Evaluate
Analyze the quality and utility of synthetic data.
Gretel provides jobs that enable evaluation of synthetic data quality and privacy. This job (or model type) is referred to as `evaluate` in Gretel Configurations. You may use Gretel Evaluate to compare and analyze any datasets; you are not restricted to evaluating synthetic data that was created by Gretel.
Within the `evaluate` family of jobs, the following evaluation tasks are available. They can be specified in the Gretel Configuration under the `task.type` key:

- Validate data quality with the Synthetic Data Quality Score (SQS), task type: `sqs`
- Analyze performance on classification models with the classification ML Quality Score (MQS), task type: `classification`
- Analyze performance on regression models with the regression ML Quality Score (MQS), task type: `regression`

The evaluation task should be declared in the Gretel Configuration; if none is specified, `sqs` is used by default. The two configurations below are effectively identical:
```yaml
schema_version: 1.0
models:
  - evaluate:
      data_source: "__tmp__"
```

```yaml
schema_version: 1.0
models:
  - evaluate:
      data_source: "__tmp__"
      task:
        type: sqs
```
To evaluate synthetic data on downstream classification or regression models, use:
```yaml
schema_version: 1.0
models:
  - evaluate:
      data_source: "__tmp__"
      task:
        type: classification # or: regression
      target: "target_col" # change this to match your data!
```
It is important to note that `evaluate` jobs are created using Gretel's Model interface. However, these models cannot be "run," so the `gretel models run` command or SDK Record Handler creation steps will return an error if used. Gretel `evaluate` jobs are single-purpose jobs, so only model creation workflows should be used.

There are some additional considerations when running `evaluate` jobs through the Gretel CLI and SDK. Let's take a look at a CLI command signature below:

```
gretel models create --config CONFIG --in-data synthetic.csv --ref-data real-world.csv --output report-dir
```
Unlike other Gretel models, some evaluation tasks may require more than one dataset. For example, SQS requires two input datasets. The `--ref-data` parameter (or `ref_data` in the SDK) allows the use of additional datasets. The datasets can be in CSV, JSON, or JSONL format. For `evaluate`, we recommend using:

- `--in-data` (or `in_data` in the SDK) for the synthetic data under evaluation
- `--ref-data` (or `ref_data` in the SDK) for the comparison data, such as a real-world dataset
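As a rough sketch of the equivalent flow in the Python SDK (assuming the high-level `gretel_client` helpers shown here, and that `create_model_obj` accepts a `ref_data` argument mirroring the CLI's `--ref-data` flag; the project name is a placeholder):

```python
from gretel_client import create_project
from gretel_client.helpers import poll

# Evaluate (SQS) configuration, mirroring the YAML shown earlier.
config = {
    "schema_version": "1.0",
    "models": [
        {
            "evaluate": {
                "data_source": "__tmp__",
                "task": {"type": "sqs"},
            }
        }
    ],
}

project = create_project(display_name="evaluate-example")  # placeholder name

model = project.create_model_obj(
    model_config=config,
    data_source="synthetic.csv",   # --in-data: synthetic data under evaluation
    ref_data="real-world.csv",     # --ref-data: comparison data
)
model.submit_cloud()  # create the model; evaluate jobs are create-only
poll(model)           # wait for the job to complete
```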
For SDK usage, please see the documentation for the specific evaluation task you are interested in. We have created dedicated classes in our SDK for ease of use.
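For example, the SQS task is wrapped by a report class in `gretel_client.evaluation` (a minimal sketch; the exact class and method names may vary by client version, so treat this as an assumption and check your installed SDK):

```python
from gretel_client.evaluation import QualityReport

# Compare the synthetic dataset against the real-world data it models.
report = QualityReport(
    data_source="synthetic.csv",  # synthetic data under evaluation
    ref_data="real-world.csv",    # comparison data
)
report.run()

print(report.peek())      # quick summary, including the overall SQS
results = report.as_dict  # full report contents as a dictionary
```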