Analyze the quality and utility of synthetic data.
Gretel provides jobs that enable evaluation of synthetic data quality and privacy. This job (or model type) is referred to as
`evaluate` in Gretel Configurations.
You can use Gretel Evaluate to compare and analyze any datasets. It is not restricted to synthetic data that was created by Gretel.
Within the `evaluate` family of jobs, the following evaluation tasks are available. They can be specified within the Gretel Configuration under the `task` key:
- Validate data quality with the Synthetic Data Quality Score (SQS), task type: `sqs`
- Analyze performance on classification models with the classification ML Quality Score (MQS), task: `classification`
- Analyze performance on regression models with the regression ML Quality Score (MQS), task: `regression`
The specific evaluation task should be declared in the Gretel Configuration. By default, if a specific Evaluate task is not specified,
`sqs` will be used.
The two configurations below are effectively identical:
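As a minimal sketch of that equivalence, the Python snippet below models an Evaluate configuration as a plain dictionary and fills in the default task. The dictionary layout and the `with_default_task` helper are hypothetical illustrations, not Gretel's actual configuration schema or SDK API. The only behavior taken from the text is that `sqs` is used when no task is given.

```python
from copy import deepcopy

# Per the text: sqs is used when no Evaluate task is specified.
DEFAULT_TASK = "sqs"


def with_default_task(evaluate_cfg: dict) -> dict:
    """Return a copy of a (hypothetical) evaluate config with the task resolved."""
    resolved = deepcopy(evaluate_cfg)
    resolved.setdefault("task", DEFAULT_TASK)
    return resolved


# Simplified, assumed layouts: one omits the task, one names it explicitly.
implicit = {}
explicit = {"task": "sqs"}

same = with_default_task(implicit) == with_default_task(explicit)
print(same)
```

Both inputs resolve to the same task, which is the sense in which a configuration that omits the task and one that names `sqs` explicitly are effectively identical.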
To evaluate synthetic data on classification and regression models, use:
task: classification # or: regression
target: "target_col" # change this to match your data!
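If you generate configurations programmatically, the task and target pair above could be built and checked with a small helper. This is only a sketch: `mqs_task_fragment` is a hypothetical function, and the flat dictionary it returns assumes nothing about where these keys are nested in a full Gretel Configuration.

```python
# The two ML Quality Score tasks named in the text.
MQS_TASKS = {"classification", "regression"}


def mqs_task_fragment(task: str, target: str) -> dict:
    """Build the task/target pair as a plain dictionary (hypothetical helper)."""
    if task not in MQS_TASKS:
        raise ValueError(f"expected one of {sorted(MQS_TASKS)}, got {task!r}")
    if not target:
        raise ValueError("target must name a column in your data")
    return {"task": task, "target": target}


# "target_col" is the placeholder from the text; replace it with your own column.
fragment = mqs_task_fragment("regression", "target_col")
print(fragment)
```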
It is important to note that `evaluate` jobs are created using Gretel's Model interface. However, these models cannot be "run", so the
`gretel models run` command and the SDK Record Handler creation steps will return an error if used.
`evaluate` jobs are single-purpose jobs, so only model creation workflows should be used.
There are some additional considerations when running
`evaluate` jobs through the Gretel CLI and SDK. Let's take a look at a CLI command signature below:
gretel models create --config CONFIG --in-data synthetic.csv --ref-data real-world.csv --output report-dir
Unlike other Gretel models, some of the evaluation tasks may require more than one dataset. For example, SQS requires two input datasets. The
`--ref-data` option (`ref_data` in the SDK) allows the use of additional datasets. The datasets can be in CSV, JSON, or JSONL format.
For `evaluate`, we recommend using:
- `in_data` for the synthetic data under evaluation
- `ref_data` for the comparison data, such as a real-world dataset.
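When scripting evaluations, the CLI command shown earlier can be assembled from Python with each dataset in its recommended role. The sketch below uses only the flags that appear in that command. The `evaluate_command` helper and its parameter names are hypothetical, and the file names are the placeholders from the text.

```python
import shlex


def evaluate_command(config: str, synthetic: str, reference: str, output_dir: str) -> list:
    """Build the argument list for `gretel models create` (hypothetical helper)."""
    return [
        "gretel", "models", "create",
        "--config", config,
        "--in-data", synthetic,    # synthetic data under evaluation
        "--ref-data", reference,   # comparison data, e.g. a real-world dataset
        "--output", output_dir,
    ]


argv = evaluate_command("CONFIG", "synthetic.csv", "real-world.csv", "report-dir")
print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run(argv, check=True)` on a machine where the Gretel CLI is installed and configured.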
For SDK usage, please see the specific evaluation task that you are interested in. We have created dedicated classes in our SDK for ease of use.