Synthetic Data Quality Score (SQS)

How to evaluate any two datasets to generate a Gretel Synthetic Data Quality Report.

We suggest passing your synthetic data as the in-data parameter and the data you want to compare it against as the ref-data parameter.

For more details about how to interpret and utilize the report, please see our Synthetic Data Quality page.

CLI

Because sqs is the default evaluation task type, you can simply reference the default evaluate configuration via the GitHub blueprint shortcut: evaluate/default.

The CLI usage to create a Quality Report is:

$ gretel models create --config evaluate/default --in-data synthetic.csv --ref-data compare.csv --output report-dir 

This will upload both datasets to Gretel Cloud, generate the report, and download the report artifacts to the report-dir directory. Within this directory, the artifacts of interest are:

  • report.html.gz which is an HTML document that contains the full report

  • report_json.json.gz which is a JSON version of the report
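Both artifacts are gzip-compressed, so they need to be decompressed before use. The following is a minimal sketch using only the Python standard library; it assumes the file names listed above, and the helper names load_report_json and extract_report_html are hypothetical, not part of the Gretel CLI or SDK.

```python
import gzip
import json
from pathlib import Path


def load_report_json(report_dir):
    """Load the gzipped JSON report from the CLI output directory.

    Hypothetical helper: assumes the file is named report_json.json.gz.
    """
    path = Path(report_dir) / "report_json.json.gz"
    # "rt" opens the gzip stream in text mode so json.load can read it directly.
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def extract_report_html(report_dir):
    """Decompress report.html.gz into report.html and return the new path.

    Hypothetical helper: assumes the file is named report.html.gz.
    """
    src = Path(report_dir) / "report.html.gz"
    dst = Path(report_dir) / "report.html"
    with gzip.open(src, "rb") as fh:
        dst.write_bytes(fh.read())
    return dst
```

For example, load_report_json("report-dir") returns the parsed report, and extract_report_html("report-dir") writes a report.html file that can be opened in a browser.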

To run this job on your local host (the machine where you run the command) instead, add the --runner local flag.

SDK

The Gretel SDK provides Python classes specifically to run reports. The QualityReport() class uses evaluate with the sqs task type to generate a Synthetic Data Quality Report. The most basic usage is below:

from gretel_client.evaluation.quality_report import QualityReport

# NOTE: These data sources may also be Pandas DataFrames!
data_source = "synthetic.csv"
ref_data = "compare.csv"

report = QualityReport(data_source=data_source, ref_data=ref_data)
report.run() # this will wait for the job to finish

# This will return the full report JSON details
report.as_dict

# This will return the full HTML contents of the report
report.as_html
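To keep the results after the session ends, the two values above can be written to disk. This is a minimal sketch using only the standard library; save_report and the output file names are hypothetical, and it assumes report.as_html is a string and report.as_dict is JSON-serializable.

```python
import json
from pathlib import Path


def save_report(html, details, out_dir):
    """Write report HTML and JSON details to out_dir (hypothetical helper)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    html_path = out / "report.html"
    json_path = out / "report.json"
    html_path.write_text(html, encoding="utf-8")
    # indent=2 keeps the saved JSON readable when opened in an editor.
    json_path.write_text(json.dumps(details, indent=2), encoding="utf-8")
    return html_path, json_path
```

After report.run() completes, this could be called as save_report(report.as_html, report.as_dict, "sdk-report").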

If you do not specify a project parameter when using the QualityReport() class, then a temporary project will be created and deleted after the report finishes and the artifacts are downloaded. This slightly differs from CLI behavior where temporary projects are not used.

For more usage examples with the SDK, please see the following Jupyter Notebook.
