Synthetic Data Quality Score (SQS)
How to evaluate any two datasets to generate a Gretel Synthetic Data Quality Report.
Remember, we suggest using your synthetic data as the in-data, and the data you wish to compare it with should be the ref-data.
Since sqs is the default evaluation task type, you can simply reference the default evaluate configuration via the GitHub blueprint shortcut, evaluate/default.
The CLI usage to create a Quality Report is:
$ gretel models create --config evaluate/default --in-data synthetic.csv --ref-data compare.csv --output report-dir
This will upload both datasets to Gretel Cloud, generate the report, and download the report artifacts to the report-dir directory. Within this directory, the artifacts of interest are the following (a short sketch for reading them programmatically follows this list):
report.html.gz, an HTML document that contains the full report
report_json.json.gz, a JSON version of the report
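Once downloaded, the gzipped artifacts can be opened with standard tooling. Below is a minimal sketch, using only the Python standard library and the file names above, for loading the JSON report and unpacking the HTML so it can be viewed in a browser:

import gzip
import json

# Load the JSON version of the report
with gzip.open("report-dir/report_json.json.gz", "rt", encoding="utf-8") as f:
    report_data = json.load(f)

# Unpack the HTML report so it can be opened in a browser
with gzip.open("report-dir/report.html.gz", "rt", encoding="utf-8") as f:
    html = f.read()

with open("report-dir/report.html", "w", encoding="utf-8") as f:
    f.write(html)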
If you wish for this job to launch on your local host (from where you are running the command), you may add the --runner local option to the command.
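For example, this is a sketch assuming the same file names as above and the CLI's --runner option:

$ gretel models create --config evaluate/default --in-data synthetic.csv --ref-data compare.csv --output report-dir --runner local

You can also generate the same report through the Python SDK: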
from gretel_client.evaluation.quality_report import QualityReport
# NOTE: These data sources may also be Pandas DataFrames!
data_source = "synthetic.csv"
ref_data = "compare.csv"
report = QualityReport(data_source=data_source, ref_data=ref_data)
report.run() # this will wait for the job to finish
# This will return the full report JSON details
report.as_dict

# This will return the full HTML contents of the report
report.as_html
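As a follow-up sketch, assuming the report object from the snippet above, the HTML contents can be written to disk and the JSON details saved for later inspection:

import json

# Persist the HTML report so it can be opened in a browser
with open("quality_report.html", "w", encoding="utf-8") as f:
    f.write(report.as_html)

# Persist the full JSON details alongside it
with open("quality_report.json", "w", encoding="utf-8") as f:
    json.dump(report.as_dict, f)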
If you do not specify a project parameter when using the QualityReport() class, then a temporary project will be created and deleted after the report finishes and the artifacts are downloaded. This slightly differs from CLI behavior, where temporary projects are not used.
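If you would rather run the evaluation inside an existing project, a sketch like the following should work; it assumes the project parameter mentioned above and the create_or_get_unique_project helper from gretel_client.projects, and the project name is only an example:

from gretel_client.projects import create_or_get_unique_project
from gretel_client.evaluation.quality_report import QualityReport

# Reuse (or create) a named project instead of a temporary one
project = create_or_get_unique_project(name="my-evaluate-project")

report = QualityReport(
    project=project,
    data_source="synthetic.csv",
    ref_data="compare.csv",
)
report.run()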