Synthetic Data Quality Score (SQS)
How to evaluate any two datasets to generate a Gretel Synthetic Data Quality Report.
Remember, we suggest passing your synthetic data as the `in-data` parameter and the data you wish to compare it against as the `ref-data` parameter.
For more details about how to interpret and utilize the report, please see our Synthetic Data Quality page.
CLI
Because `sqs` is the default evaluation task type, you can simply reference the default `evaluate` configuration via the GitHub blueprint shortcut: `evaluate/default`.
The CLI usage to create a Quality Report is:
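A sketch of that command, assuming the Gretel CLI is installed and configured; `synthetic.csv` and `real.csv` are placeholder file names:

```bash
gretel models create \
  --config evaluate/default \
  --in-data synthetic.csv \
  --ref-data real.csv \
  --output report-dir
```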
This will upload both datasets to Gretel Cloud, generate the report, and download the report artifacts to the `report-dir` directory. Within this directory, the artifacts of interest are:
- `report.html.gz`, an HTML document that contains the full report
- `report_json.json.gz`, a JSON version of the report
If you want this job to run on your local host (the machine from which you are running the command), you may add the `--runner local` flag.
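For example, the same sketch as above with the flag appended:

```bash
gretel models create --config evaluate/default \
  --in-data synthetic.csv --ref-data real.csv \
  --output report-dir --runner local
```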
SDK
The Gretel SDK provides Python classes specifically for running reports. The `QualityReport()` class uses `evaluate` with the `sqs` task type to generate a Synthetic Data Quality Report. The most basic usage is below:
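A minimal sketch, assuming your Gretel credentials are already configured and using the same placeholder file names as in the CLI example:

```python
from gretel_client.evaluation import QualityReport

# in-data: your synthetic data; ref-data: the data to compare it against
report = QualityReport(
    data_source="synthetic.csv",
    ref_data="real.csv",
)
report.run()

# Inspect the results once the job completes
print(report.peek())       # summary, including the overall quality score
results = report.as_dict   # full report as a Python dict
html = report.as_html      # full report as an HTML string
```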
If you do not specify a `project` parameter when using the `QualityReport()` class, a temporary project will be created and then deleted after the report finishes and the artifacts are downloaded. This differs slightly from the CLI behavior, where temporary projects are not used.
For more usage examples with the SDK, please see the following Jupyter Notebook.