Report
Assess the accuracy and privacy of your synthetic database.
To use data, you need to trust it. The Gretel Relational Report provides unique accuracy and privacy scores to help you verify the quality of your synthetic database. In addition to overall database scores, the report provides table-level insights that measure how well both in-table and cross-table relationships are maintained. This consumable report provides confidence that your data is accurate and secure.
At the top of the report, the Synthetic Data Quality Score (SQS) and Privacy Protection Level (PPL) for the database are shown. These are composite scores that represent the accuracy and privacy of the whole database. Later in the report, scores are also provided for each table.
The Synthetic Data Quality Score (SQS) is an estimate of how well the generated synthetic data maintains the same statistical properties as the original dataset. In this sense, the SQS can be viewed as a utility score or a confidence score as to whether scientific conclusions drawn from the synthetic database would be the same if one were to have used the original database instead. If you do not require statistical symmetry, as might be the case in a testing or demo environment, a lower score may be just as acceptable. If your SQS is not as high as you'd like it to be, check out our Tips to Improve Synthetic Data Quality.
The Privacy Protection Level (PPL) is determined by the model chosen for synthesis. Gretel Relational Synthetics supports Gretel LSTM, Gretel ACTGAN, Gretel Tabular DP, and Gretel Amplify. In general, Tabular DP will have the highest privacy scores, followed by LSTM and ACTGAN, and finally Amplify. Synthetic data is inherently more private than real-world data, so even a synthetic database with a Normal PPL is more secure than a non-synthesized database. When sharing data internally within a company, a PPL of Normal or better is recommended. When sharing data outside of your organization, we recommend a PPL of Very Good or higher.
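The sharing recommendations above can be sketched as a small check. This is a minimal illustration, not part of the Gretel SDK: the function name, the `audience` labels, and the full ordering of PPL grades are assumptions (the text above names only Normal and Very Good).

```python
# Assumed ordering of PPL grades, lowest to highest. Only "Normal" and
# "Very Good" are named in this document; the other grades are assumptions.
PPL_ORDER = ["Poor", "Normal", "Good", "Very Good", "Excellent"]

# Minimum recommended grade per sharing context, as stated in the text.
MIN_PPL = {"internal": "Normal", "external": "Very Good"}


def meets_recommendation(ppl: str, audience: str) -> bool:
    """Return True if `ppl` is at or above the recommended minimum for `audience`."""
    return PPL_ORDER.index(ppl) >= PPL_ORDER.index(MIN_PPL[audience])
```

For example, under these assumptions `meets_recommendation("Good", "external")` is false, because Good ranks below Very Good.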
If your privacy score is not as high as you'd like it to be, combining Relational Transform (https://github.com/Gretellabs/docs/blob/main/models/relational/relational-transform.md) with Relational Synthetics (https://github.com/Gretellabs/docs/blob/main/models/relational/relational-synthetics.md) is an excellent way to ensure the highest possible privacy for your data. See https://github.com/Gretellabs/docs/blob/main/models/relational/relational-transform.md#transform-and-synthesize-a-database for more information.
The report includes a visual of the key relationships between tables in the database, as shown below. Hovering the cursor over a key highlights its related keys and tables.
For each table, individual and cross-table Synthetic Data Quality Reports are generated, which include additional quality scores. The individual report evaluates the statistical accuracy of the individual synthetic table compared to the real-world table it is based on. This provides insight into the quality of the stand-alone synthetic table. The cross-table report evaluates the synthetic data of the table and all its ancestor tables. This provides insight into the accuracy of the table in the context of the database as a whole.
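To make "ancestor tables" concrete, the sketch below walks foreign-key relationships upward from one table. It illustrates the concept only and does not show how Gretel computes the cross-table report; the `ancestor_tables` function, the `parents` mapping, and the example schema are all hypothetical.

```python
def ancestor_tables(table: str, parents: dict[str, list[str]]) -> set[str]:
    """Collect every table reachable by following foreign keys upward from `table`."""
    seen: set[str] = set()
    stack = list(parents.get(table, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    # A self-referencing foreign key should not make a table its own ancestor.
    seen.discard(table)
    return seen


# Hypothetical schema: each table maps to the tables its foreign keys point to.
parents = {
    "order_items": ["orders", "products"],
    "orders": ["customers"],
}
```

In this hypothetical schema, a cross-table report for `order_items` would cover `order_items` together with `orders`, `products`, and `customers`, while a table with no foreign keys, such as `customers`, has no ancestors.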
The individual, cross-table, and relational reports are all bundled in the gretel_tabular output archive file, which can be found in Gretel Console under the Data Sources tab of your project.