Gretel Safe Synthetics

Reference docs for Gretel Safe Synthetics.

Gretel Safe Synthetics allows you to create private versions of your sensitive data. You can use Safe Synthetics to redact and replace sensitive Personally Identifiable Information (PII) with Transform, obfuscate quasi-identifiers with Synthetics, and apply differential privacy for mathematical guarantees of privacy protection. After your data is generated, Gretel automatically produces an evaluation report to help you measure the quality and privacy of your synthetic data.

You can run a Safe Synthetics workflow by combining the steps that are relevant to you. The recommended flow runs Transform to replace and protect true identifiers, followed by Synthetics to protect quasi-identifying information.

Transform

Gretel’s Transform model combines data classification with data transformation to easily detect and anonymize or mutate sensitive data. Gretel’s data classification can detect a variety of Supported Entities such as PII, which can be used for defining transforms.

We generally recommend combining Gretel Transform with Gretel Synthetics to redact or replace sensitive data before training a synthetics model. Because the synthetics model never sees the values that Transform removed or replaced, it cannot learn that sensitive PII.
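To make the "redact or replace before training" idea concrete, here is a minimal sketch in plain Python. It is a toy illustration only, not Gretel's Transform model or SDK: the regular expressions, the `redact_record` helper, and the placeholder tokens are all hypothetical, and a real classifier detects far more entity types than these two patterns.

```python
import re

# Hypothetical patterns for illustration; real entity detection is much broader.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact_record(record: dict) -> dict:
    """Return a copy of the record with emails and phone numbers replaced."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = EMAIL_RE.sub("<EMAIL>", value)
            value = PHONE_RE.sub("<PHONE>", value)
        cleaned[key] = value
    return cleaned


records = [
    {"id": 1, "note": "Contact jane.doe@example.com or 555-123-4567"},
    {"id": 2, "note": "No sensitive content here"},
]

# Only the redacted rows would be passed on to a synthetics model for training.
training_rows = [redact_record(r) for r in records]
```

The original `records` list is left unchanged, so the redacted copy is the only version a downstream training step needs to see.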

You can find out more in the Gretel Transform documentation.

Synthetics

Gretel's Synthetics models generate synthetic datasets that mimic the statistical properties of real-world data, but without containing any actual real-world observations.

The models are trained to understand the patterns, distributions, and relationships within and across each column of the real-world data. After training, the models generate synthetic records that match those statistical properties, without any one-to-one mapping to original records.
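The train-then-sample pattern described above can be sketched with a deliberately simple stand-in. This toy example fits only per-column statistics (a mean and standard deviation for numeric columns, category frequencies for categorical ones) and samples each column independently. It is not how Gretel's models work: it ignores relationships across columns and offers no privacy guarantee. The function names `fit_marginals` and `sample_rows` are hypothetical.

```python
import random
import statistics
from collections import Counter


def fit_marginals(rows, numeric_cols, categorical_cols):
    """Learn simple per-column statistics from the real rows."""
    model = {"numeric": {}, "categorical": {}}
    for col in numeric_cols:
        values = [row[col] for row in rows]
        model["numeric"][col] = (statistics.mean(values), statistics.stdev(values))
    for col in categorical_cols:
        counts = Counter(row[col] for row in rows)
        model["categorical"][col] = (list(counts.keys()), list(counts.values()))
    return model


def sample_rows(model, n, seed=0):
    """Draw n new rows from the fitted statistics; no row is copied from the input."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        row = {}
        for col, (mu, sigma) in model["numeric"].items():
            row[col] = rng.gauss(mu, sigma)
        for col, (cats, weights) in model["categorical"].items():
            row[col] = rng.choices(cats, weights=weights, k=1)[0]
        out.append(row)
    return out


real_rows = [
    {"age": 34, "plan": "free"},
    {"age": 45, "plan": "pro"},
    {"age": 29, "plan": "free"},
    {"age": 52, "plan": "pro"},
]
model = fit_marginals(real_rows, numeric_cols=["age"], categorical_cols=["plan"])
synthetic_rows = sample_rows(model, n=5)
```

Each synthetic row is drawn from the fitted statistics rather than looked up from a source row, which is the sense in which there is no one-to-one mapping back to the original records.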

Gretel offers the following synthetics models:

Tabular Fine-Tuning - Gretel’s flagship LLM-based model for generating privacy-preserving, real-world quality synthetic data across numeric, categorical, text, JSON, and event-based tabular data with up to ~50 columns.
   1. Data types: Numeric, categorical, text, JSON, event-based
   2. Differential privacy: Optional
Text Fine-Tuning - Gretel’s model for generating privacy-preserving synthetic text using your choice of top-performing open-source models.
   1. Data types: Text
   2. Differential privacy: Optional
Tabular GAN - Gretel’s model for quickly generating synthetic numeric and categorical data for high-dimensional datasets (>50 columns) while preserving relationships between numeric and categorical columns.
   1. Data types: Numeric, categorical
   2. Differential privacy: Not supported
Tabular DP - Gretel’s model for generating differentially private data with very low epsilon values (maximum privacy). It is best for basic analytics use cases (e.g., pairwise modeling), and runs on CPU. If your use case is training an ML model to learn deep insights in the data, Tabular Fine-Tuning is your best option.
   1. Data types: Numeric, categorical
   2. Differential privacy: Required; you cannot run this model without differential privacy
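
For context on the epsilon values mentioned above, the standard textbook definition of differential privacy is shown here. It is the general definition, not a description of how any particular Gretel model implements it. A randomized mechanism \mathcal{M} is (\varepsilon, \delta)-differentially private if, for every pair of datasets D and D' that differ in a single record and every set S of possible outputs:

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S] + \delta
```

A smaller \varepsilon forces the two output distributions closer together, so the presence or absence of any one record has less influence on the result. This is why very low epsilon values correspond to stronger privacy.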

You can learn more in the Gretel Synthetics models documentation.

Which models are right for your use case?

You can use the flow chart below to help determine whether Transform, Synthetics (with or without Differential Privacy), or the combination is best for your use case.

If you decide to use Synthetics as part of your use case, the next flow chart can help you determine which Synthetics model is the best fit.
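
As a rough complement to the flow charts, the data types and differential privacy support listed for each model can be encoded as a small lookup. This is a sketch under stated assumptions: the `MODELS` table is transcribed from the model list on this page, while the `compatible_models` helper and its parameters are hypothetical and not part of any Gretel SDK.

```python
# Capabilities transcribed from the model list on this page.
# "dp" is one of "optional", "unsupported", or "required".
MODELS = {
    "Tabular Fine-Tuning": {
        "types": {"numeric", "categorical", "text", "json", "event-based"},
        "dp": "optional",
    },
    "Text Fine-Tuning": {"types": {"text"}, "dp": "optional"},
    "Tabular GAN": {"types": {"numeric", "categorical"}, "dp": "unsupported"},
    "Tabular DP": {"types": {"numeric", "categorical"}, "dp": "required"},
}


def compatible_models(data_types, want_dp):
    """Return the models that cover all requested data types and fit the DP choice."""
    needed = {t.lower() for t in data_types}
    matches = []
    for name, caps in MODELS.items():
        if not needed <= caps["types"]:
            continue  # the model does not handle every requested data type
        if want_dp and caps["dp"] == "unsupported":
            continue  # DP requested, but the model cannot provide it
        if not want_dp and caps["dp"] == "required":
            continue  # the model cannot run without DP
        matches.append(name)
    return matches


no_dp = compatible_models({"numeric", "categorical"}, want_dp=False)
with_dp = compatible_models({"numeric", "categorical"}, want_dp=True)
```

The helper only filters on data types and differential privacy. It does not account for the column-count guidance (up to ~50 columns for Tabular Fine-Tuning, more than 50 for Tabular GAN) or for the analytics versus ML training distinction, so treat its output as a shortlist to weigh against that guidance.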
