Gretel Safe Synthetics
Reference docs for Gretel Safe Synthetics.
Gretel Safe Synthetics allows you to create private versions of your sensitive data. You can use Safe Synthetics to redact and replace sensitive Personally Identifiable Information (PII) with Transform, obfuscate quasi-identifiers with Synthetics, and apply differential privacy for mathematical guarantees of privacy protection. Once your data is generated, Gretel automatically produces an evaluation report to help you measure the quality and privacy of your synthetic data.
You can run a Safe Synthetics workflow by combining the steps that are relevant to you. The recommended flow runs Transform to replace and protect true identifiers, followed by Synthetics to protect quasi-identifying information.
Gretel’s Transform model combines data classification with data transformation to detect and anonymize or mutate sensitive data. Gretel’s data classification can detect a variety of entities, such as PII, which can then be used to define transforms.
We generally recommend combining Gretel Transform with Gretel Synthetics to redact or replace sensitive data before training a synthetics model. Because the sensitive PII is removed before training, the model never sees it and cannot learn it.
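To make the "classify, then transform" idea concrete, here is a minimal Python sketch. It is not Gretel's implementation or API: the regular expressions, the `PII_PATTERNS` table, and the `redact` and `transform_records` helpers are all hypothetical names chosen for illustration, and a real classifier would cover far more entity types than these two patterns.

```python
import re

# Hypothetical detection patterns, for illustration only.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace every detected entity with a placeholder such as <email>."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


def transform_records(records: list) -> list:
    """Redact string fields in each record; other values pass through unchanged."""
    return [
        {key: redact(value) if isinstance(value, str) else value
         for key, value in row.items()}
        for row in records
    ]


raw = [{"note": "Contact jane.doe@example.com or 555-123-4567", "age": 41}]
clean = transform_records(raw)
```

Any model trained on `clean` rather than `raw` only ever sees the placeholders, which is the reason for running a transform step before training.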
You can find out more in the Gretel Transform documentation.
Gretel's Synthetics models generate synthetic datasets that mimic the statistical properties of real-world data without containing any real-world observations.
The models are trained to understand the patterns, distributions, and relationships within and across each column of the real-world data. After training, synthetic records are generated that match those statistical properties, without any one-to-one mapping to original records.
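As a toy illustration of "learn the statistics, then sample new records", the Python sketch below fits a simple summary per column and draws fresh rows from it. It is deliberately simplistic and is not how Gretel's models work: the `fit` and `sample` functions are hypothetical, and each column is sampled independently, so the cross-column relationships that real synthetics models learn are ignored here.

```python
import random
import statistics
from collections import Counter


def fit(rows: list) -> dict:
    """Learn a per-column summary: mean/stdev for numbers, value counts otherwise."""
    model = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        if all(isinstance(v, (int, float)) for v in values):
            model[column] = ("numeric", statistics.mean(values), statistics.pstdev(values))
        else:
            model[column] = ("categorical", Counter(values))
    return model


def sample(model: dict, n: int, seed: int = 0) -> list:
    """Draw n new rows from the fitted summaries; no row is copied from the input."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {}
        for column, spec in model.items():
            if spec[0] == "numeric":
                _, mean, stdev = spec
                row[column] = rng.gauss(mean, stdev)
            else:
                counts = spec[1]
                row[column] = rng.choices(list(counts), weights=list(counts.values()))[0]
        rows.append(row)
    return rows


real = [
    {"age": 34, "plan": "basic"},
    {"age": 51, "plan": "premium"},
    {"age": 29, "plan": "basic"},
]
synthetic = sample(fit(real), n=5)
```

The generated rows are built from the fitted summaries alone, so there is no index or key linking a synthetic row back to a specific input row.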
Gretel offers the following synthetics models:
- Gretel’s flagship LLM-based model for generating privacy-preserving, real-world quality synthetic data across numeric, categorical, text, JSON, and event-based tabular data with up to ~50 columns.
Data types: Numeric, categorical, text, JSON, event-based
Differential privacy: Optional
- Gretel’s model for generating privacy-preserving synthetic text using your choice of top performing open-source models.
Data types: Text
Differential privacy: Optional
- Gretel’s model for quickly generating synthetic numeric and categorical data for high-dimensional datasets (>50 columns) while preserving relationships between numeric and categorical columns.
Data types: Numeric, categorical
Differential privacy: NOT supported
- Gretel’s model for generating differentially-private data with very low epsilon values (maximum privacy). It is best for basic analytics use cases (e.g. pairwise modeling), and runs on CPU. If your use case is training an ML model to learn deep insights in the data, Tabular Fine-Tuning is your best option.
Data types: Numeric, categorical
Differential privacy: Required; you cannot run without differential privacy
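The attributes listed above (data types, differential privacy support, and approximate column limits) can be encoded as a small lookup that narrows the candidates for a dataset. The Python sketch below is only an illustration: the labels such as `llm_tabular` are placeholders rather than product names, the `shortlist` helper is hypothetical, and the 50-column cutoff hardens the "~50 columns" guidance into a strict limit for simplicity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelProfile:
    label: str                        # placeholder label, not a product name
    data_types: frozenset
    dp: str                           # "optional", "required", or "unsupported"
    max_columns: Optional[int] = None


CATALOG = [
    # Flagship LLM-based model; the list above says "up to ~50 columns".
    ModelProfile("llm_tabular",
                 frozenset({"numeric", "categorical", "text", "json", "event"}),
                 "optional", max_columns=50),
    ModelProfile("text", frozenset({"text"}), "optional"),
    ModelProfile("high_dimensional", frozenset({"numeric", "categorical"}), "unsupported"),
    ModelProfile("dp_tabular", frozenset({"numeric", "categorical"}), "required"),
]


def shortlist(data_types: set, needs_dp: bool, n_columns: int) -> list:
    """Return labels of models whose listed attributes fit the dataset."""
    wanted = frozenset(data_types)
    result = []
    for model in CATALOG:
        if not wanted <= model.data_types:
            continue  # model does not handle every requested data type
        if needs_dp and model.dp == "unsupported":
            continue  # differential privacy requested but not available
        if not needs_dp and model.dp == "required":
            continue  # model cannot run without differential privacy
        if model.max_columns is not None and n_columns > model.max_columns:
            continue  # dataset is wider than the model's approximate limit
        result.append(model.label)
    return result


wide_table = shortlist({"numeric", "categorical"}, needs_dp=False, n_columns=120)
private_table = shortlist({"numeric"}, needs_dp=True, n_columns=10)
```

A filter like this only removes models that cannot apply. Choosing among the remaining candidates still depends on the use case, for example basic analytics versus training an ML model.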
You can use the flow chart below to help determine whether Transform, Synthetics (with or without Differential Privacy), or the combination is best for your use case.
If you decide to use Synthetics as part of your use case, the next flow chart can help you determine which Synthetics model fits best.
You can learn more in the Gretel Synthetics models documentation.