Synthetics
This section covers the model training and generation APIs shared across all Gretel models.
Synthetic Models
Synthetic data models supported by Gretel APIs.
Gretel Navigator Fine Tuning - LLM-based AI system supporting tabular, time-series, JSON, and natural language text data.
Gretel ACTGAN - Adversarial model for tabular, structured numerical, high column count data.
Gretel Tabular DP - Graph-based model for tabular data with differential privacy.
Gretel GPT - Generative pre-trained transformer for natural language text.
Gretel DGAN - Adversarial model for time-series data.
Gretel Amplify - Statistical model for high volume tabular data.
Gretel LSTM - Language model for tabular, time-series, and text data.
Supported Features
This section compares features of different generative data models supported by Gretel APIs.
✅ = Supported
✖️ = Not yet supported
| Tag | navigator_ft | actgan | gpt_x | tabular_dp | timeseries_dgan | synthetics | amplify |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Type | Language Model | Generative Adversarial Network | Language Model | Statistical | Generative Adversarial Network | Language Model | Statistical |
| Model | Pre-trained Transformer | GAN | Pre-trained Transformer | Probabilistic Graphical Model | GAN | LSTM | Statistical |
| Privacy filters | ✖️ | ✅ | ✖️ | ✖️ | ✖️ | ✅ | ✅ |
| Privacy metrics | ✅ | ✅ | ✖️ | ✅ | ✖️ | ✅ | ✅ |
| Differential privacy | ✖️ | ✖️ | ✅ | ✅ | ✖️ | ✖️ | ✖️ |
| Tabular | ✅ | ✅ | ✖️ | ✅ | ✅ | ✅ | ✅ |
| Time-series | ✅ | ✖️ | ✖️ | ✖️ | ✅ | ✅ | ✖️ |
| Natural language | ✅ | ✖️ | ✅ | ✖️ | ✖️ | ✅ | ✖️ |
| Conditional generation | ✖️ | ✅ | ✅ | ✖️ | ✖️ | ✅ | ✅ |
| Pre-trained | ✅ | ✖️ | ✅ | ✖️ | ✖️ | ✖️ | ✖️ |
| Gretel cloud | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Hybrid cloud | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Requires GPU | ✅ | ✅ | ✅ | ✖️ | ✅ | ✅ | ✖️ |
Need help choosing the right synthetic model? Check out our Benchmark Report for a detailed model comparison based on real-world datasets.
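As a rough aid for reading the comparison table, the sketch below transcribes its checkmarks into a small Python lookup and filters model tags by required features. The feature key names (such as `differential_privacy`) and the `models_with` helper are hypothetical names chosen for this illustration; they are not part of any Gretel API.

```python
# Model tags in the same column order as the comparison table.
MODELS = ["navigator_ft", "actgan", "gpt_x", "tabular_dp",
          "timeseries_dgan", "synthetics", "amplify"]

# 1 = supported, 0 = not yet supported; values follow MODELS order.
# Key names are illustrative labels for the table rows.
FEATURES = {
    "privacy_filters":        [0, 1, 0, 0, 0, 1, 1],
    "privacy_metrics":        [1, 1, 0, 1, 0, 1, 1],
    "differential_privacy":   [0, 0, 1, 1, 0, 0, 0],
    "tabular":                [1, 1, 0, 1, 1, 1, 1],
    "time_series":            [1, 0, 0, 0, 1, 1, 0],
    "natural_language":       [1, 0, 1, 0, 0, 1, 0],
    "conditional_generation": [0, 1, 1, 0, 0, 1, 1],
    "pre_trained":            [1, 0, 1, 0, 0, 0, 0],
    "requires_gpu":           [1, 1, 1, 0, 1, 1, 0],
}


def models_with(*required, without=()):
    """Return tags that have every feature in `required` and none in `without`."""
    matches = []
    for i, tag in enumerate(MODELS):
        has_required = all(FEATURES[name][i] for name in required)
        has_excluded = any(FEATURES[name][i] for name in without)
        if has_required and not has_excluded:
            matches.append(tag)
    return matches


# Tabular models that also support differential privacy.
print(models_with("tabular", "differential_privacy"))
# Tabular models that do not require a GPU.
print(models_with("tabular", without=("requires_gpu",)))
```

A lookup like this only mirrors the table at the time of writing; the table itself remains the reference.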
Model Configuration
All Gretel Synthetics models share a similar configuration file structure, for example in a model-config.yaml file. The main fields are:
[model_id] - Replaced with the type of model you wish to train (e.g. synthetics, gpt_x, actgan, timeseries_dgan, amplify, tabular_dp).
data_source - Must point to a valid and accessible file in CSV, JSON, or JSONL format. Supported storage locations include S3, GCS, Azure Blob Storage, HDFS, WebHDFS, HTTP, HTTPS, SFTP, or the local filesystem. Some models have specific data source format requirements.
data_source: __tmp__ - Can be used when the source file is specified elsewhere: with the --in_data parameter via the CLI, with a parameter via the SDK, or with the dataset button via the Console.
params - Contains key-value pairs for the parameters used to train a synthetic data model on the data_source. Parameters are specific to each model type. See the full list of supported parameters on each model's page.
Gretel has configuration templates that may be helpful as starting points for creating your model.
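To make the configuration structure more concrete, here is a minimal sketch that assembles a config as a Python dictionary and prints it as JSON. The fields `name`, the model key, `data_source`, and `params` come from the description above; the top-level `schema_version` and `models` keys, the `build_config` helper, and the `epochs` parameter are assumptions made for illustration, so check them against a Gretel configuration template before relying on them.

```python
import json

# File formats named in the text above.
SUPPORTED_EXTENSIONS = (".csv", ".json", ".jsonl")


def build_config(model_id, data_source="__tmp__", name=None, params=None):
    """Assemble a config dict for one model (hypothetical helper)."""
    # Naive check: accept the __tmp__ placeholder or a path ending in a
    # supported extension. Real URLs may need a more careful check.
    if data_source != "__tmp__" and not data_source.lower().endswith(SUPPORTED_EXTENSIONS):
        raise ValueError(f"unsupported data_source format: {data_source}")
    config = {
        "schema_version": "1.0",  # assumed top-level key
        "models": [               # assumed wrapper around the model entry
            {model_id: {"data_source": data_source, "params": params or {}}}
        ],
    }
    if name is not None:
        config["name"] = name
    return config


# "epochs" is an illustrative parameter name; valid parameters depend on the model.
config = build_config("actgan", name="my-model", params={"epochs": 100})
print(json.dumps(config, indent=2))
```

The output is JSON rather than YAML only to keep the sketch within the standard library; the nesting is the point of the example.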
Create and Train a Model
Use the gretel models create CLI command to create and train a synthetic model. The following options apply:
--in_data - Optional if data_source is specified in the config, in which case it overrides the value in the config. Required if data_source: __tmp__ is used in the config.
--name - Optional; overrides the name specified in the config.
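The option rules above can be sketched as a small Python helper that builds the argument list for the create command without running it. The `--in_data` and `--name` options are taken from this page; the `--config` flag and the `create_command` helper are assumptions for illustration, so confirm the exact flags with the CLI's own help output.

```python
def create_command(config_path, data_source_in_config, in_data=None, name=None):
    """Build an argument list for creating a model (hypothetical helper).

    data_source_in_config is the data_source value found in the config file.
    """
    # --in_data is mandatory when the config only holds the __tmp__ placeholder.
    if data_source_in_config == "__tmp__" and in_data is None:
        raise ValueError("--in_data is required when data_source is __tmp__")

    # "--config" is an assumed flag name for passing the config file.
    cmd = ["gretel", "models", "create", "--config", config_path]
    if in_data is not None:
        # When given, this overrides any data_source in the config.
        cmd += ["--in_data", in_data]
    if name is not None:
        # When given, this overrides the name in the config.
        cmd += ["--name", name]
    return cmd


print(create_command("model-config.yaml", "__tmp__", in_data="train.csv"))
```

The resulting list could be handed to `subprocess.run` once the flags have been verified against the installed CLI version.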
During training, the following model artifacts are created:
data_preview.gz - A preview of your synthetic dataset in CSV format.
logs.json.gz - Log output from the synthetic worker that is helpful for debugging.
report.html.gz* - HTML report that offers deep insight into the quality of the synthetic model.
report-json.json.gz* - A JSON version of the synthetic quality report that is useful to validate synthetic data model quality programmatically.
*Not all models produce a Synthetic Data Quality Report. See the individual model pages for more details.
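Because the artifacts are gzip-compressed, they can be read directly with the Python standard library once downloaded. The sketch below assumes the files have been saved locally under the names listed above; the helper names are hypothetical, and the internal structure of the JSON report is not described on this page, so the code only loads it without assuming any keys.

```python
import csv
import gzip
import itertools
import json


def load_json_report(path):
    """Load a gzip-compressed JSON file such as report-json.json.gz."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def load_preview_rows(path, limit=5):
    """Return up to `limit` rows from a gzip-compressed CSV such as data_preview.gz."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as fh:
        reader = csv.DictReader(fh)
        return list(itertools.islice(reader, limit))
```

For example, `load_preview_rows("data_preview.gz", limit=10)` would return the first ten preview records as dictionaries keyed by column name.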
Generate data from a model
Use the gretel models run command to generate data from a synthetic model. The following options apply:
--model-id - Accepts either a model uid or the JSON that models create outputs.
--param - Many different options are available, depending on the model. The num_records param is supported by all synthetic models and tells the model how many new rows to generate.
--in_data - Optional; used for conditional data generation when supported by the model.
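The generation options can be sketched in the same way, as a helper that builds the argument list for the run command. The `--model-id`, `--param`, and `--in_data` options and the `num_records` param come from this page; passing each param as `--param key value` is an assumption about the CLI syntax, and `run_command` is a hypothetical helper.

```python
def run_command(model_id, num_records=None, in_data=None, extra_params=None):
    """Build an argument list for generating data (hypothetical helper)."""
    cmd = ["gretel", "models", "run", "--model-id", model_id]

    # Collect model-specific params; num_records is supported by all synthetic models.
    params = dict(extra_params or {})
    if num_records is not None:
        params["num_records"] = num_records

    # Assumed syntax: one "--param key value" group per parameter.
    for key, value in params.items():
        cmd += ["--param", key, str(value)]

    # Optional seed data for conditional generation, when the model supports it.
    if in_data is not None:
        cmd += ["--in_data", in_data]
    return cmd


print(run_command("my-model-uid", num_records=500))
```

Here `my-model-uid` is a placeholder; in practice it would be the uid of a trained model or the JSON emitted when the model was created.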