Models
This section covers the generative machine learning models supported by Gretel APIs as well as core use cases and capabilities.

Supported Features

This section compares features of different generative data models supported by Gretel APIs.
✅ = Supported
✖️ = Not yet supported
Feature                     Gretel-LSTM            Gretel-GPT
Tag                         synthetics             gpt_x
Type                        LM (Language Model)    LM
Model                       LSTM                   Transformer
Privacy filters             ✅                     ✖️
Differential privacy        ✅                     ✖️
Synthetic quality report    ✅                     ✖️
Tabular                     ✅                     ✖️
Time-series                 ✅                     ✖️
Natural language            ✅                     ✅
Conditional generation      ✅                     ✅
Pre-trained                 ✖️                     ✅
Gretel cloud                ✅                     ✅
On-premises                 ✅                     ✅
Open-source core            ✅                     ✅

Create and train a model

Below is an example configuration that may be used to create and fine-tune a synthetic data model. Save the example below to model-config.yaml.
  • Replace [model_id] with the type of model you wish to train (e.g. synthetics, gpt_x).
  • data_source must point to a valid and accessible file URL in CSV format. Supported storage locations include S3, GCS, Azure Blob Storage, HDFS, WebHDFS, HTTP, HTTPS, SFTP, or the local filesystem.
    schema_version: "1.0"
    name: "my-model"

    models:
      - [model_id]:
          data_source: foo.csv
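The placeholder substitution described above can also be scripted. The following is a minimal sketch, assuming a POSIX shell. The MODEL_ID and DATA_SOURCE variable names are illustrative only and are not part of the Gretel CLI. It writes model-config.yaml with the model tag and data source already filled in.

```shell
# Hypothetical helper variables (not Gretel settings).
# MODEL_ID is the tag of the model to train: synthetics or gpt_x.
MODEL_ID="${MODEL_ID:-synthetics}"
DATA_SOURCE="${DATA_SOURCE:-foo.csv}"

# Write the configuration, substituting the model tag and data source
# into the same structure as the example config.
cat > model-config.yaml <<EOF
schema_version: "1.0"
name: "my-model"

models:
  - ${MODEL_ID}:
      data_source: ${DATA_SOURCE}
EOF

# Show the result for a quick visual check.
cat model-config.yaml
```

Running it with MODEL_ID=gpt_x set in the environment would produce a config for the gpt_x model instead.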
Use the following CLI command to train and create the synthetic data model.
  • The exports are not required; they only keep the models create command shorter and easier to read.
  • --in-data is optional, and can be used to override the data_source specified in the config.
    export CONFIG_PATH=model-config.yaml
    export DATASOURCE=foo.csv

    gretel models create \
      --config $CONFIG_PATH \
      --runner cloud \
      --in-data $DATASOURCE > my-model.json
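Because the exports and the data source override are both optional, the same step can be wrapped in a small function that checks its inputs first. This is only a sketch: train_model is a hypothetical helper name, not a Gretel command, and it passes only the flags shown in the command above.

```shell
# Hypothetical wrapper around "gretel models create"; not part of the Gretel CLI.
# Usage: train_model CONFIG_PATH [DATASOURCE]
train_model() {
    config_path="$1"
    datasource="${2:-}"

    # Fail early if the config file is missing.
    if [ ! -f "$config_path" ]; then
        echo "config not found: $config_path" >&2
        return 1
    fi

    # Start from the required flags, then add the override only when given.
    set -- --config "$config_path" --runner cloud
    if [ -n "$datasource" ]; then
        set -- "$@" --in-data "$datasource"
    fi

    # Save the command's output so it can be passed to later commands.
    gretel models create "$@" > my-model.json
}
```

For example, train_model model-config.yaml uses the data_source from the config, while train_model model-config.yaml foo.csv overrides it.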

Generate data from a model

Below is an example CLI command that may be used to generate data from a model.
  • --model-id supports both a model uid and the JSON that models create outputs.
  • --data_source (optional) allows you to specify a CSV file to prompt the model for conditional data generation tasks.
    gretel models run --model-id my-model.json \
      --runner cloud \
      --data_source prompts.csv \
      --output .
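The optional prompt file can be handled the same way in a script. The sketch below assumes a POSIX shell. generate_data is a hypothetical helper name, and the flags are exactly those used in the command above. It adds the prompt file only when one is supplied and verifies that the file exists before calling the CLI.

```shell
# Hypothetical wrapper around "gretel models run"; not part of the Gretel CLI.
# Usage: generate_data MODEL_REF [PROMPTS_CSV]
# MODEL_REF is either a model uid or the JSON file saved by "models create".
generate_data() {
    model_ref="$1"
    prompts="${2:-}"

    set -- --model-id "$model_ref" --runner cloud
    if [ -n "$prompts" ]; then
        # Conditional generation: the prompt file must exist.
        if [ ! -f "$prompts" ]; then
            echo "prompt file not found: $prompts" >&2
            return 1
        fi
        set -- "$@" --data_source "$prompts"
    fi

    # Write generated data to the current directory.
    gretel models run "$@" --output .
}
```

Calling generate_data my-model.json runs unconditional generation, and generate_data my-model.json prompts.csv supplies prompts for conditional generation.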