Amplify
Model type: Statistical model that supports high volumes of data generation.
The Gretel Amplify model API is designed to rapidly generate large volumes of synthetic data using statistical models and a highly efficient multi-processing implementation. While Amplify is effective at learning and recreating distributions and correlations, its synthetic data accuracy is typically 10-15% lower than that of Gretel's deep learning-based models for tabular data.

Use Cases

The Gretel Amplify model can generate large quantities of data from either real-world data or synthetic data. Example use cases include:
  • Creating large amounts of synthetic data to load test an application.
  • Mimicking real-world data for pre-production environments.
  • Generating synthetic examples to test an ML model's ability to generalize to new data.

Model creation

This model can be selected using the amplify model tag. Below is an example configuration that may be used to create a Gretel Amplify model. All Gretel models implement a common interface to train or fine-tune synthetic data models from the model-specific config. See the reference example to train a model.
The configuration below contains additional options for training a Gretel Amplify model, with the default options displayed.
schema_version: '1.0'
models:
  - amplify:
      data_source: __tmp__
      params:
        num_records: null
        target_size_mb: null
  • data_source (str, required) - either __tmp__ or a path to a valid and accessible file in CSV, JSON, or JSONL format.
  • num_records (int, optional, defaults to null) - target number of records to generate.
  • target_size_mb (int, optional, defaults to null) - target file size of the generated data in megabytes, with a maximum value of 5000 (5 GB).
If both parameters are null, the model will generate a synthetic dataset of the same size and shape as the training data.
Set either num_records or target_size_mb to a positive integer, or leave both null. If both parameters are set to non-null values, the configuration will be invalid.
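The rules above can be checked before a configuration is submitted. The following is a minimal sketch in Python, not part of Gretel's tooling: the function name validate_amplify_params and the returned mode labels are hypothetical and chosen only for illustration. It encodes the three documented cases (both null, exactly one set, both set) plus the 5000 MB ceiling.

```python
from typing import Optional

# Documented upper bound for target_size_mb (5000 MB, i.e. 5 GB).
MAX_TARGET_SIZE_MB = 5000


def validate_amplify_params(
    num_records: Optional[int] = None,
    target_size_mb: Optional[int] = None,
) -> str:
    """Return which sizing mode applies, or raise ValueError.

    Hypothetical helper: the mode labels returned here are illustrative
    and are not values used by Gretel.
    """
    # Setting both parameters makes the configuration invalid.
    if num_records is not None and target_size_mb is not None:
        raise ValueError("set num_records or target_size_mb, not both")

    for name, value in (("num_records", num_records),
                        ("target_size_mb", target_size_mb)):
        if value is None:
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer or null")

    if target_size_mb is not None and target_size_mb > MAX_TARGET_SIZE_MB:
        raise ValueError(f"target_size_mb must not exceed {MAX_TARGET_SIZE_MB}")

    if num_records is not None:
        return "num_records"
    if target_size_mb is not None:
        return "target_size_mb"
    # Both null: output matches the size and shape of the training data.
    return "match_training_data"
```

Calling validate_amplify_params(num_records=1000) returns "num_records", while passing both arguments raises ValueError, mirroring the invalid-configuration case.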

Data generation

Example CLI to generate 1000 additional records from a trained Amplify model:
gretel models run \
  --project <project-name> \
  --model-id <model-id> \
  --runner cloud \
  --param num_records 1000 \
  --output .
Example CLI to generate 1 GB (1000 MB) of data from a trained Amplify model:
gretel models run \
  --project <project-name> \
  --model-id <model-id> \
  --runner cloud \
  --param target_size_mb 1000 \
  --output .
Also see the reference command line example for data generation.
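When generation is scripted, the two commands above differ only in the --param pair. The sketch below assembles the same argument list in Python so it can be passed to subprocess.run. It is an illustration only: build_generate_command is a hypothetical helper, it uses just the flags shown in the examples above, and its requirement that exactly one sizing parameter be given is an assumption of this helper, not a documented CLI rule.

```python
import shlex
from typing import List, Optional


def build_generate_command(
    project: str,
    model_id: str,
    num_records: Optional[int] = None,
    target_size_mb: Optional[int] = None,
    output: str = ".",
) -> List[str]:
    """Build the argv list for `gretel models run` (hypothetical helper)."""
    # Assumption of this sketch: mirror the two examples, each of which
    # passes exactly one sizing parameter.
    if (num_records is None) == (target_size_mb is None):
        raise ValueError("pass exactly one of num_records or target_size_mb")

    if num_records is not None:
        name, value = "num_records", num_records
    else:
        name, value = "target_size_mb", target_size_mb

    return [
        "gretel", "models", "run",
        "--project", project,
        "--model-id", model_id,
        "--runner", "cloud",
        "--param", name, str(value),
        "--output", output,
    ]


def as_shell_string(argv: List[str]) -> str:
    """Render the argv list as a shell-quoted string for logging."""
    return shlex.join(argv)
```

The list form can be run with subprocess.run(argv, check=True) on a machine where the gretel CLI is installed and configured; building a list rather than a single string avoids shell-quoting problems with project names.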

Minimum requirements

Amplify uses multi-processing, so its speed is roughly proportional to the number of CPU cores available. Machines with 8-12 cores give the best speed.
When running Amplify in local mode (on-premises), the following minimum specifications are recommended.
CPU: Minimum 4 cores, 32GB RAM.
With Amplify, no GPU is required.
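A local host can be checked against these core counts before a run. The sketch below is a rough illustration in Python: classify_cores and its labels are hypothetical, and treating any machine with 8 or more cores as "recommended" is an assumption, since this page only names the 8-12 range. RAM is not checked because the Python standard library has no portable call for it.

```python
import os
from typing import Optional

MIN_CORES = 4          # documented minimum for local mode
RECOMMENDED_CORES = 8  # lower end of the documented 8-12 core range


def classify_cores(cores: Optional[int]) -> str:
    """Classify a core count against the documented guidance.

    Hypothetical helper; the labels are illustrative only.
    """
    if cores is None:
        # os.cpu_count() returns None when the count cannot be determined.
        return "unknown"
    if cores < MIN_CORES:
        return "below minimum"
    if cores < RECOMMENDED_CORES:
        return "meets minimum"
    return "recommended"


if __name__ == "__main__":
    detected = os.cpu_count()
    print(f"Detected cores: {detected} ({classify_cores(detected)})")
```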

Limitations and Biases

This model is trained entirely on the examples provided in the training dataset and will therefore capture and likely repeat any biases that exist in that dataset. We recommend having a human review the dataset used to train a model before using the model in production.