Synthesize Tabular Data
Use Gretel's ACTGAN model to generate tabular synthetic data.
In this example, we will generate synthetic tabular data using Gretel's ACTGAN model. The model will be trained from scratch on the United States Census Adult Income dataset.
To accomplish the above, we will submit training and generation jobs to the Gretel Cloud. Behind the scenes, Gretel will spin up workers with the necessary compute resources, set up the model with your desired configuration, and perform the submitted task.
Create Project
First, we will create a project to host your model and artifacts.
Get Training Data
Download and preview the dataset we will use to train the synthetic model.
The head command previews the first 10 lines of the dataset we will synthesize.
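As a minimal sketch of what the preview step shows, the snippet below runs head on a small stand-in CSV rather than the real download. The temporary file and its three columns are made up for illustration; substitute the path of the dataset you downloaded. Because head counts lines, a file with a header row is previewed as the header plus nine records.

```shell
# Build a hypothetical stand-in CSV (1 header line + 25 records) so the
# sketch runs without the real dataset; the columns are illustrative only.
train_csv="$(mktemp)"
{
  echo "age,workclass,income_bracket"
  for i in $(seq 1 25); do
    echo "$((20 + i)),Private,<=50K"
  done
} > "$train_csv"

# With no options, head prints the first 10 lines of the file.
preview="$(head "$train_csv")"
echo "$preview"

# Count the lines that were shown.
preview_lines="$(printf '%s\n' "$preview" | wc -l | tr -d ' ')"
echo "lines shown: $preview_lines"
```

To see more or fewer lines, pass a count with head -n, for example head -n 5.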
Train the synthetic model
Outputs
The --output parameter specifies where the model artifacts will be saved. In this example, passing the current directory (--output .) creates several files in your local directory. For models trained in the Gretel Cloud, model artifacts can be downloaded at any time with the following command:
gretel models get --model-id [model id] --output .
The following model artifacts are created:
data_preview.gz
A preview of your synthetic dataset in CSV format.
report.html.gz
HTML report that offers deep insight into the quality of the synthetic model.
report-json.json.gz
A JSON version of the synthetic quality report that is useful to validate synthetic data model quality programmatically.
logs.json.gz
Log output from the synthetic worker that is helpful for debugging.
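Each artifact is gzip-compressed, so it has to be decompressed before you can open it. The sketch below is one way to do that with standard tools. It assumes the artifact names listed above and, so that it runs anywhere, first creates placeholder archives in a temporary directory. The placeholder contents are invented and do not reflect the real report or log schema.

```shell
# Work in a scratch directory; in practice this is the --output directory.
artifact_dir="$(mktemp -d)"

# Placeholder archives standing in for real artifacts (contents are made up).
printf 'age,income_bracket\n39,<=50K\n50,>50K\n' | gzip > "$artifact_dir/data_preview.gz"
printf '<html><body>placeholder report</body></html>\n' | gzip > "$artifact_dir/report.html.gz"
printf '{"placeholder": true}\n' | gzip > "$artifact_dir/report-json.json.gz"
printf '[{"msg": "placeholder log line"}]\n' | gzip > "$artifact_dir/logs.json.gz"

# Decompress each archive next to the original, dropping the .gz suffix.
# gunzip -c writes to stdout, so the compressed file is left untouched.
for archive in "$artifact_dir"/*.gz; do
  gunzip -c "$archive" > "${archive%.gz}"
done

# Show the compressed and decompressed files side by side.
ls "$artifact_dir"
```

After this step, report.html can be opened in a browser and report-json.json can be read by any JSON parser.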
Generate synthetic data
Now we will use our trained synthetic model to generate more synthetic data. Copy the model ID returned by the gretel models create command.
The following model artifacts are created during a generation job:
data.gz
The generated synthetic dataset in CSV format.
logs.json.gz
Log output from the synthetic worker that is helpful for debugging.
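A quick sanity check after generation is to confirm that data.gz holds the number of records you asked for. The sketch below assumes the file is a gzip-compressed CSV with a single header line. It uses a small placeholder archive and a hypothetical expected_records value in place of a real job's output.

```shell
# Placeholder for the data.gz written by a generation job (contents invented).
gen_dir="$(mktemp -d)"
{
  echo "age,income_bracket"
  for i in $(seq 1 50); do
    echo "$((18 + i)),<=50K"
  done
} | gzip > "$gen_dir/data.gz"

# Hypothetical: the number of records requested from the generation job.
expected_records=50

# Skip the header line, then count the remaining lines.
actual_records="$(gunzip -c "$gen_dir/data.gz" | tail -n +2 | wc -l | tr -d ' ')"

if [ "$actual_records" -eq "$expected_records" ]; then
  check_result="ok"
else
  check_result="mismatch"
fi
echo "records: $actual_records ($check_result)"
```

Counting lines is only a rough check: a CSV field that contains a quoted newline spans several lines, so a CSV-aware parser is the safer choice for such data.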