This notebook is designed to help users successfully train synthetic models on complex datasets with high row and column counts. The code works by intelligently dividing a dataset into smaller batches of highly correlated columns that can be trained in parallel and then joined back together.
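The partitioning step can be sketched in plain pandas. This is a minimal, greedy stand-in for the notebook's actual partitioning logic, assuming correlation-based grouping of columns; the function name and batch size are illustrative only:

```python
import numpy as np
import pandas as pd

def split_by_correlation(df: pd.DataFrame, max_cols: int = 3) -> list:
    """Greedily group columns so highly correlated columns land in the
    same batch. A toy stand-in for the notebook's partitioning step."""
    corr = df.corr().abs()
    remaining = list(df.columns)
    batches = []
    while remaining:
        seed = remaining.pop(0)
        # Grab the columns most correlated with the seed column.
        partners = (
            corr.loc[seed, remaining]
            .sort_values(ascending=False)
            .index[: max_cols - 1]
            .tolist()
        )
        for p in partners:
            remaining.remove(p)
        batches.append([seed] + partners)
    return batches

df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5],
        "b": [2, 4, 6, 8, 10],  # perfectly correlated with "a"
        "c": [5, 3, 8, 1, 9],
        "d": [1, 0, 1, 0, 1],
    }
)
batches = split_by_correlation(df, max_cols=2)
# Each batch could be trained as its own model in parallel, and the
# generated pieces joined back together on the row index.
```

A real implementation would also handle non-numeric columns and cap batch width by model capacity rather than a fixed count.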
Walk through the basics of using Gretel's Python SDK to create a synthetic dataset from a Pandas DataFrame or CSV.
Train a synthetic model locally and generate data in your environment.
Conditional data generation (seeding a model) is helpful when you want to preserve some of the original row data (primary keys, dates, important categorical data) in synthetic datasets.
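The core idea of seeding can be illustrated without any particular model: the seed columns pass through untouched while only the remaining columns are synthesized. The resampling below is a toy stand-in for a conditioned model, and all column names are hypothetical:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

real = pd.DataFrame(
    {
        "patient_id": [101, 102, 103, 104],  # seed: preserved as-is
        "visit_date": pd.to_datetime(
            ["2023-01-05", "2023-02-11", "2023-03-02", "2023-04-20"]
        ),                                    # seed: preserved as-is
        "heart_rate": [72, 88, 65, 91],       # to be synthesized
    }
)

seed_cols = ["patient_id", "visit_date"]

# Toy stand-in for a conditioned synthetic model: keep the seed columns
# verbatim and resample only the non-seed columns from their observed range.
synthetic = real[seed_cols].copy()
for col in real.columns.difference(seed_cols):
    lo, hi = real[col].min(), real[col].max()
    synthetic[col] = rng.integers(lo, hi + 1, size=len(real))

# Seed values survive generation intact.
assert synthetic[seed_cols].equals(real[seed_cols])
```

With a real conditioned model, the seed rows are passed to the generator so that the synthesized columns remain statistically consistent with each seed, rather than being sampled independently as above.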
Balance demographic representation bias in a healthcare dataset using conditional data generation with a synthetic model.
Create synthetic time-series data from a Pandas DataFrame or CSV.
Use a synthetic model to boost the representation of an extreme minority class in a dataset by incorporating features from nearest neighbors.
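The nearest-neighbor idea here is SMOTE-style interpolation: new minority samples are created on the line segments between existing minority samples and their neighbors. A minimal pure-NumPy sketch (not the notebook's actual code; function name and parameters are illustrative):

```python
import numpy as np

def smote_like_upsample(X_min, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by interpolating between
    each sample and one of its k nearest minority-class neighbors."""
    rng = np.random.default_rng(seed)
    # Pairwise distances within the minority class (no sklearn needed).
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)             # a point is not its own neighbor
    neighbors = np.argsort(d, axis=1)[:, :k]
    new_rows = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = neighbors[i, rng.integers(k)]   # a random true neighbor
        t = rng.random()                    # interpolation factor in [0, 1)
        new_rows.append(X_min[i] + t * (X_min[j] - X_min[i]))
    return np.vstack(new_rows)

X_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_new = smote_like_upsample(X_minority, n_new=6)
# Interpolated points stay inside the region spanned by the minority class.
```

A generative model goes further than pure interpolation: it can produce novel minority records rather than only points between existing ones.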
Use Gretel APIs to anonymize and synthesize a time-series dataset, then compare the synthetic data's accuracy against the real-world data.
Run a sweep to automate hyperparameter optimization for a synthetic model using Weights & Biases.
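A Weights & Biases sweep is driven by a small configuration describing the search method, the metric to optimize, and the parameter space. A minimal sketch follows; the hyperparameter names (`epochs`, `learning_rate`, `rnn_units`) and the metric name are illustrative, not a fixed Gretel schema:

```python
# Sketch of a Weights & Biases sweep configuration.
sweep_config = {
    "method": "bayes",  # grid | random | bayes
    "metric": {"name": "synthetic_quality_score", "goal": "maximize"},
    "parameters": {
        "epochs": {"values": [50, 100, 200]},
        "learning_rate": {"min": 1e-4, "max": 1e-2},
        "rnn_units": {"values": [64, 256, 1024]},
    },
}

# With the wandb client, the sweep would be launched roughly as:
#   sweep_id = wandb.sweep(sweep_config, project="synthetics-tuning")
#   wandb.agent(sweep_id, function=train_and_score)
# where train_and_score is your training function that logs the metric.
```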
Augment a popular machine learning dataset with synthetic data to improve downstream accuracy and algorithmic fairness.
Measure the effects of different differential privacy settings on a model's ability to memorize and replay secrets in a dataset.
Generate synthetic data directly from a multi-table relational database to support data augmentation and subsetting use cases.
Generate realistic but synthetic text examples using an open-source implementation of the GPT-3 architecture.
Generate synthetic daily oil price data using the DoppelGANger GAN for time-series data.
Produce a quality score and detailed report comparing any synthetic dataset against its real-world source data.
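The idea behind such a score can be shown in miniature: measure how far each synthetic column's distribution sits from its real counterpart and average the result. This toy metric uses the two-sample Kolmogorov-Smirnov statistic; it is not Gretel's Synthetic Quality Score, just the same concept under simple assumptions:

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the two empirical CDFs (a pure-NumPy take on scipy.stats.ks_2samp)."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return np.max(np.abs(cdf_a - cdf_b))

def quality_score(real_cols, synth_cols):
    """Toy 0-100 score: average per-column distributional similarity."""
    stats = [
        ks_statistic(np.asarray(real_cols[c]), np.asarray(synth_cols[c]))
        for c in real_cols
    ]
    return 100.0 * (1.0 - float(np.mean(stats)))

rng = np.random.default_rng(0)
real = {"age": rng.normal(40, 10, 500), "income": rng.normal(50_000, 8_000, 500)}
close = {"age": rng.normal(40, 10, 500), "income": rng.normal(50_000, 8_000, 500)}
far = {"age": rng.normal(70, 5, 500), "income": rng.normal(20_000, 2_000, 500)}

# Score is higher when the synthetic distributions match the real ones.
good, bad = quality_score(real, close), quality_score(real, far)
```

A production report would also check cross-column correlations and privacy metrics, not just per-column marginals.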