This notebook helps users train synthetic models on complex datasets with high row and column counts. The code divides a dataset into several smaller datasets, each made up of correlated columns, which can be processed in parallel and then joined back together.
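The split, parallelize, and join pattern can be illustrated with a minimal standard-library sketch. It is not the notebook's code. It assumes numeric columns and greedily groups columns whose Pearson correlation with a group's first column is at least a threshold. It processes each group in parallel with a hypothetical `synthesize_part` placeholder that stands in for a trained model, then joins the groups back together by row position. All function names and the `threshold` value are illustrative assumptions.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric columns (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def group_correlated_columns(table, threshold=0.5):
    """Greedy clustering: a column joins the first group whose first column it
    correlates with at |r| >= threshold; otherwise it starts a new group."""
    groups = []
    for col in table:
        for group in groups:
            if abs(pearson(table[group[0]], table[col])) >= threshold:
                group.append(col)
                break
        else:
            groups.append([col])
    return groups


def split_table(table, groups):
    """One smaller table per column group; row order is preserved for rejoining."""
    return [{col: table[col] for col in group} for group in groups]


def synthesize_part(part, seed):
    """HYPOTHETICAL placeholder for training a model on one group and sampling from it.
    It resamples whole rows with replacement, so values within a group stay paired."""
    rng = random.Random(seed)
    n = len(next(iter(part.values())))
    picks = [rng.randrange(n) for _ in range(n)]
    return {col: [values[i] for i in picks] for col, values in part.items()}


def join_tables(parts):
    """Join the per-group tables column-wise; rows are aligned by position."""
    joined = {}
    for part in parts:
        joined.update(part)
    return joined


# Example: "b" tracks "a", "d" tracks "c", and the two pairs are uncorrelated.
table = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [2, -4, 4, -4, 2],
    "d": [4, -8, 8, -8, 4],
}
groups = group_correlated_columns(table)  # [["a", "b"], ["c", "d"]]
parts = split_table(table, groups)
with ThreadPoolExecutor() as pool:
    synthetic_parts = list(pool.map(synthesize_part, parts, range(len(parts))))
synthetic = join_tables(synthetic_parts)
```

Because this toy joins groups by row position, it keeps relationships only within a group, not across groups. That is why placing correlated columns in the same group matters.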
Conditional data generation (seeding a model) is helpful when you want to preserve some of the original row data (primary keys, dates, important categorical data) in synthetic datasets.
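Seeding can be illustrated with a toy conditional sampler. This is not Gretel's model or API; it is a minimal sketch under the assumption that the "model" simply resamples values from original rows that share the same seed values. Each output row keeps its seed columns verbatim, and every other column is drawn independently from matching original rows. The function name `conditional_sample` and the example columns are hypothetical.

```python
import random
from collections import defaultdict


def conditional_sample(original, seeds, seed_cols, rng_seed=0):
    """HYPOTHETICAL toy stand-in for a seeded synthetic model.
    Seed values are copied verbatim; other columns are drawn independently
    from original rows that share the same seed values."""
    rng = random.Random(rng_seed)
    # Index original rows by their seed-column values.
    by_key = defaultdict(list)
    for row in original:
        by_key[tuple(row[c] for c in seed_cols)].append(row)
    other_cols = [c for c in original[0] if c not in seed_cols]
    out = []
    for seed in seeds:
        key = tuple(seed[c] for c in seed_cols)
        # Fall back to all rows if this seed combination never appears in the original data.
        pool = by_key.get(key) or original
        row = {c: seed[c] for c in seed_cols}
        for c in other_cols:
            row[c] = rng.choice(pool)[c]
        out.append(row)
    return out


original = [
    {"region": "EU", "plan": "pro", "spend": 120},
    {"region": "EU", "plan": "pro", "spend": 130},
    {"region": "US", "plan": "free", "spend": 0},
]
# Preserve the "region" value of each requested row; generate the rest.
synthetic = conditional_sample(original, [{"region": "EU"}, {"region": "US"}], ["region"])
```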
This notebook shows how to generate synthetic data directly from a multi-table relational database to support data augmentation and subsetting use cases.
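A central constraint when subsetting a relational database is referential integrity: every retained child row must still point to a retained parent. This is a minimal sketch of that idea for one parent/child pair of tables, assuming a single primary key and foreign key. It is not the notebook's implementation, and `subset_with_integrity` and the example tables are hypothetical.

```python
import random


def subset_with_integrity(parents, children, pk, fk, fraction, rng_seed=0):
    """HYPOTHETICAL helper: sample a fraction of parent rows, then keep only
    child rows whose foreign key refers to a kept parent."""
    rng = random.Random(rng_seed)
    k = max(1, round(len(parents) * fraction))
    kept_parents = rng.sample(parents, k)
    kept_ids = {p[pk] for p in kept_parents}
    kept_children = [c for c in children if c[fk] in kept_ids]
    return kept_parents, kept_children


customers = [{"id": i, "name": f"customer-{i}"} for i in range(1, 5)]
# Eight orders, two per customer.
orders = [{"order_id": 100 + j, "customer_id": (j % 4) + 1} for j in range(8)]
sub_customers, sub_orders = subset_with_integrity(
    customers, orders, pk="id", fk="customer_id", fraction=0.5
)
```

With more tables, the same filter is applied down each foreign-key chain, from parents to children.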
Videos
Walk through creating synthetic data with Gretel.ai, Python, Pandas, and Jupyter.