Examples
Use-case-based tutorials.
Synthetics

| Notebook | Launch | Description |
| --- | --- | --- |
|  |  | Train synthetic models on complex datasets with high row and column counts by intelligently dividing the dataset into smaller datasets of correlated columns that can be trained in parallel and then joined back together. |
|  |  | Walk through the basics of using Gretel's Python SDK to create a synthetic dataset from a Pandas DataFrame or CSV. |
|  |  | Train a synthetic model locally and generate data in your environment. |
|  |  | Use conditional data generation (seeding a model) to preserve some of the original row data (primary keys, dates, important categorical data) in synthetic datasets. |
|  |  | Balance demographic representation bias in a healthcare dataset using conditional data generation with a synthetic model. |
|  |  | Create synthetic time-series data from a Pandas DataFrame or CSV. |
|  |  | Use a synthetic model to boost the representation of an extreme minority class in a dataset by incorporating features from nearest neighbors. |
|  |  | Use Gretel APIs to anonymize and synthesize a time-series dataset, then compare the synthetic data's accuracy against the real-world data. |
|  |  | Run a sweep to automate hyperparameter optimization for a synthetic model using Weights and Biases. |
|  |  | Augment a popular machine learning dataset with synthetic data to improve downstream accuracy and algorithmic fairness. |
|  |  | Measure the effects of different differential privacy settings on a model's ability to memorize and replay secrets in a dataset. |
|  |  | Generate synthetic data directly from a multi-table relational database to support data augmentation and subsetting use cases. |
|  |  | Generate realistic but synthetic text examples using an open-source implementation of the GPT-3 architecture. |
|  |  | Generate synthetic daily oil price data using the DoppelGANger GAN for time-series data. |
|  |  | Produce a quality score and detailed report comparing any synthetic dataset against real-world data. |
Transforms

| Notebook | Launch | Description |
| --- | --- | --- |
|  |  | Create a transform policy to identify and redact or replace PII with fake values, then use the SDK to transform a dataset and examine the results. |
|  |  | Label and transform sensitive data locally in your environment. |
|  |  | Take a deep dive into the more advanced features for de-identifying data with the Transform API, including bucketing, date shifts, masking, and entity replacements. |
|  |  | Create a policy with the Transform API to de-identify and anonymize data in a Postgres database for test use cases. |
Classify

| Notebook | Launch | Description |
| --- | --- | --- |
|  |  | Create a classification policy to identify PII as well as a custom regular expression, then use the SDK to classify data and examine the results. |
|  |  | Label managed and custom data types locally in your environment. |
|  |  | Analyze and label a set of free-text email dumps, looking for PII and other potentially sensitive information using NLP. |