Use Case Examples

Notebooks organized by use case.

Follow along with these use cases to familiarize yourself with core Gretel features. These examples provide a starting point for common use cases, which you can modify to suit your specific needs. The CLI and SDK Examples section walks through three use cases using both the Gretel SDK and the Gretel CLI. The remaining Example Notebooks use the Gretel SDK and are provided in Jupyter Notebook format.

What's Next?

After trying some of the use case examples below, dive into the Gretel Fundamentals section to understand the core Gretel concepts you'll be working with regularly.

CLI and SDK Examples

These examples walk through three core Gretel use cases using both the CLI and SDK.

Example Notebooks

Synthetics

Notebook
Launch
Description

Open in Colab

This notebook is designed to help users successfully train synthetic models on complex datasets with high row and column counts. The code works by intelligently dividing a dataset into a set of smaller datasets of correlated columns that can be parallelized and then joined together.

Open in Colab

Walk through the basics of using Gretel's Python SDK to create a synthetic dataset from a Pandas DataFrame or CSV.

Open in Colab

Train a synthetic model locally and generate data in your environment.

Open in Colab

Conditional data generation (seeding a model) is helpful when you want to preserve some of the original row data (primary keys, dates, important categorical data) in synthetic datasets.

Open in Colab

Balance demographic representation bias in a healthcare set using conditional data generation with a synthetic model.

Open in Colab

Use a synthetic model to boost the representation of an extreme minority class in a dataset by incorporating features from nearest neighbors.

Open in Colab

Run a sweep to automate hyperparameter optimization for a synthetic model using Weights and Biases.

Open in Colab

Augment a popular machine learning dataset with synthetic data to improve downstream accuracy and algorithmic fairness.

Open in Colab

This notebook shows how to generate synthetic data directly from a multi-table relational database to support data augmentation and subsetting use cases.

Open in Colab

Generate realistic but synthetic text examples using an open-source implementation of the GPT-3 architecture.

Open in Colab

Generate synthetic daily oil price data using the DoppelGANger GAN for time-series data.

Open in Colab

Produce a quality score and detailed report comparing any synthetic dataset with real-world data.

boost-minority-class to reduce bias

Open in Colab

Use the Gretel ACTGAN model to conditionally generate additional minority samples for a dataset that has only a few instances of the minority class.

Open in Colab

Synthesize a sample database using Gretel Relational Synthetics.
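The first notebook in this table is described as dividing a dataset into smaller datasets of correlated columns. The sketch below illustrates that general idea in plain Python and is not Gretel's implementation: the function names, the greedy grouping strategy, and the 0.8 threshold are all assumptions chosen for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # treat a constant column as uncorrelated
    return cov / sqrt(var_x * var_y)


def group_correlated_columns(table, threshold=0.8):
    """Greedy grouping (hypothetical helper): a column joins the first group
    whose first column it correlates with at or above the threshold,
    otherwise it starts a new group."""
    groups = []
    for name, values in table.items():
        for group in groups:
            if abs(pearson(values, table[group[0]])) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Toy data: two height columns move together, two visit columns move together.
table = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "visits": [3, 1, 4, 1, 5],
    "visits_x2": [6, 2, 8, 2, 10],
}
groups = group_correlated_columns(table)
print(groups)
```

Each resulting group could then be modeled separately and the outputs joined back together, which is the workflow the notebook description outlines.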

Transforms

Notebook
Launch
Description

Open in Colab

In this blueprint, we will create a transform policy to identify and redact or replace PII with fake values. We will then use the SDK to transform a dataset and examine the results.

Open in Colab

Label and transform sensitive data locally in your environment.

Open in Colab

In this deep dive, we will walk through some of the more advanced features to de-identify data with the Transform API, including bucketing, date shifts, masking, and entity replacements.

Open in Colab

This notebook walks through creating a policy using the Transform API to de-identify and anonymize data in a Postgres database for test use cases.

Open in Colab

This notebook uses the Gretel Relational Transform model to redact PII in a sample database.
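The transform notebooks above mention bucketing, date shifts, and masking. As a rough illustration of what those three operations do to a value, here is a minimal standard-library sketch. It does not use the Gretel Transform API; the helper names, the hash-derived offset, and the 30-day window are assumptions made for this example.

```python
import hashlib
from datetime import date, timedelta


def mask(value, keep_last=4, mask_char="*"):
    """Hide every character except the last `keep_last`."""
    hidden = max(len(value) - keep_last, 0)
    return mask_char * hidden + value[hidden:]


def bucket(value, width):
    """Map a number to the lower bound of its fixed-width bucket."""
    return (value // width) * width


def shift_date(day, key, max_days=30):
    """Shift a date by an offset in [-max_days, max_days] derived from `key`.

    Hashing the key keeps the offset stable per entity, so intervals between
    dates that share a key are preserved.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    offset = int.from_bytes(digest[:4], "big") % (2 * max_days + 1) - max_days
    return day + timedelta(days=offset)


masked_card = mask("4111111111111111")
age_bucket = bucket(37, 10)
admitted = shift_date(date(2023, 1, 15), "patient-1")
discharged = shift_date(date(2023, 1, 20), "patient-1")
print(masked_card, age_bucket, admitted, discharged)
```

Deriving the shift from a key rather than drawing it at random is one possible design choice: it keeps repeated runs reproducible and keeps an entity's timeline internally consistent.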

Classify

Notebook
Launch
Description

Open in Colab

In this blueprint, we will create a classification policy to identify PII as well as a custom regular expression. We will then use the SDK to classify data and examine the results.

Open in Colab

Label managed and custom data types locally in your environment.

Open in Colab

In this blueprint, we use NLP to analyze and label a set of free-text email dumps, looking for PII and other potentially sensitive information.
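The first Classify notebook combines PII detection with a custom regular expression. The sketch below shows the general shape of regex-based labeling using only Python's `re` module. It is not the Gretel Classify API: the label names, the `EMP-` identifier format, and the simplified email pattern are hypothetical and chosen for illustration.

```python
import re

# Label name -> compiled pattern. "employee_id" stands in for a custom type.
PATTERNS = {
    "email_address": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "employee_id": re.compile(r"\bEMP-\d{6}\b"),
}


def label_text(text):
    """Return every pattern match with its label and character offsets,
    ordered by position in the text."""
    labels = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            labels.append(
                {
                    "label": label,
                    "start": match.start(),
                    "end": match.end(),
                    "text": match.group(),
                }
            )
    return sorted(labels, key=lambda item: item["start"])


sample = "Contact jane.doe@example.com about badge EMP-004217."
found = label_text(sample)
for item in found:
    print(item["label"], item["text"])
```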

Evaluate

Notebook
Launch
Description

Open in Colab

In this notebook, we benchmark datasets and models to analyze multiple synthetic generation algorithms (including, but not limited to, Gretel models). The Benchmark report provides a Synthetic Data Quality Score (SQS) for each generated synthetic dataset, as well as train time, generate time, and total runtime in seconds.

Open in Colab

Evaluate synthetic data vs. real data trained on AutoML classifiers. The Gretel Synthetic Data Utility Report provides a detailed table of classification metrics.

Open in Colab

Evaluate synthetic data vs. real data trained on AutoML regression models. The Gretel Synthetic Data Utility Report provides a detailed table of regression metrics.
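The utility notebooks above compare models trained on synthetic data with models trained on real data using classification or regression metrics. To make the classification case concrete, the sketch below computes a few common metrics for two sets of predictions on the same held-out labels. It does not reproduce the Gretel Synthetic Data Utility Report; the function name and the toy predictions are assumptions for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Held-out real labels, plus toy predictions from two hypothetical models.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
pred_trained_on_real = [1, 0, 1, 1, 0, 1, 1, 0]
pred_trained_on_synthetic = [1, 0, 0, 1, 0, 1, 1, 0]

real_scores = classification_metrics(y_true, pred_trained_on_real)
synthetic_scores = classification_metrics(y_true, pred_trained_on_synthetic)
for name in real_scores:
    print(name, round(real_scores[name], 3), round(synthetic_scores[name], 3))
```

Evaluating both models against the same real held-out set is what makes the two columns comparable.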
