Explore FAQs and deep dives on how to generate high-quality synthetic data and models.


What is synthetic data?

Synthetic data can be thought of as artificial information generated by computer algorithms or simulations that can be used as an alternative to real-world data. While artificial, high-quality synthetic data is capable of capturing the mathematical insights, statistics, and dynamism of real-world data. Research shows that it can be as good as, or even better than, real-world data for analysis and for training AI models, because it can be engineered to reduce bias and increase privacy relative to real-world datasets.

What does it mean to synthesize data?

Synthesizing data is the process of creating an artificial dataset from real-world data.
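
As a toy illustration of that idea, the sketch below fits a simple statistical model to a small "real" column and then samples new, artificial values from it. It is not how Gretel's models work; the normal-distribution assumption, the `synthesize_column` helper, and the `real_ages` values are all hypothetical and chosen only to keep the example short.

```python
from statistics import NormalDist, mean

def synthesize_column(real_values, n, seed=0):
    """Fit a normal distribution to real_values and draw n artificial values.

    Assumes the column is roughly normally distributed (a simplification).
    """
    fitted = NormalDist.from_samples(real_values)  # estimate mean and std dev
    return fitted.samples(n, seed=seed)            # sample new values from the fit

# Hypothetical "real" data, invented for this example.
real_ages = [23, 35, 31, 46, 52, 29, 41, 38, 27, 44]

synthetic_ages = synthesize_column(real_ages, n=1000)

# The synthetic column should track the real column's average closely.
print(round(mean(real_ages), 1), round(mean(synthetic_ages), 1))
```

The synthetic values are drawn from the fitted distribution rather than copied from the real records, which is the basic property that lets synthetic data stand in for the original.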

What is the definition of fairness in machine learning?

“In fair AI, the objective is to provide systems that both quantify bias and mitigate discrimination against subgroups” 1. Artificial Intelligence (AI) is now ubiquitous in our culture. It is often responsible for critical decisions such as who to hire and at what salary, who to give a loan or insurance policy to, and who is at risk for cancer or heart disease. Fair AI is an active area of academic research that strives to eliminate discrimination against demographic groups.
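
One common way to "quantify bias" is to compare how often each subgroup receives a favorable decision. The sketch below computes that gap, often called the demographic parity difference. The choice of this particular metric is an assumption made for illustration (the quoted definition does not name one), and the function names and sample data are hypothetical.

```python
from collections import defaultdict

def positive_rates(groups, decisions):
    """Return the share of favorable (1) decisions for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, decision in zip(groups, decisions):
        totals[group] += 1
        positives[group] += int(decision)
    return {group: positives[group] / totals[group] for group in totals}

def demographic_parity_difference(groups, decisions):
    """Gap between the highest and lowest per-group favorable-decision rates."""
    rates = positive_rates(groups, decisions)
    return max(rates.values()) - min(rates.values())

# Hypothetical loan decisions (1 = approved) for two demographic groups.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
decisions = [1, 1, 1, 0, 1, 0, 0, 0]

gap = demographic_parity_difference(groups, decisions)
print(gap)  # group "a" is approved 75% of the time, group "b" 25%
```

A gap of zero would mean every group receives favorable decisions at the same rate; larger gaps point to a disparity worth investigating, although no single metric captures every notion of fairness.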

How synthetic is my synthetic data?

Check out our guide for everything you could want to know about the Gretel Synthetics Report and quantifying the accuracy of your synthetic models.

How do I improve my synthetic data quality?

A how-to guide for improving the quality of your synthetic data models.


1 Ahsen, Mehmet Eren, Mehmet Ulvi Saygi Ayvaci, and Srinivasan Raghunathan. "When algorithmic predictions use human-generated data: A bias-aware classification algorithm for breast cancer diagnosis." Information Systems Research 30.1 (2019): 97-116.