Regression
Evaluate synthetic data vs. real-world data on regression models
The regression Evaluate task generates a Gretel Synthetic Data Utility Report.
Learn more about the Data Utility Report
The Gretel Evaluate Regression task uses PyCaret, an open-source AutoML library, to evaluate the quality of your generated synthetic data on commonly used ML regression models, and presents the results in an easy-to-understand HTML report.
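To make the idea of data utility concrete, the sketch below fits a one-feature least-squares regression twice, once on real training data and once on synthetic data, and scores both fits against the same real holdout using R-squared. It is a standard-library illustration of the general "train on synthetic, test on real" idea, not a description of what Gretel or PyCaret do internally; the function names and the toy numbers are made up for this example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r2_score(ys, preds):
    """R-squared: 1 - (residual sum of squares / total sum of squares)."""
    y_bar = mean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def holdout_r2(train_x, train_y, test_x, test_y):
    """Fit on the training data, then score predictions on the holdout."""
    slope, intercept = fit_line(train_x, train_y)
    preds = [slope * x + intercept for x in test_x]
    return r2_score(test_y, preds)


# Toy data, invented for illustration: the real relationship is y = 2x + 1.
real_train_x = [1.0, 2.0, 3.0, 4.0]
real_train_y = [3.0, 5.0, 7.0, 9.0]
# Synthetic data that only approximately preserves that relationship.
synth_x = [1.0, 2.0, 3.0, 4.0]
synth_y = [3.5, 5.0, 7.5, 9.0]
# Real holdout rows, never seen during training.
holdout_x = [5.0, 6.0, 7.0]
holdout_y = [11.0, 13.0, 15.0]

real_r2 = holdout_r2(real_train_x, real_train_y, holdout_x, holdout_y)
synth_r2 = holdout_r2(synth_x, synth_y, holdout_x, holdout_y)
print(f"trained on real: R2={real_r2:.4f}; trained on synthetic: R2={synth_r2:.4f}")
```

The closer the synthetic-trained score is to the real-trained score, the more of the real data's predictive signal the synthetic data has kept.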
Low-code using Gretel Console
You can kick off this evaluation directly in the Gretel Console. Start with this existing example: Generate synthetic data + evaluate ML performance
To use this blueprint, click "Edit" in the configuration editor and change the parameters to fit your dataset. You can also change the synthetic model from the Gretel LSTM default to any other synthetic model.
Example configuration with Gretel LSTM:
You can copy the configuration above and edit it to fit your use case. Then, click "Begin training" to start model training.
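As a rough sketch of the shape such a configuration might take, the snippet below represents a model configuration as a Python dictionary and adds a regression evaluate section to it. The key names (evaluate, task, target, holdout, metric, models), the base configuration, and the helper with_regression_evaluate are assumptions for illustration and should be checked against the blueprint in the Console; "age" is a placeholder target column.

```python
import copy
import json

# Hypothetical minimal base configuration; real blueprints carry more settings.
BASE_CONFIG = {
    "schema_version": "1.0",
    "name": "lstm-regression-example",
    "models": [
        {
            "synthetics": {
                "data_source": "__tmp__",
                "params": {},
            }
        }
    ],
}


def with_regression_evaluate(config, target, holdout=None, metric=None, models=None):
    """Return a copy of `config` whose first model has a regression evaluate section."""
    updated = copy.deepcopy(config)
    # Look up the single model entry without hard-coding its type, so the same
    # helper works if the synthetic model is swapped for another one.
    _model_type, model_body = next(iter(updated["models"][0].items()))
    evaluate = {"task": "regression", "target": target}
    # Optional settings are only written when explicitly provided.
    if holdout is not None:
        evaluate["holdout"] = holdout
    if metric is not None:
        evaluate["metric"] = metric
    if models is not None:
        evaluate["models"] = list(models)
    model_body["evaluate"] = evaluate
    return updated


config = with_regression_evaluate(BASE_CONFIG, target="age", metric="r2")
print(json.dumps(config, indent=2))
```

Returning a modified copy keeps the base configuration reusable across several evaluation runs with different targets.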
Supported models and metrics
By default, all supported models are trained to produce the evaluation results. To use only specific models, pass a list of strings from the following set:
To change the metric that the regression models optimize for, select one metric from regression_metrics below. The default metric is "R2" (R-squared).
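A small pre-flight check can catch a misspelled model ID or metric before a job is submitted. The sketch below validates a selection against the regression model IDs and metric names that PyCaret itself defines; whether the Evaluate task accepts exactly this set, and with this casing, is an assumption, so treat the list in this documentation as authoritative. The helper check_selection is hypothetical.

```python
# Model IDs from PyCaret's regression module (assumed, not confirmed, to match
# the set accepted by the Evaluate task).
PYCARET_REGRESSION_MODELS = {
    "lr", "lasso", "ridge", "en", "lar", "llar", "omp", "br", "ard", "par",
    "ransac", "tr", "huber", "kr", "svm", "knn", "dt", "rf", "et", "ada",
    "gbr", "mlp", "xgboost", "lightgbm", "catboost", "dummy",
}
# Metric names PyCaret reports for regression experiments.
PYCARET_REGRESSION_METRICS = {"MAE", "MSE", "RMSE", "R2", "RMSLE", "MAPE"}


def check_selection(models=None, metric="R2"):
    """Validate a model list and a single metric; raise ValueError on unknown names."""
    unknown = sorted(set(models or []) - PYCARET_REGRESSION_MODELS)
    if unknown:
        raise ValueError(f"unknown model IDs: {unknown}")
    # Compare case-insensitively, since configs may spell the metric in lowercase.
    if metric.upper() not in PYCARET_REGRESSION_METRICS:
        raise ValueError(f"unknown metric: {metric!r}")
    # models=None stands for the default of using all models.
    return {"models": list(models) if models else None, "metric": metric}


print(check_selection(models=["lr", "rf"], metric="R2"))
```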
SDK
You can use the regression Evaluate task in two ways:
1. As a parameter of a Gretel synthetics model, or
2. To compare two datasets directly: a synthetic dataset and a real-world dataset
Option 1: Train and generate synthetic data, then evaluate on regression models
Here's a basic example that generates synthetic data using Gretel LSTM and the publicly available heart disease dataset, then adds regression evaluation to create the Data Utility Report:
You can then run the model and save the report using:
Even when using the Evaluate SDK, you can find model details and report download options in the Gretel Console: simply navigate to the heart-disease-regression-example project.
Option 2: BYO synthetic and real data to compare
If you have already generated synthetic data in the form of a CSV, JSON(L), or pandas DataFrame, you can also use this Evaluate task to analyze the two datasets.
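Before comparing two datasets, it can help to confirm that they are actually comparable. The sketch below reads two CSV sources with the standard library and reports whether the target column exists and is numeric in both, and whether the column names match. This is a sanity check suggested here, not a documented requirement of the Evaluate task; the helper check_comparable and the sample rows are made up for illustration.

```python
import csv
import io


def check_comparable(synthetic_csv, real_csv, target):
    """Return a list of problems found; an empty list means the basic checks passed."""
    synth_rows = list(csv.DictReader(synthetic_csv))
    real_rows = list(csv.DictReader(real_csv))
    problems = []
    for label, rows in (("synthetic", synth_rows), ("real", real_rows)):
        if not rows:
            problems.append(f"{label} data has no rows")
            continue
        if target not in rows[0]:
            problems.append(f"{label} data is missing target column {target!r}")
            continue
        try:
            # A regression target must parse as a number in every row.
            for row in rows:
                float(row[target])
        except (TypeError, ValueError):
            problems.append(f"{label} target {target!r} has non-numeric values")
    if synth_rows and real_rows and set(synth_rows[0]) != set(real_rows[0]):
        problems.append("column names differ between the two datasets")
    return problems


# Invented sample rows, used only to exercise the check.
synthetic = "age,chol\n52,210\n61,255\n"
real = "age,chol\n49,199\n58,n/a\n"

print(check_comparable(io.StringIO(synthetic), io.StringIO(real), "age"))
print(check_comparable(io.StringIO(synthetic), io.StringIO(real), "chol"))
```

The function accepts any file-like object, so the same check works with open("synthetic.csv", newline="") in place of the in-memory strings.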
The Gretel SDK provides Python classes specifically for running reports. The DownstreamRegressionReport() class uses evaluate with the regression task to generate a Data Utility Report. Basic usage is shown below:
For more examples, see the Jupyter notebook, or open it in Google Colab.
Gretel Synthetic Data Utility Report
The Evaluate task creates a Data Utility Report with the results of the analysis. The report shows a high-level ML Quality Score (MQS), which gives you an at-a-glance understanding of how your synthetic dataset performed. For more information about the report, check out this page, which covers each section.
Logs and Results
You can view logs in the SDK environment, or go to the project in the Gretel Console to follow model training progress and download the evaluation results.