Data Quality Reports

We only support evaluation and validation for Text-to-Python and Text-to-SQL data generation tasks. Support for general evaluations is coming soon!

As part of our commitment to providing high-quality synthetic data, Data Designer includes a comprehensive evaluation process for generated datasets. This evaluation is designed to give you insights into the quality, diversity, and usefulness of your synthetic data. Here's an overview of how we evaluate datasets in the current version of Data Designer:

Evaluation Process

  1. Automatic Evaluation: By default, evaluations run as part of the synthetic data generation pipeline. However, you can disable evaluation if you prefer to speed up the generation process (see the sketch after this list).

  2. Preview Mode: We offer the ability to run evaluations in preview mode, allowing you to quickly assess and iterate on your dataset before full generation.

  3. Full Dataset Analysis: Our primary focus is on full dataset evaluations, providing you with a comprehensive view of your generated data.

  4. Visualization: We generate visualizations as part of our evaluations, making it easier for you to interpret the results at a glance.
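
As a rough illustration of this flow, the sketch below wires the steps together. All names here (run_generation, generate_records, evaluate_dataset, render_plots) and the preview sample size are hypothetical placeholders rather than the Data Designer API; the point is only that evaluation runs by default, can be switched off, and can operate on a small preview sample before a full run.

```python
def run_generation(
    config: dict,
    generate_records,    # hypothetical callable: (config, num_records) -> dataset
    evaluate_dataset,    # hypothetical callable: dataset -> evaluation report
    render_plots,        # hypothetical callable: report -> None (writes the visualizations)
    preview: bool = False,
    evaluate: bool = True,
):
    """Generate a dataset and, unless disabled, evaluate and visualize it."""
    # Preview mode works on a small sample (the sample size here is an arbitrary choice).
    num_records = 10 if preview else config.get("num_records", 1000)
    dataset = generate_records(config, num_records)
    report = None
    if evaluate:  # evaluation is on by default but can be skipped for speed
        report = evaluate_dataset(dataset)
        render_plots(report)
    return dataset, report
```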

Evaluation Metrics

Our evaluation process includes both general metrics applicable to all datasets and use-case specific metrics for text-to-code datasets. Here are the key metrics we provide:

General Metrics

  1. Diversity Analysis (a computation sketch follows this list):

    • Percentage of unique records

    • Number of unique values per column

    • Per-column distribution and diversity index

    • Distribution and visualization of contextual tags

  2. LLM-based Quality Assessment:

    • We use a large language model (LLM) as a judge to evaluate the dataset on relevance, readability, scalability, and adherence to coding standards (a rubric sketch follows this list).
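
A minimal sketch of the diversity metrics above, assuming the generated dataset fits in a pandas DataFrame. The normalized Shannon entropy used as the per-column diversity index and the column names in the example are illustrative choices, not necessarily the exact formulas used internally.

```python
import numpy as np
import pandas as pd


def diversity_report(df: pd.DataFrame) -> dict:
    """Compute simple diversity statistics for a generated dataset."""
    report = {
        # Distinct records as a share of all records.
        "pct_unique_records": 100.0 * (~df.duplicated()).sum() / len(df),
        "columns": {},
    }
    for col in df.columns:
        counts = df[col].value_counts(dropna=False)
        probs = counts / counts.sum()
        # Shannon entropy normalized to [0, 1]: 1 means an even spread over
        # the observed values, 0 means a single repeated value.
        entropy = -(probs * np.log(probs)).sum()
        index = entropy / np.log(len(counts)) if len(counts) > 1 else 0.0
        report["columns"][col] = {"n_unique": int(counts.size), "diversity_index": float(index)}
    return report


# Example on a tiny dataset with one duplicated record:
df = pd.DataFrame({"instruction": ["sum a list", "sum a list", "reverse a string"],
                   "code": ["sum(xs)", "sum(xs)", "s[::-1]"]})
print(diversity_report(df))  # pct_unique_records is about 66.7
```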
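
The LLM-as-a-judge step can be sketched as a rubric prompt like the one below. The prompt wording, the 1-5 scale, and the call_llm placeholder are assumptions; only the four rubric dimensions come from the list above.

```python
import json

# Rubric dimensions from the list above; the descriptions are paraphrased.
RUBRIC = {
    "relevance": "Does the record match its instructions and contextual tags?",
    "readability": "Is the text (and any code) clear and well organised?",
    "scalability": "Would the approach still hold up on larger inputs?",
    "coding_standards": "Does any code follow common style conventions?",
}


def judge_record(record: dict, call_llm) -> dict:
    """Score one record on each rubric dimension with an LLM judge.

    `call_llm` is a hypothetical callable (prompt -> response text); it is
    assumed to reply with a bare JSON object mapping dimension -> 1-5 score.
    """
    prompt = (
        "Score the following synthetic data record from 1 (poor) to 5 (excellent) "
        "on each rubric dimension. Reply with a JSON object only.\n\n"
        f"Rubric: {json.dumps(RUBRIC, indent=2)}\n\n"
        f"Record: {json.dumps(record, indent=2)}"
    )
    return json.loads(call_llm(prompt))
```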

Text-to-Code Specific Metrics

  1. Code Validity:

    • Fraction of valid code in the dataset (see the validity sketch after this list)

  2. Code Quality Assessment:

    • LLM-based evaluation using a code-specific rubric, considering factors such as modularity, readability, and scalability

  3. Static Analysis:

    • Linter-based code assessments for Python and SQL code (see the linter sketch after this list)
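
A sketch of how the valid-code fraction can be estimated: Python snippets are checked with the standard-library ast parser, and the SQL branch assumes the third-party sqlglot package as one possible parser. Neither is necessarily the validator Data Designer uses.

```python
import ast

import sqlglot  # third-party SQL parser, chosen here only for illustration


def is_valid_python(code: str) -> bool:
    """True if the snippet parses as Python source."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False


def is_valid_sql(query: str) -> bool:
    """True if the snippet parses under sqlglot's generic SQL dialect."""
    try:
        sqlglot.parse_one(query)
        return True
    except Exception:  # sqlglot raises ParseError on invalid SQL
        return False


def valid_code_fraction(snippets: list[str], language: str = "python") -> float:
    """Share of snippets that parse successfully."""
    check = is_valid_python if language == "python" else is_valid_sql
    return sum(check(s) for s in snippets) / len(snippets) if snippets else 0.0
```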
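
The static-analysis step can be sketched by shelling out to a linter. flake8 is shown here as one example for Python (sqlfluff would be an analogous choice for SQL), and the findings-per-snippet average is an illustrative aggregate, not the report's exact score.

```python
import os
import subprocess
import tempfile


def lint_python(code: str) -> int:
    """Count linter findings for one Python snippet via the flake8 CLI."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        # flake8 prints one finding per line; empty output means a clean pass.
        result = subprocess.run(["flake8", path], capture_output=True, text=True)
        return len(result.stdout.splitlines())
    finally:
        os.unlink(path)


def mean_findings(snippets: list[str]) -> float:
    """Average number of linter findings per snippet across the dataset."""
    return sum(lint_python(s) for s in snippets) / len(snippets) if snippets else 0.0
```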

Evaluation Report

After the evaluation process, we provide you with a comprehensive report that includes:

  1. Quantitative metrics for each evaluation category

  2. Visualizations to help you quickly grasp the characteristics of your dataset

  3. Qualitative feedback from our LLM-based assessments

This report is stored as part of the workflow artifacts, allowing you to review and compare evaluations across different iterations of your dataset.
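
A minimal sketch of what such an artifact bundle could look like; the file names, JSON layout, and use of matplotlib figures are assumptions made for illustration. Only the three kinds of content (quantitative metrics, visualizations, and LLM feedback) come from the description above.

```python
import json
from pathlib import Path


def save_report(metrics: dict, llm_feedback: list[str], figures: dict, out_dir: str) -> None:
    """Write quantitative metrics and LLM feedback as JSON, and figures as PNGs."""
    artifacts = Path(out_dir)
    artifacts.mkdir(parents=True, exist_ok=True)
    (artifacts / "evaluation_report.json").write_text(
        json.dumps({"metrics": metrics, "llm_feedback": llm_feedback}, indent=2)
    )
    for name, fig in figures.items():
        fig.savefig(artifacts / f"{name}.png", dpi=150)  # assumes matplotlib Figure objects
```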

Benefits of Our Evaluation Process

  • Trust and Transparency: Our evaluation process helps build trust by providing clear, objective measures of data quality.

  • Iterative Improvement: By offering evaluations in preview mode, we enable you to quickly iterate and improve your data generation process.

  • Use-Case Optimization: Our text-to-code specific metrics ensure that generated code meets high standards of validity and quality.

  • Comprehensive Insights: By combining quantitative metrics, visualizations, and LLM-based assessments, we provide a holistic view of your dataset's characteristics and quality.

You can learn how to access the evaluation report here.
