Data Evaluation
Setting Up Evaluations in Data Designer
Data Designer provides powerful capabilities for evaluating the quality of your generated data. This guide explains how to set up and use evaluations in your data generation workflows.
Overview of Evaluations
Evaluations help you assess various aspects of your generated data:
Statistical distributions and relationships
Content quality
Adherence to requirements
Correctness of generated code
Data Designer supports both automated data validation and LLM-based evaluations.
Adding Evaluation Reports
To add a general evaluation report to your Data Designer instance:
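The exact call depends on your SDK version; the sketch below assumes a Python client object named designer and a with_evaluation_report() style method, both of which are illustrative rather than definitive API names.

```python
# Sketch only: the import path, class, and method names below are assumptions
# used for illustration, not a definitive Data Designer API.
from data_designer import DataDesigner  # assumed import path

designer = DataDesigner(model_suite="apache-2.0")  # assumed constructor argument

# Request a general evaluation report so metrics are computed after generation.
designer.with_evaluation_report()
```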
This will generate evaluation metrics for your data after generation.
LLM-Based Evaluation with Judges
One of the most powerful evaluation tools is the LLM judge, which provides human-like assessment of your generated content. A judge is added as a dedicated column type that evaluates the generated data against one or more custom rubrics that you supply.
A rubric defines the set of rules the judge uses to score an LLM response. A good rubric includes a description and scoring criteria, and a single judge column can use multiple rubrics.
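As a sketch of what a rubric definition might look like in code (the Rubric class, its import path, and field names such as name, description, and scoring are assumptions for illustration):

```python
# Sketch: a rubric with a description and explicit scoring criteria.
# The Rubric class and its fields are assumed, not guaranteed SDK names.
from data_designer.params import Rubric  # assumed import path

text_quality_rubric = Rubric(
    name="Text Quality",
    description="Evaluates fluency, clarity, and adherence to the instruction.",
    scoring={
        "4": "Clear, fluent, and fully follows the instruction.",
        "3": "Mostly clear with minor issues.",
        "2": "Understandable but with noticeable errors or omissions.",
        "1": "Unclear or largely fails to follow the instruction.",
    },
)
```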
Using Predefined Rubrics
Data Designer includes predefined evaluation rubrics for common use cases such as Text-to-Python and Text-to-SQL datasets. For other use cases, you can define your own prompts and rubrics:
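Continuing the sketch above, a custom judge column might be configured as follows; LLMJudgeColumn, add_column, and the double-curly-brace column references in the prompt are assumptions for illustration, not confirmed API names:

```python
# Sketch: attach a judge column that scores each record with a custom prompt
# and the rubric defined earlier. Names below are illustrative assumptions.
from data_designer.columns import LLMJudgeColumn  # assumed import path

designer.add_column(
    LLMJudgeColumn(
        name="quality_judge",
        prompt=(
            "You are an expert reviewer. Judge the response in {{ response }} "
            "against the instruction in {{ instruction }}."
        ),
        rubrics=[text_quality_rubric],
    )
)
```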
When using TEXT_TO_PYTHON_LLM_JUDGE_PROMPT_TEMPLATE, your dataset must include a column called instruction and a column called code_implementation to make up the prompt-code pairs. Similarly, for TEXT_TO_SQL_LLM_JUDGE_PROMPT_TEMPLATE, your dataset must include sql_prompt, sql_context, and sql columns.
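As a rough sketch of how a predefined template might be wired up (the import path, the PYTHON_RUBRICS name, and the column class are assumptions; only the required column names come from the documentation above):

```python
# Sketch: use the predefined Text-to-Python judge template. The template
# expects instruction and code_implementation columns to already exist.
# Import path and PYTHON_RUBRICS are assumed names for illustration.
from data_designer.judge_rubrics import (  # assumed import path
    PYTHON_RUBRICS,
    TEXT_TO_PYTHON_LLM_JUDGE_PROMPT_TEMPLATE,
)

designer.add_column(
    LLMJudgeColumn(
        name="code_judge",
        prompt=TEXT_TO_PYTHON_LLM_JUDGE_PROMPT_TEMPLATE,
        rubrics=PYTHON_RUBRICS,
    )
)
```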
Accessing Evaluation Results
After running a workflow with evaluations, you can access the evaluation results:
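A minimal sketch, assuming the client exposes job-style methods for retrieving the generated dataset and its evaluation report (all method names below are illustrative):

```python
# Sketch: run a generation job and inspect evaluation results afterwards.
# create, wait_until_done, load_dataset, and load_evaluation_report are
# assumed method names, not a definitive API.
workflow = designer.create(num_records=100, name="evaluated-dataset")
workflow.wait_until_done()

dataset = workflow.load_dataset()            # generated records, including judge scores
report = workflow.load_evaluation_report()   # aggregate evaluation metrics
print(report)
```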