Code Validation in Data Designer

Generate and Validate High-Quality Code

Data Designer provides powerful capabilities for generating and validating code. This feature is particularly valuable when creating code examples, documentation, tutorials, or test data for programming applications. With Data Designer's code validation, you can verify that your generated code is syntactically correct, flag quality issues, and confirm it meets your standards before the data is used.

Overview of Code Generation and Validation

Data Designer can generate code in various programming languages and then validate it to ensure quality and correctness. This is particularly useful for creating:

  • Code examples for documentation

  • Test data for programming tutorials

  • Synthetic implementation examples

  • Code training datasets

Supported Languages

Data Designer supports validation for these languages:

  • Python (CodeLang.PYTHON)

  • SQL dialects:

    • ANSI SQL (CodeLang.SQL_ANSI)

    • MySQL (CodeLang.SQL_MYSQL)

    • PostgreSQL (CodeLang.SQL_POSTGRES)

    • SQLite (CodeLang.SQL_SQLITE)

    • T-SQL (CodeLang.SQL_TSQL)

    • BigQuery (CodeLang.SQL_BIGQUERY)

Generating Code

To generate code, use the LLMCodeColumn column type and set output_format to the appropriate CodeLang value (here CodeLang.PYTHON) so that the generated code is extracted and formatted correctly:

from gretel_client.data_designer import columns as C
from gretel_client.data_designer import params as P

# Generate Python code (`aidd` is an existing Data Designer instance;
# see the complete example below for how it is created)
aidd.add_column(
    C.LLMCodeColumn(
        name="code_implementation",
        output_format=P.CodeLang.PYTHON,  # Specify code type
        system_prompt="You are an expert Python programmer who writes clean, efficient, and well-documented code.",
        prompt="""
        Write Python code for the following instruction:
        Instruction: {{instruction}}

        Important Guidelines:
        * Code Quality: Your code should be clean, complete, self-contained and accurate.
        * Code Validity: Please ensure that your Python code is executable and does not contain any errors.
        * Packages: Remember to import any necessary libraries, and to use all libraries you import.
        """
    )
)

Validating Generated Code

After generating code, you can add a validation column to check for errors and quality issues:

# Add code validation (shorthand form; equivalent to C.CodeValidationColumn,
# used in the complete example below)
aidd.add_column(
    name="code_validity_result",
    type="code-validation",
    code_lang=P.CodeLang.PYTHON,
    target_column="code_implementation"  # Column containing the code
)
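The same pattern applies to the SQL dialects listed above. Here is a minimal sketch for PostgreSQL (the column names are illustrative; it assumes the same aidd instance and an instruction column, as in the Python example):

# Generate PostgreSQL code
aidd.add_column(
    C.LLMCodeColumn(
        name="sql_implementation",
        output_format=P.CodeLang.SQL_POSTGRES,  # Match the target dialect
        prompt="Write a PostgreSQL query for the following instruction: {{instruction}}"
    )
)

# Validate with the matching dialect
aidd.add_column(
    name="sql_validity_result",
    type="code-validation",
    code_lang=P.CodeLang.SQL_POSTGRES,
    target_column="sql_implementation"
)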

Validation Output

The validation process creates several output columns. In addition to the validation column itself (code_validity_result), language-specific columns are added, prefixed with the name of the target column (here code_implementation):

For Python:

  • code_validity_result

  • code_implementation_pylint_score

  • code_implementation_pylint_severity

  • code_implementation_pylint_messages

For SQL:

  • code_validity_result

  • code_implementation_validator_messages
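Once you have generated a preview (see the complete example below), these columns can be inspected like any other output. A sketch, assuming the preview dataset is exposed as a pandas DataFrame via preview.dataset.df:

# Inspect validation results for the previewed records
df = preview.dataset.df
print(df[[
    "code_validity_result",
    "code_implementation_pylint_score",
    "code_implementation_pylint_severity",
]].head())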

Complete Python Example

Here's a complete example of generating and validating Python code:

from gretel_client.navigator_client import Gretel
from gretel_client.data_designer import columns as C
from gretel_client.data_designer import params as P

# Initialize Gretel client
gretel = Gretel(api_key="prompt")

# Create a new Data Designer instance
aidd = gretel.data_designer.new(model_suite="apache-2.0")

# Add a category for code topics
aidd.add_column(
    C.SamplerColumn(
        name="code_topic",
        type=P.SamplerType.CATEGORY,
        params=P.CategorySamplerParams(
            values=["Data Processing", "Web Scraping", "API Integration", "Data Visualization"]
        )
    )
)

# Add a complexity level
aidd.add_column(
    C.SamplerColumn(
        name="complexity_level",
        type=P.SamplerType.CATEGORY,
        params=P.CategorySamplerParams(
            values=["Beginner", "Intermediate", "Advanced"]
        )
    )
)

# Generate an instruction
aidd.add_column(
    C.LLMTextColumn(
        name="instruction",
        system_prompt="You are an expert at creating clear programming tasks.",
        prompt="""
        Create a specific Python programming task related to {{code_topic}} at a {{complexity_level}} level.
        The task should be clear, specific, and actionable.
        """
    )
)

# Generate Python code implementation
aidd.add_column(
    C.LLMCodeColumn(
        name="code_implementation",
        output_format=P.CodeLang.PYTHON,
        system_prompt="You are an expert Python programmer who writes clean, efficient, and well-documented code.",
        prompt="""
        Write Python code for the following instruction:
        Instruction: {{instruction}}
        Important Guidelines:
        * Code Quality: Your code should be clean, complete, self-contained and accurate.
        * Code Validity: Please ensure that your Python code is executable and does not contain any errors.
        * Packages: Remember to import any necessary libraries, and to use all libraries you import.
        * Complexity: The code should match a {{complexity_level}} level of expertise.
        """
    )
)

# Add code validation
aidd.add_column(
    C.CodeValidationColumn(
        name="code_validity_result",
        code_lang=P.CodeLang.PYTHON,
        target_column="code_implementation"
    )
)

# Generate a preview
preview = aidd.preview()
preview.display_sample_record()
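Previews are useful for iterating on prompts and validation settings. To generate a full dataset, submit a batch job. A sketch, assuming the batch workflow API described on the Generating Data page (create submits the job and returns a workflow run):

# Scale up from preview to a full batch job
workflow_run = aidd.create(num_records=100, name="python-code-dataset")
workflow_run.wait_until_done()

# Load the generated dataset, including all validation columns
df = workflow_run.dataset.df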

LLM-Based Code Evaluation

In addition to static validation, you can add an LLM-based judge to evaluate code quality more holistically:

from gretel_client.data_designer.judge_rubrics import TEXT_TO_PYTHON_LLM_JUDGE_PROMPT_TEMPLATE, PYTHON_RUBRICS

# Add an LLM judge to evaluate code quality
aidd.add_column(
    C.LLMJudgeColumn(
        name="code_judge_result",
        prompt=TEXT_TO_PYTHON_LLM_JUDGE_PROMPT_TEMPLATE,
        rubrics=PYTHON_RUBRICS
    )
)

The judge will evaluate the code based on predefined rubrics like correctness, efficiency, readability, and documentation.
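Like static validation, the judge's output is stored as an additional column in the generated dataset. A sketch for inspecting it in a preview (again assuming the preview dataset is exposed as a pandas DataFrame; the exact structure of the judge result may vary):

# Preview again so the new judge column is populated
preview = aidd.preview()

# Each record's judge result contains per-rubric scores and reasoning
print(preview.dataset.df["code_judge_result"].iloc[0])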
