Code Validation in Data Designer

Generate and Validate High-Quality Code

Data Designer provides powerful capabilities for generating and validating code. This feature is particularly valuable when creating code examples, documentation, tutorials, or test data for programming applications. With Data Designer's code validation, you can ensure that your generated code is syntactically correct, follows best practices, and meets quality standards.

Overview of Code Generation and Validation

Data Designer can generate code in various programming languages and then validate it to ensure quality and correctness. This is particularly useful for creating:

  • Code examples for documentation

  • Test data for programming tutorials

  • Synthetic implementation examples

  • Code training datasets

Supported Languages

Data Designer supports validation for these languages:

  • Python (CodeLang.PYTHON)

  • SQL dialects:

    • ANSI SQL (CodeLang.SQL_ANSI)

    • MySQL (CodeLang.SQL_MYSQL)

    • PostgreSQL (CodeLang.SQL_POSTGRES)

    • SQLite (CodeLang.SQL_SQLITE)

    • T-SQL (CodeLang.SQL_TSQL)

    • BigQuery (CodeLang.SQL_BIGQUERY)
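As a rough mental model for what SQL validation checks, here is a small, self-contained sketch that is independent of Data Designer: it uses Python's built-in sqlite3 module to flag statements that SQLite cannot parse or plan. Data Designer's own validators are richer than this, but the spirit is similar.

```python
import sqlite3

def is_valid_sqlite(sql: str) -> bool:
    """Return True if SQLite can parse and plan the statement, False otherwise."""
    conn = sqlite3.connect(":memory:")
    try:
        # EXPLAIN runs the parser and planner without executing the statement.
        conn.execute(f"EXPLAIN {sql}")
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        conn.close()

print(is_valid_sqlite("SELECT 1 + 1"))   # parses fine
print(is_valid_sqlite("SELEC 1"))        # syntax error, flagged as invalid
```

Note that this toy check also rejects statements that reference missing tables, since the planner needs the schema; a dialect-aware validator like the ones listed above can distinguish syntax errors from schema errors.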

Generating Code

To generate code, use the LLMCodeColumn column type and set output_format to the target language (for example, CodeLang.PYTHON) so that the output is extracted and formatted as code:

from gretel_client.data_designer import columns as C
from gretel_client.data_designer import params as P

# Generate Python code
aidd.add_column(
    C.LLMCodeColumn(
        name="code_implementation",
        output_format=P.CodeLang.PYTHON,  # Specify code type
        system_prompt="You are an expert Python programmer who writes clean, efficient, and well-documented code.",
        prompt="""
        Write Python code for the following instruction:
        Instruction: {{instruction}}
    
        Important Guidelines:
        * Code Quality: Your code should be clean, complete, self-contained and accurate.
        * Code Validity: Please ensure that your python code is executable and does not contain any errors.
        * Packages: Remember to import any necessary libraries, and to use all libraries you import.
        """
    )
)
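The {{instruction}} placeholder in the prompt above is filled per record from the referenced column. Data Designer does this internally; purely as an illustration of that templating step, here is a toy fill_template helper (not part of the client library):

```python
import re

def fill_template(template: str, record: dict) -> str:
    """Replace {{column_name}} placeholders with values from a record (toy version)."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(record[m.group(1)]), template)

prompt = "Write Python code for the following instruction:\nInstruction: {{instruction}}"
record = {"instruction": "Parse a CSV file and print the column names."}
print(fill_template(prompt, record))
```

This is why prompts can reference any previously defined column by name, as the complete example below does with {{code_topic}} and {{complexity_level}}.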

Validating Generated Code

After generating code, you can add a validation column to check for errors and quality issues:

# Add code validation
aidd.add_column(
    C.CodeValidationColumn(
        name="code_validity_result",
        code_lang=P.CodeLang.PYTHON,
        target_column="code_implementation"  # Column containing the code
    )
)

Validation Output

The validation process adds several output columns; their names are derived from the validation column and the target column (code_implementation in this example):

For Python:

  • code_validity_result

  • code_implementation_pylint_score

  • code_implementation_pylint_severity

  • code_implementation_pylint_messages

For SQL:

  • code_validity_result

  • code_implementation_validator_messages
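Once these columns exist in the generated dataset, you can post-process them like any other tabular data, for example to keep only records that passed validation. A minimal sketch with pandas, using made-up sample values (the exact values and score ranges produced by the validator may differ):

```python
import pandas as pd

# Toy stand-in for a generated dataset; real values come from the validator.
df = pd.DataFrame({
    "code_implementation": ["def f(): return 1", "def g(: pass", "x = [1, 2, 3]"],
    "code_validity_result": ["valid", "invalid", "valid"],   # illustrative values
    "code_implementation_pylint_score": [9.2, 0.0, 7.5],     # pylint scores out of 10
})

# Keep only records that passed validation and scored well under pylint.
clean = df[
    (df["code_validity_result"] == "valid")
    & (df["code_implementation_pylint_score"] >= 7.0)
]
print(len(clean))
```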

Complete Python Example

Here's a complete example of generating and validating Python code:

from gretel_client.navigator_client import Gretel
from gretel_client.data_designer import columns as C
from gretel_client.data_designer import params as P

# Initialize Gretel client
gretel = Gretel(api_key="prompt")

# Create a new Data Designer instance
aidd = gretel.data_designer.new(model_suite="apache-2.0")

# Add a category for code topics
aidd.add_column(
    C.SamplerColumn(
        name="code_topic",
        type=P.SamplerType.CATEGORY,
        params=P.CategorySamplerParams(
            values=["Data Processing", "Web Scraping", "API Integration", "Data Visualization"]
        )
    )
)

# Add a complexity level
aidd.add_column(
    C.SamplerColumn(
        name="complexity_level",
        type=P.SamplerType.CATEGORY,
        params=P.CategorySamplerParams(
            values=["Beginner", "Intermediate", "Advanced"]
        )
    )
)

# Generate an instruction
aidd.add_column(
    C.LLMTextColumn(
        name="instruction",
        system_prompt="You are an expert at creating clear programming tasks.",
        prompt="""
        Create a specific Python programming task related to {{code_topic}} at a {{complexity_level}} level.
        The task should be clear, specific, and actionable.
        """
    )
)

# Generate Python code implementation
aidd.add_column(
    C.LLMCodeColumn(
        name="code_implementation",
        output_format=P.CodeLang.PYTHON,
        system_prompt="You are an expert Python programmer who writes clean, efficient, and well-documented code.",
        prompt="""
        Write Python code for the following instruction:
        Instruction: {{instruction}}
        Important Guidelines:
        * Code Quality: Your code should be clean, complete, self-contained and accurate.
        * Code Validity: Please ensure that your Python code is executable and does not contain any errors.
        * Packages: Remember to import any necessary libraries, and to use all libraries you import.
        * Complexity: The code should match a {{complexity_level}} level of expertise.
        """
    )
)

# Add code validation
aidd.add_column(
    C.CodeValidationColumn(
        name="code_validity_result",
        code_lang=P.CodeLang.PYTHON,
        target_column="code_implementation"
    )
)

# Generate a preview
preview = aidd.preview()
preview.display_sample_record()

LLM-Based Code Evaluation

In addition to static validation, you can add an LLM-based judge to evaluate code quality more holistically:

from gretel_client.data_designer.judge_rubrics import TEXT_TO_PYTHON_LLM_JUDGE_PROMPT_TEMPLATE, PYTHON_RUBRICS

# Add an LLM judge to evaluate code quality
aidd.add_column(
    C.LLMJudgeColumn(
        name="code_judge_result",
        prompt=TEXT_TO_PYTHON_LLM_JUDGE_PROMPT_TEMPLATE,
        rubrics=PYTHON_RUBRICS
    )
)

The judge evaluates the code against predefined rubrics such as correctness, efficiency, readability, and documentation.
