Relational

Synthesize and Transform multi-table databases with Gretel Relational.

Introduction

Gretel Relational applies Gretel Transform and Synthetics models to multi-table contexts, allowing you to synthesize and transform multiple related tables, or even entire SQL databases, while ensuring referential integrity on top of accuracy and privacy.

Depending on your use case, you can leverage Gretel's generative AI models to:

  • Generate synthetic versions of your database

  • Redact Personally Identifiable Information (PII) from your database

  • Transform and synthesize your database to ensure compliance with GDPR, CCPA, and other privacy regulations

Gretel Relational is executed through Gretel Workflows.

High Level Flow

  1. Use a native connector to extract data from your source.

  2. Train and run models via the gretel_tabular action.

  3. Optionally, write output data to a destination sink.

  4. Optionally, write output reports to an object store of your choice.
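Assembled into a Workflow configuration, this flow might look like the following minimal sketch. The mysql_source and gretel_tabular action types match the full example later on this page; the connection IDs are placeholders, and the mysql_destination action type and its sync mode are assumptions to illustrate the optional destination step.

name: relational-flow-sketch

actions:
  - name: extract
    type: mysql_source
    connection_id: my_source_db    # placeholder connection
    config:
      sync:
        mode: full

  - name: synthesize
    type: gretel_tabular
    input: extract
    config:
      train:
        dataset: "{outputs.extract.dataset}"
        model: synthetics/tabular-actgan

  # Optional: write the synthetic tables back out to a destination database.
  # The action type and sync mode here are illustrative placeholders.
  - name: write
    type: mysql_destination
    connection_id: my_destination_db
    input: synthesize
    config:
      sync:
        mode: replace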

Relational Synthetics

Gretel Relational Synthetics leverages our library of generative AI models to synthesize large multi-table databases while maintaining referential integrity and statistical accuracy.

When you generate synthetic data, you choose the amount of data to generate. You can choose to replicate the size of your database, generate less data (subset), or create more data than your original database.
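As a sketch, the output size is controlled by num_records_multiplier in the run section of the gretel_tabular action (the same parameter used in the full example later on this page):

      run:
        num_records_multiplier: 0.5   # subset: generate half the original row count
        # 1.0 replicates the size of the source database;
        # values above 1.0 (e.g. 2.0) generate a larger database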

Subset

Synthetic subsetting allows you to shrink your database proportionally with anonymized data that looks and feels like production data—all without risking privacy or sacrificing quality. Unlike other database subsetting tools that gamble with random sampling, Gretel Relational leverages our industry-leading generative AI models to accurately subset your data so you can innovate with confidence and speed.

Some examples of use cases for generating a synthetic subset of a database include:

  • Software Development and Testing - Speed up development and testing by working with a smaller, statistically accurate database;

  • Resource Constraints - Reduce costs and improve performance by generating smaller databases instead of storing and processing large databases that can be resource-intensive and expensive;

  • Minimize Risk - Subset the data accessible in lower environments to reduce risk in the event of a breach.

Generate a larger database

Some examples of use cases that may require larger databases include:

  • Load Testing - Generate large amounts of synthetic data to safely test the robustness of an application in development environments before you reach that same scale in production;

  • Simulate Real-World Scenarios - In pre-production environments, generating additional data that mimics real-world scenarios and edge cases allows for more comprehensive, robust testing;

  • Improve ML Models (e.g., Fraud Detection) - Generating a larger synthetic database that simulates fraudulent transactions and patterns can improve the accuracy of fraud detection systems by providing more data to learn from.

Relational Transforms

Gretel Relational Transforms leverages our Transform capabilities to detect and transform sensitive entities throughout your database. You can effortlessly extract sensitive columns and apply a range of transformations at scale, such as masking, hashing, tokenization, or even replacement. By transforming key values, Gretel Relational Transforms goes the extra mile to ensure your database is private and secure, while maintaining referential integrity and statistical accuracy.
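For illustration only, a standalone Transform model configuration along these lines could stand in for the transform/transform_v2 blueprint shorthand used in the workflow example later on this page. The step syntax, column names, and fake.* helper calls below are assumptions for the sketch; check them against the Transform documentation before use.

schema_version: "1.0"

models:
  - transform_v2:
      steps:
        - rows:
            update:
              # hypothetical column names; replace with columns from your schema
              - name: email
                value: fake.email()          # replace with realistic fake values
              - name: phone
                value: fake.phone_number()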

Transform and Synthesize

For maximum privacy assurances (think GDPR compliance), you can configure a Gretel Relational Workflow to first transform your database, then train synthetics models on the transformed tables. Configure two gretel_tabular actions, with the second using the output dataset of the first.

name: transform-and-synthesize

actions:
  - name: extract
    type: mysql_source
    connection_id: sample_mysql_telecom
    config:
      sync:
        mode: full

  - name: transform
    type: gretel_tabular
    input: extract
    config:
      train:
        dataset: "{outputs.extract.dataset}"
        model: transform/transform_v2

  - name: synthesize
    type: gretel_tabular
    input: transform
    config:
      train:
        dataset: "{outputs.transform.dataset}"
        model: synthetics/tabular-actgan
      run:
        num_records_multiplier: 1.0

Notebook Resources

Sample source connection

The example notebooks above use a special connection, sample_mysql_telecom, which connects to a demo telecommunications database.