Gretel Tabular

Like gretel_model, the gretel_tabular action can be used to train and generate records from Gretel Models. gretel_tabular's primary value-add is maintaining referential integrity between related tables, so this action is recommended for workflows involving relational databases or data warehouses. gretel_tabular also allows specifying different model configs for different tables, and can even instruct Gretel to find optimal model configs for your data via Gretel Tuner.

Inputs

Outputs

Example Configs

Generate a synthetic database by applying a consistent synthetics model to all tables in the dataset. Note that the model config can be specified as a full object...

type: gretel_tabular
name: model-train-run
input: mysql-read
config:
  project_id: proj_1
  train:
    dataset: "{outputs.mysql-read.dataset}"
    model_config:
      schema_version: "1.0"
      name: "tabular-actgan"
      models:
        - synthetics:
            data_source: __tmp__
            params:
              epochs: auto
              vocab_size: auto
              learning_rate: auto
              batch_size: auto
              rnn_units: auto
            privacy_filters:
              outliers: auto
              similarity: auto
  run:
    num_records_multiplier: 1.0
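To make the num_records_multiplier semantics concrete, here is a small illustrative sketch (not Gretel's implementation): the number of synthetic records generated for each table is the source table's row count scaled by the multiplier, so 1.0 preserves the original table sizes.

```python
# Illustrative sketch of num_records_multiplier semantics.
# The function and table names are hypothetical examples.

def synthetic_record_count(source_rows: int, multiplier: float) -> int:
    """Number of synthetic records to generate for one table."""
    return round(source_rows * multiplier)

source_tables = {"users": 10_000, "orders": 45_000}

# With a multiplier of 1.0, each synthetic table matches its source size;
# 2.0 would double every table, 0.5 would halve it.
targets = {name: synthetic_record_count(rows, 1.0)
           for name, rows in source_tables.items()}
```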

...or a reference to a blueprint template can be provided via from:

type: gretel_tabular
name: model-train-run
input: mysql-read
config:
  project_id: proj_1
  train:
    dataset: "{outputs.mysql-read.dataset}"
    model_config:
      from: "synthetics/tabular-actgan"
  run:
    num_records_multiplier: 1.0

You can apply different model configs to different tables by supplying table-specific configs:

type: gretel_tabular
name: model-train-run
input: mysql-read
config:
  project_id: proj_1
  train:
    dataset: "{outputs.mysql-read.dataset}"
    model_config:
      from: "synthetics/tabular-actgan"
    table_specific_configs:
      - tables: ["users"]
        model_config:
          from: "synthetics/tabular-differential-privacy"
  run:
    num_records_multiplier: 1.0
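The resolution logic implied by table_specific_configs can be sketched as follows (a hedged illustration, not Gretel's code): a table listed in an override entry uses that entry's model config, and every other table falls back to the default model_config.

```python
# Hypothetical sketch of how table-specific configs override a default.

def resolve_model_config(table, default_config, table_specific_configs):
    """Return the model config to use for a given table."""
    for entry in table_specific_configs:
        if table in entry["tables"]:
            return entry["model_config"]
    return default_config

default = {"from": "synthetics/tabular-actgan"}
overrides = [
    {"tables": ["users"],
     "model_config": {"from": "synthetics/tabular-differential-privacy"}},
]

# "users" gets the differential-privacy config; other tables get the default.
users_cfg = resolve_model_config("users", default, overrides)
orders_cfg = resolve_model_config("orders", default, overrides)
```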

To pass a subset of tables through unaltered (e.g. static reference data such as lookup tables), list them under skip_tables:

type: gretel_tabular
name: model-train-run
input: mysql-read
config:
  project_id: proj_1
  train:
    dataset: "{outputs.mysql-read.dataset}"
    model_config:
      from: "synthetics/tabular-actgan"
    skip_tables:
      - table: countries
      - table: states
  run:
    num_records_multiplier: 1.0
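As a rough illustration of skip_tables semantics (names and function are hypothetical), skipped tables bypass model training and generation and are carried through as-is, while all remaining tables are synthesized:

```python
# Hypothetical sketch: partition tables into pass-through vs. synthesized.

def plan_tables(all_tables, skip_tables):
    """Map each table to the action the workflow would take on it."""
    skipped = {entry["table"] for entry in skip_tables}
    return {t: ("passthrough" if t in skipped else "synthesize")
            for t in all_tables}

plan = plan_tables(
    ["users", "orders", "countries", "states"],
    [{"table": "countries"}, {"table": "states"}],
)
```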

Autotune (Gretel Tuner)

Instead of providing a specific model config, you can instruct the gretel_tabular action to run trials and identify the best model config for each table. This is accomplished via the autotune option inside model_config fields (either at the root train level, to apply to all tables, or inside a table_specific_configs entry, to apply to only a subset of tables).

Autotune objects accept the following fields:

Example configs with Autotune

Using all autotune defaults:

name: synthesize
type: gretel_tabular
input: extract
config:
  project_id: proj_1
  train:
    dataset: "{outputs.extract.dataset}"
    model_config:
      autotune:
        enabled: true

By default, gretel_tabular uses the tuner/tabular-actgan blueprint Tuner config, but a different blueprint can be referenced...

name: synthesize
type: gretel_tabular
input: extract
config:
  project_id: proj_1
  train:
    dataset: "{outputs.extract.dataset}"
    model_config:
      autotune:
        enabled: true
        tuner_config:
          from: "synthetics/tabular-lstm"

...or a Tuner config can be spelled out explicitly:

name: synthesize
type: gretel_tabular
input: extract
config:
  project_id: proj_1
  train:
    dataset: "{outputs.extract.dataset}"
    model_config:
      autotune:
        enabled: true
        tuner_config:
          base_config: synthetics/tabular-actgan
          params:
            batch_size:
              fixed: 500
            epochs:
              choices: [100, 500]
            generator_lr:
              log_range: [0.00001, 0.001]
            discriminator_lr:
              log_range: [0.00001, 0.001]
            embedding_dim:
              choices: [64, 128, 256]
            generator_dim:
              choices:
                - [512, 512, 512, 512]
                - [1024, 1024]
                - [1024, 1024, 1024]
                - [2048, 2048]
                - [2048, 2048, 2048]
            discriminator_dim:
              choices:
                - [512, 512, 512, 512]
                - [1024, 1024]
                - [1024, 1024, 1024]
                - [2048, 2048]
                - [2048, 2048, 2048]
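The three param specs in the config above each express a different sampling strategy. The sketch below (hypothetical, not Gretel Tuner's implementation) shows one way a trial could draw values from them: "fixed" pins a single value, "choices" picks uniformly from a list, and "log_range" samples log-uniformly between two bounds, which suits learning rates that span orders of magnitude.

```python
# Hedged sketch of sampling a single trial from Tuner-style param specs.
import math
import random

def sample_param(spec, rng):
    """Draw one value from a fixed / choices / log_range param spec."""
    if "fixed" in spec:
        return spec["fixed"]
    if "choices" in spec:
        return rng.choice(spec["choices"])
    if "log_range" in spec:
        lo, hi = spec["log_range"]
        # Uniform in log space, then exponentiate back.
        return math.exp(rng.uniform(math.log(lo), math.log(hi)))
    raise ValueError(f"unknown param spec: {spec}")

rng = random.Random(0)
trial = {
    "batch_size": sample_param({"fixed": 500}, rng),
    "epochs": sample_param({"choices": [100, 500]}, rng),
    "generator_lr": sample_param({"log_range": [0.00001, 0.001]}, rng),
}
```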
