Gretel Model

The gretel_model action trains Gretel models and generates records from them. It is a good choice for non-relational data where every dataset can share the same model config. For relational data, or when you need table-specific models, use the gretel_tabular action instead.

Inputs

project_id

The project to create the model in.

model

A reference to a blueprint or a config location. If a config location is used, it must be addressable by the workflow action. This field is mutually exclusive with model_config.

model_config

The model config, specified as a dictionary. Accepts any valid model config. This field is mutually exclusive with model.

run_params

Parameters for running the model to generate records. If this field is omitted, the model is trained but no records are generated.

training_data

Data to use for training. This should be a reference to the output from a previous action.
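
As a minimal sketch of how these inputs fit together, the annotated config below sets each field once. The action name, project ID, and upstream action name (s3-read) are placeholders borrowed from the examples later on this page, not required values.

```yaml
type: gretel_model
name: train-model              # placeholder action name
input: s3-read                 # placeholder upstream action
config:
  project_id: proj_1           # project to create the model in
  # Blueprint reference. To inline a config, use model_config instead.
  # Set one or the other, never both.
  model: synthetics/default
  # Reference to the output of the previous action.
  training_data: "{outputs.s3-read.dataset.files.data}"
  # Omit run_params to train the model without generating records.
  run_params:
    params:
      num_records_multiplier: 1
```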

Outputs

dataset

A dataset object containing the outputs from the models created by this action.
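
Judging from the examples on this page, a later action reads this dataset through a template of the form {outputs.<action-name>.dataset.files.data}. The sketch below assumes an upstream gretel_model action named train-run-model; the downstream action name and project ID are hypothetical placeholders.

```yaml
- name: downstream-model       # hypothetical action name
  type: gretel_model
  input: train-run-model       # the action whose dataset output is consumed
  config:
    project_id: proj_1         # placeholder project ID
    model: synthetics/default
    # Points at the dataset output of the upstream action.
    training_data: "{outputs.train-run-model.dataset.files.data}"
```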

Example Configs

Train a model from a blueprint without generating any records:

type: gretel_model
name: train-model
input: s3-read
config:
  project_id: proj_1
  model: synthetics/default
  training_data: "{outputs.s3-read.dataset.files.data}"

Train and generate records from a model config:

type: gretel_model
name: train-run-model
input: s3-read
config:
  project_id: proj_1
  model_config:
    schema_version: "1.0"
    name: "tabular-lstm"
    models:
      - synthetics:
          data_source: __tmp__
          params:
            epochs: auto
            vocab_size: auto
            learning_rate: auto
            batch_size: auto
            rnn_units: auto
          privacy_filters:
            outliers: auto
            similarity: auto
  training_data: "{outputs.s3-read.dataset.files.data}"
  run_params:
    params:
      num_records_multiplier: 1

Combine two models by using the outputs of one model as the inputs to another:

name: combine-models

actions:
  - name: remove-pii
    type: gretel_model
    input: s3-crawl
    config:
      project_id: [project_id]
      model: transform/default
      run_params:
        params: {}
      training_data: "{outputs.s3-crawl.dataset.files.data}"

  - name: synthesize-data
    type: gretel_model
    input: remove-pii
    config:
      project_id: [project_id]
      model: synthetics/tabular-lstm
      run_params:
        params:
          num_records_multiplier: 1
      training_data: "{outputs.remove-pii.dataset.files.data}"
