Working with Models

Automate creating, training and running Gretel Models.

Gretel-specific Workflow Actions are provided to connect data sources to Gretel Models and Projects.

Training and Serving Models

gretel_model

Supported data sources and types: non-relational tabular file(s).

gretel_tabular

Supported data sources and types: non-relational tabular file(s), relational databases, and tabular files containing nested JSON data.

gretel_model Action Type

The gretel_model action can be used to train and generate records from Gretel Models. It does not support Gretel Relational or nested JSON data.

Inputs

project_id

The project to create the model in.

model

A reference to a blueprint or config location. If a config location is used, it must be addressable by the workflow action. This field is mutually exclusive with model_config.

model_config

The model config specified inline as a dictionary. Any valid model config is accepted. This field is mutually exclusive with model.

run_params

Parameters used to run the model and generate records. If this field is omitted, the model will be trained but no records will be generated.

training_data

Data to use for training. This should be a reference to the output from a previous action.

Outputs

dataset

A dataset object containing the outputs from the models created by this action.

Example Configs

Train a model from a blueprint without generating any records:

type: gretel_model
name: train-model
input: s3-read
config:
    project_id: proj_1
    model: synthetics/default
    training_data: "{outputs.s3-read.dataset.files.data}"

Train and generate records from a model config:

type: gretel_model
name: train-run-model
input: s3-read
config:
  project_id: proj_1
  model_config:
    schema_version: "1.0"
    name: "tabular-lstm"
    models:
      - synthetics:
          data_source: __tmp__
          params:
            epochs: auto
            vocab_size: auto
            learning_rate: auto
            batch_size: auto
            rnn_units: auto
          privacy_filters:
            outliers: auto
            similarity: auto
  training_data: "{outputs.s3-read.dataset.files.data}"
  run_params:
    params:
      num_records_multiplier: 1

gretel_tabular Action Type

The gretel_tabular action can be used to train and generate records from Gretel Models. Unlike gretel_model, it also supports Gretel Relational and nested JSON data.

Inputs

project_id

The project to create the model in.

train

The training configuration. The train.* fields documented below are nested under this key.

train.dataset

Data to use for training, including relationships between tables (if applicable). This should be a reference to the output from a previous action.

train.model

A reference to a blueprint or config location. If a config location is used, it must be addressable by the workflow action. This field is mutually exclusive with train.model_config.

train.model_config

The model config specified inline as a dictionary. Any valid model config is accepted. This field is mutually exclusive with train.model.

run

The run configuration. The run.* fields documented below are nested under this key.

run.num_records_multiplier

(Synthetics models only.) Parameter for scaling output table size. If set to 1.0 (the default), the model will generate the same number of records for each table as the input dataset. If set to 2.0, the model will generate twice as many records as the input dataset.

Outputs

dataset

A dataset object containing the outputs from the models created by this action.

Example Configs

Generate a synthetic database from a blueprint:

type: gretel_tabular
name: model-train-run
input: mysql-read
config:
  project_id: proj_1
  train:
    dataset: "{outputs.mysql-read.dataset}"
    model: synthetics/tabular-actgan
  run:
    num_records_multiplier: 1.0

Generate a synthetic database from a model config:

type: gretel_tabular
name: model-train-run
input: mysql-read
config:
  project_id: proj_1
  train:
    dataset: "{outputs.mysql-read.dataset}"
    model_config:
      schema_version: "1.0"
      name: "tabular-lstm"
      models:
        - synthetics:
            data_source: __tmp__
            params:
              epochs: auto
              vocab_size: auto
              learning_rate: auto
              batch_size: auto
              rnn_units: auto
            privacy_filters:
              outliers: auto
              similarity: auto
  run:
    num_records_multiplier: 1.0

Writing Project Artifacts

The write_project_artifact action can be used to write an action output to a Gretel Project.

Inputs

project_id

The project to create the artifact in.

artifact_name

The name of the artifact.

data

A reference to a data handle, typically the output of a previous action.

Outputs

dataset

A dataset with exactly one item (the project artifact) represented as both a file and table.
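
Example Config

A minimal sketch of a write step that saves the output of an upstream action as a project artifact; the action name synthesize-data and the artifact name synthetic-data are illustrative:

type: write_project_artifact
name: write-synthetic-data
input: synthesize-data
config:
  project_id: proj_1
  artifact_name: synthetic-data
  data: "{outputs.synthesize-data.dataset.files.data}"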

Reading Project Artifacts

The read_project_artifact action can be used to read in existing Gretel Project Artifacts as inputs to other actions.

Inputs

project_id

The project id the artifact is located in.

artifact_id

The id of the artifact to read.

Outputs

dataset

A dataset with exactly one item (the project artifact) represented as both a file and table.

Examples

Train a Gretel Model from an existing project artifact

name: train-model-from-artifact

actions:
  - name: read-artifact
    type: read_project_artifact
    config:
      project_id: proj_1
      artifact_id: art_1

  - name: train-model
    type: gretel_model
    input: read-artifact
    config:
      project_id: proj_1
      model: synthetics/default
      training_data: "{outputs.read-artifact.dataset.files.data}"

Combine two models by using the output of one model as the input to another

name: combine-models

actions:
  - name: remove-pii
    type: gretel_model
    input: s3-crawl
    config:
      project_id: [project_id]
      model: transform/default
      run_params:
        params: {}
      training_data: "{outputs.s3-crawl.dataset.files.data}"

  - name: synthesize-data
    type: gretel_model
    input: remove-pii
    config:
      project_id: [project_id]
      model: synthetics/tabular-lstm
      run_params:
        params:
          num_records_multiplier: 1
      training_data: "{outputs.remove-pii.dataset.files.data}"
