Gretel Workflows

Automate and operationalize synthetic data using Gretel Workflows

Gretel Workflows provide an easy-to-use, config-driven API for automating and operationalizing Gretel. Using Connectors, you can connect Gretel Workflows to data sources such as S3 or MySQL and schedule recurring jobs, making it easy to securely share data across your organization.

A Gretel Workflow is composed of actions that connect to various services, including object stores and databases. These actions are chained together to create a pipeline for processing data with Gretel. In a typical workflow (sketched after this list):

  1. A source action is configured to extract data from a source, such as S3 or MySQL.

  2. The extracted source data is passed as input to Gretel Models. Using Workflows, you can chain together different types of models based on specific use cases or privacy needs.

  3. A destination action writes output data from the models to a sink.
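
In YAML terms, each action declares its upstream dependency with an input field; this is how the pipeline above is composed. The fragment below is an illustrative sketch only (the action names are hypothetical, and the per-action config blocks are omitted; complete examples follow in the Workflows as YAML section):

actions:
  - name: my-source         # 1. extract data from a connected source
    type: s3_source
  - name: my-model          # 2. train and run a Gretel model on the source data
    type: gretel_model
    input: my-source
  - name: my-destination    # 3. write the model outputs to a sink
    type: s3_destination
    input: my-model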

Creating Workflows in the Gretel Console

  1. Log into the Gretel Console.

  2. Navigate to the Workflows page using the menu item in the left sidebar and follow the instructions to create a new workflow.

  3. The wizard-based flow will guide you through model selection, data source and destination creation, and workflow configuration.

  4. Once created, runs for a particular workflow can be viewed on its Workflow page, and runs across all workflows and models can be viewed on the Activity page.

For more detailed step-by-step instructions, see Managing Workflows.
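
Workflows can also be created from a YAML config with the Gretel CLI. The command below is a minimal sketch, assuming your config is saved as workflow.yaml; the exact subcommand and flag names may vary by CLI version, so confirm them with gretel workflows --help.

# Sketch only: the --config and --project flags shown here are assumptions.
gretel workflows create --config workflow.yaml --project my-project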

Workflows as YAML

Workflows are configured using YAML. Below is an example workflow config that crawls an Amazon S3 bucket and creates an anonymized synthetic copy of the bucket contents in a destination bucket.

name: sample-s3-workflow

actions:
  - name: s3-read
    type: s3_source
    connection: c_1
    config:
      bucket: my-analytics-bucket
      glob_filter: "*.csv"
      path: metrics/

  - name: model-train-run
    type: gretel_model
    input: s3-read
    config:
      project_id: proj_1
      model: synthetics/tabular-actgan
      run_params:
        params: {}
      training_data: "{outputs.s3-read.dataset.files.data}"

  - name: s3-write
    type: s3_destination
    connection: c_1
    input: model-train-run
    config:
      bucket: my-synthetic-analytics-bucket
      input: "{outputs.model-train-run.dataset.files.data}"
      filename: "{outputs.s3-read.dataset.files.filename}"
      path: metrics/
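
Note the {outputs.<action-name>...} references in the config above: they let a downstream action consume the outputs of an earlier action by name, for example the files discovered by s3-read or the synthetic data produced by model-train-run. This templating is how data flows between actions in a workflow.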

This second example workflow config connects to a MySQL database, creates a synthetic version of the database, and writes it to an output MySQL database.

name: sample-mysql-workflow-full-db

actions:
  - name: mysql-read
    type: mysql_source
    connection: conn_1
    config:
      sync:
        mode: full

  - name: synthesize
    type: gretel_tabular
    input: mysql-read
    config:
      project_id: proj_1
      train:
        model: synthetics/tabular-actgan
        dataset: "{outputs.mysql-read.dataset}"
      run:
        num_records_multiplier: 1.0

  - name: mysql-write
    type: mysql_destination
    connection: conn_2
    input: synthesize
    config:
      sync:
        mode: replace
      dataset: "{outputs.synthesize.dataset}"
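
In this config, the source's sync mode of full extracts the entire database, while the destination's replace mode is assumed to overwrite any existing data in the output tables on each run; see the MySQL connector documentation for the full set of supported sync modes.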

Next Steps

Next, we'll dive deeper into the components that make up Workflows. For a list of supported sources and sinks, see Connectors.
