Gretel Workflows

Automate and operationalize synthetic data using Gretel Workflows
Gretel Workflows provide an easy-to-use, config-driven API for automating and operationalizing Gretel. Using Connectors, you can connect Gretel Workflows to various data sources such as S3 or MySQL and schedule recurring jobs, making it easy to securely share data across your organization.
A Gretel Workflow is composed of actions that connect to various services, including object stores and databases. These actions are chained together to create a pipeline for processing data with Gretel. In a typical workflow:
  1. A source action is configured to extract data from a source, such as S3 or MySQL.
  2. The extracted source data is passed as input to Gretel Models. Using Workflows, you can chain together different types of models based on specific use cases or privacy needs.
  3. A destination action writes the output data from the models to a sink.
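The source → model → destination chain above forms a small dependency graph: each action names the action it reads from via `input`, and execution order follows that chain. A minimal sketch (not Gretel's engine; the action dicts are illustrative) of resolving that order:

```python
# Sketch: order workflow actions so each runs after the action named
# in its `input` field, mirroring source -> model -> destination.

def execution_order(actions):
    """Return action names sorted so every action runs after its input."""
    by_name = {a["name"]: a for a in actions}
    ordered = []

    def visit(name):
        if name in ordered:
            return
        parent = by_name[name].get("input")
        if parent:
            visit(parent)  # run the upstream action first
        ordered.append(name)

    for action in actions:
        visit(action["name"])
    return ordered

# Hypothetical action list, deliberately out of order:
actions = [
    {"name": "s3-write", "type": "s3_destination", "input": "model-train-run"},
    {"name": "s3-read", "type": "s3_source"},
    {"name": "model-train-run", "type": "gretel_model", "input": "s3-read"},
]

print(execution_order(actions))  # ['s3-read', 'model-train-run', 's3-write']
```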

Creating Workflows in the Gretel Console

  1. Log into the Gretel Console.
  2. Navigate to the Workflows page using the menu item in the left side bar and follow the instructions to create a new workflow.
  3. The wizard-based flow will guide you through model selection, data source and destination creation, and workflow configuration.
  4. Once completed, all workflow runs can be viewed for a particular workflow via the Workflow page, or for all workflows and models on the Activity page.
For more detailed step-by-step instructions, see Managing Workflows.

Workflows as YAML

Workflows are configured using YAML. Below is an example workflow config that crawls an Amazon S3 bucket and creates an anonymized synthetic copy of the bucket contents in a destination bucket.
name: sample-s3-workflow
actions:
  - name: s3-read
    type: s3_source
    connection: c_1
    config:
      bucket: my-analytics-bucket
      glob_filter: "*.csv"
      path: metrics/
  - name: model-train-run
    type: gretel_model
    input: s3-read
    config:
      project_id: proj_1
      model: synthetics/tabular-actgan
      run_params:
        params: {}
      training_data: "{outputs.s3-read.dataset.files.data}"
  - name: s3-write
    type: s3_destination
    connection: c_1
    input: model-train-run
    config:
      bucket: my-synthetic-analytics-bucket
      input: "{outputs.model-train-run.dataset.files.data}"
      filename: "{outputs.s3-read.dataset.files.filename}"
      path: metrics/
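References like "{outputs.s3-read.dataset.files.data}" let a downstream action consume an upstream action's outputs. A rough sketch of how such a reference could resolve against an outputs mapping (a guess at the semantics, not Gretel's implementation; the sample S3 path is made up):

```python
# Sketch: resolve a "{outputs.<action>.<field>...}" template reference
# by walking the dotted path through a nested outputs dictionary.

def resolve(template, outputs):
    """Resolve a "{outputs.<action>.<path...>}" reference to a value."""
    path = template.strip("{}").split(".")
    if path[0] != "outputs":
        raise ValueError(f"unsupported reference: {template}")
    value = {"outputs": outputs}
    for key in path:
        value = value[key]  # descend one level per dotted segment
    return value

# Hypothetical outputs produced by the s3-read action:
outputs = {
    "s3-read": {
        "dataset": {"files": {"data": "s3://my-analytics-bucket/metrics/q1.csv"}}
    }
}

print(resolve("{outputs.s3-read.dataset.files.data}", outputs))
```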
This second example workflow config connects to a MySQL database, creates a synthetic version of the database, and writes it to an output MySQL database.
name: sample-mysql-workflow-full-db
actions:
  - name: mysql-read
    type: mysql_source
    connection: conn_1
    config:
      sync:
        mode: full
  - name: synthesize
    type: gretel_tabular
    input: mysql-read
    config:
      project_id: proj_1
      train:
        model: synthetics/tabular-actgan
        dataset: "{outputs.mysql-read.dataset}"
      run:
        num_records_multiplier: 1.0
  - name: mysql-write
    type: mysql_destination
    connection: conn_2
    input: synthesize
    config:
      sync:
        mode: replace
      dataset: "{outputs.synthesize.dataset}"
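A useful property of configs shaped like the two examples above is that they can be sanity-checked before submission. A minimal validation sketch (assumed structure only; this is not a Gretel API): every `input` must name an action defined earlier in the list.

```python
# Sketch: check that each action's `input` refers to a previously
# defined action, so the pipeline forms a valid chain.

def validate_inputs(workflow):
    """Raise ValueError if an action reads from an undefined action."""
    seen = set()
    for action in workflow["actions"]:
        upstream = action.get("input")
        if upstream is not None and upstream not in seen:
            raise ValueError(
                f"action {action['name']!r} reads from undefined action {upstream!r}"
            )
        seen.add(action["name"])

# The MySQL example above, reduced to the fields the check needs:
workflow = {
    "name": "sample-mysql-workflow-full-db",
    "actions": [
        {"name": "mysql-read", "type": "mysql_source"},
        {"name": "synthesize", "type": "gretel_tabular", "input": "mysql-read"},
        {"name": "mysql-write", "type": "mysql_destination", "input": "synthesize"},
    ],
}

validate_inputs(workflow)  # passes: each input names an earlier action
```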

Next Steps

Next, we'll dive deeper into the components that make up Workflows. You may also want to review the list of supported sources and sinks in Connectors.