Automate and operationalize synthetic data using Gretel Workflows
Gretel Workflows provide an easy-to-use, config-driven API for automating and operationalizing Gretel. Using Workflows, you can connect to data sources such as S3 or MySQL and schedule recurring jobs, making it easier to secure and share data across your organization.
A Gretel Workflow is constructed of actions that connect to various services including object stores and databases. These actions are then composed to create a pipeline for processing data with Gretel. In the example above:
  1. A source action is configured to extract data from data sources such as S3 or MySQL.
  2. The extracted source data is passed as inputs to Gretel Models. Using Workflows you can chain together different types of models based on specific use cases or privacy needs.
  3. Using the outputs of the model, a destination action can write to a corresponding destination data source such as S3 or MySQL.
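
The source, model, destination chaining described in the steps above can be pictured as a small dependency graph. The following is a minimal sketch, not Gretel's API: it models actions as plain Python dictionaries and orders them so that every action runs after the action it reads from. The `name`, `type`, and `input` fields and their values are borrowed from the sample config on this page; the `run_order` helper is hypothetical and exists only for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical in-memory model of a workflow. Each action has a name, a type,
# and optionally the name of the action whose output it consumes.
# The list is deliberately out of order to show that ordering is derived.
ACTIONS = [
    {"name": "s3-sync", "type": "s3_destination", "input": "model-train-run"},
    {"name": "s3-crawl", "type": "s3_source"},
    {"name": "model-train-run", "type": "gretel_model", "input": "s3-crawl"},
]


def run_order(actions):
    """Return action names ordered so that every action follows its input."""
    names = {action["name"] for action in actions}
    graph = {}
    for action in actions:
        upstream = action.get("input")
        if upstream is not None and upstream not in names:
            raise ValueError(
                f"{action['name']!r} reads from unknown action {upstream!r}"
            )
        # Map each action to the set of actions it depends on.
        graph[action["name"]] = {upstream} if upstream else set()
    # static_order() yields dependencies before the actions that need them.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(run_order(ACTIONS))  # ['s3-crawl', 'model-train-run', 's3-sync']
```

Deriving the order from the `input` references, rather than from list position, is one way to think about how a source feeds a model and a model feeds a destination. How Gretel actually schedules actions is not described on this page.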

Creating Workflows in the Gretel Console

  1. Log in to the Gretel Console.
  2. Navigate to the Workflows page using the menu item in the left sidebar and follow the instructions to create a new workflow.
  3. The wizard-based flow will guide you through model selection, data source and destination creation, and workflow configuration.
  4. Once completed, all runs for a particular workflow can be viewed on the Workflow page, and runs for all workflows and models can be viewed on the Activity page.
For more detailed step-by-step instructions, see Managing Workflows.

Workflows as YAML

Workflows are configured using YAML. Below is an example workflow config that crawls an S3 bucket and creates an anonymized synthetic copy of the bucket contents in a destination bucket.
name: sample-s3-workflow

actions:
  - name: s3-crawl
    type: s3_source
    connection: c_1
    config:
      bucket: my-analytics-bucket
      glob_filter: "*.csv"
      path: metrics/

  - name: model-train-run
    type: gretel_model
    input: s3-crawl
    config:
      project_id: proj_1
      model: synthetics/default
      params: {{}}
      training_data: "{{}}"

  - name: s3-sync
    type: s3_destination
    connection: c_1
    input: model-train-run
    config:
      bucket: my-synthetic-analytics-bucket
      input: "{{}}"
      filename: "{{s3-crawl.outputs.filename}}"
      path: metrics/
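
The config above passes values between actions with templated strings such as `{{s3-crawl.outputs.filename}}`. As a rough illustration of how such references tie actions together, the sketch below extracts the referenced action names from config values and reports any that are not defined in the workflow. This is not part of Gretel's tooling: the template grammar encoded in the regular expression is an assumption based only on the one reference shown above, and `find_references` and `undefined_references` are hypothetical helper names.

```python
import re

# Matches references of the form {{<action-name>.outputs.<field>}}, as written
# in the sample config. The real template grammar may differ (assumption).
REFERENCE = re.compile(r"\{\{\s*([\w-]+)\.outputs\.([\w-]+)\s*\}\}")


def find_references(value):
    """Return (action_name, output_field) pairs found in a config string."""
    return REFERENCE.findall(value)


def undefined_references(config_values, defined_actions):
    """List referenced action names that are not defined in the workflow."""
    missing = []
    for value in config_values:
        for action_name, _field in find_references(value):
            if action_name not in defined_actions and action_name not in missing:
                missing.append(action_name)
    return missing


if __name__ == "__main__":
    defined = {"s3-crawl", "model-train-run", "s3-sync"}
    values = ["{{s3-crawl.outputs.filename}}", "metrics/"]
    print(find_references(values[0]))             # [('s3-crawl', 'filename')]
    print(undefined_references(values, defined))  # []
```

A check like this can catch a misspelled action name before a workflow is submitted. It says nothing about which output fields a given action actually exposes; for that, refer to Config Syntax.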

Next Steps

  • For a list of supported data sources see Integrations.
  • Please refer to Concepts for a more detailed overview of the components that make up Workflows.
  • For more information about authoring Workflow configs please reference Config Syntax.