Config Syntax
Workflows are configured using YAML and can be managed from the Gretel Console, SDK, or CLI.
Config Structure
Workflows are configured using three top-level blocks: name, trigger, and actions.
Name
The name field sets the name of the workflow. This name is used as the canonical reference to the workflow. Workflow names do not need to be unique to a project, but should be descriptive enough to uniquely describe the purpose of the workflow.
Trigger
Triggers may be used to schedule recurring workflows using standard cron syntax. To schedule a workflow to run once daily, a workflow trigger might look like this:
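A minimal sketch of such a trigger, assuming the schedule is given under a cron block with a pattern key (both key names are assumptions here; the Scheduled Workflows docs are the authority on the exact schema):

```yaml
# Hypothetical trigger block; the "cron" and "pattern" key names are assumed.
trigger:
  cron:
    # Standard five-field cron syntax: minute 0, hour 0, every day.
    pattern: "0 0 * * *"
```

The pattern is quoted so that YAML treats it as a plain string rather than trying to interpret the asterisks.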
For more detailed documentation please refer to the Scheduled Workflows docs.
Actions
The actions block configures each step in the workflow.
Each action definition carries the same top-level configuration envelope with the following fields:
name
An identifier for the action. Action names must be unique within the scope of a workflow.
type
The specific action type, e.g. s3_source or gretel_model. (See below)
connection
input
Specify a preceding action as input to the current action.
config
The type-specific config.
See the Connectors section for type and config details for actions that work with sources and sinks. See Working with Models for type and config details for actions that interface with Gretel.
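To make the envelope concrete, here is a rough sketch of an actions block with two steps. The action names and the c_1 connection identifier are hypothetical, and config is left empty because its keys depend on the action type:

```yaml
actions:
  - name: s3-read          # identifier; must be unique within this workflow
    type: s3_source        # the action type
    connection: c_1        # hypothetical connection identifier
    config: {}             # type-specific keys go here (see Connectors)

  - name: model-train-run
    type: gretel_model
    input: s3-read         # use the preceding action as input
    config: {}             # type-specific keys go here (see Working with Models)
```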
Template Expressions
Template expressions are used to dynamically configure actions based on the result of a preceding action. Template expressions are denoted by curly braces, i.e. {<template-expression>}.
Accessing Action Outputs
Action outputs are accessed via the following form:
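Judging by the expression {outputs.s3-read.dataset.files.data} used later on this page, the general shape appears to be the outputs prefix, then the action name, then the output name. The some_field key below is a hypothetical placeholder:

```yaml
# Assumed general form: outputs, then the action's name, then the output's name.
some_field: "{outputs.<action-name>.<output-name>}"
```

The expression is quoted because an unquoted opening curly brace would start a YAML flow mapping.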
For example, a dataset output from a MySQL source action would be referenced like this:
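As an illustration only, assuming the MySQL source action is named mysql-read (a hypothetical name) and exposes an output called dataset:

```yaml
# "mysql-read" is a hypothetical action name; "dataset" is the output it produces.
training_data: "{outputs.mysql-read.dataset}"
```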
You can append attribute components to the expression to dive into the output data structure. For example, to get the filename of each object from an Azure blob storage source action:
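A sketch of such an expression, assuming the Azure source action is named azure-read (hypothetical) and that the dataset output exposes its objects under files, each with a filename attribute, mirroring the files.data path used elsewhere on this page:

```yaml
# "azure-read" is a hypothetical action name; the files.filename path is assumed.
filename: "{outputs.azure-read.dataset.files.filename}"
```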
Enumerating Template Expression Values
Consider the following workflow config:
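A sketch of a config consistent with the walkthrough in this section. The action names, the training_data, filename, and input keys, and the {outputs.s3-read.dataset.files.data} expression come from the text. Everything else is an illustrative assumption: the workflow name, the connection ID, the bucket and paths, the project ID, the model reference, the s3_destination type, the remaining config keys, and the exact form of the other two template expressions.

```yaml
name: sample-s3-workflow                # hypothetical workflow name

actions:
  - name: s3-read
    type: s3_source
    connection: c_1                     # hypothetical connection ID
    config:                             # key names below are assumed
      bucket: my-analytics-bucket
      path: metrics/
      glob_filter: "*.csv"

  - name: model-train-run
    type: gretel_model
    input: s3-read
    config:
      project_id: proj_1                # hypothetical project ID
      model: synthetics/default         # assumed model reference
      # Resolved once per file found by s3-read.
      training_data: "{outputs.s3-read.dataset.files.data}"

  - name: s3-write
    type: s3_destination                # assumed type name
    connection: c_1
    input: model-train-run
    config:
      bucket: my-analytics-bucket
      path: metrics/synthetic/
      # Original source filename, paired with the matching synthetic output.
      filename: "{outputs.s3-read.dataset.files.filename}"
      input: "{outputs.model-train-run.dataset.files.data}"
```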
In this config the s3-read action outputs a dataset object. In the next action, model-train-run, we use the template expression {outputs.s3-read.dataset.files.data} to define the training_data used for that action. When executing the workflow, the expression is resolved to a concrete set of values based on the outputs of s3-read.
If the s3-read action finds two files, a.csv and b.csv, we will enumerate two concrete instances of the model-train-run config with:
training_data: <data handle to a.csv>
training_data: <data handle to b.csv>
Each instance of the config will get passed into the model-train-run action, resulting in two trained models, one model for a.csv and another for b.csv.
Additionally, an action config can include multiple template expressions that refer to different lists. For example, the s3-write action above is configured with two template expressions, one referencing the original source filename and the other referencing the synthesized data. The workflow runtime automatically aligns these expressions so that, again, two concrete instances of the s3-write config are enumerated, with:
filename: "a.csv"
input: <data handle to the synthetic output from the model trained on a.csv>
filename: "b.csv"
input: <data handle to the synthetic output from the model trained on b.csv>