Config Syntax
Workflows are configured using YAML and can be managed from the Gretel Console, SDK, or CLI.
Config Structure
Workflows are configured using three top-level blocks: name, trigger, and actions.
Name
The name field sets the name of the workflow. This name is used as the canonical reference to the workflow. Workflow names do not need to be unique within a project, but they should be descriptive enough to identify the purpose of the workflow.
Trigger
Triggers may be used to schedule recurring workflows using standard cron syntax. To schedule a workflow to run once daily, a workflow trigger might look like this:
For more detailed documentation please refer to the Scheduled Workflows docs.
Actions
The actions block configures each step in the workflow.
Each action definition carries the same top-level configuration envelope with the following fields:
| Field | Description |
| --- | --- |
| name | An identifier for the action. Action names must be unique within the scope of a workflow. |
| type | The specific action type. |
| connection | Pass a Connection ID to authenticate the action. This field is required for actions that connect to external services such as S3 or BigQuery. |
| input | Specify a preceding action as input to the current action. |
| config | The type-specific config. |
See the Connectors section for type and config details for actions that work with sources and sinks. See Working with Models for type and config details for actions that interface with Gretel.
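The two structural rules stated for the action envelope (unique action names, and input pointing at a preceding action) can be sketched as a small check. This is an illustrative Python sketch, not Gretel's validation logic. It assumes the envelope fields are keyed name, type, connection, input, and config, and the type and connection values shown are hypothetical placeholders.

```python
def validate_actions(actions):
    """Return a list of problems found in a list of action dicts."""
    problems = []
    seen = set()
    for action in actions:
        name = action.get("name")
        # Action names must be unique within the workflow.
        if name in seen:
            problems.append(f"duplicate action name: {name}")
        # "input" must refer to an action defined earlier in the list.
        upstream = action.get("input")
        if upstream is not None and upstream not in seen:
            problems.append(f"{name}: input {upstream!r} is not a preceding action")
        seen.add(name)
    return problems


# Hypothetical workflow: the "type" and "connection" values are placeholders.
actions = [
    {"name": "s3-read", "type": "example_source", "connection": "c_1", "config": {}},
    {"name": "model-train-run", "type": "example_model", "input": "s3-read", "config": {}},
]
print(validate_actions(actions))
```

An empty list means neither rule was violated; each string in a non-empty result names the offending action.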
Template Expressions
Template expressions are used to dynamically configure actions based on the result of a preceding action. Template expressions are denoted by curly braces, i.e. {<template-expression>}.
Accessing Action Outputs
Action outputs are accessed via the following form: {outputs.<action-name>.<output>}
For example, a dataset output from a MySQL source action would be referenced like this:
You can append attribute components to the expression to dive into the output data structure. For example, to get the filename of each object from an Azure blob storage source action:
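To make the idea of appending attribute components concrete, here is a minimal Python sketch of how a dotted expression could be walked through a nested output structure. It is not Gretel's resolver. The action name azure-read, the shape of the outputs dictionary, and the rule that an attribute applied to a list is applied to every element are all assumptions made for illustration.

```python
def resolve(expression, outputs):
    """Resolve '{outputs.<action>.<attr>...}' against a nested dict."""
    path = expression.strip("{}").split(".")
    if path[0] != "outputs":
        raise ValueError(f"unsupported expression: {expression}")
    value = outputs
    for key in path[1:]:
        if isinstance(value, list):
            # Assumption: an attribute on a list is read from every element.
            value = [item[key] for item in value]
        else:
            value = value[key]
    return value


# Hypothetical outputs of an Azure blob storage source action named "azure-read".
outputs = {
    "azure-read": {
        "dataset": {
            "files": [
                {"filename": "a.csv", "data": "<data handle to a.csv>"},
                {"filename": "b.csv", "data": "<data handle to b.csv>"},
            ]
        }
    }
}
print(resolve("{outputs.azure-read.dataset.files.filename}", outputs))
# prints ['a.csv', 'b.csv']
```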
Enumerating Template Expression Values
Consider a workflow config that defines an s3-read action, a model-train-run action, and an s3-write action.
In this config, the s3-read action outputs a dataset object. In the next action, model-train-run, we use the template expression {outputs.s3-read.dataset.files.data} to define the training_data used for that action. When the workflow executes, the expression is resolved to a concrete set of values based on the outputs of s3-read.
If the s3-read action finds two files, a.csv and b.csv, we will enumerate two concrete instances of the model-train-run config with:
training_data: <data handle to a.csv>
training_data: <data handle to b.csv>
Each instance of the config is passed into the model-train-run action, resulting in two trained models: one for a.csv and another for b.csv.
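The enumeration described above can be pictured as copying the action config once per resolved value. The following is a minimal Python sketch of that idea, not the workflow runtime itself. The function name, the other_setting key, and the string data handles are placeholders introduced for illustration.

```python
def enumerate_configs(config, key, values):
    """Produce one concrete copy of config per resolved value of key."""
    return [{**config, key: value} for value in values]


# Template config for model-train-run; "other_setting" is a hypothetical key.
template = {
    "other_setting": "<unchanged>",
    "training_data": "{outputs.s3-read.dataset.files.data}",
}
# What the expression might resolve to if s3-read finds a.csv and b.csv.
resolved = ["<data handle to a.csv>", "<data handle to b.csv>"]

concrete = enumerate_configs(template, "training_data", resolved)
for instance in concrete:
    print(instance["training_data"])
```

Each element of concrete corresponds to one run of the action, so two resolved values yield two runs.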
Additionally, an action config can include multiple template expressions referring to different lists. For example, the s3-write action in this workflow is configured with two template expressions, one referencing the original source filename and the other referencing the synthesized data. The workflow runtime automatically aligns the resolved values of these expressions, so that again two concrete instances of the s3-write config are enumerated, with:
filename: "a.csv"
input: <data handle to the synthetic output from the model trained on a.csv>
filename: "b.csv"
input: <data handle to the synthetic output from the model trained on b.csv>
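One way to picture this alignment is to pair the resolved lists position by position, so the first filename goes with the first synthetic output, and so on. The Python sketch below assumes index-wise pairing of equal-length lists. The source only states that the runtime aligns the expressions, so the pairing rule, the function name, and the bucket key are assumptions for illustration.

```python
def enumerate_aligned(config, resolved):
    """Emit one concrete config per index across equal-length resolved lists."""
    lengths = {len(values) for values in resolved.values()}
    if len(lengths) > 1:
        raise ValueError("resolved lists must have the same length to align")
    keys = list(resolved)
    # zip(*...) walks all lists in lockstep, one row per concrete instance.
    return [{**config, **dict(zip(keys, row))} for row in zip(*resolved.values())]


# Hypothetical resolved values for the two expressions in the s3-write config.
resolved = {
    "filename": ["a.csv", "b.csv"],
    "input": ["<synthetic output for a.csv>", "<synthetic output for b.csv>"],
}
instances = enumerate_aligned({"bucket": "<bucket>"}, resolved)
for instance in instances:
    print(instance["filename"], instance["input"])
```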