Working with Data

To use data extracted by a connector as training input to a Gretel model, we need to understand how data is passed between Workflow Actions. Each Workflow Action produces a set of outputs that can be referenced by downstream actions as inputs.

These inputs are set in each action's config block using template expressions. An input can take one of several forms, depending on the type of data being worked with.

Data Types


File

The file data structure holds information about a data file, such as a CSV in object storage. It has three fields:

string, the data handle

string, the base name of the file, without any path prefix (e.g. events.csv)

string, the name of the file with any path prefix (e.g. sources/events.csv)


Table

The table data structure holds information about a table extracted from a relational database or data warehouse. It has two fields:

string, the data handle

string, the name of the table


Dataset

A dataset is an umbrella data structure containing collections of files and tables, as well as metadata like table relationships used internally by various actions. All actions output exactly one dataset. Its two collections are:

file list

table list
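
To make the relationship between the three structures concrete, here is a minimal Python sketch. It is an illustration only, not Gretel's implementation: the class names, every field name (data, filename, source_filename, name, files, tables), and the handle values are assumptions chosen to mirror the descriptions above, and the internal metadata such as table relationships is omitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class File:
    # All field names here are hypothetical; only their meanings come from the text.
    data: str             # the data handle
    filename: str         # base name without path prefix, e.g. "events.csv"
    source_filename: str  # name with any path prefix, e.g. "sources/events.csv"


@dataclass
class Table:
    data: str  # the data handle
    name: str  # the name of the table


@dataclass
class Dataset:
    # Umbrella structure: a collection of files and a collection of tables.
    files: List[File] = field(default_factory=list)
    tables: List[Table] = field(default_factory=list)


# Example dataset with one file and one table (handle values are made up).
dataset = Dataset(
    files=[File(data="handle-1", filename="events.csv",
                source_filename="sources/events.csv")],
    tables=[Table(data="handle-2", name="users")],
)
print([f.source_filename for f in dataset.files])
```

The point of the sketch is the shape: files and tables each carry a data handle plus naming information, and a dataset groups them so that a downstream action can take either the whole dataset or individual handles and names.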

Referencing outputs via Template Expressions

All Gretel Workflow actions output a dataset object that can then be referenced from a template expression in subsequent actions. Some actions require an entire dataset as input, while others require finer-grained inputs like file names and data handles. Each action documents its required inputs.
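
The difference between passing an entire dataset and passing finer-grained values can be sketched with a toy resolver. This is a hedged illustration under stated assumptions: the dotted-path notation, the resolve function, the action name read-source, and the dictionary keys are all invented for this example and are not Gretel's template expression syntax, which is covered in the Config Syntax docs.

```python
from typing import Any, Dict


def resolve(path: str, outputs: Dict[str, Any]) -> Any:
    """Walk a dotted path through nested dicts; lists are mapped element-wise."""
    value: Any = outputs
    for key in path.split("."):
        if isinstance(value, list):
            # Apply the key to every item, e.g. collect each file's handle.
            value = [item[key] for item in value]
        else:
            value = value[key]
    return value


# Hypothetical outputs of an upstream action named "read-source".
outputs = {
    "read-source": {
        "dataset": {
            "files": [
                {"data": "handle-1", "name": "sources/events.csv"},
                {"data": "handle-2", "name": "sources/users.csv"},
            ],
            "tables": [],
        }
    }
}

# A downstream action that needs the entire dataset.
whole_dataset = resolve("read-source.dataset", outputs)

# A downstream action that only needs the file data handles.
file_handles = resolve("read-source.dataset.files.data", outputs)
print(file_handles)
```

In this sketch, the same upstream output serves both kinds of consumer: one reference yields the full dataset object, while a longer path narrows it to a list of data handles.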

For more detail on template expression syntax, see the Config Syntax docs.
