Working with Data

To use data extracted by a connector as training input to a Gretel model, we need to understand how data is passed between Workflow Actions. Each Workflow Action produces a set of outputs that can be referenced by downstream actions as inputs.

These inputs are configured in each action's config block as template expressions. The properties available on an input depend on the type of data it represents.

Data Types

File

The file data structure holds information about a data file, such as a CSV in object storage.

data

string, the data handle

filename

string, the name of the file without any path prefix (e.g. events.csv)

source_filename

string, the name of the file with any path prefix (e.g. sources/events.csv)
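
The relationship between filename and source_filename can be illustrated with a short Python sketch. It assumes, based only on the two examples above, that filename is the final path component of source_filename. The helper name `filename_from_source` is hypothetical and not part of any Gretel API.

```python
from pathlib import PurePosixPath


def filename_from_source(source_filename: str) -> str:
    """Hypothetical helper: drop any path prefix from a source file name.

    Assumes object-storage style keys that use "/" as the separator.
    """
    return PurePosixPath(source_filename).name


print(filename_from_source("sources/events.csv"))  # events.csv
```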

Table

The table data structure holds information about a table extracted from a relational database or data warehouse.

data

string, the data handle

name

string, the name of the table

Dataset

A dataset is an umbrella data structure containing collections of files and tables, as well as metadata like table relationships used internally by various actions. All actions output exactly one dataset.

files

file list

tables

table list
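
As a rough mental model, the three structures above could be written as Python dataclasses. This is an illustrative sketch, not Gretel's implementation: the class names, the handle values, and the sample table name are made up, and the internal metadata (such as table relationships) is left out because its shape is not described here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class File:
    data: str             # the data handle (placeholder values below)
    filename: str         # file name without any path prefix
    source_filename: str  # file name with any path prefix


@dataclass(frozen=True)
class Table:
    data: str  # the data handle
    name: str  # the name of the table


@dataclass
class Dataset:
    # A dataset groups files and tables; internal metadata is omitted.
    files: List[File] = field(default_factory=list)
    tables: List[Table] = field(default_factory=list)


dataset = Dataset(
    files=[File(data="handle-1", filename="events.csv",
                source_filename="sources/events.csv")],
    tables=[Table(data="handle-2", name="users")],
)

file_handles = [f.data for f in dataset.files]
table_names = [t.name for t in dataset.tables]
print(file_handles, table_names)
```

An action that reads only from object storage would presumably leave tables empty, and a database action would leave files empty, but either way the output is a single dataset.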

Referencing outputs via Template Expressions

All Gretel Workflow actions output a dataset object that can then be referenced from a template expression in subsequent actions. Some actions require an entire dataset as input, while others require finer-grained inputs like file names and data handles. Each action documents its required inputs.
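
To make the idea of referencing an upstream output concrete, here is a toy resolver in Python. It assumes an expression shape of `{outputs.<action name>.dataset.<property path>}`; that shape, the action name `extract`, and the rule of fanning out over lists are assumptions made for illustration. The Config Syntax docs remain the authority on the real syntax and evaluation rules.

```python
import re
from typing import Any, Dict, List

# Hypothetical outputs of an upstream action named "extract".
outputs: Dict[str, Any] = {
    "extract": {
        "dataset": {
            "files": [
                {"data": "handle-1", "filename": "events.csv",
                 "source_filename": "sources/events.csv"},
                {"data": "handle-2", "filename": "users.csv",
                 "source_filename": "sources/users.csv"},
            ],
            "tables": [],
        }
    }
}


def lookup(value: Any, keys: List[str]) -> Any:
    """Walk `keys` through nested dicts, fanning out over any list."""
    if not keys:
        return value
    if isinstance(value, list):
        return [lookup(item, keys) for item in value]
    return lookup(value[keys[0]], keys[1:])


def resolve(expression: str, action_outputs: Dict[str, Any]) -> Any:
    """Resolve a toy `{outputs.a.b.c}` expression against action outputs."""
    match = re.fullmatch(r"\{outputs\.(.+)\}", expression)
    if match is None:
        raise ValueError(f"not a template expression: {expression!r}")
    return lookup(action_outputs, match.group(1).split("."))


# Finer-grained inputs: file names and data handles.
print(resolve("{outputs.extract.dataset.files.filename}", outputs))
print(resolve("{outputs.extract.dataset.files.data}", outputs))
# An entire dataset as input.
whole_dataset = resolve("{outputs.extract.dataset}", outputs)
```

The last call mirrors an action that takes the whole dataset, while the first two mirror actions that need only file names or data handles.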

For more detail on template expression syntax, see the Config Syntax docs.
