Working with Data
To use data extracted by a connector as training input to a Gretel model, we need to understand how data is passed between Workflow Actions. Each Workflow Action produces a set of outputs that can be referenced by downstream actions as inputs.
These inputs are configured as template expressions in each action's config block. Their properties take different forms depending on the type of data being worked with.
Data Types
File
The file data structure holds information about a data file, such as a CSV in object storage.
data | string, the data handle |
filename | string, the stem of the file (e.g. |
source_filename | string, the name of the file with any path prefix (e.g. |
Table
The table data structure holds information about a table extracted from a relational database or data warehouse.
data | string, the data handle |
name | string, the name of the table |
Dataset
A dataset is an umbrella data structure containing collections of files and tables, as well as metadata like table relationships used internally by various actions. All actions output exactly one dataset.
files | a list of file objects |
tables | a list of table objects |
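To make the relationship between the three structures concrete, here is a minimal Python sketch that models them as dataclasses. It is an illustration only, not Gretel's implementation: the class names, the handle strings, and the example file and table names are all hypothetical, and the internal metadata (such as table relationships) is left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class File:
    data: str             # the data handle
    filename: str         # the stem of the file
    source_filename: str  # the file name with any path prefix


@dataclass
class Table:
    data: str  # the data handle
    name: str  # the name of the table


@dataclass
class Dataset:
    # Internal metadata (e.g. table relationships) is omitted from this sketch.
    files: List[File] = field(default_factory=list)
    tables: List[Table] = field(default_factory=list)


# Placeholder values, made up for illustration; real handles and names
# are produced by the action that emits the dataset.
dataset = Dataset(
    files=[File(data="handle-001", filename="events", source_filename="raw/events")],
    tables=[Table(data="handle-002", name="users")],
)

# A downstream consumer might need only the handles or only the names.
file_handles = [f.data for f in dataset.files]
table_names = [t.name for t in dataset.tables]
```

The point of the sketch is that a single dataset carries both collections, so a downstream action can take the whole object or pick out individual fields from its files and tables.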
Referencing outputs via Template Expressions
All Gretel Workflow actions output a dataset object that can then be referenced from a template expression in subsequent actions. Some actions require an entire dataset as input, while others require finer-grained inputs like file names and data handles. Each action documents its required inputs.
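The difference between referencing a whole dataset and referencing finer-grained fields can be sketched with a toy resolver in Python. Everything here is an assumption made for illustration: the dotted-path form, the action name extract, the placeholder values, and the rule that a field lookup on a list is applied to each element. It does not reproduce Gretel's actual template expression syntax or evaluation.

```python
from typing import Any, Dict


def resolve(path: str, context: Dict[str, Any]) -> Any:
    """Walk a dotted path through nested dicts.

    Assumption of this sketch: when the current value is a list,
    the next key is looked up on every element of that list.
    """
    value: Any = context
    for key in path.split("."):
        if isinstance(value, list):
            value = [item[key] for item in value]
        else:
            value = value[key]
    return value


# Hypothetical outputs of an upstream action named "extract".
context = {
    "outputs": {
        "extract": {
            "dataset": {
                "files": [
                    {
                        "data": "handle-001",
                        "filename": "events",
                        "source_filename": "raw/events",
                    }
                ],
                "tables": [],
            }
        }
    }
}

# An action that needs the entire dataset references the object itself.
whole_dataset = resolve("outputs.extract.dataset", context)

# An action that needs finer-grained inputs references individual fields.
file_handles = resolve("outputs.extract.dataset.files.data", context)
file_names = resolve("outputs.extract.dataset.files.filename", context)
```

In this sketch a shorter path yields the whole dataset, while a longer path narrows the reference to the handles or names that a particular action requires.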
For more detail on template expression syntax, see the Config Syntax docs.