Object Storage
Connect Gretel to object storage services.
Gretel Workflows support connecting to the following object storage services
Reading Objects
Object storage source actions will incrementally crawl buckets searching for files that have changed between runs. Crawled files can then be configured as inputs to Gretel Models.
Each crawled object is passed as an input to the configured Gretel Model. Some models require a minimum number of records, while others may not work with datasets that are too large.
For best results, ensure each object contains an appropriate number of records to successfully train and run the downstream model.
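The idea of crawling only what changed between runs can be illustrated with a minimal sketch. It assumes each run records a fingerprint (for example an ETag or last-modified value) per object key. The function name `changed_keys` and the sample data are hypothetical, and the source does not describe how Gretel actually detects changes.

```python
from typing import Dict, List


def changed_keys(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return keys that are new, or whose fingerprint differs from the previous run.

    Both arguments map object key -> fingerprint (an assumed stand-in for
    something like an ETag; not Gretel's documented mechanism).
    """
    return sorted(
        key
        for key, fingerprint in current.items()
        if previous.get(key) != fingerprint
    )


# Hypothetical snapshots from two consecutive crawls.
previous_run = {"data/a.csv": "etag-1", "data/b.csv": "etag-2"}
current_run = {"data/a.csv": "etag-1", "data/b.csv": "etag-3", "data/c.csv": "etag-4"}

print(changed_keys(previous_run, current_run))  # ['data/b.csv', 'data/c.csv']
```

Only `data/b.csv` (modified) and `data/c.csv` (new) would be passed downstream in this sketch. The unchanged `data/a.csv` is skipped.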
Glob Filter and Path Configurations
A glob filter can be configured so that only files matching a specific pattern are used as sources. Files not matching the pattern will be excluded from the crawl.
A glob filter is evaluated against the filename or key of the object.
The character * matches any number of characters, excluding slashes.
Passing ** recursively matches any number of nested directories.
Checks are case-sensitive.
Examples

| Glob filter | Object key | Match? |
| --- | --- | --- |
| *.txt | data.txt | Yes, any txt file in the current path will be matched. |
| *.png | data.json | No, json files do not contain a png ending. |
| my/path/*.txt | my/path/data.txt | Yes, any txt files under my/path are matched. |
| **/*.csv | my/path/data.csv | Yes, any csv file is recursively matched. |
| ** | data.csv | Yes, all files are recursively matched. |
| */** | data.csv | No, any files in the root directory are excluded. |
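The matching rules above can be approximated with a small glob-to-regex translation. This is a sketch under stated assumptions, not Gretel's implementation: the helper names `glob_to_regex` and `matches` are hypothetical, only `*` and `**` are handled, and treating a leading `**/` as matching zero directories is an assumption the source does not confirm.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob using only `*` and `**` into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            # Any number of nested directories (assumed to include zero).
            parts.append("(?:.*/)?")
            i += 3
        elif pattern.startswith("**", i):
            # Anything at all, including slashes.
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            # Any run of characters within a single path segment.
            parts.append("[^/]*")
            i += 1
        else:
            # Literal character; matching stays case-sensitive.
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def matches(pattern: str, key: str) -> bool:
    """Return True if the whole object key matches the glob pattern."""
    return glob_to_regex(pattern).fullmatch(key) is not None


# The example pairs listed above.
for pattern, key in [
    ("*.txt", "data.txt"),
    ("*.png", "data.json"),
    ("my/path/*.txt", "my/path/data.txt"),
    ("**/*.csv", "my/path/data.csv"),
    ("**", "data.csv"),
    ("*/**", "data.csv"),
]:
    print(f"{pattern!r} vs {key!r}: {matches(pattern, key)}")
```

Under these assumptions the sketch reproduces the Yes/No outcomes listed in the examples. Because `*` never crosses a slash, `*.txt` does not match `my/path/data.txt`.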
In addition to a glob filter, a source action can be configured to crawl in a specific path. Configuring a path will narrow the set of objects that the bucket crawler will list or search.
It's recommended to configure a narrow bucket path when possible. This reduces the number of objects the crawler must list and speeds up each crawl.
Writing Objects
Object storage destination actions can be configured to write the synthetic data outputs of a Gretel Model back to object storage.
Each object storage destination action can be configured to mirror the directory structure of the source bucket or can be configured to create new directory layouts.
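The difference between mirroring the source layout and writing a new layout can be sketched as a key-mapping function. The function `destination_key`, its parameters, and the `synthetic` prefix are purely illustrative assumptions. They do not correspond to actual Gretel configuration options.

```python
import posixpath


def destination_key(source_key: str, output_prefix: str = "", mirror: bool = True) -> str:
    """Map a source object key to a destination key (hypothetical helper).

    mirror=True keeps the source directory structure under output_prefix;
    mirror=False flattens the object into output_prefix (one possible new layout).
    """
    relative = source_key if mirror else posixpath.basename(source_key)
    return posixpath.join(output_prefix, relative) if output_prefix else relative


print(destination_key("my/path/data.csv", "synthetic"))                # synthetic/my/path/data.csv
print(destination_key("my/path/data.csv", "synthetic", mirror=False))  # synthetic/data.csv
```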
Limitations
For a list of supported file types, please refer to Inputs and Outputs.