Google Cloud Storage
Connect to your Google Cloud Storage buckets.
Prerequisites for creating a Google Cloud Storage based workflow. You will need:
A connection to Google Cloud Storage.
A source bucket.
(optional) A destination bucket. This can be the same as your source bucket, or omitted entirely.
Google Cloud Storage related actions require creating a gcs connection. The connection must be configured with the correct permissions for each Gretel Action.
For specific permissions, please refer to the Minimum Permissions section under each corresponding action.
Gretel GCS connections require the following fields: private_key_json.
To generate a private key, first create a service account, then download a key for that service account.
After the service account has been created, you can attach bucket-specific permissions to it.
Please see each action's Minimum Permissions section for a list of permissions to attach to the service account.
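Before pasting a downloaded key into a connection, it can help to confirm that the JSON blob at least has the shape of a service account key. The sketch below is one way to do that, assuming the standard key file fields (type, project_id, private_key, client_email). The function name check_service_account_key is hypothetical and is not part of Gretel or Google tooling, and the check does not prove the key is valid or active.

```python
import json

# Fields found in a standard Google service account key file.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(blob: str) -> list[str]:
    """Return a list of problems found in a service account key JSON blob.

    An empty list means the blob has the expected shape. Nothing is sent
    to Google; this only catches obvious copy/paste mistakes.
    """
    try:
        key = json.loads(blob)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value must be an object"]

    # Report every required field that is absent or empty.
    problems = [
        f"missing field: {name}" for name in REQUIRED_KEY_FIELDS if not key.get(name)
    ]
    # A key for another credential type (for example a user credential)
    # will not work as a service account key.
    key_type = key.get("type")
    if key_type and key_type != "service_account":
        problems.append(f"unexpected type: {key_type!r}")
    return problems
```

An empty result only means the blob is plausibly a service account key; permissions still need to be attached as described for each action.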
The gcs_source action can be used to read an object from a GCS bucket into Gretel Models.
This action works as an incremental crawler. Each time a workflow is run the action will crawl new files that have landed in the bucket since the last crawl.
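The incremental behaviour can be pictured with the sketch below. It is an illustration under stated assumptions only: objects are modelled as (name, updated) pairs, and the crawler remembers the timestamp of the newest object it has already seen. How Gretel actually tracks crawl state is not described on this page, and ObjectInfo and crawl_new_objects are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ObjectInfo:
    """Hypothetical stand-in for one entry in a bucket listing."""
    name: str
    updated: datetime


def crawl_new_objects(listing, last_crawl: Optional[datetime]):
    """Return objects that landed after `last_crawl`, plus the new checkpoint.

    With no previous crawl (`last_crawl` is None) every object is returned.
    """
    new = [o for o in listing if last_crawl is None or o.updated > last_crawl]
    # Advance the checkpoint to the newest object seen, or keep the old one.
    checkpoint = max((o.updated for o in new), default=last_crawl)
    return new, checkpoint


# First run: nothing has been crawled yet, so both objects are picked up.
listing = [
    ObjectInfo("a.csv", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    ObjectInfo("b.csv", datetime(2024, 1, 2, tzinfo=timezone.utc)),
]
first_run, checkpoint = crawl_new_objects(listing, None)

# Second run: only the object that arrived after the checkpoint is returned.
listing.append(ObjectInfo("c.csv", datetime(2024, 1, 3, tzinfo=timezone.utc)))
second_run, checkpoint = crawl_new_objects(listing, checkpoint)
```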
The associated service account must have the following permissions for the configured bucket:
storage.objects.list
storage.objects.get
The gcs_destination action may be used to write gretel_model outputs to Google Cloud Storage buckets.
Outputs: None
The associated service account must have the following permissions for the configured destination bucket:
storage.objects.create
storage.objects.delete (supports replacing an existing file in the bucket)
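The minimum permissions for both actions can be collected in one place and compared against what a service account actually holds. The sketch below assumes you already have the set of granted permissions. One possible way to obtain it is the google-cloud-storage client's Bucket.test_iam_permissions call, which returns the subset of the requested permissions the caller has. The names REQUIRED_PERMISSIONS and missing_permissions are hypothetical.

```python
# Minimum permissions listed on this page for each action.
REQUIRED_PERMISSIONS = {
    "gcs_source": {"storage.objects.list", "storage.objects.get"},
    "gcs_destination": {"storage.objects.create", "storage.objects.delete"},
}


def missing_permissions(action: str, granted) -> list[str]:
    """Return the permissions `action` needs that are not in `granted`."""
    return sorted(REQUIRED_PERMISSIONS[action] - set(granted))
```

For example, a service account that can only read objects would be reported as missing storage.objects.list for gcs_source.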
Create a synthetic copy of your Google Cloud Storage bucket. The following config will crawl a bucket, train and run a synthetic model, then write the outputs of the model back to a destination bucket while maintaining the same folder structure as the source bucket.
For details on how the action works more generally, please see .
Type: gcs_source
Connection: gcs
bucket: Bucket to crawl data from. Should only include the name, such as my-gretel-source-bucket.
glob_filter: A glob filter may be used to match file names matching a specific pattern. Please see the Glob Filter Reference for more details.
path: Prefix to crawl objects from. If no path is provided, the root of the bucket is used.
recursive: Default false. If set to true, the action will recursively crawl objects starting from path.
dataset: A dataset object containing file and table representations of the found objects.
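To show how path, recursive and glob_filter could combine, the sketch below selects object names from a plain list. It is not Gretel's implementation. It assumes the glob is applied to the file name only and uses Python's fnmatch rules, which may differ from the semantics in the Glob Filter Reference. The function name select_objects is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Optional


def select_objects(names, path: str = "", recursive: bool = False,
                   glob_filter: Optional[str] = None) -> list[str]:
    """Pick object names according to the inputs described above."""
    # Treat the path as a folder-like prefix; an empty path means the bucket root.
    prefix = path if not path or path.endswith("/") else path + "/"
    selected = []
    for name in names:
        if not name.startswith(prefix):
            continue
        relative = name[len(prefix):]
        if not relative:
            continue  # the prefix placeholder itself, not a file
        # Without recursion, skip anything nested below the prefix.
        if not recursive and "/" in relative:
            continue
        filename = relative.rsplit("/", 1)[-1]
        # Assumption: the glob is matched against the file name only.
        if glob_filter is not None and not fnmatchcase(filename, glob_filter):
            continue
        selected.append(name)
    return selected
```

With path set to data, recursive left at false and no glob, only objects directly under data/ are returned; setting recursive to true also includes objects in nested folders such as data/2024/.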
Type: gcs_destination
Connection: gcs
bucket: The bucket to write objects back to. Only include the name of the bucket, e.g. my-gretel-bucket.
path: Defines the path prefix to write the object into.
filename: Name of the file to write data back to. This file name will be appended to the path if one is configured.
input: Data to write to the file. This should be a reference to the output from a previous action.
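The way path and filename combine into a single object name can be sketched as follows. This is a minimal illustration of the "appended to the path if one is configured" rule, assuming "/" as the separator. The function name destination_object_name is hypothetical, and Gretel's handling of edge cases such as duplicate slashes may differ.

```python
def destination_object_name(filename: str, path: str = "") -> str:
    """Join an optional path prefix and a filename into one object name."""
    prefix = path.rstrip("/")
    if not prefix:
        # No path configured: the object is written at the bucket root.
        return filename
    return prefix + "/" + filename.lstrip("/")
```

A filename that itself contains folders, such as 2024/out.csv, keeps that structure beneath the prefix.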
private_key_json: This private key JSON blob is used to authenticate Gretel with GCS object storage resources.