Google Cloud Storage
Connect to your Google Cloud Storage buckets.
Getting Started
Prerequisites for creating a Google Cloud Storage based workflow. You will need:
A connection to Google Cloud Storage.
A source bucket.
(optional) A destination bucket. This can be the same as your source bucket, or omitted entirely.
Configuring a Google Cloud Storage Connection
Google Cloud Storage related actions require creating a gcs connection. The connection must be configured with the correct permissions for each Gretel Action.
For specific permissions, please refer to the Minimum Permissions section under each corresponding action.
Gretel GCS connections require the following fields:
| This private key JSON blob is used to authenticate Gretel with GCS object storage resources. |
Create a Service Account
To generate a private key, first create a service account, then download the key for that service account.
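Before pasting a downloaded key into a connection, it can help to confirm the blob is well formed. The following is a minimal sketch, not part of Gretel's tooling: `check_service_account_key` is a hypothetical helper, and the field list is an assumption based on the standard layout of Google service account key files.

```python
import json

# Assumption: fields normally present in a Google service account key file.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(blob: str) -> list[str]:
    """Return a list of problems found in a service account key JSON blob."""
    try:
        key = json.loads(blob)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["expected a JSON object"]
    # Report every required field that is absent or empty.
    problems = [
        f"missing field: {name}" for name in REQUIRED_KEY_FIELDS if not key.get(name)
    ]
    # A key downloaded for a service account carries type "service_account".
    if key.get("type") not in (None, "service_account"):
        problems.append(f"unexpected type: {key['type']}")
    return problems
```

An empty list means the blob passed these basic checks. It does not prove that the key is still active or that the service account has the required bucket permissions.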
Configure Bucket IAM Permissions
After the service account has been created, you can attach bucket-specific permissions to it.
Please see each action's Minimum Permissions section for a list of permissions to attach to the service account.
GCS Source
| Type | gcs_source |
| Connection | gcs |
The gcs_source action can be used to read an object from a GCS bucket into Gretel Models.
This action works as an incremental crawler. Each time a workflow runs, the action crawls only the files that have landed in the bucket since the last crawl.
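The incremental behaviour can be pictured with a small sketch. This is an illustration only: how the action actually tracks what it has already seen is not described here, so the timestamp checkpoint, the `ObjectInfo` record, and `select_new_objects` are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ObjectInfo:
    """Hypothetical stand-in for one listed bucket object."""
    name: str
    updated: datetime


def select_new_objects(objects, last_crawl):
    """Return objects updated after last_crawl, plus the next checkpoint.

    last_crawl is None on the first run, in which case every object is new.
    """
    fresh = [o for o in objects if last_crawl is None or o.updated > last_crawl]
    # Advance the checkpoint to the newest object seen; keep it if nothing is new.
    checkpoint = max((o.updated for o in fresh), default=last_crawl)
    return fresh, checkpoint


if __name__ == "__main__":
    listing = [
        ObjectInfo("a.csv", datetime(2024, 1, 1, tzinfo=timezone.utc)),
        ObjectInfo("b.csv", datetime(2024, 1, 2, tzinfo=timezone.utc)),
    ]
    first, checkpoint = select_new_objects(listing, None)
    second, _ = select_new_objects(listing, checkpoint)
    print(len(first), len(second))
```

Under this model, running the workflow twice against an unchanged bucket yields no new objects on the second run.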
Inputs
| Bucket to crawl data from. Should only include the name, such as |
| A glob filter may be used to select only file names matching a specific pattern. Please see the Glob Filter Reference for more details. |
| Prefix to crawl objects from. If no |
| Default |
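To get a feel for how a prefix and a glob filter narrow a crawl, the sketch below uses Python's standard `fnmatch` module. It is an approximation under stated assumptions: the exact pattern semantics are defined by the Glob Filter Reference and may differ (for example, `fnmatch` lets `*` match across `/`), matching the part of the name after the prefix is a choice made for this example, and `filter_objects` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def filter_objects(names, glob_filter, prefix=""):
    """Keep names under prefix whose remainder matches glob_filter."""
    matched = []
    for name in names:
        if not name.startswith(prefix):
            continue  # outside the crawled prefix
        # Assumption: the pattern is applied to the name relative to the prefix.
        if fnmatchcase(name[len(prefix):], glob_filter):
            matched.append(name)
    return matched


if __name__ == "__main__":
    names = ["data/a.csv", "data/b.json", "logs/c.csv"]
    print(filter_objects(names, "*.csv", prefix="data/"))
```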
Outputs
| A dataset object containing file and table representations of the found objects. |
Minimum Permissions
The associated service account must have the following permissions for the configured bucket:
storage.objects.list
storage.objects.get
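A quick way to reason about the list above is to compare it with the permissions the service account actually holds. The sketch below only performs that comparison; `missing_permissions` is a hypothetical helper, and the granted list has to come from elsewhere (for instance, the `google-cloud-storage` client's `Bucket.test_iam_permissions` call, which is not used here so that the example stays self-contained).

```python
# Permissions listed above for the source bucket.
SOURCE_PERMISSIONS = ("storage.objects.list", "storage.objects.get")


def missing_permissions(required, granted):
    """Return the required permissions absent from granted, in order."""
    granted_set = set(granted)
    return [p for p in required if p not in granted_set]


if __name__ == "__main__":
    # Example input: an account that can read objects but not list them.
    print(missing_permissions(SOURCE_PERMISSIONS, ["storage.objects.get"]))
```

The same helper works for any other action by passing that action's permission list as `required`.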
GCS Destination
| Type | gcs_destination |
| Connection | gcs |
The gcs_destination action may be used to write gretel_model outputs to Google Cloud Storage buckets.
Inputs
| The bucket to write objects back to. Only include the name of the bucket, eg |
| Defines the path prefix to write the object into. |
| Name of the file to write data back to. This file name will be appended to the |
| Data to write to the file. This should be a reference to the output from a previous action. |
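The relationship between the path prefix and the file name can be sketched as a simple join. This is an illustration under assumptions, not the action's implementation: `destination_object_name` is a hypothetical helper, and the handling of stray slashes is a choice made for the example.

```python
import posixpath


def destination_object_name(path, filename):
    """Join a path prefix and a file name into one object name."""
    prefix = path.strip("/")
    name = filename.lstrip("/")
    # With an empty prefix the object is written at the bucket root.
    return posixpath.join(prefix, name) if prefix else name


if __name__ == "__main__":
    print(destination_object_name("out/2024/", "data.csv"))
```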
Outputs
None
Minimum Permissions
The associated service account must have the following permissions for the configured destination bucket:
storage.objects.create
storage.objects.delete (supports replacing an existing file in the bucket)
Examples
Create a synthetic copy of your Google Cloud Storage bucket. The following config will crawl a bucket, train and run a synthetic model, then write the outputs of the model back to a destination bucket while maintaining the same folder structure as the source bucket.
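What "maintaining the same folder structure" means can be shown with a short sketch. It is not the workflow config itself and makes no claim about how Gretel does this internally: `mirror_layout` is a hypothetical helper that splits each source object name into the folder part and the file name that would be reused on the destination side.

```python
import posixpath


def mirror_layout(source_names):
    """Map each source object name to a (path prefix, file name) pair."""
    return {name: posixpath.split(name) for name in source_names}


if __name__ == "__main__":
    for source, (prefix, filename) in mirror_layout(["a/b/c.csv", "d.csv"]).items():
        print(source, prefix, filename)
```

Writing each output under its original prefix with its original file name reproduces the source layout in the destination bucket.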