Azure Blob
Connect to your Azure Blob containers.
Getting Started
Prerequisites to create an Azure Blob-based workflow. You will need:
A connection to Azure Blob.
A source container.
A destination container. This can be the same as your source container.
Configuring an Azure Blob Connection
Azure Blob-related actions require creating an azure connection. The connection must be configured with the correct permissions for each Gretel Action.
For specific permissions, please refer to the Minimum Permissions section under each corresponding action.
There are three ways to authenticate a Gretel Azure Blob connection, and each method requires different fields for connection creation:
Account Access Key
Connection Creation Parameters
name
Display name of your choosing used to identify your connection within Gretel.
account_name
Name of the Storage Account.
access_key
Access key for the Storage Account.
default_container
Default container to crawl data from. Different containers can be chosen in the azure_source and azure_destination actions.
First, create a file on your local computer containing the connection credentials. This file should also include type, name, config, and credentials. connection_target_type is optional; if omitted, the connection can be used for both source and destination actions. The config and credentials fields should contain fields that are specific to the connection being created.
Below is an example Azure Blob connection using access key credentials:
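A minimal sketch of that connection file, built as a Python dict and printed as JSON. It assumes that the non-secret settings (account_name, default_container) belong under config and the access key under credentials, and that "source" and "destination" are the accepted connection_target_type values; neither split nor values are confirmed by this page, and JSON is used only because it ships with the standard library. The function name build_access_key_connection is hypothetical.

```python
import json
from typing import Any, Dict, Optional


def build_access_key_connection(
    name: str,
    account_name: str,
    access_key: str,
    default_container: Optional[str] = None,
    connection_target_type: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble an access-key Azure Blob connection file."""
    # Assumption: non-secret settings live under "config".
    config: Dict[str, Any] = {"account_name": account_name}
    if default_container:
        config["default_container"] = default_container
    connection: Dict[str, Any] = {
        "type": "azure",
        "name": name,
        "config": config,
        # Assumption: the secret lives under "credentials".
        "credentials": {"access_key": access_key},
    }
    # connection_target_type is optional; leaving it out lets the connection
    # serve both source and destination actions. Accepted values are assumed.
    if connection_target_type is not None:
        connection["connection_target_type"] = connection_target_type
    return connection


if __name__ == "__main__":
    conn = build_access_key_connection(
        "my-azure-connection", "mystorageaccount", "<ACCESS_KEY>",
        default_container="raw-data",
    )
    print(json.dumps(conn, indent=2))
```

Save the printed output to a local file before running the CLI step; avoid committing a file that contains a real access key.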
Now that you've created the credentials file, use the CLI to create the connection.
Entra ID
name
Display name of your choosing used to identify your connection within Gretel.
account_name
Name of the Storage Account.
client_id
Application (client) ID.
tenant_id
Directory (tenant) ID.
username
Email of the Service Account.
entra_password
Password of the Service Account.
default_container
Default container to crawl data from. Different containers can be chosen in the azure_source and azure_destination actions.
First, create a file on your local computer containing the connection credentials. This file should also include type, name, config, and credentials. connection_target_type is optional; if omitted, the connection can be used for both source and destination actions. The config and credentials fields should contain fields that are specific to the connection being created.
Below is an example Azure Blob connection using Entra ID credentials:
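Because the Entra ID method includes a service account password, one option is to keep entra_password out of source files and read it from the environment when the connection file is generated. This is a minimal sketch under two assumptions: that the non-secret fields belong under config and entra_password under credentials, and that the password is stored in a variable named AZURE_ENTRA_PASSWORD (a hypothetical name). The function build_entra_connection is also hypothetical.

```python
import os
from typing import Any, Dict, Optional


def build_entra_connection(
    name: str,
    account_name: str,
    client_id: str,
    tenant_id: str,
    username: str,
    default_container: Optional[str] = None,
    password_env: str = "AZURE_ENTRA_PASSWORD",  # hypothetical variable name
) -> Dict[str, Any]:
    """Hypothetical helper: assemble an Entra ID Azure Blob connection file."""
    password = os.environ.get(password_env)
    if not password:
        raise RuntimeError(f"environment variable {password_env} is not set")
    # Assumption: identifiers go under "config", the password under "credentials".
    config: Dict[str, Any] = {
        "account_name": account_name,
        "client_id": client_id,  # Application (client) ID
        "tenant_id": tenant_id,  # Directory (tenant) ID
        "username": username,    # service account email
    }
    if default_container:
        config["default_container"] = default_container
    return {
        "type": "azure",
        "name": name,
        "config": config,
        "credentials": {"entra_password": password},
    }
```

Usage: export AZURE_ENTRA_PASSWORD in the shell, call build_entra_connection with your Storage Account and app registration values, and serialize the result to the credentials file.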
Now that you've created the credentials file, use the CLI to create the connection.
SAS Token
name
Display name of your choosing used to identify your connection within Gretel.
account_name
Name of the Storage Account.
sas_token
SAS token for the Storage Account or container.
default_container
Default container to crawl data from. Different containers can be chosen in the azure_source and azure_destination actions.
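SAS tokens carry their expiry time in the se (signed expiry) query parameter, so a connection built on an expired token will fail. The sketch below checks expiry before the token goes into a connection file. It relies only on the standard SAS query-string format; the helper names sas_expiry and is_expired are hypothetical and not part of Gretel.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import parse_qs


def sas_expiry(sas_token: str) -> datetime:
    """Return the timezone-aware expiry time encoded in a SAS token's 'se' field."""
    params = parse_qs(sas_token.lstrip("?"))  # parse_qs also percent-decodes values
    if "se" not in params:
        raise ValueError("SAS token has no 'se' (signed expiry) parameter")
    # fromisoformat() only accepts a trailing 'Z' from Python 3.11 on, so normalize it.
    expiry = datetime.fromisoformat(params["se"][0].replace("Z", "+00:00"))
    if expiry.tzinfo is None:
        # Date-only or naive values: SAS times are expressed in UTC.
        expiry = expiry.replace(tzinfo=timezone.utc)
    return expiry


def is_expired(sas_token: str, now: Optional[datetime] = None) -> bool:
    """True if the token's expiry is at or before `now` (defaults to current UTC time)."""
    now = now or datetime.now(timezone.utc)
    return sas_expiry(sas_token) <= now
```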
First, create a file on your local computer containing the connection credentials. This file should also include type, name, config, and credentials. connection_target_type is optional; if omitted, the connection can be used for both source and destination actions. The config and credentials fields should contain fields that are specific to the connection being created.
Below is an example Azure Blob connection file using SAS token credentials:
Now that you've created the credentials file, use the CLI to create the connection.
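The three authentication methods differ only in which fields they need. As a pre-flight check before creating a connection, the parameter lists above can be turned into a lookup that identifies which method a set of fields satisfies. This is a sketch: the method keys are hypothetical labels, and treating default_container as optional is an assumption.

```python
from typing import Dict, FrozenSet, Mapping

# Field names come from the parameter lists above; the method labels are hypothetical.
REQUIRED_FIELDS: Dict[str, FrozenSet[str]] = {
    "access_key": frozenset({"name", "account_name", "access_key"}),
    "entra_id": frozenset(
        {"name", "account_name", "client_id", "tenant_id", "username", "entra_password"}
    ),
    "sas_token": frozenset({"name", "account_name", "sas_token"}),
}


def detect_auth_method(fields: Mapping[str, str]) -> str:
    """Return the single auth method whose required fields are all present and non-empty."""
    present = {key for key, value in fields.items() if value}
    matches = [method for method, required in REQUIRED_FIELDS.items() if required <= present]
    if len(matches) != 1:
        # Either no method is complete, or the fields mix more than one method.
        raise ValueError(f"expected exactly one complete auth method, found: {matches or 'none'}")
    return matches[0]
```

Rejecting ambiguous input (for example, both access_key and sas_token) keeps it obvious which credential a connection will actually use.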
Azure Blob Source
Type
azure_source
Connection
azure
The azure_source action can be used to read objects from an Azure Blob container into Gretel Models.
This action works as an incremental crawler. Each time a workflow is run, the action will crawl new files that have landed in the container since the last crawl.
For details on how the action works more generally, please see https://github.com/Gretellabs/docs/blob/main/workflows-and-connectors/connectors/object-storage/broken-reference/README.md.
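One way to picture incremental crawling is a last-modified watermark: each run returns only objects modified after the previous run's high-water mark and then advances the mark. This is an illustrative sketch, not Gretel's implementation; the page does not say how crawl state is tracked, and BlobInfo and crawl_new are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class BlobInfo:
    """Hypothetical minimal view of a listed blob."""
    name: str
    last_modified: datetime


def crawl_new(
    blobs: Iterable[BlobInfo], last_crawl: Optional[datetime]
) -> Tuple[List[BlobInfo], Optional[datetime]]:
    """Return blobs modified after `last_crawl`, plus the watermark for the next run."""
    if last_crawl is None:
        new = list(blobs)  # first run: everything is new
    else:
        new = [b for b in blobs if b.last_modified > last_crawl]
    # Advance to the newest timestamp seen; keep the old one if nothing changed.
    watermark = max((b.last_modified for b in new), default=last_crawl)
    return sorted(new, key=lambda b: b.name), watermark
```

A side effect of the watermark design is that an overwritten blob gets a new last-modified time and is picked up again, which may or may not match the real action's behavior.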
Inputs
container
Container to crawl data from. If empty, defaults to default_container.
glob_filter
A glob filter may be used to select only files whose names match a specific pattern. Please see the Glob Filter Reference for more details.
path
Prefix to crawl objects from. If no path is provided, the root of the container is used.
recursive
Default false. If set to true, the action will recursively crawl objects starting from path.
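Blob Storage has a flat namespace in which "folders" are just name prefixes separated by "/". Under that model, path, recursive, and glob_filter can be sketched as filters over blob names. Two assumptions here should be checked against the Glob Filter Reference: that the glob is matched against the file name only (the last path segment), and that matching is case-sensitive as with Python's fnmatchcase. The function select_objects is hypothetical.

```python
import posixpath
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional


def select_objects(
    names: Iterable[str],
    path: str = "",
    recursive: bool = False,
    glob_filter: Optional[str] = None,
) -> List[str]:
    """Hypothetical filter mirroring the path / recursive / glob_filter inputs."""
    prefix = path.strip("/")
    prefix = f"{prefix}/" if prefix else ""  # empty path means the container root
    selected = []
    for name in names:
        if not name.startswith(prefix):
            continue
        relative = name[len(prefix):]
        if not relative:
            continue  # skip a zero-length "folder" marker equal to the prefix itself
        if not recursive and "/" in relative:
            continue  # non-recursive: only direct children of path
        # Assumption: the glob applies to the file name, not the full blob name.
        if glob_filter and not fnmatchcase(posixpath.basename(name), glob_filter):
            continue
        selected.append(name)
    return selected
```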
Outputs
dataset
A dataset object containing file and table representations of the found objects.
Minimum Permissions
The associated service account must have the following permissions for the configured container:
Storage Blob Data Reader role permissions, or higher
The SAS Token must have the following permissions for the configured container or storage account:
List
Read
If the SAS token is issued for the storage account (an account SAS), it must also allow the Container and Object resource types.
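A SAS token lists its granted permissions in the sp query parameter (for example r for Read, l for List) and, for an account SAS only, its allowed resource types in srt (c for Container, o for Object). The sketch below reports what a token is missing for the source action. It reads only those standard fields; the helper name missing_sas_requirements is hypothetical.

```python
from typing import List, Set
from urllib.parse import parse_qs

SOURCE_PERMISSIONS = {"r", "l"}  # Read, List


def missing_sas_requirements(sas_token: str, required_permissions: Set[str]) -> List[str]:
    """List the permissions / resource types a SAS token lacks (empty list means OK)."""
    params = parse_qs(sas_token.lstrip("?"))
    granted = set(params.get("sp", [""])[0])  # each letter is one permission
    problems = [f"permission '{p}'" for p in sorted(required_permissions - granted)]
    # 'srt' is only present on account-level SAS tokens; container SAS tokens use 'sr'.
    if "srt" in params:
        resource_types = set(params["srt"][0])
        problems += [f"resource type '{r}'" for r in sorted({"c", "o"} - resource_types)]
    return problems
```

For the azure_destination action, the same check would use the set {"c", "l", "w"} (Create, List, Write).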
Azure Blob Destination
Type
azure_destination
Connection
azure
The azure_destination action may be used to write gretel_model or gretel_tabular outputs to Azure Blob containers.
Inputs
container
Container to write data to. If empty, defaults to default_container.
path
Defines the path prefix to write the object into.
filename
Name of the file to write data back to. This file name will be appended to the path if one is configured.
input
Data to write to the file. This should be a reference to the output from a previous action.
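The path and filename inputs combine into the destination blob name, with filename appended to path when a path is set. A minimal sketch, assuming "/" is the separator and that stray leading or trailing slashes should be normalized away (the real action may treat them differently); destination_blob_name is a hypothetical helper.

```python
import posixpath
from typing import Optional


def destination_blob_name(filename: str, path: Optional[str] = None) -> str:
    """Append `filename` to the optional `path` prefix to form the blob name."""
    name = filename.lstrip("/")
    prefix = (path or "").strip("/")
    # With no path, the file is written at the container root.
    return posixpath.join(prefix, name) if prefix else name
```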
Outputs
None
Minimum Permissions
The associated service account must have the following permissions for the configured container:
Storage Blob Data Contributor role permissions, or higher
The SAS Token must have the following permissions for the configured container or storage account:
Create
List
Write
If the SAS token is issued for the storage account (an account SAS), it must also allow the Container and Object resource types.
Examples
Create a synthetic copy of your Azure Blob container. The following config will crawl a container, train and run a synthetic model, then write the outputs of the model back to a destination container while maintaining the same folder structure as the source container.
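Preserving the folder structure comes down to a key mapping: remove the source path prefix from each crawled blob name and re-root the remainder under the destination path. The sketch below shows only that mapping, not the workflow config. It assumes "/"-separated blob names, and mirror_key is a hypothetical helper.

```python
import posixpath


def mirror_key(source_key: str, source_path: str, destination_path: str) -> str:
    """Map a source blob name to a destination blob name with the same relative folders."""
    src = source_path.strip("/")
    prefix = f"{src}/" if src else ""
    if not source_key.startswith(prefix):
        raise ValueError(f"{source_key!r} is not under source path {source_path!r}")
    relative = source_key[len(prefix):]  # e.g. "2024/jan.csv"
    dest = destination_path.strip("/")
    return posixpath.join(dest, relative) if dest else relative
```

For example, a crawled object raw/2024/jan.csv with source path raw and destination path synthetic maps to synthetic/2024/jan.csv.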