Azure Blob

Connect to your Azure Blob containers.

Getting Started

To create an Azure Blob based workflow, you will need:

  1. A connection to Azure Blob.

  2. A source container.

  3. A destination container. This can be the same as your source container.

Configuring an Azure Blob Connection

Azure Blob related actions require an Azure connection. The connection must be configured with the correct permissions for each Gretel Action.

For specific permissions, please refer to the Minimum Permissions section under each corresponding action.

There are three ways to authenticate a Gretel Azure Blob Connection. Each method requires different fields for connection creation:

Account Access Key

Connection Creation Parameters

name

Display name of your choosing used to identify your connection within Gretel.

account_name

Name of the Storage Account.

default_container

Default container to crawl data from. Different containers can be chosen at the azure_source and azure_destination actions.

First, create a file on your local computer containing the connection credentials. This file should also include type, name, config, and credentials. connection_target_type is optional; if omitted, the connection can be used for both source and destination actions. The config and credentials fields should contain fields that are specific to the connection being created.

Below is an example Azure Blob connection using access key credentials:

{
    "type": "azure",
    "name": "my-azure-connection",
    "connection_target_type": "source",
    "config": {
        "account_name": "mystorageaccount",
        "default_container": "mycontainer"
    },
    "credentials": {
        "access_key": "..."
    }
}

Now that you've created the credentials file, use the CLI to create the connection:

gretel connections create --project [project id] --from-file [credential_file.json]
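Hand-edited JSON is easy to break with a missing or trailing comma. As a minimal sketch, the connection file could be generated with Python's standard json module instead, which always emits valid JSON. The helper names and the AZURE_STORAGE_ACCESS_KEY environment variable are hypothetical choices for this illustration, not part of Gretel; the field layout simply mirrors the access key example above.

```python
import json
import os
from pathlib import Path


def build_access_key_connection(name, account_name, default_container,
                                access_key, target_type=None):
    """Return a dict laid out like the access key connection example."""
    connection = {
        "type": "azure",
        "name": name,
        "config": {
            "account_name": account_name,
            "default_container": default_container,
        },
        "credentials": {"access_key": access_key},
    }
    # connection_target_type is optional; omit it to allow the connection
    # to be used for both source and destination actions.
    if target_type is not None:
        connection["connection_target_type"] = target_type
    return connection


def render_connection(connection):
    """Serialize the connection; json.dumps guarantees well-formed JSON."""
    return json.dumps(connection, indent=4) + "\n"


def write_connection_file(connection, path):
    """Write the rendered connection to a local file for the CLI to read."""
    Path(path).write_text(render_connection(connection))


if __name__ == "__main__":
    # Hypothetical variable name; falls back to the "..." placeholder.
    key = os.environ.get("AZURE_STORAGE_ACCESS_KEY", "...")
    conn = build_access_key_connection(
        "my-azure-connection", "mystorageaccount", "mycontainer", key,
        target_type="source",
    )
    print(render_connection(conn))
```

Calling write_connection_file(conn, "credential_file.json") would produce a file that can be passed to the --from-file option shown above.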

Entra ID

name

Display name of your choosing used to identify your connection within Gretel.

account_name

Name of the Storage Account.

client_id

Application (client) ID.

tenant_id

Directory (tenant) ID.

username

Email of the Service Account.

entra_password

Password of the Service Account.

default_container

Default container to crawl data from. Different containers can be chosen at the azure_source and azure_destination actions.

First, create a file on your local computer containing the connection credentials. This file should also include type, name, config, and credentials. connection_target_type is optional; if omitted, the connection can be used for both source and destination actions. The config and credentials fields should contain fields that are specific to the connection being created.

Below is an example Azure Blob connection using Entra ID credentials:

{
    "type": "azure",
    "name": "my-azure-connection",
    "connection_target_type": "source",
    "config": {
        "account_name": "mystorageaccount",
        "default_container": "mycontainer",
        "entra_config": {
            "client_id": "12a345b6-1a23-1ab2-abc1-1ab234cde56f",
            "tenant_id": "78g901h2-7g89-7gh8-ghi7-7gh890ijk12l",
            "username": "serviceaccountemail@domain.com"
        }
    },
    "credentials": {
        "entra_password": "..."
    }
}

Now that you've created the credentials file, use the CLI to create the connection:

gretel connections create --project [project id] --from-file [credential_file.json]

SAS Token

name

Display name of your choosing used to identify your connection within Gretel.

account_name

Name of the Storage Account.

default_container

Default container to crawl data from. Different containers can be chosen at the azure_source and azure_destination actions.

First, create a file on your local computer containing the connection credentials. This file should also include type, name, config, and credentials. connection_target_type is optional; if omitted, the connection can be used for both source and destination actions. The config and credentials fields should contain fields that are specific to the connection being created.

Below is an example Azure Blob connection file using SAS token credentials:

{
    "type": "azure",
    "name": "my-azure-connection",
    "connection_target_type": "source",
    "config": {
        "account_name": "mystorageaccount",
        "default_container": "mycontainer"
    },
    "credentials": {
        "sas_token": "..."
    }
}

Now that you've created the credentials file, use the CLI to create the connection:

gretel connections create --project [project id] --from-file [credential_file.json]
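Before running the CLI, it can help to check a connection file locally. The sketch below is a hypothetical pre-flight check, not a Gretel tool: it parses the file contents and looks for the top-level fields and credential names used in the three examples above. The rule that exactly one credential field should be present is an assumption made for this illustration.

```python
import json

REQUIRED_TOP_LEVEL = ("type", "name", "config", "credentials")
# Credential field names taken from the three examples in this section.
CREDENTIAL_FIELDS = ("access_key", "entra_password", "sas_token")


def check_connection(text):
    """Return a list of problems found in a connection file's contents."""
    try:
        conn = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} (line {exc.lineno})"]
    if not isinstance(conn, dict):
        return ["top level must be a JSON object"]

    problems = [f"missing field: {key}"
                for key in REQUIRED_TOP_LEVEL if key not in conn]

    config = conn.get("config")
    config = config if isinstance(config, dict) else {}
    credentials = conn.get("credentials")
    credentials = credentials if isinstance(credentials, dict) else {}

    if "account_name" not in config:
        problems.append("config.account_name is missing")

    # Assumption for this sketch: one authentication method per connection.
    present = [name for name in CREDENTIAL_FIELDS if name in credentials]
    if len(present) != 1:
        problems.append(
            "credentials should contain exactly one of: "
            + ", ".join(CREDENTIAL_FIELDS)
        )
    # The Entra ID example pairs entra_password with config.entra_config.
    if "entra_password" in credentials and "entra_config" not in config:
        problems.append("entra_password given without config.entra_config")
    return problems
```

An empty list means none of these checks failed; it does not guarantee that Gretel will accept the connection or that the credentials are valid.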

Azure Blob Source

Type

azure_source

Connection

azure

The azure_source action can be used to read an object from an Azure Blob container into Gretel Models.

This action works as an incremental crawler. Each time a workflow is run, the action will crawl new files that have landed in the container since the last crawl.

For details on how the action works more generally, please see https://github.com/Gretellabs/docs/blob/main/workflows-and-connectors/connectors/object-storage/broken-reference/README.md.

Inputs

container

Container to crawl data from. If empty, will default to default_container.

glob_filter

A glob filter may be used to match file names matching a specific pattern. Please see the Glob Filter Reference for more details.

path

Prefix to crawl objects from. If no path is provided, the root of the container is used.

recursive

Defaults to false. If set to true, the action will recursively crawl objects starting from path.
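To make the interaction between path, recursive, and glob_filter concrete, here is a rough sketch of how the three inputs could narrow down a list of blob names. It is an illustration only: select_blobs is a hypothetical helper, the glob is matched against the file name with Python's fnmatch, and Gretel's actual matching rules (see the Glob Filter Reference) may differ.

```python
from fnmatch import fnmatchcase
import posixpath


def select_blobs(blob_names, path="", glob_filter="*", recursive=False):
    """Pick the blob names a crawl with these inputs might visit."""
    selected = []
    for blob in blob_names:
        # path acts as a prefix; an empty path means the container root.
        if not blob.startswith(path):
            continue
        relative = blob[len(path):].lstrip("/")
        if not relative:
            continue  # the prefix itself, not an object below it
        # Without recursive, skip anything nested in a deeper "folder".
        if not recursive and "/" in relative:
            continue
        # Assumption: the glob applies to the file name, not the full path.
        if fnmatchcase(posixpath.basename(blob), glob_filter):
            selected.append(blob)
    return selected


if __name__ == "__main__":
    blobs = [
        "metrics/a.csv",
        "metrics/b.json",
        "metrics/2024/c.csv",
        "other/d.csv",
    ]
    print(select_blobs(blobs, path="metrics/", glob_filter="*.csv"))
    print(select_blobs(blobs, path="metrics/", glob_filter="*.csv",
                       recursive=True))
```

In this sketch, the first call keeps only metrics/a.csv, while the recursive call also picks up metrics/2024/c.csv.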

Outputs

dataset

A dataset object containing file and table representations of the found objects.

Minimum Permissions

The associated service account must have the following permissions for the configured container

The SAS Token must have the following permissions for the configured container or storage account:

  • List

  • Read

The SAS Token added for the storage account needs to have Container and Object allowed resource types.

Azure Blob Destination

Type

azure_destination

Connection

azure

The azure_destination action may be used to write gretel_model or gretel_tabular outputs to Azure Blob containers.

Inputs

container

Container to write data to. If empty, will default to default_container.

path

Defines the path prefix to write the object into.

filename

Name of the file to write data back to. This file name will be appended to the path if one is configured.

input

Data to write to the file. This should be a reference to the output from a previous action.
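The relationship between path and filename can be sketched in a few lines. The helper below is hypothetical and only illustrates the rule stated above, that the file name is appended to the path when one is configured; how Gretel normalizes slashes internally is an assumption here.

```python
import posixpath


def destination_blob_name(filename, path=""):
    """Combine the path prefix and file name into a single blob name."""
    # Drop a leading slash so posixpath.join does not discard the prefix.
    filename = filename.lstrip("/")
    if not path:
        return filename
    return posixpath.join(path, filename)


if __name__ == "__main__":
    print(destination_blob_name("a.csv", "metrics/"))
    print(destination_blob_name("a.csv"))
```

Under these assumptions, a path of metrics/ and a filename of a.csv yield the blob name metrics/a.csv, and an empty path writes to the root of the container.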

Outputs

None

Minimum Permissions

The associated service account must have the following permissions for the configured container

The SAS Token must have the following permissions for the configured container or storage account:

  • Create

  • List

  • Write

The SAS Token added for the storage account needs to have Container and Object allowed resource types.

Examples

Create a synthetic copy of your Azure Blob container. The following config will crawl a container, train and run a synthetic model, then write the outputs of the model back to a destination container while maintaining the same folder structure as the source container.

name: sample-azure-workflow

actions:
  - name: azure-read
    type: azure_source
    connection: c_1
    config:
      container: my-default-container
      glob_filter: "*.csv"
      path: metrics/

  - name: model-train-run
    type: gretel_model
    input: azure-read
    config:
      project_id: proj_1
      model: synthetics/default
      run_params:
        params:
          num_records_multiplier: 1.0
      training_data: "{{outputs.azure-read.dataset.files.data}}"

  - name: azure-write
    type: azure_destination
    connection: c_1
    input: model-train-run
    config:
      container: my-synthetic-container
      input: "{{outputs.model-train-run.dataset.files.data}}"
      filename: "{{outputs.azure-read.dataset.files.data}}"
      path: metrics/
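Because actions refer to each other by name, both through input and through {{outputs.<action>...}} templates, a mistyped name is easy to miss. The following is a small, hypothetical lint sketch, not a Gretel feature. It takes the workflow as an already-parsed Python dict and assumes that every reference should point at an action defined earlier in the list.

```python
import re

# Captures the action name in templates such as {{outputs.azure-read.dataset...}}
OUTPUT_REF = re.compile(r"\{\{\s*outputs\.([A-Za-z0-9_-]+)\.")


def _strings(value):
    """Yield every string found in a nested dict/list structure."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from _strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from _strings(item)


def undefined_references(workflow):
    """Return (action, reference) pairs that name no earlier action."""
    seen = set()
    problems = []
    for action in workflow.get("actions", []):
        referenced = set()
        if "input" in action:
            referenced.add(action["input"])
        for text in _strings(action.get("config", {})):
            referenced.update(OUTPUT_REF.findall(text))
        for ref in sorted(referenced - seen):
            problems.append((action["name"], ref))
        seen.add(action["name"])
    return problems


if __name__ == "__main__":
    workflow = {
        "name": "sample-azure-workflow",
        "actions": [
            {"name": "azure-read", "type": "azure_source"},
            {
                "name": "model-train-run",
                "type": "gretel_model",
                "input": "azure-read",
                "config": {
                    "training_data":
                        "{{outputs.azure-read.dataset.files.data}}",
                },
            },
        ],
    }
    print(undefined_references(workflow))
```

An empty result means every input and outputs reference names an action that appears earlier in the workflow.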
