Amazon S3

Connect Gretel to your Amazon S3 buckets.

This guide will walk you through connecting source and destination S3 buckets to Gretel. Source buckets will be crawled and used as training inputs to Gretel models. Model outputs get written to the configured S3 destination.

Getting Started

Prerequisites to create a Amazon S3 based workflow. You will need

A connection to Amazon S3.
A source bucket.
(optional) A destination bucket. This can be the same as your source bucket, or omitted entirely.

Configuring a Connection

Amazon S3 related actions require creating an s3 connection. The connection must be configured with the correct IAM permissions for each Gretel Action.

You can configure the following properties for a connection

access_key_id

Unique identifier used to authenticate and identify the user.

secret_access_key

Secret value used to sign requests.

All credentials sent to Gretel are encrypted both in transit and at rest.

The following policy can be used to enable access for all S3 related actions

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GretelS3Source",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::your-source-bucket-here",
        "arn:aws:s3:::your-source-bucket-here/*"
      ]
    },
    {
      "Sid": "GretelS3Destination",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts",
        "s3:ListBucketMultipartUploads",
        "s3:CreateMultipartUpload",
        "s3:UploadPart",
        "s3:CompleteMultipartUpload"
      ],
      "Resource": [
        "arn:aws:s3:::your-destination-bucket-here/*"
      ]
    }
  ]
}

More granular permissions for each action can be found in the action's respective Minimum Permissions section.

Creating Access Keys

The following documentation provides instruction for creating IAM users and access keys from your AWS account.

Managing access keys for IAM users - AWS Identity and Access ManagementAWS Identity and Access Management

Creating an IAM Role

You can configure your Gretel S3 connector to use an IAM role for authorization. Using IAM roles you can grant Gretel systems access to your bucket without sharing any static access keys.

Before setting up your IAM role, you must first locate the Gretel Project ID for the project you wish to create the connection in. You will use the project id as the external id for the IAM role.

You may find your Gretel Project ID from the Console or SDK using the following instructions:

Navigate to the Projects page, and select Copy UID from the project drop-down on the right.

This should automatically copy the project id to your clipboard.

from gretel_client import create_or_get_unique_project
from gretel_client.config import get_session_config

session = get_session_config()
project = create_or_get_unique_project(name="s3-workflows")

print(f"Project Id: {project.project_guid}")

Running the snippet above, should yield an output such as

Project Id: proj_28N5smcmkGnD6H5pd17tZwfYkQ1

Now that you have the external id, you will need to create an AWS IAM role. To create the role, navigate to your AWS IAM Console, select the Roles page from the left menu, select Create Role and follow the instruction for Gretel Cloud below:

From the Role Creation dialog

Select AWS account as the Trusted entity type.
From the Select Another AWS account and enter Gretel's AWS account 074762682575.
Check Require external ID and enter the Gretel Project ID from the previous step as the External ID.
Select Next and add the appropriate IAM policies for the bucket.

The final trust policy on your IAM role should look similar to

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "074762682575"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "<your gretel project id, eg proj_28N5smcmkGnD6H5pd17tZwfYkQ1>"
                }
            }
        }
    ]
}

For more information about delegating permissions to an AWS IAM user, please reference the following AWS documentation:

Creating a role to delegate permissions to an IAM user - AWS Identity and Access ManagementAWS Identity and Access Management

Now that you have the role configured, you can create a Gretel connection using the role ARN from the the previous step.

From the Gretel Console, navigate to the Create Connection dialog, select S3, select the Role ARN authentication method, and enter the role ARN created in the previous steps.

from gretel_client import create_or_get_unique_project
from gretel_client.config import get_session_config
from gretel_client.rest_v1.api.connections_api import ConnectionsApi
from gretel_client.rest_v1.models import CreateConnectionRequest

session = get_session_config()
connection_api = session.get_v1_api(ConnectionsApi)

project = create_or_get_unique_project(name="s3-workflows")

connection = connection_api.create_connection(
    CreateConnectionRequest(
        name="my-s3-source-bucket",
        project_id=project.project_guid,
        type="s3",
        config={
            "role_arn": "arn:aws:iam::123456789012:role/s3-gretel-source-access",
        },
    )
)

PreviousObject Storage NextGoogle Cloud Storage

Last updated 2 months ago

Was this helpful?