Amazon S3

Connect Gretel to your Amazon S3 buckets.


Last updated 27 days ago


This guide will walk you through connecting source and destination S3 buckets to Gretel. Source buckets will be crawled and used as training inputs to Gretel models. Model outputs get written to the configured S3 destination.

Getting Started

To create an Amazon S3 based workflow, you will need:

  1. A connection to Amazon S3.

  2. A source bucket.

  3. (optional) A destination bucket. This can be the same as your source bucket, or omitted entirely.

Configuring a Connection

Amazon S3 related actions require an s3 connection. The connection must be configured with the correct IAM permissions for each Gretel Action.

You can configure the following properties for a connection:

access_key_id

Unique identifier used to authenticate and identify the user.

secret_access_key

Secret value used to sign requests.

All credentials sent to Gretel are encrypted both in transit and at rest.
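As a minimal sketch, the two properties above can be collected into a config dictionary before a connection is created. The helper name `build_access_key_config` is hypothetical, and reading the values from the standard AWS environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` is an assumption made to keep secrets out of source code. Passing the resulting dictionary as the `config` of a `CreateConnectionRequest` is likewise an assumption, modeled on the role-based SDK example in this guide.

```python
import os


def build_access_key_config(env=None):
    """Return a dict holding the two connection properties described above.

    Assumption: credentials live in the standard AWS environment variables.
    `env` can be any mapping, which makes the helper easy to test.
    """
    env = os.environ if env is None else env
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    return {
        "access_key_id": env["AWS_ACCESS_KEY_ID"],
        "secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
    }
```

Failing early on a missing variable avoids sending an incomplete credential pair to the API.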

The following policy enables access for all S3-related actions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GretelS3Source",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::your-source-bucket-here",
        "arn:aws:s3:::your-source-bucket-here/*"
      ]
    },
    {
      "Sid": "GretelS3Destination",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts",
        "s3:ListBucketMultipartUploads",
        "s3:CreateMultipartUpload",
        "s3:UploadPart",
        "s3:CompleteMultipartUpload"
      ],
      "Resource": [
        "arn:aws:s3:::your-destination-bucket-here/*"
      ]
    }
  ]
}

More granular permissions for each action can be found in the action's respective Minimum Permissions section.
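The policy above can also be generated for specific bucket names rather than edited by hand. The sketch below uses only the Python standard library; the function name `build_s3_policy` is hypothetical, and the action lists are copied from the policy in this guide. Because the destination bucket is optional, the destination statement is emitted only when a destination bucket is given.

```python
import json

# Action lists copied from the policy shown in this guide.
SOURCE_ACTIONS = ["s3:ListBucket", "s3:GetObject", "s3:GetBucketLocation"]
DESTINATION_ACTIONS = [
    "s3:PutObject",
    "s3:AbortMultipartUpload",
    "s3:ListMultipartUploadParts",
    "s3:ListBucketMultipartUploads",
    "s3:CreateMultipartUpload",
    "s3:UploadPart",
    "s3:CompleteMultipartUpload",
]


def build_s3_policy(source_bucket, destination_bucket=None):
    """Build the IAM policy document as a plain dict."""
    statements = [
        {
            "Sid": "GretelS3Source",
            "Effect": "Allow",
            "Action": SOURCE_ACTIONS,
            # Bucket-level and object-level ARNs, as in the guide's policy.
            "Resource": [
                f"arn:aws:s3:::{source_bucket}",
                f"arn:aws:s3:::{source_bucket}/*",
            ],
        }
    ]
    if destination_bucket:
        statements.append(
            {
                "Sid": "GretelS3Destination",
                "Effect": "Allow",
                "Action": DESTINATION_ACTIONS,
                "Resource": [f"arn:aws:s3:::{destination_bucket}/*"],
            }
        )
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    # Bucket names here are placeholders.
    print(json.dumps(build_s3_policy("your-source-bucket-here"), indent=2))
```

The printed JSON can be pasted into the IAM policy editor.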

Creating Access Keys

For instructions on creating IAM users and access keys in your AWS account, see the AWS documentation on managing access keys for IAM users.

Creating an IAM Role

You can configure your Gretel S3 connector to use an IAM role for authorization. Using IAM roles you can grant Gretel systems access to your bucket without sharing any static access keys.

You can find your Gretel Project ID from the Console or the SDK using the following instructions:

In the Console, navigate to the Projects page and select Copy UID from the project drop-down on the right.

This copies the Project ID to your clipboard.

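# With the SDK: look up (or create) the project and print its ID.
from gretel_client import create_or_get_unique_project
from gretel_client.config import get_session_config

# Load the locally configured Gretel session.
session = get_session_config()
project = create_or_get_unique_project(name="s3-workflows")

# The project GUID is the value used as the external ID on the IAM role.
print(f"Project Id: {project.project_guid}")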

Running the snippet above should yield output such as:

Project Id: proj_28N5smcmkGnD6H5pd17tZwfYkQ1

Now that you have the Project ID, which serves as the external ID, you need to create an AWS IAM role. To create the role, navigate to your AWS IAM Console, select the Roles page from the left menu, select Create Role, and follow the instructions for Gretel Cloud below.

From the Role Creation dialog:

  1. Select AWS account as the Trusted entity type.

  2. Select Another AWS account and enter Gretel's AWS account ID, 074762682575.

  3. Check Require external ID and enter the Gretel Project ID from the previous step as the External ID.

The final trust policy on your IAM role should look similar to the following:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "074762682575"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "<your gretel project id, eg proj_28N5smcmkGnD6H5pd17tZwfYkQ1>"
                }
            }
        }
    ]
}
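For teams that script their IAM setup, the trust policy above can be produced from a Project ID. This is a minimal standard-library sketch: the function name `build_trust_policy` is hypothetical, the account ID is the Gretel account quoted in this guide, and the only validation performed is that the Project ID is not empty.

```python
import json

# Gretel's AWS account ID, as stated in this guide.
GRETEL_AWS_ACCOUNT_ID = "074762682575"


def build_trust_policy(project_id):
    """Build the role trust policy with the Gretel Project ID as external ID."""
    project_id = project_id.strip()
    if not project_id:
        raise ValueError("project_id must not be empty")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": GRETEL_AWS_ACCOUNT_ID},
                "Action": "sts:AssumeRole",
                # Only requests carrying this external ID may assume the role.
                "Condition": {"StringEquals": {"sts:ExternalId": project_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Sample Project ID taken from the example output in this guide.
    print(json.dumps(build_trust_policy("proj_28N5smcmkGnD6H5pd17tZwfYkQ1"), indent=4))
```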

For more information about delegating permissions to an AWS IAM user, see the AWS documentation on creating a role to delegate permissions to an IAM user.

Now that you have the role configured, you can create a Gretel connection using the role ARN from the previous step.

From the Gretel Console, navigate to the Create Connection dialog, select S3, select the Role ARN authentication method, and enter the role ARN created in the previous steps.

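# With the SDK: create an S3 connection that authenticates with the IAM role.
from gretel_client import create_or_get_unique_project
from gretel_client.config import get_session_config
from gretel_client.rest_v1.api.connections_api import ConnectionsApi
from gretel_client.rest_v1.models import CreateConnectionRequest

# Load the locally configured Gretel session and get the connections API.
session = get_session_config()
connection_api = session.get_v1_api(ConnectionsApi)

# The connection is created in this project; its ID must match the
# external ID on the role's trust policy.
project = create_or_get_unique_project(name="s3-workflows")

connection = connection_api.create_connection(
    CreateConnectionRequest(
        name="my-s3-source-bucket",
        project_id=project.project_guid,
        type="s3",
        config={
            # Replace with the ARN of the role created in the previous steps.
            "role_arn": "arn:aws:iam::123456789012:role/s3-gretel-source-access",
        },
    )
)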
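A malformed role ARN is an easy mistake to make when copying values between consoles, so it can help to check the string before creating the connection. The sketch below is a hypothetical local check and is not part of the Gretel SDK. It assumes the usual IAM role ARN layout (`arn:<partition>:iam::<12-digit account>:role/<optional path and name>`) and the three common partitions.

```python
import re

# Assumed layout of an IAM role ARN; the role part may include a path.
ROLE_ARN_PATTERN = re.compile(
    r"^arn:(aws|aws-cn|aws-us-gov):iam::(\d{12}):role/([\w+=,.@/-]+)$",
    re.ASCII,
)


def parse_role_arn(role_arn):
    """Return (account_id, role_path_and_name) or raise ValueError."""
    match = ROLE_ARN_PATTERN.match(role_arn.strip())
    if match is None:
        raise ValueError(f"not a valid IAM role ARN: {role_arn!r}")
    return match.group(2), match.group(3)


if __name__ == "__main__":
    account_id, role = parse_role_arn(
        "arn:aws:iam::123456789012:role/s3-gretel-source-access"
    )
    print(account_id, role)
```

The check is purely syntactic; it does not confirm that the role exists or that its trust policy allows Gretel to assume it.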

Note: before setting up your IAM role, locate the Gretel Project ID for the project in which you want to create the connection. You will use the Project ID as the external ID for the IAM role.

Note: after completing the trust settings in the Role Creation dialog, select Next and add the appropriate IAM policies for the bucket.

Managing access keys for IAM users - AWS Identity and Access Management
Creating a role to delegate permissions to an IAM user - AWS Identity and Access Management