Architecture and Components

Get familiar with Gretel's architectural components.

Introduction

Explore our guides and examples to integrate Gretel.ai into your workflows. To get started, sign up in our console. Once your account is created you will be able to start building privacy engineering workflows easily.

Gretel Components

Gretel has three architectural components that you will want to be familiar with:

  • Gretel Cloud: The control plane for scheduling work such as creating models and generating, classifying, or transforming data. This includes the Gretel REST API, Console and CLI tool. The REST API is hosted as a service and is used to manage accounts, projects, and metadata for models.

Regardless of where Gretel Workers run, they communicate with Gretel’s REST API to report timing information, errors, and additional metadata. If you use workers in your own environment, no training data or sensitive information will be sent back to Gretel’s API.

  • Gretel Configurations: Declarative objects that are used to create models. Gretel offers several configuration templates to help you get started with popular use cases such as creating synthetic datasets or anonymizing PII. These configurations are sent to the Gretel REST API to create models. These models can then be used to generate, transform, and classify data.

  • Gretel Workers: Containers that consume Gretel Configurations and handle requests to process records. When a worker consumes a Gretel Configuration, it creates a reusable model. Additionally, workers can use existing models to generate, transform, and classify records.

These components work together to enable developers to build robust and flexible privacy engineering workflows.

Gretel Architecture

Gretel’s CLI tool and Console automate privacy engineering workflows by working directly with Gretel’s REST API. Throughout this documentation, you will see how to achieve these tasks with examples from our Console and CLI. The REST API can always be used directly to create your own custom or more advanced workflows.

Gretel Cloud

Gretel Cloud acts as the control plane for scheduling model creation and generation, transformation, or classification of data records.

Projects

The primary object within Gretel Cloud that you will be working with is a Project. Projects are like repositories of models and associated data that enable privacy engineering. You can invite other users to a project and control their permissions.

The following primitives exist within a Gretel Project:

  • Project Artifacts: These are datasets that can be uploaded and stored with your project, typically to be used for creating models. Project artifacts can be uploaded by anyone with “write” access to a project, and they are kept with the project until they are explicitly deleted. When using the Gretel Console or CLI, you use Gretel Cloud Workers by default, and project artifacts are automatically created for you from your training data. Project artifact keys have a specific structure. If your training data is called my-data.csv, then an example artifact key might be: gretel_89bdba626464477aaeeef96fc8b2b613_my-data.csv. This key can be used as a data source for training or running models.

  • Models: Models are created from source datasets. You configure a model using a Gretel Configuration, which allows you to specify a source dataset, model type, and various parameters. You can train a model to generate synthetic data, transform records, or classify records. For each model that is created, the following artifacts are created:

    • A model archive, which can be referenced to generate, transform, and classify data at scale.

    • A model report. For synthetic models, this will be the Gretel Synthetic Report. For transforms, this will be a Gretel Transform Report.

    • Sample data. A small sample of synthesized or transformed data will be created as part of the model creation process.

  • Model Servers: After a model has been created, you may run that model as many times as you like to generate, transform, and classify new data. The result of the model server will be an output dataset that can be shared or used for your downstream use case.

Uploading project artifacts, model creation, and model server creation can only be done by Project members that have “write” access or higher.
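
The artifact key example above can be split back into its parts. The following is a minimal sketch, assuming the key always has the shape gretel_, then 32 lowercase hexadecimal characters, then an underscore and the original file name. That pattern is inferred only from the single example key shown above, and the names ARTIFACT_KEY_RE, ArtifactKey, and parse_artifact_key are hypothetical helpers, not part of any Gretel tool.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern, inferred from the one example key in this document:
# "gretel_" + 32 lowercase hex characters + "_" + original file name.
ARTIFACT_KEY_RE = re.compile(r"^gretel_([0-9a-f]{32})_(.+)$")


class ArtifactKey(NamedTuple):
    """Parts of a project artifact key (hypothetical helper type)."""
    identifier: str
    filename: str


def parse_artifact_key(key: str) -> Optional[ArtifactKey]:
    """Return the parts of an artifact key, or None if the key does not match."""
    match = ARTIFACT_KEY_RE.match(key)
    if match is None:
        return None
    return ArtifactKey(identifier=match.group(1), filename=match.group(2))


# The example key from the text above.
parsed = parse_artifact_key("gretel_89bdba626464477aaeeef96fc8b2b613_my-data.csv")
```

A string that does not follow the assumed shape, such as a plain local file name, returns None, which makes it easy to tell artifact keys apart from local paths before using one as a data source.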

Gretel Configurations

Gretel configurations are declarative objects that specify how a model should be created. Configurations can be authored in YAML or JSON. To help you get started, we have several Configuration Templates. You may download and edit these templates as necessary or directly reference them when using the CLI (see our tutorials on using the templates directly for model creation).

The configuration file is the primary way to specify how a model can be created. When a model is requested to be created, a copy of this configuration will be sent to Gretel Cloud. Regardless of where a Gretel Worker is run, this configuration will be stored in Gretel Cloud and associated with the model.

When a Gretel Worker is scheduled (in our cloud or your own environment), it will contact Gretel Cloud and download a copy of the configuration and then start the model creation process.

To learn more about the configurations, please see the Model Configurations documentation.

Gretel Workers

Gretel workers can create or run models. Workers are containerized applications that are designed to communicate directly with Gretel Cloud. All communications will occur over HTTPS (Port 443) to api.gretel.cloud. If you are running your own Gretel Workers, they will need this communications path open.
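
Before running your own worker, it can help to confirm that this communications path is open. The following is a minimal sketch that only tests whether a TCP connection to api.gretel.cloud on port 443 can be opened from the current host. It does not perform a TLS handshake or call the Gretel API, and the function name can_reach and its connect parameter are hypothetical, introduced here for illustration.

```python
import socket

GRETEL_API_HOST = "api.gretel.cloud"  # host named in this document
HTTPS_PORT = 443                      # port named in this document


def can_reach(host=GRETEL_API_HOST, port=HTTPS_PORT, timeout=5.0,
              connect=socket.create_connection):
    """Return True if a TCP connection to host:port can be opened.

    `connect` is injectable so the check can be exercised without a network.
    """
    try:
        conn = connect((host, port), timeout)
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return False
    conn.close()
    return True
```

Calling can_reach() with no arguments from the machine that will host the worker returns False when a firewall or proxy blocks outbound traffic to that host and port.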

Gretel workers can run in our cloud or your own environment

Workers are stateful and will transition through different statuses during their run time. Additionally, during their run time, the workers will periodically check in with Gretel Cloud to transmit usage information (for billing), status updates, generalized run logs, and error / troubleshooting diagnostic information.

When you run your own worker, your training (and possibly sensitive) data will never be sent to Gretel Cloud. Gretel Cloud operates as the control plane only.

A Gretel Worker can be in one of the following states:

  • created - A request for a worker has been made. This is the default state for a worker, and the worker will stay in this state until it is launched. By default, a user may have up to 10 created workers. This essentially serves as your “queue” for creating or running models.

  • pending - When using Gretel Cloud Workers, this state indicates that our scheduling service has obtained the request and is provisioning a worker for your model or model server.

  • active - A worker is creating a model, generating, or processing records. Once a worker is in this state it will begin periodically sending control plane and logging information back to Gretel Cloud.

  • completed - A worker successfully completed its job. If it was a Gretel Cloud Worker, all model or server artifacts have been uploaded and stored in Gretel Cloud. If using a local worker, then all artifacts should have been written out to the location specified when starting the job.

  • error - A worker encountered an error. Basic error and troubleshooting information should have been sent to Gretel Cloud.

  • cancelled - A user has cancelled the worker. When a worker is cancelled, the worker will promptly shut down operation and cease all processing.

  • lost - A worker will be marked as lost if Gretel Cloud has been unable to communicate with the worker after some period of time.

A worker cannot recover from an error, cancelled, or lost status. A new model or server will have to be created once the underlying issue is fixed.
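
Client code that polls for a worker's status can encode the states above and the rule about unrecoverable states. The following is a minimal sketch. The status strings come from the list above, while the names WorkerStatus, UNRECOVERABLE, FINISHED, needs_new_job, and is_finished are hypothetical and not taken from any Gretel client library.

```python
from enum import Enum


class WorkerStatus(Enum):
    """Worker states as listed in this document."""
    CREATED = "created"
    PENDING = "pending"
    ACTIVE = "active"
    COMPLETED = "completed"
    ERROR = "error"
    CANCELLED = "cancelled"
    LOST = "lost"


# States a worker cannot recover from; a new model or server is required.
UNRECOVERABLE = frozenset(
    {WorkerStatus.ERROR, WorkerStatus.CANCELLED, WorkerStatus.LOST}
)

# States after which the worker does no further work.
FINISHED = UNRECOVERABLE | {WorkerStatus.COMPLETED}


def needs_new_job(status: WorkerStatus) -> bool:
    """True if the job must be recreated after the underlying issue is fixed."""
    return status in UNRECOVERABLE


def is_finished(status: WorkerStatus) -> bool:
    """True if polling can stop because the worker will not change state again."""
    return status in FINISHED
```

A polling loop would convert the status string it receives with WorkerStatus(value), stop once is_finished returns True, and only then check needs_new_job to decide whether to resubmit.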

To create a model, a Gretel worker is launched and will download a configuration from Gretel Cloud. Once the configuration is loaded, the worker will obtain the training data and begin creating a synthetic, transform, or classification model.

To run a model, you create a model server. Depending on the model type, a server can be used to generate, transform, or classify data. Once a model is created, you may create several servers to generate or process data at scale with that model. Like models, model servers are requested through Gretel Cloud and Workers are launched to generate or process records using a pre-existing model.

Workers can be automatically launched for you in Gretel Cloud. This is the default mode when uploading a configuration from the Console or the CLI. In cloud mode, once a request for a model is received, Gretel will provision a worker for you, and the model and associated artifacts (such as quality reports, sample data, etc.) will also be stored in Gretel Cloud. You may download these artifacts at any time. With a model created and stored in Gretel Cloud, model servers can be created to use the model and generate, transform, or classify data.

Optionally, you may run a Gretel Worker in your own environment. When you run your own worker, it will still need to communicate with Gretel Cloud. Please check out our Deep Dives on running different workloads in your own environment to get started with your own workers. Workers can currently be run in your own environment using the Gretel CLI. There are a few things to note when running your own workers:

  • The CLI will run the worker in a Docker container. This must be installed on your host before running your own worker. See our guide on setting up your environment for Docker + GPU support.

  • Artifacts (model archives, reports, sample data, etc.) will be saved to a destination directory specified by the user. By default, artifacts will not be uploaded to Gretel Cloud when running your own workers.

  • The worker will need to connect to Gretel Cloud to transmit timing, logging, and error telemetry.

Service Limits

During Gretel's Beta2 period, the following service limits will exist when running Gretel Workers. These limits apply to both workers in your own environment and workers in Gretel Cloud.

  • Maximum Queued Jobs (10). This is the maximum number of jobs that can be in a created state. If you are using Gretel Cloud workers, these jobs are automatically queued to start. While a worker is in this state, you may delete it or cancel it at any time. When this number is exceeded, API calls will return a 4xx error when attempting to create new models or model servers.

  • Maximum Running Workers (1). This is the maximum number of jobs that can be in an active state. When using Gretel Cloud workers, if this limit is exceeded, Gretel will wait for work to complete and then automatically start a new job from the queue of created jobs. When running local workers, if the worker starts and the limit is exceeded, the job will be put into an error state.

  • Maximum Worker Duration (1 hour). This is the maximum amount of time a worker can be in an active state either creating or serving a model. If the job exceeds this limit, the job will be put into an error state.

To request an increase in any of these limits, please open a ticket.
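
The limits above can be mirrored in client code to avoid predictable failures before a request is sent. The following is a minimal sketch using the three values listed in this section. The class and function names are hypothetical, and the boundary handling is an assumption: we read "up to 10 created workers" as meaning an 11th request fails, and a local worker as failing when one job is already active.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLimits:
    """Beta2 service limits as stated in this document."""
    max_queued_jobs: int = 10
    max_running_workers: int = 1
    max_worker_duration_seconds: int = 60 * 60  # 1 hour


LIMITS = ServiceLimits()


def can_queue_job(created_count: int, limits: ServiceLimits = LIMITS) -> bool:
    """False when another request would exceed the queue limit (API returns 4xx)."""
    return created_count < limits.max_queued_jobs


def local_worker_outcome(active_count: int, limits: ServiceLimits = LIMITS) -> str:
    """State a newly started local worker ends up in, given jobs already active."""
    if active_count >= limits.max_running_workers:
        return "error"
    return "active"


def exceeded_duration(active_seconds: float, limits: ServiceLimits = LIMITS) -> bool:
    """True once a job has been active longer than the maximum worker duration."""
    return active_seconds > limits.max_worker_duration_seconds
```

If your limits are increased, constructing ServiceLimits with the new values and passing it to each function keeps the checks in one place.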

Next Steps

Now that you have learned the basic components of Gretel, including Gretel Cloud, Configurations, and Workers, it’s time to dive into creating and running your first models! Check out our environment configuration and basic tutorials next, where we will combine the Gretel CLI and some sample data to create your first synthetic or transform models using Gretel Cloud Workers!