Architecture
Get familiar with Gretel's architectural components.
Gretel has three architectural components that you will want to be familiar with:
Gretel Control Plane: The control plane for scheduling work such as creating models and generating, classifying, or transforming data. This includes the Gretel REST API, Console and CLI tool. The REST API is hosted as a service and is used to manage accounts, projects, and metadata for projects, workflows, and models.
Regardless of where Gretel Workers run, they communicate with Gretel’s REST API to report timing information, errors, and additional metadata. If you use workers in your own environment, no training data or sensitive information will be sent back to Gretel’s API.
Gretel Data Plane: Containers that consume Gretel Configurations and handle requests to process records. When a worker consumes a Gretel Configuration, it creates a reusable model. Workers can also use existing models to generate, transform, and classify records. The data plane also includes several controller microservices that detect queued jobs and schedule the required worker containers. By default, Gretel Cloud's managed data plane executes all of your workloads. With Gretel Hybrid, you can instead deploy your own Gretel Data Plane into your preferred cloud environment and use Gretel's features without data leaving the boundaries of your cloud tenant. See Deployment Options for more details.
Gretel Configurations: Declarative objects that are used to create models. Gretel offers several configuration templates to help you get started with popular use cases such as creating synthetic datasets or anonymizing PII. These configurations are sent to the Gretel REST API to create models. These models can then be used to generate, transform, and classify data. Further information can be found in the Model Configurations page.
These components work together to enable developers to build robust and flexible privacy engineering systems.
Gretel’s CLI tool and Console automate privacy engineering by working directly with Gretel’s REST API. Throughout this documentation, you will see how to achieve these tasks with examples from our Console and CLI. The REST API can always be used directly to create your own custom or more advanced automated systems.
The Gretel Control Plane is responsible for creating and managing projects, models, workflows, and job scheduling. The Control Plane is accessible via our REST API. We also consider the Gretel Console and Gretel CLI part of the control plane, since both interact with the Control Plane API.
The primary object within Gretel that you will be working with is a Project. Projects are like repositories that contain models, workflows, and other associated data. You can invite other users to a project and control their permissions.
The following primitives exist within a Gretel Project:
Project Artifacts: These are datasets that can be uploaded and stored with your project, typically to be used for creating models. Project artifacts can be uploaded by anyone with “write” access to a project, and they are kept with the project until they are explicitly deleted. When using the Gretel Console or CLI you use Gretel Cloud Workers by default, and project artifacts will automatically be created for you from your training data. Project artifacts have a specific structure. If your training data is called my-data.csv, then an example artifact key might be: gretel_89bdba626464477aaeeef96fc8b2b613_my-data.csv. This key can be used as a data source for training or running models.
Models: Models are created on source datasets. You configure a model to be created using a Gretel Configuration which allows you to specify a source dataset, model type, and various parameters. You can train a model to generate synthetic data, transform records, or classify records. For each model that is created, the following artifacts are created:
A model archive, which can be referenced to generate, transform, and classify data at scale.
A model report. For synthetic models, this will be the Gretel Synthetic Report. For transforms, this will be a Gretel Transform Report.
Sample data. A small sample of synthesized or transformed data will be created as part of the model creation process.
Model Servers: After a model has been created, you may run that model as many times as you like to generate, transform, and classify new data. The result of the model server will be an output dataset that can be shared or used for your downstream use case.
Uploading project artifacts, model creation, and model server creation can only be done by Project members that have “write” access or higher.
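As an illustration of the artifact key structure mentioned above, here is a minimal Python sketch that splits a key into its identifier and original filename. The pattern (the prefix gretel_, then 32 hexadecimal characters, then an underscore and the filename) is inferred only from the single example key in this section and is an assumption, not a documented format. The names ArtifactKey and parse_artifact_key are hypothetical helpers and are not part of any Gretel tool.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the one example key in the text (an assumption):
# "gretel_" + 32 hex characters + "_" + original filename.
_ARTIFACT_KEY = re.compile(r"^gretel_(?P<id>[0-9a-f]{32})_(?P<filename>.+)$")


class ArtifactKey(NamedTuple):
    """Hypothetical container for the two parts of an artifact key."""
    artifact_id: str
    filename: str


def parse_artifact_key(key: str) -> Optional[ArtifactKey]:
    """Split an artifact key into its parts, or return None if it does not match."""
    match = _ARTIFACT_KEY.match(key)
    if match is None:
        return None
    return ArtifactKey(match.group("id"), match.group("filename"))


parsed = parse_artifact_key("gretel_89bdba626464477aaeeef96fc8b2b613_my-data.csv")
print(parsed)
```

A check like this can help tell an artifact key apart from a plain local filename before passing it along as a data source.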
Whether you are utilizing Gretel's managed data plane (Gretel Cloud) or deploying your own data plane (Gretel Hybrid), the Data Plane is responsible for running jobs created via the Gretel Control Plane. The Data Plane consists of two primary components: Gretel Workers that create and run models, and the controller microservices responsible for creating and scheduling Gretel Worker containers. Gretel Workers are containerized applications that are designed to communicate directly with Gretel Cloud. All communications will occur over HTTPS (Port 443) to api.gretel.cloud. If you are running your own Gretel Data Plane (using Gretel Hybrid), your environment will need open outbound communication with the Control Plane API.
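If you are preparing a Gretel Hybrid environment, a quick way to check the outbound requirement described above is to try opening a TCP connection to the API host on port 443. The following is a minimal sketch using only the Python standard library; the function name can_reach is hypothetical, and the check covers TCP reachability only, not TLS negotiation, proxies, or authentication.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        # create_connection resolves the host name and connects in one step.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling can_reach("api.gretel.cloud") from inside the environment where the workers will run gives a first indication of whether outbound traffic on port 443 is allowed.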
Workers are stateful and will transition through different statuses during their run time. Additionally, during their run time, the workers will periodically check in with Gretel Cloud to transmit usage information (for billing), status updates, generalized run logs, and error / troubleshooting diagnostic information.
When you run your own worker, your training (and possibly sensitive) data will never be sent to Gretel's Control Plane.
A Gretel Worker can exist in one of the following states:
created - A request for a worker has been made. This is the default state, and the worker stays in it until it is launched. By default, a user may have up to 10 created workers. This essentially serves as your “queue” for creating or running models.
pending - This state indicates that the scheduling service has obtained the request and is provisioning a worker for your model or model server.
active - A worker is creating a model, generating, or processing records. Once a worker is in this state it will begin periodically sending control plane and logging information back to the Gretel Control Plane.
completed - A worker successfully completed its job. If it was a Gretel Cloud Worker, all model or server artifacts have been uploaded and stored in Gretel Cloud. If using a Gretel Hybrid worker, then all artifacts should have been written to the private location specified when starting the job.
error - A worker encountered an error. Basic error and troubleshooting information should have been sent to the Gretel Control Plane.
cancelled - A user has cancelled the worker. When a worker is cancelled, the worker will promptly shut down operation and cease all processing.
lost - A worker will be marked as lost if the Gretel Control Plane has been unable to communicate with the worker after some period of time.
In the event of an error, cancelled, or lost status, a worker cannot recover from this state. A new model or server will have to be created once the underlying issue is fixed.
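The worker states listed above can be modelled in client code as a small enumeration. The sketch below is illustrative only: the class and function names are hypothetical and not part of a Gretel SDK, and treating completed as a final state is an inference from the descriptions rather than something the text states explicitly.

```python
from enum import Enum


class WorkerStatus(str, Enum):
    """The worker states named in this section."""
    CREATED = "created"
    PENDING = "pending"
    ACTIVE = "active"
    COMPLETED = "completed"
    ERROR = "error"
    CANCELLED = "cancelled"
    LOST = "lost"


# States the text says a worker cannot recover from.
UNRECOVERABLE = frozenset(
    {WorkerStatus.ERROR, WorkerStatus.CANCELLED, WorkerStatus.LOST}
)

# Assumption: a completed worker also makes no further transitions.
FINISHED = UNRECOVERABLE | {WorkerStatus.COMPLETED}


def needs_new_job(status: WorkerStatus) -> bool:
    """True if a new model or model server has to be created."""
    return status in UNRECOVERABLE


def is_finished(status: WorkerStatus) -> bool:
    """True if polling this worker for further status changes can stop."""
    return status in FINISHED
```

A polling loop could, for example, stop once is_finished returns True and then use needs_new_job to decide whether the job has to be resubmitted.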
To create a model, a Gretel worker is launched and will download a configuration from the Gretel Control Plane. Once the configuration is loaded, the worker will obtain the training data and begin creating a synthetic, transform, or classification model.
To run a model, a Gretel worker is launched as what we call a "model server". Depending on the model type, a model server can be used to generate, transform, or classify data.
Workers can be automatically launched for you in Gretel Cloud. This is the default mode when uploading a configuration from the Console or the CLI. In cloud mode, once a request for a model is received, Gretel will provision a worker for you and the model and associated artifacts (such as quality reports, sample data, etc) will also be stored in Gretel Cloud. You may download these artifacts at any time. With a model created and stored in Gretel Cloud, model servers can be created to utilize the model and generate, transform, or classify data.
Gretel configurations are declarative objects that specify how a model should be created. Configurations can be authored in YAML or JSON. To help you get started, we have several Configuration Templates. You may download and edit these templates as necessary or directly reference them when using the CLI (see our tutorials on using the templates directly for model creation). You can also edit configurations directly in the Gretel Console, using the Config Editor.
The configuration file is the primary way to specify how a model can be created. When a model is requested to be created, a copy of this configuration will be sent to the Gretel Control Plane. Regardless of where a Gretel Worker is run, this configuration will be stored in Gretel's Control Plane and associated with the model.
When a Gretel Worker is scheduled (in our cloud or your own environment), it will contact Gretel Cloud and download a copy of the configuration and then start the model creation process.
All Gretel models follow a similar configuration file structure.
To learn more about the configurations, please see the Model Configurations documentation.
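To give a rough sense of what a declarative configuration might look like, the sketch below builds one as a Python dictionary and prints it as JSON, one of the two formats mentioned above. Every key name here (schema_version, name, models, synthetics, data_source) and the model name are assumptions made for illustration and should be checked against the Model Configurations documentation. The data source value reuses the example artifact key from earlier in this page.

```python
import json

# Illustrative only: all key names and values are assumptions,
# not a verified Gretel configuration.
config = {
    "schema_version": "1.0",           # assumed version field
    "name": "my-synthetic-model",      # hypothetical model name
    "models": [
        {
            "synthetics": {            # assumed model type key
                # Example artifact key taken from this page.
                "data_source": "gretel_89bdba626464477aaeeef96fc8b2b613_my-data.csv",
            }
        }
    ],
}

# Serialize to JSON so the structure can be saved to a file or inspected.
print(json.dumps(config, indent=2))
```

The point of the sketch is the shape: a single declarative object naming a model type, its source dataset, and its parameters, rather than a sequence of commands.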
Please see our pricing page for details on our various plans. You can get started completely free with 15 credits on our Developer Plan. The following limits apply:
Maximum Queued Jobs (10). This is the maximum number of jobs that can be in a created state. If you are using Gretel Cloud workers, these jobs are automatically queued to start. While a worker is in this state, you may delete it or cancel it at any time. When this number is exceeded, API calls will return a 4xx error when attempting to create new models or model servers.
Maximum Running Workers (4). This is the maximum number of jobs that can be in an active state. When using Gretel Cloud workers, if this limit is exceeded, Gretel will wait for work to complete and then automatically start a new job from the queue of created jobs. When running local workers, if the worker starts and the limit is exceeded, the job will be put into an error state.
Maximum Worker Duration (1 hour). This is the maximum amount of time a worker can be in an active state either creating or serving a model. If the job exceeds this limit, the job will be put into an error state.
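The three limits above can be summarized as simple checks. The following sketch is a client-side illustration under stated assumptions, not how Gretel enforces the limits: the names AccountUsage, can_queue_job, can_start_worker, and exceeded_duration are hypothetical, and the comparison boundaries (for example, that exactly 10 queued jobs are still allowed) are one reading of the text.

```python
from dataclasses import dataclass

# Limits as stated in this section.
MAX_QUEUED_JOBS = 10
MAX_RUNNING_WORKERS = 4
MAX_WORKER_DURATION_SECONDS = 60 * 60  # 1 hour


@dataclass
class AccountUsage:
    """Hypothetical snapshot of an account's jobs by state."""
    created: int = 0  # jobs waiting in the created state
    active: int = 0   # jobs currently in the active state


def can_queue_job(usage: AccountUsage) -> bool:
    """A new job fits in the queue only while fewer than 10 are in created."""
    return usage.created < MAX_QUEUED_JOBS


def can_start_worker(usage: AccountUsage) -> bool:
    """A queued job may become active only while fewer than 4 are running."""
    return usage.active < MAX_RUNNING_WORKERS


def exceeded_duration(active_seconds: float) -> bool:
    """True once a worker has been active for longer than the 1 hour limit."""
    return active_seconds > MAX_WORKER_DURATION_SECONDS
```

For example, with 10 jobs already in the created state, can_queue_job returns False, which corresponds to the case where the API would answer with a 4xx error.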