Learn how to connect Gretel with your existing data sources.
Gretel Connectors make it easier to connect your existing data sources with Gretel.
Each Connector consists of a source, a sink, and a model config. The connector reads raw data from the source, passes it to a Gretel Cloud or Local worker to transform each record (if using a Gretel Transform config) or synthesize new records based on it (if using a Gretel Synthetics config), and writes the result to the sink.
Gretel Connectors are configured using YAML. A config consists of at least one source, one sink, and one connector that ties a source and a sink together into a pipeline.
sources:
  - name: my_s3_source
sinks:
  - name: my_s3_sink
connectors:
  - name: default
The sources and sinks properties in a config define where data should be read from and written to. Source data is processed by a Gretel model, and the results are then written to a sink.
A source or sink definition takes the following form:
type: integration_name  # e.g. s3
name - The name of the source or sink. This name must be unique within the connector config.
config - Any integration-specific configuration, which depends on the type. Refer to each integration's documentation for more details.
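As a rough sketch of the definition described above, an S3 source might look like the following. The `bucket` key and its value are hypothetical placeholders rather than confirmed Gretel settings, and nesting the entry under a `sources` list is an assumption based on the config layout on this page. The keys allowed under `config` depend on the integration, so check the integration's documentation for the real ones.

```yaml
# Illustrative sketch only; the keys under `config` are assumptions.
sources:
  - type: s3                    # integration type
    name: my_s3_source          # must be unique within this connector config
    config:
      bucket: my-source-bucket  # hypothetical integration-specific setting
```

A sink would presumably follow the same shape under `sinks`, with its own unique name.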
The connectors map is used to define a pipeline. Each connector config is described by the following properties:
version - Specifies the connector version to run. Valid options include:
source - Specifies the source to read data from.
sink - Specifies the destination to write results back to.
max_active - Sets the maximum number of active jobs the connector will launch. Use this setting to manage pipeline throughput.
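The properties above could be combined into a connector entry along these lines. This is a minimal sketch, not a verified config: `CONNECTOR_VERSION` is a placeholder for one of the valid versions, the `max_active` value is purely illustrative, and it is assumed that `source` and `sink` refer to entries by their `name`.

```yaml
# Illustrative sketch only; values are placeholders.
connectors:
  - name: default
    version: CONNECTOR_VERSION  # placeholder; substitute a valid connector version
    source: my_s3_source        # assumed to match the `name` of a configured source
    sink: my_s3_sink            # assumed to match the `name` of a configured sink
    max_active: 2               # illustrative: launch at most two active jobs
```

A lower `max_active` limits how many jobs run at once, which reduces load at the cost of throughput.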
When an existing model ID is configured, that model is re-run for every new data source in the pipeline. If a model config is specified instead, a new model is created or trained for every new data source in the pipeline.
In the example below, we configure a model using a model configuration provided by Gretel.
Model configurations can also be passed as https://... URLs.
If you have an existing pre-trained model, you can pass that model's id instead.
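The two ways of selecting a model might be sketched as follows. The key names `model_config` and `model_id` are hypothetical, since this page does not show the exact property names. `synthetics/default` is assumed here to name a Gretel-provided configuration, and `YOUR_MODEL_ID` is a placeholder.

```yaml
# Illustrative sketch only; `model_config` and `model_id` are assumed key names.
connectors:
  - name: default
    source: my_s3_source
    sink: my_s3_sink
    # Option 1: create a new model for each new data source
    # from a configuration provided by Gretel.
    model_config: synthetics/default
    # Option 2: re-run an existing pre-trained model instead.
    # model_id: YOUR_MODEL_ID
```

Only one of the two options would be set on a given connector, depending on whether each data source should get a freshly trained model or reuse an existing one.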
Connectors are shipped as Docker containers and may be run via the Gretel CLI or deployed into existing container orchestration platforms such as Kubernetes.
gretel connectors start --config my_config.yaml
Note that to run the connector from the CLI, the host must have access to a running Docker daemon.
For a complete list of available parameters, run:
gretel connectors start --help