Running this command will trigger the following actions automatically:
The configuration will be sent to Gretel Cloud and a model creation job will be requested.
The CLI will start a local Gretel Worker that will download the configuration from Gretel Cloud.
The Gretel Worker will create the model and write the model artifacts to the transform-model directory.
When the model is created, you should see logging output that provides the Model ID. You will need this Model ID when serving models to transform records. Since you are running in your own environment, you will also need the path to the model.tar.gz artifact that gets created in the output directory.
As part of creating a model, a data preview is generated so you can take a quick look at transformed records. For this example, we can take a peek at the transformed records with:
gunzip -c transform-model/data_preview.gz | head
Compare the sample transformed data with the original data:
curl https://gretel-public-website.s3-us-west-2.amazonaws.com/datasets/customer-orders.csv | head
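To make the comparison easier to read, the two commands above can be combined into one helper that prints both samples one after the other. This is a minimal sketch, not part of the Gretel CLI: the function name preview_vs_original is hypothetical, and it assumes you have saved a local copy of the original CSV (for example with curl -o).

```shell
# Hypothetical helper (not part of the Gretel CLI): print the first rows of the
# original CSV and of the gzipped data preview so they can be compared by eye.
preview_vs_original() {
  original_csv="$1"    # local copy of the original CSV (assumed to exist)
  preview_gz="$2"      # gzipped preview, e.g. transform-model/data_preview.gz
  rows="${3:-5}"       # number of lines to show from each file

  echo "== original =="
  head -n "$rows" "$original_csv"
  echo "== transformed preview =="
  gunzip -c "$preview_gz" | head -n "$rows"
}

# Example usage (assumes the file names used in this walkthrough):
#   curl -s -o customer-orders.csv https://gretel-public-website.s3-us-west-2.amazonaws.com/datasets/customer-orders.csv
#   preview_vs_original customer-orders.csv transform-model/data_preview.gz
```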
Now that a Transform Model has been created, we will look at how to use this model to do full dataset transformations.
Transforming data at scale
Now that you have created a transform model, you can serve it as many times as you like to transform records at scale. Next, we'll use the model we just created to transform all of the records from the same sample file.
You should have the Model ID and access to the model.tar.gz model archive from the previous model creation step.
To serve the model, we run the following command (replace the Model ID!):
gretel records transform --runner local --model-path transform-model/model.tar.gz --in-data https://gretel-public-website.s3-us-west-2.amazonaws.com/datasets/customer-orders.csv --output transformed-data --model-id 60ba8a401fae93eff9d35dc1
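Because the Model ID changes with every model you create, it can help to wrap the command above in a small function so the ID is passed as an argument rather than edited inline. The sketch below uses only the flags shown above. The function name run_transform and its defaults are hypothetical conveniences, not something the Gretel CLI provides.

```shell
# Hypothetical wrapper around the command above. Only the Model ID is required;
# the other arguments default to the values used in this walkthrough.
run_transform() {
  model_id="${1:-}"
  model_path="${2:-transform-model/model.tar.gz}"
  in_data="${3:-https://gretel-public-website.s3-us-west-2.amazonaws.com/datasets/customer-orders.csv}"
  out_dir="${4:-transformed-data}"

  # Refuse to run without a Model ID instead of sending an empty value.
  if [ -z "$model_id" ]; then
    echo "usage: run_transform MODEL_ID [MODEL_PATH] [IN_DATA] [OUTPUT_DIR]" >&2
    return 2
  fi

  gretel records transform --runner local \
    --model-path "$model_path" \
    --in-data "$in_data" \
    --output "$out_dir" \
    --model-id "$model_id"
}

# Example usage (replace with the Model ID from your own logs):
#   run_transform 60ba8a401fae93eff9d35dc1
```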
You should see the worker start up, create a handler for serving the model, and begin transforming the records. Once the job is complete, your transformed data should be in the transformed-data directory (or whichever directory you specified).
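A quick way to confirm the job produced output is to list the output directory and look at the first few records. The sketch below is only a sanity check. The function name inspect_output is hypothetical, and it assumes that any output files are gzip-compressed like the data preview, which may not hold for your version of the CLI. Files without a .gz extension are listed but not previewed.

```shell
# Hypothetical sanity check: list the output directory and show the first
# lines of every gzipped file found in it.
inspect_output() {
  out_dir="${1:-transformed-data}"

  if [ ! -d "$out_dir" ]; then
    echo "missing output directory: $out_dir" >&2
    return 1
  fi

  ls -l "$out_dir"

  # Assumption: outputs are gzip-compressed; skip the loop body if none match.
  for f in "$out_dir"/*.gz; do
    [ -e "$f" ] || continue
    echo "== $f =="
    gunzip -c "$f" | head -n 5
  done
}

# Example usage:
#   inspect_output transformed-data
```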