Model Configurations

Gretel configurations are declarative objects that specify how a model should be created. Configurations can be authored in YAML or JSON.

Gretel provides configuration templates that can serve as starting points for creating your model.

All Gretel models follow the same high-level configuration file structure. Every configuration includes schema_version and name keys, as well as a models array whose entries are keyed by a [model_id]. Within the [model_id] object, every model configuration has a data_source key.

schema_version: "1.0"
name: "my-model"

models:
  - [model_id]:
      data_source: __tmp__
  • [model_id] is replaced with the type of model you wish to train (e.g. synthetics, gpt_x, actgan, timeseries_dgan, amplify, transform, classify).

  • data_source must point to a valid and accessible file in CSV, JSON, or JSONL format.

    • Supported storage locations include S3, GCS, Azure Blob Storage, HDFS, WebHDFS, HTTP, HTTPS, SFTP, or the local filesystem.

      • Note: Some models have specific data source format requirements.

    • data_source: __tmp__ can be used when the source file is specified elsewhere using:

      • --in_data parameter via CLI,

      • parameter via SDK,

      • dataset button via Console.
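
As a rough illustration of the structure described above, the following Python sketch builds a configuration as a plain dictionary, checks it for the keys this section names, and prints it as JSON. The check_config helper is hypothetical and is not part of the Gretel SDK or CLI. The rule that each models entry holds exactly one model_id key is an assumption drawn from the example above, and actgan is used only as a sample model type.

```python
import json

# Keys that this section says every configuration includes.
REQUIRED_TOP_LEVEL_KEYS = ("schema_version", "name", "models")


def check_config(config):
    """Hypothetical helper (not a Gretel API): return a list of structural
    problems. An empty list means the expected shape was found."""
    problems = []
    for key in REQUIRED_TOP_LEVEL_KEYS:
        if key not in config:
            problems.append(f"missing top-level key: {key}")

    models = config.get("models")
    if not isinstance(models, list) or not models:
        problems.append("models must be a non-empty array")
        return problems

    for index, entry in enumerate(models):
        # Assumption: each array entry is an object with a single model_id key.
        if not isinstance(entry, dict) or len(entry) != 1:
            problems.append(f"models[{index}] must have exactly one model_id key")
            continue
        model_id, body = next(iter(entry.items()))
        if not isinstance(body, dict) or "data_source" not in body:
            problems.append(f"models[{index}].{model_id} is missing data_source")
    return problems


# The same layout as the YAML example, with actgan as the model_id and
# __tmp__ as the placeholder data source.
config = {
    "schema_version": "1.0",
    "name": "my-model",
    "models": [{"actgan": {"data_source": "__tmp__"}}],
}

print(check_config(config))
print(json.dumps(config, indent=2))
```

An empty list from check_config only means the configuration has the shape described here. It does not guarantee that Gretel will accept the configuration, since each model defines further keys of its own.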

Each Gretel model has additional keys within the model_id object and configuration parameters specific to that model. For details on the configuration parameters for each model, see that model's page.
