Model Configurations
Gretel configurations are declarative objects that specify how a model should be created. Configurations can be authored in YAML or JSON.
All Gretel models follow the same high-level configuration file structure. All configurations include schema_version and name keys, as well as a models array whose entries are keyed by a [model_id]. Within the [model_id] object, all model configurations have a data_source key.
[model_id] is replaced with the type of model you wish to train (e.g. navigator_ft, gpt_x, actgan, tabular_dp, or transform_v2).
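As a rough illustration of the structure described above, the following Python sketch assembles the common skeleton as a dictionary and prints it as JSON. The helper name build_config, the example name value, and the schema_version value "1.0" are assumptions made for illustration; check the model-specific pages for the values and additional keys your model requires.

```python
import json


def build_config(model_id: str, data_source: str, name: str) -> dict:
    """Assemble the skeleton shared by all model configurations.

    Hypothetical helper for illustration only; it is not part of any Gretel SDK.
    """
    return {
        "schema_version": "1.0",  # assumed value, verify against current docs
        "name": name,
        "models": [
            # Each entry in the models array is keyed by the model_id.
            {model_id: {"data_source": data_source}}
        ],
    }


config = build_config("actgan", "__tmp__", "my-actgan-model")
print(json.dumps(config, indent=2))
```

The printed JSON can be saved as a configuration file; model-specific parameters would be added inside the object keyed by the model_id.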
The mapping between Gretel models and configuration model_id values is:
Tabular Fine-Tuning: navigator_ft
Text Fine-Tuning: gpt_x
Tabular GAN: actgan
Tabular DP: tabular_dp
Transform: transform_v2
data_source must point to a valid and accessible file in CSV, JSON, or JSONL format.
Supported storage locations include S3, GCS, Azure Blob Storage, HDFS, WebHDFS, HTTP, HTTPS, SFTP, and the local filesystem.
Note: Some models have specific data source format requirements.
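A quick local check of the file format can catch mistakes before a configuration is submitted. The sketch below is a hypothetical helper (has_supported_format is not a Gretel function) that looks only at the file extension of a path or URL. It does not verify that the file is accessible, does not handle compressed files, and does not cover model-specific requirements.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions matching the formats listed above (CSV, JSON, JSONL).
SUPPORTED_SUFFIXES = {".csv", ".json", ".jsonl"}


def has_supported_format(data_source: str) -> bool:
    """Return True if the path or URL ends in .csv, .json, or .jsonl."""
    # urlparse separates the scheme and host from the path, so local
    # paths and remote URLs are handled the same way.
    path = urlparse(data_source).path
    return PurePosixPath(path).suffix.lower() in SUPPORTED_SUFFIXES


print(has_supported_format("s3://my-bucket/train.csv"))
print(has_supported_format("https://example.com/data.parquet"))
```

The bucket name and URLs here are placeholders chosen for the example.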
data_source: __tmp__ can be used when the source file is specified elsewhere using:
--in_data parameter via CLI,
parameter via SDK,
dataset button via Console.
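Conceptually, __tmp__ acts as a placeholder that is swapped for the real data source at submission time. The sketch below shows that idea on a plain dictionary. The helper resolve_data_source and the example file name are hypothetical, and this is not a description of how the CLI, SDK, or Console actually perform the substitution.

```python
import copy

PLACEHOLDER = "__tmp__"


def resolve_data_source(config: dict, actual_source: str) -> dict:
    """Return a copy of config with placeholder data_source values replaced."""
    resolved = copy.deepcopy(config)  # leave the caller's config untouched
    for model in resolved["models"]:
        # Each entry in the models array is keyed by its model_id.
        for settings in model.values():
            if settings.get("data_source") == PLACEHOLDER:
                settings["data_source"] = actual_source
    return resolved


template = {
    "schema_version": "1.0",  # assumed value for illustration
    "name": "example",
    "models": [{"actgan": {"data_source": PLACEHOLDER}}],
}
resolved = resolve_data_source(template, "training_data.csv")
print(resolved["models"][0]["actgan"]["data_source"])
```

Keeping the placeholder in a shared template lets the same configuration be reused with different input files.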
Each Gretel model has different additional keys within the model_id object and unique configuration parameters specific to that model. For details on the configuration parameters for each model, see the specific model page: