models
object to our configuration. The models object takes a list of keyed objects, each named by the type of model we wish to train. For a synthetic data model, we use synthetics. The minimal configuration needs only a data_source entry under synthetics.

data_source
can be any valid URL that is accessible by the client. By default, no extra objects or parameters are required. Gretel uses default settings that work well for a variety of datasets.

The params object contains key-value pairs for the parameters used to train a machine learning model on the data_source. By default, Gretel starts with 100 epochs and automatically stops training when attributes like model loss and accuracy stop improving.

The field_delimiter
parameter is a single character that serves as the delimiter between fields in your training data. If this value is null (the default), Gretel automatically detects and uses a delimiter.

in_set_count: This validator accumulates all of the unique values in a field. If the cardinality of the field's values is less than or equal to the setting, the validator requires generated values to be in the set of training values. If the cardinality is greater than the setting, the validator has no effect.

pattern_count: This validator builds a pattern mask for each value in a field. Alphanumeric characters are masked, while other special characters are retained. For example, 867-5309 is masked to ddd-dddd, and f32-sk-39d is masked to add-aa-dda, where a represents any A-Za-z character and d represents any digit. Much like the previous validator, if the cardinality of learned patterns is less than or equal to the setting, patterns are enforced during data generation. If the unique pattern count is above the setting, enforcement is skipped.

use_numeric_iqr: When set to true, this enables IQR-based validation for all numeric fields. When enabled, it calculates the interquartile range (IQR) of the values in each numeric field and uses that range to validate generated values.

The number of records to generate is set by the generate.num_records
key in the synthetic config. The default value is 5000 records.

You can adjust the generate.max_invalid
setting as well. This is the number of records that can fail the Data Validation process before generation is stopped. This setting helps govern a long-running data generation process where the model is not producing optimal data. The default value is five times the number of records being generated.

You can also set the generate.num_records value to null. If this value is null, the number of records generated will be the lesser of 5000 or the total number of records in the training data.
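
As a rough illustration of the settings described above, the Python sketch below mirrors the configuration as a dictionary and mimics three of the documented behaviors: the pattern mask, the in-set check, and the generation defaults. It is not Gretel's implementation. The nesting of validators and generate under synthetics, the epochs key name, the placeholder URL, the threshold values, and the helper names (pattern_mask, in_set_validator, resolve_generate) are all assumptions made for illustration.

```python
import string

# Assumed layout: only the key names data_source, params, field_delimiter,
# in_set_count, pattern_count, use_numeric_iqr, generate.num_records and
# generate.max_invalid come from the text; the nesting is a guess.
config = {
    "models": [
        {
            "synthetics": {
                "data_source": "https://example.com/training.csv",  # placeholder
                "params": {"epochs": 100, "field_delimiter": None},  # "epochs" is assumed
                "validators": {
                    "in_set_count": 10,   # illustrative threshold, not a known default
                    "pattern_count": 10,  # illustrative threshold, not a known default
                    "use_numeric_iqr": True,
                },
                "generate": {"num_records": None},  # null: size follows training data
            }
        }
    ]
}


def pattern_mask(value: str) -> str:
    """Mask digits as 'd' and A-Za-z letters as 'a'; keep other characters."""
    masked = []
    for ch in value:
        if ch in string.digits:
            masked.append("d")
        elif ch in string.ascii_letters:
            masked.append("a")
        else:
            masked.append(ch)
    return "".join(masked)


def in_set_validator(training_values, threshold):
    """Return a check that enforces set membership only for low-cardinality fields."""
    unique = set(training_values)
    if len(unique) <= threshold:
        return lambda generated: generated in unique
    return lambda generated: True  # cardinality above the setting: no effect


def resolve_generate(generate: dict, training_rows: int) -> dict:
    """Apply the documented defaults for num_records and max_invalid."""
    num_records = generate.get("num_records", 5000)  # default when the key is absent
    if num_records is None:
        num_records = min(5000, training_rows)  # lesser of 5000 or training size
    max_invalid = generate.get("max_invalid", 5 * num_records)  # five times num_records
    return {"num_records": num_records, "max_invalid": max_invalid}


generate_settings = config["models"][0]["synthetics"]["generate"]
print(pattern_mask("867-5309"))    # ddd-dddd
print(pattern_mask("f32-sk-39d"))  # add-aa-dda
print(resolve_generate(generate_settings, training_rows=1200))
```

With num_records set to null and a 1200-row training set, the helper resolves to 1200 records and allows up to 6000 invalid records before stopping.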