Gretel Tabular
The gretel_tabular action transforms multiple tables while preserving referential integrity between them. It also lets you specify different model configs for different tables, or instruct Gretel to find optimal model configs for your data via Gretel Tuner.
Inputs
project_id
The project to create the model in.
train
(Training details, see following fields)
train.dataset
Data to use for training, including relationships between tables (if applicable). This should be a reference to a dataset output from a previous action.
train.model_config
A YAML object that accepts one of three shapes (detailed below): 1) a complete Gretel model config; 2) a reference to a blueprint or config location (via from); 3) an autotune configuration.
train.skip_tables
(List of tables to pass through unaltered to outputs, see following fields)
train.skip_tables.table
The name of a table to skip, i.e. omit from model training and pass through unaltered.
train.table_specific_configs
(List of table-specific training details, see following fields)
train.table_specific_configs.tables
A list of table names to which the other fields in this object apply.
train.table_specific_configs.model_config
An alternative to the global default train.model_config value defined above.
run
(Run details, see following fields)
run.encode_keys
(Transform models only.) Whether to transform primary and foreign key columns. Defaults to false.
Outputs
dataset
A dataset object containing the outputs from the models created by this action.
Example Configs
Transform a dataset by applying a consistent model to all tables in the dataset. Note that the model config can be specified as a full object...
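A minimal sketch of the full-object form, built only from the inputs listed above. The project ID, the dataset reference syntax, the upstream action name (extract), and the inline model config (a transform_v2 model with its settings left out) are illustrative assumptions rather than values taken from this page.

```yaml
# Sketch only: all values below are hypothetical placeholders.
project_id: proj_example                  # hypothetical project ID
train:
  dataset: "{outputs.extract.dataset}"    # assumed reference to the dataset output of an earlier action named "extract"
  model_config:                           # shape 1: a complete model config given inline
    schema_version: "1.0"
    models:
      - transform_v2: {}                  # assumed model type; model-specific settings omitted here
run:
  encode_keys: false                      # documented default, shown explicitly
```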
...or a reference to a blueprint template can be provided via from:
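As a sketch of the reference form, the same inputs can point at a blueprint instead of spelling out the model config. The blueprint name, project ID, and dataset reference below are assumptions for illustration.

```yaml
# Sketch only: values are hypothetical placeholders.
project_id: proj_example                  # hypothetical project ID
train:
  dataset: "{outputs.extract.dataset}"    # assumed reference to an earlier action's dataset output
  model_config:
    from: "transform/default"             # shape 2: blueprint reference (name assumed for illustration)
```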
You can apply different model configs to different tables by supplying table-specific configs:
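A sketch of table-specific configs, assuming a dataset with hypothetical tables named users, orders, and events. The blueprint names are placeholders. Tables not listed in any entry fall back to the global train.model_config.

```yaml
# Sketch only: table and blueprint names are hypothetical.
project_id: proj_example
train:
  dataset: "{outputs.extract.dataset}"    # assumed reference syntax
  model_config:
    from: "transform/default"             # global default for every table not listed below
  table_specific_configs:
    - tables: ["users", "orders"]         # tables this entry applies to
      model_config:
        from: "example/alternative-blueprint"   # placeholder blueprint name
    - tables: ["events"]
      model_config:
        from: "example/another-blueprint"       # placeholder blueprint name
```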
To pass a subset of tables through unaltered by the model (e.g. for static reference data), specify tables to skip:
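A sketch of skipping tables, assuming two hypothetical reference tables named countries and currencies. Each list item uses the table field described under Inputs.

```yaml
# Sketch only: table and blueprint names are hypothetical.
project_id: proj_example
train:
  dataset: "{outputs.extract.dataset}"    # assumed reference syntax
  model_config:
    from: "transform/default"             # assumed blueprint name
  skip_tables:
    - table: "countries"                  # omitted from training, passed through unaltered
    - table: "currencies"
```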
Autotune (Gretel Tuner)
Instead of providing a specific model config, you can instruct the gretel_tabular action to run trials to identify the best model config for each table. This is done via the autotune option inside a model_config field, either at the root train level to apply to all tables, or inside a table_specific_configs entry to apply to only a subset of tables.
Autotune objects accept the following fields:
enabled
This boolean field must be explicitly set to true to enable config tuning.
trials_per_table
Optionally specify the number of trials to run for each table. Defaults to 4.
metric
The metric to optimize for. Defaults to synthetic_data_quality_score; also accepts field_correlation_stability, field_distribution_stability, and principal_component_stability.
tuner_config
The specific Gretel Tuner config to use. Like model_config, this accepts either a full configuration object or a reference to a blueprint via from.
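To illustrate how these fields fit together, the sketch below enables tuning for only a subset of tables through a table_specific_configs entry, with every optional field written out at its documented default. The table names, the global blueprint name, and the exact reference string for the default Tuner blueprint are assumptions.

```yaml
# Sketch only: table names and the global blueprint are hypothetical.
train:
  dataset: "{outputs.extract.dataset}"    # assumed reference syntax
  model_config:
    from: "example/default-blueprint"     # placeholder global config for untuned tables
  table_specific_configs:
    - tables: ["orders"]                  # only these tables are tuned
      model_config:
        autotune:
          enabled: true                   # must be set explicitly
          trials_per_table: 4             # documented default
          metric: synthetic_data_quality_score   # documented default
          tuner_config:
            from: "tuner/tabular-actgan"  # documented default blueprint, reference string assumed
```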
Example configs with Autotune
Using all autotune defaults:
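A minimal sketch, assuming the same hypothetical dataset reference as a placeholder. Only enabled is set, so the number of trials, the metric, and the Tuner config all fall back to their defaults.

```yaml
# Sketch only: the dataset reference is an assumed placeholder.
train:
  dataset: "{outputs.extract.dataset}"
  model_config:
    autotune:
      enabled: true    # everything else uses the defaults described above
```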
By default, gretel_tabular uses the tuner/tabular-actgan blueprint Tuner config, but a different blueprint can be referenced...
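A sketch of overriding the defaults, including a blueprint reference under tuner_config. The blueprint name here is a placeholder, not a known Gretel blueprint, and the trial count is an arbitrary illustrative value.

```yaml
# Sketch only: blueprint name and trial count are hypothetical.
train:
  dataset: "{outputs.extract.dataset}"    # assumed reference syntax
  model_config:
    autotune:
      enabled: true
      trials_per_table: 8                         # overrides the default of 4
      metric: field_distribution_stability        # one of the accepted metrics
      tuner_config:
        from: "tuner/example-other-blueprint"     # placeholder blueprint name
```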
...or a Tuner config can be spelled out explicitly as a full configuration object under tuner_config.