Gretel Text Fine-Tuning
Model type: Generative pre-trained transformer for text generation
Gretel Text Fine-Tuning simplifies the process of training popular Large Language Models (LLMs) to generate synthetic text. It supports differentially private training to protect sensitive data and includes automated quality reporting with Gretel's Text Synthetic Quality Score (SQS). You can use it to create labeled examples for training or testing other machine learning models, fine-tune the model on your own data, or prompt it with examples for inference.
Step configuration
The config below shows all the available training and generation parameters for Text Fine-Tuning. It is best to use Text Fine-Tuning on datasets with only a single column of free text.
If your training dataset is in a multi-column format, you MUST set the column_name parameter when using Text Fine-Tuning.
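A minimal sketch of how these parameters could be laid out is shown below. The parameter names and default values come from the reference that follows; the top-level grouping into train and generate blocks is an illustrative assumption, so check a generated starter config for the exact step layout.

```yaml
# Illustrative layout only: parameter names and defaults match the reference below,
# but the train/generate grouping is an assumption.
train:
  pretrained_model: meta-llama/Llama-3.1-8B-Instruct
  column_name: null              # required when the dataset has multiple columns
  params:
    batch_size: 4
    epochs: 3
    weight_decay: 0.01
    warmup_steps: 100
    lr_scheduler: linear
    learning_rate: 0.0002
    max_tokens: 512
    validation: null
    gradient_accumulation_steps: 8
    lora_r: 8
    lora_alpha_over_r: 1
    target_modules: null
  privacy_params:
    dp: false
    epsilon: 8
    entity_column_name: null
generate:
  num_records: 80
  maximum_text_length: 100
```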
Train parameters
pretrained_model
(optional, defaults to meta-llama/Llama-3.1-8B-Instruct) - Base model used for fine-tuning. These are the models currently supported:
gretelai/gpt-auto - defaults to meta-llama/Meta-Llama-3-8B-Instruct
mistralai/Mistral-7B-Instruct-v0.2
TinyLlama/TinyLlama-1.1B-Chat-v1.0
meta-llama/Llama-3.1-8B-Instruct
column_name
(optional) - Column containing the text to train on when the input dataset has multiple columns. This parameter is required for multi-column input.
params
- Parameters that control the model training process:
batch_size
(optional, default 4) - Batch size per GPU/TPU/CPU. Lower this value if you run out of memory.
epochs
(optional, default 3) - Number of training epochs.
weight_decay
(optional, default 0.01) - Weight decay for the AdamW optimizer. 0 to 1.
warmup_steps
(optional, default 100) - Warmup steps for the linear learning-rate increase.
lr_scheduler
(optional, default linear) - Learning rate scheduler type.
learning_rate
(optional, default 0.0002) - Initial learning rate for the AdamW optimizer.
max_tokens
(optional, default 512) - Maximum input length in tokens.
validation
(optional) - Validation set size. An integer is interpreted as the absolute number of samples.
gradient_accumulation_steps
(optional, default 8) - Number of update steps to accumulate gradients for before performing a backward/update pass. This technique increases the effective batch size that will fit into GPU memory; with the defaults, the effective batch size is 4 × 8 = 32 per device.
lora_r
(optional, default 8) - Rank of the matrices that are updated. A lower value means fewer trainable model parameters.
lora_alpha_over_r
(optional, default 1) - The ratio of the LoRA scaling factor (alpha) to the LoRA rank. Empirically, values of 0.5, 1, or 2 work well (see the snippet after this list).
target_modules
(optional, default null) - List of module names, or a regex expression matching module names, to replace with LoRA. When unspecified, modules are chosen according to the model architecture (e.g. Mistral, Llama).
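The effective LoRA scaling factor is the product of lora_alpha_over_r and lora_r. A hedged example of overriding just the LoRA settings, using the params nesting documented above (the chosen values are illustrative, not recommendations):

```yaml
params:
  lora_r: 8              # rank of the LoRA update matrices
  lora_alpha_over_r: 2   # scaling factor alpha = 2 * 8 = 16
  target_modules: null   # let the model architecture determine which modules to adapt
```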
privacy_params
- To fine-tune on a privacy-sensitive data source with differential privacy, use the parameters in this section.
dp
(optional, default false) - Flag to turn on differentially private fine-tuning when a data source is provided.
epsilon
(optional, default 8) - Privacy loss parameter for differential privacy. Specify the maximum value available for model fine-tuning.
entity_column_name
(optional, default null) - Column representing the unit of privacy, e.g. name or id. When null, record-level differential privacy is maintained, i.e. the final model does not change by much when the input dataset changes by one record. When a column such as user_id is specified, user-level differential privacy is maintained.
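To fine-tune with differential privacy at the user level, the relevant overrides might look like the following; the privacy_params nesting matches the reference above, and user_id stands in for whatever column identifies your unit of privacy:

```yaml
privacy_params:
  dp: true                      # enable differentially private fine-tuning
  epsilon: 8                    # maximum privacy loss budget for fine-tuning
  entity_column_name: user_id   # example column: each user is one unit of privacy
```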
Generate parameters
num_records
(optional, default 80) - Number of output records.
maximum_text_length
(optional, default 100) - Maximum number of tokens per output record.
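As an example, a generation step that produces more and longer records could override both values (the generate grouping is the same assumption as in the sketch above):

```yaml
generate:
  num_records: 500            # example value; the default is 80
  maximum_text_length: 200    # example value; the default is 100
```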
Limitations and Biases
Large-scale language models such as those fine-tuned with Gretel Text Fine-Tuning may produce untrue and/or offensive content without warning. We recommend having a human curate or filter the outputs before releasing them, both to censor undesirable content and to improve the quality of the results.