Gretel Text Fine-Tuning
Model type: Generative pre-trained transformer for text generation
Gretel Text Fine-Tuning simplifies the process of training popular Large Language Models (LLMs) to generate synthetic text. It offers support for differentially private training, ensuring data privacy, and includes automated quality reporting with Gretel's Text Synthetic Quality Score (SQS). This allows you to create labeled examples to train or test other machine learning models, fine-tune the model on your data, or prompt it with examples for inference.
Step configuration
The config below shows all the available training and generation parameters for Text Fine-Tuning. It is best to use Text Fine-Tuning on datasets with only a single column of free text.
If your training dataset is in a multi-column format, you MUST set column_name when using Text Fine-Tuning (see the sketch after the config below).
schema_version: "1.0"
name: default
task:
  name: text_ft
  config:
    train:
      pretrained_model: "gretelai/gpt-auto"
      prompt_template: null
      column_name: null
      validation: null
      params:
        batch_size: 4
        epochs: null
        steps: 750
        weight_decay: 0.01
        warmup_steps: 100
        lr_scheduler: "linear"
        learning_rate: 0.0001
        max_tokens: 512
        gradient_accumulation_steps: 8
      peft_params:
        lora_r: 8
        lora_alpha_over_r: 1.0
        target_modules: null
      privacy_params:
        dp: false
        epsilon: 8.0
        delta: "auto"
        per_sample_max_grad_norm: 1.0
        entity_column_name: null
    generate:
      num_records: 80
      maximum_text_length: 100
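For multi-column datasets, only the column named by column_name is used for training. Below is a minimal sketch of pointing the step at a free-text column, assuming a hypothetical reviews.csv file with a review_text column (both names are for illustration only):

# Sketch: select the free-text column of a multi-column CSV for training.
# "reviews.csv" and "review_text" are hypothetical names used for illustration.
import pandas as pd

df = pd.read_csv("reviews.csv")      # e.g. columns: review_id, rating, review_text
assert "review_text" in df.columns   # the column Text Fine-Tuning should train on

# Matching override in the step config above:
#   train:
#     column_name: "review_text"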
Train parameters
pretrained_model (optional, defaults to meta-llama/Llama-3.1-8B-Instruct) - Base model used for fine-tuning. These are the models currently supported:
gretelai/gpt-auto - defaults to meta-llama/Meta-Llama-3-8B-Instruct
mistralai/Mistral-7B-Instruct-v0.2
TinyLlama/TinyLlama-1.1B-Chat-v1.0
meta-llama/Llama-3.1-8B-Instruct
column_name (optional) - Column containing the text to train on. This parameter is required when the training dataset has multiple columns.
params - Parameters that control the model training process:
batch_size (optional, default 4) - Batch size per GPU/TPU/CPU. Lower this value if you run out of memory.
epochs (optional, default 3) - Number of training epochs.
weight_decay (optional, default 0.01) - Weight decay for the AdamW optimizer, between 0 and 1.
warmup_steps (optional, default 100) - Number of warmup steps for the linear learning-rate increase.
lr_scheduler (optional, default linear) - Learning rate scheduler type.
learning_rate (optional, default 0.0002) - Initial learning rate for the AdamW optimizer.
max_tokens (optional, default 512) - Maximum input length in tokens.
validation (optional) - Validation set size, given as an absolute number of samples.
gradient_accumulation_steps (optional, default 8) - Number of update steps to accumulate gradients for before performing a backward/update pass. This technique increases the effective batch size that will fit into GPU memory (see the sketch after this list).
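Gradient accumulation lets a larger effective batch fit in limited GPU memory: the optimizer steps only after several small forward/backward passes. A rough sketch of the arithmetic with the defaults above (the single-device count is our assumption):

# Sketch: effective batch size implied by the default training parameters above.
batch_size = 4                    # samples per device per forward/backward pass
gradient_accumulation_steps = 8   # passes accumulated before each optimizer step
num_devices = 1                   # assumption: a single GPU

effective_batch_size = batch_size * gradient_accumulation_steps * num_devices
print(effective_batch_size)       # 32 samples contribute to each weight update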
peft_params - Gretel Text Fine-Tuning uses Low-Rank Adaptation (LoRA), which makes fine-tuning more efficient by drastically reducing the number of trainable parameters: weight updates are applied to smaller matrices obtained through low-rank decomposition (see the sketch after this list).
lora_r (optional, default 8) - Rank of the matrices that are updated. A lower value means fewer trainable model parameters.
lora_alpha_over_r (optional, default 1) - The ratio of the LoRA scaling factor (alpha) to the LoRA rank. Empirically, values of 0.5, 1, or 2 work well.
target_modules (optional, default null) - List of module names, or a regex expression matching the module names, to replace with LoRA. When unspecified, modules are chosen according to the model architecture (e.g. Mistral, Llama).
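To see why a small lora_r keeps fine-tuning cheap, count what LoRA trains for one weight matrix: instead of the full d_out x d_in layer, it learns two factors B (d_out x r) and A (r x d_in) and scales their product by alpha / r, i.e. by lora_alpha_over_r. A back-of-the-envelope sketch with an illustrative layer shape (our assumption, not a Gretel default):

# Sketch: trainable parameters LoRA adds to a single linear layer.
d_out, d_in = 4096, 4096               # illustrative projection shape, not a Gretel default
lora_r = 8                             # rank of the update matrices
lora_alpha_over_r = 1.0                # scaling alpha / r applied to the low-rank update B @ A

full_params = d_out * d_in             # 16,777,216 weights if the layer itself were trained
lora_params = lora_r * (d_out + d_in)  # 65,536 weights in B (d_out x r) and A (r x d_in)
print(f"{lora_params / full_params:.2%}")   # ~0.39% of the full layer is trainable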
privacy_params - To fine-tune on a privacy-sensitive data source with differential privacy, use the parameters in this section.
dp (optional, default false) - Flag to turn on differentially private fine-tuning when a data source is provided.
epsilon (optional, default 8) - Privacy loss parameter for differential privacy. Specify the maximum value available for model fine-tuning.
delta (optional, default auto) - Probability of accidentally leaking information. It is typically set to be much less than 1/n, where n is the number of training records. By default, delta is automatically set based on the characteristics of your dataset to be less than or equal to 1/n^1.2. You can also choose your own value for delta. Decreasing delta (for example to 1/n^2, which corresponds to delta: 0.000004 for a 500-record training dataset) provides even stronger privacy guarantees, while increasing it may improve synthetic data quality (see the sketch after this list).
entity_column_name (optional, default null) - Column representing the unit of privacy, e.g. name or id. When null, record-level differential privacy is maintained, i.e. the final model does not change by much when the input dataset changes by one record. When a column such as user_id is specified, user-level differential privacy is maintained.
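As a concrete check of the delta guidance above, this sketch compares the automatic upper bound 1/n^1.2 with the stricter 1/n^2 choice for the 500-record example:

# Sketch: delta values implied by the guidance above for a 500-record training set.
n = 500                        # number of training records (hypothetical)

auto_upper_bound = 1 / n**1.2  # "auto" keeps delta at or below this value (~5.8e-4 here)
stricter_choice = 1 / n**2     # 0.000004, the stronger guarantee mentioned above

print(auto_upper_bound, stricter_choice)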
Generate parameters
num_records (optional, default 80) - Number of output records.
maximum_text_length (optional, default 100) - Maximum number of tokens per output record.
Limitations and Biases
Large-scale language models, including those produced with Gretel Text Fine-Tuning, may generate untrue and/or offensive content without warning. We recommend having a human curate or filter the outputs before releasing them, both to censor undesirable content and to improve the quality of the results.