Master the complete spectrum of fine-tuning approaches - from full parameter updates to efficient adaptation. Learn when to use each method, understand the trade-offs, and make informed decisions for your specific use case.
In full fine-tuning, every parameter in the model is updated using gradient descent, which means weights, gradients, and optimizer state must all be held in memory for the entire network.
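A minimal, runnable PyTorch sketch of that loop (the toy model and synthetic batch are stand-ins for a real LM and dataloader):

```python
import torch
import torch.nn as nn
from torch.optim import AdamW

# Toy stand-in for a language model; the loop is identical for a real LM,
# just with a causal-LM loss instead of this reconstruction objective.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 128))
optimizer = AdamW(model.parameters(), lr=2e-5)  # every parameter is trainable
loss_fn = nn.MSELoss()

model.train()
for step in range(100):               # stands in for iterating a dataloader
    x = torch.randn(32, 128)          # synthetic batch
    loss = loss_fn(model(x), x)
    optimizer.zero_grad()
    loss.backward()                   # gradients flow to *all* parameters
    optimizer.step()
```

Note that AdamW keeps two state tensors per parameter, so with gradients included the training footprint is roughly four times the weights alone at matching precision.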
LoRA keeps the pretrained weight W frozen and constrains the update to a low-rank subspace, dramatically reducing trainable parameters: the adapted weight is W' = W + B·A, where B is d×r, A is r×k, and the rank r is far smaller than d and k, so only A and B are trained.
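A from-scratch sketch of a LoRA-wrapped linear layer (illustrative only; in practice a library such as Hugging Face PEFT handles this):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Linear layer with a frozen base weight plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, r: int = 32, alpha: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False            # freeze pretrained W (and bias)
        # Standard LoRA init: B starts at zero so training begins exactly at
        # the pretrained model; A gets a small random init.
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # W x + scaling * (B A) x; only A and B receive gradients.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

layer = LoRALinear(nn.Linear(768, 768), r=32)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 49_152 trainable vs. ~590_000 in the full weight matrix
```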
Layer freezing offers a middle ground between full fine-tuning and LoRA - selectively updating only certain parts of the model. Common strategies (a code sketch follows the table):
Strategy | What to Freeze | What to Train | Best For | Memory Savings |
---|---|---|---|---|
Conservative | Embeddings + Early layers | Late layers + Output | Similar domain tasks | 30-50% |
Attention-Only | All FFN layers | All attention layers | Task-specific adaptation | 60-70% |
Aggressive | First 75% of layers | Final 25% layers | Fine-grained control | 70-85% |
Selective | Task-dependent analysis | Critical layers only | Expert optimization | Variable |
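As a concrete example, the sketch below implements the "Aggressive" row - freeze the first 75% of blocks and train the rest (the block layout is a toy; adapt it to your architecture):

```python
import torch.nn as nn

def freeze_early_layers(blocks: nn.ModuleList, freeze_ratio: float = 0.75) -> None:
    """Freeze the first `freeze_ratio` of blocks; only late blocks stay trainable."""
    cutoff = int(len(blocks) * freeze_ratio)
    for i, block in enumerate(blocks):
        for p in block.parameters():
            p.requires_grad = i >= cutoff

# Toy 8-block stack: blocks 0-5 end up frozen, 6-7 trainable.
blocks = nn.ModuleList(nn.Linear(64, 64) for _ in range(8))
freeze_early_layers(blocks)
print([all(p.requires_grad for p in b.parameters()) for b in blocks])
# [False, False, False, False, False, False, True, True]
```

When you then build the optimizer, pass only `filter(lambda p: p.requires_grad, model.parameters())` so frozen weights carry no optimizer state; combined with the skipped gradients, that is where the savings in the table come from.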
When models learn new tasks, they can forget previous knowledge - catastrophic forgetting. The severity tends to grow with how much of the model you update, which is one more argument for the lighter-touch methods when they suffice. A simple way to quantify it is to compare loss on a held-out probe set from the original domain before and after fine-tuning, as in the sketch below.
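A minimal measurement helper (the usage names below are hypothetical placeholders):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def probe_loss(model: nn.Module, probe_batches, loss_fn) -> float:
    """Average loss on a fixed probe set from the original domain;
    an increase after fine-tuning is a simple proxy for forgetting."""
    model.eval()
    losses = [loss_fn(model(x), y).item() for x, y in probe_batches]
    return sum(losses) / len(losses)

# Hypothetical usage: `old_task_probe` holds held-out (x, y) pairs from the
# data the model was originally trained on.
# before = probe_loss(model, old_task_probe, loss_fn)
# ...fine-tune on the new task...
# after = probe_loss(model, old_task_probe, loss_fn)
# print(f"forgetting delta: {after - before:.4f}")  # larger = more forgetting
```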
Cost follows the same pattern. Rough estimated cost (USD) for a single fine-tuning run:
Model | Full Fine-tuning | Layer Freezing | LoRA (r=32) | LoRA Savings vs. Full |
---|---|---|---|---|
LLaMA-2 7B | $120-200 | $60-100 | $20-40 | 80-85% |
LLaMA-2 13B | $200-350 | $100-180 | $40-70 | 80-85% |
LLaMA-2 70B | $800-1500 | $400-800 | $150-300 | 80-85% |
The basic methods can also be combined. Common hybrid techniques:
Technique | Approach | Memory | Performance | Complexity |
---|---|---|---|---|
Staged Training | LoRA → Full fine-tuning | Medium | Excellent | Medium |
Adaptive Freezing | Gradual unfreezing during training | Low→High | Excellent | High |
Mixed Precision LoRA | Different ranks for different layers | Very Low | Good | Medium |
Dynamic LoRA | Rank adaptation during training | Low | Very Good | High |
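As one concrete hybrid, here is a sketch of adaptive freezing via gradual unfreezing - start with only the top block trainable and unfreeze one more every couple of epochs (the schedule is illustrative, not a prescribed recipe):

```python
import torch.nn as nn

def unfreeze_top_k(blocks: nn.ModuleList, k: int) -> None:
    """Keep only the last k blocks trainable; freeze the rest."""
    for i, block in enumerate(blocks):
        for p in block.parameters():
            p.requires_grad = i >= len(blocks) - k

blocks = nn.ModuleList(nn.Linear(64, 64) for _ in range(8))
for epoch in range(8):
    unfreeze_top_k(blocks, k=1 + epoch // 2)  # one extra block every 2 epochs
    # ...train one epoch here; rebuild the optimizer (or add a param group)
    # after each unfreeze so newly trainable weights actually get updated...
```

This is what the "Low→High" memory entry refers to: early epochs are cheap, and cost grows only as more blocks open up.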