Master the spectrum of fine-tuning approaches, from full parameter updates to parameter-efficient adaptation. Learn when to use each method, understand the trade-offs, and make an informed choice for your specific use case.
In full fine-tuning, every parameter in the model is updated using gradient descent:
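As a minimal sketch (using a toy PyTorch model as a stand-in for a pretrained network, not any specific LLM), one full fine-tuning step computes gradients for, and updates, every parameter:

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained model: every parameter is trainable.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# One full fine-tuning step: forward, backward, update all parameters.
x = torch.randn(8, 16)            # batch of 8 examples
y = torch.randint(0, 4, (8,))     # class labels
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()                   # gradients flow to every weight
optimizer.step()
```

Because nothing is frozen, optimizer state (e.g. AdamW's two moment buffers) is kept for every weight, which is what drives the memory and cost numbers discussed below.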
LoRA constrains updates to a low-rank subspace, dramatically reducing the number of trainable parameters:
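A minimal from-scratch illustration (not the `peft` library's implementation): the pretrained weight is frozen, and a trainable low-rank product `B @ A`, scaled by `alpha / r`, is added on top. `B` is initialized to zero so the adapted layer starts out identical to the pretrained one:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (alpha/r) * B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)      # freeze pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable} / {total}")  # 8,192 trainable of 270,848 total
```

For a 512x512 layer at rank 8, only `A` and `B` (8,192 parameters) receive gradients and optimizer state, which is where the memory and cost savings come from.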
Layer freezing offers a middle ground between full fine-tuning and LoRA, selectively updating only certain parts of the model:
| Strategy | What to Freeze | What to Train | Best For | Memory Savings |
|---|---|---|---|---|
| Conservative | Embeddings + Early layers | Late layers + Output | Similar domain tasks | 30-50% |
| Attention-Only | All FFN layers | All attention layers | Task-specific adaptation | 60-70% |
| Aggressive | First 75% of layers | Final 25% of layers | Fine-grained control | 70-85% |
| Selective | Task-dependent analysis | Critical layers only | Expert optimization | Variable |
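The conservative strategy from the table can be sketched as follows, using a toy transformer-style stack (the module names, depth, and 50/50 split point are illustrative assumptions): freeze the embeddings and the early layers, train the late layers and output head:

```python
import torch
import torch.nn as nn

# Toy stand-in: embedding + 8 "transformer" blocks + output head.
class ToyLM(nn.Module):
    def __init__(self, vocab=1000, d=64, n_layers=8):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.layers = nn.ModuleList(nn.Linear(d, d) for _ in range(n_layers))
        self.head = nn.Linear(d, vocab)

model = ToyLM()

# Conservative freezing: embeddings + early layers frozen,
# late layers + output head stay trainable.
for p in model.embed.parameters():
    p.requires_grad_(False)
for layer in model.layers[:4]:          # first half of the stack
    for p in layer.parameters():
        p.requires_grad_(False)

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
```

Frozen parameters need no gradients or optimizer state, so memory savings scale roughly with the fraction of the model you freeze.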
When models learn new tasks, they can forget previously learned knowledge, a failure mode known as catastrophic forgetting. The severity depends on how much of the model you update: methods that leave most weights frozen, such as LoRA and layer freezing, are inherently less prone to it. Updating fewer parameters also reduces training cost; the approximate figures below compare the methods by model size:
| Model | Full Fine-tuning | Layer Freezing | LoRA (r=32) | Savings |
|---|---|---|---|---|
| LLaMA-2 7B | $120-200 | $60-100 | $20-40 | 80-85% |
| LLaMA-2 13B | $200-350 | $100-180 | $40-70 | 80-85% |
| LLaMA-2 70B | $800-1500 | $400-800 | $150-300 | 80-85% |
Beyond the three core methods, several hybrid techniques combine their strengths:

| Technique | Approach | Memory | Performance | Complexity |
|---|---|---|---|---|
| Staged Training | LoRA → Full fine-tuning | Medium | Excellent | Medium |
| Adaptive Freezing | Gradual unfreezing during training | Low→High | Excellent | High |
| Mixed-Rank LoRA | Different ranks for different layers | Very Low | Good | Medium |
| Dynamic LoRA | Rank adaptation during training | Low | Very Good | High |
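As an illustration of the adaptive-freezing row, a minimal gradual-unfreezing schedule (the group split and one-group-per-epoch pacing are arbitrary assumptions for the sketch) starts with only the top of the network trainable and unfreezes one layer group per stage, top-down:

```python
import torch.nn as nn

def gradual_unfreeze(layer_groups, stage):
    """Unfreeze the top `stage` groups; keep the rest frozen.

    layer_groups is ordered bottom (embeddings) to top (output head).
    """
    n = len(layer_groups)
    for i, group in enumerate(layer_groups):
        requires_grad = i >= n - stage      # only the top `stage` groups train
        for p in group.parameters():
            p.requires_grad_(requires_grad)

# Toy model split into groups: embedding, two blocks, output head.
groups = [nn.Embedding(100, 16), nn.Linear(16, 16),
          nn.Linear(16, 16), nn.Linear(16, 100)]

for epoch in range(4):
    gradual_unfreeze(groups, stage=epoch + 1)  # one more group each epoch
    # ... run one epoch of training here ...
```

Memory usage grows as training proceeds (the Low→High column above), since each newly unfrozen group adds gradients and optimizer state.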