A step-by-step mathematical look at how models trained on long contexts perform on much shorter sequences
A common question: As conversations get longer, are RoPE frequencies recalculated?
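To make the question concrete, here is a minimal sketch of the standard RoPE parameterization (function names and the NumPy implementation are illustrative, not tied to any particular library). The point it shows: the inverse frequencies are computed once from the head dimension and base alone, and the sequence length only enters through the position indices.

```python
import numpy as np

def rope_inverse_frequencies(head_dim: int, base: float = 10000.0) -> np.ndarray:
    """Standard RoPE inverse frequencies: theta_i = base^(-2i/d).

    Nothing here depends on the sequence length, so the frequencies
    stay fixed no matter how long the conversation gets.
    """
    i = np.arange(0, head_dim, 2)           # paired dimensions (0, 2, 4, ...)
    return base ** (-i / head_dim)           # shape: (head_dim // 2,)

def rope_angles(seq_len: int, head_dim: int, base: float = 10000.0) -> np.ndarray:
    """Rotation angles m * theta_i for positions m = 0 .. seq_len - 1."""
    inv_freq = rope_inverse_frequencies(head_dim, base)
    positions = np.arange(seq_len)
    return np.outer(positions, inv_freq)     # shape: (seq_len, head_dim // 2)

# The frequencies are identical whether we build angles for 128 or 4096 tokens;
# a longer context only adds rows (larger positions m), it never changes theta_i.
short = rope_angles(seq_len=128, head_dim=64)
long_ = rope_angles(seq_len=4096, head_dim=64)
assert np.allclose(short, long_[:128])
```

So the frequencies themselves are not recalculated as the conversation grows; only the position index applied to them increases.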
We want to understand: If a model is trained on very long contexts, how does it perform on much shorter contexts?
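As a framing for the analysis that follows (assuming the standard RoPE parameterization), the quantity to track is the per-position rotation angle

$$\theta_{m,i} = m \cdot b^{-2i/d}, \qquad i = 0, 1, \dots, \tfrac{d}{2} - 1,$$

where $m$ is the token position, $d$ the head dimension, and $b$ the base (commonly $10000$). A short context only ever uses small values of $m$, i.e. a subset of the angles encountered during long-context training.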