⚡ Complete Attention Mechanism Tutorial
Interactive step-by-step walkthrough of how the Q, K, and V matrices work together in transformer attention
🔧 Interactive Configuration
Example Text:
The cat sat on (4 tokens)
Hello world how are you (5 tokens)
AI models process text efficiently (5 tokens)
Transformers use attention mechanisms (4 tokens)
Head Dimension:
4 (simplified)
8
16
64 (realistic)
Show Step:
All Steps
1. Q, K, V Creation
2. Q × K^T (Attention Scores)
3. Softmax (Probabilities)
4. Attention × V (Final Output)
⚡ Run Complete Attention Process
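Before running the interactive process, here is a minimal NumPy sketch of Step 1 (Q, K, V creation) under the default configuration above: the 4-token example "The cat sat on" and head dimension 4. The embeddings and the projection matrices W_Q, W_K, W_V are random stand-ins for what a trained transformer would have learned.

```python
import numpy as np

rng = np.random.default_rng(0)

tokens = ["The", "cat", "sat", "on"]
d_model, d_k = 8, 4                      # model width and head dimension (assumed values)

X = rng.normal(size=(len(tokens), d_model))   # token embeddings, one row per token

W_Q = rng.normal(size=(d_model, d_k))    # learned query projection (here: random stand-in)
W_K = rng.normal(size=(d_model, d_k))    # learned key projection
W_V = rng.normal(size=(d_model, d_k))    # learned value projection

Q = X @ W_Q   # queries: what each token is looking for    -> shape (4, 4)
K = X @ W_K   # keys:    what each token offers to match   -> shape (4, 4)
V = X @ W_V   # values:  the content that gets mixed       -> shape (4, 4)

print(Q.shape, K.shape, V.shape)   # (4, 4) (4, 4) (4, 4)
```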
🎯 Understanding Each Step
The Complete Attention Formula
Attention(Q, K, V) = softmax(Q × K^T / √d_k) × V
Step 1: Q × K^T → Raw attention scores
Step 2: / √d_k → Scale for stability
Step 3: softmax() → Convert to probabilities
Step 4: × V → Apply to actual content
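As a reference, the whole formula can be traced in a few lines of NumPy, following the same four sub-steps. Q, K, and V below are random placeholders for 4 tokens with head dimension 4 (the simplified default); in a real model they come from the learned projections of Step 1 above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_k = 4, 4
Q = rng.normal(size=(n_tokens, d_k))
K = rng.normal(size=(n_tokens, d_k))
V = rng.normal(size=(n_tokens, d_k))

# Step 1: Q x K^T -> raw attention scores, shape (n_tokens, n_tokens)
scores = Q @ K.T

# Step 2: / sqrt(d_k) -> scale so the softmax stays in a stable range
scaled = scores / np.sqrt(d_k)

# Step 3: softmax() -> each row becomes a probability distribution over tokens
weights = np.exp(scaled - scaled.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# Step 4: x V -> mix the value vectors according to those probabilities
output = weights @ V

print(weights.sum(axis=-1))   # every row sums to 1.0
print(output.shape)           # (4, 4): one output vector per token
```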
📚 Explain Each Step in Detail
🔍 Interactive Matrix Explorer
🎯 Key Question:
How does each matrix contribute to the final output?
🔬 Explore Matrix Interactions
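One way to see the answer in code: in the small experiment below (random 4×4 placeholder matrices, head dimension 4), Q and K together determine the attention weights, while V supplies the content those weights mix. Replacing V changes the output but leaves the weights untouched.

```python
import numpy as np

rng = np.random.default_rng(1)

def attention(Q, K, V):
    # scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    scaled = (Q @ K.T) / np.sqrt(K.shape[-1])
    w = np.exp(scaled - scaled.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w, w @ V

Q, K, V = (rng.normal(size=(4, 4)) for _ in range(3))
V2 = rng.normal(size=(4, 4))             # a different value matrix

w1, out1 = attention(Q, K, V)
w2, out2 = attention(Q, K, V2)

print(np.allclose(w1, w2))     # True:  swapping V does not change the attention weights...
print(np.allclose(out1, out2)) # False: ...but it changes what content the output carries
```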