Encoder-Only (BERT)
Purpose: Understanding & Classification
Attention: Bidirectional (sees all tokens)
Use Cases: Search, Q&A, Classification
Input: "The cat sat on the mat"
Output: [CLS] vector for classification
Every token sees every other token
Well suited to understanding tasks (see the sketch below)
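
To make the [CLS] idea concrete, here is a minimal sketch using the Hugging Face transformers library (assumed installed, along with PyTorch) and the public bert-base-uncased checkpoint. In practice a classification head would be trained on top of the [CLS] vector; this only extracts it.

```python
# Minimal sketch, assuming `transformers` and `torch` are installed
# and using the public bert-base-uncased checkpoint.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The cat sat on the mat", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Every token's hidden state is computed with full bidirectional
# attention; the [CLS] vector at position 0 is the usual input to a
# classification head.
cls_vector = outputs.last_hidden_state[:, 0, :]
print(cls_vector.shape)  # torch.Size([1, 768]) for bert-base
```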
Decoder-Only (GPT)
Purpose: Text Generation
Attention: Causal (only sees previous tokens)
Use Cases: ChatGPT, Code Generation, Writing
Input: "The cat sat on the"
Output: "mat" (next token prediction)
Autoregressive generation
Powers modern chatbots (see the sketch below)
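
A minimal sketch of one greedy decoding step, assuming the transformers library and the public gpt2 checkpoint. Repeating this loop (append the predicted token, predict again) is what makes generation autoregressive; the actual token predicted varies by model, so it need not be "mat".

```python
# Minimal sketch, assuming `transformers` and `torch` are installed
# and using the public gpt2 checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The cat sat on the", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Causal attention means each position only attends to earlier tokens,
# so the logits at the last position are a prediction of the next token.
# Greedy decoding takes the most likely one.
next_id = logits[0, -1].argmax()
print(tokenizer.decode(next_id))  # highest-probability continuation
```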
Encoder-Decoder (T5)
Purpose: Sequence-to-Sequence
Attention: Bidirectional encoder, causal decoder, cross-attention linking them
Use Cases: Translation, Summarization
Input: "Hello world" (English)
Output: "Hola mundo" (Spanish)
Encoder understands, decoder generates
Well suited to translation and other sequence-to-sequence tasks (see the sketch below)
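
A minimal sketch, assuming the transformers library and the public t5-small checkpoint. Note that the original T5 checkpoints were trained on English-to-German/French/Romanian translation, so this uses German rather than the Spanish example above.

```python
# Minimal sketch, assuming `transformers` and `torch` are installed and
# using the public t5-small checkpoint (trained English->German, not Spanish).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# T5 is prompted with a task prefix. The encoder reads the whole source
# sentence bidirectionally; the decoder generates the target token by
# token, cross-attending to the encoder's hidden states at every step.
inputs = tokenizer("translate English to German: Hello world",
                   return_tensors="pt")
ids = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(ids[0], skip_special_tokens=True))  # e.g. "Hallo Welt"
```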