QKV Decomposition for Transformer XAI Paper · 2026
Diagnose transformer prediction failures from the weights alone, then correct them by retraining a single layer. On GPT-2, capital-city accuracy rises from 2/8 to 8/8 with zero side effects, and the fix works through any one of three routes: attention, the FFN, or the value (V) projection alone (590K parameters).
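As a rough sanity check on the 590K figure, the sketch below counts the trainable parameters for each of the three options in one layer. It assumes GPT-2 small with a hidden size of 768 and biased projections, which the summary does not state. It also assumes "attention" means the full attention block (fused Q/K/V projection plus output projection) and "FFN" means the two-layer MLP with a 4x expansion. The helper name `linear_params` is hypothetical, and layer-norm parameters are left out.

```python
# Assumption: GPT-2 small, hidden size 768 (not stated in the source).
D_MODEL = 768


def linear_params(d_in: int, d_out: int, bias: bool = True) -> int:
    """Number of trainable parameters in one dense projection."""
    return d_in * d_out + (d_out if bias else 0)


# Value projection of a single layer: the "V-only" option.
v_only = linear_params(D_MODEL, D_MODEL)

# Assumed scope of "attention": fused Q/K/V projection plus output projection.
attention = linear_params(D_MODEL, 3 * D_MODEL) + linear_params(D_MODEL, D_MODEL)

# Assumed scope of "FFN": expand to 4 * d_model, then project back.
ffn = linear_params(D_MODEL, 4 * D_MODEL) + linear_params(4 * D_MODEL, D_MODEL)

print(f"V-only:    {v_only:,}")
print(f"attention: {attention:,}")
print(f"FFN:       {ffn:,}")
```

Under these assumptions the V-only count comes to 590,592, which matches the quoted 590K and is roughly a quarter of the attention block and an eighth of the FFN.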