Article: Topic 33: Slim Attention, KArAt, XAttention and Multi-Token Attention Explained – What’s Really Changing in Transformers? (Apr 4)
Article: Simplifying Alignment: From RLHF to Direct Preference Optimization (DPO) (Jan 19)
Article: 🦸🏻#14: What Is MCP, and Why Is Everyone – Suddenly! – Talking About It? (Mar 17)
Paper: SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training (arXiv:2501.17161, published Jan 28, 2025)
Paper: Extending Context Window of Large Language Models via Positional Interpolation (arXiv:2306.15595, published Jun 27, 2023)