NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model Paper • 2508.14444 • Published Aug 20 • 38
FFN Fusion: Rethinking Sequential Computation in Large Language Models Paper • 2503.18908 • Published Mar 24 • 19
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs Paper • 2411.19146 • Published Nov 28, 2024 • 17