Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models • arXiv:2506.19697 • Published Jun 24, 2025 • 44 upvotes
Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning • arXiv:2509.23873 • Published Sep 2025 • 67 upvotes
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation • arXiv:2510.00515 • Published Oct 2025 • 39 upvotes
SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights • arXiv:2509.22944 • Published Sep 26, 2025 • 76 upvotes
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs • arXiv:2510.11696 • Published Oct 2025 • 168 upvotes
AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders • arXiv:2510.19779 • Published Oct 2025 • 58 upvotes