Speeding Up LLM Decoding with Advanced Universal Assisted Generation Techniques Article • By jmamou and 8 others • Mar 24 • 20
Distributed Speculative Inference of Large Language Models Paper • 2405.14105 • Published May 23, 2024 • 18