LLM general - a tgangs Collection

LLM general

updated Sep 11

XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization

Paper • 2508.10395 • Published Aug 14 • 42

Speed Always Wins: A Survey on Efficient Architectures for Large Language Models

Paper • 2508.09834 • Published Aug 13 • 53

Causal Attention with Lookahead Keys

Paper • 2509.07301 • Published Sep 9 • 21