Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers (Sep 11, 2025)
What’s MXFP4? The 4-Bit Secret Powering OpenAI’s GPT‑OSS Models on Modest Hardware (Aug 8, 2025)
KV Caching Explained: Optimizing Transformer Inference Efficiency (Jan 30, 2025)
Assisted Generation: a new direction toward low-latency text generation (May 11, 2023)