---
license: mit
base_model:
- Qwen/Qwen3-32B
---
FP8-Dynamic quantization of Qwen3-32B, made to support Ampere cards.

Use the following vLLM command to run it on 2x RTX 3090:
```bash
vllm serve khajaphysist/Qwen3-32B-FP8-Dynamic --enable-reasoning --reasoning-parser deepseek_r1 \
-tp 2 --gpu-memory-utilization 0.99 --disable-log-requests --enforce-eager --max-num-seqs 15
```
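
As a rough sanity check on why two 24 GB cards are enough, the weight footprint can be estimated with simple arithmetic. This is a back-of-the-envelope sketch, not a measurement: the parameter count is rounded from the model name, one byte per weight is assumed for FP8, and any layers left unquantized, activations, and the KV cache are ignored.

```bash
#!/usr/bin/env bash
# Back-of-the-envelope estimate of weight memory per GPU (all figures are assumptions).
params_billion=32   # assumption: ~32B parameters, rounded from the model name
bytes_per_param=1   # assumption: FP8 stores one byte per weight
tp=2                # tensor-parallel size, matching -tp 2 in the command above
gpu_gb=24           # memory of one RTX 3090

weights_gb=$(( params_billion * bytes_per_param ))  # total weight size in GB
per_gpu_gb=$(( weights_gb / tp ))                   # weights are sharded across GPUs
headroom_gb=$(( gpu_gb - per_gpu_gb ))              # left for KV cache and activations

echo "weights: ${weights_gb} GB total, ${per_gpu_gb} GB per GPU, ${headroom_gb} GB headroom per GPU"
```

Under these assumptions the remaining memory per card is limited, which is likely why the command pushes `--gpu-memory-utilization` to 0.99, skips CUDA graph capture with `--enforce-eager`, and caps concurrency with `--max-num-seqs 15`.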