File size: 347 Bytes
202f039
 
 
 
 
 
1e86a7c
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
---
license: mit
base_model:
- Qwen/Qwen3-32B
---

FP8-Dynamic quant to support Ampere cards

Use following vllm command to run on 2x 3090

```bash
vllm serve khajaphysist/Qwen3-32B-FP8-Dynamic --enable-reasoning --reasoning-parser deepseek_r1 \
     -tp 2 --gpu-memory-utilization 0.99 --disable-log-requests --enforce-eager --max-num-seqs 15
```