vLLM throughput is so low on H100??? Is anyone else facing the same issue?
#2 opened by sanak
INFO 10-09 10:55:56 [loggers.py:127] Engine 000: Avg prompt throughput: 395.1 tokens/s, Avg generation throughput: 2.9 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.9%, Prefix cache hit rate: 20.5%
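The log line itself shows where the problem is: prefill (prompt) throughput is healthy at 395.1 tokens/s, but decode (generation) throughput is only 2.9 tokens/s. A minimal sketch for pulling those two numbers out of a vLLM engine stats line, to make the gap explicit (the regex is an assumption based on the exact log format quoted above, not an official vLLM API):

```python
import re

# Example stats line, copied from the vLLM engine log above.
LOG = ("INFO 10-09 10:55:56 [loggers.py:127] Engine 000: "
       "Avg prompt throughput: 395.1 tokens/s, "
       "Avg generation throughput: 2.9 tokens/s, "
       "Running: 1 reqs, Waiting: 0 reqs, "
       "GPU KV cache usage: 0.9%, Prefix cache hit rate: 20.5%")

def parse_vllm_stats(line: str) -> dict:
    """Extract prompt/decode throughput from a vLLM engine stats line.

    The field names in the pattern are taken verbatim from the log
    format shown in this thread; other vLLM versions may differ.
    """
    pattern = (r"Avg prompt throughput: ([\d.]+) tokens/s, "
               r"Avg generation throughput: ([\d.]+) tokens/s")
    m = re.search(pattern, line)
    if m is None:
        raise ValueError("not a vLLM engine stats line")
    return {"prompt_tps": float(m.group(1)),
            "decode_tps": float(m.group(2))}

stats = parse_vllm_stats(LOG)
# Prefill is fine; decode is the bottleneck here (395.1 vs 2.9 tokens/s).
print(stats)
```

With only 1 running request and 0.9% KV cache usage, the engine is nowhere near saturated, so the slow decode is unlikely to be a batching or memory limit.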
sanak changed discussion title from "vLLM throughput is so low??? Is anyone else facing the same issue?" to "vLLM throughput is so low on H100??? Is anyone else facing the same issue?"
sanak changed discussion status to closed
Same here. How did you fix it?