vLLM throughput is so low on H100??? Is anyone else facing the same issue?

#2
by sanak - opened

INFO 10-09 10:55:56 [loggers.py:127] Engine 000: Avg prompt throughput: 395.1 tokens/s, Avg generation throughput: 2.9 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.9%, Prefix cache hit rate: 20.5%
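The log line above is the main evidence in this thread: prefill is fast (395.1 prompt tokens/s) but decode is only 2.9 generation tokens/s with a single running request. If you want to track these numbers over a run rather than eyeball the logs, a small parser works; this is just an illustrative sketch (the regex follows the log text shown above, and `parse_engine_stats` is a hypothetical helper, not part of vLLM):

```python
import re

# Regex over the engine stats line printed by vLLM's logger
# (field names mirror the log text above, not any official vLLM API).
LOG_PATTERN = re.compile(
    r"Avg prompt throughput: (?P<prompt_tps>[\d.]+) tokens/s, "
    r"Avg generation throughput: (?P<gen_tps>[\d.]+) tokens/s, "
    r"Running: (?P<running>\d+) reqs, Waiting: (?P<waiting>\d+) reqs, "
    r"GPU KV cache usage: (?P<kv_cache_pct>[\d.]+)%, "
    r"Prefix cache hit rate: (?P<prefix_hit_pct>[\d.]+)%"
)

def parse_engine_stats(line: str) -> dict:
    """Extract throughput and cache metrics from one vLLM engine stats line."""
    m = LOG_PATTERN.search(line)
    if m is None:
        raise ValueError("not a vLLM engine stats line")
    return {k: float(v) for k, v in m.groupdict().items()}

line = ("INFO 10-09 10:55:56 [loggers.py:127] Engine 000: "
        "Avg prompt throughput: 395.1 tokens/s, "
        "Avg generation throughput: 2.9 tokens/s, "
        "Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.9%, "
        "Prefix cache hit rate: 20.5%")

stats = parse_engine_stats(line)
print(stats["gen_tps"])   # 2.9
print(stats["running"])   # 1.0
```

Feeding successive log lines through this gives a time series of decode throughput, which makes it easy to see whether the 2.9 tokens/s is a steady state or a warm-up artifact.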

sanak changed discussion title from vllm troughput is so low??? anyone else is facing the same issue? to vllm troughput is so low on H100??? anyone else is facing the same issue?
sanak changed discussion status to closed

Same here. How did you fix it?
