vLLM throughput is so low on H100??? Is anyone else facing the same issue?

#2
by sanak - opened

INFO 10-09 10:55:56 [loggers.py:127] Engine 000: Avg prompt throughput: 395.1 tokens/s, Avg generation throughput: 2.9 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.9%, Prefix cache hit rate: 20.5%
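The log line above is the main evidence in this thread: prefill is fast (395.1 prompt tokens/s) but decode is only 2.9 generation tokens/s with a single running request. If you want to track these numbers over a run rather than eyeball the logs, a small parser works; this is just an illustrative sketch (the regex follows the log text shown above, and `parse_engine_stats` is a hypothetical helper, not part of vLLM):

```python
import re

# Regex over the engine stats line printed by vLLM's logger
# (field names mirror the log text above, not any official vLLM API).
LOG_PATTERN = re.compile(
    r"Avg prompt throughput: (?P<prompt_tps>[\d.]+) tokens/s, "
    r"Avg generation throughput: (?P<gen_tps>[\d.]+) tokens/s, "
    r"Running: (?P<running>\d+) reqs, Waiting: (?P<waiting>\d+) reqs, "
    r"GPU KV cache usage: (?P<kv_cache_pct>[\d.]+)%, "
    r"Prefix cache hit rate: (?P<prefix_hit_pct>[\d.]+)%"
)

def parse_engine_stats(line: str) -> dict:
    """Extract throughput and cache metrics from one vLLM engine stats line."""
    m = LOG_PATTERN.search(line)
    if m is None:
        raise ValueError("not a vLLM engine stats line")
    return {k: float(v) for k, v in m.groupdict().items()}

line = ("INFO 10-09 10:55:56 [loggers.py:127] Engine 000: "
        "Avg prompt throughput: 395.1 tokens/s, "
        "Avg generation throughput: 2.9 tokens/s, "
        "Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.9%, "
        "Prefix cache hit rate: 20.5%")

stats = parse_engine_stats(line)
print(stats["gen_tps"])   # 2.9
print(stats["running"])   # 1.0
```

Feeding successive log lines through this gives a time series of decode throughput, which makes it easy to see whether the 2.9 tokens/s is a steady state or a warm-up artifact.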

sanak changed discussion title from vllm troughput is so low??? anyone else is facing the same issue? to vllm troughput is so low on H100??? anyone else is facing the same issue?
sanak changed discussion status to closed

Same here. How did you fix it?
