Update README.md
README.md
CHANGED
|
@@ -300,11 +300,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
|
|
| 300 |
<th>Model</th>
|
| 301 |
<th>Average Cost Reduction</th>
|
| 302 |
<th>Latency (s)</th>
|
| 303 |
-
<th>
|
| 304 |
-
<th>Latency (s)th>
|
| 305 |
-
<th>QPD</th>
|
| 306 |
<th>Latency (s)</th>
|
| 307 |
-
<th>
|
| 308 |
</tr>
|
| 309 |
</thead>
|
| 310 |
<tbody style="text-align: center">
|
|
@@ -404,7 +404,9 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
|
|
| 404 |
</tbody>
|
| 405 |
</table>
|
| 406 |
|
| 407 |
|
| 408 |
|
| 409 |
### Multi-stream asynchronous performance (measured with vLLM version 0.7.2)
|
| 410 |
|
|
@@ -423,11 +425,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
|
|
| 423 |
<th>Model</th>
|
| 424 |
<th>Average Cost Reduction</th>
|
| 425 |
<th>Maximum throughput (QPS)</th>
|
| 426 |
-
<th>
|
| 427 |
<th>Maximum throughput (QPS)</th>
|
| 428 |
-
<th>
|
| 429 |
<th>Maximum throughput (QPS)</th>
|
| 430 |
-
<th>
|
| 431 |
</tr>
|
| 432 |
</thead>
|
| 433 |
<tbody style="text-align: center">
|
|
@@ -525,4 +527,10 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
|
|
| 525 |
<td>6777</td>
|
| 526 |
</tr>
|
| 527 |
</tbody>
|
| 528 |
-
</table>
|
| 300 |
<th>Model</th>
|
| 301 |
<th>Average Cost Reduction</th>
|
| 302 |
<th>Latency (s)</th>
|
| 303 |
+
<th>Queries Per Dollar</th>
|
| 304 |
<th>Latency (s)</th>
|
| 305 |
+
<th>Queries Per Dollar</th>
|
| 306 |
+
<th>Latency (s)</th>
|
| 307 |
+
<th>Queries Per Dollar</th>
|
| 308 |
</tr>
|
| 309 |
</thead>
|
| 310 |
<tbody style="text-align: center">
|
|
|
|
| 404 |
</tbody>
|
| 405 |
</table>
|
| 406 |
|
| 407 |
+
**Use case profiles: Image Size (WxH) / prompt tokens / generation tokens
|
| 408 |
|
| 409 |
+
**QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
|
| 410 |
|
| 411 |
### Multi-stream asynchronous performance (measured with vLLM version 0.7.2)
|
| 412 |
|
|
|
|
| 425 |
<th>Model</th>
|
| 426 |
<th>Average Cost Reduction</th>
|
| 427 |
<th>Maximum throughput (QPS)</th>
|
| 428 |
+
<th>Queries Per Dollar</th>
|
| 429 |
<th>Maximum throughput (QPS)</th>
|
| 430 |
+
<th>Queries Per Dollar</th>
|
| 431 |
<th>Maximum throughput (QPS)</th>
|
| 432 |
+
<th>Queries Per Dollar</th>
|
| 433 |
</tr>
|
| 434 |
</thead>
|
| 435 |
<tbody style="text-align: center">
|
|
|
|
| 527 |
<td>6777</td>
|
| 528 |
</tr>
|
| 529 |
</tbody>
|
| 530 |
+
</table>
|
| 531 |
+
|
| 532 |
+
**Use case profiles: Image Size (WxH) / prompt tokens / generation tokens
|
| 533 |
+
|
| 534 |
+
**QPS: Queries per second.
|
| 535 |
+
|
| 536 |
+
**QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
|
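
Note on the new "Queries Per Dollar" columns: the commit defines QPD only as queries per dollar based on on-demand cost, and does not show the arithmetic. The following is a minimal sketch of one plausible derivation, assuming QPD is the sustained query rate over one hour divided by the hourly instance price, and that single-stream throughput is the reciprocal of latency. The function names and the sample numbers are hypothetical and are not taken from the README or from Lambda Labs pricing.

```python
SECONDS_PER_HOUR = 3600.0


def qps_from_latency(latency_seconds: float) -> float:
    """Single-stream throughput, assuming one query is served at a time."""
    if latency_seconds <= 0:
        raise ValueError("latency_seconds must be positive")
    return 1.0 / latency_seconds


def queries_per_dollar(queries_per_second: float, dollars_per_hour: float) -> float:
    """Queries served per dollar of on-demand instance time (assumed formula)."""
    if dollars_per_hour <= 0:
        raise ValueError("dollars_per_hour must be positive")
    return queries_per_second * SECONDS_PER_HOUR / dollars_per_hour


if __name__ == "__main__":
    # Hypothetical inputs: 0.5 s latency on an instance billed at $1.50/hour.
    qps = qps_from_latency(0.5)
    print(queries_per_dollar(qps, 1.5))
```

For the multi-stream table, the same function would take the measured maximum throughput (QPS) directly instead of a latency-derived rate.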