Update app.py
#3 by liuhaotian - opened
app.py CHANGED
@@ -342,12 +342,13 @@ title_markdown = """
 
 ONLY WORKS WITH GPU!
 
-You can load the model with
+You can load the model with 4-bit or 8-bit quantization to make it fit on smaller hardware. Set the environment variable `bits` to control the quantization.
+*Note: 8-bit seems to be slower than both 4-bit and 16-bit. Although the A10G has enough VRAM to support 8-bit, we recommend 4-bit on the A10G for the best efficiency until we figure out the inference speed issue.*
 
 Recommended configurations:
-| Hardware |
-|
-| **Bits** |
+| Hardware          | T4-Small (16G)  | A10G-Small (24G) | A100-Large (40G) |
+|-------------------|-----------------|------------------|------------------|
+| **Bits**          | 4 (default)     | 4                | 16               |
 
 """
 
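
The added text says the `bits` environment variable controls quantization, with 4 as the default. The diff does not show how `app.py` reads it, so the following is only a minimal sketch of one way this could work. The function name `quantization_kwargs` and the flag names `load_in_4bit` / `load_in_8bit` are assumptions for illustration, not taken from the PR.

```python
import os

# Values copied from the "Recommended configurations" table in the diff.
RECOMMENDED_BITS = {
    "T4-Small (16G)": 4,
    "A10G-Small (24G)": 4,
    "A100-Large (40G)": 16,
}


def quantization_kwargs(env=None):
    """Map the `bits` environment variable to hypothetical loader flags."""
    env = os.environ if env is None else env
    # The table marks 4 as the default, so fall back to it when `bits` is unset.
    bits = int(env.get("bits", "4"))
    if bits not in (4, 8, 16):
        raise ValueError(f"bits must be 4, 8, or 16, got {bits}")
    # Assumed flag names; 16 means no quantization, so both flags stay False.
    return {"load_in_4bit": bits == 4, "load_in_8bit": bits == 8}


# Example: simulate a Space configured with bits=8.
print(quantization_kwargs({"bits": "8"}))
```

Passing the environment as an argument keeps the helper easy to check without touching the real process environment. In a Space, the variable itself would be set in the Space settings rather than in code.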