Commit 1b30fcd
Parent(s): 59231af

Update vLLM support
README.md CHANGED
@@ -194,13 +194,23 @@ You can run the TensorRT-LLM server by following steps:
 
 2. Run server with the configuration
 ```bash
-trtllm-serve serve
+trtllm-serve serve LGAI-EXAONE/EXAONE-4.0-32B --backend pytorch --extra_llm_api_options extra_llm_api_config.yaml
 ```
 
 For more details, please refer to [the documentation](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/exaone) of EXAONE from TensorRT-LLM.
 
+### vLLM
+
+vLLM officially supports EXAONE 4.0 models as of version `0.10.0`. You can run the vLLM server with the following command:
+
+```bash
+vllm serve LGAI-EXAONE/EXAONE-4.0-32B --enable-auto-tool-choice --tool-call-parser hermes --reasoning-parser qwen3
+```
+
+For more details, please refer to [the vLLM documentation](https://docs.vllm.ai/en/stable/).
+
 > [!NOTE]
-> Other inference engines including `
+> Other inference engines, including `sglang`, do not officially support EXAONE 4.0 yet. We will update this section as soon as these libraries add support.
 
 
 ## Performance
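To sanity-check the updated `trtllm-serve` command, a request like the one below can be used. This is a minimal sketch, assuming `trtllm-serve` exposes its OpenAI-compatible API on `localhost:8000` (the default in recent TensorRT-LLM releases); the prompt and `max_tokens` value are purely illustrative.

```bash
# Sketch of a smoke test against the served model, assuming the
# OpenAI-compatible endpoint is at localhost:8000 (trtllm-serve default).
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "LGAI-EXAONE/EXAONE-4.0-32B",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64
  }'
```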
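Because the new vLLM command enables automatic tool choice with the `hermes` parser, a tool-calling request is a natural way to exercise those flags. The sketch below assumes vLLM's OpenAI-compatible server is running on its default `localhost:8000`; the `get_weather` function is hypothetical, defined only to illustrate the request shape.

```bash
# Hypothetical tool-calling request; the "get_weather" function is
# invented for illustration. With --enable-auto-tool-choice, the model
# decides on its own whether to emit a tool call for this prompt.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "LGAI-EXAONE/EXAONE-4.0-32B",
    "messages": [{"role": "user", "content": "What is the weather in Seoul?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'
```

With `--reasoning-parser` set, recent vLLM versions should also split the model's reasoning into a separate `reasoning_content` field of the response message, so clients can display or hide it independently of the final answer.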