Update README.md
README.md
@@ -98,7 +98,7 @@ All these systems support the OpenAI Chat Completions API format, ensuring smoot
 vllm serve AXCXEPT/QwQ-32B-Distill-Qwen-1.5B-Alpha --max-model-len 32768 --enforce-eager
 ```
 
-### Call API:
+### Call API Without Streaming:
 ```python
 from openai import OpenAI
 client = OpenAI(
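For context, the hunk above only shows the opening lines of the non-streaming example; the body of the call is outside the diff window. A minimal sketch of what a complete non-streaming request looks like, assuming the server started above is listening on vLLM's default `http://localhost:8000/v1` (the base URL, placeholder API key, and prompt here are illustrative assumptions, not necessarily the README's exact values):

```python
# Minimal non-streaming sketch; base URL, API key placeholder, and prompt are assumptions.
from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",                      # placeholder key for a local vLLM server
    base_url="http://localhost:8000/v1",  # assumed default `vllm serve` address
)

completion = client.chat.completions.create(
    model="AXCXEPT/QwQ-32B-Distill-Qwen-1.5B-Alpha",  # the model served above
    messages=[{"role": "user", "content": "9.11 and 9.8, which is greater?"}],
)
print(completion.choices[0].message)
```

The streaming variant added in the hunk below differs mainly in passing `stream=True` and iterating over the returned chunks.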
@@ -117,6 +117,79 @@ completion = client.chat.completions.create(
 print(completion.choices[0].message)
 ```
 
+### Call API With Streaming:
+```python
+# SPDX-License-Identifier: Apache-2.0
+"""
+An example showing how to generate chat completions from reasoning models
+like DeepSeek-R1.
+
+To run this example, you need to start the vLLM server with the reasoning
+parser:
+
+```bash
+vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
+    --enable-reasoning --reasoning-parser deepseek_r1
+```
+
+Unlike openai_chat_completion_with_reasoning.py, this example demonstrates the
+streaming chat completions feature.
+
+The streaming chat completions feature allows you to receive chat completions
+in real time as they are generated by the model. This is useful for scenarios
+where you want to display chat completions to the user as they are generated
+by the model.
+
+Remember to check whether content and reasoning_content exist in `ChatCompletionChunk`;
+content may not exist, leading to errors if you try to access it.
+"""
+
+from openai import OpenAI
+
+# Modify OpenAI's API key and API base to use vLLM's API server.
+openai_api_key = "EMPTY"
+openai_api_base = "http://localhost:8000/v1"
+
+client = OpenAI(
+    api_key=openai_api_key,
+    base_url=openai_api_base,
+)
+
+models = client.models.list()
+model = models.data[0].id
+
+messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
+# For granite, add: `extra_body={"chat_template_kwargs": {"thinking": True}}`
+stream = client.chat.completions.create(model=model,
+                                        messages=messages,
+                                        stream=True)
+
+print("client: Start streaming chat completions...")
+printed_reasoning_content = False
+printed_content = False
+
+for chunk in stream:
+    reasoning_content = None
+    content = None
+    # Check whether the delta carries reasoning_content or content
+    if hasattr(chunk.choices[0].delta, "reasoning_content"):
+        reasoning_content = chunk.choices[0].delta.reasoning_content
+    elif hasattr(chunk.choices[0].delta, "content"):
+        content = chunk.choices[0].delta.content
+
+    if reasoning_content is not None:
+        if not printed_reasoning_content:
+            printed_reasoning_content = True
+            print("reasoning_content:", end="", flush=True)
+        print(reasoning_content, end="", flush=True)
+    elif content is not None:
+        if not printed_content:
+            printed_content = True
+            print("\ncontent:", end="", flush=True)
+        # Extract and print the content
+        print(content, end="", flush=True)
+```
+
 ## License
 
 This project is released under the **MIT License**, reflecting our commitment to open and accessible AI. We firmly believe that cutting-edge AI research should be available for anyone to use, modify, and build upon.