AXCXEPT committed on
Commit cf270dd · verified · 1 Parent(s): 0241243

Update README.md

Files changed (1): README.md +74 -1
README.md CHANGED
@@ -98,7 +98,7 @@ All these systems support the OpenAI Chat Completions API format, ensuring smoot
  vllm serve AXCXEPT/QwQ-32B-Distill-Qwen-1.5B-Alpha --max-model-len 32768 --enforce-eager
  ```
 
- ### Call API:
+ ### Call API Without Streaming:
  ```python
  from openai import OpenAI
  client = OpenAI(
@@ -117,6 +117,79 @@ completion = client.chat.completions.create(
  print(completion.choices[0].message)
  ```
 
+ ### Call API With Streaming:
+ ```python
+ # SPDX-License-Identifier: Apache-2.0
+ """
+ An example that shows how to generate chat completions from reasoning models
+ like DeepSeekR1.
+
+ To run this example, you need to start the vLLM server with the reasoning
+ parser:
+
+ ```bash
+ vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
+     --enable-reasoning --reasoning-parser deepseek_r1
+ ```
+
+ Unlike openai_chat_completion_with_reasoning.py, this example demonstrates the
+ streaming chat completions feature.
+
+ Streaming lets you receive chat completions in real time as the model
+ generates them, which is useful when you want to display output to the user
+ as it is produced.
+
+ Remember to check that content and reasoning_content exist in
+ `ChatCompletionChunk`; content may not exist, and accessing it then raises an
+ error.
+ """
+
+ from openai import OpenAI
+
+ # Modify OpenAI's API key and API base to use vLLM's API server.
+ openai_api_key = "EMPTY"
+ openai_api_base = "http://localhost:8000/v1"
+
+ client = OpenAI(
+     api_key=openai_api_key,
+     base_url=openai_api_base,
+ )
+
+ models = client.models.list()
+ model = models.data[0].id
+
+ messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
+ # For granite, add: `extra_body={"chat_template_kwargs": {"thinking": True}}`
+ stream = client.chat.completions.create(model=model,
+                                         messages=messages,
+                                         stream=True)
+
+ print("client: Start streaming chat completions...")
+ printed_reasoning_content = False
+ printed_content = False
+
+ for chunk in stream:
+     reasoning_content = None
+     content = None
+     # Check whether this delta carries reasoning_content or content
+     if hasattr(chunk.choices[0].delta, "reasoning_content"):
+         reasoning_content = chunk.choices[0].delta.reasoning_content
+     elif hasattr(chunk.choices[0].delta, "content"):
+         content = chunk.choices[0].delta.content
+
+     if reasoning_content is not None:
+         if not printed_reasoning_content:
+             printed_reasoning_content = True
+             print("reasoning_content:", end="", flush=True)
+         print(reasoning_content, end="", flush=True)
+     elif content is not None:
+         if not printed_content:
+             printed_content = True
+             print("\ncontent:", end="", flush=True)
+         # Extract and print the content
+         print(content, end="", flush=True)
+ ```
+
  ## License
 
  This project is released under the **MIT License**, reflecting our commitment to open and accessible AI. We firmly believe that cutting-edge AI research should be available for anyone to use, modify, and build upon.
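
The first hunk truncates the non-streaming snippet at `client = OpenAI(`. For reference, here is a minimal sketch of a complete non-streaming call against the server started by the `vllm serve` command above; the placeholder API key, base URL, and prompt are illustrative assumptions, not text taken from the README:

```python
from openai import OpenAI

# vLLM's OpenAI-compatible server ignores the API key, so any placeholder
# works; the base URL assumes vLLM's default port 8000.
client = OpenAI(
    api_key="EMPTY",
    base_url="http://localhost:8000/v1",
)

completion = client.chat.completions.create(
    model="AXCXEPT/QwQ-32B-Distill-Qwen-1.5B-Alpha",  # the model served above
    messages=[{"role": "user", "content": "9.11 and 9.8, which is greater?"}],
)
print(completion.choices[0].message)
```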
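
The docstring's caveat, that `content` may be absent from a `ChatCompletionChunk` delta, can also be handled with `getattr` defaults instead of `hasattr` branches. A sketch of that variant (not part of the commit), collecting the reasoning and the answer separately under the same server assumptions:

```python
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
model = client.models.list().data[0].id  # first (only) served model

stream = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "9.11 and 9.8, which is greater?"}],
    stream=True,
)

reasoning_parts, answer_parts = [], []
for chunk in stream:
    delta = chunk.choices[0].delta
    # getattr with a None default never raises, even if a field is missing.
    reasoning = getattr(delta, "reasoning_content", None)
    content = getattr(delta, "content", None)
    if reasoning:
        reasoning_parts.append(reasoning)
    if content:
        answer_parts.append(content)

print("reasoning:", "".join(reasoning_parts))
print("answer:", "".join(answer_parts))
```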