pip install git+https://github.com/geyuying/vllm.git@arc-qwen-video
```
#### An 'Ugly' Workaround for vLLM Installation
If you are unable to install our provided vllm package, we offer an alternative "ugly" method:
1. Install vllm with Qwen2.5-VL support.
2. Modify `config.json`. In your model weights directory, open `config.json` and change the `architectures` field to `"Qwen2_5_VLForConditionalGeneration"` (see the sketch after this list).
3. Patch the vllm source code. Locate the file `vllm/model_executor/models/qwen2_5_vl.py` in your vllm installation path and add the following code inside the `__init__` method of the `Qwen2_5_VLForConditionalGeneration` class:
```python
# These imports are needed at the top of qwen2_5_vl.py (add them if missing):
#   from torch import nn
#   from transformers import WhisperModel
whisper_path = 'openai/whisper-large-v3'
# Load Whisper and keep only its encoder; the decoder is not used.
speech_encoder = WhisperModel.from_pretrained(whisper_path).encoder
self.speech_encoder = speech_encoder
speech_dim = speech_encoder.config.d_model
llm_hidden_size = config.vision_config.out_hidden_size
# MLP that projects Whisper encoder states into the LLM embedding space.
self.mlp_speech = nn.Sequential(
    nn.LayerNorm(speech_dim),
    nn.Linear(speech_dim, llm_hidden_size),
    nn.GELU(),
    nn.Linear(llm_hidden_size, llm_hidden_size)
)
```
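
For step 2, the `config.json` edit can also be done programmatically. Below is a minimal sketch, assuming your weights live under `/path/to/model` (the path and helper script are illustrative, not part of the repo):

```python
# Hypothetical helper for step 2: rewrite the `architectures` field so
# vllm dispatches to its stock Qwen2.5-VL implementation.
import json

config_path = "/path/to/model/config.json"  # adjust to your weights directory

with open(config_path) as f:
    cfg = json.load(f)

cfg["architectures"] = ["Qwen2_5_VLForConditionalGeneration"]

with open(config_path, "w") as f:
    json.dump(cfg, f, indent=2)
```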
**Why this works**: Our model is based on the Qwen2.5-VL architecture, with the addition of an audio encoder and a corresponding MLP. During vllm inference, the multi-modal encoder processes inputs sequentially, while the LLM performs batch inference. Since we only need to pass the final multi-modal embeddings to the LLM, we can reuse the existing Qwen2.5-VL code.
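
To make the data flow concrete, here is a minimal sketch of the audio path the patch enables. The function `embed_speech` and the tensor shapes are our illustrative assumptions, not code from the patch:

```python
import torch

def embed_speech(model, input_features: torch.Tensor) -> torch.Tensor:
    # `input_features` are Whisper log-mel features with shape
    # [batch, n_mels, frames] (n_mels is 128 for whisper-large-v3).
    # Whisper encoder: [batch, n_mels, frames] -> [batch, seq_len, d_model].
    hidden = model.speech_encoder(input_features).last_hidden_state
    # Project to the LLM hidden size so the audio embeddings can be spliced
    # into the sequence alongside the vision and text embeddings.
    return model.mlp_speech(hidden)
```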
### Inference
```bash