Update pipeline example
README.md
````diff
@@ -9,7 +9,6 @@ tags:
 datasets:
 - lmms-lab/LLaVA-OneVision-Data
 pipeline_tag: image-text-to-text
-inference: false
 arxiv: 2408.03326
 ---
 # LLaVA-Onevision Model Card
@@ -53,38 +52,26 @@ The model supports multi-image and multi-prompt generation. Meaning that you can
 ### Using `pipeline`:
 
 Below we used [`"llava-hf/llava-onevision-qwen2-7b-ov-hf"`](https://huggingface.co/llava-hf/llava-onevision-qwen2-7b-ov-hf) checkpoint.
-
 ```python
-from transformers import pipeline
-from PIL import Image
-import requests
+from transformers import pipeline
 
-
-
-processor = AutoProcessor.from_pretrained(model_id)
-
-url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"
-image = Image.open(requests.get(url, stream=True).raw)
-
-# Define a chat history and use `apply_chat_template` to get correctly formatted prompt
-# Each value in "content" has to be a list of dicts with types ("text", "image")
-conversation = [
+pipe = pipeline("image-text-to-text", model="llava-hf/llava-onevision-qwen2-0.5b-ov-hf")
+messages = [
     {
-
         "role": "user",
         "content": [
+            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"},
             {"type": "text", "text": "What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud"},
-            {"type": "image"},
         ],
     },
 ]
-prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
 
-
-print(
-
->>> {
+out = pipe(text=messages, max_new_tokens=20)
+print(out)
+>>> [{'input_text': [{'role': 'user', 'content': [{'type': 'image', 'url': 'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg'}, {'type': 'text', 'text': 'What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud'}]}], 'generated_text': 'Lava'}]
 ```
 
+
 ### Using pure `transformers`:
 
 Below is an example script to run generation in `float16` precision on a GPU device:
````
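The context of the second hunk notes that the model supports multi-image and multi-prompt generation. As a complement to the updated snippet, below is a minimal sketch of a multi-image call through the same `image-text-to-text` pipeline; the second image URL and the question are illustrative assumptions, not part of the card.

```python
from transformers import pipeline

# Assumes the llava-hf/llava-onevision-qwen2-0.5b-ov-hf checkpoint, as in the
# updated example; other LLaVA-OneVision checkpoints should work the same way.
pipe = pipeline("image-text-to-text", model="llava-hf/llava-onevision-qwen2-0.5b-ov-hf")

# Several "image" entries can precede the "text" entry in a single user turn.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"},
            {"type": "image", "url": "http://images.cocodataset.org/val2017/000000039769.jpg"},  # illustrative second image
            {"type": "text", "text": "What does each of these two images show?"},
        ],
    },
]

out = pipe(text=messages, max_new_tokens=40)
# The pipeline returns a list of dicts; the answer is under "generated_text",
# as the `>>>` line in the diff shows.
print(out[0]["generated_text"])
```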
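The second hunk ends at the lead-in to the pure-`transformers` example, so the script itself is outside this diff. For orientation, here is a sketch of what a `float16` generation script for this model family typically looks like; the checkpoint id is an assumption, and the image and question are carried over from the pipeline example above.

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

model_id = "llava-hf/llava-onevision-qwen2-0.5b-ov-hf"  # assumed checkpoint
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
).to(0)  # move to the first CUDA device
processor = AutoProcessor.from_pretrained(model_id)

# Build the prompt with the chat template; the image itself is passed to the
# processor separately, so the content entry only carries {"type": "image"}.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud"},
        ],
    },
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, text=prompt, return_tensors="pt").to(0, torch.float16)
output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(processor.decode(output[0], skip_special_tokens=True))
```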