AIML-TUDA/LlavaGuard-v1.2-7B-OV-hf

@@ -36,54 +36,60 @@ LlavaGuard-v1.2-7B-OV is trained on [LlavaGuard-DS](https://huggingface.co/datas
 ## Model Compatibility
-- Inference: SGLang❌, LLaVA [repo](https://github.com/LLaVA-VL/LLaVA-NeXT)❌, HF Transformers✅
 - Model Tuning:❌
 ## Overview
 Here we provide the Transformers-converted weights for LlavaGuard v1.2 7B.
 It builds upon LLaVA-OneVision 7B and has achieved the best overall performance so far with improved reasoning capabilities within the rationales.
-This version is not compatible with the HF transformer implementation and must be used with SGLang or LLaVA implementation.
-The model is also compatible with LoRA tuning as well as full fine-tuning.
-For tuning, you can adopt and use the training scripts provided in our repository (see [ml-research/LlavaGuard](https://github.com/ml-research/LlavaGuard)).
-A suitable docker image can be found at our Github repo, too.
-#### Usage
-# 0. Install requirements
-For inference, you use the following [sglang docker](https://github.com/sgl-project/sglang/blob/main/docker/Dockerfile) and proceed with step 1.
-Otherwise, you can also install sglang via pip or from source [see here](https://github.com/sgl-project/sglang).
-# 1. Select a model and start an SGLang server
-    CUDA_VISIBLE_DEVICES=0 python3 -m sglang.launch_server --model-path AIML-TUDA/LlavaGuard-v1.2-7B-OV --port 10000
-# 2. Model Inference
 For model inference, you can access this server by running the code provided below, e.g.
 `python my_script.py`
 ```Python
-import sglang as sgl
-from sglang import RuntimeEndpoint
-@sgl.function
-def guard_gen(s, image_path, prompt):
-    s += sgl.user(sgl.image(image_path) + prompt)
-    hyperparameters = {
-        'temperature': 0.2,
-        'top_p': 0.95,
-        'top_k': 50,
-        'max_tokens': 500,
-    }
-    s += sgl.assistant(sgl.gen("json_output", **hyperparameters))
-im_path = 'path/to/your/image'
-prompt = safety_taxonomy_below
-backend = RuntimeEndpoint(f"http://localhost:10000")
-sgl.set_default_backend(backend)
-out = guard_gen.run(image_path=im_path, prompt=prompt)
-print(out['json_output'])
 ```
 ## Safety Taxonomy
 Our default policy prompt looks like this:

 ## Model Compatibility
+- Inference: HF Transformers✅, SGLang❌, LLaVA [repo](https://github.com/LLaVA-VL/LLaVA-NeXT)❌
 - Model Tuning:❌
 ## Overview
 Here we provide the Transformers-converted weights for LlavaGuard v1.2 7B.
 It builds upon LLaVA-OneVision 7B and has achieved the best overall performance so far with improved reasoning capabilities within the rationales.
+#### Usage
 For model inference, you can run the code provided below, e.g.
 `python my_script.py`
 ```Python
+from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration
+from PIL import Image
+import requests
+model = LlavaOnevisionForConditionalGeneration.from_pretrained('AIML-TUDA/LlavaGuard-v1.2-7B-OV-hf')
+processor = AutoProcessor.from_pretrained('AIML-TUDA/LlavaGuard-v1.2-7B-OV-hf')
+conversation = [
+    {
+        "role": "user",
+        "content": [
+            {"type": "image"},
+            {"type": "text", "text": policy},
+            ],
+    },
+]
+text_prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
+url = "https://www.ilankelman.org/stopsigns/australia.jpg"
+image = Image.open(requests.get(url, stream=True).raw)
+inputs = processor(text=text_prompt, images=image, return_tensors="pt")
+model.to('cuda:0')
+inputs = {k: v.to('cuda:0') for k, v in inputs.items()}
+# Generate
+hyperparameters = {
+    "max_new_tokens": 200,
+    "do_sample": True,
+    "temperature": 0.2,
+    "top_p": 0.95,
+    "top_k": 50,
+    "num_beams": 2,
+    "use_cache": True,
+}
+output = model.generate(**inputs, **hyperparameters)
+print(processor.decode(output[0], skip_special_tokens=True))
 ```
 ## Safety Taxonomy
 Our default policy prompt looks like this: