Qwen3-VL-8B-Thinking-Unredacted-MAX
Qwen3-VL-8B-Thinking-Unredacted-MAX is an optimized release built on top of huihui-ai/Huihui-Qwen3-VL-8B-Thinking-abliterated. This version focuses on stable inference behavior, improved packaging consistency, and updated Transformers compatibility, while preserving the strong multimodal reasoning and “thinking” capabilities of the base architecture. The result is a capable 8B vision-language model designed for structured reasoning, captioning, and research-oriented multimodal workflows.
Key Highlights
- Optimized Release Structure: Improved repository organization for smoother deployment and reproducible loading.
- Modern Transformers Compatibility: Updated to work reliably with recent Hugging Face Transformers and multimodal processing pipelines.
- 8B Thinking Vision-Language Architecture: Built on Qwen3-VL-8B-Thinking, enabling stronger step-by-step visual reasoning compared to standard instruct variants.
- Stable Multimodal Reasoning: Improved consistency for image interpretation, captioning, and structured output generation.
- High-Fidelity Caption Generation: Produces detailed, structured descriptions suitable for dataset creation, annotation, and accessibility use cases.
- Dynamic Resolution Support: Retains native support for varying image resolutions and aspect ratios.
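To make the dynamic resolution point concrete, here is a minimal sketch of how such preprocessing commonly works: image dimensions are snapped to a multiple of a patch-grid factor, and oversized images are scaled down while roughly preserving aspect ratio. The function name `snap_to_patch_grid`, the factor of 32, and the pixel budget are illustrative assumptions, not values read from this model's processor configuration.

```python
import math


def snap_to_patch_grid(height: int, width: int,
                       factor: int = 32,
                       max_pixels: int = 1024 * 1024) -> tuple[int, int]:
    """Return (height, width) rounded to multiples of `factor`.

    `factor` and `max_pixels` are assumed values for illustration only.
    """
    # Round each side to the nearest multiple of the grid factor,
    # never going below a single grid cell.
    h = max(factor, round(height / factor) * factor)
    w = max(factor, round(width / factor) * factor)

    # If the image exceeds the pixel budget, shrink both sides by the
    # same scale so the aspect ratio is approximately preserved.
    if h * w > max_pixels:
        scale = math.sqrt(height * width / max_pixels)
        h = max(factor, math.floor(height / scale / factor) * factor)
        w = max(factor, math.floor(width / scale / factor) * factor)
    return h, w


for size in [(480, 640), (1000, 700), (4000, 3000)]:
    print(size, "->", snap_to_patch_grid(*size))
```

The real processor may use different limits and rounding rules; the sketch only shows why arbitrary input sizes and aspect ratios can be accepted without a fixed square crop.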
Base Model Signatures
This model was re-sharded from the base model and optimized for the latest Transformers version. Base model: https://huggingface.co/huihui-ai/Huihui-Qwen3-VL-8B-Thinking-abliterated
Quick Start with Transformers
# Requires the helper package for image/video loading: pip install qwen-vl-utils
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Load the weights; dtype and device placement are selected automatically
# (device_map="auto" needs the accelerate package).
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/Qwen3-VL-8B-Thinking-Unredacted-MAX",
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    "prithivMLmods/Qwen3-VL-8B-Thinking-Unredacted-MAX"
)

# A single user turn containing one image and one text instruction.
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Provide a detailed caption for this image."},
        ],
    }
]

# Render the chat template to a prompt string (tokenization happens below).
text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Fetch and preprocess the images/videos referenced in the messages.
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)  # follow the model's device instead of hard-coding "cuda"

generated_ids = model.generate(**inputs, max_new_tokens=256)

# Drop the prompt tokens so only the newly generated text is decoded.
output_text = processor.batch_decode(
    [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)],
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)
print(output_text)
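Because this is a thinking variant, the decoded text may contain a reasoning trace before the final answer. The helper below is a minimal sketch for separating the two, assuming the reasoning ends with a closing `</think>` tag as in other Qwen3 thinking models; the function name `split_thinking` and the tag handling are assumptions, not part of this repository.

```python
def split_thinking(decoded: str, close_tag: str = "</think>") -> tuple[str, str]:
    """Split decoded output into (reasoning, answer).

    Assumption: the reasoning trace is terminated by `close_tag`.
    If the tag is missing, the text is treated as unfinished reasoning
    and the answer is returned as an empty string.
    """
    head, sep, tail = decoded.partition(close_tag)
    # Remove an opening tag if the decoded text happens to include one.
    reasoning = head.replace("<think>", "", 1).strip()
    if not sep:
        return reasoning, ""
    return reasoning, tail.strip()


# Hypothetical decoded string, used only to demonstrate the helper.
sample = "<think>\nThe photo shows a beach.\n</think>\n\nA woman and a dog sit on a beach."
reasoning, answer = split_thinking(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

If the answer comes back empty, the generation budget may have run out before the reasoning finished; raising `max_new_tokens` is one thing to try.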
Intended Use
- Multimodal reasoning research and evaluation
- Image captioning and dataset annotation pipelines
- Vision-language model benchmarking and robustness testing
- Creative visual storytelling and structured description generation
- Prototyping AI systems that combine reasoning with image understanding
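For the captioning and annotation use case, the sketch below shows one way to prepare per-image requests in the same message format as the Quick Start and to store results as JSONL. The helper names, the file paths, and the record fields (`image`, `caption`) are hypothetical; the generation step itself is left out and would reuse the Quick Start code.

```python
import json
from pathlib import Path


def build_caption_messages(image_ref: str, prompt: str) -> list[dict]:
    """Build a single-turn message list for one image (Quick Start format)."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_ref},
                {"type": "text", "text": prompt},
            ],
        }
    ]


def write_caption_records(records: list[dict], out_path: str) -> int:
    """Write one JSON object per line and return how many were written."""
    with Path(out_path).open("w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
    return len(records)


# Hypothetical image paths; each message list would be passed through the
# Quick Start pipeline to obtain a caption.
image_refs = ["images/0001.jpg", "images/0002.jpg"]
batches = [
    build_caption_messages(ref, "Provide a detailed caption for this image.")
    for ref in image_refs
]
print(len(batches), "requests prepared")
```

Keeping one record per line makes it easy to resume an interrupted annotation run and to stream the file into downstream tooling.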
Limitations & Risks
Important Note: This model inherits behaviors from its base architecture and multimodal training setup.
- Performance depends heavily on image quality and prompt clarity
- May produce incomplete or inconsistent reasoning in complex scenes
- Requires sufficient GPU memory for stable inference
- Output quality varies across domains such as scientific, artistic, or real-world imagery
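As a rough guide to the GPU memory point, the following back-of-the-envelope sketch estimates the memory needed to hold the weights alone at different precisions. It assumes about 8 billion parameters (taken from the "8B" in the model name, not from an exact count) and ignores activations, the KV cache, and image tokens, so real usage will be higher.

```python
def estimate_weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the weights only, in GiB."""
    return num_params * bytes_per_param / (1024 ** 3)


# Assumed parameter count; the exact figure for this checkpoint may differ.
ASSUMED_PARAMS = 8e9

for name, nbytes in [("float32", 4), ("bfloat16/float16", 2), ("int8", 1)]:
    gib = estimate_weight_memory_gib(ASSUMED_PARAMS, nbytes)
    print(f"{name}: ~{gib:.1f} GiB for weights")
```

Long reasoning traces and high-resolution images add to this baseline, so leaving headroom beyond the weight estimate is advisable.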
