---
license: apache-2.0
tags:
- text-generation-inference
- uncensored
- abliterated
- unfiltered
- unredacted
- vllm
- pytorch
- bf16
- max
language:
- en
pipeline_tag: image-text-to-text
library_name: transformers
base_model:
- Qwen/Qwen3-VL-8B-Thinking
---

![1](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/luQZ5rQ4G4mugTtDkKlO6.png)

# **Qwen3-VL-8B-Thinking-Unredacted-MAX**

> **Qwen3-VL-8B-Thinking-Unredacted-MAX** is an optimized release built on top of **huihui-ai/Huihui-Qwen3-VL-8B-Thinking-abliterated**. This version focuses on **stable inference behavior, improved packaging consistency, and updated Transformers compatibility**, while preserving the strong multimodal reasoning and “thinking” capabilities of the base architecture. The result is a capable **8B vision-language model** designed for structured reasoning, captioning, and research-oriented multimodal workflows.

## Key Highlights

* **Optimized Release Structure**
  Improved repository organization for smoother deployment and reproducible loading.
* **Modern Transformers Compatibility**
  Updated to work reliably with recent Hugging Face Transformers and multimodal processing pipelines.
* **8B Thinking Vision-Language Architecture**
  Built on **Qwen3-VL-8B-Thinking**, enabling stronger step-by-step visual reasoning compared to standard instruct variants.
* **Stable Multimodal Reasoning**
  Improved consistency for image interpretation, captioning, and structured output generation.
* **High-Fidelity Caption Generation**
  Produces detailed, structured descriptions suitable for dataset creation, annotation, and accessibility use cases.
* **Dynamic Resolution Support**
  Retains native support for varying image resolutions and aspect ratios.
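
Because this is a "thinking" variant, a decoded generation typically carries step-by-step reasoning before the final answer. The following is a minimal sketch for separating the two. It assumes the reasoning ends with a `</think>` tag, as Qwen3 thinking variants generally emit, and that the tag survives decoding. The helper name `split_thinking` is hypothetical and is not part of this repository or of Transformers.

```python
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def split_thinking(decoded: str) -> tuple[str, str]:
    """Split one decoded generation into (reasoning, answer).

    Assumption: reasoning is terminated by a literal "</think>" tag.
    If the tag is missing (for example, generation stopped at the
    token limit), all text is treated as unfinished reasoning and the
    answer is returned as an empty string.
    """
    head, sep, tail = decoded.partition(THINK_CLOSE)
    # Drop an opening tag if the decoded text happens to include one.
    reasoning = head.replace(THINK_OPEN, "", 1).strip()
    if not sep:
        return reasoning, ""
    return reasoning, tail.strip()


if __name__ == "__main__":
    sample = "The sign is octagonal and red.\n</think>\n\nA red stop sign."
    reasoning, answer = split_thinking(sample)
    print("reasoning:", reasoning)
    print("answer:", answer)
```

An empty answer is a useful signal that the token budget ran out before the reasoning finished, in which case raising `max_new_tokens` may help.
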

---

## Base Model Signatures

This model was re-sharded from the base model below and updated for compatibility with recent Transformers releases:

https://huggingface.co/huihui-ai/Huihui-Qwen3-VL-8B-Thinking-abliterated

---

## Quick Start with Transformers

```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Load the model; device_map="auto" places the weights on the available device(s)
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/Qwen3-VL-8B-Thinking-Unredacted-MAX",
    torch_dtype="auto",
    device_map="auto"
)

processor = AutoProcessor.from_pretrained(
    "prithivMLmods/Qwen3-VL-8B-Thinking-Unredacted-MAX"
)

# A single user turn containing one image and one text instruction
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Provide a detailed caption for this image."},
        ],
    }
]

# Render the chat template to a prompt string and load the referenced media
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

image_inputs, video_inputs = process_vision_info(messages)

# Move inputs to the model's device rather than hard-coding "cuda"
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=256)

# Strip the prompt tokens so only the newly generated text is decoded
output_text = processor.batch_decode(
    [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)],
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)

print(output_text)
```

## Intended Use

* Multimodal reasoning research and evaluation
* Image captioning and dataset annotation pipelines
* Vision-language model benchmarking and robustness testing
* Creative visual storytelling and structured description generation
* Prototyping AI systems that combine reasoning with image understanding

## Limitations & Risks

> **Important Note**: This model inherits behaviors from its base architecture and multimodal training setup.

* Performance depends heavily on image quality and prompt clarity
* May produce incomplete or inconsistent reasoning in complex scenes
* Requires sufficient GPU memory for stable inference
* Output quality varies across domains such as scientific, artistic, or real-world imagery
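
As a rough guide to the GPU memory point above, the weights alone can be estimated from the parameter count and the bytes used per parameter. The sketch below takes the nominal 8B figure from the model name and 2 bytes per parameter for bf16 (both simplifying assumptions). It deliberately ignores activations, the KV cache, and image-token overhead, so real usage will be higher. The function name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights only, in GiB.

    Assumptions: a nominal parameter count and a uniform dtype
    (2 bytes per parameter for bf16). Activations, KV cache, and
    framework overhead are NOT included.
    """
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    # Nominal 8B parameters in bf16: about 14.9 GiB for weights alone
    print(f"{weight_memory_gib(8e9):.1f} GiB")
```

Treat the result as a lower bound: high-resolution images and long reasoning traces both increase memory use during generation.
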