strangervisionhf (Stranger Vision)

prithivMLmods

updated a model 1 day ago

strangervisionhf/deepseek-ocr-latest-transformers

Image-Text-to-Text • 3B • Updated 1 day ago • 206 • 1

prithivMLmods

updated a Space 1 day ago

README

📉

prithivMLmods

posted an update 2 days ago

Post

2214

A small blog post titled - Hall of Multimodal OCR VLMs and Demonstrations has been published on ↗️ https://huggingface.co/blog/prithivMLmods/multimodal-ocr-vlms on behalf of

strangervisionhf

It discusses the latest trends in OCR models, the multilingual support offered by modern OCR systems, their unique capabilities, OCR benchmark model comparisons, transformer-based implementations, and strategies for streamlining transformers compatibility.

prithivMLmods

updated a collection 4 days ago

October 2025 Models

Collection

Accelerating smooth inferences with transformers • 4 items • Updated 4 days ago

prithivMLmods

updated a model 4 days ago

strangervisionhf/dots.ocr-base-fix

Image-Text-to-Text • 3B • Updated 4 days ago • 264 • 1

prithivMLmods

posted an update 4 days ago

Post

3735

Implemented DeepSeek-OCR to support the latest transformers on the

strangervisionhf page. The page includes the model weights and corrected configuration, which fix the issues and allow transformers inference to run smoothly.🤗🔥

> Model: strangervisionhf/deepseek-ocr-latest-transformers
> Demo Space: prithivMLmods/DeepSeek-OCR-experimental

✅Supports the latest transformers
✅You can also opt out of the attention implementation if needed.
✅Supports torch version 2.6.0 or higher
✅torch version cuda: 12.4

If you are interested in experimenting with new things and streamlining compatibility, the

strangervisionhf organization is open for you, and you can join the community.

> Multimodal Collection: prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0, https://huggingface.co/collections/strangervisionhf/october-2025-models

> Thank you, @merve , for assigning the blazing-fast Zero GPU support!

> Notebook : https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/DeepSeek-OCR-Demo/deepseek_ocr_demo.ipynb

To know more about it, visit the app page or the respective model page!

prithivMLmods

published a model 4 days ago

strangervisionhf/deepseek-ocr-latest-transformers

Image-Text-to-Text • 3B • Updated 1 day ago • 206 • 1

prithivMLmods

in strangervisionhf/deepseek-ocr-latest-transformers 4 days ago

update deepseek ocr (modeling)

#1 opened 4 days ago by

prithivMLmods

updated 2 models 4 days ago

strangervisionhf/excess_layer_pruned-nanonets-1.5b

Image-Text-to-Text • 2B • Updated 4 days ago • 48 • 1

strangervisionhf/paddle.ocr_path_expose

Image-Text-to-Text • 1.0B • Updated 4 days ago • 90 • 2

prithivMLmods

posted an update 5 days ago

Post

1452

Introducing Gliese-OCR-7B-Post2.0-final, a document content-structure retrieval VLM designed for content extraction (OCR), summarization, and document visual question answering. This is the fourth and final model in the Camel Doc OCR VLM series, following Gliese-OCR-7B-Post1.0. The model delivers superior accuracy across a wide range of document types, including scanned PDFs, handwritten pages, structured forms, and analytical reports.🚀🤗

> Gliese-OCR-7B-Post2.0-final : prithivMLmods/Gliese-OCR-7B-Post2.0-final
> Gliese-OCR-7B-Post1.0 (previous) : prithivMLmods/Gliese-OCR-7B-Post1.0
> Gliese OCR Post-x.0 (collection) : https://huggingface.co/collections/prithivMLmods/gliese-ocr-post-x0
> Multimodal Implementations (collection) : https://huggingface.co/collections/prithivMLmods/multimodal-implementations
> Qwen VL Captions (other-collection) : https://huggingface.co/collections/prithivMLmods/qwen-vl-captions
> Run Demo Here : prithivMLmods/Gliese-OCR-7B-Post2.0-final
> GitHub (4bit) : https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/Gliese-OCR-7B-Post2.0-final(4bit)/Gliese_OCR_7B_Post2_0_final.ipynb

.
.
.
> To know more about it, visit the app page or the respective model page!!

prithivMLmods

posted an update 6 days ago

Post

1794

Here is the official Florence-2 Transformers-converted demo for the following vision models: florence-community/Florence-2-large, florence-community/Florence-2-large-ft, florence-community/Florence-2-base, and florence-community/Florence-2-base-ft. These models support tasks such as object detection, captioning, detailed captioning, more detailed captioning, dense region captioning, region proposal, OCR, and OCR with region. Try the official demo at the link below:

> Space: prithivMLmods/florence2-vision-models
> Collection: prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

> To know more about it, visit the app page or the respective model page!!

prithivMLmods

posted an update 13 days ago

Post

2244

Let’s have the comparison again with Multimodal OCR3:

nanonets/Nanonets-OCR2-3B vs allenai/olmOCR-2-7B-1025 vs rednote-hilab/dots.ocr vs datalab-to/chandra

Try it here @ prithivMLmods/Multimodal-OCR3

Collection: prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

merve

posted an update 13 days ago

Post

4809

deepseek-ai/DeepSeek-OCR is out! 🔥 my take ⤵️
> pretty insane it can parse and re-render charts in HTML
> it uses CLIP and SAM features concatenated, so better grounding
> very efficient per vision tokens/performance ratio
> covers 100 languages

2 replies

·

prithivMLmods

posted an update 17 days ago

Post

1884

Now you can try all the latest state-of-the-art multimodal vision-language models from the Qwen3-VL series demo on Hugging Face Spaces — including 4B, 8B, and 30B (Instruct, 4B-Thinking) variants. I’ve also uploaded the weights for the Abliterated variants of these models, up to 30B parameters. Check out the Spaces and model links below! 🤗🔥

✨ Qwen3-VL[4B,8B]: prithivMLmods/Qwen3-VL-Outpost
✨ Qwen3-VL-30B-A3B-Demo: prithivMLmods/Qwen3-VL-HF-Demo
✨ Collection: prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

Qwen3-VL Abliterated Model Collection [ Version 1.0 ]

✨ Qwen3-VL-8B-Instruct-abliterated: prithivMLmods/Qwen3-VL-8B-Instruct-abliterated
✨ Qwen3-VL-4B-Instruct-abliterated: prithivMLmods/Qwen3-VL-4B-Instruct-abliterated
✨ Qwen3-VL-8B-Thinking-abliterated: prithivMLmods/Qwen3-VL-8B-Thinking-abliterated
✨ Qwen3-VL-4B-Thinking-abliterated: prithivMLmods/Qwen3-VL-4B-Thinking-abliterated
✨ Qwen3-VL-30B-A3B-Instruct-abliterated: prithivMLmods/Qwen3-VL-30B-A3B-Instruct-abliterated
✨ Qwen3-VL-30B-A3B-Thinking-abliterated: prithivMLmods/Qwen3-VL-30B-A3B-Thinking-abliterated

⚡Collection: prithivMLmods/qwen3-vl-abliteration-oct-1625-68f0e3e567ef076594605fac

Note: This is version 1.0 of the Abliteration of the Qwen3-VL series of models. It may perform sub-optimally in some cases. If you encounter any issues, please open a discussion.

prithivMLmods

posted an update 19 days ago

Post

3053

Introducing Image-Guard-2.0, an experimental, lightweight vision-language encoder model with a size of 0.1B (<100M parameters), trained on SigLIP2 (siglip2-base-patch16-224). Designed for multi-label image classification tasks, this model functions as an image safety system, serving as an image guard or moderator across a wide range of categories, from anime to realistic imagery.

⚡blog-article: https://huggingface.co/blog/prithivMLmods/image-guard-models

It also performs strict moderation and filtering of artificially synthesized content, demonstrating strong detection and handling of explicit images. Image-Guard-2.0 delivers robust performance in streamlined scenarios, ensuring reliable and effective classification across diverse visual inputs.

prithivMLmods

posted an update 22 days ago

Post

3332

The demo of Qwen3-VL-30B-A3B-Instruct, the next-generation and powerful vision-language model in the Qwen series, delivers comprehensive upgrades across the board — including superior text understanding and generation, deeper visual perception and reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities. 🤗🔥

⚡ Space / App: prithivMLmods/Qwen3-VL-HF-Demo

The model’s demo supports a wide range of tasks, including;
Image Inference, Video Inference, PDF Inference, Image Captioning (VLA), GIF Inference.

⚡ Collection: prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

Thanks for granting the blazing-fast Zero GPU access, @merve 🙏

⚡ Other Pages

> Github: https://github.com/prithivsakthiur/qwen3-vl-hf-demo
> Multimodal VLMs July'25 : prithivMLmods/multimodal-vlms-until-july25-688312e6b840e1e156f13027
> VL caption — < Sep 15 ’25 : prithivMLmods/vl-caption-sep-15-25-68c7f6d737985c63c13e2391
> Multimodal VLMs - Aug'25 : prithivMLmods/multimodal-vlms-aug25-68a56aac39fe8084f3c168bd

To know more about it, visit the app page or the respective model page!!

Stranger Vision

AI & ML interests

Recent Activity

strangervisionhf/deepseek-ocr-latest-transformers

README

October 2025 Models

strangervisionhf/dots.ocr-base-fix

strangervisionhf/deepseek-ocr-latest-transformers

update deepseek ocr (modeling)

strangervisionhf/excess_layer_pruned-nanonets-1.5b

strangervisionhf/paddle.ocr_path_expose

AI & ML interests

Recent Activity

Team members 2

strangervisionhf's activity

README

update deepseek ocr (modeling)