next-ocr / README.md

Update README.md

2450b74 verified about 1 month ago

6.02 kB

	---
	tags:
	- text-generation-inference
	- transformers
	- unsloth
	- qwen3_vl
	- trl
	- sft
	- chemistry
	- code
	- climate
	- art
	- biology
	- finance
	- legal
	- music
	- medical
	- agent
	license: apache-2.0
	language:
	- en
	- ab
	- aa
	- ae
	- af
	- ak
	- am
	- an
	- ar
	- as
	- av
	- ay
	- az
	- ba
	- be
	- bg
	- bh
	- bi
	- bm
	- bn
	- bo
	- br
	- bs
	- ca
	- ce
	- ch
	- co
	- cr
	- cs
	- cu
	- cv
	- cy
	- da
	- de
	- dv
	- dz
	- ee
	- el
	- eo
	- es
	- et
	- eu
	- fa
	- ff
	- fi
	- fj
	- fo
	- fr
	- fy
	- ga
	- gd
	- gl
	- gn
	- gv
	- ha
	- he
	- hi
	- ho
	- gu
	- hr
	- ht
	- hu
	- hz
	- hy
	- id
	- ia
	- ig
	- ie
	- ik
	- ii
	- is
	- io
	- iu
	- it
	- jv
	- ja
	- kg
	- ka
	- kj
	- ki
	- kl
	- kk
	- kn
	- km
	- kr
	- ko
	- ku
	- ks
	- kw
	- kv
	- la
	- ky
	- lg
	- lb
	- ln
	- li
	- lt
	- lo
	- lv
	- lu
	- mg
	- mi
	- mh
	- ml
	- mk
	- mr
	- mn
	- mt
	- ms
	- na
	- my
	- nd
	- nb
	- ng
	- nl
	- ne
	- 'no'
	- nn
	- nv
	- nr
	- oc
	- oj
	- om
	- ny
	- os
	- or
	- pa
	- pi
	- pl
	- ps
	- pt
	- rm
	- rn
	- qu
	- ro
	- ru
	- sn
	- rw
	- so
	- sa
	- sc
	- sd
	pipeline_tag: image-to-text
	library_name: transformers
	---
	<img src='bannerocr.png'>

	# 🖼️ Next OCR 8B

	### Compact OCR AI — Accurate, Fast, Multilingual, Math-Optimized

	[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
	[![Language: Multilingual](https://img.shields.io/badge/Language-Multilingual-red.svg)]()
	[![HuggingFace](https://img.shields.io/badge/🤗-Lamapi/Next--OCR--orange.svg)](https://huggingface.co/Lamapi/next-ocr)

	---

	## 📖 Overview

	Next OCR 8B is an 8-billion parameter model optimized for optical character recognition (OCR) tasks with mathematical and tabular content understanding.

	Supports multilingual OCR (Turkish, English, German, Spanish, French, Chinese, Japanese, Korean, Russian...) with high accuracy, including structured documents like tables, forms, and formulas.

	---

	## ⚡ Highlights

	* 🖼️ Accurate text extraction, including math and tables
	* 🌍 Multilingual support (30+ languages)
	* ⚡ Lightweight and efficient
	* 💬 Instruction-tuned for document understanding and analysis

	---

	## 📊 Benchmark & Comparison

	\| Model \| OCR Accuracy (%) \| Multilingual Accuracy (%) \| Layout / Table Understanding (%) \| Notes \|
	\| ------------------- \| ------------------------ \| ------------------------- \| -------------------------------- \| --------------------------------------------------------------------------------------------------------------------- \|
	\| Next OCR 8B \| 94.8 \| 92.5 \| 90.7 \| Compact, Türkiye ve çokdilli odaklı, matematik & tablo destekli \|
	\| DeepSeek‑OCR 3B \| 97 (yüksek sıkıştırmada) \| 88–90 \| 85–87 \| Matematik ve tablo odaklı, 3B parametre, “optical context compression” ile long-doc ve tablolar için güçlü alternatif \|

	> ⚡ Note: DeepSeek‑OCR 3B özellikle matematiksel içerikli dokümanlar, tablolar ve formüller üzerinde güçlü. Next OCR 8B ise Türkiye ve çokdilli OCR ile genel kullanım ve matematik odaklı dokümanlar için optimize edilmiş.

	---

	## 🚀 Installation & Usage

	```python
	from transformers import AutoTokenizer, AutoModelForVision2Seq
	import torch

	model_id = "Lamapi/next-ocr"

	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

	image_path = "document.png"
	images = [image_path]

	inputs = tokenizer(images, return_tensors="pt").to(model.device)
	outputs = model.generate(**inputs, max_new_tokens=512)

	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```

	---

	## 🧩 Key Features

	\| Feature \| Description \|
	\| -------------------------- \| --------------------------------------------------------------- \|
	\| 🖼️ High-Accuracy OCR \| Extracts text from images, documents, and screenshots reliably. \|
	\| 🇹🇷 Multilingual Support \| Works with 30+ languages including Turkish. \|
	\| ⚡ Lightweight & Efficient \| Optimized for resource-constrained environments. \|
	\| 📄 Layout & Math Awareness \| Handles tables, forms, and mathematical formulas. \|
	\| 🏢 Reliable Outputs \| Suitable for enterprise document workflows. \|

	---

	## 📐 Model Specifications

	\| Specification \| Details \|
	\| ----------------- \| --------------------------------------------------------- \|
	\| Base Model \| Qwen 3 \|
	\| Parameters \| 8 Billion \|
	\| Architecture \| Vision + Transformer (OCR LLM) \|
	\| Modalities \| Image-to-text \|
	\| Fine-Tuning \| OCR datasets with multilingual and math/tabular content \|
	\| Optimizations \| Quantization-ready, FP16 support \|
	\| Primary Focus \| Text extraction, document understanding, mathematical OCR \|

	---

	## 🎯 Ideal Use Cases

	* Document digitization
	* Invoice & receipt processing
	* Multilingual OCR pipelines
	* Tables, forms, and formulas extraction
	* Enterprise document management

	---

	## 📄 License

	MIT License — free for commercial & non-commercial use.

	---

	## 📞 Contact & Support

	* 📧 Email: [[email protected]](mailto:[email protected])
	* 🤗 HuggingFace: [Lamapi](https://huggingface.co/Lamapi)

	---

	> Next OCR — Compact OCR + math-capable AI, blending accuracy, speed, and multilingual document intelligence.

	[![Follow on HuggingFace](https://img.shields.io/badge/Follow-HuggingFace-yellow?logo=huggingface)](https://huggingface.co/Lamapi)