 
LightOnOCR-1B-1025
Full BF16 version of the model. We recommend this variant for further fine-tuning or research use.
LightOnOCR-1B is a compact, end-to-end vision–language model for Optical Character Recognition (OCR) and document understanding. It achieves state-of-the-art accuracy in its weight class while being several times faster and cheaper than larger general-purpose VLMs.
Highlights
- ⚡ Speed: 5× faster than dots.ocr, 2× faster than PaddleOCR-VL-0.9B, 1.73× faster than DeepSeekOCR
- 💸 Efficiency: Processes 5.71 pages/s on a single H100 (~493k pages/day) for <$0.01 per 1,000 pages
- 🧠 End-to-End: Fully differentiable, no external OCR pipeline
- 🧾 Versatile: Handles tables, receipts, forms, multi-column layouts, and math notation
- 🌍 Compact variants: 32k and 16k vocab options for European languages
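The pages-per-day figure follows directly from the quoted throughput. A quick check, assuming the 5.71 pages/s rate is sustained around the clock (the function name is illustrative, not part of any LightOnOCR tooling):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

def pages_per_day(pages_per_second: float) -> float:
    """Pages processed in 24 hours at a constant throughput."""
    return pages_per_second * SECONDS_PER_DAY

print(round(pages_per_day(5.71)))  # 493344, i.e. roughly 493k pages/day
```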
Model Overview
LightOnOCR combines a high-performance Vision Transformer encoder with a lightweight text decoder distilled from high-quality open VLMs. It is optimized for document parsing tasks, producing accurate, layout-aware text extraction from high-resolution pages.
Benchmarks
| Model | ArXiv | Old Scans | Math | Tables | Multi-Column | Tiny Text | Base | Overall | 
|---|---|---|---|---|---|---|---|---|
| LightOnOCR-1B-1025 (151k vocab) | 81.4 | 71.6 | 76.4 | 35.2 | 80.0 | 88.7 | 99.5 | 76.1 | 
| LightOnOCR-1B-32k (32k vocab) | 80.6 | 66.2 | 73.5 | 33.5 | 71.2 | 87.6 | 99.5 | 73.1 | 
| LightOnOCR-1B-16k (16k vocab) | 82.3 | 72.9 | 75.3 | 33.5 | 78.6 | 85.1 | 99.8 | 75.4 | 
All benchmarks were evaluated with vLLM on Olmo-Bench.
Installation
uv venv --python 3.12 --seed
source .venv/bin/activate
uv pip install -U vllm \
    --torch-backend=auto \
    --extra-index-url https://wheels.vllm.ai/nightly \
    --prerelease=allow
uv pip install pypdfium2 pillow requests
Start Server
vllm serve lightonai/LightOnOCR-1B-1025 \
    --limit-mm-per-prompt '{"image": 1}' \
    --async-scheduling
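Loading the model can take a while, so a client may want to wait until the server responds before sending pages. The sketch below polls the OpenAI-compatible `/v1/models` route. It assumes the server listens on `localhost:8000` (the host and port used by the inference example), and `server_is_up` and `wait_until_ready` are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request

MODELS_URL = "http://localhost:8000/v1/models"  # assumed default host and port

def server_is_up(url: str = MODELS_URL) -> bool:
    """Return True if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error: not ready yet.
        return False

def wait_until_ready(probe=server_is_up, attempts: int = 60, delay: float = 5.0) -> bool:
    """Call `probe` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_ready()` before the first request returns `True` once the server answers, or `False` after the attempts run out. The `probe` parameter is injectable so the retry loop can be exercised without a running server.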
PDF Inference
import base64
import requests
import pypdfium2 as pdfium
import io
ENDPOINT = "http://localhost:8000/v1/chat/completions"
MODEL = "lightonai/LightOnOCR-1B-1025"
# Download PDF from arXiv
pdf_url = "https://arxiv.org/pdf/2412.13663"
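pdf_response = requests.get(pdf_url, timeout=60)
pdf_response.raise_for_status()  # stop early if the download failed
pdf_data = pdf_response.content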
# Open PDF and convert first page to image
pdf = pdfium.PdfDocument(pdf_data)
page = pdf[0]
# Render at 300 DPI (scale factor = 300/72 ≈ 4.17)
pil_image = page.render(scale=4.17).to_pil()
# Convert to base64
buffer = io.BytesIO()
pil_image.save(buffer, format="PNG")
image_base64 = base64.b64encode(buffer.getvalue()).decode('utf-8')
# Make request
payload = {
    "model": MODEL,
    "messages": [{
        "role": "user",
        "content": [{
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{image_base64}"}
        }]
    }],
    "max_tokens": 6500,
    "temperature": 0.2,
    "top_p": 0.9,
}
response = requests.post(ENDPOINT, json=payload, timeout=300)
response.raise_for_status()  # surface HTTP errors instead of a KeyError below
text = response.json()['choices'][0]['message']['content']
print(text)
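The example above transcribes only the first page. To process a whole document, one option is to send one request per page concurrently and let vLLM batch them on the server side. This is a minimal sketch, not the authors' pipeline: `build_payload`, `ocr_pages`, and the `send` callable are hypothetical helpers, and the payload simply mirrors the request shown above.

```python
import base64
from concurrent.futures import ThreadPoolExecutor

MODEL = "lightonai/LightOnOCR-1B-1025"

def build_payload(png_bytes: bytes, model: str = MODEL) -> dict:
    """Wrap one PNG-encoded page in the same chat-completions payload as above."""
    image_base64 = base64.b64encode(png_bytes).decode("utf-8")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [{
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{image_base64}"},
            }],
        }],
        "max_tokens": 6500,
        "temperature": 0.2,
        "top_p": 0.9,
    }

def ocr_pages(pages_png, send, max_workers: int = 8) -> list:
    """Send one request per page concurrently; results keep page order.

    `send` is any callable that takes a payload dict and returns the page text.
    """
    payloads = [build_payload(png) for png in pages_png]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, payloads))
```

In practice `send` would wrap the `requests.post` call from the example above and return `response.json()['choices'][0]['message']['content']`, while `pages_png` would hold the PNG bytes of each rendered page (`pdf[i]` for `i` in `range(len(pdf))`). The `max_workers` value of 8 is an arbitrary starting point, not a tuned recommendation.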
Rendering and Preprocessing Tips
- Render PDFs to PNG or JPEG at a target longest dimension of 1280–1300 px
- Maintain aspect ratio to preserve text geometry
- LightOnOCR is robust to moderate skew; heavy rotation correction is optional
- Use one image per page; vLLM batches concurrent requests
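pypdfium2 takes a scale factor rather than a pixel size, where a scale of 1.0 maps one PDF point (1/72 inch) to one pixel. Hitting a target longest dimension therefore means dividing the target by the page's longer side in points. A small sketch, with a hypothetical helper name and 1280 px picked from the recommended range:

```python
def scale_for_longest_side(width_pt: float, height_pt: float, target_px: int = 1280) -> float:
    """Scale factor that makes the longer page side `target_px` pixels.

    At scale 1.0, one PDF point (1/72 inch) maps to one pixel.
    """
    longest = max(width_pt, height_pt)
    if longest <= 0:
        raise ValueError("page dimensions must be positive")
    return target_px / longest

# US Letter is 612 x 792 points, so the scale is 1280 / 792.
print(round(scale_for_longest_side(612, 792), 3))  # 1.616
```

With pypdfium2, the page size in points comes from `page.get_size()`, and the result can be passed straight to `page.render(scale=...)`. A single scale factor applies to both axes, so the aspect ratio is preserved.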
Variants
| Variant | Description | 
|---|---|
| LightOnOCR-1B-1025 | Full multilingual model (default) | 
| LightOnOCR-1B-32k | Fastest pruned-vocabulary version (32k tokens) optimized for European languages | 
| LightOnOCR-1B-16k | Most compact variant with smallest vocabulary | 
Fine-tuning
Transformers integration for training and inference is coming soon.
LightOnOCR is fully differentiable and supports:
- LoRA fine-tuning
- Domain adaptation (receipts, scientific articles, forms, etc.)
- Multilingual fine-tuning with task-specific corpora
Example fine-tuning configurations will be released alongside the dataset.
Data
LightOnOCR was trained on a diverse, large-scale PDF corpus covering:
- Scientific papers, books, receipts, invoices, tables, forms, and handwritten text
- Multiple languages (predominantly Latin-script)
- Real and synthetic document scans
The dataset will be released under an open license.
License
Apache License 2.0
Citation
@misc{lightonocr2025,
  title        = {LightOnOCR-1B: End-to-End and Efficient Domain-Specific Vision-Language Models for OCR},
  author       = {Said Taghadouini and Baptiste Aubertin and Adrien Cavaillès},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/blog/lightonai/lightonocr}}
}
