TinyDoc-VLM-256M
Smallest document AI that actually works. 256M params. Runs on a MacBook. Apache 2.0.
What is this?
A 256M-parameter vision-language model that reads documents: invoices, receipts, forms, tables, charts. It extracts structured data, answers questions, and parses tables, all from a single model that runs on CPU.
Why does this exist? Most document AI models are 7B+ params and need expensive GPUs. TinyDoc-VLM fits in <1GB VRAM and runs on a MacBook Air, Raspberry Pi 5, or any CPU with ONNX.
Quick Start
pip install tinydoc
from PIL import Image
from tinydoc import TinyDocExtractor
extractor = TinyDocExtractor(device="cpu")
# Ask questions
img = Image.open("invoice.png")
result = extractor.ask(img, "What is the total?")
print(result.answer) # "$1,234.56"
# Extract structured JSON
result = extractor.extract(img, output_format="json")
print(result.fields) # {"total": "$1,234.56", "date": "2024-01-15", ...}
# Extract tables
result = extractor.extract_table(img)
print(result.markdown)
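The extracted fields in the example above are shown as strings. A minimal sketch of turning them into typed values for downstream use follows; it relies only on the standard library. The helper names `parse_amount` and `normalize_fields` are hypothetical and not part of `tinydoc`, and the sketch assumes the `"total"` and `"date"` keys and formats shown in the Quick Start comments.

```python
from datetime import date
from decimal import Decimal, InvalidOperation


def parse_amount(text):
    """Convert a currency string such as "$1,234.56" to a Decimal, or None."""
    cleaned = text.strip().lstrip("$").replace(",", "")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def normalize_fields(fields):
    """Return a copy of `fields` with typed "total" and "date" values where they parse."""
    out = dict(fields)
    if "total" in out:
        amount = parse_amount(out["total"])
        if amount is not None:
            out["total"] = amount
    if "date" in out:
        try:
            out["date"] = date.fromisoformat(out["date"])
        except ValueError:
            pass  # keep the raw string if it is not ISO formatted
    return out


# Sample values copied from the Quick Start comments (assumed output shape)
sample = {"total": "$1,234.56", "date": "2024-01-15"}
print(normalize_fields(sample))
```

Values that do not parse are left as the original strings, so unexpected model output is preserved for inspection instead of being dropped.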
Direct Model Access
from tinydoc_vlm import TinyDocVLMForConditionalGeneration, TinyDocVLMProcessor
model = TinyDocVLMForConditionalGeneration.from_pretrained("eulogik/TinyDoc-VLM-256M")
processor = TinyDocVLMProcessor()
Architecture
Image (384×384)
↓
SigLIP Vision Encoder (93M) → 576 patches × 768 dim
↓
Pixel-Shuffle Compressor (scale=3) → 9× compression → 64 tokens
↓
Visual Position Embeddings
↓
SmolLM2 Decoder (135M) → 30 layers, GQA (9:3 heads), 8192 ctx
↓
Multi-Task Output Heads
↓
JSON / KV Extraction / Table / OCR / QA
Total: 256M params | Vision: 93M | Compressor: 3M | Decoder: 135M | Heads: 25M
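The token counts in the diagram can be checked with a small sketch of pixel-shuffle (space-to-depth) compression in NumPy. This is an illustration of the general technique, not the model's actual compressor: the patch size of 16 is assumed from the SigLIP-B/16 encoder, the function name `pixel_shuffle_compress` is hypothetical, and any learned projection the real compressor applies afterwards is not shown.

```python
import numpy as np


def pixel_shuffle_compress(tokens, scale):
    """Merge each scale x scale block of neighbouring patch tokens into one token.

    tokens: array of shape (num_patches, dim), laid out row-major on a square grid.
    Returns an array of shape (num_patches // scale**2, dim * scale**2).
    """
    num_patches, dim = tokens.shape
    side = int(round(num_patches ** 0.5))
    if side * side != num_patches or side % scale != 0:
        raise ValueError("expected a square grid whose side is divisible by scale")
    out_side = side // scale
    # Split rows and columns into (block index, offset inside block)
    grid = tokens.reshape(out_side, scale, out_side, scale, dim)
    # Move both block indices to the front so each block's tokens are adjacent
    grid = grid.transpose(0, 2, 1, 3, 4)
    return grid.reshape(out_side * out_side, scale * scale * dim)


image_size, patch_size = 384, 16  # patch size 16 is an assumption (SigLIP-B/16)
side = image_size // patch_size   # 24 patches per side, 576 in total
patches = np.zeros((side * side, 768), dtype=np.float32)
compressed = pixel_shuffle_compress(patches, scale=3)
print(patches.shape, compressed.shape)  # (576, 768) (64, 6912)
```

With scale 3, nine neighbouring patches fold into one token, which is where the 9× reduction from 576 to 64 tokens comes from; the feature width grows by the same factor before any projection.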
LoRA Fine-tuning
Train on your own documents with LoRA: only 2.7M params (0.93%) are trainable.
# Generate synthetic docs
python data/synthetic/generator.py --num-docs 1000 --output-dir data/synthetic/output
# Train on M4 Mac (~4.6 hours for 5K steps)
python training/fast_train.py --steps 5000 --device mps
# Train on GPU (~1 hour for 5K steps)
python training/fast_train.py --steps 5000 --device cuda
Colab notebook: training/colab_train.ipynb
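For intuition on why LoRA keeps the trainable parameter count small, here is a rough sketch of how adapter size scales with rank. The card does not state the rank or which modules are adapted, so the rank of 8, the 576-wide projections, and the choice of two adapted projections per layer are all hypothetical; only the 30-layer depth comes from the Architecture section. The result is not expected to reproduce the 2.7M figure.

```python
def lora_param_count(in_features, out_features, rank):
    """Parameters added by one LoRA adapter: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


# Hypothetical setup: rank-8 adapters on two square 576-wide projections per layer
per_layer = 2 * lora_param_count(576, 576, rank=8)
total_lora = 30 * per_layer  # 30 decoder layers, per the Architecture section
print(total_lora)  # 552960
```

The count grows linearly with rank and with the number of adapted projections, which is why adapters stay far smaller than the frozen base weights.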
Training Results
| Metric | Value |
|---|---|
| Best checkpoint | Step 14,000 (loss: 15.0) |
| Training data | 3,000 synthetic docs (6,815 QA pairs) |
| Training time | 15.1 hours on M4 Mac |
| LoRA adapter | eulogik/TinyDoc-VLM-LoRA |
Deployment
ONNX (Recommended for Production)
python export/export_onnx.py --model-path eulogik/TinyDoc-VLM-256M --output model.onnx
ONNX files on HF Hub:
- tinydoc-vlm-vision.onnx: Vision encoder (33KB)
- tinydoc-vlm-compressor.onnx: Token compressor (31KB)
- tinydoc-vlm-decoder.onnx: Language decoder (59MB)
HuggingFace Spaces
Live demo: huggingface.co/spaces/eulogik/TinyDoc-VLM
Benchmarks
| Benchmark | Status | Target |
|---|---|---|
| OCRBench | In progress | >75% |
| DocVQA | Pending | >85% |
| FUNSD | Pending | >95% |
What can it do?
- Invoice processing: Extract line items, totals, dates, vendor info
- Receipt scanning: Parse store receipts, extract amounts
- Form understanding: Read forms, extract field-value pairs
- Table extraction: Convert tables to structured data
- Document Q&A: Ask questions about any document
- OCR: Read printed text from images
- Chart understanding: Extract data from charts and graphs
Related Models
- eulogik/TinyDoc-VLM-LoRA: LoRA fine-tuned adapter
- SmolLM2-135M: Base decoder
- SigLIP-B/16: Base vision encoder
Links
| Resource | URL |
|---|---|
| GitHub | github.com/eulogik/TinyDoc-VLM |
| PyPI | pypi.org/project/tinydoc |
| LoRA Checkpoint | huggingface.co/eulogik/TinyDoc-VLM-LoRA |
| Live Demo | huggingface.co/spaces/eulogik/TinyDoc-VLM |
Citation
@software{eulogik_tinydoc_vlm_2026,
author = {eulogik},
title = {TinyDoc-VLM: 256M-Param Document-Specialist Vision-Language Model},
year = {2026},
url = {https://github.com/eulogik/TinyDoc-VLM}
}
License
Apache 2.0. Free for commercial use.
Built by eulogik, AI infrastructure for document intelligence.