Libraries

How to use small-models-for-glam/index-card-extractor-4b-v0.1 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="small-models-for-glam/index-card-extractor-4b-v0.1")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("small-models-for-glam/index-card-extractor-4b-v0.1")
model = AutoModelForMultimodalLM.from_pretrained("small-models-for-glam/index-card-extractor-4b-v0.1")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

vLLM

How to use small-models-for-glam/index-card-extractor-4b-v0.1 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "small-models-for-glam/index-card-extractor-4b-v0.1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "small-models-for-glam/index-card-extractor-4b-v0.1",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/small-models-for-glam/index-card-extractor-4b-v0.1

SGLang

How to use small-models-for-glam/index-card-extractor-4b-v0.1 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "small-models-for-glam/index-card-extractor-4b-v0.1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "small-models-for-glam/index-card-extractor-4b-v0.1",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "small-models-for-glam/index-card-extractor-4b-v0.1" \
        --host 0.0.0.0 \
        --port 30000
# Call the server with the same curl request shown in the pip section above.

Docker Model Runner
How to use small-models-for-glam/index-card-extractor-4b-v0.1 with Docker Model Runner:
```
docker model run hf.co/small-models-for-glam/index-card-extractor-4b-v0.1
```

index-card-extractor-4b-v0.1

Research preview (v0.1). An independent research fine-tune, not an official NuMind release. Built on NuExtract-3, which is credited through the base_model metadata. Intended as a first-pass extractor for human review, not as production ground truth.

A 4B open vision-language model that turns historical index-card images into structured JSON, following a schema you provide at inference time, including schemas it was never trained on. It is a domain-adapted NuExtract-3 (Qwen3.5-4B base, Apache-2.0), fine-tuned with LoRA on archival card collections from libraries and archives.

Hand it a card image and a JSON template or Pydantic schema, and it returns JSON shaped to that schema. It keeps NuExtract-3's general schema-following, including on unseen schemas, and adds strong domain knowledge of handwritten and typed catalogue, vital-record and manuscript cards.

What it does well

- Per-collection structured extraction from card images, with each collection under its own schema.
- Cross-lingual and cross-domain extraction in one model: French and English handwritten death records, and English manuscript-catalogue cards.
- Generalises to new schemas: on a collection and schema it never trained on, it still emits 100% valid, schema-conforming JSON (see eval).
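A minimal sketch of how "valid, schema-conforming JSON" could be checked. It assumes that valid-JSON means the output parses as a JSON object, and that key-conformance means the top-level keys of a valid output match the template's keys exactly. The function names, these definitions, and the sample outputs are illustrative assumptions; the project's own scorer may define the metrics differently.

```python
import json


def parse_output(raw: str):
    """Return the parsed JSON object, or None if `raw` is not a JSON object."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def conformance_rates(outputs: list[str], template: dict) -> tuple[float, float]:
    """Return (valid-JSON rate, key-conformance rate among the valid outputs)."""
    parsed = [parse_output(raw) for raw in outputs]
    valid = [p for p in parsed if p is not None]
    conforming = [p for p in valid if set(p) == set(template)]
    valid_rate = len(valid) / len(outputs) if outputs else 0.0
    key_rate = len(conforming) / len(valid) if valid else 0.0
    return valid_rate, key_rate


# Made-up model outputs, for illustration only.
template = {"name": "verbatim-string", "date_of_death": "string"}
outputs = [
    '{"name": "Jean Martin", "date_of_death": "1843-02-11"}',  # valid, conforming
    '{"name": "Mary Hill"}',                                   # valid, missing a key
    '{"name": "truncated',                                     # not valid JSON
]
valid_rate, key_rate = conformance_rates(outputs, template)
print(f"valid-JSON: {valid_rate:.3f}, key-conformance: {key_rate:.3f}")
```

Computing key-conformance only over outputs that parse keeps the two numbers independent, so a model can score low on validity and still score 1.000 on conformance.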

Results (held-out test sets; see the project write-up for full detail)

| Collection | Metric | This model | Reference |
| --- | --- | --- | --- |
| Teklia (FR handwritten deaths) | exact field-F1 | 0.887 | NuExtract-3 zero-shot 0.24 |
| NLS Advocates (manuscript catalogue) | retrieval score | 0.726 | Qwen3-VL-8B zero-shot 0.760 |
| NLS Advocates | manuscript-number F1 | 0.952 | Qwen3-VL-8B 0.886 |
| Southborough (EN handwritten deaths) | macro field-F1 | 0.750¹ | none (new collection) |
| Rubenstein (UNSEEN schema) | valid-JSON / key-conformance | 1.000 / 1.000 | base 0.733 / 1.000 |

¹ The Southborough test set is agent-labelled, not human-verified, so its score is relative to those labels rather than to gold ground truth. Teklia and NLS numbers are against human-grade ground truth.

Notably, on the manuscript-number field (the load-bearing retrieval field) the 4B model (0.952) exceeds the 8B model (0.886) that generated part of its training labels, although on overall NLS retrieval the 8B (0.760) still leads this model (0.726).

Training

Method: LoRA SFT (rank 16, all layers), 3 epochs, on numind/NuExtract3. No RL — GRPO was evaluated and found not to help this task (see write-up).
Data (~930 examples), per collection:
- Teklia (MIT): expert ground truth (from the source dataset's XML).
- NLS Advocates: training labels are machine-generated silver (Qwen3-VL-8B); the held-out eval set (103 cards) is cataloguer-reviewed (human-grade).
- Southborough (public domain): training labels are machine-generated silver (Qwen3-VL-8B); the held-out eval set (12 cards) is agent-labelled, not human-verified.

Two of the three collections therefore train on silver labels; only the Teklia labels and the NLS eval set are human-grade. Each example pairs an image with its collection's schema-conditioned prompt and target JSON.
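As an illustration of that pairing, the sketch below models one training example as an image path, the collection's template, and the target JSON. The class name, field names, record layout and file path are all hypothetical; the project's actual data format is described in the write-up and may differ.

```python
import json
from dataclasses import dataclass


@dataclass
class CardExample:
    """One supervised example: a card image, its schema, and the expected JSON."""
    image_path: str   # hypothetical path to the scanned card
    template: dict    # the collection's schema, as a NuExtract-style template
    target: dict      # the label the model should reproduce

    def is_consistent(self) -> bool:
        """Check that the target uses exactly the keys the template defines."""
        return set(self.target) == set(self.template)

    def to_sft_record(self) -> dict:
        """Serialise to a flat record; this layout is an assumption."""
        return {
            "image": self.image_path,
            "template": json.dumps(self.template, indent=2),
            "target": json.dumps(self.target, ensure_ascii=False),
        }


example = CardExample(
    image_path="cards/example_0001.jpg",
    template={"name": "verbatim-string", "date_of_death": "string"},
    target={"name": "Jean Martin", "date_of_death": "11 février 1843"},
)
assert example.is_consistent()
print(example.to_sft_record()["target"])
```

Serialising with `ensure_ascii=False` keeps accented characters readable in the target, which matters for French-language cards.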

Intended use

Libraries, archives and museums digitising card catalogues / index drawers into ingestible structured records. Define the schema your catalogue needs; run the model per collection. Best used as a first-pass extractor with human review, not unattended ground truth.

Limitations & honest caveats

- Two of three training collections use silver labels, which puts a quality ceiling on free-text fields (names, places); the model can inherit the labeler's conventions.
- Handwriting remains the hard part: place names and composite or long free-text fields are weakest.
- Test sets are small (12–103 cards), so treat single-point numbers as directional, not precise.
- Use greedy, non-thinking decoding; reasoning mode was not trained and underperforms.
- The NLS Advocates source data is not yet publicly released; only the model is available for now.
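To make the small-test-set caveat concrete, the sketch below computes a 95% Wilson score interval for a simple per-card success rate of about 0.75 at the two test-set sizes mentioned. This is only a rough illustration: the reported metrics are F1 scores rather than binomial proportions, and the 0.75 rate is chosen for the example.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


for n in (12, 103):
    successes = round(0.75 * n)  # illustrative success count near a 0.75 rate
    low, high = wilson_interval(successes, n)
    print(f"n={n}: {successes}/{n} correct, 95% interval {low:.2f} to {high:.2f}")
```

The interval for 12 cards is several times wider than the one for 103 cards, which is why differences of a few points on the smaller sets should not be over-read.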

How to use

Works exactly like NuExtract-3: pass a card image plus a template (or a Pydantic schema) and get JSON shaped to it. Both interfaces are retained, including on schemas the model never trained on (verified on the held-out Rubenstein collection: 100% valid, schema-conforming JSON and better coverage than the base model, which reached a 0.733 valid-JSON rate).

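import json

from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

MODEL_ID = "small-models-for-glam/index-card-extractor-4b-v0.1"

model = AutoModelForImageTextToText.from_pretrained(MODEL_ID, trust_remote_code=True, device_map="auto")
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)

# Placeholder path: point this at your own scanned card.
card_image = Image.open("card.jpg").convert("RGB")

template = {            # your collection's schema: define whatever fields you need
    "name": "verbatim-string",
    "date_of_death": "string",
    "cause_of_death": "string",
    "age": "string",
    "birthplace": "string",
}
messages = [{"role": "user", "content": [{"type": "image", "image": card_image}]}]
text = processor.apply_chat_template(
    messages, template=json.dumps(template, indent=2),
    enable_thinking=False, add_generation_prompt=True, tokenize=False,
)

# Remaining steps, sketched as processor(text, images=...) -> model.generate(...).
# max_new_tokens is an illustrative value, not a recommendation from the authors.
inputs = processor(text=[text], images=[card_image], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, do_sample=False, max_new_tokens=1024)  # greedy decoding
new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]  # drop the prompt tokens
print(processor.decode(new_tokens, skip_special_tokens=True))  # JSON shaped to `template`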

Use greedy, non-thinking decoding; reasoning mode was not trained and underperforms. To use a Pydantic schema, convert the output of Model.model_json_schema() into a NuExtract template (see the NuExtract-3 card and the numind utils).
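The conversion from a JSON Schema to a template can be sketched as follows. This is a hypothetical helper, not the numind utility: `schema_to_template`, `TYPE_MAP` and the type mapping are assumptions, and the schema dict is written by hand in the shape Pydantic v2's `model_json_schema()` produces for a flat model. It handles only primitive fields, nested objects and arrays of primitives, and ignores `$ref`, `$defs` and `anyOf`, which Pydantic emits for nested models and optional fields.

```python
# Assumed mapping from JSON Schema primitive types to NuExtract-style template types.
TYPE_MAP = {"string": "string", "integer": "integer", "number": "number"}


def schema_to_template(schema: dict) -> dict:
    """Convert a simple JSON Schema object into a NuExtract-style template dict."""
    template = {}
    for name, spec in schema.get("properties", {}).items():
        kind = spec.get("type")
        if kind == "object":
            template[name] = schema_to_template(spec)       # recurse into nested objects
        elif kind == "array":
            item_kind = spec.get("items", {}).get("type")
            template[name] = [TYPE_MAP.get(item_kind, "string")]
        else:
            template[name] = TYPE_MAP.get(kind, "string")    # fall back to plain string
    return template


# Hand-written schema of the kind model_json_schema() returns for a flat model.
schema = {
    "title": "DeathRecord",
    "type": "object",
    "properties": {
        "name": {"title": "Name", "type": "string"},
        "age": {"title": "Age", "type": "integer"},
        "aliases": {"title": "Aliases", "type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "age", "aliases"],
}
print(schema_to_template(schema))
```

The resulting dict can be passed through `json.dumps` in the same way as a hand-written template. For anything beyond flat models, the NuExtract-3 card and numind utils remain the authoritative reference.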

Reproducibility & the bigger idea

The full pipeline, scorer, eval scripts and per-collection schemas are in the project write-up. The evidence here suggests that a collaborative, multi-institution index-card ground-truth (GT) corpus could yield one strong shared open model for GLAM card digitisation: adding collections improved this model's ability to follow unseen schemas. Contributions of labelled card collections are welcome.