WimBERT Synth v0

Dual-head Dutch complaint classifier: 65 topic labels + 33 experience labels.

Built on mmBERT-base with two MLP heads trained using Soft-F1 + BCE loss.

Quick Start

python inference_mmbert_hf_example.py . "Goedemiddag, ik heb al drie keer gebeld over mijn uitkering..."

Or from Python:

import torch
from model import DualHeadModel

device = "cuda" if torch.cuda.is_available() else "cpu"
model, tokenizer, config = DualHeadModel.from_pretrained(".", device=device)

text = "Goedemiddag, ik heb al drie keer gebeld over mijn uitkering en krijg geen duidelijkheid."
# Tokenize at the sequence length used during training (config["max_length"])
inputs = tokenizer(text, truncation=True, padding="max_length",
                   max_length=config["max_length"], return_tensors="pt").to(device)

# predict() returns per-class sigmoid probabilities for each head
onderwerp_probs, beleving_probs = model.predict(inputs["input_ids"], inputs["attention_mask"])

# Top-5 predictions per head
def topk(probs, names, k=5):
    idx = torch.topk(probs[0], k).indices.cpu()
    return [(names[i], float(probs[0][i])) for i in idx]

print("Onderwerp:", topk(onderwerp_probs, config["labels"]["onderwerp"]))
print("Beleving:", topk(beleving_probs, config["labels"]["beleving"]))

Architecture

  • Encoder: mmBERT-base (multilingual, hidden size 768)
  • Heads: 2× MLP (Linear → ReLU → Linear), one per label set
  • Task: multi-label classification (independent sigmoid per class)
  • Labels: 65 onderwerp (topics) + 33 beleving (experience)
  • Loss: 0.15 × (1 − Soft-F1) + 0.85 × BCE (see the sketch after this list)
  • Thresholds: fixed at 0.5 (no learned thresholds)
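
A minimal sketch of the loss, assuming the Soft-F1 term is computed per class across the batch from temperature-scaled sigmoid probabilities (the exact formulation lives in the training script):

import torch
import torch.nn.functional as F

def soft_f1_bce_loss(logits, targets, alpha=0.15, temperature=2.0):
    # Differentiable F1: sigmoid probabilities stand in for hard predictions.
    # Assumption: temperature is applied as a multiplier on the logits.
    probs = torch.sigmoid(logits * temperature)
    tp = (probs * targets).sum(dim=0)
    fp = (probs * (1 - targets)).sum(dim=0)
    fn = ((1 - probs) * targets).sum(dim=0)
    soft_f1 = (2 * tp) / (2 * tp + fp + fn + 1e-8)           # per-class soft F1
    bce = F.binary_cross_entropy_with_logits(logits, targets.float())
    return alpha * (1 - soft_f1.mean()) + (1 - alpha) * bce  # 0.15 / 0.85 mix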

Performance

Validation (500 samples):

Head        Accuracy   Precision   Recall   F1
Onderwerp   99.7%      0.960       0.914    0.936
Beleving    97.5%      0.837       0.729    0.780
Combined    98.6%      n/a         n/a      0.858
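
A plausible way to compute these numbers, assuming element-wise accuracy and micro-averaged precision/recall/F1 at the fixed 0.5 threshold (averaging mode is an assumption):

from sklearn.metrics import precision_recall_fscore_support

def eval_head(probs, targets, threshold=0.5):
    # probs, targets: numpy arrays of shape (n_samples, n_labels)
    preds = (probs >= threshold).astype(int)
    accuracy = (preds == targets).mean()  # element-wise over all label slots
    p, r, f1, _ = precision_recall_fscore_support(
        targets, preds, average="micro", zero_division=0)
    return accuracy, p, r, f1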

Training Details

Dataset & Hyperparameters

Data:

  • Source: UWV/wim-synthetic-data-rd + UWV/wim_synthetic_data_for_testing_split_labels (loading sketch after this list)
  • 12,491 samples (11,866 train / 625 val)
  • Avg labels/sample: onderwerp 1.85, beleving 2.08
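
A minimal way to pull the source data, assuming both datasets are public on the Hub with their default configs (split names not verified here):

from datasets import load_dataset

rd = load_dataset("UWV/wim-synthetic-data-rd")
split_labels = load_dataset("UWV/wim_synthetic_data_for_testing_split_labels")
print(rd)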

Setup:

  • Hardware: NVIDIA A100
  • Epochs: 15 | Batch size: 16 | Sequence length: 1,408 tokens
  • Optimizer: AdamW (encoder LR 8e-5, warmup 10% → cosine decay to 1e-6; see the sketch after this list)
  • Gradient clipping: 1.0 | Dropout: 0.20 | Seed: 42
  • Loss: α = 0.15, temperature = 2.0
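
Stock transformers schedulers decay to zero rather than to 1e-6, so the schedule above likely needs a custom LambdaLR. A sketch, assuming linear warmup and a single parameter group at the encoder LR:

import math
import torch

def build_optimizer(model, total_steps, lr=8e-5, warmup_frac=0.10, min_lr=1e-6):
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    warmup = int(warmup_frac * total_steps)
    floor = min_lr / lr  # cosine decays to this fraction of the base LR

    def lr_lambda(step):
        if step < warmup:
            return step / max(1, warmup)  # linear warmup from 0 to lr
        progress = (step - warmup) / max(1, total_steps - warmup)
        return floor + (1 - floor) * 0.5 * (1 + math.cos(math.pi * progress))

    return optimizer, torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)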

Saved Artifacts

  • HF‑compatible files:
    • model.safetensors — encoder weights
    • config.json — encoder config
    • tokenizer.json, tokenizer_config.json, special_tokens_map.json — tokenizer
    • dual_head_state.pt — classification heads + metadata (learned thresholds are disabled, so none are stored)
    • label_names.json — label names for both heads
  • inference_mmbert_hf_example.py — example inference script (CLI)
  • model.py — DualHeadModel class for easy loading (sketched below)
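
A sketch of how these artifacts fit together; the real DualHeadModel lives in model.py and may differ in detail (CLS pooling and the hidden width of the heads are assumptions):

import torch
import torch.nn as nn
from transformers import AutoModel

class DualHeadSketch(nn.Module):
    def __init__(self, model_dir, n_onderwerp=65, n_beleving=33, hidden=768):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_dir)  # config.json + model.safetensors
        def head(n):  # Linear → ReLU → Linear, per the Architecture section
            return nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n))
        self.onderwerp_head = head(n_onderwerp)  # weights come from dual_head_state.pt
        self.beleving_head = head(n_beleving)

    @torch.no_grad()
    def predict(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]  # CLS pooling (assumption)
        return (torch.sigmoid(self.onderwerp_head(pooled)),
                torch.sigmoid(self.beleving_head(pooled)))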

Limitations

  • Domain-specific: Trained on Dutch municipal complaints; will degrade on other domains/languages
  • Class imbalance: Rare labels have lower recall

Use Responsibly

Not for automated legal/benefit decisions without human review. Analytics and routing only.

Reproduction

pip install -r requirements.txt
python train/train_mmbert_dual_soft_f1_simplified.py

Key config: seed 42, batch 16, epochs 15, max_length 1408, α=0.15, encoder_lr=8e-5

License

Apache 2.0

Acknowledgements

This model builds upon:

  • ModernBERT by Answer.AI — Modern BERT architecture with efficiency improvements
  • mmBERT by Answer.AI & collaborators (blog, paper) — Multilingual encoder trained on 1,800+ languages

The WIM project is funded by the Innovatiebudget Digitale Overheid (IDO) 2024 from the Dutch Ministry of the Interior and Kingdom Relations (BZK).


Built with Hugging Face Transformers • Trained on UWV WIM synthetic data
