Leesplank Wim
Dual-head Dutch complaint classifier: 65 topic labels + 33 experience labels.
Built on mmBERT-base with two MLP heads trained using Soft-F1 + BCE loss.
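Under the hood this is a shared encoder feeding two independent multi-label heads. A minimal sketch of that layout; the head sizes, activation, and [CLS] pooling are illustrative assumptions, not the actual model.py implementation:

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class DualHeadSketch(nn.Module):
    """Illustrative dual-head layout: shared encoder, one MLP head per label set."""

    def __init__(self, encoder_name: str, n_onderwerp: int = 65, n_beleving: int = 33,
                 hidden: int = 256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        dim = self.encoder.config.hidden_size
        # Two independent MLP heads on top of the pooled [CLS] representation.
        self.onderwerp_head = nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, n_onderwerp))
        self.beleving_head = nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, n_beleving))

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] token embedding
        # Sigmoid per label: both heads are multi-label classifiers.
        return torch.sigmoid(self.onderwerp_head(cls)), torch.sigmoid(self.beleving_head(cls))
```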
Quick start:

```bash
python inference_mmbert_hf_example.py . "Goedemiddag, ik heb al drie keer gebeld over mijn uitkering..."
```
Or from Python:
```python
import torch
from model import DualHeadModel

device = "cuda" if torch.cuda.is_available() else "cpu"
model, tokenizer, config = DualHeadModel.from_pretrained(".", device=device)

text = "Goedemiddag, ik heb al drie keer gebeld over mijn uitkering en krijg geen duidelijkheid."
inputs = tokenizer(text, truncation=True, padding="max_length",
                   max_length=config["max_length"], return_tensors="pt").to(device)
onderwerp_probs, beleving_probs = model.predict(inputs["input_ids"], inputs["attention_mask"])

# Top-5 predictions per head
def topk(probs, names, k=5):
    idx = torch.topk(probs[0], k).indices.cpu()
    return [(names[i], float(probs[0][i])) for i in idx]

print("Onderwerp:", topk(onderwerp_probs, config["labels"]["onderwerp"]))
print("Beleving:", topk(beleving_probs, config["labels"]["beleving"]))
```
Validation (500 samples):
| Head | Accuracy | Precision | Recall | F1 | 
|---|---|---|---|---|
| Onderwerp | 99.7% | 0.960 | 0.914 | 0.936 | 
| Beleving | 97.5% | 0.837 | 0.729 | 0.780 | 
| Combined | 98.6% | — | — | 0.858 | 
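Metrics of this kind can be reproduced per head with scikit-learn on binarized probabilities. A hedged sketch; the 0.5 cutoff, micro averaging, and element-wise accuracy are assumptions about how the table was computed, not the project's actual evaluation code:

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

def head_metrics(y_true: np.ndarray, y_prob: np.ndarray, threshold: float = 0.5):
    """y_true, y_prob: (n_samples, n_labels) arrays; binarize probabilities, then score."""
    y_pred = (y_prob >= threshold).astype(int)
    # Element-wise accuracy over the full label-indicator matrix (assumed definition).
    acc = (y_pred == y_true).mean()
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="micro", zero_division=0)
    return {"accuracy": acc, "precision": p, "recall": r, "f1": f1}
```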
Data:
- UWV/wim-synthetic-data-rd
- UWV/wim_synthetic_data_for_testing_split_labels

Setup:
- model.safetensors — encoder weights
- config.json — encoder config
- tokenizer.json, tokenizer_config.json, special_tokens_map.json — tokenizer
- dual_head_state.pt — classification heads + metadata (no thresholds included when disabled)
- label_names.json — label names for both heads
- inference_mmbert_hf_example.py — example inference script (CLI)
- model.py — DualHeadModel class for easy loading

Not for automated legal/benefit decisions without human review. Analytics and routing only.
Training:

```bash
pip install -r requirements.txt
python train/train_mmbert_dual_soft_f1_simplified.py
```

Key config: seed 42, batch 16, epochs 15, max_length 1408, α=0.15, encoder_lr=8e-5.
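The α=0.15 above points to a weighted mix of the two loss terms. A sketch of one common formulation of a differentiable soft-F1 combined with BCE; the exact formulation and the direction of the α weighting used by the training script are assumptions:

```python
import torch
import torch.nn.functional as F

def soft_f1_loss(probs: torch.Tensor, targets: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Differentiable (1 - soft-F1), computed per label over the batch and averaged."""
    tp = (probs * targets).sum(dim=0)
    fp = (probs * (1 - targets)).sum(dim=0)
    fn = ((1 - probs) * targets).sum(dim=0)
    soft_f1 = 2 * tp / (2 * tp + fp + fn + eps)
    return 1 - soft_f1.mean()

def combined_loss(probs: torch.Tensor, targets: torch.Tensor, alpha: float = 0.15) -> torch.Tensor:
    """Assumed mix: alpha * soft-F1 term + (1 - alpha) * BCE; targets are float multi-hot."""
    bce = F.binary_cross_entropy(probs, targets)
    return alpha * soft_f1_loss(probs, targets) + (1 - alpha) * bce
```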
License: Apache 2.0
This model builds upon:
- google-bert/bert-base-multilingual-cased (base model)
The WIM project is funded by the Innovatiebudget Digitale Overheid (IDO) 2024 from the Dutch Ministry of the Interior and Kingdom Relations (BZK).
Built with Hugging Face Transformers • Trained on UWV WIM synthetic data