WimBERT Synth v0

Dual-head Dutch complaint classifier: 65 topic labels + 33 experience labels.

Built on mmBERT-base with two MLP heads trained using Soft-F1 + BCE loss.

Quick Start

python inference_mmbert_hf_example.py . "Goedemiddag, ik heb al drie keer gebeld over mijn uitkering..."

Or from Python:

import torch
from model import DualHeadModel

device = "cuda" if torch.cuda.is_available() else "cpu"
model, tokenizer, config = DualHeadModel.from_pretrained(".", device=device)

text = "Goedemiddag, ik heb al drie keer gebeld over mijn uitkering en krijg geen duidelijkheid."
# Tokenize at the sequence length used during training (config["max_length"])
inputs = tokenizer(text, truncation=True, padding="max_length",
                   max_length=config["max_length"], return_tensors="pt").to(device)

# predict() returns per-class sigmoid probabilities for each head
onderwerp_probs, beleving_probs = model.predict(inputs["input_ids"], inputs["attention_mask"])

# Top-5 predictions per head
def topk(probs, names, k=5):
    idx = torch.topk(probs[0], k).indices.cpu()
    return [(names[i], float(probs[0][i])) for i in idx]

print("Onderwerp:", topk(onderwerp_probs, config["labels"]["onderwerp"]))
print("Beleving:", topk(beleving_probs, config["labels"]["beleving"]))

Architecture

  • Encoder: mmBERT-base (multilingual, hidden size 768)
  • Heads: 2× MLP (Linear → ReLU → Linear), one per label set
  • Task: multi-label classification (independent sigmoid per class)
  • Labels: 65 onderwerp (topics) + 33 beleving (experience)
  • Loss: 0.15 × (1 − Soft-F1) + 0.85 × BCE (see the sketch after this list)
  • Thresholds: fixed at 0.5 (no learned thresholds)
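
A minimal sketch of the loss, assuming the Soft-F1 term is computed per class across the batch from temperature-scaled sigmoid probabilities (the exact formulation lives in the training script):

import torch
import torch.nn.functional as F

def soft_f1_bce_loss(logits, targets, alpha=0.15, temperature=2.0):
    # Differentiable F1: sigmoid probabilities stand in for hard predictions.
    # Assumption: temperature is applied as a multiplier on the logits.
    probs = torch.sigmoid(logits * temperature)
    tp = (probs * targets).sum(dim=0)
    fp = (probs * (1 - targets)).sum(dim=0)
    fn = ((1 - probs) * targets).sum(dim=0)
    soft_f1 = (2 * tp) / (2 * tp + fp + fn + 1e-8)           # per-class soft F1
    bce = F.binary_cross_entropy_with_logits(logits, targets.float())
    return alpha * (1 - soft_f1.mean()) + (1 - alpha) * bce  # 0.15 / 0.85 mix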

Performance

Validation (500 samples):

Head        Accuracy   Precision   Recall   F1
Onderwerp   99.7%      0.960       0.914    0.936
Beleving    97.5%      0.837       0.729    0.780
Combined    98.6%      n/a         n/a      0.858
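
A plausible way to compute these numbers, assuming element-wise accuracy and micro-averaged precision/recall/F1 at the fixed 0.5 threshold (averaging mode is an assumption):

from sklearn.metrics import precision_recall_fscore_support

def eval_head(probs, targets, threshold=0.5):
    # probs, targets: numpy arrays of shape (n_samples, n_labels)
    preds = (probs >= threshold).astype(int)
    accuracy = (preds == targets).mean()  # element-wise over all label slots
    p, r, f1, _ = precision_recall_fscore_support(
        targets, preds, average="micro", zero_division=0)
    return accuracy, p, r, f1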

Training Details

Dataset & Hyperparameters

Data:

  • Source: UWV/wim-synthetic-data-rd + UWV/wim_synthetic_data_for_testing_split_labels (loading sketch after this list)
  • 12,491 samples (11,866 train / 625 val)
  • Avg labels/sample: onderwerp 1.85, beleving 2.08
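
A minimal way to pull the source data, assuming both datasets are public on the Hub with their default configs (split names not verified here):

from datasets import load_dataset

rd = load_dataset("UWV/wim-synthetic-data-rd")
split_labels = load_dataset("UWV/wim_synthetic_data_for_testing_split_labels")
print(rd)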

Setup:

  • Hardware: NVIDIA A100
  • Epochs: 15 | Batch size: 16 | Sequence length: 1,408 tokens
  • Optimizer: AdamW (encoder LR 8e-5, warmup 10% → cosine decay to 1e-6; see the sketch after this list)
  • Gradient clipping: 1.0 | Dropout: 0.20 | Seed: 42
  • Loss: α = 0.15, temperature = 2.0
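
Stock transformers schedulers decay to zero rather than to 1e-6, so the schedule above likely needs a custom LambdaLR. A sketch, assuming linear warmup and a single parameter group at the encoder LR:

import math
import torch

def build_optimizer(model, total_steps, lr=8e-5, warmup_frac=0.10, min_lr=1e-6):
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    warmup = int(warmup_frac * total_steps)
    floor = min_lr / lr  # cosine decays to this fraction of the base LR

    def lr_lambda(step):
        if step < warmup:
            return step / max(1, warmup)  # linear warmup from 0 to lr
        progress = (step - warmup) / max(1, total_steps - warmup)
        return floor + (1 - floor) * 0.5 * (1 + math.cos(math.pi * progress))

    return optimizer, torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)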

Saved Artifacts

  • HF‑compatible files:
    • model.safetensors — encoder weights
    • config.json — encoder config
    • tokenizer.json, tokenizer_config.json, special_tokens_map.json — tokenizer
    • dual_head_state.pt — classification heads + metadata (learned thresholds are disabled, so none are stored)
    • label_names.json — label names for both heads
  • inference_mmbert_hf_example.py — example inference script (CLI)
  • model.py — DualHeadModel class for easy loading (sketched below)
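
A sketch of how these artifacts fit together; the real DualHeadModel lives in model.py and may differ in detail (CLS pooling and the hidden width of the heads are assumptions):

import torch
import torch.nn as nn
from transformers import AutoModel

class DualHeadSketch(nn.Module):
    def __init__(self, model_dir, n_onderwerp=65, n_beleving=33, hidden=768):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_dir)  # config.json + model.safetensors
        def head(n):  # Linear → ReLU → Linear, per the Architecture section
            return nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n))
        self.onderwerp_head = head(n_onderwerp)  # weights come from dual_head_state.pt
        self.beleving_head = head(n_beleving)

    @torch.no_grad()
    def predict(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]  # CLS pooling (assumption)
        return (torch.sigmoid(self.onderwerp_head(pooled)),
                torch.sigmoid(self.beleving_head(pooled)))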

Limitations

  • Domain-specific: Trained on Dutch municipal complaints; will degrade on other domains/languages
  • Class imbalance: Rare labels have lower recall

Use Responsibly

Not for automated legal/benefit decisions without human review. Analytics and routing only.

Reproduction

pip install -r requirements.txt
python train/train_mmbert_dual_soft_f1_simplified.py

Key config: seed 42, batch 16, epochs 15, max_length 1408, α=0.15, encoder_lr=8e-5

License

Apache 2.0

Acknowledgements

This model builds upon:

  • ModernBERT by Answer.AI — Modern BERT architecture with efficiency improvements
  • mmBERT by Answer.AI & collaborators (blog, paper) — Multilingual encoder trained on 1,800+ languages

The WIM project is funded by the Innovatiebudget Digitale Overheid (IDO) 2024 from the Dutch Ministry of the Interior and Kingdom Relations (BZK).


Built with Hugging Face Transformers • Trained on UWV WIM synthetic data
