---
language:
- nl
license: apache-2.0
library_name: transformers
pipeline_tag: text-classification
base_model: bert-base-multilingual-cased
tags:
- multi-label
- dutch
- municipal-complaints
- mbert
- bert
- pytorch
- safetensors
datasets:
- UWV/wim-synthetic-data-rd
- UWV/wim_synthetic_data_for_testing_split_labels
metrics:
- f1
- precision
- recall
- accuracy
model-index:
- name: WimBERT Synth v0
  results:
  - task:
      name: Multi-label Text Classification
      type: text-classification
    dataset:
      name: UWV/WIM Synthetic RD
      type: UWV/wim-synthetic-data-rd
      split: validation
      subset: onderwerp
    metrics:
    - name: F1
      type: f1
      value: 0.936
    - name: Precision
      type: precision
      value: 0.960
    - name: Recall
      type: recall
      value: 0.914
    - name: Accuracy
      type: accuracy
      value: 0.997
  - task:
      name: Multi-label Text Classification
      type: text-classification
    dataset:
      name: UWV/WIM Synthetic RD
      type: UWV/wim-synthetic-data-rd
      split: validation
      subset: beleving
    metrics:
    - name: F1
      type: f1
      value: 0.780
    - name: Precision
      type: precision
      value: 0.837
    - name: Recall
      type: recall
      value: 0.729
    - name: Accuracy
      type: accuracy
      value: 0.975
widget:
- text: "Goedemiddag, ik heb al drie keer gebeld over mijn uitkering en krijg geen duidelijkheid."
---

# WimBERT Synth v0

**Dual-head Dutch complaint classifier: 65 topic labels + 33 experience labels.**
Built on mmBERT-base with two MLP heads trained using a Soft-F1 + BCE loss.

## Quick Start

```bash
python inference_mmbert_hf_example.py . "Goedemiddag, ik heb al drie keer gebeld over mijn uitkering..."
```

Or from Python:

```python
import torch
from model import DualHeadModel

device = "cuda" if torch.cuda.is_available() else "cpu"
model, tokenizer, config = DualHeadModel.from_pretrained(".", device=device)

text = "Goedemiddag, ik heb al drie keer gebeld over mijn uitkering en krijg geen duidelijkheid."
inputs = tokenizer(text, truncation=True, padding="max_length",
                   max_length=config["max_length"], return_tensors="pt").to(device)

onderwerp_probs, beleving_probs = model.predict(inputs["input_ids"], inputs["attention_mask"])

# Top-5 predictions per head
def topk(probs, names, k=5):
    idx = torch.topk(probs[0], k).indices.cpu()
    return [(names[i], float(probs[0][i])) for i in idx]

print("Onderwerp:", topk(onderwerp_probs, config["labels"]["onderwerp"]))
print("Beleving:", topk(beleving_probs, config["labels"]["beleving"]))
```

## Architecture

- **Encoder:** mmBERT-base (multilingual, 768 hidden dims)
- **Heads:** 2× MLP (Linear → ReLU → Linear); see the sketch below
- **Task:** Multi-label classification (sigmoid per class)
- **Labels:** 65 onderwerp (topics) + 33 beleving (experience)
- **Loss:** 0.15 × (1 − Soft-F1) + 0.85 × BCE
- **Thresholds:** Fixed at 0.5 (no learned thresholds)
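For orientation, here is a minimal sketch of the dual-head layout described above. The shipped implementation lives in `model.py`; the class name, pooling strategy, and head internals below are illustrative assumptions, not the actual code.

```python
# Illustrative sketch only; see model.py for the real DualHeadModel.
import torch
import torch.nn as nn
from transformers import AutoModel

class DualHeadSketch(nn.Module):
    def __init__(self, encoder_name, n_onderwerp=65, n_beleving=33, hidden=768):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        # One MLP head per label set: Linear -> ReLU -> Linear, one logit per label
        self.onderwerp_head = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, n_onderwerp))
        self.beleving_head = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, n_beleving))

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]  # first-token pooling (assumption)
        return self.onderwerp_head(pooled), self.beleving_head(pooled)

# Multi-label decoding: sigmoid per class, fixed 0.5 threshold
# onderwerp_probs = torch.sigmoid(onderwerp_logits)
# predicted = (onderwerp_probs >= 0.5)
```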
## Performance

Validation (500 samples):

| Head | Accuracy | Precision | Recall | F1 |
|------|----------|-----------|--------|-----|
| Onderwerp | 99.7% | 0.960 | 0.914 | **0.936** |
| Beleving | 97.5% | 0.837 | 0.729 | **0.780** |
| Combined | 98.6% | — | — | **0.858** |

## Training Details

### Dataset & Hyperparameters

**Data:**
- Source: `UWV/wim-synthetic-data-rd` + `UWV/wim_synthetic_data_for_testing_split_labels`
- 12,491 samples (11,866 train / 625 val)
- Avg labels/sample: onderwerp 1.85, beleving 2.08

**Setup:**
- Hardware: NVIDIA A100
- Epochs: 15 | Batch: 16 | Seq length: 1,408 tokens
- Optimizer: AdamW (encoder LR: 8e-5, warmup 10% → cosine to 1e-6)
- Gradient clip: 1.0 | Dropout: 0.20 | Seed: 42
- Loss: α=0.15, temperature=2.0 (see the sketch below)
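The combined objective (α = 0.15, temperature = 2.0) can be sketched roughly as follows. The exact soft-F1 formulation and where the temperature enters are assumptions based on common practice; the authoritative version is in `train/train_mmbert_dual_soft_f1_simplified.py`.

```python
# Rough sketch of the 0.15 * (1 - Soft-F1) + 0.85 * BCE objective.
# The precise soft-F1 variant and temperature usage are assumptions.
import torch
import torch.nn.functional as F

def soft_f1_loss(logits, targets, temperature=2.0, eps=1e-8):
    # Temperature-scaled sigmoid gives "soft" predictions in [0, 1]
    probs = torch.sigmoid(logits / temperature)
    tp = (probs * targets).sum(dim=0)
    fp = (probs * (1 - targets)).sum(dim=0)
    fn = ((1 - probs) * targets).sum(dim=0)
    soft_f1 = 2 * tp / (2 * tp + fp + fn + eps)  # per-class soft F1
    return 1 - soft_f1.mean()

def combined_loss(logits, targets, alpha=0.15):
    bce = F.binary_cross_entropy_with_logits(logits, targets)
    return alpha * soft_f1_loss(logits, targets) + (1 - alpha) * bce

# Applied to each head, e.g.:
# loss = combined_loss(ond_logits, ond_targets) + combined_loss(bel_logits, bel_targets)
```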
## Saved Artifacts

- HF-compatible files:
  - `model.safetensors` — encoder weights
  - `config.json` — encoder config
  - `tokenizer.json`, `tokenizer_config.json`, `special_tokens_map.json` — tokenizer
- `dual_head_state.pt` — classification heads + metadata (thresholds are omitted when disabled)
- `label_names.json` — label names for both heads
- `inference_mmbert_hf_example.py` — example inference script (CLI)
- `model.py` — DualHeadModel class for easy loading

## Limitations

- **Domain-specific:** Trained on Dutch municipal complaints; quality will degrade on other domains and languages
- **Class imbalance:** Rare labels have lower recall

## Use Responsibly

Not intended for automated legal or benefit decisions without human review; use for analytics and routing only.

## Reproduction

```bash
pip install -r requirements.txt
python train/train_mmbert_dual_soft_f1_simplified.py
```

Key config: seed 42, batch 16, epochs 15, max_length 1408, α=0.15, encoder_lr=8e-5

## License

Apache 2.0

## Acknowledgements

This model builds upon:

- **ModernBERT** by [Answer.AI](https://github.com/AnswerDotAI/ModernBERT) — a modernized BERT architecture with efficiency improvements
- **mmBERT** by Answer.AI & collaborators ([blog](https://huggingface.co/blog/mmbert), [paper](https://arxiv.org/abs/2509.06888)) — a multilingual encoder trained on 1,800+ languages

The WIM project is funded by the [Innovatiebudget Digitale Overheid (IDO) 2024](https://www.rvo.nl/subsidies-financiering/ido) from the Dutch Ministry of the Interior and Kingdom Relations (BZK).

---

Built with Hugging Face Transformers • Trained on UWV WIM synthetic data