---
language:
- nl
license: apache-2.0
library_name: transformers
pipeline_tag: text-classification
base_model: bert-base-multilingual-cased
tags:
- multi-label
- dutch
- municipal-complaints
- mbert
- bert
- pytorch
- safetensors
datasets:
- UWV/wim-synthetic-data-rd
- UWV/wim_synthetic_data_for_testing_split_labels
metrics:
- f1
- precision
- recall
- accuracy
model-index:
- name: WimBERT Synth v0
  results:
  - task:
      name: Multi-label Text Classification
      type: text-classification
    dataset:
      name: UWV/WIM Synthetic RD
      type: UWV/wim-synthetic-data-rd
      split: validation
      subset: onderwerp
    metrics:
    - name: F1
      type: f1
      value: 0.936
    - name: Precision
      type: precision
      value: 0.960
    - name: Recall
      type: recall
      value: 0.914
    - name: Accuracy
      type: accuracy
      value: 0.997
  - task:
      name: Multi-label Text Classification
      type: text-classification
    dataset:
      name: UWV/WIM Synthetic RD
      type: UWV/wim-synthetic-data-rd
      split: validation
      subset: beleving
    metrics:
    - name: F1
      type: f1
      value: 0.780
    - name: Precision
      type: precision
      value: 0.837
    - name: Recall
      type: recall
      value: 0.729
    - name: Accuracy
      type: accuracy
      value: 0.975
widget:
- text: "Goedemiddag, ik heb al drie keer gebeld over mijn uitkering en krijg geen duidelijkheid."
---

# WimBERT Synth v0

**Dual-head Dutch complaint classifier: 65 topic labels + 33 experience labels.**
Built on mmBERT-base with two MLP heads trained using a Soft-F1 + BCE loss.

## Quick Start

```bash
python inference_mmbert_hf_example.py . "Goedemiddag, ik heb al drie keer gebeld over mijn uitkering..."
```

Or from Python:

```python
import torch
from model import DualHeadModel

device = "cuda" if torch.cuda.is_available() else "cpu"
model, tokenizer, config = DualHeadModel.from_pretrained(".", device=device)

text = "Goedemiddag, ik heb al drie keer gebeld over mijn uitkering en krijg geen duidelijkheid."
inputs = tokenizer(text, truncation=True, padding="max_length",
                   max_length=config["max_length"], return_tensors="pt").to(device)

onderwerp_probs, beleving_probs = model.predict(inputs["input_ids"], inputs["attention_mask"])

# Top-5 predictions per head
def topk(probs, names, k=5):
    idx = torch.topk(probs[0], k).indices.cpu()
    return [(names[i], float(probs[0][i])) for i in idx]

print("Onderwerp:", topk(onderwerp_probs, config["labels"]["onderwerp"]))
print("Beleving:", topk(beleving_probs, config["labels"]["beleving"]))
```

## Architecture

- **Encoder:** mmBERT-base (multilingual, 768 hidden dims)
- **Heads:** 2× MLP (Linear → ReLU → Linear); see the sketch below
- **Task:** Multi-label classification (sigmoid per class)
- **Labels:** 65 onderwerp (topics) + 33 beleving (experience)
- **Loss:** 0.15 × (1 − Soft-F1) + 0.85 × BCE
- **Thresholds:** Fixed at 0.5 (no learned thresholds)
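For orientation, here is a minimal sketch of the dual-head layout described above. The shipped implementation lives in `model.py`; the class name, pooling strategy, and head internals below are illustrative assumptions, not the actual code.

```python
# Illustrative sketch only; see model.py for the real DualHeadModel.
import torch
import torch.nn as nn
from transformers import AutoModel

class DualHeadSketch(nn.Module):
    def __init__(self, encoder_name, n_onderwerp=65, n_beleving=33, hidden=768):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        # One MLP head per label set: Linear -> ReLU -> Linear, one logit per label
        self.onderwerp_head = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, n_onderwerp))
        self.beleving_head = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, n_beleving))

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]  # first-token pooling (assumption)
        return self.onderwerp_head(pooled), self.beleving_head(pooled)

# Multi-label decoding: sigmoid per class, fixed 0.5 threshold
# onderwerp_probs = torch.sigmoid(onderwerp_logits)
# predicted = (onderwerp_probs >= 0.5)
```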
## Performance

Validation (500 samples):

| Head | Accuracy | Precision | Recall | F1 |
|------|----------|-----------|--------|-----|
| Onderwerp | 99.7% | 0.960 | 0.914 | **0.936** |
| Beleving | 97.5% | 0.837 | 0.729 | **0.780** |
| Combined | 98.6% | — | — | **0.858** |

## Training Details

### Dataset & Hyperparameters

**Data:**
- Source: `UWV/wim-synthetic-data-rd` + `UWV/wim_synthetic_data_for_testing_split_labels`
- 12,491 samples (11,866 train / 625 val)
- Avg labels/sample: onderwerp 1.85, beleving 2.08

**Setup:**
- Hardware: NVIDIA A100
- Epochs: 15 | Batch: 16 | Seq length: 1,408 tokens
- Optimizer: AdamW (encoder LR: 8e-5, warmup 10% → cosine to 1e-6)
- Gradient clip: 1.0 | Dropout: 0.20 | Seed: 42
- Loss: α=0.15, temperature=2.0 (see the sketch below)
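The combined objective (α = 0.15, temperature = 2.0) can be sketched roughly as follows. The exact soft-F1 formulation and where the temperature enters are assumptions based on common practice; the authoritative version is in `train/train_mmbert_dual_soft_f1_simplified.py`.

```python
# Rough sketch of the 0.15 * (1 - Soft-F1) + 0.85 * BCE objective.
# The precise soft-F1 variant and temperature usage are assumptions.
import torch
import torch.nn.functional as F

def soft_f1_loss(logits, targets, temperature=2.0, eps=1e-8):
    # Temperature-scaled sigmoid gives "soft" predictions in [0, 1]
    probs = torch.sigmoid(logits / temperature)
    tp = (probs * targets).sum(dim=0)
    fp = (probs * (1 - targets)).sum(dim=0)
    fn = ((1 - probs) * targets).sum(dim=0)
    soft_f1 = 2 * tp / (2 * tp + fp + fn + eps)  # per-class soft F1
    return 1 - soft_f1.mean()

def combined_loss(logits, targets, alpha=0.15):
    bce = F.binary_cross_entropy_with_logits(logits, targets)
    return alpha * soft_f1_loss(logits, targets) + (1 - alpha) * bce

# Applied to each head, e.g.:
# loss = combined_loss(ond_logits, ond_targets) + combined_loss(bel_logits, bel_targets)
```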
## Saved Artifacts

- HF-compatible files:
  - `model.safetensors` — encoder weights
  - `config.json` — encoder config
  - `tokenizer.json`, `tokenizer_config.json`, `special_tokens_map.json` — tokenizer
- `dual_head_state.pt` — classification heads + metadata (thresholds are omitted when disabled)
- `label_names.json` — label names for both heads
- `inference_mmbert_hf_example.py` — example inference script (CLI)
- `model.py` — DualHeadModel class for easy loading

## Limitations

- **Domain-specific:** Trained on Dutch municipal complaints; quality will degrade on other domains and languages
- **Class imbalance:** Rare labels have lower recall

## Use Responsibly

Not intended for automated legal or benefit decisions without human review; use for analytics and routing only.

## Reproduction

```bash
pip install -r requirements.txt
python train/train_mmbert_dual_soft_f1_simplified.py
```

Key config: seed 42, batch 16, epochs 15, max_length 1408, α=0.15, encoder_lr=8e-5

## License

Apache 2.0

## Acknowledgements

This model builds upon:

- **ModernBERT** by [Answer.AI](https://github.com/AnswerDotAI/ModernBERT) — a modernized BERT architecture with efficiency improvements
- **mmBERT** by Answer.AI & collaborators ([blog](https://huggingface.co/blog/mmbert), [paper](https://arxiv.org/abs/2509.06888)) — a multilingual encoder trained on 1,800+ languages

The WIM project is funded by the [Innovatiebudget Digitale Overheid (IDO) 2024](https://www.rvo.nl/subsidies-financiering/ido) from the Dutch Ministry of the Interior and Kingdom Relations (BZK).

---

Built with Hugging Face Transformers • Trained on UWV WIM synthetic data