---
library_name: transformers
license: apache-2.0
base_model:
- allenai/specter2_base
pipeline_tag: text-classification
datasets:
- nicolauduran45/horizon_clusters_annotated
---
# SPECTER2-base Multilabel Horizon Clusters Classifier
This model fine-tunes **[SPECTER2-base](https://huggingface.co/allenai/specter2_base)** for multilabel classification of scientific publications into Horizon Europe clusters.

---
## Model Description
- **Base model:** [allenai/specter2_base](https://huggingface.co/allenai/specter2_base)
- **Task:** Multilabel classification (assigns one or more clusters per document)
- **Labels:** 6 Horizon Europe clusters (see below)
- **Languages:** English
- **Input:** Title and abstract concatenated
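
The following is a minimal inference sketch, not the author's confirmed pipeline. It assumes the checkpoint loads with `AutoModelForSequenceClassification` and that title and abstract are joined with the tokenizer's separator token, as the SPECTER2 base model card does. The separator, the 512-token limit, the 0.5 threshold, and the `model_id` argument are assumptions; pass this repository's Hub ID as `model_id`.

```python
def build_input(title: str, abstract: str = "", sep_token: str = "[SEP]") -> str:
    """Join title and abstract. The separator is an assumption (SPECTER2 convention)."""
    return title.strip() + sep_token + (abstract or "").strip()


def predict_clusters(model_id: str, title: str, abstract: str, threshold: float = 0.5):
    """Return (label, probability) pairs whose sigmoid score reaches the threshold."""
    # Imported lazily so build_input stays usable without the heavy dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    text = build_input(title, abstract, tokenizer.sep_token)
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0]

    # Multilabel: each label gets an independent sigmoid, not a softmax over labels.
    probs = torch.sigmoid(logits).tolist()
    return [
        (model.config.id2label[i], p)
        for i, p in enumerate(probs)
        if p >= threshold  # assumed cut-off; tune on validation data if needed
    ]
```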
---
## Training Details
- **Training framework:** Hugging Face Transformers (`Trainer`)
- **Batch size:** 4
- **Learning rate:** 2e-5
- **Epochs:** 6
- **Optimizer:** AdamW with weight decay 0.01
- **Loss:** Binary Cross-Entropy with Logits
- **Best model selection:** F1-score on validation set
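
To make the loss concrete, here is a small pure-Python sketch of binary cross-entropy with logits, averaged over the six labels of one document. It illustrates the formula only; the training run itself presumably relied on the framework's implementation (for example `torch.nn.BCEWithLogitsLoss`), which this card does not confirm. The example logits and targets are made up.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed from raw logits."""
    total = 0.0
    for x, y in zip(logits, targets):
        # Numerically stable form of
        # -[y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


# Hypothetical logits for one document and its multi-hot target (6 clusters).
example_logits = [2.0, -1.0, 0.0, -3.0, 0.5, -2.0]
example_targets = [1, 0, 0, 0, 1, 0]
example_loss = bce_with_logits(example_logits, example_targets)
```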
---
## Clusters (Labels)
- Civil Security for Society
- Climate, Energy and Mobility
- Culture, Creativity and Inclusive Society
- Digital, Industry and Space
- Food, Bioeconomy, Natural Resources, Agriculture and Environment
- Health
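
Because a document can belong to several clusters, each training target is a multi-hot vector with one position per cluster. The sketch below assumes the label indices follow the order listed above; the authoritative mapping is the `id2label` entry in the model's configuration, which may differ.

```python
# Assumed index order (taken from the list above, not verified against the config).
CLUSTERS = [
    "Civil Security for Society",
    "Climate, Energy and Mobility",
    "Culture, Creativity and Inclusive Society",
    "Digital, Industry and Space",
    "Food, Bioeconomy, Natural Resources, Agriculture and Environment",
    "Health",
]


def to_multi_hot(doc_clusters):
    """Encode a document's cluster names as a 6-dimensional float vector."""
    unknown = set(doc_clusters) - set(CLUSTERS)
    if unknown:
        raise ValueError(f"Unknown cluster(s): {sorted(unknown)}")
    # Floats, because BCE-style losses expect float targets.
    return [1.0 if name in doc_clusters else 0.0 for name in CLUSTERS]


target = to_multi_hot(["Health", "Digital, Industry and Space"])
```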
---
## Evaluation Metrics
| Epoch | Training Loss | Validation Loss | F1 | ROC AUC | Accuracy |
|-------|--------------|----------------|----------|-----------|----------|
| 1 | No log | 0.1774 | 0.910 | 0.9368 | 0.766 |
| 2 | 0.0606 | 0.1849 | 0.921 | 0.9454 | 0.787 |
| 3 | 0.0351 | 0.2071 | 0.919 | 0.9434 | 0.787 |
| 4 | 0.0180 | 0.2191 | 0.921 | 0.9451 | 0.793 |
| 5 | 0.0093 | 0.2295 | 0.921 | 0.9451 | 0.793 |
| 6 | 0.0060 | 0.2307 | 0.921 | 0.9451 | 0.793 |

- **Best epoch:** 6 (ties epochs 4 and 5 for the highest F1 and accuracy; the reported metrics stop improving after epoch 4)
- **Final validation loss:** 0.2307
- **Final F1:** 0.9212
- **Final ROC AUC:** 0.9451
- **Final Accuracy:** 0.7927
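
The card does not state how F1 and accuracy are averaged. A plausible reading, and only an assumption, is micro-averaged F1 and exact-match (subset) accuracy, where a document counts as correct only if all six labels match. That would explain why accuracy sits well below F1. The sketch below computes both on made-up predictions.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool true/false positives over all labels and documents."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def subset_accuracy(y_true, y_pred):
    """Fraction of documents whose full label vector is predicted exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


# Toy data (3 labels, 4 documents), purely illustrative.
toy_true = [[1, 0, 0], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
toy_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 1, 1]]
toy_f1 = micro_f1(toy_true, toy_pred)
toy_acc = subset_accuracy(toy_true, toy_pred)
```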
---
## Per-Category Classification Report
| Label | Precision | Recall | F1-score | Support |
|---------------------------------------------------------------------|-----------|--------|----------|---------|
| Civil Security for Society | 0.97 | 0.79 | 0.87 | 39 |
| Climate, Energy and Mobility | 0.94 | 0.91 | 0.93 | 91 |
| Culture, Creativity and Inclusive Society | 0.89 | 0.88 | 0.88 | 96 |
| Digital, Industry and Space | 0.93 | 0.92 | 0.93 | 214 |
| Food, Bioeconomy, Natural Resources, Agriculture and Environment | 0.89 | 0.97 | 0.93 | 75 |
| Health | 0.96 | 0.96 | 0.96 | 73 |
| **micro avg** | 0.93 | 0.91 | 0.92 | 588 |
| **macro avg** | 0.93 | 0.91 | 0.92 | 588 |
| **weighted avg** | 0.93 | 0.91 | 0.92 | 588 |
| **samples avg** | 0.91 | 0.92 | 0.90 | 588 |
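
As a sanity check, the macro and weighted rows can be recomputed from the per-label rows. The sketch below uses the rounded values from the table, so it reproduces the averages only to within rounding. The micro and samples rows need per-document predictions and cannot be recovered this way.

```python
# (precision, recall, f1, support) copied from the per-label rows of the table.
REPORT = {
    "Civil Security for Society": (0.97, 0.79, 0.87, 39),
    "Climate, Energy and Mobility": (0.94, 0.91, 0.93, 91),
    "Culture, Creativity and Inclusive Society": (0.89, 0.88, 0.88, 96),
    "Digital, Industry and Space": (0.93, 0.92, 0.93, 214),
    "Food, Bioeconomy, Natural Resources, Agriculture and Environment": (0.89, 0.97, 0.93, 75),
    "Health": (0.96, 0.96, 0.96, 73),
}
PRECISION, RECALL, F1, SUPPORT = range(4)


def macro_avg(metric):
    """Unweighted mean over labels: every cluster counts equally."""
    return sum(row[metric] for row in REPORT.values()) / len(REPORT)


def weighted_avg(metric):
    """Mean over labels weighted by support: frequent clusters count more."""
    total = sum(row[SUPPORT] for row in REPORT.values())
    return sum(row[metric] * row[SUPPORT] for row in REPORT.values()) / total


total_support = sum(row[SUPPORT] for row in REPORT.values())
for name, metric in (("precision", PRECISION), ("recall", RECALL), ("f1", F1)):
    print(f"{name}: macro={macro_avg(metric):.3f} weighted={weighted_avg(metric):.3f}")
```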
---
## License
This model is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).