---
library_name: transformers
license: apache-2.0
base_model:
- allenai/specter2_base
pipeline_tag: text-classification
datasets:
- nicolauduran45/horizon_clusters_annotated
---

# SPECTER2-base Multilabel Horizon Clusters Classifier

This model is **[SPECTER2-base](https://huggingface.co/allenai/specter2_base)** fine-tuned for multilabel classification of scientific publications into Horizon Europe clusters.

---

## Model Description

- **Base model:** [allenai/specter2_base](https://huggingface.co/allenai/specter2_base)
- **Task:** Multilabel classification (each document can be assigned one or more clusters)
- **Labels:** 6 Horizon Europe clusters (see below)
- **Language:** English
- **Input:** Title and abstract, concatenated into a single text

---

## Training Details

- **Training framework:** Hugging Face Transformers (`Trainer`)
- **Batch size:** 4
- **Learning rate:** 2e-5
- **Epochs:** 6
- **Optimizer:** AdamW with weight decay 0.01
- **Loss:** Binary cross-entropy with logits
- **Best model selection:** F1-score on the validation set

---

## Clusters (Labels)

- Civil Security for Society
- Climate, Energy and Mobility
- Culture, Creativity and Inclusive Society
- Digital, Industry and Space
- Food, Bioeconomy, Natural Resources, Agriculture and Environment
- Health

---

## Evaluation Metrics

Per-epoch results on the validation set:

| Epoch | Training Loss | Validation Loss | F1 | ROC AUC | Accuracy |
|-------|---------------|-----------------|-------|---------|----------|
| 1 | No log | 0.1774 | 0.910 | 0.9368 | 0.766 |
| 2 | 0.0606 | 0.1849 | 0.921 | 0.9454 | 0.787 |
| 3 | 0.0351 | 0.2071 | 0.919 | 0.9434 | 0.787 |
| 4 | 0.0180 | 0.2191 | 0.921 | 0.9451 | 0.793 |
| 5 | 0.0093 | 0.2295 | 0.921 | 0.9451 | 0.793 |
| 6 | 0.0060 | 0.2307 | 0.921 | 0.9451 | 0.793 |

**Best epoch:** 6. Epochs 4, 5, and 6 share the highest reported F1 (0.921, also reached at epoch 2) and accuracy (0.793), so the last improvement occurred at epoch 4; ROC AUC is marginally higher at epoch 2 (0.9454 vs. 0.9451). Validation loss is lowest after epoch 1 (0.1774) and increases in every later epoch.

- **Final validation loss:** 0.2307
- **Final F1:** 0.9212
- **Final ROC AUC:** 0.9451
- **Final accuracy:** 0.7927

---

## Per-Category Classification Report

| Label | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| Civil Security for Society | 0.97 | 0.79 | 0.87 | 39 |
| Climate, Energy and Mobility | 0.94 | 0.91 | 0.93 | 91 |
| Culture, Creativity and Inclusive Society | 0.89 | 0.88 | 0.88 | 96 |
| Digital, Industry and Space | 0.93 | 0.92 | 0.93 | 214 |
| Food, Bioeconomy, Natural Resources, Agriculture and Environment | 0.89 | 0.97 | 0.93 | 75 |
| Health | 0.96 | 0.96 | 0.96 | 73 |
| **micro avg** | 0.93 | 0.91 | 0.92 | 588 |
| **macro avg** | 0.93 | 0.91 | 0.92 | 588 |
| **weighted avg** | 0.93 | 0.91 | 0.92 | 588 |
| **samples avg** | 0.91 | 0.92 | 0.90 | 588 |

Support is the number of documents carrying each label. Because a document can carry several labels, the supports sum to 588 label assignments rather than a document count.

---

## License

This model is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
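
## Loss Function

The binary cross-entropy with logits loss listed under Training Details can be written out explicitly. This sketch assumes PyTorch's `BCEWithLogitsLoss` with its default mean reduction and no `pos_weight` class weighting; the card does not state either setting. For a batch of $N$ documents, $K = 6$ clusters, logits $z_{ik}$, and binary targets $y_{ik} \in \{0, 1\}$:

$$
\mathcal{L} = -\frac{1}{NK}\sum_{i=1}^{N}\sum_{k=1}^{K}\Big[\,y_{ik}\log\sigma(z_{ik}) + (1 - y_{ik})\log\big(1 - \sigma(z_{ik})\big)\Big],
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}
$$

Each cluster gets its own independent sigmoid, so the six probabilities need not sum to one; this is what allows a single paper to be assigned several clusters.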
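
## Decoding Predictions

A minimal post-processing sketch for turning the model's six output logits (for example, from `AutoModelForSequenceClassification`) into cluster names. Several details are assumptions not confirmed by this card: the label order is taken from the Clusters list (check the model's `config.id2label`), the 0.5 probability threshold is a common default rather than a documented setting, and joining title and abstract with `[SEP]` mirrors the SPECTER2 convention rather than a confirmed training format. `build_input` and `decode` are hypothetical helper names.

```python
import math

# ASSUMED label order: the order listed in this card. Confirm against the
# model's config.id2label before relying on it.
LABELS = [
    "Civil Security for Society",
    "Climate, Energy and Mobility",
    "Culture, Creativity and Inclusive Society",
    "Digital, Industry and Space",
    "Food, Bioeconomy, Natural Resources, Agriculture and Environment",
    "Health",
]


def build_input(title: str, abstract: str, sep: str = "[SEP]") -> str:
    """Hypothetical helper: join title and abstract into one input string.

    The separator used during fine-tuning is not documented; "[SEP]" follows
    the SPECTER2 convention (title + sep_token + abstract) and is an assumption.
    """
    return title + sep + abstract


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode(logits: list[float], threshold: float = 0.5) -> list[str]:
    """Hypothetical helper: map one document's 6 logits to cluster names.

    Each label is scored with its own sigmoid and kept if its probability
    reaches the threshold (0.5 is an assumed default, not a documented value).
    """
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    return [label for label, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


# This string would be passed to the tokenizer before running the model.
text = build_input("Grid-scale battery storage", "We analyse storage costs.")

# Example logits, made up for illustration (not real model output).
example_logits = [-3.1, 2.4, -0.7, 1.2, -4.0, -2.2]
print(decode(example_logits))
# ['Climate, Energy and Mobility', 'Digital, Industry and Space']
```

Thresholding each probability independently, rather than taking an argmax, is what makes the output multilabel; raising the threshold trades recall for precision.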
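
## Metric Definitions

The card does not specify how F1 and Accuracy were computed. A common choice for multilabel `Trainer` runs, assumed in this sketch, is micro-averaged F1 over thresholded predictions together with exact-match (subset) accuracy, where a document counts as correct only if all six cluster decisions are right. Under that assumption, Accuracy (0.79) sitting well below F1 (0.92) would be expected. `micro_f1` and `subset_accuracy` are illustrative pure-Python helpers, not the code used to evaluate this model.

```python
def micro_f1(y_true: list[list[int]], y_pred: list[list[int]]) -> float:
    """Micro-averaged F1: pool TP, FP and FN over every (document, label) pair."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 here when nothing is positive.
    return 2 * tp / denom if denom else 0.0


def subset_accuracy(y_true: list[list[int]], y_pred: list[list[int]]) -> float:
    """Exact-match accuracy: a document counts only if every label matches."""
    if not y_true:
        return 0.0
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


# Toy data: 3 documents x 6 clusters, made up for illustration.
y_true = [[1, 0, 0, 1, 0, 0], [0, 0, 0, 0, 0, 1], [0, 1, 0, 0, 1, 0]]
y_pred = [[1, 0, 0, 1, 0, 0], [0, 0, 0, 0, 0, 1], [0, 1, 0, 0, 0, 0]]

print(round(micro_f1(y_true, y_pred), 3))         # 0.889
print(round(subset_accuracy(y_true, y_pred), 3))  # 0.667
```

In the toy example, the single missed cluster in the third document costs a whole document under subset accuracy but only one of five positive labels under micro F1.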