dleemiller committed · Commit 8aab5a7 (verified) · 1 Parent(s): c1a555b

Update README.md

Files changed (1)
1. README.md +5 -6
README.md CHANGED
@@ -43,14 +43,13 @@ I've found the `cross-encoders/roberta-large-stsb` model to be very useful in cr
 They're simple to use, fast and very accurate.
 
 The Ettin series followed up with new encoders trained on the ModernBERT architecture, with a range of sizes, starting at 17M.
-Despite the small size, it performs similarly to `stsb-distilroberta-base`;
-however, the reduced parameters and computationally efficient interleaved local/global attention layers make this a very fast model,
+The reduced parameters and computationally efficient interleaved local/global attention layers make this a very fast model,
 which can easily process a few hundred sentence pairs per second on CPU, and a few thousand per second on my A6000.
 
 ---
 
 ## Features
-- **High performing:** Achieves **Pearson: 0.8785** and **Spearman: 0.8756** on the STS-Benchmark test set.
+- **High performing:** Achieves **Pearson: 0.8316** and **Spearman: 0.8211** on the STS-Benchmark test set.
 - **Efficient architecture:** Based on the Ettin-encoder design (17M parameters), offering very fast inference speeds.
 - **Extended context length:** Processes sequences up to 8192 tokens, great for LLM output evals.
 - **Diversified training:** Pretrained on `dleemiller/wiki-sim` and fine-tuned on `sentence-transformers/stsb`.
@@ -65,7 +64,7 @@ which can easily process a few hundred sentence pairs per second on CPU, and a f
 | `ModernCE-base-sts` | **0.9162** | **0.9122** | **8192** | 149M | **Fast** |
 | `stsb-roberta-large` | 0.9147 | - | 512 | 355M | Slow |
 | `stsb-distilroberta-base` | 0.8792 | - | 512 | 82M | Fast |
-| `EttinX-sts-xxs` | - | - | **8192** | 17M | **Very Fast** |
+| `EttinX-sts-xxs` | 0.8316 | 0.8211 | **8192** | 17M | **Very Fast** |
 
 
 ---
@@ -107,8 +106,8 @@ Fine-tuning was performed on the [`sentence-transformers/stsb`](https://huggingf
 
 ### Validation Results
 The model achieved the following test set performance after fine-tuning:
-- **Pearson Correlation:** 0.878
-- **Spearman Correlation:** 0.876
+- **Pearson Correlation:** 0.8316
+- **Spearman Correlation:** 0.8211
 
 ---
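
For context on the "simple to use" claim in the diff above, here is a minimal usage sketch for scoring sentence pairs with this kind of cross-encoder STS model. The repo id `dleemiller/EttinX-sts-xxs` is an assumption inferred from the committer name and the comparison table, not confirmed by this commit; the loading path via sentence-transformers' `CrossEncoder` is the standard one for cross-encoder checkpoints.

```python
# Minimal sketch: scoring sentence pairs with a cross-encoder STS model.
# Assumption: the model is published as "dleemiller/EttinX-sts-xxs";
# substitute the actual repo id if it differs.
from sentence_transformers import CrossEncoder

model = CrossEncoder("dleemiller/EttinX-sts-xxs")

pairs = [
    ("A man is playing guitar.", "Someone is playing a guitar."),
    ("A man is playing guitar.", "The stock market fell today."),
]

# predict() returns one similarity score per pair; STS cross-encoders
# are trained to regress similarity roughly in [0, 1].
scores = model.predict(pairs)
for (s1, s2), score in zip(pairs, scores):
    print(f"{score:.3f}  {s1!r} vs {s2!r}")
```

`predict()` batches pairs internally, which is consistent with the throughput figures quoted in the README (a few hundred pairs per second on CPU, a few thousand on GPU).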