---
language:
- en
tags:
- sentence-transformers
- cross-encoder
- reranker
- generated_from_trainer
- dataset_size:78704
- loss:ListNetLoss
base_model: jhu-clsp/ettin-encoder-1b
datasets:
- microsoft/ms_marco
pipeline_tag: text-ranking
library_name: sentence-transformers
metrics:
- map
- mrr@10
- ndcg@10
model-index:
- name: CrossEncoder based on jhu-clsp/ettin-encoder-1b
  results:
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoMSMARCO R100
      type: NanoMSMARCO_R100
    metrics:
    - type: map
      value: 0.5989
      name: Map
    - type: mrr@10
      value: 0.5889
      name: Mrr@10
    - type: ndcg@10
      value: 0.6445
      name: Ndcg@10
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoNFCorpus R100
      type: NanoNFCorpus_R100
    metrics:
    - type: map
      value: 0.3535
      name: Map
    - type: mrr@10
      value: 0.5271
      name: Mrr@10
    - type: ndcg@10
      value: 0.3808
      name: Ndcg@10
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoNQ R100
      type: NanoNQ_R100
    metrics:
    - type: map
      value: 0.6692
      name: Map
    - type: mrr@10
      value: 0.6896
      name: Mrr@10
    - type: ndcg@10
      value: 0.7157
      name: Ndcg@10
  - task:
      type: cross-encoder-nano-beir
      name: Cross Encoder Nano BEIR
    dataset:
      name: NanoBEIR R100 mean
      type: NanoBEIR_R100_mean
    metrics:
    - type: map
      value: 0.5405
      name: Map
    - type: mrr@10
      value: 0.6018
      name: Mrr@10
    - type: ndcg@10
      value: 0.5804
      name: Ndcg@10
---

# CrossEncoder based on jhu-clsp/ettin-encoder-1b

This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [jhu-clsp/ettin-encoder-1b](https://huggingface.co/jhu-clsp/ettin-encoder-1b) on the [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) dataset using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

## Model Details

### Model Description
- **Model Type:** Cross Encoder
- **Base model:** [jhu-clsp/ettin-encoder-1b](https://huggingface.co/jhu-clsp/ettin-encoder-1b) <!-- at revision befd76be43d08b89ff9957012f3ff29d0842780b -->
- **Maximum Sequence Length:** 7999 tokens
- **Number of Output Labels:** 1 label
- **Training Dataset:**
    - [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco)
- **Language:** en
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("kdhole/reranker-msmarco-v1.1-ettin-encoder-1b-listnet")
# Get scores for pairs of texts
pairs = [
    ['how do you measure a horse in hands', '1 A hand is equal to 4 inches or 10.2cms. 2 You should measure your horse from the point of the withers to the ground. 3 A horse that is 61 inches tall is 15.1 hands or 15 hands and 1 inch or 15.1hh. 4 This is calculated using (61/4 = 15.25); the .25 is the decimal equivalent of one quarter and a quarter of 4 = 1; so 15.1hh.'],
    ['how do you measure a horse in hands', '1 If a measuring tape is being used, conversion of the measurement from inches to hands is required. 2 One hand equals 4 inches (10.2 cm), so divide the measurement by 4. 3 For example, if the horse measures 71 inches (180.3 cm), divide 71 by 4 inches. 4 The result is 17 hands with 3 inches (7.6 cm) left over.'],
    ['how do you measure a horse in hands', 'Record the measurement. 1 If the horse measuring stick is being used, then the measurement can be recorded in hands immediately. 2 If a measuring tape is being used, conversion of the measurement from inches to hands is required. 3 One hand equals 4 inches (10.2 cm), so divide the measurement by 4.'],
    ['how do you measure a horse in hands', 'After you have measured your horse you will need to convert the results from inches to hands.. Horse height is correctly referred to by a unit of measurement known as a hand.. One hand is equal to four inches. The gray mare in the photo above is 58 inches from the ground to the top of her withers. When 58 is divided by 4, you have 14.5.'],
    ['how do you measure a horse in hands', '1 If the horse measuring stick is being used, then the measurement can be recorded in hands immediately. 2 If a measuring tape is being used, conversion of the measurement from inches to hands is required. 3 One hand equals 4 inches (10.2 cm), so divide the measurement by 4.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'how do you measure a horse in hands',
    [
        '1 A hand is equal to 4 inches or 10.2cms. 2 You should measure your horse from the point of the withers to the ground. 3 A horse that is 61 inches tall is 15.1 hands or 15 hands and 1 inch or 15.1hh. 4 This is calculated using (61/4 = 15.25); the .25 is the decimal equivalent of one quarter and a quarter of 4 = 1; so 15.1hh.',
        '1 If a measuring tape is being used, conversion of the measurement from inches to hands is required. 2 One hand equals 4 inches (10.2 cm), so divide the measurement by 4. 3 For example, if the horse measures 71 inches (180.3 cm), divide 71 by 4 inches. 4 The result is 17 hands with 3 inches (7.6 cm) left over.',
        'Record the measurement. 1 If the horse measuring stick is being used, then the measurement can be recorded in hands immediately. 2 If a measuring tape is being used, conversion of the measurement from inches to hands is required. 3 One hand equals 4 inches (10.2 cm), so divide the measurement by 4.',
        'After you have measured your horse you will need to convert the results from inches to hands.. Horse height is correctly referred to by a unit of measurement known as a hand.. One hand is equal to four inches. The gray mare in the photo above is 58 inches from the ground to the top of her withers. When 58 is divided by 4, you have 14.5.',
        '1 If the horse measuring stick is being used, then the measurement can be recorded in hands immediately. 2 If a measuring tape is being used, conversion of the measurement from inches to hands is required. 3 One hand equals 4 inches (10.2 cm), so divide the measurement by 4.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```
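
In a retrieve-then-rerank pipeline you typically keep only the best few passages together with their text. The sketch below wraps `model.rank` using its `top_k` and `return_documents` arguments. `rerank_top_k` is a hypothetical helper name introduced here for illustration, not part of the library, and it accepts any object that exposes a compatible `rank` method.

```python
from typing import Any, List, Tuple


def rerank_top_k(model: Any, query: str, docs: List[str], top_k: int = 3) -> List[Tuple[float, str]]:
    """Return (score, text) for the top_k documents, best first.

    `model` is assumed to be a loaded CrossEncoder (or any object with a
    compatible `rank` method); this helper is illustrative only.
    """
    results = model.rank(query, docs, top_k=top_k, return_documents=True)
    # With return_documents=True each result dict also carries the passage under "text".
    return [(float(r["score"]), r["text"]) for r in results]
```

With the model loaded as above, `rerank_top_k(model, 'how do you measure a horse in hands', passages, top_k=2)` would return the two highest-scoring passages, where `passages` stands for your own list of candidate texts.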

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Cross Encoder Reranking

* Datasets: `NanoMSMARCO_R100`, `NanoNFCorpus_R100` and `NanoNQ_R100`
* Evaluated with [<code>CrossEncoderRerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator) with these parameters:
  ```json
  {
      "at_k": 10,
      "always_rerank_positives": true
  }
  ```

| Metric      | NanoMSMARCO_R100     | NanoNFCorpus_R100    | NanoNQ_R100          |
|:------------|:---------------------|:---------------------|:---------------------|
| map         | 0.5989 (+0.1094)     | 0.3535 (+0.0925)     | 0.6692 (+0.2496)     |
| mrr@10      | 0.5889 (+0.1114)     | 0.5271 (+0.0272)     | 0.6896 (+0.2629)     |
| **ndcg@10** | **0.6445 (+0.1041)** | **0.3808 (+0.0558)** | **0.7157 (+0.2151)** |
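
For orientation, the three metrics in the table can be sketched in plain Python for a single query, given the binary relevance labels of the reranked list in ranked order. This is a simplified illustration and not the evaluator's own code: the function names are hypothetical, and the library may treat ties, truncation, and queries without positives differently. The reported values would then be averages of such per-query scores.

```python
import math
from typing import Sequence


def mrr_at_k(relevance: Sequence[int], k: int = 10) -> float:
    """Reciprocal rank of the first relevant item within the top k, else 0."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(relevance: Sequence[int], k: int = 10) -> float:
    """DCG of the given order divided by the DCG of the ideal order, both cut at k."""
    def dcg(rels: Sequence[int]) -> float:
        return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(rels[:k]))

    ideal = dcg(sorted(relevance, reverse=True))
    return dcg(relevance) / ideal if ideal > 0 else 0.0


def average_precision(relevance: Sequence[int]) -> float:
    """Mean of precision@rank taken at each relevant position."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0
```

For example, a list whose only relevant passage sits at rank 3 gives `mrr_at_k([0, 0, 1, 0])` equal to one third.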

#### Cross Encoder Nano BEIR

* Dataset: `NanoBEIR_R100_mean`
* Evaluated with [<code>CrossEncoderNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderNanoBEIREvaluator) with these parameters:
  ```json
  {
      "dataset_names": [
          "msmarco",
          "nfcorpus",
          "nq"
      ],
      "rerank_k": 100,
      "at_k": 10,
      "always_rerank_positives": true
  }
  ```

| Metric      | Value                |
|:------------|:---------------------|
| map         | 0.5405 (+0.1505)     |
| mrr@10      | 0.6018 (+0.1338)     |
| **ndcg@10** | **0.5804 (+0.1250)** |
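
As a quick consistency check, the mean row can be approximately recomputed from the three per-dataset values reported in the Cross Encoder Reranking table. The sketch assumes the NanoBEIR mean is the unweighted average over the three datasets; because the inputs are already rounded to four decimals, the last digit may differ slightly.

```python
# Per-dataset values copied from the reranking table (msmarco, nfcorpus, nq).
per_dataset = {
    "map": [0.5989, 0.3535, 0.6692],
    "mrr@10": [0.5889, 0.5271, 0.6896],
    "ndcg@10": [0.6445, 0.3808, 0.7157],
}
# Values copied from the NanoBEIR mean table.
reported_mean = {"map": 0.5405, "mrr@10": 0.6018, "ndcg@10": 0.5804}

# Assumption: the mean is a simple unweighted average across datasets.
recomputed = {name: sum(vals) / len(vals) for name, vals in per_dataset.items()}

for name, value in recomputed.items():
    print(f"{name}: recomputed {value:.4f}, reported {reported_mean[name]:.4f}")
```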

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### ms_marco

* Dataset: [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) at [a47ee7a](https://huggingface.co/datasets/microsoft/ms_marco/tree/a47ee7aae8d7d466ba15f9f0bfac3b3681087b3a)
* Size: 78,704 training samples
* Columns: <code>query</code>, <code>docs</code>, and <code>labels</code>
* Approximate statistics based on the first 1000 samples:
  |         | query | docs | labels |
  |:--------|:------|:-----|:-------|
  | type    | string | list | list |
  | details | <ul><li>min: 11 characters</li><li>mean: 33.93 characters</li><li>max: 109 characters</li></ul> | <ul><li>min: 3 elements</li><li>mean: 6.50 elements</li><li>max: 10 elements</li></ul> | <ul><li>min: 3 elements</li><li>mean: 6.50 elements</li><li>max: 10 elements</li></ul> |
* Samples:
  | query | docs | labels |
  |:------|:-----|:-------|
  | <code>Hemophilia is a group of different inherited blood-clotting disorders. Which is true about hemophilia</code> | <code>['Hemophilia is a hereditary bleeding disorder caused by a deficiency in one of two blood clotting factors: factor VIII or factor IX. Several different gene abnormalities can cause the disorder. People bleed unexpectedly or after minor injuries. ', 'Hemophilia is an inherited bleeding disorder that almost always affects males. A person with hemophilia has low or non-existent levels of blood clotting protein called factor. Coagulation factor is necessary for the clotting mechanism in our bodies to work. There are 13 blood clotting proteins (coagulation factor) along with platelets and fibrin necessary for clotting blood. Factor IX deficiency usually only manifests in males. Hemophilia C: This person has low levels of or is missing completely factor 11 (Also called FXI or factor XI deficiency) Hemophilia C is 10 times rarer than type A. Factor XI deficiency is different because it can show up in both males and females.', 'Hemophilia is a rare hereditary (inherited) bleeding disorder in w...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>what is the meaning of nazia</code> | <code>['Show similar names Show variant names. Name Nazia generally means Princess or Queen, is of Indian origin, Name Nazia is a Feminine (or Girl) name. Person with name Nazia are mainly Muslim by religion. Name Nazia belongs to rashi Vrushik (Scorpio) with dominant planet Mars (Mangal) ', "Nazia's are very outgoing once you get to meet her,she's also a undercover freak so you gotta watch her. Nazia's are unique you can tell by the name, she yurn for attention and always wants to be in a relationship. Nazia's never like to be alone they love to be around people. They are loyal so once you meet one keep them. Nazia's are good friends once you proove to them your not fake. When you meet Nazia, You'll Love her. A beautiful girl! The name means 'Pride' so she is hardworking to bring that status to her family. All Nazia's are fantastic and they don't open up easily so you will have to give them some time.", '(viewable to Premium Members only). Below is a brief analysis of the first name only. F...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>how injection moulding temperature affects polystyrene</code> | <code>['But melt temperature also has an influence on the final molecular weight of the polymer in the moulded part[3,4]. Keywords: Polymer nanocomposites, nano kaolin clay, injection moulding, moulding temperature. influence on the behaviour of the polymer are the 1. Material is fed into a heated barrel, mixed, and forced into a mould cavity where it cools and hardens to the configuration of the cavity[13]. In injection moulding, moulding conditions have a significant influence on the final properties of the material regardless of the part design.', 'It is very easy to forget that plastic melts are not thermally stable over long periods at, or above, melt temperature. Equally, it is as easy to forget that the molten mass is not impervious to the effects of shear. Plastic Melts are not Newtonian in their behaviour. That is they do not react in a linear fashion when exposed to shearing of the melt or changes in temperature. A Newtonian melt would show a straight line graph when plotted for sh...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
* Loss: [<code>ListNetLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#listnetloss) with these parameters:
  ```json
  {
      "activation_fn": "torch.nn.modules.linear.Identity",
      "mini_batch_size": 16
  }
  ```
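
To make the listwise objective concrete, here is a minimal sketch of the top-one ListNet loss for a single query: a cross-entropy between the softmax of the labels and the softmax of the predicted scores. It is written in plain Python under that assumption and is not the library's implementation, which also handles batching, padding of variable-length lists, and the configured activation function. The function names are hypothetical.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_top1_loss(scores: List[float], labels: List[float]) -> float:
    """Cross-entropy between softmax(labels) and softmax(scores) for one query."""
    target = softmax(labels)
    predicted = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))


labels = [1, 0, 0, 0, 0]                # one positive passage, as in the samples above
uniform = [0.0, 0.0, 0.0, 0.0, 0.0]     # a scorer that cannot tell passages apart
matched = [1.0, 0.0, 0.0, 0.0, 0.0]     # scores equal to the labels

print(listnet_top1_loss(uniform, labels))  # equals ln(5) for any labels
print(listnet_top1_loss(matched, labels))  # the minimum: entropy of softmax(labels)
```

Under this formulation the loss never reaches zero, because the target distribution `softmax(labels)` is itself soft; its entropy is the floor, and scores that are far more confident than the labels are penalized as well.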

### Evaluation Dataset

#### ms_marco

* Dataset: [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) at [a47ee7a](https://huggingface.co/datasets/microsoft/ms_marco/tree/a47ee7aae8d7d466ba15f9f0bfac3b3681087b3a)
* Size: 1,000 evaluation samples
* Columns: <code>query</code>, <code>docs</code>, and <code>labels</code>
* Approximate statistics based on the first 1000 samples:
  |         | query | docs | labels |
  |:--------|:------|:-----|:-------|
  | type    | string | list | list |
  | details | <ul><li>min: 11 characters</li><li>mean: 34.24 characters</li><li>max: 101 characters</li></ul> | <ul><li>min: 3 elements</li><li>mean: 6.50 elements</li><li>max: 10 elements</li></ul> | <ul><li>min: 3 elements</li><li>mean: 6.50 elements</li><li>max: 10 elements</li></ul> |
* Samples:
  | query | docs | labels |
  |:------|:-----|:-------|
  | <code>how do you measure a horse in hands</code> | <code>['1 A hand is equal to 4 inches or 10.2cms. 2 You should measure your horse from the point of the withers to the ground. 3 A horse that is 61 inches tall is 15.1 hands or 15 hands and 1 inch or 15.1hh. 4 This is calculated using (61/4 = 15.25); the .25 is the decimal equivalent of one quarter and a quarter of 4 = 1; so 15.1hh.', '1 If a measuring tape is being used, conversion of the measurement from inches to hands is required. 2 One hand equals 4 inches (10.2 cm), so divide the measurement by 4. 3 For example, if the horse measures 71 inches (180.3 cm), divide 71 by 4 inches. 4 The result is 17 hands with 3 inches (7.6 cm) left over.', 'Record the measurement. 1 If the horse measuring stick is being used, then the measurement can be recorded in hands immediately. 2 If a measuring tape is being used, conversion of the measurement from inches to hands is required. 3 One hand equals 4 inches (10.2 cm), so divide the measurement by 4.', 'After you have measured your horse you wi...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>where is amsterdam located</code> | <code>["Amsterdam is located in the western Netherlands, in the province of North Holland. The river Amstel terminates in the city centre and connects to a large number of canals that eventually terminate in the IJ. Amsterdam is situated 2 metres below sea level. The surrounding land is flat as it is formed of large polders. Amsterdam's main attractions, including its historic canals, the Rijksmuseum, the Van Gogh Museum, Stedelijk Museum, Hermitage Amsterdam, Anne Frank House, Amsterdam Museum, its red-light district, and its many cannabis coffee shops draw more than 5 million international visitors annually.", 'The Netherlands is bordered by Belgium in the South, Germany in the East and the Northsea in the North and West. Amsterdam is located in the South of the province of North Holland: Amsterdam Facts. 1 Amsterdam is the largest city in the Netherlands. 2 Amsterdam is the capital of the Netherlands (while The Hague is the seat of government). 3 Amsterdam is the financial and cultural...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>what does affected mean</code> | <code>['Effected means executed, produced, or brought about. For example, The dictatorial regime quickly effected changes to the constitution that restricted the freedom of the people. On the other hand, affected means made an impact on. It is the past tense of the verb form of affect, which means to impact.', 'Meaning of Affect and Effect. In order to understand the correct situation in which to use the word affect or effect, the first thing one must do is have a clear understanding of what each word means. 1 Affect is a verb. 2 It means to produce a change in or influence something. 3 Effect is a noun that can also be used as a verb.', "affect 2 is not used as a noun; as a verb it means “to pretend” or “to assume” (new students affecting a nonchalance they didn't feel). The verb effect means “to bring about, accomplish”: Her administration effected radical changes. The noun effect means “result, consequence”: the serious effects of the oil spill.", 'Affect means to have an influence on something. Affect is normally a verb. Effect is the result of an influence or change. Effect is normally a noun. They are related in t … hat when something affects something else, it produces an effect on it. The word affect has a noun meaning related to psychology and emotion. The word effect has a verb meaning, which is to create, bring about, or institute.', 'In order to understand the correct situation in which to use the word affect or effect, the first thing one must do is have a clear understanding of what each word means. 1 Affect is a verb. 2 It means to produce a change in or influence something. 3 Effect is a noun that can also be used as a verb.']</code> | <code>[1, 0, 0, 0, 0]</code> |
* Loss: [<code>ListNetLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#listnetloss) with these parameters:
  ```json
  {
      "activation_fn": "torch.nn.modules.linear.Identity",
      "mini_batch_size": 16
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 1
- `seed`: 12
- `bf16`: True
- `load_best_model_at_end`: True
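
These settings determine how optimizer steps map to the Epoch column in the Training Logs. The arithmetic below is a small worked example; it assumes a single device and `gradient_accumulation_steps` of 1 (the value listed under All Hyperparameters), and `epoch_fraction` is a hypothetical helper name.

```python
import math

train_samples = 78_704          # training dataset size (one sample = one query with its list)
per_device_batch_size = 16      # per_device_train_batch_size
num_devices = 1                 # assumption: training ran on a single device
grad_accum = 1                  # gradient_accumulation_steps

# Number of optimizer steps needed to see every training sample once.
steps_per_epoch = math.ceil(train_samples / (per_device_batch_size * num_devices * grad_accum))
print(steps_per_epoch)  # 4919


def epoch_fraction(step: int) -> float:
    """Fraction of the single training epoch completed at a given step."""
    return round(step / steps_per_epoch, 4)


print(epoch_fraction(100))   # 0.0203
print(epoch_fraction(3000))  # 0.6099
```

Both printed fractions agree with the logged epochs at steps 100 and 3000, which is consistent with the single-device assumption.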

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 12
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional
- `router_mapping`: {}
- `learning_rate_mapping`: {}

</details>

### Training Logs
| Epoch | Step | Training Loss | Validation Loss | NanoMSMARCO_R100_ndcg@10 | NanoNFCorpus_R100_ndcg@10 | NanoNQ_R100_ndcg@10 | NanoBEIR_R100_mean_ndcg@10 |
|:----------:|:--------:|:-------------:|:---------------:|:------------------------:|:-------------------------:|:--------------------:|:--------------------------:|
| -1 | -1 | - | - | 0.0000 (-0.5404) | 0.2648 (-0.0602) | 0.0388 (-0.4618) | 0.1012 (-0.3541) |
| 0.0002 | 1 | 2.3028 | - | - | - | - | - |
| 0.0203 | 100 | 2.0955 | 2.0679 | 0.3022 (-0.2382) | 0.2808 (-0.0442) | 0.4762 (-0.0244) | 0.3531 (-0.1023) |
| 0.0407 | 200 | 2.0633 | 2.0643 | 0.5733 (+0.0329) | 0.3362 (+0.0112) | 0.6797 (+0.1790) | 0.5297 (+0.0743) |
| 0.0610 | 300 | 2.0738 | 2.0616 | 0.5738 (+0.0334) | 0.3480 (+0.0230) | 0.6018 (+0.1011) | 0.5079 (+0.0525) |
| 0.0813 | 400 | 2.0679 | 2.0617 | 0.5441 (+0.0036) | 0.3162 (-0.0088) | 0.6688 (+0.1681) | 0.5097 (+0.0543) |
| 0.1016 | 500 | 2.0702 | 2.0619 | 0.5566 (+0.0161) | 0.3423 (+0.0172) | 0.6932 (+0.1925) | 0.5307 (+0.0753) |
| 0.1220 | 600 | 2.0719 | 2.0602 | 0.5583 (+0.0179) | 0.3643 (+0.0392) | 0.7066 (+0.2060) | 0.5431 (+0.0877) |
| 0.1423 | 700 | 2.066 | 2.0600 | 0.5792 (+0.0388) | 0.3470 (+0.0219) | 0.6971 (+0.1965) | 0.5411 (+0.0857) |
| 0.1626 | 800 | 2.0704 | 2.0595 | 0.5980 (+0.0576) | 0.3493 (+0.0243) | 0.6749 (+0.1743) | 0.5407 (+0.0854) |
| 0.1830 | 900 | 2.0804 | 2.0596 | 0.6080 (+0.0675) | 0.3557 (+0.0307) | 0.6314 (+0.1307) | 0.5317 (+0.0763) |
| 0.2033 | 1000 | 2.0697 | 2.0590 | 0.5992 (+0.0587) | 0.3262 (+0.0012) | 0.7125 (+0.2119) | 0.5460 (+0.0906) |
| 0.2236 | 1100 | 2.0756 | 2.0597 | 0.6133 (+0.0729) | 0.3890 (+0.0639) | 0.6932 (+0.1926) | 0.5652 (+0.1098) |
| 0.2440 | 1200 | 2.0761 | 2.0592 | 0.5937 (+0.0533) | 0.3614 (+0.0363) | 0.6783 (+0.1776) | 0.5445 (+0.0891) |
| 0.2643 | 1300 | 2.0688 | 2.0587 | 0.5865 (+0.0461) | 0.3562 (+0.0312) | 0.6863 (+0.1856) | 0.5430 (+0.0876) |
| 0.2846 | 1400 | 2.0622 | 2.0588 | 0.6190 (+0.0786) | 0.3610 (+0.0360) | 0.6717 (+0.1710) | 0.5506 (+0.0952) |
| 0.3049 | 1500 | 2.0674 | 2.0589 | 0.6331 (+0.0926) | 0.3719 (+0.0469) | 0.7195 (+0.2189) | 0.5748 (+0.1195) |
| 0.3253 | 1600 | 2.0731 | 2.0590 | 0.6194 (+0.0790) | 0.3777 (+0.0527) | 0.6719 (+0.1713) | 0.5564 (+0.1010) |
| 0.3456 | 1700 | 2.0607 | 2.0589 | 0.5792 (+0.0388) | 0.3991 (+0.0740) | 0.6850 (+0.1843) | 0.5544 (+0.0991) |
| 0.3659 | 1800 | 2.0716 | 2.0593 | 0.6400 (+0.0996) | 0.3810 (+0.0560) | 0.7093 (+0.2087) | 0.5768 (+0.1214) |
| 0.3863 | 1900 | 2.065 | 2.0587 | 0.6490 (+0.1086) | 0.3732 (+0.0481) | 0.6862 (+0.1855) | 0.5694 (+0.1141) |
| 0.4066 | 2000 | 2.0716 | 2.0588 | 0.6336 (+0.0932) | 0.3676 (+0.0426) | 0.7023 (+0.2016) | 0.5678 (+0.1125) |
| 0.4269 | 2100 | 2.0755 | 2.0592 | 0.6227 (+0.0823) | 0.3789 (+0.0539) | 0.6523 (+0.1517) | 0.5513 (+0.0959) |
| 0.4472 | 2200 | 2.0621 | 2.0587 | 0.6296 (+0.0892) | 0.3543 (+0.0292) | 0.6721 (+0.1714) | 0.5520 (+0.0966) |
| 0.4676 | 2300 | 2.0733 | 2.0587 | 0.6452 (+0.1048) | 0.3677 (+0.0427) | 0.6939 (+0.1932) | 0.5689 (+0.1136) |
| 0.4879 | 2400 | 2.0735 | 2.0581 | 0.6360 (+0.0956) | 0.3491 (+0.0240) | 0.6830 (+0.1824) | 0.5560 (+0.1007) |
| 0.5082 | 2500 | 2.0681 | 2.0582 | 0.6328 (+0.0924) | 0.3443 (+0.0193) | 0.6792 (+0.1785) | 0.5521 (+0.0967) |
| 0.5286 | 2600 | 2.0741 | 2.0582 | 0.6618 (+0.1214) | 0.3536 (+0.0286) | 0.6812 (+0.1806) | 0.5655 (+0.1102) |
| 0.5489 | 2700 | 2.067 | 2.0587 | 0.6611 (+0.1207) | 0.3726 (+0.0476) | 0.6826 (+0.1819) | 0.5721 (+0.1167) |
| 0.5692 | 2800 | 2.0706 | 2.0579 | 0.6627 (+0.1223) | 0.3736 (+0.0486) | 0.6843 (+0.1836) | 0.5735 (+0.1182) |
| 0.5896 | 2900 | 2.0632 | 2.0580 | 0.6426 (+0.1022) | 0.3788 (+0.0538) | 0.6940 (+0.1933) | 0.5718 (+0.1164) |
| **0.6099** | **3000** | **2.0773** | **2.0582** | **0.6445 (+0.1041)** | **0.3808 (+0.0558)** | **0.7157 (+0.2151)** | **0.5804 (+0.1250)** |
| 0.6302 | 3100 | 2.071 | 2.0583 | 0.6354 (+0.0950) | 0.3810 (+0.0559) | 0.6792 (+0.1785) | 0.5652 (+0.1098) |
| 0.6505 | 3200 | 2.0678 | 2.0579 | 0.6224 (+0.0820) | 0.3753 (+0.0502) | 0.6622 (+0.1615) | 0.5533 (+0.0979) |
| 0.6709 | 3300 | 2.066 | 2.0577 | 0.6658 (+0.1254) | 0.3761 (+0.0510) | 0.6742 (+0.1735) | 0.5720 (+0.1166) |
| 0.6912 | 3400 | 2.065 | 2.0577 | 0.6525 (+0.1121) | 0.3750 (+0.0500) | 0.6760 (+0.1754) | 0.5678 (+0.1125) |
| 0.7115 | 3500 | 2.072 | 2.0580 | 0.6296 (+0.0892) | 0.3553 (+0.0303) | 0.6632 (+0.1625) | 0.5494 (+0.0940) |
| 0.7319 | 3600 | 2.065 | 2.0580 | 0.6223 (+0.0818) | 0.3638 (+0.0387) | 0.6762 (+0.1756) | 0.5541 (+0.0987) |
| 0.7522 | 3700 | 2.0633 | 2.0574 | 0.6400 (+0.0996) | 0.3718 (+0.0468) | 0.6643 (+0.1637) | 0.5587 (+0.1034) |
| 0.7725 | 3800 | 2.0655 | 2.0576 | 0.6476 (+0.1072) | 0.3882 (+0.0632) | 0.7001 (+0.1994) | 0.5786 (+0.1233) |
| 0.7928 | 3900 | 2.0703 | 2.0572 | 0.6385 (+0.0981) | 0.3848 (+0.0597) | 0.6705 (+0.1698) | 0.5646 (+0.1092) |
| 0.8132 | 4000 | 2.0741 | 2.0572 | 0.6266 (+0.0862) | 0.3614 (+0.0364) | 0.6759 (+0.1752) | 0.5546 (+0.0993) |
| 0.8335 | 4100 | 2.058 | 2.0574 | 0.6330 (+0.0925) | 0.3750 (+0.0500) | 0.6600 (+0.1593) | 0.5560 (+0.1006) |
| 0.8538 | 4200 | 2.0758 | 2.0574 | 0.6450 (+0.1046) | 0.3774 (+0.0524) | 0.6796 (+0.1789) | 0.5673 (+0.1120) |
| 0.8742 | 4300 | 2.0648 | 2.0572 | 0.6261 (+0.0857) | 0.3681 (+0.0430) | 0.6796 (+0.1789) | 0.5579 (+0.1025) |
| 0.8945 | 4400 | 2.0647 | 2.0573 | 0.6377 (+0.0973) | 0.3724 (+0.0473) | 0.6523 (+0.1517) | 0.5541 (+0.0988) |
| 0.9148 | 4500 | 2.0634 | 2.0570 | 0.6412 (+0.1008) | 0.3738 (+0.0488) | 0.6917 (+0.1911) | 0.5689 (+0.1136) |
| 0.9351 | 4600 | 2.0675 | 2.0570 | 0.6426 (+0.1022) | 0.3819 (+0.0569) | 0.6875 (+0.1869) | 0.5707 (+0.1153) |
|
|
| 0.9555 | 4700 | 2.061 | 2.0570 | 0.6428 (+0.1024) | 0.3884 (+0.0634) | 0.6929 (+0.1923) | 0.5747 (+0.1194) | |
|
|
| 0.9758 | 4800 | 2.0652 | 2.0571 | 0.6462 (+0.1058) | 0.3892 (+0.0641) | 0.6933 (+0.1927) | 0.5763 (+0.1209) | |
|
|
| 0.9961 | 4900 | 2.0636 | 2.0571 | 0.6489 (+0.1084) | 0.3896 (+0.0645) | 0.6889 (+0.1883) | 0.5758 (+0.1204) | |
|
|
| -1 | -1 | - | - | 0.6445 (+0.1041) | 0.3808 (+0.0558) | 0.7157 (+0.2151) | 0.5804 (+0.1250) | |
|
|
|
|
|
* The bold row denotes the saved checkpoint.
|
|
|
|
|
### Framework Versions

- Python: 3.9.18
- Sentence Transformers: 5.1.1
- Transformers: 4.56.2
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.1.1
- Tokenizers: 0.22.1
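
To reproduce results, it can help to check a local environment against the versions listed above. The following is a minimal sketch using only the standard library; the pip distribution names in `EXPECTED` (for example `torch` for PyTorch) and the helper `compare_versions` are assumptions made for illustration and are not part of this model card or of Sentence Transformers.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above. The keys are assumed pip distribution
# names; adjust them if your environment uses different package names.
EXPECTED = {
    "sentence-transformers": "5.1.1",
    "transformers": "4.56.2",
    "torch": "2.8.0+cu128",
    "accelerate": "1.10.1",
    "datasets": "4.1.1",
    "tokenizers": "0.22.1",
}


def compare_versions(expected):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in compare_versions(EXPECTED).items():
        print(f"{name}: card lists {wanted}, found {installed}")
```

A mismatch does not necessarily mean the model will fail to load; the check only reports where an environment differs from the one used for training.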
|
|
|
|
|
## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
|
|
|
|
|
#### ListNetLoss
```bibtex
@inproceedings{cao2007learning,
    title={Learning to Rank: From Pairwise Approach to Listwise Approach},
    author={Cao, Zhe and Qin, Tao and Liu, Tie-Yan and Tsai, Ming-Feng and Li, Hang},
    booktitle={Proceedings of the 24th international conference on Machine learning},
    pages={129--136},
    year={2007}
}
```
|
|
|
|
|
<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->