## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Maximum Sequence Length:** 32768 tokens
- **Similarity Function:** Cosine Similarity

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 32768, 'do_lower_case': False}) with Transformer model: Qwen2Model
  (1): Pooling({'word_embedding_dimension': 4096, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': True, 'include_prompt': True})
  (2): Normalize()
)
```
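In this stack, `pooling_mode_lasttoken: True` means the sentence embedding is taken from the hidden state of the last non-padding token, a common choice for decoder-style backbones such as Qwen2 where the final position attends to the whole sequence, and `Normalize()` then L2-normalizes that vector so cosine similarity reduces to a dot product. A minimal sketch of the idea (illustrative only, not the library's internal implementation; right padding is assumed):

```python
import torch

def last_token_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # hidden_states:  [batch, seq_len, hidden] token states from the transformer
    # attention_mask: [batch, seq_len], 1 for real tokens, 0 for padding
    last_idx = attention_mask.sum(dim=1) - 1         # position of each sequence's last real token
    batch_idx = torch.arange(hidden_states.size(0))  # [batch]
    pooled = hidden_states[batch_idx, last_idx]      # [batch, hidden]
    # L2-normalize, mirroring the Normalize() module above
    return torch.nn.functional.normalize(pooled, p=2, dim=1)

# Toy shapes for illustration: batch of 2, seq_len 5, hidden size 4096
hidden = torch.randn(2, 5, 4096)
mask = torch.tensor([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])
print(last_token_pool(hidden, mask).shape)  # torch.Size([2, 4096])
```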
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
import torch
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer(
    "",
    model_kwargs={"attn_implementation": "flash_attention_2", "torch_dtype": torch.bfloat16},
)

# Run inference
documents = [
    'Your Upcoming Stay at Park Tower Knightsbridge\n\n\r\n[cid:image001.png@01DA125C.D84FF1A0]\r\n\r\n\r\nDear Abdulla Alhassani,\r\n\r\n\r\n\r\nThank you for choosing The Park Tower Knightsbridge, A Luxury Collection Hotel for your upcoming visit! We are looking forward to welcoming you to the hotel and\r\n\r\nwould like to prepare for your arrival with any special requests you may have. A few details from you, can help us best prepare and make your stay as memorable as possible.\r\n\r\nPlease share with us your estimated arrival time and let us know if we may assist you in booking a private car or taxi service to get to the hotel.\r\n\r\nEmail us here to book your car now.\r\n\r\n\r\n\r\nYOUR RESERVATION\r\n\r\nARRIVAL: 11/11/2023\r\n\r\nDEPARTURE: 11/22/2023\r\n\r\nCONFIRMATION NUMBER: 95476045\r\n\r\n\r\n\r\n*IF TRAVELLING WITH CHILDREN PLEASE CONFIRM THEIR AGES\r\n\r\n\r\n\r\n\r\n\r\n[cid:image002.png@01DA125C.D84FF1A0][cid:image003.png@01DA125C.D84FF1A0][cid:image004.png@01DA125C.D84FF1A0]\r\n\r\n\r\n[cid:image005.png@01DA125C.D84FF1A0]\r\n[cid:image006.png@01DA125C.D84FF1A0][cid:image007.png@01DA125C.D84FF1A0]\r\n[cid:image008.png@01DA125C.D84FF1A0]\n',
]
categories = [
    "Email category: 'Hotel -- Additional request of arrival time'. Email category description: 'A request from the hotel asking for the client to provide the exact or approximate check-in/arrival time as this is requested by the hotel due to different reasons. For example, the hotel does not have 24 hour reception and for this reason is asking for the arrival time. Information about the check-in helps the hotel better prepare for the guest's arrival and plan the schedule of the hotel staff.'",
    "Email category: 'Hotel -- Content request'. Email category description: 'This is an email from a hotelier who has seen photos of the hotel where they work and requests that certain photos be removed, changed, or new ones added. It could also be a request to modify the description of a particular facility or service at the hotel, such as information about type of meals, the deposit amount, or parking facilities. These are important letters that help us keep the hotel photo gallery on the website up to date, ensuring that guests can be confident in what they are booking when looking at the photos. We send such letters to the content department.'",
]
document_embeddings = model.encode(documents)
category_embeddings = model.encode(categories)
print(document_embeddings.shape)  # [1, 4096]
print(category_embeddings.shape)  # [2, 4096]

# Get the similarity scores for the embeddings
similarities = model.similarity(document_embeddings, category_embeddings)
```
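Because each category string doubles as a label description, the similarity matrix can be used directly for zero-shot email classification. A minimal follow-up sketch, reusing `similarities` from the snippet above:

```python
# Assign each document to its highest-scoring category.
# `similarities` has shape [num_documents, num_categories].
best_indices = similarities.argmax(dim=1).tolist()

for doc_idx, cat_idx in enumerate(best_indices):
    score = similarities[doc_idx, cat_idx].item()
    print(f"Document {doc_idx} -> category {cat_idx} (cosine similarity: {score:.4f})")
```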
### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 6
- `learning_rate`: 2e-06
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `bf16`: True
- `load_best_model_at_end`: True
- `batch_sampler`: no_duplicates

#### All Hyperparameters

<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 6
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `learning_rate`: 2e-06
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>
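These values correspond to the sentence-transformers v3 `SentenceTransformerTrainingArguments`. A minimal sketch of how the non-default values above would be reproduced (the `output_dir` is a placeholder, not taken from this card; everything else keeps the defaults from the expandable list):

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # hypothetical path
    eval_strategy="steps",
    per_device_train_batch_size=6,
    per_device_eval_batch_size=6,
    learning_rate=2e-06,
    num_train_epochs=1,
    warmup_ratio=0.1,
    bf16=True,
    load_best_model_at_end=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # no duplicate texts within a batch
)
```

The `no_duplicates` batch sampler pairs naturally with in-batch-negative objectives such as the CachedMultipleNegativesRankingLoss cited below, which treats the other examples in a batch as negatives and therefore benefits from duplicate-free batches.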
### Training Logs

| Epoch  | Step | Training Loss | Validation Loss |
|:------:|:----:|:-------------:|:---------------:|
| 0.0252 | 50   | 0.5621        | -               |
| 0.0504 | 100  | 0.6789        | -               |
| 0.0755 | 150  | 0.7126        | -               |
| 0.1007 | 200  | 0.6461        | 0.1758          |
| 0.1259 | 250  | 0.3928        | -               |
| 0.1511 | 300  | 0.3786        | -               |
| 0.1762 | 350  | 0.4105        | -               |
| 0.2014 | 400  | 0.3354        | 0.1420          |
| 0.2266 | 450  | 0.327         | -               |
| 0.2518 | 500  | 0.2494        | -               |
| 0.2769 | 550  | 0.1773        | -               |
| 0.3021 | 600  | 0.1215        | 0.1241          |
| 0.3273 | 650  | 0.2426        | -               |
| 0.3525 | 700  | 0.2279        | -               |
| 0.3776 | 750  | 0.2151        | -               |
| 0.4028 | 800  | 0.2676        | 0.1216          |
| 0.4280 | 850  | 0.2645        | -               |
| 0.4532 | 900  | 0.2491        | -               |
| 0.4783 | 950  | 0.2945        | -               |
| 0.5035 | 1000 | 0.1859        | 0.1206          |
| 0.5287 | 1050 | 0.2401        | -               |
| 0.5539 | 1100 | 0.2154        | -               |
| 0.5791 | 1150 | 0.1731        | -               |
| 0.6042 | 1200 | 0.1942        | 0.1196          |
| 0.6294 | 1250 | 0.2643        | -               |
| 0.6546 | 1300 | 0.1806        | -               |
| 0.6798 | 1350 | 0.1609        | -               |
| 0.7049 | 1400 | 0.1008        | 0.1187          |

### Framework Versions

- Python: 3.10.12
- Sentence Transformers: 3.0.1
- Transformers: 4.42.4
- PyTorch: 2.2.0+cu121
- Accelerate: 0.33.0
- Datasets: 2.20.0
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### CachedMultipleNegativesRankingLoss

```bibtex
@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```