## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Maximum Sequence Length:** 32768 tokens
- **Similarity Function:** Cosine Similarity

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 32768, 'do_lower_case': False}) with Transformer model: Qwen2Model
  (1): Pooling({'word_embedding_dimension': 4096, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': True, 'include_prompt': True})
  (2): Normalize()
)
```
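In this stack, `pooling_mode_lasttoken: True` means the sentence embedding is taken from the hidden state of the last non-padding token, a common choice for decoder-style backbones such as Qwen2 where the final position attends to the whole sequence, and `Normalize()` then L2-normalizes that vector so cosine similarity reduces to a dot product. A minimal sketch of the idea (illustrative only, not the library's internal implementation; right padding is assumed):

```python
import torch

def last_token_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # hidden_states:  [batch, seq_len, hidden] token states from the transformer
    # attention_mask: [batch, seq_len], 1 for real tokens, 0 for padding
    last_idx = attention_mask.sum(dim=1) - 1         # position of each sequence's last real token
    batch_idx = torch.arange(hidden_states.size(0))  # [batch]
    pooled = hidden_states[batch_idx, last_idx]      # [batch, hidden]
    # L2-normalize, mirroring the Normalize() module above
    return torch.nn.functional.normalize(pooled, p=2, dim=1)

# Toy shapes for illustration: batch of 2, seq_len 5, hidden size 4096
hidden = torch.randn(2, 5, 4096)
mask = torch.tensor([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])
print(last_token_pool(hidden, mask).shape)  # torch.Size([2, 4096])
```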
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
import torch
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer(
    "",
    model_kwargs={"attn_implementation": "flash_attention_2", "torch_dtype": torch.bfloat16},
)

# Run inference
documents = [
    'Your Upcoming Stay at Park Tower Knightsbridge\n\n\r\n[cid:image001.png@01DA125C.D84FF1A0]\r\n\r\n\r\nDear Abdulla Alhassani,\r\n\r\n\r\n\r\nThank you for choosing The Park Tower Knightsbridge, A Luxury Collection Hotel for your upcoming visit! We are looking forward to welcoming you to the hotel and\r\n\r\nwould like to prepare for your arrival with any special requests you may have. A few details from you, can help us best prepare and make your stay as memorable as possible.\r\n\r\nPlease share with us your estimated arrival time and let us know if we may assist you in booking a private car or taxi service to get to the hotel.\r\n\r\nEmail us here to book your car now.\r\n\r\n\r\n\r\nYOUR RESERVATION\r\n\r\nARRIVAL: 11/11/2023\r\n\r\nDEPARTURE: 11/22/2023\r\n\r\nCONFIRMATION NUMBER: 95476045\r\n\r\n\r\n\r\n*IF TRAVELLING WITH CHILDREN PLEASE CONFIRM THEIR AGES\r\n\r\n\r\n\r\n\r\n\r\n[cid:image002.png@01DA125C.D84FF1A0][cid:image003.png@01DA125C.D84FF1A0][cid:image004.png@01DA125C.D84FF1A0]\r\n\r\n\r\n[cid:image005.png@01DA125C.D84FF1A0]\r\n[cid:image006.png@01DA125C.D84FF1A0][cid:image007.png@01DA125C.D84FF1A0]\r\n[cid:image008.png@01DA125C.D84FF1A0]\n',
]
categories = [
    "Email category: 'Hotel -- Additional request of arrival time'. Email category description: 'A request from the hotel asking for the client to provide the exact or approximate check-in/arrival time as this is requested by the hotel due to different reasons. For example, the hotel does not have 24 hour reception and for this reason is asking for the arrival time. Information about the check-in helps the hotel better prepare for the guest's arrival and plan the schedule of the hotel staff.'",
    "Email category: 'Hotel -- Content request'. Email category description: 'This is an email from a hotelier who has seen photos of the hotel where they work and requests that certain photos be removed, changed, or new ones added. It could also be a request to modify the description of a particular facility or service at the hotel, such as information about type of meals, the deposit amount, or parking facilities. These are important letters that help us keep the hotel photo gallery on the website up to date, ensuring that guests can be confident in what they are booking when looking at the photos. We send such letters to the content department.'",
]
document_embeddings = model.encode(documents)
category_embeddings = model.encode(categories)
print(document_embeddings.shape)  # [1, 4096]
print(category_embeddings.shape)  # [2, 4096]

# Get the similarity scores for the embeddings
similarities = model.similarity(document_embeddings, category_embeddings)
```
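Because each category string doubles as a label description, the similarity matrix can be used directly for zero-shot email classification. A minimal follow-up sketch, reusing `similarities` from the snippet above:

```python
# Assign each document to its highest-scoring category.
# `similarities` has shape [num_documents, num_categories].
best_indices = similarities.argmax(dim=1).tolist()

for doc_idx, cat_idx in enumerate(best_indices):
    score = similarities[doc_idx, cat_idx].item()
    print(f"Document {doc_idx} -> category {cat_idx} (cosine similarity: {score:.4f})")
```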
### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 6
- `learning_rate`: 2e-06
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `bf16`: True
- `load_best_model_at_end`: True
- `batch_sampler`: no_duplicates

#### All Hyperparameters

<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 6
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `learning_rate`: 2e-06
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>
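These values correspond to the sentence-transformers v3 `SentenceTransformerTrainingArguments`. A minimal sketch of how the non-default values above would be reproduced (the `output_dir` is a placeholder, not taken from this card; everything else keeps the defaults from the expandable list):

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # hypothetical path
    eval_strategy="steps",
    per_device_train_batch_size=6,
    per_device_eval_batch_size=6,
    learning_rate=2e-06,
    num_train_epochs=1,
    warmup_ratio=0.1,
    bf16=True,
    load_best_model_at_end=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # no duplicate texts within a batch
)
```

The `no_duplicates` batch sampler pairs naturally with in-batch-negative objectives such as the CachedMultipleNegativesRankingLoss cited below, which treats the other examples in a batch as negatives and therefore benefits from duplicate-free batches.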
### Training Logs

| Epoch  | Step | Training Loss | Validation Loss |
|:------:|:----:|:-------------:|:---------------:|
| 0.0252 | 50   | 0.5621        | -               |
| 0.0504 | 100  | 0.6789        | -               |
| 0.0755 | 150  | 0.7126        | -               |
| 0.1007 | 200  | 0.6461        | 0.1758          |
| 0.1259 | 250  | 0.3928        | -               |
| 0.1511 | 300  | 0.3786        | -               |
| 0.1762 | 350  | 0.4105        | -               |
| 0.2014 | 400  | 0.3354        | 0.1420          |
| 0.2266 | 450  | 0.327         | -               |
| 0.2518 | 500  | 0.2494        | -               |
| 0.2769 | 550  | 0.1773        | -               |
| 0.3021 | 600  | 0.1215        | 0.1241          |
| 0.3273 | 650  | 0.2426        | -               |
| 0.3525 | 700  | 0.2279        | -               |
| 0.3776 | 750  | 0.2151        | -               |
| 0.4028 | 800  | 0.2676        | 0.1216          |
| 0.4280 | 850  | 0.2645        | -               |
| 0.4532 | 900  | 0.2491        | -               |
| 0.4783 | 950  | 0.2945        | -               |
| 0.5035 | 1000 | 0.1859        | 0.1206          |
| 0.5287 | 1050 | 0.2401        | -               |
| 0.5539 | 1100 | 0.2154        | -               |
| 0.5791 | 1150 | 0.1731        | -               |
| 0.6042 | 1200 | 0.1942        | 0.1196          |
| 0.6294 | 1250 | 0.2643        | -               |
| 0.6546 | 1300 | 0.1806        | -               |
| 0.6798 | 1350 | 0.1609        | -               |
| 0.7049 | 1400 | 0.1008        | 0.1187          |

### Framework Versions

- Python: 3.10.12
- Sentence Transformers: 3.0.1
- Transformers: 4.42.4
- PyTorch: 2.2.0+cu121
- Accelerate: 0.33.0
- Datasets: 2.20.0
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### CachedMultipleNegativesRankingLoss

```bibtex
@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```