SentenceTransformer based on BAAI/bge-base-en-v1.5
This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-base-en-v1.5
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
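The Pooling module takes the [CLS] token embedding (pooling_mode_cls_token: True) and the final Normalize() module L2-normalizes it, so every output vector has unit length and cosine similarity reduces to a dot product. A quick sketch to check this (the sentences are arbitrary placeholders):

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ayushexel/emb-bge-base-en-v1.5-squad-9-epochs")
embeddings = model.encode(["a placeholder sentence", "another placeholder sentence"])

print(embeddings.shape)                    # (2, 768)
print(np.linalg.norm(embeddings, axis=1))  # ~[1. 1.] because of the Normalize() module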
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("ayushexel/emb-bge-base-en-v1.5-squad-9-epochs")
# Run inference
sentences = [
'What is a WISP? ',
'A wireless Internet service provider (WISP) is an Internet service provider with a network based on wireless networking. Technology may include commonplace Wi-Fi wireless mesh networking, or proprietary equipment designed to operate over open 900 MHz, 2.4 GHz, 4.9, 5.2, 5.4, 5.7, and 5.8 GHz bands or licensed frequencies such as 2.5 GHz (EBS/BRS), 3.65 GHz (NN) and in the UHF band (including the MMDS frequency band) and LMDS.[citation needed]',
'A wireless Internet service provider (WISP) is an Internet service provider with a network based on wireless networking. Technology may include commonplace Wi-Fi wireless mesh networking, or proprietary equipment designed to operate over open 900 MHz, 2.4 GHz, 4.9, 5.2, 5.4, 5.7, and 5.8 GHz bands or licensed frequencies such as 2.5 GHz (EBS/BRS), 3.65 GHz (NN) and in the UHF band (including the MMDS frequency band) and LMDS.[citation needed]',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
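Because the model was finetuned on question-context pairs, a typical use is ranking candidate passages for a question. A small sketch of such retrieval (the query and passages below are made-up examples, not from the training data):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ayushexel/emb-bge-base-en-v1.5-squad-9-epochs")

query = "What is a WISP?"  # made-up query
passages = [  # made-up candidate passages
    "A wireless Internet service provider (WISP) is an Internet service provider with a network based on wireless networking.",
    "Pubs that cater for a niche clientele, such as sports fans, are known as theme pubs.",
]

query_embedding = model.encode([query])
passage_embeddings = model.encode(passages)

# Cosine similarity between the query and each passage, shape [1, len(passages)]
scores = model.similarity(query_embedding, passage_embeddings)
best = int(scores.argmax())
print(passages[best], float(scores[0, best]))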
Evaluation
Metrics
Triplet
- Dataset: gooqa-dev
- Evaluated with TripletEvaluator
| Metric | Value |
|---|---|
| cosine_accuracy | 0.4112 |
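The score above comes from TripletEvaluator, which counts a triple as correct when the question embedding is closer (by cosine similarity) to its context than to its negative. A minimal sketch of running it; the placeholder triples below stand in for the actual gooqa-dev evaluation data:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TripletEvaluator

model = SentenceTransformer("ayushexel/emb-bge-base-en-v1.5-squad-9-epochs")

# Placeholder triples; the reported number is computed on the full evaluation split
evaluator = TripletEvaluator(
    anchors=["What is a WISP?"],
    positives=["A wireless Internet service provider (WISP) is an Internet service provider with a network based on wireless networking."],
    negatives=["Pubs that cater for a niche clientele are known as theme pubs."],
    name="gooqa-dev",
)
results = evaluator(model)
print(results)  # expected to contain a "gooqa-dev_cosine_accuracy" entry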
Training Details
Training Dataset
Unnamed Dataset
- Size: 44,284 training samples
- Columns: question, context, and negative
- Approximate statistics based on the first 1000 samples:
|  | question | context | negative |
|---|---|---|---|
| type | string | string | string |
| details | min: 6 tokens, mean: 14.32 tokens, max: 41 tokens | min: 29 tokens, mean: 153.05 tokens, max: 481 tokens | min: 31 tokens, mean: 156.11 tokens, max: 486 tokens |
- Samples:
| question | context | negative |
|---|---|---|
| What is the state song of New York? | I Love New York (stylized I ❤ NY) is both a logo and a song that are the basis of an advertising campaign and have been used since 1977 to promote tourism in New York City, and later to promote New York State as well. The trademarked logo, owned by New York State Empire State Development, appears in souvenir shops and brochures throughout the city and state, some licensed, many not. The song is the state song of New York. | New York—often called New York City or the City of New York to distinguish it from the State of New York, of which it is a part—is the most populous city in the United States and the center of the New York metropolitan area, the premier gateway for legal immigration to the United States and one of the most populous urban agglomerations in the world. A global power city, New York exerts a significant impact upon commerce, finance, media, art, fashion, research, technology, education, and entertainment, its fast pace defining the term New York minute. Home to the headquarters of the United Nations, New York is an important center for international diplomacy and has been described as the cultural and financial capital of the world. |
| What is the state song of New York? | I Love New York (stylized I ❤ NY) is both a logo and a song that are the basis of an advertising campaign and have been used since 1977 to promote tourism in New York City, and later to promote New York State as well. The trademarked logo, owned by New York State Empire State Development, appears in souvenir shops and brochures throughout the city and state, some licensed, many not. The song is the state song of New York. | New York—often called New York City or the City of New York to distinguish it from the State of New York, of which it is a part—is the most populous city in the United States and the center of the New York metropolitan area, the premier gateway for legal immigration to the United States and one of the most populous urban agglomerations in the world. A global power city, New York exerts a significant impact upon commerce, finance, media, art, fashion, research, technology, education, and entertainment, its fast pace defining the term New York minute. Home to the headquarters of the United Nations, New York is an important center for international diplomacy and has been described as the cultural and financial capital of the world. |
| What was the mandated closing time of pubs in Knightsbridge in the 1960s? | There was a special case established under the State Management Scheme where the brewery and licensed premises were bought and run by the state until 1973, most notably in Carlisle. During the 20th century elsewhere, both the licensing laws and enforcement were progressively relaxed, and there were differences between parishes; in the 1960s, at closing time in Kensington at 10:30 pm, drinkers would rush over the parish boundary to be in good time for "Last Orders" in Knightsbridge before 11 pm, a practice observed in many pubs adjoining licensing area boundaries. Some Scottish and Welsh parishes remained officially "dry" on Sundays (although often this merely required knocking at the back door of the pub). These restricted opening hours led to the tradition of lock-ins. | Pubs that cater for a niche clientele, such as sports fans or people of certain nationalities are known as theme pubs. Examples of theme pubs include sports bars, rock pubs, biker pubs, Goth pubs, strip pubs, gay bars, karaoke bars and Irish pubs. |
- Loss: MultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
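With this loss, each question is pulled toward its paired context while every other context in the batch, together with the explicit negative column, serves as a negative. A minimal sketch of constructing the loss with the parameters listed above (the base model stands in for the finetuning starting point):

from sentence_transformers import SentenceTransformer, util
from sentence_transformers.losses import MultipleNegativesRankingLoss

# Starting point of the finetuning run described in this card
model = SentenceTransformer("BAAI/bge-base-en-v1.5")

# scale=20.0 and cosine similarity correspond to the parameters above
loss = MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)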
Evaluation Dataset
Unnamed Dataset
- Size: 5,000 evaluation samples
- Columns: question, context, and negative_1
- Approximate statistics based on the first 1000 samples:
|  | question | context | negative_1 |
|---|---|---|---|
| type | string | string | string |
| details | min: 7 tokens, mean: 14.61 tokens, max: 39 tokens | min: 28 tokens, mean: 148.94 tokens, max: 512 tokens | min: 28 tokens, mean: 147.0 tokens, max: 486 tokens |
- Samples:
| question | context | negative_1 |
|---|---|---|
| When the UK changes their clocks forward in the spring, what do they call the time they're then observing? | The name of local time typically changes when DST is observed. American English replaces standard with daylight: for example, Pacific Standard Time (PST) becomes Pacific Daylight Time (PDT). In the United Kingdom, the standard term for UK time when advanced by one hour is British Summer Time (BST), and British English typically inserts summer into other time zone names, e.g. Central European Time (CET) becomes Central European Summer Time (CEST). | Daylight saving time (DST) or summer time is the practice of advancing clocks during summer months by one hour so that in the evening daylight is experienced an hour longer, while sacrificing normal sunrise times. Typically, regions with summer time adjust clocks forward one hour close to the start of spring and adjust them backward in the autumn to standard time. |
| When did the U.N. vote to adopt the Sustainable Development Goals? | In September 2015, the 193 member states of the United Nations unanimously adopted the Sustainable Development Goals, a set of 17 goals aiming to transform the world over the next 15 years. These goals are designed to eliminate poverty, discrimination, abuse and preventable deaths, address environmental destruction, and usher in an era of development for all people, everywhere. | In September 2015, the 193 member states of the United Nations unanimously adopted the Sustainable Development Goals, a set of 17 goals aiming to transform the world over the next 15 years. These goals are designed to eliminate poverty, discrimination, abuse and preventable deaths, address environmental destruction, and usher in an era of development for all people, everywhere. |
| Where in India is Sanskrit still spoken by the population? | Samskrita Bharati is an organisation working for Sanskrit revival. The "All-India Sanskrit Festival" (since 2002) holds composition contests. The 1991 Indian census reported 49,736 fluent speakers of Sanskrit. Sanskrit learning programmes also feature on the lists of most AIR broadcasting centres. The Mattur village in central Karnataka claims to have native speakers of Sanskrit among its population. Inhabitants of all castes learn Sanskrit starting in childhood and converse in the language. Even the local Muslims converse in Sanskrit. Historically, the village was given by king Krishnadevaraya of the Vijayanagara Empire to Vedic scholars and their families, while people in his kingdom spoke Kannada and Telugu. Another effort concentrates on preserving and passing along the oral tradition of the Vedas, www.shrivedabharathi.in is one such organisation based out of Hyderabad that has been digitising the Vedas by recording recitations of Vedic Pandits. | Samskrita Bharati is an organisation working for Sanskrit revival. The "All-India Sanskrit Festival" (since 2002) holds composition contests. The 1991 Indian census reported 49,736 fluent speakers of Sanskrit. Sanskrit learning programmes also feature on the lists of most AIR broadcasting centres. The Mattur village in central Karnataka claims to have native speakers of Sanskrit among its population. Inhabitants of all castes learn Sanskrit starting in childhood and converse in the language. Even the local Muslims converse in Sanskrit. Historically, the village was given by king Krishnadevaraya of the Vijayanagara Empire to Vedic scholars and their families, while people in his kingdom spoke Kannada and Telugu. Another effort concentrates on preserving and passing along the oral tradition of the Vedas, www.shrivedabharathi.in is one such organisation based out of Hyderabad that has been digitising the Vedas by recording recitations of Vedic Pandits. |
- Loss: MultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 128
- per_device_eval_batch_size: 128
- num_train_epochs: 9
- warmup_ratio: 0.1
- fp16: True
- batch_sampler: no_duplicates
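For reference, a hedged sketch of how these non-default values could be passed to the Sentence Transformers trainer; the toy dataset and output directory below are placeholders, not the actual 44,284-sample training data:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("BAAI/bge-base-en-v1.5")

# Placeholder (question, context, negative) triplets standing in for the real dataset
toy = Dataset.from_dict({
    "question": ["What is a WISP?"],
    "context": ["A wireless Internet service provider (WISP) is an Internet service provider with a network based on wireless networking."],
    "negative": ["Pubs that cater for a niche clientele are known as theme pubs."],
})

args = SentenceTransformerTrainingArguments(
    output_dir="output/emb-bge-base-en-v1.5-squad",  # placeholder path
    num_train_epochs=9,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    warmup_ratio=0.1,
    fp16=True,  # requires a CUDA GPU
    eval_strategy="steps",
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoids duplicate texts within a batch
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=toy,
    eval_dataset=toy,
    loss=MultipleNegativesRankingLoss(model, scale=20.0),
)
trainer.train()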
All Hyperparameters
Click to expand
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 128
- per_device_eval_batch_size: 128
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 9
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- tp_size: 0
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
Training Logs
| Epoch | Step | Training Loss | Validation Loss | gooqa-dev_cosine_accuracy |
|---|---|---|---|---|
| -1 | -1 | - | - | 0.3516 |
| 0.2890 | 100 | 0.7706 | 0.8278 | 0.3768 |
| 0.5780 | 200 | 0.4913 | 0.7635 | 0.4002 |
| 0.8671 | 300 | 0.4249 | 0.7408 | 0.4014 |
| 1.1561 | 400 | 0.3396 | 0.7256 | 0.4112 |
| 1.4451 | 500 | 0.2886 | 0.7149 | 0.4136 |
| 1.7341 | 600 | 0.2854 | 0.7187 | 0.4142 |
| 2.0231 | 700 | 0.2596 | 0.7110 | 0.4128 |
| 2.3121 | 800 | 0.1537 | 0.7251 | 0.4122 |
| 2.6012 | 900 | 0.1538 | 0.7261 | 0.4166 |
| 2.8902 | 1000 | 0.1626 | 0.7219 | 0.4124 |
| 3.1792 | 1100 | 0.1234 | 0.7360 | 0.4196 |
| 3.4682 | 1200 | 0.104 | 0.7424 | 0.4150 |
| 3.7572 | 1300 | 0.1042 | 0.7431 | 0.4130 |
| 4.0462 | 1400 | 0.1025 | 0.7395 | 0.4176 |
| 4.3353 | 1500 | 0.0705 | 0.7591 | 0.4112 |
| 4.6243 | 1600 | 0.0766 | 0.7578 | 0.4106 |
| 4.9133 | 1700 | 0.0768 | 0.7589 | 0.4154 |
| 5.2023 | 1800 | 0.0623 | 0.7693 | 0.4124 |
| 5.4913 | 1900 | 0.0561 | 0.7673 | 0.4132 |
| 5.7803 | 2000 | 0.0592 | 0.7658 | 0.4188 |
| 6.0694 | 2100 | 0.0553 | 0.7723 | 0.4102 |
| 6.3584 | 2200 | 0.0455 | 0.7823 | 0.4090 |
| 6.6474 | 2300 | 0.0476 | 0.7784 | 0.4158 |
| 6.9364 | 2400 | 0.0478 | 0.7825 | 0.4168 |
| 7.2254 | 2500 | 0.0439 | 0.7877 | 0.4082 |
| 7.5145 | 2600 | 0.0407 | 0.7875 | 0.4106 |
| 7.8035 | 2700 | 0.041 | 0.7888 | 0.4182 |
| 8.0925 | 2800 | 0.0401 | 0.7922 | 0.4100 |
| 8.3815 | 2900 | 0.0385 | 0.7914 | 0.4116 |
| 8.6705 | 3000 | 0.0375 | 0.7930 | 0.4110 |
| 8.9595 | 3100 | 0.0377 | 0.7926 | 0.4130 |
| -1 | -1 | - | - | 0.4112 |
Framework Versions
- Python: 3.11.0
- Sentence Transformers: 4.0.1
- Transformers: 4.50.3
- PyTorch: 2.6.0+cu124
- Accelerate: 1.5.2
- Datasets: 3.5.0
- Tokenizers: 0.21.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}